trust.md: declaring how much to trust a repository's knowledge
Scientific writing has always mixed cited fact, reasoned inference, and author opinion, but AI-assisted authorship makes the blend invisible at scale. trust.md is a lightweight manifest placed at the root of any repository. It declares who and what produced the knowledge, how each claim is graded, and what the aggregate corpus profile looks like, so that readers and machines can tell the difference.
The problem the AI era sharpens
Scientific prose has always interleaved evidence and interpretation. A paragraph may move from a directly cited finding to an inference the author drew from it, to a normative recommendation, to a speculative forward look, often with no punctuation between them. Readers calibrate this mix by knowing the author, the journal, the genre. The signal is social, not structural.
Large language models sharpen this problem. They produce fluent, uniformly authoritative prose regardless of whether a sentence rests on a primary source, emerges from plausible-sounding interpolation, or is a normative position the author has smuggled in. The social cues that help readers calibrate (hedging language, explicit "we argue" or "data suggest") are often smoothed away in the editing pass.
Making the epistemic register of each claim structurally legible, not just stylistically signalled, is a requirement for responsible AI-assisted authorship, not an academic preference.
What trust.md declares
trust.md is a Markdown file with a YAML front-matter block, served at https://yourdomain/trust.md. Its v0.1 specification defines three substantive declaration blocks.
produced_by lists the human contributors (name, ORCID, role) and AI agents (name, role, oversight level). The oversight field takes one of four values: human-reviewed, human-in-the-loop, automated, or none. For this repository, the agent entry reads: Claude (Anthropic), via Claude Code, role: retrieval, drafting, epistemic markup — oversight: human-reviewed, meaning no agent output is published without human review.
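As a rough illustration of how a tool might check an agent entry, the Python sketch below models the three agent fields and the four oversight values named above. The class and function names (Oversight, Agent, parse_agent) and the use of a plain dict as input are assumptions made for this example, not part of the specification.

```python
from dataclasses import dataclass
from enum import Enum


class Oversight(Enum):
    """The four oversight values named in the text."""
    HUMAN_REVIEWED = "human-reviewed"
    HUMAN_IN_THE_LOOP = "human-in-the-loop"
    AUTOMATED = "automated"
    NONE = "none"


@dataclass(frozen=True)
class Agent:
    """One AI agent entry: name, role, oversight level."""
    name: str
    role: str
    oversight: Oversight


def parse_agent(entry: dict) -> Agent:
    """Build an Agent from a plain mapping (hypothetical helper).

    Raises ValueError if the oversight value is not one of the four above.
    """
    return Agent(
        name=entry["name"],
        role=entry["role"],
        oversight=Oversight(entry["oversight"]),
    )


# The agent entry described for this repository.
claude = parse_agent({
    "name": "Claude (Anthropic), via Claude Code",
    "role": "retrieval, drafting, epistemic markup",
    "oversight": "human-reviewed",
})
print(claude.oversight.value)
```

Using an enumeration rather than a free-text string means a misspelt oversight level fails loudly instead of being published.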
governance states the editorial policies: what the source of truth is (the human-approved record, not raw model output), a boolean commitment against fabricated citations, a review policy, a correction policy, and a conflict-of-interest disclosure. For a consultancy like Neuronautix, the last item matters: promotional or normative statements are explicitly tagged as the view category rather than presented as cited fact.
epistemic_model defines the grading model: five categories and an independent 0–100 confidence scale. The categories are cited (directly supported by a cited source), consensus (widely accepted domain knowledge), inference (reasoned from sources, not stated verbatim), hypothesis (forward-looking or speculative), and view (explicit interpretation or normative conclusion). The 0–100 scale runs in five bands from Speculative (0–29) through Very high (90–100). The two axes are independent by design: a view may be sincerely held but low in evidentiary trust; a hypothesis may be well-motivated yet tentative.
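The two-axis model can be sketched in a few lines of Python. This is an illustration under stated assumptions: the Claim class and the outer_band helper are hypothetical names, scores are assumed to be integers, and only the two bands whose ranges are given above are labelled, because the cut points of the three middle bands are not stated here.

```python
from dataclasses import dataclass
from typing import Optional

# The five category IDs named in the text.
CATEGORIES = ("cited", "consensus", "inference", "hypothesis", "view")


@dataclass(frozen=True)
class Claim:
    """A graded claim: category and confidence are validated independently."""
    category: str
    trust: int  # assumption: integer score on the 0..100 scale

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if isinstance(self.trust, bool) or not isinstance(self.trust, int):
            raise ValueError(f"trust must be an integer, got {self.trust!r}")
        if not 0 <= self.trust <= 100:
            raise ValueError(f"trust must lie in 0..100, got {self.trust}")


def outer_band(trust: int) -> Optional[str]:
    """Name the band only where the text gives its range."""
    if 0 <= trust <= 29:
        return "Speculative"
    if 90 <= trust <= 100:
        return "Very high"
    return None  # a middle band; its cut points are not given in this note


# Any category may carry any score: the axes do not constrain each other.
sincere_view = Claim("view", 20)
print(outer_band(sincere_view.trust))
```

Nothing in the validation ties a category to a score range, which is the independence the model describes.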
Macro and micro: the corpus profile is derived, not asserted
trust.md is the macro companion to per-claim inline markup. Every substantive sentence in a note is wrapped in an nnx-claim span carrying data-epi (the category) and data-trust (the score). The corpus and artifacts blocks in trust.md are then derived from those spans, not manually asserted. This is what makes the corpus profile auditable: a reader or a script can verify the aggregate figures from the source markup.
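A minimal sketch of that derivation, using only the Python standard library, might look like the following. It assumes nnx-claim is a class on a span element and that data-trust holds an integer; the exact span syntax is documented in the authoring guide, so treat the markup in the sample as illustrative. ClaimCollector and corpus_profile are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser


class ClaimCollector(HTMLParser):
    """Collect (category, score) pairs from spans assumed to look like
    <span class="nnx-claim" data-epi="cited" data-trust="90">...</span>."""

    def __init__(self) -> None:
        super().__init__()
        self.claims = []  # list of (category, score) tuples

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        if tag == "span" and "nnx-claim" in classes:
            # A missing or non-numeric attribute raises, so gaps in the
            # markup surface as errors instead of being skipped silently.
            self.claims.append((a["data-epi"], int(a["data-trust"])))


def corpus_profile(html: str) -> dict:
    """Derive claim count, mean score and category percentages from markup."""
    collector = ClaimCollector()
    collector.feed(html)
    claims = collector.claims
    if not claims:
        return {"claims": 0, "mean_trust": None, "distribution": {}}
    total = len(claims)
    counts = Counter(category for category, _ in claims)
    return {
        "claims": total,
        "mean_trust": round(sum(score for _, score in claims) / total),
        "distribution": {c: round(100 * n / total) for c, n in counts.items()},
    }


sample = (
    '<p><span class="nnx-claim" data-epi="cited" data-trust="90">A.</span> '
    '<span class="nnx-claim" data-epi="view" data-trust="50">B.</span> '
    '<span class="nnx-claim" data-epi="cited" data-trust="70">C.</span> '
    '<span class="nnx-claim" data-epi="inference" data-trust="70">D.</span></p>'
)
print(corpus_profile(sample))
```

Because the profile is a pure function of the markup, anyone can rerun it against the published notes and compare the result with the figures declared in trust.md.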
This macro/micro relationship is where trust.md adds value beyond a simple author statement. An author can claim high trust in their prose; a corpus profile derived from per-claim markup cannot be inflated without changing the markup itself — and the markup is visible to readers via the Trust Lens toggle. The two layers are mutually constraining.
The Trust Lens — the reader-side toggle on each note — is the micro layer. trust.md is the repository-level summary that tells an arriving reader what kind of site they are on before they open a single note. The two instruments are designed to be read together.
The worked example: this corpus
This site is the reference implementation of the convention. The marked corpus currently comprises 19 notes and 440 graded claims, with a mean confidence score of 74/100. The category distribution is: 53% cited · 22% view · 19% inference · 3% consensus · 3% hypothesis.
The most evidence-dense notes — the NAMO ontology note and the IND submission note — average 88 and 83 respectively. Both are heavily cited literature reviews where virtually every claim is directly sourced.
The explicitly forward-looking notes score lowest by design: the HCM 2050 vision averages 54 (10 hypothesis claims) and the NAM Evidence Commons build-plan averages 55 (16 view claims).
A low average here is a feature, not a quality failure. It tells the reader that the piece is a position paper or a prediction, not a literature report. The per-artifact table in /trust.md makes this auditable at a glance. Epistemic honesty means signalling uncertainty clearly, not suppressing it to appear authoritative.
Standing on shoulders: the prior art stack
The file's name is superficially similar to the JournalList trust.txt [2], from which it should be carefully distinguished. trust.txt declares an organisation's trusted relationships: memberships, ownership structures, vendor affiliations. It is an organisational transparency mechanism. trust.md instead declares the epistemic status and confidence of content, an orthogonal dimension.
trust.md aligns with — rather than replaces — the formal assertion/provenance stack. W3C PROV-O [3] provides the OWL2 ontology for provenance and authoring of assertions; trust.md's produced_by block maps to its Agent and Activity patterns. PAV (Provenance, Authoring and Versioning) [4] adds authorship-specific terms that align with the human contributor entries.
Nanopublications [5] formalise the pattern of assertion + provenance + publication information as a machine-readable unit. trust.md's per-claim two-axis model (category + confidence) is a pragmatic, web-native cousin of this pattern: it does not require a triple store to deploy, but it is designed to be compatible with one.
SEPIO (Scientific Evidence and Provenance Information Ontology, Monarch Initiative — the same project behind NAMO) [6] and the Evidence & Conclusion Ontology (ECO) [7] provide formal evidence and assertion modelling. trust.md's five category IDs (cited, consensus, inference, hypothesis, view) are designed to map to SEPIO evidence types and ECO codes, enabling future alignment without requiring it on day one.
schema.org ClaimReview [8] makes each graded claim machine-harvestable as JSON-LD, and the Trust Lens inline encoding (data-epi, data-trust) maps cleanly to its properties. This repository now does exactly that: a generator reads the inline markup and injects per-claim ClaimReview JSON-LD into every note — including this one — making the corpus queryable by fact-checkers, citation managers, and AI retrieval pipelines that understand ClaimReview.
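To make the mapping concrete, here is one way a single graded claim could be expressed as ClaimReview JSON-LD in Python. The property names (claimReviewed, reviewRating, ratingValue, worstRating, bestRating, alternateName) are schema.org terms, but the choice to put the category in alternateName and the score in ratingValue is an assumption of this sketch; the repository's own generator may map them differently. The function name claim_review and the example URL are hypothetical.

```python
import json


def claim_review(text: str, category: str, trust: int, page_url: str) -> dict:
    """Build a ClaimReview object for one graded claim (illustrative mapping)."""
    return {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "url": page_url,
        "claimReviewed": text,
        "reviewRating": {
            "@type": "Rating",
            "ratingValue": trust,       # data-trust, on the 0..100 scale
            "worstRating": 0,
            "bestRating": 100,
            "alternateName": category,  # data-epi (assumed placement)
        },
    }


review = claim_review(
    text="Example claim text taken from a note.",
    category="inference",
    trust=70,
    page_url="https://example.org/notes/example",  # placeholder URL
)
print(json.dumps(review, indent=2))
```

The printed object is what would sit inside a script element of type application/ld+json on the note's page.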
The key design principle — shared with the FAIR movement itself [1] — is that conventions must be cheap to adopt to spread. trust.md is a front door, not a replacement for these formal standards. You can start with a text file and a few span attributes; you can grow toward JSON-LD ClaimReview and SEPIO term URIs later.
How to adopt trust.md
The adoption path is deliberately low-friction:
- Copy the template file from neuronautix.com/trust.md or from the spec repository. Serve it at https://yourdomain/trust.md.
- Fill in produced_by and governance honestly. The human/AI split and the review policy are the blocks that matter most for readers evaluating an AI-assisted repository. If there is no AI involvement, agents: [] is a valid and useful declaration.
- Mark claims inline using the nnx-claim span syntax (or your own equivalent encoding) so the corpus and artifacts profiles can be derived automatically rather than hand-asserted. The EPISTEMIC-MARKUP.md authoring guide documents the exact span syntax and five-category rubric.
- Pair with fair.md (companion note published 2026-06-08). A repository can have scrupulous epistemic markup but poor findability; it can be highly FAIR but publish content of uncertain provenance. The two files answer complementary questions and are designed to coexist in the same root.
- Review and update last_reviewed periodically. The corpus profile is only meaningful if it tracks the current state of the markup. The spec makes last_reviewed a required field precisely to make staleness visible.
Partial coverage on day one is fine. A trust.md that marks up 40% of claims and declares that honestly is more useful to science than one that claims full coverage it does not have. The convention is designed to be improvable incrementally, and the spec's validation rules distinguish warnings (missing recommended fields) from errors (missing required fields).
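The warning/error distinction could be sketched in Python as below. Only last_reviewed is named as required in this note, so the RECOMMENDED tuple, the function name validate, the nesting of agents under produced_by, and the example date are all assumptions for illustration; the spec's actual field lists should be taken from the spec itself. The manifest is assumed to be already parsed into a dict.

```python
# The one field this note names as required.
REQUIRED = ("last_reviewed",)
# Hypothetical stand-in for the spec's recommended fields.
RECOMMENDED = ("produced_by", "governance", "epistemic_model")


def validate(manifest: dict) -> tuple:
    """Return (errors, warnings) for an already-parsed manifest."""
    errors = [
        f"missing required field: {key}"
        for key in REQUIRED
        if key not in manifest
    ]
    warnings = [
        f"missing recommended field: {key}"
        for key in RECOMMENDED
        if key not in manifest
    ]
    return errors, warnings


# A partial, honest manifest: no AI agents, other blocks not yet written.
partial = {
    "last_reviewed": "2026-06-08",    # placeholder date
    "produced_by": {"agents": []},    # empty list is a valid declaration
}
errors, warnings = validate(partial)
print("errors:", errors)
print("warnings:", warnings)
```

A partial manifest like this one passes with warnings only, which matches the incremental adoption path described above.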
References
- [1] The FAIR Guiding Principles for scientific data management and stewardship — Wilkinson MD et al. Scientific Data. 2016. Founding statement of the Findable, Accessible, Interoperable, Reusable framework; the principle that conventions must be cheap to adopt to spread.
- [2] trust.txt specification — JournalList.net. Declares organisational trusted relationships (memberships, ownership, vendors) — explicitly distinct from trust.md's epistemic content confidence.
- [3] W3C PROV-O. OWL2 ontology for provenance and authoring of assertions; trust.md's produced_by block maps to its Agent and Activity patterns.
- [4] PAV (Provenance, Authoring and Versioning). Ciccarese P & Soiland-Reyes S. J Biomed Semantics. 2013. Authorship-specific provenance terms that align with trust.md's human contributor entries.
- [5] Nanopublications. Formal pattern of assertion + provenance + publication information; trust.md's per-claim model is a pragmatic, web-native cousin.
- [6] SEPIO — Scientific Evidence and Provenance Information Ontology (Monarch Initiative). Formal evidence and assertion modelling; same project as NAMO. trust.md's category IDs are designed to map to SEPIO evidence types.
- [7] Evidence & Conclusion Ontology (ECO). Vocabulary for evidence types supporting biological assertions; formal grounding for trust.md's category IDs.
- [8] schema.org ClaimReview. Structured data type for claim annotation; the route by which Trust Lens markup is made harvestable as JSON-LD.