FAIR metadata · Scientific integrity · AI in research

fair.md and trust.md: portable manifests for FAIR data and epistemic trust

Damien Huzard, PhD

Scientific writing, now routinely AI-assisted, mixes cited fact, reasoned inference, forward-looking hypothesis, and personal position with no visible boundary between them. FAIR is widely endorsed but rarely self-declared in a place a human or a machine can find in five seconds. Two small files, placed at the root of any repository, can address both problems at once.

The problem: mixed epistemic register, invisible to readers and machines

The FAIR Guiding Principles — Findable, Accessible, Interoperable, Reusable — were formulated precisely because the scientific record had become too fragmented and inconsistently described to support reliable data reuse [1]. A decade on, endorsement of FAIR has spread widely through policy and funding mandates, but consistent self-declaration of FAIR posture at the level of individual repositories remains rare [1]. The gap between "we endorse FAIR" and "here is an honest, inspectable, structured account of how FAIR this repository actually is" is where the convention breaks down in practice.

A parallel and newer problem sits at the sentence level. AI-assisted scientific writing accelerates drafting but deepens an existing opacity: the same paragraph may contain a directly cited finding, an inference the author drew from it, a forward-looking speculation, and a normative recommendation — with no visible signal to distinguish them [12]. When the drafting agent is a language model, the surface fluency of the text actively conceals this mix: cited fact and generated inference read identically [12]. Making the epistemic register of each claim legible — to readers, to downstream systems, to future AI agents ingesting the corpus — is no longer an academic nicety. It is part of responsible publishing in an age of AI-assisted authorship.

A small idea: two readable root files

The proposal is minimal: place two Markdown files at the root of your repository, just as you might place a robots.txt or an llms.txt. Each file is a front door to a different dimension of provenance.

fair.md is a FAIR self-assessment manifest: a single file containing a structured YAML block that assigns each FAIR sub-principle [1] (F1–F4, A1–A2, I1–I3, R1–R1.3) one of five statuses for the repository's data resources: yes, partial, planned, no, or n/a. It acts as the human-readable front door to deeper machine-readable companions (CITATION.cff [5], codemeta.json [4], and ro-crate-metadata.json [6]) and points onward to FAIR Signposting affordances [7]. It is designed to be cheap to write and honest about gaps.
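As a rough illustration of the per-sub-principle declaration described above, the sketch below validates a self-assessment and renders it as a YAML fragment. The five status values come from the text. The top-level `fair:` key, the flat key layout, and the choice to list all fifteen sub-principles from the FAIR paper (including A1.1 and A1.2) are assumptions made for this example, not the fair.md specification.

```python
# Hypothetical sketch: the key names and layout are assumptions, not the fair.md spec.
ALLOWED_STATUSES = {"yes", "partial", "planned", "no", "n/a"}

# The fifteen sub-principles listed in the FAIR Guiding Principles paper [1].
SUB_PRINCIPLES = [
    "F1", "F2", "F3", "F4",
    "A1", "A1.1", "A1.2", "A2",
    "I1", "I2", "I3",
    "R1", "R1.1", "R1.2", "R1.3",
]


def validate(assessment: dict) -> list:
    """Return a list of problems; an empty list means the assessment is well-formed."""
    problems = []
    for key in SUB_PRINCIPLES:
        if key not in assessment:
            problems.append(f"missing {key}")
        elif assessment[key] not in ALLOWED_STATUSES:
            problems.append(f"{key}: unknown status {assessment[key]!r}")
    for key in assessment:
        if key not in SUB_PRINCIPLES:
            problems.append(f"unexpected key {key}")
    return problems


def render_yaml(assessment: dict) -> str:
    """Render the assessment as a YAML fragment with quoted status values."""
    lines = ["fair:"]
    for key in SUB_PRINCIPLES:
        lines.append(f'  {key}: "{assessment[key]}"')
    return "\n".join(lines)


# Illustrative values only: a repository that is honest about two gaps.
example = {key: "yes" for key in SUB_PRINCIPLES}
example["I2"] = "planned"
example["R1.1"] = "partial"

if not validate(example):
    print(render_yaml(example))
```

The status values are quoted on purpose: YAML 1.1 parsers read a bare `yes` or `no` as a boolean, which would silently turn a declaration into `true` or `false`.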

trust.md is an epistemic provenance declaration: it states who and what produced the knowledge in the repository (human authors, AI assistance, review policy), defines the grading model used (epistemic category and confidence band per claim), and provides a corpus-level trust profile derivable from inline markup. It is the repository-level companion to per-claim epistemic markup — the macro statement that contextualises the micro annotations [8][9][10][11][12].
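The grading model described above can be pictured as a small data structure. In the sketch below, the five category names and the 0–100 confidence scale are taken from the figures reported later in this article, and the three block names (`produced_by`, `governance`, `epistemic_model`) are the ones the article mentions. The `Claim` class, the band thresholds, and the `missing_blocks` helper are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Category names as reported in this article's corpus profile.
CATEGORIES = ("cited", "view", "inference", "consensus", "hypothesis")

# Block names mentioned in the article; their inner fields are not modelled here.
REQUIRED_BLOCKS = ("produced_by", "governance", "epistemic_model")


@dataclass(frozen=True)
class Claim:
    """One graded claim: an epistemic category plus a 0-100 confidence score."""
    text: str
    category: str
    confidence: int

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if not 0 <= self.confidence <= 100:
            raise ValueError(f"confidence out of range: {self.confidence}")


def band(confidence: int) -> str:
    """Map a score to a band. The thresholds are assumptions, not the real rubric."""
    if confidence >= 80:
        return "high"
    if confidence >= 50:
        return "medium"
    return "low"


def missing_blocks(manifest: dict) -> list:
    """List the expected top-level blocks that a parsed trust.md does not declare."""
    return [name for name in REQUIRED_BLOCKS if name not in manifest]


claim = Claim("FAIR was formulated in 2016.", "cited", 90)
print(band(claim.confidence))
print(missing_blocks({"produced_by": {}, "epistemic_model": {}}))
```

Rejecting unknown categories at construction time keeps a corpus profile from silently growing a sixth category through a typo.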

The two files answer complementary questions. fair.md asks: can you find, access, and reuse this resource? trust.md asks: how much should you trust what it says, and how was it produced? Together they provide both the findability axis and the credibility axis of a repository's provenance declaration.

Lineage and prior art

llms.txt [2], proposed by Jeremy Howard in 2024, establishes the root-Markdown convention that fair.md borrows directly: a well-known, human-first file placed at the site root so that machines — particularly LLMs — can find context about a site without scraping its full HTML.

DESIGN.md [13] is a convention for placing design documentation at the repository root, adopted by google-labs-code, among others. fair.md follows the same ergonomic principle: one file, root level, readable without tooling.

The JournalList trust.txt [3] specification, which trust.md superficially resembles in name, should be carefully distinguished from it. trust.txt declares an organisation's trusted relationships: memberships, ownership structures, vendor affiliations. It is an organisational transparency mechanism, not an epistemic one. trust.md instead declares the epistemic status and confidence of the content itself, which is a different axis entirely.

The CodeMeta Project [4] and the Citation File Format (CITATION.cff) [5] provide rich machine-readable metadata for software authorship, versioning, and citation. They are excellent but verbose, and rarely read by a human visiting a repository. fair.md points to them as companions — it does not replace them.

RO-Crate [6] and FAIR Signposting [7] provide robust, specification-grade packaging and navigation for FAIR Digital Objects, aimed at repositories and infrastructure providers. They are the appropriate choice when you need a full FAIR Digital Object. fair.md is the lightweight front door that sits in front of them, written in ten minutes, and readable in thirty seconds.

For the trust side, the prior art is substantial: W3C PROV-O [11] provides an ontology for provenance and authoring of assertions. Nanopublications [8] formalise the pattern of assertion + provenance + publication information. SEPIO (the Scientific Evidence and Provenance Information Ontology, Monarch Initiative) [9] and the Evidence & Conclusion Ontology (ECO) [10] provide formal evidence and assertion modelling. schema.org ClaimReview [12] offers a harvesting path for machine-readable claim annotation. trust.md is a pragmatic, web-native complement to this stack — lightweight enough to adopt on a static site, compatible with it by design.

The worked example: this site

This site is the reference implementation of both conventions. The corpus currently comprises 19 notes and 440 graded claims, with a mean confidence score of 74/100. All three figures are derived directly from the inline data-epi / data-trust markup and summarised in /trust.md.

The category mix is: 53% cited · 22% view · 19% inference · 3% consensus · 3% hypothesis. This profile is itself a meaningful signal: a corpus that is majority cited but carries a significant fraction of explicit views and inferences is doing something different from a corpus that buries citation, view, and inference alike under a uniform authoritative voice.
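A corpus profile of this kind can be derived mechanically from the inline markup. The sketch below uses only Python's standard library. It assumes that each graded claim is an element carrying both a `data-epi` attribute (the category name) and a `data-trust` attribute (an integer from 0 to 100). The attribute names come from this article, but their exact value formats are an assumption, and the sample HTML is invented.

```python
from collections import Counter
from html.parser import HTMLParser


class ClaimCollector(HTMLParser):
    """Collect (category, confidence) pairs from elements that carry both attributes."""

    def __init__(self):
        super().__init__()
        self.claims = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        category = attributes.get("data-epi")
        trust = attributes.get("data-trust")
        if category is not None and trust is not None:
            # Assumes an integer score; a non-numeric value raises ValueError.
            self.claims.append((category, int(trust)))


def corpus_profile(html: str) -> dict:
    """Return the claim count, rounded mean confidence, and percentage mix per category."""
    collector = ClaimCollector()
    collector.feed(html)
    collector.close()
    claims = collector.claims
    if not claims:
        return {"claims": 0, "mean_confidence": None, "mix": {}}
    total = len(claims)
    counts = Counter(category for category, _ in claims)
    return {
        "claims": total,
        "mean_confidence": round(sum(score for _, score in claims) / total),
        "mix": {category: round(100 * n / total) for category, n in counts.items()},
    }


# Invented sample markup, for illustration only.
sample = (
    '<p><span data-epi="cited" data-trust="90">A cited finding.</span> '
    '<span data-epi="view" data-trust="60">An opinion.</span> '
    '<span data-epi="cited" data-trust="80">Another finding.</span> '
    '<span data-epi="inference" data-trust="70">A deduction.</span></p>'
)
print(corpus_profile(sample))
# {'claims': 4, 'mean_confidence': 75, 'mix': {'cited': 50, 'view': 25, 'inference': 25}}
```

Because each percentage is rounded independently, a published mix may not always sum to exactly 100.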

The most evidence-dense notes — the NAMO ontology note and the IND submission note — average 88 and 83 respectively. The explicitly forward-looking pieces, the HCM 2050 vision (avg 54, ten hypotheses) and the NAM Evidence Commons build-plan (avg 55, sixteen views), score lowest by design. A low average is a signal, not a failure: it tells the reader that the piece is a position or a prediction, not a literature report. The per-artifact table in /trust.md makes this auditable at a glance.

The /fair.md for this site similarly practices the honesty the format prescribes: discovery and accessibility are assessed as strong (canonical URLs, sitemap, open HTTPS); the gaps — an undeclared license and not-yet-embedded JSON-LD vocabularies — are declared explicitly as partial or planned rather than omitted or inflated [1].

Why this matters for AI-assisted science

As AI systems ingest more of the scientific literature (for training, retrieval-augmented generation, and automated evidence synthesis), any failure to distinguish cited fact, generated inference, and normative view in the source material propagates into downstream outputs [12]. Structured epistemic markup, whether encoded in HTML data-* attributes, as JSON-LD using schema.org ClaimReview [12], or in a future standard, would make the human/AI split and the evidence basis of each claim legible to harvesters, citation managers, and AI agents. The path from the Trust Lens conventions used here (the inline data-epi / data-trust markup) to machine-harvestable ClaimReview triples is short.
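To make that path concrete, the sketch below maps one graded claim to a schema.org ClaimReview object serialised as JSON-LD. The property names (`claimReviewed`, `reviewRating`, `ratingValue`, `bestRating`, `worstRating`, `alternateName`) are existing schema.org terms. Carrying the confidence score in `ratingValue` and the epistemic category in `alternateName` is an assumption made for this example rather than a published Trust Lens mapping, and the URL is a placeholder.

```python
import json


def to_claim_review(text: str, category: str, confidence: int, page_url: str) -> dict:
    """Build a ClaimReview JSON-LD object for one graded claim (hypothetical mapping)."""
    return {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "url": page_url,
        "claimReviewed": text,
        "reviewRating": {
            "@type": "Rating",
            "ratingValue": confidence,   # assumed: the 0-100 confidence score
            "bestRating": 100,
            "worstRating": 0,
            "alternateName": category,   # assumed: the epistemic category label
        },
    }


document = to_claim_review(
    text="Endorsement of FAIR has spread through funding mandates.",
    category="cited",
    confidence=85,
    page_url="https://example.org/notes/example",  # placeholder URL
)
print(json.dumps(document, indent=2))
```

ClaimReview was designed for fact-checks of third-party claims, so a self-assessed confidence score stretches its intended use. A production mapping would need to decide who counts as the reviewing party.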

Responsible AI-assisted scientific publishing requires two things. The first is making the human/AI authorship split explicit: the provenance problem, addressed by W3C PROV [11] and by the produced_by block in trust.md. The second is making the epistemic type of each output claim explicit: the credibility problem, addressed by inline Trust Lens markup and the epistemic_model block. fair.md and trust.md together provide the repository-level declaration that contextualises both.

Call to adopt

The adoption path is deliberately low-friction. For fair.md: copy the reference file from neuronautix.com/fair.md, replace the YAML with your project's values, serve it at https://yourdomain/fair.md, and be honest: partial and planned are features, not admissions of failure. The point is a truthful, improvable baseline. Then add the machine-readable companions, cheapest first: CITATION.cff [5] costs an hour, codemeta.json [4] covers software, and RO-Crate [6] is worth adding when you need a packaged object.

For trust.md: copy the reference file from neuronautix.com/trust.md and fill in produced_by and governance honestly, especially the human/AI authorship split and the review policy. Adopt the epistemic model, or define your own. Mark claims inline so that the corpus-level and per-artifact profiles can be derived automatically rather than asserted by hand. The authoring guide documents the exact span syntax and the five-category rubric.

Use both together. A repository can have a pristine FAIR posture but publish content of uncertain provenance; it can have scrupulous epistemic markup but fail basic findability. fair.md and trust.md together close both gaps. The spec repositories for both conventions are on GitHub and open to fork.

The conventions make no demand for completeness on day one. A fair.md that honestly declares two sub-principles partial and one planned is more useful to science than one that declares every sub-principle yes in order to appear compliant [1].

References

  • [1] The FAIR Guiding Principles for scientific data management and stewardship — Wilkinson MD et al. Scientific Data. 2016. Founding statement of the Findable, Accessible, Interoperable, Reusable framework.
  • [2] llms.txt proposal — Jeremy Howard, 2024. Convention for placing a well-known Markdown file at a site root for LLM consumption; the ergonomic precedent for fair.md.
  • [3] trust.txt specification — JournalList.net. Declares organisational trusted relationships (memberships, ownership, vendors) — distinct from trust.md's epistemic content confidence.
  • [4] The CodeMeta Project. Machine-readable software metadata standard; companion to fair.md for software repositories.
  • [5] Citation File Format (CITATION.cff). Machine-readable citation metadata; recommended as first companion step after fair.md adoption.
  • [6] RO-Crate. Specification for packaging and describing FAIR Digital Objects; full-weight companion to fair.md for packaged datasets.
  • [7] FAIR Signposting Profile. HTTP-level navigation for FAIR resources; infrastructure-tier complement to the fair.md front door.
  • [8] Nanopublications. Formal pattern of assertion + provenance + publication information; trust.md's per-claim model is a pragmatic web-native cousin.
  • [9] SEPIO — Scientific Evidence and Provenance Information Ontology (Monarch Initiative). Formal evidence and assertion modelling ontology; same project as NAMO.
  • [10] Evidence & Conclusion Ontology (ECO). Vocabulary for evidence types used to support biological assertions; formal grounding for trust.md categories.
  • [11] W3C PROV-O. OWL2 ontology for provenance and authoring of assertions; trust.md's produced_by and governance blocks align with its intent.
  • [12] schema.org ClaimReview. Structured data type for claim annotation; the planned path to make Trust Lens markup harvestable as JSON-LD.
  • [13] DESIGN.md — google-labs-code. Convention for root-level design documentation; ergonomic precedent for placing structured declarations at the repository root.
