COST TEATIME webinar · 2026-06-09
How FAIR metadata respects the 3Rs and unlocks AI-ready datasets and Virtual Control Groups. With live demos of FAIR3R.fr and Metadatapp.
Damien Huzard, PhD (Neuronautix · PHEN-ICS) & Benoit Petit-Demoulière (IGBMC)
Metadata is the data about the data — without context, a measurement is just a number.
The problem
Cryptic identifiers (120sdz-0204dz-dzgr9-…)
Spreadsheets with no schema
Protocols buried in PDFs
Data that only the original team can read
Findable by humans and machines
Accessible under defined conditions
Interoperable across datasets
Reusable without contacting the author
Wilkinson et al., "The FAIR Guiding Principles for scientific data management and stewardship", Sci Data 2016 (18M accesses, 15k citations).
The framework
go-fair.org/fair-principles
Principle 1
A persistent, globally unique ID for the dataset, the study, the animal, and the researcher.
Enough context attached for someone unfamiliar with the study to understand what it is.
Metadata links back to the data and to other related identifiers — e.g. ORCID for researchers.
Example: ORCID 0000-0003-4820-7951 — one identifier, one career, every paper linked.
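A persistent identifier is most useful when software can check it before linking records. As a minimal sketch, the check digit of an ORCID iD can be verified with the ISO 7064 MOD 11-2 scheme that ORCID documents. The function names below are illustrative and not part of any ORCID library.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True for a 16-character ORCID iD whose check digit matches."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


# The iD quoted in the example above passes the checksum.
print(is_valid_orcid("0000-0003-4820-7951"))  # True
```

A checksum only catches typing errors. It does not prove that the iD is registered or that it belongs to the named person.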
Principle 2
HTTP, FTP, SMTP — open, documented, implementable. Not WhatsApp, not Zoom.
The access mechanism does not require a paid client or a proprietary stack.
Access can require authentication and authorization — accessibility is not the same as openness.
Principle 3
Datasets describe themselves with the same vocabulary other systems can read.
Controlled vocabularies and ontologies — themselves findable, accessible, and reusable.
Cross-references to other datasets and concepts so context travels with the data.
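In practice, sharing a vocabulary often starts with mapping free-text lab labels onto ontology identifiers. The sketch below assumes two small, hypothetical lookup tables. NCBITaxon and PATO are real ontologies, but each identifier should be checked against the ontology itself before it goes into production metadata.

```python
# Hypothetical lookup tables: local lab labels -> ontology CURIEs.
SPECIES = {
    "mouse": "NCBITaxon:10090",         # Mus musculus
    "mus musculus": "NCBITaxon:10090",
    "rat": "NCBITaxon:10116",           # Rattus norvegicus
}
SEX = {
    "m": "PATO:0000384", "male": "PATO:0000384",
    "f": "PATO:0000383", "female": "PATO:0000383",
}


def harmonise(record: dict) -> dict:
    """Return a copy of the record with species and sex replaced by CURIEs.

    Unknown labels raise KeyError so that unmapped terms are noticed
    instead of silently passing through.
    """
    out = dict(record)
    out["species"] = SPECIES[record["species"].strip().lower()]
    out["sex"] = SEX[record["sex"].strip().lower()]
    return out


print(harmonise({"animal_id": "A1", "species": " Mouse", "sex": "F"}))
```

Failing loudly on an unknown label is a deliberate choice here: a term that cannot be mapped is exactly the kind of gap that breaks interoperability later.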
Principle 4
The other framework
Use non-animal methods wherever they can answer the scientific question — NAMs, in vitro, in silico.
Use the smallest number of animals compatible with a scientifically valid result.
Improve welfare, minimise pain and distress, refine housing and procedures.
Russell & Burch, 1959 — the foundation of modern animal research ethics.
The thesis
FAIR data is not in tension with the 3Rs — it is the substrate that makes the 3Rs operational. Better metadata means fewer animals, better welfare, and reusable evidence.
Huzard, Petit-Demoulière et al. — WellFAIR: connecting FAIR data and the 3Rs. doi.org/10.1016/j.nsa.2026.106998
Where FAIR meets HCM
The (meta)data chapter in Home Cage Monitoring in Rodents: A Global Effort (Gaburro & Mandillo, eds., Springer Nature 2026) maps FAIR principles onto 3Rs decisions and welfare assessment.
Structured metadata for HCM datasets — schemas, vocabularies, exchange formats.
Better data → fewer redundant studies, smaller cohorts, refined endpoints.
Continuous, machine-readable welfare signals feeding into ethics review.
doi.org/10.1007/978-3-032-19781-8_10
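Structured metadata implies a schema that software can enforce when a record is created. The sketch below is a minimal check under stated assumptions: the field names and types are hypothetical and are not taken from the chapter's schema.

```python
# Hypothetical required fields for one home-cage monitoring record.
REQUIRED_FIELDS = {
    "animal_id": str,
    "strain": str,
    "sex": str,
    "age_weeks": (int, float),
    "cage_id": str,
    "light_cycle": str,
}


def missing_or_invalid(record: dict) -> list[str]:
    """List every required field that is absent, empty, or of the wrong type."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record or record[field] in ("", None):
            problems.append(f"{field}: missing")
        elif not isinstance(record[field], expected):
            problems.append(f"{field}: wrong type")
    return problems


record = {"animal_id": "A1", "strain": "C57BL/6J", "sex": "F", "age_weeks": "ten"}
print(missing_or_invalid(record))
```

Running such a check at data entry, not at publication, is what keeps a record from turning into an unreadable spreadsheet later.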
Hand-over
Benoit Petit-Demoulière · IGBMC
The long-term goal
What do we need to create reliable knowledge that will let us build Virtual Control Groups — and eventually a Virtual Mouse?
Reuse historical controls instead of running new ones
Requires harmonized metadata and eligibility criteria
Direct path to animal reduction
Computable model of physiology and behaviour
Built only on reusable, well-described datasets
A long-term goal, not a product for tomorrow
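The eligibility step for a Virtual Control Group can be sketched as a filter over harmonized historical records. Everything below is an assumption made for illustration: the record fields, the criteria, and the example animals are hypothetical, and a real eligibility rule would cover many more variables (site, housing, protocol version, time window).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlRecord:
    animal_id: str
    strain: str
    sex: str
    age_weeks: float
    assay: str


def eligible_controls(history, *, strain, sex, assay, age_range):
    """Keep historical controls that match the new study's design."""
    youngest, oldest = age_range
    return [
        r for r in history
        if r.strain == strain
        and r.sex == sex
        and r.assay == assay
        and youngest <= r.age_weeks <= oldest
    ]


# Hypothetical historical control animals.
history = [
    ControlRecord("A1", "C57BL/6J", "F", 10.0, "open_field"),
    ControlRecord("A2", "C57BL/6J", "M", 10.0, "open_field"),
    ControlRecord("A3", "C57BL/6J", "F", 30.0, "open_field"),
    ControlRecord("A4", "BALB/c", "F", 11.0, "open_field"),
    ControlRecord("A5", "C57BL/6J", "F", 12.0, "open_field"),
]

matched = eligible_controls(
    history, strain="C57BL/6J", sex="F", assay="open_field", age_range=(8, 14)
)
print([r.animal_id for r in matched])  # ['A1', 'A5']
```

The filter works only because every record uses the same field names and value conventions, which is the harmonization requirement stated above.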
An analogy
Science builds on previously accepted knowledge. Each study is a brick. We tell ourselves we are "building knowledge, one layer at a time", but do we really have all the bricks?
If half the bricks are missing or invisible, what does the wall actually hold up?
The cracks
The reproducibility crisis is not a slogan. It is the structural consequence of how we currently build knowledge — incomplete sharing, missing context, lost negative results.
Today's reality
Supplementary files attached to papers
Unstructured national repositories
Collaborative libraries (Zotero, shared drives)
Personal laptops and lab servers
Not FAIR by any of the four letters
Discoverability is near zero
Reuse requires personal contact
Most of the data is simply missing
The exception
The International Mouse Phenotyping Consortium ties DATA, METADATA, and ONTOLOGY together at scale — one of the rare working models in preclinical science.
Standardised phenotyping pipelines across centres.
Structured, queryable context for every measurement.
Mammalian Phenotype Ontology aligning terms across the consortium.
mousephenotype.org
Shadow data
Shadow data = what is missing, unavailable, invisible, or unusable in a dataset — despite shaping what we conclude. More than the file-drawer problem.
Poor protocol design — the variable was never measured.
Negative or inconclusive results that never reach the literature.
Collected and stored, but not findable or reusable — a FAIR failure.
FC3R short notes — fc3r.com/short-notes-FC3R.php
Survivorship bias
The classic WWII bullet-hole diagram: we only observe the planes that made it back. The visible damage shows where a plane can be hit and still survive — not where it actually fails. Published data show survivors, not failures.
What we see is not the full phenomenon, but the subset that passed through a selection filter. The absences are not noise — they are part of the structure of knowledge.
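The effect of a selection filter can be shown with a few numbers. The sketch below uses invented effect estimates and an invented publication threshold, purely for illustration. It compares the mean of all results with the mean of the subset that passes the filter.

```python
from statistics import mean

# Hypothetical effect estimates from ten studies of the same question.
all_effects = [-0.3, -0.1, 0.0, 0.1, 0.1, 0.2, 0.3, 0.4, 0.6, 0.8]

# Hypothetical filter: only effects at or above this size get published.
PUBLICATION_THRESHOLD = 0.3


def published_subset(effects, threshold):
    """Return the effects that survive the publication filter."""
    return [e for e in effects if e >= threshold]


published = published_subset(all_effects, PUBLICATION_THRESHOLD)

print(f"studies run: {len(all_effects)}, published: {len(published)}")
print(f"mean of all results:       {mean(all_effects):.2f}")
print(f"mean of published results: {mean(published):.2f}")
```

With these invented numbers, four of ten studies survive and their mean is well above the mean of the full set. The six unpublished results are the missing bricks.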
Known biases
Respect for the animals that are the source of the data, together with scientific rigour, requires us to use every data point.
Demo 1
Self-hosted, currently at the University of Strasbourg
Codebase shared so any institute or country can deploy an instance
CKAN backend: a widely used open-source platform for open data portals
All plugins released open-source
FAIR repository with one source of truth for form completion
All communities can add their sources
Open-source release coming very soon — GNU GPLv3
Contributions and issues welcome
fair3r.fr — please spread the word
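Because the repository runs on CKAN, datasets should in principle be searchable by machines through CKAN's standard Action API. The sketch below only builds a `package_search` URL and sends nothing. Whether fair3r.fr exposes the API publicly at this path is an assumption to confirm with the hosts.

```python
from urllib.parse import urlencode


def package_search_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a CKAN Action API URL for a free-text dataset search."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


# Assumed base URL; the public API path has not been verified for this site.
print(package_search_url("https://fair3r.fr", "home cage monitoring"))
# https://fair3r.fr/api/3/action/package_search?q=home+cage+monitoring&rows=10
```

On a standard CKAN instance, a GET request to such a URL returns JSON with the matching datasets under `result.results`.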
Demo 2
A single hub that holds the metadata threading through every system you already use: ELNs, scientific articles, science networks, data repositories, colony management, behavioural assays & HCM, regulatory agencies, user info, ontologies, protocols, pre-registration, project management.
Metadata captured at study design — not retrofitted at publication.
.doc · .pdf · .xlsx · .csv · .xml · .owl · .rdf — meet the field where it actually lives.
Same metadata feeds ethics applications, ELN entries, and final publication.
metadatapp.net
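The idea of capturing metadata once and reusing it across outputs can be sketched with one record rendered in two of the formats listed above. This is not Metadatapp's implementation. The record fields and function names are hypothetical.

```python
import csv
import io
import json

# Hypothetical study record, entered once at study design.
study = {
    "study_id": "STUDY-001",
    "species": "NCBITaxon:10090",
    "assay": "open_field",
    "n_animals": "24",
}


def as_json(record: dict) -> str:
    """Render the record as JSON, e.g. for a repository deposit."""
    return json.dumps(record, indent=2, sort_keys=True)


def as_csv(record: dict) -> str:
    """Render the same record as a one-row CSV, e.g. for a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(record))
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()


print(as_json(study))
print(as_csv(study))
```

Both outputs derive from the same dictionary, so a correction made at the source propagates to every document built from it.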
From FAIR to AI and VCG
Can we just assume frontier models will fix the data problem for us? Why curate data upfront if AI can do it later?
"New frontier models will sort the mess"
"Curation upfront is a bottleneck"
"Just throw it all at the model"
Advanced models cannot fix poor-quality data
Garbage in, garbage out — model quality is bounded by data quality
Curated data are accessible, interoperable, reusable by design
Curating upfront is the foundation of trustworthy AI
Without curation: bias, waste, irreproducible science. See thebehaviourforum.org.
Resources & acknowledgements
Damien Huzard, PhD · Neuronautix · PHEN-ICS · damien.huzard@gmail.com
Benoit Petit-Demoulière · IGBMC · petitd@igbmc.fr