
COST TEATIME webinar · 2026-06-09

WellFAIR. FAIR data, the 3Rs, and the road to AI/VCG.

How FAIR metadata respects the 3Rs and unlocks AI-ready datasets and Virtual Control Groups. With live demos of FAIR3R.fr and Metadatapp.

Damien Huzard, PhD (Neuronautix · PHEN-ICS) & Benoit Petit-Demoulière (IGBMC)

Context is King.

Metadata is the data about the data — without context, a measurement is just a number.

The problem

Why FAIR? Because raw data without context is a dead end.

What we get today

Cryptic identifiers (120sdz-0204dz-dzgr9-…)

Spreadsheets with no schema

Protocols buried in PDFs

Data that only the original team can read

What FAIR delivers

Findable by humans and machines

Accessible under defined conditions

Interoperable across datasets

Reusable without contacting the author

Wilkinson et al., The FAIR Guiding Principles, Sci Data 2016 (18M accesses, 15k citations).

The framework

FAIR. Four principles, one infrastructure.

F (Findable): Allow people (and machines) to find each other's data.
A (Accessible): Data can be accessed "easily" under clear, standard conditions.
I (Interoperable): Datasets can interact with each other through shared language.
R (Reusable): Data can be reused "properly", with provenance, license, and context.

go-fair.org/fair-principles

Principle 1

Findable. Every object gets a name machines can resolve.

Unique identifier

A persistent, globally unique ID for the dataset, the study, the animal, and the researcher.

Rich metadata

Enough context attached for someone unfamiliar with the study to understand what it is.

Linked metadata

Metadata links back to the data and to other related identifiers — e.g. ORCID for researchers.

Example: ORCID 0000-0003-4820-7951 — one identifier, one career, every paper linked.
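
As an illustration of what "a name machines can resolve" buys you, here is a minimal Python sketch that checks an ORCID iD offline. ORCID iDs end in a check character computed with ISO 7064 MOD 11-2; the function names below are our own and not part of any ORCID library. The check only catches typos; it does not prove the iD is registered.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the iD has 16 characters (hyphens ignored) and a correct check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


print(is_valid_orcid("0000-0003-4820-7951"))  # the iD quoted above -> True
print(is_valid_orcid("0000-0003-4820-7952"))  # last character altered -> False
```

A machine can therefore reject a mistyped identifier before it ever reaches a database, which a free-text author name cannot offer.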

Principle 2

Accessible. Retrievable by standard protocol — not by email.

Standard protocol

HTTP, FTP, SMTP — open, documented, implementable. Not WhatsApp, not Zoom.

Open & free

The access mechanism does not require a paid client or a proprietary stack.

Authentication

Access can require authentication and authorization — accessibility is not the same as openness.
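
To make the distinction between "accessible" and "open" concrete, the sketch below builds (but does not send) a plain HTTP request for a dataset, with an optional bearer token. It uses only Python's standard library; the URL, the token, and the function name are hypothetical placeholders, and bearer tokens are only one of several possible authentication schemes.

```python
import urllib.request
from typing import Optional


def build_dataset_request(url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Prepare a GET request over HTTP, a standard and openly documented protocol."""
    req = urllib.request.Request(url, method="GET")
    req.add_header("Accept", "application/json")
    if token is not None:
        # Restricted data: same protocol, plus an authorization header.
        req.add_header("Authorization", f"Bearer {token}")
    return req


# Hypothetical repository URL and token, for illustration only.
req = build_dataset_request(
    "https://repository.example.org/datasets/hypothetical-id",
    token="hypothetical-token",
)
print(req.get_method(), req.full_url)
print(req.has_header("Authorization"))
```

The same function serves open and restricted datasets: only the presence of the token changes, not the retrieval mechanism.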

Principle 3

Interoperable. A shared language for the field.

Shared language

Datasets describe themselves with the same vocabulary other systems can read.

FAIR vocabularies

Controlled vocabularies and ontologies — themselves findable, accessible, and reusable.

Linked references

Cross-references to other datasets and concepts so context travels with the data.
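
A small sketch of what a shared vocabulary does in practice: local column names from different labs are mapped onto the same controlled terms, and anything that cannot be mapped is flagged rather than guessed. The vocabulary, the synonym table, and the "EX:" identifiers below are invented for illustration; a real project would point to terms from a published ontology.

```python
# Hypothetical controlled vocabulary: preferred label -> made-up term ID.
VOCABULARY = {
    "body weight": "EX:0001",
    "locomotor activity": "EX:0002",
    "food intake": "EX:0003",
}
# Hypothetical lab-specific synonyms pointing to preferred labels.
SYNONYMS = {"bw": "body weight", "weight": "body weight", "activity": "locomotor activity"}


def normalise_label(label: str) -> str:
    """Lowercase, turn underscores into spaces, collapse whitespace."""
    return " ".join(label.replace("_", " ").lower().split())


def to_term(label: str):
    """Return (preferred label, term ID), or None if the label is unknown."""
    key = normalise_label(label)
    key = SYNONYMS.get(key, key)
    if key in VOCABULARY:
        return key, VOCABULARY[key]
    return None


def harmonise(columns):
    """Split column names into mapped terms and unmapped leftovers."""
    mapped, unmapped = {}, []
    for col in columns:
        term = to_term(col)
        if term is None:
            unmapped.append(col)
        else:
            mapped[col] = term
    return mapped, unmapped


mapped, unmapped = harmonise(["Body_Weight", "BW", "activity", "cage temp"])
print(mapped)
print(unmapped)
```

Here "Body_Weight" and "BW" resolve to the same term, so two datasets become comparable, while "cage temp" is reported as unmapped for a human to resolve.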

Principle 4

Reusable. The whole point of the exercise.

Rich attributes: Enough description that a third party can decide whether the data fits their question.
Clear license: Explicit terms of reuse, with no ambiguity and no implicit "ask me first".
Provenance: Where the data came from, how it was processed, and by whom.
README.md: A human-readable entry point. Boring, mandatory, often missing.
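
The four items above lend themselves to an automated pre-deposit check. The sketch below is one possible way to do it, assuming a dataset record is a simple dictionary; the field names and the example record are our own choices, not a published schema.

```python
# Field names are assumptions mirroring the four items above.
REQUIRED_FIELDS = ("description", "license", "provenance", "readme")


def missing_reuse_fields(record: dict) -> list:
    """List the reuse-critical fields that are absent or empty in a record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]


# Invented example record: the license is present, the README is blank.
record = {
    "description": "Open-field activity in adult mice, 10 min sessions.",
    "license": "CC-BY-4.0",
    "provenance": "Raw tracking files processed with the lab pipeline.",
    "readme": "   ",
}
print(missing_reuse_fields(record))  # ['readme']
```

A check like this is deliberately shallow: it confirms the fields exist, not that their content is good enough for a third party.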

The other framework

The 3Rs. Replacement, Reduction, Refinement.

Replacement

Use non-animal methods wherever they can answer the scientific question — NAMs, in vitro, in silico.

Reduction

Use the smallest number of animals compatible with a scientifically valid result.

Refinement

Improve welfare, minimise pain and distress, refine housing and procedures.

Russell & Burch, 1959 — the foundation of modern animal research ethics.

The thesis

FAIR + 3Rs = the WellFAIR concept.

FAIR data is not in tension with the 3Rs — it is the substrate that makes the 3Rs operational. Better metadata means fewer animals, better welfare, and reusable evidence.

Paper

Huzard, Petit-Demoulière et al. — WellFAIR: connecting FAIR data and the 3Rs. doi.org/10.1016/j.nsa.2026.106998

Where FAIR meets HCM

Home-Cage Monitoring needs FAIR (meta)data to scale.

The (meta)data chapter in Home Cage Monitoring in Rodents: A Global Effort (Gaburro & Mandillo, eds., Springer Nature 2026) maps FAIR principles onto 3Rs decisions and welfare assessment.

FAIR principles

Structured metadata for HCM datasets — schemas, vocabularies, exchange formats.

3Rs

Better data → fewer redundant studies, smaller cohorts, refined endpoints.

Welfare decisions

Continuous, machine-readable welfare signals feeding into ethics review.

doi.org/10.1007/978-3-032-19781-8_10

Hand-over

Now — from principles to practice.

Benoit Petit-Demoulière · IGBMC

The long-term goal

Science is data.

What do we need to create reliable knowledge that will let us build Virtual Control Groups — and eventually a Virtual Mouse?

Virtual Control Groups

Reuse historical controls instead of running new ones

Requires harmonized metadata and eligibility criteria

Direct path to animal reduction
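
To show why harmonized metadata and eligibility criteria come first, here is a minimal sketch that selects historical control animals matching a new study's design. The record fields, the criteria, and all values are invented for illustration; a real Virtual Control Group would also need statistical checks (for example for drift between sites and over time) that this sketch leaves out.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ControlRecord:
    animal_id: str
    strain: str
    sex: str
    age_weeks: int
    assay: str
    value: float


@dataclass(frozen=True)
class Eligibility:
    strain: str
    sex: str
    assay: str
    min_age_weeks: int
    max_age_weeks: int

    def accepts(self, r: ControlRecord) -> bool:
        """A record is eligible only if every harmonized field matches."""
        return (
            r.strain == self.strain
            and r.sex == self.sex
            and r.assay == self.assay
            and self.min_age_weeks <= r.age_weeks <= self.max_age_weeks
        )


def select_virtual_controls(records, criteria):
    return [r for r in records if criteria.accepts(r)]


# Invented historical controls.
HISTORICAL = [
    ControlRecord("a1", "C57BL/6J", "F", 10, "open_field", 25.0),
    ControlRecord("a2", "C57BL/6J", "F", 12, "open_field", 27.0),
    ControlRecord("a3", "C57BL/6J", "M", 10, "open_field", 30.0),
    ControlRecord("a4", "C57BL/6J", "F", 30, "open_field", 20.0),
    ControlRecord("a5", "BALB/c", "F", 10, "open_field", 22.0),
]
criteria = Eligibility("C57BL/6J", "F", "open_field", 8, 14)
selected = select_virtual_controls(HISTORICAL, criteria)
print([r.animal_id for r in selected], mean(r.value for r in selected))
```

The filter only works because strain, sex, age, and assay are recorded the same way in every record; a free-text "young female B6" would silently drop out of the pool.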

Virtual Mouse

Computable model of physiology and behaviour

Built only on reusable, well-described datasets

A long-term goal, not a product for tomorrow

An analogy

Knowledge as a wall of bricks.

Science builds on previously accepted knowledge. Each study is a brick. We tell ourselves we are "building knowledge, one layer at a time", but do we really have all the bricks?

The question

If half the bricks are missing or invisible, what does the wall actually hold up?

The cracks

The wall is fragile.

The reproducibility crisis is not a slogan. It is the structural consequence of how we currently build knowledge — incomplete sharing, missing context, lost negative results.

Today's reality

Data availability is a patchwork.

Where data ends up

Supplementary files attached to papers

National unstructured repositories

Collaborative libraries (Zotero, shared drives)

Personal laptops and lab servers

What this means

Not FAIR by any of the four letters

Discoverability is near zero

Reuse requires personal contact

Most of the data is simply missing

The exception

IMPC. What proper FAIR infrastructure looks like.

The International Mouse Phenotyping Consortium ties DATA, METADATA, and ONTOLOGY together at scale — one of the rare working models in preclinical science.

Data

Standardised phenotyping pipelines across centres.

Metadata

Structured, queryable context for every measurement.

Ontology

Mammalian Phenotype Ontology aligning terms across the consortium.

mousephenotype.org

Shadow data

What is missing shapes what we think we know.

Shadow data = what is missing, unavailable, invisible, or unusable in a dataset — despite shaping what we conclude. More than the file-drawer problem.

Not collected

Poor protocol design — the variable was never measured.

Not published

Negative or inconclusive results that never reach the literature.

Inaccessible

Collected and stored, but not findable or reusable — a FAIR failure.

FC3R short notes — fc3r.com/short-notes-FC3R.php

Survivorship bias

Published data are returning aircraft.

The classic WWII bullet-hole diagram: we only observe the planes that made it back. The visible damage shows where a plane can be hit and still survive — not where it actually fails. Published data show survivors, not failures.

What we see

What we see is not the full phenomenon, but the subset that passed through a selection filter. The absences are not noise — they are part of the structure of knowledge.

Known biases

Preclinical research has structural blind spots.

Sex bias: Documented in multiple disease models and publication corpora.
Positive-result bias: Positive results are far more likely to be reported than negative or inconclusive ones.
Sharing gaps: Reagents, protocols, and raw data are often not fully shared, so reuse is blocked at the practical level.

Respect for the animals that are the source of the data, and scientific rigour, both require us to use every data point.

Demo 1

FAIR3R.fr. A self-hostable FAIR repository for 3R data.

What it is

Self-hosted: the current instance is hosted by the University of Strasbourg

Codebase shared so any institute or country can deploy an instance

CKAN backend: a widely used open-source platform for open data portals

All plugins released open-source
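
Because the backend is CKAN, a machine can in principle query an instance through CKAN's standard Action API. The sketch below builds a `package_search` URL and reads dataset titles from a response of the standard CKAN shape. Whether fair3r.fr exposes this endpoint publicly is an assumption on our part, and the sample response is invented; no network call is made.

```python
from urllib.parse import urlencode


def ckan_search_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a CKAN Action API (v3) package_search URL."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response: dict) -> list:
    """Extract dataset titles from a package_search JSON response."""
    if not response.get("success"):
        return []
    return [pkg.get("title", "") for pkg in response["result"]["results"]]


# Assumption: the instance serves the API at its root URL.
print(ckan_search_url("https://fair3r.fr", "home cage monitoring", rows=5))

# Invented response, shaped like a CKAN package_search reply.
sample = {
    "success": True,
    "result": {"count": 1, "results": [{"name": "example", "title": "Example dataset"}]},
}
print(dataset_titles(sample))
```

The point of a standard backend is exactly this: the same few lines would work against any other CKAN portal by changing the base URL.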

What it delivers

FAIR repository with one source of truth for form completion

All communities can add their sources

Open-source release coming very soon — GNU GPLv3

Contributions and issues welcome

fair3r.fr — please spread the word

Demo 2

Metadatapp. The connective tissue across the research lifecycle.

A single hub that holds the metadata threading through every system you already use: ELNs, scientific articles, science networks, data repositories, colony management, behavioural assays & HCM, regulatory agencies, user info, ontologies, protocols, pre-registration, project management.

Born-FAIR

Metadata captured at study design — not retrofitted at publication.

Cross-format

.doc · .pdf · .xlsx · .csv · .xml · .owl · .rdf — meet the field where it actually lives.

One source of truth

Same metadata feeds ethics applications, ELN entries, and final publication.
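
The "one source of truth" idea can be sketched in a few lines: a single study record, captured once, is rendered into several outputs instead of being retyped. This is not how Metadatapp is implemented (the source does not say); the record, its fields, and the three renderers are hypothetical and use only Python's standard library.

```python
import csv
import io
import json

# Hypothetical study record, captured once at study design.
STUDY = {
    "study_id": "hypothetical-study-001",
    "strain": "C57BL/6J",
    "sex": "F",
    "n_animals": 12,
    "assay": "open_field",
}


def to_json(record: dict) -> str:
    """Machine-readable export, e.g. for a repository deposit."""
    return json.dumps(record, indent=2, sort_keys=True)


def to_csv(record: dict) -> str:
    """One-row table export, e.g. for a spreadsheet-based workflow."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(record))
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()


def ethics_summary(record: dict) -> str:
    """Human-readable sentence, e.g. for an ethics application."""
    return (
        f"Study {record['study_id']}: {record['n_animals']} {record['sex']} "
        f"{record['strain']} animals, assay {record['assay']}."
    )


print(to_json(STUDY))
print(to_csv(STUDY))
print(ethics_summary(STUDY))
```

If the cohort size changes, it changes in one place and every downstream document follows, which is the practical meaning of born-FAIR metadata.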

metadatapp.net

From FAIR to AI and VCG

Is AI the way to go?

Can we just assume frontier models will fix the data problem for us? Why curate data upfront if AI can do it later?

The temptation

"New frontier models will sort the mess"

"Curation upfront is a bottleneck"

"Just throw it all at the model"

Reality

Advanced models cannot fix poor-quality data

Garbage in, garbage out — model quality is bounded by data quality

Curated data are accessible, interoperable, reusable by design

Curating upfront is the foundation of trustworthy AI

Without curation: bias, waste, irreproducible science. See thebehaviourforum.org.