
NAM Evidence Commons: a realistic tool stack to build first

Damien Huzard, PhD

A NAM Evidence Commons should not start as a full enterprise data warehouse. In Neuronautix's view, the first useful version is a schema-first, validation-first tool that turns NAM experiments into reviewable evidence objects, using existing schema, provenance, and metadata standards rather than inventing every layer from scratch [1][2][4].

The product should begin with one hard problem

The temptation is to design a complete data commons: storage, search, governance, provenance, analytics, AI, dashboards, and exports. That is too broad for a first build. The useful wedge is narrower: make one NAM study record complete enough that a reviewer, a scientist, and a machine can understand what was done, what was measured, what evidence supports a claim, and what remains uncertain.

That means the core product is not a document repository. It is an evidence-object builder. A record should link context of use, test article, biological model, platform, exposure protocol, endpoint, raw and processed result, controls, validation evidence, uncertainty, source files, analysis workflow, and regulatory claim. NAMO provides a domain model for NAM systems and validation evidence, while RO-Crate and BioCompute Object provide patterns for packaging research objects and computational workflow provenance [1][4][5]. If that graph is complete, storage and search become useful. If that graph is absent, storage only preserves ambiguity.
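To make the evidence-object idea concrete, the linked record can be sketched as a single data structure. This is a minimal illustration only: the field names below are hypothetical stand-ins, not the NAMO vocabulary or a finished profile.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EvidenceRecord:
    """Illustrative evidence object linking a NAM result to its context.

    Field names are hypothetical, not the NAMO domain model.
    """
    context_of_use: str                 # e.g. "hepatotoxicity screening"
    test_article: str                   # compound or product identifier
    biological_model: str               # e.g. "Liver-Chip, donor lot"
    platform: str                       # instrument or MPS platform
    endpoint: str                       # measured quantity, with units
    protocol_version: str
    result_files: list[str] = field(default_factory=list)
    validation_evidence: list[str] = field(default_factory=list)
    uncertainty: Optional[str] = None   # stated limits, variability
    regulatory_claim: Optional[str] = None

record = EvidenceRecord(
    context_of_use="hepatotoxicity screening",
    test_article="CHEMBL25",
    biological_model="Liver-Chip",
    platform="MPS flow system",
    endpoint="ALT release (U/L)",
    protocol_version="1.2.0",
)
```

The point of the sketch is that every result field sits next to the context, protocol, and validation fields that make it interpretable; storage and search then operate on a complete graph rather than loose documents.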

A realistic first stack

The first version can be built with conservative tools. Use LinkML as the schema source of truth, with a NAMO-inspired domain model and assay-specific profiles for MPS, organoids, PBPK, QSAR, high-content imaging, and omics endpoints [1][2]. LinkML can generate JSON Schema for API validation and JSON-LD contexts for semantic export [2]. PostgreSQL JSONB supports storage and indexing for structured JSON records, while normal tables can cover users, projects, study records, files, validation runs, and review decisions [8].

For the interface, start with a small web app: project list, study record editor, schema-driven forms, validation report, evidence graph view, and export panel. A Python API layer such as FastAPI fits the ecosystem because LinkML validation, RO-Crate generation, BioCompute-style packaging, and scientific file parsing are already Python-friendly [2][4][5]. File storage can be local or S3-compatible at first; enterprise storage connectors can wait.

Borrow standards, do not rebuild them

Several existing standards provide the pieces. CEDAR shows the value of ontology-assisted metadata templates and forms [6]. RO-Crate provides a practical JSON-LD package for research objects where data travels with metadata, identifiers, license, authorship, software, equipment, and provenance [4]. BioCompute Object provides a regulatory-facing pattern for documenting computational workflows [5]. FAIRSCAPE shows how evidence graphs, persistent identifiers, datasets, software, computations, runtime parameters, and personnel can be combined into a reusable biomedical commons [3].

The NAM Evidence Commons should not implement every standard in full on day one. It should map to them. Internally, the system can use a normalized record plus JSONB payloads [8]. Externally, it should export RO-Crate for study packages, BioCompute-style workflow records for computational endpoints, JSON-LD for linked evidence, and simple reviewer guides for human use [2][4][5].
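An RO-Crate export of this kind is mostly a matter of emitting the right JSON-LD envelope. The snippet below is a minimal hand-rolled sketch of an `ro-crate-metadata.json` file (a real build might use a library instead); the file names and dataset name are illustrative.

```python
import json

# Minimal sketch of an RO-Crate metadata file for a study package.
# Built by hand for illustration; file names are hypothetical.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Liver-Chip hepatotoxicity study record",
            "hasPart": [
                {"@id": "evidence_record.json"},
                {"@id": "validation_report.json"},
            ],
        },
        {"@id": "evidence_record.json", "@type": "File",
         "encodingFormat": "application/json"},
        {"@id": "validation_report.json", "@type": "File",
         "encodingFormat": "application/json"},
    ],
}

metadata = json.dumps(crate, indent=2)
```

Because the export is generated from the internal record rather than authored by hand, the crate stays in sync with the evidence graph it describes.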

The minimum viable product

The MVP should have four modules. First, a schema registry where each assay profile is versioned and documented, grounded in LinkML and NAMO-style models [1][2]. Second, a metadata capture UI that guides the scientist through required fields before data freeze, following the CEDAR pattern of metadata templates and form instances [6]. Third, a validation service that fails closed when context of use, endpoint units, protocol version, ontology terms, controls, or provenance are missing [2]. Fourth, an export service that emits a package with the evidence record, source-file manifest, validation report, and reviewer-readable summary [4][5].
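The fail-closed behaviour of the third module can be sketched directly: a record is rejected unless every required evidence field is present and non-empty. The required-field list below is illustrative, not a fixed NAMO profile.

```python
# Sketch of a fail-closed validation pass: any missing required field
# rejects the record. Field names are illustrative.
REQUIRED_FIELDS = [
    "context_of_use", "endpoint_units", "protocol_version",
    "ontology_terms", "controls", "provenance",
]

def validate_record(record: dict) -> dict:
    """Return a validation report; `passed` is False on any missing field."""
    missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
    return {"passed": not missing, "missing_fields": missing}

report = validate_record({
    "context_of_use": "hepatotoxicity",
    "endpoint_units": "U/L",
    "protocol_version": "1.2.0",
    # ontology_terms, controls, provenance absent -> record fails closed
})
```

Failing closed means an incomplete record can never silently enter the commons; the report tells the scientist exactly what to supply before data freeze.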

An agent can be added early, but only as an assistant. It should ingest protocols, CRO reports, ELN notes, and instrument exports; propose field values; cite the source passage; and route uncertain values for expert review. The approved record, not the model output, is the source of truth. This follows the same provenance principle behind OpenLineage-style lineage records and the source-attributed metadata capture expected for scientific curation [7].
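The assistant-only boundary can be enforced in the data model itself: a proposed value always carries its source passage, and only explicit approval promotes it into the record of truth. The class and field names below are hypothetical.

```python
from dataclasses import dataclass

# Sketch: an agent proposal that cites its source and waits for expert
# approval. Only approved values enter the authoritative record.
# Names are illustrative.
@dataclass
class ProposedValue:
    field_name: str
    value: str
    source_passage: str   # verbatim excerpt the agent extracted from
    confidence: float
    status: str = "pending_review"

def approve(proposal: ProposedValue, record: dict) -> None:
    """Expert approval promotes the value into the authoritative record."""
    proposal.status = "approved"
    record[proposal.field_name] = proposal.value

proposal = ProposedValue(
    field_name="flow_rate",
    value="30 uL/h",
    source_passage="Chips were perfused at 30 uL/h for 48 h.",
    confidence=0.92,
)
record: dict = {}
approve(proposal, record)
```

Low-confidence proposals would simply stay in `pending_review` until a reviewer acts, so the model output never becomes evidence by default.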

What I would build first

I would start with one context of use: hepatotoxicity evidence from a Liver-Chip or comparable MPS assay. The first schema would cover test article identity, cell and donor metadata, chip architecture, flow conditions, exposure, endpoints, controls, assay acceptance criteria, raw and processed results, analysis script, validation evidence, uncertainty, and claim linkage. This scope is narrow enough to implement and broad enough to demonstrate why the product matters.

The technical milestone is simple: upload a study folder, complete the schema-driven record, run validation, see an evidence graph, and export a reviewer package. If that workflow works for one NAM context, the product can expand to organoids, PBPK, QSAR, and imaging endpoints [1]. The build should prove that metadata capture is not a compliance burden. It is the mechanism that turns a NAM result into reusable evidence [3][4].

References

  • [1] NAMO: New Approach Methodology Ontology and Schema — Monarch Initiative. LinkML-based framework for standardising NAM metadata across organoids, organ-on-chip, and computational models.
  • [2] LinkML — Linked Data Modeling Language. Generates JSON Schema, JSON-LD contexts, RDF, OWL, SHACL/ShEx, GraphQL, SQL DDL, and Python classes from a schema.
  • [3] FAIRSCAPE — Python framework for AI-ready biomedical data, semantic provenance graphs, validation, APIs, GUI, Datasheets, and Croissant metadata.
  • [4] RO-Crate Technical Overview — Research Object Crate. JSON-LD packaging pattern for research data, metadata, provenance, software, people, identifiers, and licenses.
  • [5] BioCompute Object Documentation — IEEE 2791-2020. JSON-based standard for documenting computational workflows with provenance, descriptive metadata, inputs, outputs, execution, and parameters.
  • [6] CEDAR metadata form instances — CEDAR. Metadata templates and instances for ontology-assisted scientific metadata capture.
  • [7] OpenLineage — Open standard and reference implementation for collecting lineage metadata about datasets, jobs, and runs.
  • [8] PostgreSQL JSON Types — PostgreSQL documentation. JSONB storage and indexing for structured JSON records.

Work with Neuronautix

Build the NAM Evidence Commons MVP

Neuronautix is exploring tooling for schema-first NAM evidence capture: metadata schemas, validation reports, evidence graphs, provenance packages, and human-reviewed AI-assisted curation.