LLMs · Knowledge graphs · FAIR metadata

Why scientific LLM workflows need metadata, ontologies, and graphs.

Not just longer context.

Damien Huzard, PhD · Neuronautix · 18 May 2026 · 12 min

More context. Less understanding.

The anti-pattern

When the LLM underperforms — give it more documents.

The universal-LLM-reader reflex. More tokens, more retrieval, more context — and still the same brittle output.

Long context

Long context is not understanding.

Relevant content lost in the middle

Few models hold accuracy past 64k tokens

RAG retains a clear cost advantage over long-context prompting

"Just send everything" is technically weak

Retrieval architecture

RAG retrieves chunks. Not relationships.

Chunk-only retrieval

Top-k passages by similarity. Isolated fragments. No entity relationships. Reassembly is the LLM's job.

Graph-guided retrieval

Entities, relations, paths. Coherent multi-hop context. Relationships preserved. An architecture, not a single tool.
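
To make the contrast concrete, here is a minimal sketch of graph-guided retrieval, assuming a toy in-memory graph of typed edges. Every identifier and relation name below is hypothetical; a real deployment would query a graph store instead. The point is that each retrieved item is a path with its relations intact, not an isolated fragment.

```python
from collections import defaultdict, deque

# Toy triples (subject, relation, object). All identifiers are hypothetical.
TRIPLES = [
    ("study:S1", "uses_strain", "strain:B6"),
    ("strain:B6", "member_of", "species:mouse"),
    ("study:S1", "measures", "assay:open_field"),
    ("assay:open_field", "quantifies", "trait:locomotion"),
]

def build_index(triples):
    """Index outgoing typed edges by subject."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[subject].append((relation, obj))
    return index

def multi_hop(index, start, max_hops=2):
    """Breadth-first walk returning every path of 1..max_hops typed edges."""
    paths = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) == max_hops:
            continue  # hop budget spent; this also bounds walks on cyclic graphs
        for relation, target in index.get(node, []):
            new_path = path + [(node, relation, target)]
            paths.append(new_path)
            queue.append((target, new_path))
    return paths

def as_context(paths):
    """Render each path as one line of context, relations preserved."""
    return [" ; ".join(f"{s} -{r}-> {o}" for s, r, o in p) for p in paths]

for line in as_context(multi_hop(build_index(TRIPLES), "study:S1")):
    print(line)
```

Chunk-only retrieval would hand the model four unrelated sentences; here the two-hop lines already state how the study connects to the species and the trait.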

Ontologies = semantic compression.

Ontology-grounded retrieval

Four reasons it wins.

Concept

Fixed identifiers replace synonyms and free text. The model resolves to a known term, not a guess.

Relation

Typed edges replace prose like "is associated with". Graph queries become possible.

Constraint

Units, ranges, required fields become executable. Validation is deterministic, not generative.

Provenance

Every term has a source and a definition. Every claim has a path back to evidence.
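
A minimal sketch of the Concept and Constraint points, assuming a hand-written synonym table in place of a real ontology lookup service. The two NCBITaxon identifiers are real (10090 is Mus musculus, 10116 is Rattus norvegicus); the `body_weight` field, its unit, and its range are hypothetical.

```python
# Synonym table: a hand-written stand-in for a real ontology lookup service.
SYNONYMS = {
    "mouse": "NCBITaxon:10090",
    "mice": "NCBITaxon:10090",
    "mus musculus": "NCBITaxon:10090",
    "rat": "NCBITaxon:10116",
    "rattus norvegicus": "NCBITaxon:10116",
}

# Hypothetical constraints: field -> (unit, minimum, maximum).
CONSTRAINTS = {"body_weight": ("g", 5.0, 1000.0)}

def resolve(label):
    """Return the fixed identifier for a free-text label, or None if unknown."""
    return SYNONYMS.get(label.strip().lower())

def validate(record):
    """Deterministic checks: same record in, same error list out."""
    errors = []
    if resolve(record.get("species", "")) is None:
        errors.append("species: no matching ontology term")
    for field, (unit, low, high) in CONSTRAINTS.items():
        entry = record.get(field)
        if entry is None:
            errors.append(f"{field}: required field missing")
            continue
        value, given_unit = entry
        if given_unit != unit:
            errors.append(f"{field}: expected unit {unit}, got {given_unit}")
        elif not low <= value <= high:
            errors.append(f"{field}: {value} outside [{low}, {high}]")
    return errors

print(resolve(" Mice "))
print(validate({"species": "mouse", "body_weight": (25.0, "kg")}))
```

No model call is involved at any point: three spellings collapse to one identifier, and the unit error is caught by a rule rather than by a generated judgement.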

Hybrid architecture

The pipeline. Five steps.

Schema

Required fields, types, units — machine-actionable.

Capture

Structured forms, importers, ontology-based suggestions.

Graph

Entities, relations, provenance packages.

Retrieval

Ontology-grounded, KG-guided. Minimal grounded context.

LLM

Synthesis only at this step. Last, not first.

Reframe

Metadata is not paperwork.

It is machine-actionable infrastructure. FAIR requires rich, domain-specific, machine-readable templates — not narrative documentation.

Three patterns

Constraints reduce burden. They do not increase it.

Templates

CEDAR Embeddable Editor — author once, publish everywhere. Templates live inside the platform that needs them.

Packages

RO-Crate — research artefacts travel with JSON-LD metadata, identifiers, provenance, relations, annotations.

Recommendations

Ontology-based field suggestions accelerate authoring and improve accuracy at data-entry time.
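
As an illustration of the Packages pattern, the sketch below builds a minimal RO-Crate 1.1 metadata document with the standard library only. The context URL, the `ro-crate-metadata.json` descriptor, and the `./` root dataset follow the RO-Crate 1.1 layout as I understand it; the dataset name, file path, and date are hypothetical, and `make_crate` is an illustrative helper, not part of any RO-Crate library.

```python
import json

def make_crate(name, description, date_published, license_url, files):
    """Build a minimal RO-Crate 1.1 metadata document as a plain dict.

    files is a list of (relative_path, media_type) pairs.
    """
    file_entities = [
        {"@id": path, "@type": "File", "encodingFormat": media_type}
        for path, media_type in files
    ]
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": date_published,
        "license": {"@id": license_url},
        "hasPart": [{"@id": entity["@id"]} for entity in file_entities],
    }
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *file_entities],
    }

# Hypothetical dataset, for illustration only.
crate = make_crate(
    name="Open-field pilot",
    description="Locomotion recordings from a pilot cohort.",
    date_published="2026-05-18",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    files=[("data/tracks.csv", "text/csv")],
)
print(json.dumps(crate, indent=2))
```

Written to `ro-crate-metadata.json` next to the data, this file is what lets the artefact travel with its identifiers and relations. Provenance and annotations would be added as further entities in the same `@graph`.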

Biomedical KGs · Today

This is not theoretical.

MedGraph

PubMed entities · MeSH terms · citations · grants · authors → semantic biomedical retrieval

PubMed KG 2.0

Papers · patents · clinical trials · biomedical entities · author networks · project metadata

Life-sciences KG ecosystem

Ontologies + heterogeneous biomedical data → AI-powered research substrate

Real-world data graphs

Graph data models for heterogeneous clinical and research data — new analyses become tractable

Inference economics

Token economy is energy economy.

Output length drives energy

Inference energy correlates with output token length and response time.

Reasoning depth has a cost

Across 14 LLMs, emissions scale with model size and reasoning behaviour.

Generality is expensive

For many tasks, general-purpose generative AI can be orders of magnitude more energy-intensive than task-specific systems.

Inference compounds

Cumulative inference cost can become comparable to or exceed training cost.
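
The break-even point can be written down directly. The symbols are assumptions introduced here, not taken from a cited study: E_train is the one-off training energy, ē_call the mean energy per inference call, and N the number of calls served. The approximation on the right is a simplification that treats per-call energy as dominated by the mean output length t̄_out times an energy-per-token figure ε_token.

```latex
N \cdot \bar{e}_{\text{call}} \;\ge\; E_{\text{train}}
\quad\Longleftrightarrow\quad
N \;\ge\; \frac{E_{\text{train}}}{\bar{e}_{\text{call}}},
\qquad
\bar{e}_{\text{call}} \approx \bar{t}_{\text{out}} \cdot \varepsilon_{\text{token}}
```

Training energy is fixed while N grows with every deployment, so shortening outputs or routing calls away from large models raises the break-even N.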

Routing

Route the work. Don't generate it all.

Validators do validation

Graph queries do retrieval

Smaller models do mapping

LLMs do synthesis — only when generative reasoning is required

Energy-per-token as a benchmark

Energy-per-token should complement accuracy benchmarks. Model selection and reasoning depth become routing decisions, not defaults.
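
A minimal sketch of routing as a decision rather than a default. The component names, the `human_review` fallback, and the `ModelProfile` fields are hypothetical choices made for this illustration; the numbers in the usage example are placeholders, not measurements.

```python
from dataclasses import dataclass

# Task kind -> component that should handle it (names are illustrative).
ROUTES = {
    "validation": "deterministic_validator",
    "retrieval": "graph_query",
    "mapping": "small_model",
    "synthesis": "llm",
}

def route(task_kind):
    """Pick a component; unknown task kinds escalate to a human, not to an LLM."""
    return ROUTES.get(task_kind, "human_review")

@dataclass(frozen=True)
class ModelProfile:
    name: str
    accuracy: float          # score on the task's own benchmark, 0..1
    joules_per_token: float  # measured energy per generated token

def pick_model(profiles, min_accuracy):
    """Cheapest model, by energy per token, that still meets the accuracy bar."""
    eligible = [p for p in profiles if p.accuracy >= min_accuracy]
    if not eligible:
        return None
    return min(eligible, key=lambda p: p.joules_per_token)

# Placeholder profiles: invented values to exercise the logic only.
profiles = [
    ModelProfile("small", accuracy=0.82, joules_per_token=0.5),
    ModelProfile("medium", accuracy=0.90, joules_per_token=2.0),
    ModelProfile("large", accuracy=0.93, joules_per_token=9.0),
]
print(route("validation"), pick_model(profiles, min_accuracy=0.88).name)
```

The accuracy bar is set per task, so the largest model is selected only when nothing cheaper clears it.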

Calibration

What graphs do not solve.

Hallucination

Graphs and ontologies reduce risk by grounding retrieval and constraining valid relations. They do not eliminate hallucination by themselves.

Cost

GraphRAG is not always cheaper than long-context LLMs. Graph construction, maintenance, and query design also have costs.

Long context

Long-context models are not always wrong. Sometimes the right routing decision is to use them.

Token prices

Unit prices fluctuate. What scales poorly is total tokens, inference calls, energy, latency, and review burden.

Human-in-the-loop

Place humans upstream and selectively.

Where humans add value

Define concepts and constraints. Validate ontology extensions. Approve high-impact KG changes. Resolve ambiguity at validation gates.

Where humans become a bottleneck

Manually correcting every output. Reviewing routine extractions. Acting as the only validator for deterministic checks. Re-typing what the schema already captures.

The right architecture is hybrid.

Reference pipeline

Five technical steps. One governance lane.

Schema — community-defined, machine-actionable templates

Capture — structured forms, importers, ontology-based suggestions

Graph — entities, relations, provenance packages

Retrieval — ontology-grounded, KG-guided, minimal context

Synthesis — LLM at the last step, routed by energy and accuracy

Review — humans at ontology, validation, and approval gates

Preclinical · NAM evidence

What this means for the bench.

Schema-first

A minimal mandatory metadata set anchored in ARRIVE 2.0. Enforced at source, not after the study.

Ontology-grounded

NCBITaxon, UBERON, OBI, ChEBI. Controlled vocabularies make cross-lab comparison possible.

Provenance-packaged

RO-Crate, FAIRSCAPE, BioCompute. Datasets travel with their context and computational history.

Hybrid synthesis

Schema-first agents bounded by deterministic validation. LLMs accelerate curation; they do not decide what is recorded.
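
One way to picture "bounded by deterministic validation" is a gate between model suggestions and the record. In this sketch the field names and the field-to-ontology mapping are hypothetical, and the check covers identifier shape and prefix only; confirming that a term actually exists would need a lookup against the ontology itself. The example identifiers are real terms (NCBITaxon:10090 is Mus musculus, CHEBI:15377 is water).

```python
import re

# Hypothetical mapping: metadata field -> ontology prefix it must use.
FIELD_PREFIX = {
    "species": "NCBITaxon",
    "anatomy": "UBERON",
    "assay": "OBI",
    "compound": "CHEBI",
}

# Compact identifier of the form PREFIX:digits.
CURIE = re.compile(r"^([A-Za-z]+):(\d+)$")

def gate(suggestions):
    """Split model suggestions into accepted values and items for human review."""
    accepted, needs_review = {}, {}
    for field, value in suggestions.items():
        match = CURIE.match(value)
        expected = FIELD_PREFIX.get(field)
        if match and expected and match.group(1) == expected:
            accepted[field] = value
        else:
            needs_review[field] = value  # free text, wrong ontology, or unknown field
    return accepted, needs_review

# Suggestions as an LLM-assisted curation step might return them (illustrative).
accepted, needs_review = gate({
    "species": "NCBITaxon:10090",
    "anatomy": "brain",
    "compound": "CHEBI:15377",
    "dose": "CHEBI:1",
})
print(accepted)
print(needs_review)
```

The model proposes and the schema decides: only `species` and `compound` are recorded, while the free-text anatomy value and the field that is not in the schema go to a review gate.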

The takeaway

Use LLMs where they help. Use structure where they hurt.

Long context is not understanding

RAG without relationships is weak

Ontologies are semantic compression

Metadata is machine-actionable infrastructure

Token economy is energy economy

Humans go upstream — at gates, not on outputs

Structure first. Generate last.

Make the schema explicit.

Make the graph queryable.

Make the human review valuable.

Damien Huzard, PhD · Neuronautix
neuronautix.com/contact · metadatapp.net