AI agents · Research infrastructure
Building a compounding research knowledge engine on a static site
neuronautix.com is a static HTML site with no database and no build step — yet it runs a structured eight-agent research system with a self-updating knowledge base that makes each published note smarter than the last.
CLAUDE.md and AGENTS.md as an operating system
The site is governed by two plain-text files. CLAUDE.md defines hard constraints: no build step, no inline styles, no external JS libraries, and rel="noopener" on every external link [1]. AGENTS.md defines eight specialized agents — BlogWriterAgent, ContentAgent, SEOAgent, StyleAgent, ProjectsAgent, QAAgent, PresentationBuilderAgent, and DeployAgent — each with an explicit trigger phrase, a bounded scope (which files it may touch), and a non-negotiable rule set [1].
This is not a system prompt. It is a role-definition framework where each agent is activated by task type and operates within well-defined boundaries. The BlogWriterAgent, for example, must read the knowledge base before any web search, update the relevant knowledge file after every research session, and place an inline bracket citation at the exact sentence where each claim appears [1]. The net effect: any sufficiently capable LLM can pick up a task on this codebase and produce correctly structured output without ad-hoc guidance. The intelligence rules are durable and version-controlled alongside the code.
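The role-definition idea can be sketched in a few lines of Python. Everything here is illustrative: the `AgentRole` class, the trigger phrase, and the scope globs are hypothetical stand-ins for what AGENTS.md actually contains, not a transcription of it.

```python
from dataclasses import dataclass
from fnmatch import fnmatch

@dataclass(frozen=True)
class AgentRole:
    """One agent entry in the AGENTS.md spirit: trigger, bounded scope, rules."""
    name: str
    trigger: str                 # phrase that activates the agent
    scope: tuple[str, ...]       # glob patterns of files it may touch
    rules: tuple[str, ...]       # non-negotiable constraints

    def may_touch(self, path: str) -> bool:
        """True if the path falls inside this agent's bounded scope."""
        return any(fnmatch(path, pattern) for pattern in self.scope)

# Hypothetical rendering of the BlogWriterAgent's entry.
BLOG_WRITER = AgentRole(
    name="BlogWriterAgent",
    trigger="write a note on",
    scope=("notes/*/index.html", "notes/index.html", "knowledge/*.md", "sitemap.xml"),
    rules=(
        "read the relevant knowledge file before any web search",
        "update the knowledge file after every research session",
        "place an inline bracket citation at each claim",
    ),
)
```

The point of the structure is the `may_touch` check: scope is a property of the role, so any model executing the role inherits the same boundary.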
The knowledge base — RAG-lite without a vector database
The knowledge/ directory contains five markdown files covering the site's core domains: HCM systems, behavioral analysis, FAIR metadata, AI agents for science, and NAMs and regulatory science [2]. The protocol is simple: before writing any post, the BlogWriterAgent reads the relevant file in full. After completing research, it merges new sources, concepts, and post ideas back into the file.
This follows the Retrieval-Augmented Generation pattern described by Lewis et al.: grounding LLM outputs in retrieved documents to reduce hallucination and enable source attribution [3]. But it requires no vector database and no embedding infrastructure. The "retrieval" is the agent reading a markdown file in full. The "augmentation" is using that file as structured context before drafting. What it gives up in semantic-search capability, it gains in transparency: every piece of knowledge carries a source, a date, and a brief statement of what specific claim it supports.
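The read-in-full / merge-back loop is simple enough to sketch. This is a minimal illustration, not the site's actual code: the function names and the entry format (source, date, supported claim) are assumptions modeled on the protocol described above.

```python
from datetime import date
from pathlib import Path

def load_knowledge(path: Path) -> str:
    """'Retrieval' in this architecture: read the domain file in full."""
    return path.read_text(encoding="utf-8")

def record_source(path: Path, title: str, url: str, claim: str) -> None:
    """Merge a new source back into the knowledge file after research,
    preserving the source / date / supported-claim triple."""
    entry = f"- {title} — {url} (added {date.today().isoformat()}): supports: {claim}\n"
    with path.open("a", encoding="utf-8") as f:
        f.write(entry)
```

Because the whole file is the context window's "retrieved set", there is nothing to re-rank and nothing to re-embed; curation replaces search.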
After seven published notes on NAMs and FAIR metadata, the knowledge files now contain source-backed summaries of over a dozen standards bodies, regulatory documents, and peer-reviewed papers. Each new post on a related topic inherits that research without repeating it. This is the compounding effect — and it is the most valuable property of the architecture.
The publishing pipeline as a reproducible workflow
A post follows a nine-step sequence [1]:
1. Read the relevant knowledge file.
2. Web-search three to five authoritative sources.
3. Update the knowledge file with new findings.
4. Write the note at notes/YYYY-MM-slug/index.html.
5. Place inline bracket citations at each factual claim.
6. Insert the card in notes/index.html.
7. Add an entry to sitemap.xml.
8. Mark the post idea as published in the knowledge file.
9. Generate the LinkedIn variant.
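The sequence can be driven as an ordered, auditable pipeline. The runner below is a hypothetical sketch (the step handlers are no-op placeholders, not the real implementation); what it demonstrates is that strict ordering plus a recorded trail is all the "workflow engine" this pattern needs.

```python
def run_pipeline(steps, ctx):
    """Execute publishing steps strictly in order; return the audit trail.
    A raised exception stops the run, so no later step sees stale state."""
    trail = []
    for name, action in steps:
        action(ctx)
        trail.append(name)
    return trail

# Step names mirror the nine-step sequence; handlers are placeholders.
steps = [
    ("read knowledge file", lambda ctx: ctx.setdefault("knowledge", "...")),
    ("web-search sources", lambda ctx: ctx.setdefault("sources", [])),
    ("update knowledge file", lambda ctx: None),
    ("write note HTML", lambda ctx: None),
    ("place inline citations", lambda ctx: None),
    ("insert card in notes index", lambda ctx: None),
    ("add sitemap entry", lambda ctx: None),
    ("mark idea published", lambda ctx: None),
    ("generate LinkedIn variant", lambda ctx: None),
]
```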
Every factual claim carries an inline citation at the point where the evidence is used [4]. The references section lists only cited sources. This matches the ReAct agent pattern: reason over accumulated evidence, act to produce structured output, observe the existing knowledge base before generating new content [4]. The pipeline is deterministic enough to be repeatable and auditable enough to meet a working scientific writing standard — inline citations, source attribution, separation of direct evidence from expert synthesis.
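The two citation rules stated above (every bracket number must resolve, and the references section lists only cited sources) are mechanically checkable. A minimal sketch of such a check, assuming the `[n]` inline-citation convention used in this article:

```python
import re

def check_citations(body: str, references: dict[int, str]) -> list[str]:
    """Flag citation-hygiene problems: bracket numbers in the body must
    exist in the reference list, and every listed reference must be cited."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", body)}
    problems = []
    for n in sorted(cited - references.keys()):
        problems.append(f"[{n}] cited but not in references")
    for n in sorted(references.keys() - cited):
        problems.append(f"[{n}] listed but never cited")
    return problems
```

A check like this could run as a QA step before deploy; it enforces the auditability claim rather than leaving it to the drafting agent's discipline.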
The private RAG layer for unpublished work
Alongside the public site, the rag/ directory contains a local Python RAG prototype for private document retrieval. Documents in .private_corpus/ — protocols, CRO reports, unpublished datasets — are registered and indexed locally. The .rag_index/ directory, which holds embeddings and chunk manifests, is Git-ignored and never deployed. The IngestionAgent handles registration, ingestion, and indexing through a CLI interface [1].
This creates a two-tier knowledge system. The public tier is the knowledge/ markdown files: curated, citation-tracked, and version-controlled in the repository. The private tier is the local RAG index: high-recall retrieval over unpublished material for internal use. The boundary between the two tiers is enforced by Git-ignore rules and explicit agent constraints, not by convention or memory.
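Because the public/private boundary rests on Git-ignore rules, it can be verified before every deploy. A sketch of such a guard, using the directory names from this article (the function itself is hypothetical, and it checks literal entries rather than full gitignore pattern semantics):

```python
from pathlib import Path

# Private-tier directory names as described in the article.
PRIVATE_DIRS = (".private_corpus", ".rag_index")

def private_tier_ignored(gitignore: Path) -> bool:
    """Deploy-time guard: every private directory must appear in .gitignore
    (with or without a trailing slash) so the local index never deploys."""
    rules = {line.strip().rstrip("/") for line in gitignore.read_text().splitlines()}
    return all(d in rules for d in PRIVATE_DIRS)
```

Failing closed here turns "enforced by Git-ignore rules, not by convention" from a policy statement into a testable invariant.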
Why this pattern is viable for small research teams
The full architecture — eight specialized agents, a compounding knowledge base, a structured publishing pipeline, and a private RAG layer — runs on GitHub Pages (zero hosting cost) with model API calls billed per post. There is no database to maintain, no CMS to update, no server to monitor. The site deploys automatically when the main branch receives a commit [1].
The entire intelligence layer lives in three markdown files: CLAUDE.md, AGENTS.md, and the relevant knowledge file. In Neuronautix's experience, the compounding knowledge base becomes a meaningful accelerator after six to eight posts: research sessions get shorter, citations are more consistent, and the gap between expertise and published output narrows. For solo researchers and small labs publishing in focused domains, this is now a practical and reproducible template.
References
- [1] neuronautix.com repository — Neuronautix, 2026. CLAUDE.md and AGENTS.md define agent roles, operating rules, publishing pipeline, and knowledge base protocol.
- [2] Neuronautix Notes — Neuronautix, 2026. Published notes grounded in five knowledge-base files covering HCM, behavioral analysis, FAIR metadata, AI agents, and NAMs.
- [3] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks — Lewis et al., 2020. RAG pattern for grounding LLM outputs in retrieved documents to reduce hallucination and enable source attribution.
- [4] ReAct: Synergizing Reasoning and Acting in Language Models — Yao et al., 2022. ReAct agent pattern combining iterative reasoning and structured action — here applied to the publishing workflow.
Work with Neuronautix
Build a knowledge engine for your research domain
Neuronautix helps research teams design structured knowledge infrastructure — from metadata schemas to agent-assisted publishing pipelines. Contact us to discuss how this architecture could apply to your domain.