Inside the DVC analysis pipeline: the metrics, the methods, and the metadata gap

Damien Huzard, PhD

A paper list tells you what was measured with a Digital Ventilated Cage. It does not tell you how the numbers were computed, and that "how" is exactly what you need to reproduce, standardise, or compare an analysis. So we read the methods sections, all of them, and turned them into a map. This note is that map, and it reads the way the data flow: one raw signal that forks into a connected family of biomarkers (activity, rhythm, place, and the cage environment), each with its recipe, its thresholds, and the cage configurations in which it is valid. The sting in the tail is that the binding constraint on reusing any of it is not the mathematics; it is getting the raw data, and its metadata, into a usable, FAIR form. The note also doubles as a small demonstration of how our knowledge base works as a second brain.

1. From a list of papers to a map of methods

The starting point was the public Tecniplast DVC® scientific-papers catalogue, the single best index of studies that use the Digital Ventilated Cage [16]. A catalogue, though, is a list of findings; for anyone who actually has to analyse DVC data, the load-bearing detail lives one level down, in the methods sections: the binning, the thresholds, the entropy parameters, the software [16]. This note collapses that level into one fully cited reference sheet, so that a DVC user can see, metric by metric, what produced each number.

2. How we built this — a small agent team, in plain terms

Here is the honest behind-the-scenes. Rather than read two dozen papers end to end, we pointed a small team of AI research assistants at the open-access full texts behind that catalogue. Each one was given a handful of papers and a single job — pull out exactly how the data were processed — and they worked in parallel, like five people in a library each taking a shelf. Everything they found was checked against the original PDFs and merged into the Neuronautix knowledge base — a plain set of structured notes the assistants read before writing and add to after. That is the "second brain" idea in practice: the knowledge compounds, so the next question starts where the last one finished rather than from scratch. The result is meant to be two things at once: a genuine scientific resource for DVC users, and a worked example of why a curated, cited knowledge base beats a folder of unread papers.

3. One raw signal underneath everything

Before any metric, there is one stream of numbers. A board of twelve capacitive electrodes, arranged as a 4×3 grid, sits under the floor of each cage; every electrode's capacitance is sampled four times a second (every 250 ms), which adds up to roughly 2.5 MB per cage per day [1][2]. An electrode is said to "activate" when the change in its capacitance between successive samples exceeds a noise threshold calibrated on empty cages [1]. Every named metric below — activity, rest, circadian, spatial, even bedding wetness — is some transformation of that twelve-channel time series, which is why getting the signal right is the whole game [1][2]. That signal is not lab-specific: the activation readout reproduces across independent sites — CNR Rome, The Jackson Laboratory, and Karolinska — which is what licenses treating the recipes below as portable rather than one facility's artefact [3].
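The activation rule above is simple enough to sketch. The following is a minimal illustration, not the vendor's implementation: it assumes the stream arrives as a list of 12-value frames (one frame per 250 ms sample), it compares the absolute change between successive frames, and the function name and the threshold value passed in are hypothetical, since the real threshold is calibrated on empty cages.

```python
from typing import List, Sequence

N_ELECTRODES = 12  # 4x3 grid under the cage floor


def activations(frames: Sequence[Sequence[float]], noise_threshold: float) -> List[List[int]]:
    """Return one row of 0/1 flags per pair of successive frames.

    An electrode is flagged (1) when the absolute change in its capacitance
    between two successive samples exceeds noise_threshold. Using the
    absolute change is an assumption of this sketch.
    """
    flagged = []
    for previous, current in zip(frames, frames[1:]):
        if len(previous) != N_ELECTRODES or len(current) != N_ELECTRODES:
            raise ValueError("each frame must hold 12 electrode readings")
        flagged.append(
            [int(abs(c - p) > noise_threshold) for p, c in zip(previous, current)]
        )
    return flagged


# Illustrative values only: a flat signal, then a jump on the last electrode.
example_frames = [[100.0] * 12, [100.0] * 12, [100.0] * 11 + [103.0]]
example_flags = activations(example_frames, noise_threshold=1.0)
```

Everything downstream in this note can be read as a different summary of a table of flags like `example_flags`, or of the raw frames it was derived from.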

From that one stream, the rest of this note follows three branches, so the metrics read as a pipeline rather than a glossary:

  • The activity branch: how much the animal moves (activation / locomotion index, distance, rest and activity bouts) and the rhythm and place of that movement (rest-disturbance, circadian parameters, spatial preference).
  • The environmental branch: what the same electrodes sense about the cage rather than the animal (bedding moisture → Bedding Status Index → Urination Index).
  • The model branch: classifiers and indices layered on top of the first two (severity scoring, cage-change prediction).

One rule cuts across every branch and is worth stating once: metrics computed from the amount or rhythm of the activity signal — the Animal Locomotion Index (ALI) / activation, the rest-bout split, the Rest Disturbance Index, circadian parameters, and the electrode-region spatial metrics — stay valid for group-housed cages of any numerosity, because they read the cage-level signal; only the metrics that reconstruct an individual trajectory — distance walked and the per-animal bout decomposition — require single housing [2]. So ALI is the workhorse that travels across every cage type, and tracking is the high-resolution option you buy by housing one animal per cage.
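One way to keep that rule from being forgotten in an analysis script is to encode it as a guard. The sketch below is only an illustration of the rule as stated here; the metric identifiers and the function name are hypothetical labels of ours, not names used by any DVC software.

```python
# Metrics that read the cage-level signal: valid for any number of animals.
ANY_NUMEROSITY = {
    "activation",
    "rest_bout_split",
    "rest_disturbance_index",
    "circadian_parameters",
    "spatial_region_metrics",
}

# Metrics that reconstruct one animal's trajectory: single housing only.
SINGLE_HOUSING_ONLY = {"distance_walked", "per_animal_bouts"}


def metric_is_valid(metric: str, animals_in_cage: int) -> bool:
    """Return True when the metric may be computed for this cage occupancy."""
    if animals_in_cage < 1:
        raise ValueError("a cage needs at least one animal")
    if metric in ANY_NUMEROSITY:
        return True
    if metric in SINGLE_HOUSING_ONLY:
        return animals_in_cage == 1
    raise KeyError(f"unknown metric: {metric}")
```

Calling `metric_is_valid("distance_walked", 4)` returns `False`, which is the point: the check fails before a trajectory metric is computed on a group-housed cage.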

4. The activity branch (1): activation for any cage, tracking for single housing

From the same raw stream, the literature derives activity in two distinct ways. The first, electrode-activation density (EAD), compares two short averaging windows of the capacitance signal and flags an activation when the absolute change clears the lowest threshold that still rejects empty-cage noise — λ = 1.25 — yielding a binary per-electrode signal that is then averaged per second; crucially, this works for group-housed cages [2]. The second, centroid tracking, reconstructs a position: each electrode's baseline is the maximum capacitance over a one-minute window, the centroid is the electrode coordinates weighted by their "signal drop" to about 1 mm resolution, motion is called when the centroid shifts at least 1 mm between samples, movement-on-the-spot is separated from true locomotion at a 65 mm stride length, and a long rest is any bout of at least 40 seconds — but this only works for a singly housed animal [2]. Reassuringly, the two agree closely — EAD tracks the distance measure at r > 0.95 and the two coincide about 97% of the time — so the cheaper, group-housed-friendly activation metric is a sound proxy for the richer tracking one [2].
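The centroid step is easier to follow in code. This is a minimal sketch under stated assumptions, not the published implementation: the grid layout and electrode pitch in `grid_coordinates` are hypothetical, "signal drop" is taken to be baseline minus reading (clipped at zero), and the baseline is passed in rather than computed from the one-minute window. The 65 mm stride filter and the 40-second rest rule are left out.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
MOTION_THRESHOLD_MM = 1.0  # from the text: smallest centroid shift counted as motion


def grid_coordinates(pitch_x_mm: float, pitch_y_mm: float) -> List[Point]:
    """Hypothetical 4x3 layout: electrode i sits at column i % 4, row i // 4."""
    return [((i % 4) * pitch_x_mm, (i // 4) * pitch_y_mm) for i in range(12)]


def centroid(frame: Sequence[float], baseline: Sequence[float],
             coords: Sequence[Point]) -> Optional[Point]:
    """Electrode coordinates weighted by signal drop; None if nothing dropped."""
    drops = [max(b - v, 0.0) for v, b in zip(frame, baseline)]
    total = sum(drops)
    if total == 0.0:
        return None
    x = sum(d * cx for d, (cx, _) in zip(drops, coords)) / total
    y = sum(d * cy for d, (_, cy) in zip(drops, coords)) / total
    return (x, y)


def distance_travelled(centroids: Sequence[Optional[Point]]) -> float:
    """Sum the sample-to-sample centroid shifts of at least 1 mm."""
    total = 0.0
    previous: Optional[Point] = None
    for point in centroids:
        if point is None:
            continue  # no usable position for this sample
        if previous is not None:
            step = math.dist(previous, point)
            if step >= MOTION_THRESHOLD_MM:
                total += step
        previous = point
    return total
```

Because the centroid is a single weighted position, two animals in the cage would be blended into one point, which is why this branch is restricted to single housing.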

5. The activity branch (2): rhythm and place

Still on the same activity series — so these all extend to group-housed cages — the famous indices turn out to be precise algorithms, not vibes. The Rest Disturbance Index — the rest-fragmentation biomarker that recurs across disease models — is computed from minute-binned activity by first zeroing everything below λ = 0.005, then applying a fourth-order Butterworth band-pass filter (normalised cut-offs of 1/2000 and 1/300), and finally taking the sample entropy with parameters m = 2 and r = 0.2 [4]. It has a single mathematical origin and is scale-invariant — every later paper that uses "RDI" points back to that one definition rather than redefining it [4][5]. Circadian phenotyping follows a different chain: a 30-minute running average, then a chi-square periodogram in the ImageJ plug-in ActogramJ, with inter-daily stability, intra-daily variability, and relative amplitude hand-coded in R from the Witting equations — all under a per-cage LED system delivering about 100 photopic lux so each cage is its own light-controlled chamber [6]. And the spatial metrics are read straight off the grid geometry: "frontality" is the share of activity over the six frontal electrodes (numbers 7–12), "wall activity" sums the left and right electrode columns as a thigmotaxis proxy, and a Gini index across the twelve electrodes captures how concentrated the activity is in one spot [8].
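Two of these recipes fit in a short standard-library sketch: the entropy core of the Rest Disturbance Index, and the grid-based spatial metrics. Treat it as an illustration under assumptions rather than the authors' code. The band-pass filter is deliberately omitted (it would normally come from a signal-processing library), the tolerance is taken as r times the standard deviation of the series, which is the usual sample-entropy convention but is not spelled out above, and the Gini index uses the standard coefficient formula. Wall activity is left out because the left and right column numbering is not given here.

```python
import math
import statistics
from typing import List, Sequence

FRONTAL = range(6, 12)  # electrodes 7-12 in the text's 1-based numbering


def zero_below(series: Sequence[float], threshold: float = 0.005) -> List[float]:
    """First RDI step: set every value below the threshold to zero."""
    return [v if v >= threshold else 0.0 for v in series]


def sample_entropy(series: Sequence[float], m: int = 2, r: float = 0.2) -> float:
    """Sample entropy with Chebyshev distance and tolerance r * SD (assumed)."""
    n = len(series)
    if n < m + 2:
        raise ValueError("series too short for the chosen m")
    tolerance = r * statistics.pstdev(series)

    def matching_pairs(length: int) -> int:
        # Use the same n - m template start points for both lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        return sum(
            1
            for i in range(len(templates))
            for j in range(i + 1, len(templates))
            if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= tolerance
        )

    b = matching_pairs(m)
    a = matching_pairs(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined: no matches at this tolerance
    return -math.log(a / b)


def frontality(per_electrode: Sequence[float]) -> float:
    """Share of total activity recorded over the six frontal electrodes."""
    total = sum(per_electrode)
    if total == 0:
        raise ValueError("no activity recorded")
    return sum(per_electrode[i] for i in FRONTAL) / total


def gini(per_electrode: Sequence[float]) -> float:
    """Standard Gini coefficient: 0 for even spread, (n - 1) / n for one spot."""
    values = sorted(per_electrode)
    n, total = len(values), sum(values)
    if total == 0:
        raise ValueError("no activity recorded")
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n
```

A perfectly regular series gives an entropy of zero and an irregular one a positive value, which is the sense in which the index reads rest fragmentation. The pairwise comparison is quadratic in series length, so this form suits short minute-binned windows rather than months of data.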

6. The activity branch (3): from cage to animal — and why that step is contested

The cage-level signal travels everywhere; turning it into a per-animal number is the step that does not come for free. The common approach is disarmingly simple — divide the hourly cage activity index by the number of mice in the cage, requiring only that each cage holds a single genotype or treatment — and it was validated against a conventional TSE LabMaster infrared beam frame by converting both systems to z-scores and comparing effect sizes [7]. It is worth being candid that this normalisation is not universally accepted: a cage aggregate cannot, in general, be cleanly attributed to one animal — group activity is super-additive when mice interact and sub-additive when they huddle — so dividing by occupancy is a defensible average, not a per-individual measurement, and any study leaning on it should justify the assumption for its own design rather than treat it as settled [7]. For phenotypes that bulk activity misses, an optional GYM500 running wheel logs voluntary-running distance on the very same electrode board, reading the wheel's magnets rather than the mouse [13].
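The normalisation and the z-score conversion can be sketched in a few lines. This is a minimal illustration with hypothetical function names; the use of the sample standard deviation is an assumption of the sketch, not something the cited validation specifies here.

```python
import statistics
from typing import List, Sequence


def per_animal_activity(hourly_cage_activity: Sequence[float],
                        animals_in_cage: int) -> List[float]:
    """Divide each hourly cage-level value by the number of animals housed."""
    if animals_in_cage < 1:
        raise ValueError("a cage needs at least one animal")
    return [value / animals_in_cage for value in hourly_cage_activity]


def z_scores(values: Sequence[float]) -> List[float]:
    """Convert a series to unitless z-scores so two systems can be compared."""
    mean = statistics.fmean(values)
    spread = statistics.stdev(values)  # sample SD, assumed for this sketch
    if spread == 0:
        raise ValueError("z-scores are undefined for a constant series")
    return [(value - mean) / spread for value in values]
```

Note that z-scoring removes any constant scale factor, so dividing by occupancy changes the absolute values but leaves the z-scored series of a cage untouched. The division only matters when absolute levels are compared across cages of different occupancy, which is exactly where the additivity assumption is doing the work.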

7. The environmental branch: from bedding moisture to a metabolic readout

The most distinctive DVC metrics do not watch the animal at all; they watch its bedding. Increasing bedding moisture is sensed as a decline in the electrodes' electromagnetic-field strength and recorded as the Bedding Status Index (BSI) [10]. From that, the Urination Index is built as a fully deterministic transform — invert each moisture increment, drop the deltas in a window around every cage-insertion event (to reject handling and water-bottle artefacts), take a cumulative sum, and divide by housing density — calibrated at about 1.4 index units per millilitre of added water [10]. Its sibling, the cage-change model, instead trains machine learning on human annotations of soiled bedding, reaching over 90% accuracy at high density and supporting 3–6-week change intervals [9]. The Urination Index is also, as far as this corpus goes, the happy exception: its code is public — UrinatoR, an open-source, MIT-licensed R/Shiny app you can actually run [10][15].
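Because the Urination Index is a deterministic transform, its four steps can be written out directly. The sketch below follows the order given above but is not UrinatoR's code: the function name, the sample-based window, and the choice to keep every inverted increment (including drying) are assumptions of ours.

```python
from typing import List, Sequence


def urination_index(bsi: Sequence[float], insertion_indices: Sequence[int],
                    window: int, animals_in_cage: int) -> List[float]:
    """Cumulative, density-normalised sum of inverted BSI increments.

    bsi               -- Bedding Status Index samples in time order
    insertion_indices -- sample positions of cage-insertion events
    window            -- samples to drop on each side of an insertion (assumed unit)
    """
    if animals_in_cage < 1:
        raise ValueError("a cage needs at least one animal")
    masked = set()
    for event in insertion_indices:
        masked.update(range(event - window, event + window + 1))

    index = []
    running = 0.0
    for i in range(1, len(bsi)):
        if i not in masked:
            # A wetter bedding lowers the BSI, so the increment is inverted.
            running += -(bsi[i] - bsi[i - 1])
        index.append(running / animals_in_cage)
    return index


# Illustrative values only: a handling jump at sample 3 is masked out.
example = urination_index([100.0, 98.0, 98.0, 110.0, 109.0, 107.0],
                          insertion_indices=[3], window=0, animals_in_cage=1)
```

Here `example` is `[2.0, 2.0, 2.0, 3.0, 5.0]`: the 12-unit jump at the insertion event contributes nothing, and the index only ever moves with the unmasked increments. With the calibration quoted above, an index value would be divided by roughly 1.4 to read it as millilitres.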

8. The model branch: the transparent classifier worth copying

Not every "AI on DVC data" result is a black box, and one in particular is a model of restraint. A colitis-severity classifier used nothing more exotic than a logistic regression on two features — DVC activity and body weight — trained on a single day's data (day 7) with a probability cut-off of 0.5, and it reproduced the graded disease course [11]. At the more bespoke end, a narcolepsy study added a nest-identification algorithm and found that an inability to sustain activity for more than 40 minutes was itself a robust biomarker [12]. In our view the colitis model is the more reusable template precisely because it is simple, legible, and reimplementable from the paper alone [11].
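To show how little machinery such a classifier needs, here is a two-feature logistic regression written from scratch. It is a sketch of the model family only, not the published model: the training rows are synthetic and purely illustrative, both features are assumed to be standardised, plain gradient descent stands in for whatever fitting routine the study used, and all names are ours.

```python
import math
from typing import List, Sequence, Tuple

Row = Tuple[float, float]  # (activity, body weight), both standardised


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit(rows: Sequence[Row], labels: Sequence[int],
        learning_rate: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit [intercept, activity coefficient, body-weight coefficient]."""
    coef = [0.0, 0.0, 0.0]
    n = len(rows)
    for _ in range(epochs):
        gradient = [0.0, 0.0, 0.0]
        for (activity, body_weight), label in zip(rows, labels):
            error = sigmoid(coef[0] + coef[1] * activity + coef[2] * body_weight) - label
            gradient[0] += error
            gradient[1] += error * activity
            gradient[2] += error * body_weight
        coef = [c - learning_rate * g / n for c, g in zip(coef, gradient)]
    return coef


def is_severe(coef: Sequence[float], activity: float, body_weight: float,
              cutoff: float = 0.5) -> bool:
    """Apply the probability cut-off quoted in the text (0.5)."""
    probability = sigmoid(coef[0] + coef[1] * activity + coef[2] * body_weight)
    return probability >= cutoff


# Synthetic single-day rows: label 1 means severe, 0 means mild.
rows = [(1.0, 0.8), (0.7, 1.1), (0.9, 0.5), (-0.8, -1.0), (-1.1, -0.6), (-0.6, -0.9)]
labels = [0, 0, 0, 1, 1, 1]
model = fit(rows, labels)
```

The whole model is three numbers, which is what makes this kind of classifier reimplementable from a methods section alone.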

9. The real bottleneck: raw data and its metadata

Pulling the whole map together exposes a gap that matters more than any single threshold — and it is not the mathematics. The raw electrode stream is, in principle, fully exportable — and that openness is genuinely valuable. In practice it is rarely usable: it tends to sit in per-facility silos, and, more decisively, it arrives without the metadata — strain, sex, age, housing density, cage-change schedule, light programme, interventions and their timestamps — that every recipe above quietly depends on. Raw values with no context are close to inert: you cannot choose the right baseline, exclude cage-change days, normalise per occupancy, or even tell whether a cage was single- or group-housed. The code gap compounds it: across essentially the entire corpus the analysis code is unreleased — statements say "in the article" or "on request," and apart from UrinatoR no public repository exists — so the transform is left as prose and the input it would consume is left under-described [10][15]. And the very first transformation — raw capacitance into the activity and bedding indices — runs inside the proprietary DVC Analytics platform, so even a motivated, well-provisioned user already starts one step downstream of the signal [1]. The practical consequence is that the binding constraint on reuse is data plus metadata, not algorithms: the recipes reproduce fine, but a recipe is worthless without well-described ingredients — exactly the FAIR-metadata, standard-format, and API discipline that recent home-cage-monitoring reviews call for [14].
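A first practical step is to refuse to analyse a cage whose context is incomplete. The check below is a minimal sketch: the field names simply mirror the metadata items listed above, and neither they nor the function name come from any published schema.

```python
from typing import Dict, List

# Field names are hypothetical; they follow the list of metadata items in the text.
REQUIRED_FIELDS = (
    "strain",
    "sex",
    "age",
    "housing_density",
    "cage_change_schedule",
    "light_programme",
    "interventions",
)


def missing_metadata(record: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or set to None.

    An empty list of interventions counts as present: it states that
    nothing was done, which is different from not knowing.
    """
    return [name for name in REQUIRED_FIELDS if record.get(name) is None]
```

For a record that carries only `strain` and `sex`, the function returns the other five field names, which makes the gap visible before any recipe is run.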

10. The takeaway: one signal, a connected pipeline, a metadata-shaped gap

Read as a whole, the DVC is not a bag of metrics but a single capacitance signal that forks, by deterministic recipe, into activity, rhythm, place, and environment — and which branch you may legitimately use is set first by your cage configuration: ALI and the series-based indices for any cage and numerosity, trajectory metrics for single housing only. The recipes here — the EAD threshold, the RDI entropy parameters, the electrode-region maps, the Urination Index transform — are reproducible enough to re-implement; what actually decides whether a result is trustworthy and reusable sits upstream of all of them, in the raw data and the metadata that says what the numbers describe. Treat every metric as a provenance record — store the recipe, the baseline, the cage configuration, and the software version with each value — and the gap this note keeps returning to is the one worth closing first [14]. The meta-point is the one we set out to show: a curated knowledge base turned a vendor list of papers into a connected, cited methods map — and because it was written back into that base, the next DVC question we are asked already has this as its memory [16].
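Storing a metric as a provenance record can be as light as a small immutable structure serialised next to the value. This is one possible shape, offered as a sketch: the class, its field names, and the example strings are hypothetical and chosen only to match the four items named above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MetricRecord:
    """One metric value together with what is needed to interpret it."""
    metric: str
    value: float
    recipe: str              # how the value was computed
    baseline: str            # what it was measured against
    cage_configuration: str  # e.g. single or group housing
    software_version: str    # version of the code that produced it

    def to_json(self) -> str:
        """Serialise with sorted keys so records diff cleanly."""
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder values for illustration; not taken from any dataset.
record = MetricRecord(
    metric="rest_disturbance_index",
    value=0.0,
    recipe="threshold 0.005; band-pass; sample entropy m=2 r=0.2",
    baseline="placeholder baseline description",
    cage_configuration="group-housed",
    software_version="0.0.0-example",
)
```

Freezing the dataclass means a value cannot be edited after the fact without also producing a new record, which keeps the number and its description from drifting apart.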

References
