
Validation

Sealed predictions, graded blind


The strongest version of the case-study layer (Layer 4 of the engine's validation protocol) is a blind-graded run against a corpus the engine cannot have been calibrated on. This page is the public scorecard for that campaign.

What we tested

Phase 1 covers 51 well-documented public figures whose birth datetimes we can cite from a public source — Astro-Databank, the subject's published autobiography, contemporaneous press, or the corresponding Wikipedia entry — and whose Rodden rating is AA, A, or (in a few cases) B. The list is in `src/data/validation-set.ts`. Every entry includes the Astro-Databank URL alongside the Wikipedia URL so the reviewer can independently verify the birth time without trusting our transcription.
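
For concreteness, here is a minimal sketch of what one entry in `src/data/validation-set.ts` might look like; the field names are illustrative, not the module's actual identifiers:

```ts
// Hypothetical shape of a VALIDATION_SET entry; names are illustrative,
// not the actual identifiers in src/data/validation-set.ts.
type RoddenRating = 'AA' | 'A' | 'B';

interface ValidationEntry {
  name: string;                // the public figure
  birthDatetime: string;       // ISO 8601, as cited by the public source
  roddenRating: RoddenRating;  // B-rated entries are explicitly flagged
  astroDatabankUrl: string;    // independent verification of the birth time
  wikipediaUrl: string;        // biographical record used for grading
}
```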

The corpus is intentionally disjoint from the figures in /famous-charts. The case-study layer is allowed to be self-aware (Steve Jobs's chart is the canonical worked example for a reason), but the validation campaign cannot be. We pick people the engine has never been hand-calibrated against.

Counts at a glance:

  • 51 charts in Phase 1
  • 44 rated AA or A by Rodden
  • 7 rated B (lower confidence, explicitly flagged)

How we tested

For every figure, the engine generates a structural reading and we seal four fields: the day master (element + polarity), the chart structure (格局, or null when the engine declines to label one), the dominant element by weighted balance, and the day-master strength category with its favorable-element prescription. Those fields are written to `docs/validation-results/predictions-v1.json` with a schema version and a generation timestamp.
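
As a sketch, assuming field names that mirror the four sealed fields (the real schema is whatever `predictions-v1.json` ships with), one snapshot could look like:

```ts
// Illustrative shapes only; the real schema is defined by the snapshot itself.
interface SealedPrediction {
  dayMaster: { element: string; polarity: 'yin' | 'yang' };
  structure: string | null;       // 格局 label, or null when the engine declines
  dominantElement: string;        // by weighted balance
  strength: {
    category: string;             // day-master strength category
    favorableElements: string[];  // the favorable-element prescription
  };
}

interface PredictionSnapshot {
  schemaVersion: number;          // 1 for this snapshot
  generatedAt: string;            // ISO 8601 generation timestamp
  phase: number;                  // 1 for Phase 1
  predictions: Record<string, SealedPrediction>;  // keyed by figure identifier
}
```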

The independent reviewer (an outside person, not the chart owner, not someone who calibrated the engine) reads each sealed prediction next to the figure's biographical record and assigns one of three grades: CONSISTENT (the structural reading lines up with what the biographical record shows), INCONSISTENT (the structural reading and the biography disagree), or NULL (not enough biographical signal to grade either way). Grades and one-line notes go into the same module that produced the predictions, so the source of truth never desyncs.
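
The grading side might be shaped like the following hypothetical sketch, with `undefined` standing in for charts the reviewer has not reached yet:

```ts
// Hypothetical grading shapes; names are illustrative, not the module's real API.
type Grade = 'CONSISTENT' | 'INCONSISTENT' | 'NULL';

interface ReviewerGrade {
  grade: Grade;
  note: string;  // the reviewer's one-line justification
}

// Lives in the same module as the predictions so the two cannot desync.
type GradeBook = Record<string, ReviewerGrade | undefined>;  // undefined = ungraded
```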

We don't claim the model has to be CONSISTENT on every chart. Layer 4 of the validation protocol explicitly requires the case-study set to include at least one MISS per chart, because the model has limits and we want them on the record. The same expectation holds here. A score of 100/0/0 would be evidence of confirmation bias, not engine quality.

Where we are now

Phase 1 ships ungraded. The matrix below is the live scorecard — it reads directly from `VALIDATION_SET` and re-tallies on every build. Counts will fill in as the reviewer works through the corpus.

| Grade | Count | % of total |
| --- | ---: | ---: |
| Consistent — the structural reading lines up with the biography | 0 | 0.0% |
| Inconsistent — the structural reading disagrees with the biography | 0 | 0.0% |
| Null — not enough biographical signal to grade | 0 | 0.0% |
| Ungraded — pending reviewer pass | 51 | 100.0% |
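
The re-tally itself is small. A sketch, assuming the hypothetical `GradeBook` shape above:

```ts
// Illustrative tally; the live scorecard does the equivalent on every build.
function tally(figureIds: string[], grades: GradeBook) {
  const counts = { CONSISTENT: 0, INCONSISTENT: 0, NULL: 0, UNGRADED: 0 };
  for (const id of figureIds) {
    const g = grades[id];
    counts[g ? g.grade : 'UNGRADED'] += 1;  // missing grade counts as ungraded
  }
  const pct = (n: number) => `${((100 * n) / figureIds.length).toFixed(1)}%`;
  return Object.entries(counts).map(([grade, n]) => ({ grade, count: n, share: pct(n) }));
}
```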

Schema version of the prediction snapshot: 1. Phase identifier: 1 (51 charts).

How we will get to 100

Phase 2 grows the corpus to 100 in a quarterly content campaign. The selection criteria are deliberate: figures already covered by the case-study layer are excluded, the new additions sample more non-Anglophone history (the Phase 1 corpus skews modern, English-language, and Western), and the Rodden rating must be AA or A unless the lower rating is explicitly flagged. Each phase ships its own JSON snapshot (`predictions-v2.json`, etc.) so a reviewer can compare scorecards across the engine's lifetime, not just the current one.
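
Since each snapshot is a plain JSON file on a predictable path, cross-phase comparison is a file-level operation. A sketch, reusing the hypothetical `PredictionSnapshot` shape above:

```ts
// Illustrative cross-phase load; path convention follows the per-phase snapshots.
import { readFileSync } from 'node:fs';

function loadSnapshot(phase: number): PredictionSnapshot {
  const path = `docs/validation-results/predictions-v${phase}.json`;
  return JSON.parse(readFileSync(path, 'utf8')) as PredictionSnapshot;
}

// e.g. list the figures that appear in Phase 1 but not in Phase 2
const v1 = loadSnapshot(1);
const v2 = loadSnapshot(2);
const dropped = Object.keys(v1.predictions).filter((id) => !(id in v2.predictions));
```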

What this validation does and does not prove

It tests whether the engine's structural label, applied without knowledge of the figure's biography, produces a description that a knowledgeable third party would call a fair characterisation of the public record. It does not test whether BaZi as a typological framework is correct — that is an empirical claim the framework makes and the engine cannot validate. See the methodology page and `/learn/limits` for the editorial position.

The full layered methodology is at `docs/validation-protocol.md` in the repository (Layer 1: anchor, Layer 2: cross-implementation, Layer 3: attribution audit, Layer 4: this campaign).


Last reviewed: 2026-05-02