Scientific Foundations (Extended, Audit-Ready)

Axiom Cortex™ is a measurement instrument. Its purpose is to estimate latent cognitive properties and decision suitability from interview evidence without importing bias, guesswork, or style penalties. This document makes the math explicit, defines the evaluation protocol, and names the guardrails that prevent drift.


0) Scope & Claim Boundaries

We measure: conceptual fidelity to ideal answers, problem-solving behaviors under changing constraints, and collaboration signals—normalized for language proficiency.
We do not claim: personality inference, clinical diagnosis, or life-outcomes prediction. All effects are bounded to interview contexts and validated against engineering-task outcomes.


1) L2-Aware Mathematical Validation Layer

Goal: measure the signal (reasoning content) independent of delivery noise (accent, L2 grammar, code-switch tokens).

1.1 Proficiency-Normalized Scoring

Let an answer be tokenized into semantic carriers \(S\) and form tokens \(F\). A base communication score \(C\) decomposes as \[ C = \alpha \cdot C_{\text{sem}}(S) + \beta \cdot C_{\text{form}}(F), \qquad \alpha \gg \beta,\; \beta \rightarrow 0 \text{ as L2 uncertainty rises}. \] L2 proficiency is estimated with a calibrated posterior model; \(\beta\) is annealed so that grammar and fluency variance is down-weighted while content penalties for genuine ambiguity are preserved.
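A minimal sketch of the annealing in code. The exponential schedule, the 0.9/0.1 weights, and the decay rate are illustrative assumptions, not the production calibration:

```python
import math

def communication_score(c_sem: float, c_form: float,
                        l2_uncertainty: float,
                        alpha: float = 0.9, beta0: float = 0.1) -> float:
    """Blend semantic and form scores, annealing beta toward 0 as
    estimated L2 uncertainty rises (illustrative exponential schedule)."""
    beta = beta0 * math.exp(-4.0 * l2_uncertainty)
    return alpha * c_sem + beta * c_form

# Same content quality (c_sem) and same form score (c_form): the
# high-uncertainty speaker's form score barely moves the total.
fluent     = communication_score(0.8, 0.4, l2_uncertainty=0.05)
l2_speaker = communication_score(0.8, 0.4, l2_uncertainty=0.95)
```

With high L2 uncertainty the form term contributes almost nothing, so the score is driven by semantic content alone.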

1.2 Cross-Lingual Semantic Fidelity (FSD)

We map answers to multilingual sentence embeddings with class-conditional Gaussians \((\mu_i, \Sigma_i)\). Similarity to an Ideal Answer Blueprint (IAB) uses a Fréchet-style distance: \[ \text{FSD}(1,2) = \lVert\mu_1-\mu_2\rVert_2^2 + \mathrm{Tr}\!\left(\Sigma_1+\Sigma_2-2(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\right) \] Low FSD ⇒ high conceptual closeness even when lexical choices reflect Spanish-influenced English.
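The distance above can be computed directly from fitted moments. This NumPy sketch assumes symmetric PSD covariances and uses an eigendecomposition square root:

```python
import numpy as np

def _sqrtm_psd(a: np.ndarray) -> np.ndarray:
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def fsd(mu1, sigma1, mu2, sigma2) -> float:
    """Frechet-style distance between Gaussians (mu1, Sigma1), (mu2, Sigma2)."""
    s1h = _sqrtm_psd(sigma1)
    cross = _sqrtm_psd(s1h @ sigma2 @ s1h)
    return float(np.sum((np.asarray(mu1) - np.asarray(mu2)) ** 2)
                 + np.trace(sigma1 + sigma2 - 2.0 * cross))
```

Identical distributions give FSD = 0; a pure mean shift contributes exactly its squared Euclidean norm.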

1.3 Optimal Transport with Code-Switch Mask

We compute the 2-Wasserstein alignment between token distributions \(P\) and \(Q\) with a neutral cost for common bilingual markers (e.g., “pues”, “o sea”): \[ W_2^2(P,Q)=\min_{\gamma\in\Pi(P,Q)} \sum_{i,j} c_{ij}\,\gamma_{ij}, \quad c_{ij}=\begin{cases} 0 & \text{if } (i,j)\in \mathcal{M}_{\text{codeswitch}} \\ d(w_i,w_j)^2 & \text{otherwise} \end{cases} \] Sinkhorn regularization ensures stable, fast solutions.
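A minimal Sinkhorn sketch, assuming the code-switch mask has already zeroed the relevant entries of the cost matrix. The regularization strength and iteration count are illustrative:

```python
import numpy as np

def sinkhorn_w2(p, q, cost, reg: float = 0.05, iters: int = 500):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    Returns the (regularized) transport cost and the coupling gamma."""
    K = np.exp(-cost / reg)
    u = np.ones_like(p)
    v = np.ones_like(q)
    for _ in range(iters):
        u = p / (K @ v)
        v = q / (K.T @ u)
    gamma = u[:, None] * K * v[None, :]
    return float(np.sum(gamma * cost)), gamma

# Toy 2-token example where the masked pairing is free (cost 0 on the
# diagonal stands in for already-masked bilingual-marker pairs).
cost = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
p = np.array([0.5, 0.5])
q = np.array([0.5, 0.5])
w2_sq, gamma = sinkhorn_w2(p, q, cost)
```

Because the zero-cost pairings can absorb all the mass, the alignment cost collapses toward zero: delivery-level token substitutions do not register as semantic distance.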

1.4 Differential Item Functioning (DIF)

For each rubric item \(k\), we test measurement invariance across language groups at matched ability \(\theta\).
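One hedged way to operationalize the check: compare group means of an item score within matched-ability bins. This is a simplified gap statistic, not a full IRT-based DIF model; the bin count and simulated data are illustrative:

```python
import numpy as np

def dif_gap(item_scores, theta, group, n_bins: int = 5) -> float:
    """Largest between-group gap in mean item score within
    matched-ability (theta-quantile) bins."""
    edges = np.quantile(theta, np.linspace(0.0, 1.0, n_bins + 1))
    gaps = [0.0]
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (theta >= lo) & (theta <= hi)
        a = item_scores[m & (group == 0)]
        b = item_scores[m & (group == 1)]
        if len(a) and len(b):
            gaps.append(abs(a.mean() - b.mean()))
    return max(gaps)

theta = np.linspace(0.0, 1.0, 200)
group = np.arange(200) % 2
fair   = dif_gap(theta, theta, group)            # scores track ability only
biased = dif_gap(theta + 0.3 * group, theta, group)  # group-dependent offset
```

An item whose score depends only on ability shows a near-zero gap; a group-dependent offset at matched ability is exactly what the invariance test should flag.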

1.5 Calibration & Reliability for the Layer


2) Measurement & Alignment Models

2.1 Non-Parametric Latent Measurement

We avoid rigid score→trait assumptions by using non-parametric latent-trait estimators.

2.2 Network Psychometrics (Skill Graphs)

We learn a Gaussian Graphical Model on skill indicators; edges encode partial correlations (conditional dependencies). Sparse structure via graphical lasso with stability selection yields a skill connectivity map, distinguishing genuine full-stack depth from keyword adjacency.
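As a minimal sketch, partial correlations can be read directly off the precision matrix; the graphical-lasso sparsification and stability selection are omitted here, and the three simulated "skills" are illustrative:

```python
import numpy as np

def partial_correlations(x: np.ndarray) -> np.ndarray:
    """Partial correlation matrix from the precision (inverse covariance)
    matrix of an (n_samples, n_skills) indicator matrix."""
    prec = np.linalg.inv(np.cov(x, rowvar=False))
    d = np.sqrt(np.diag(prec))
    pc = -prec / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc

rng = np.random.default_rng(42)
n = 5000
s1 = rng.standard_normal(n)                 # e.g. a "SQL" indicator
s2 = s1 + 0.5 * rng.standard_normal(n)      # driven by s1
s3 = s2 + 0.5 * rng.standard_normal(n)      # driven by s2, not s1 directly
pc = partial_correlations(np.column_stack([s1, s2, s3]))
```

Even though s1 and s3 are marginally correlated (keyword adjacency), their partial correlation conditioned on s2 is near zero, while the direct s1–s2 edge survives. That is the distinction the skill graph encodes.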

2.3 Active Interviewing via Information Gain

For adaptive sessions, the next question \(q^*\) maximizes entropy reduction over traits \(T\): \[ q^* = \arg\max_{q \in \mathcal{Q}} \left[ H(T) - \mathbb{E}_{a\sim p(a\mid q)}\,H(T\mid q,a) \right] \] This yields short, high-information interviews that preserve candidate experience.
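For a discrete trait and answer space, the expected information gain of a single question can be sketched as follows; the two-trait, two-answer setup is illustrative:

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    """Shannon entropy (nats) of a discrete distribution."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def expected_info_gain(prior: np.ndarray, likelihoods: np.ndarray) -> float:
    """Expected entropy reduction over traits T from asking one question.

    likelihoods[a, t] = P(answer a | trait t)."""
    h_prior = prior_entropy = entropy(prior)
    p_answer = likelihoods @ prior              # marginal P(a)
    eig = h_prior
    for a, pa in enumerate(p_answer):
        if pa > 0:
            posterior = likelihoods[a] * prior / pa   # Bayes: P(t | a)
            eig -= pa * entropy(posterior)
    return eig

prior = np.array([0.5, 0.5])
q_good = np.array([[0.9, 0.1],
                   [0.1, 0.9]])   # answers track the latent trait
q_flat = np.array([[0.5, 0.5],
                   [0.5, 0.5]])   # answers carry no trait information
```

Ranking candidate questions by this quantity and asking the argmax is the whole selection rule: the discriminative question scores well above zero, the uninformative one scores zero.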


3) Reliability, Monitoring, and Decision Theory

3.1 Generalizability Theory

We estimate variance components across facets (rater, question, time): \[ G = \frac{\sigma^2_{\text{universe}}}{\sigma^2_{\text{universe}} + \sigma^2_{\text{error}}} \] Acceptance gates: a minimum \(G\) per trait and per rubric dimension; alerts on drops.
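Given estimated variance components, the coefficient itself is a one-liner; the facet names and numbers below are illustrative placeholders, not calibrated estimates:

```python
def g_coefficient(var_universe: float, var_error_facets: dict) -> float:
    """Generalizability coefficient: universe-score variance over
    universe-score variance plus summed error-facet variance."""
    return var_universe / (var_universe + sum(var_error_facets.values()))

# Illustrative components from a rater x question x occasion design:
g = g_coefficient(0.60, {"rater": 0.15, "question": 0.10, "occasion": 0.05})
```

The per-facet breakdown is what makes the gate actionable: a drop traced to the rater facet calls for calibration sessions, while a drop in the question facet points at the rubric.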

3.2 Random Matrix Theory (Spurious Factor Guard)

We compare the empirical eigen-spectrum of embedding features to the Marchenko–Pastur support \([\lambda_-, \lambda_+]\). Out-of-support spikes that do not replicate under bootstrap are flagged as noise and removed.
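A minimal sketch of the spike test against the upper Marchenko–Pastur edge, with one planted signal direction; the bootstrap replication step is omitted and the dimensions are illustrative:

```python
import numpy as np

def mp_upper_edge(n_samples: int, n_features: int, sigma2: float = 1.0) -> float:
    """Upper edge lambda_+ of the Marchenko-Pastur support for unit-variance
    noise at aspect ratio q = p / n."""
    q = n_features / n_samples
    return sigma2 * (1.0 + np.sqrt(q)) ** 2

rng = np.random.default_rng(0)
n, p = 2000, 200
x = rng.standard_normal((n, p))   # pure-noise features
x[:, 0] *= 3.0                    # plant one genuine signal direction (var ~9)
evals = np.linalg.eigvalsh(np.cov(x, rowvar=False))
spikes = evals[evals > mp_upper_edge(n, p)]   # out-of-support candidates
```

The planted direction lands far above the edge while the noise bulk stays inside the support, which is exactly the separation the guard relies on before the bootstrap check.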

3.3 Constrained Bayesian Decision Theory

The final recommendation \(\mathcal{R}\) maximizes expected utility under fairness and competency constraints: \[ \max_{\mathcal{R}} \; \mathbb{E}[U(\mathcal{R}\mid \mathbf{e})] \quad \text{s.t.} \quad \Pr[\text{Collab}<\tau_c]\le\epsilon_c,\; \text{DIF}_k \le \delta,\; \text{Reliability } G \ge G_{\min} \] Solved via Lagrangian relaxation; outputs include confidence intervals and gate justifications.
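As a simplified stand-in for the Lagrangian-relaxed program, the constraints can be illustrated as hard feasibility gates applied before utility maximization; all field names and threshold values here are hypothetical:

```python
def recommend(candidates, eps_c=0.05, delta=0.10, g_min=0.80):
    """Pick the maximum-expected-utility option among those passing all
    hard gates. (The production solver uses Lagrangian relaxation; this
    hard-gate filter is a simplified illustration.)"""
    feasible = [c for c in candidates
                if c["p_collab_below_tau"] <= eps_c
                and c["dif_max"] <= delta
                and c["g"] >= g_min]
    return max(feasible, key=lambda c: c["eu"], default=None)

cands = [
    {"name": "hire-strong", "eu": 0.90,
     "p_collab_below_tau": 0.02, "dif_max": 0.05, "g": 0.85},
    {"name": "hire-risky", "eu": 0.95,
     "p_collab_below_tau": 0.20, "dif_max": 0.05, "g": 0.85},
]
best = recommend(cands)
```

Note that the higher-utility option loses: it violates the collaboration-risk gate, so the constrained optimum is the feasible runner-up. That is the behavior the constraint set is designed to force.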


4) Evaluation Protocol (What We Actually Test)

4.1 Data & Splits

4.2 Metrics

4.3 Ablations

4.4 Red-Team & Failure Modes


5) Reproducibility & Auditability


6) Interpreting Scores (Human-Centered)


7) Glossary (Quick)

