29  Lab 13 — Diagnostics Dashboard

Anchor chapter: Chapter 13 — Spectral and Per-Edge Discord Diagnostics.

Goal. Compute per-edge discord, assemble \(L_{\text{free}}\), plot its spectrum, and generate a layer-vs-time discord heatmap for a trained network.

Build the per-edge discord dashboard of Def. 13.10 for a sheaf-trained [2, 4, 1] network on the paraboloid task. Log per-edge, per-example discords at every epoch; render the space–time heatmap with colour-coded edge types; and compute the running spectrum of \(L_{\text{free}}(\sigma_t)\) to track \(\lambda_{\min}\) over training. Then inject three controlled pathologies, namely a dead ReLU (hardcoded negative bias), a vanishing gradient (too-small \(W^{(1)}\)), and output saturation (sigmoid head with a large pre-output), and verify that each produces the signature predicted by Props. 13.7–13.9.
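The three pathological initialisations can be sketched as follows. This is a minimal sketch, not the lab's own code: `make_init` is a hypothetical helper, inputs are assumed to lie in \([-1, 1]^2\), and the concrete values are illustrative assumptions. The dead-ReLU bias sits below \(-\|W^{(1)}_j\|_1\), so every hidden unit is off on the whole box. \(W^{(1)}\) is rescaled by \(10^{-3}\) for the vanishing case. The saturated case uses an output bias that pushes the pre-output above 8, where the sigmoid derivative is below \(10^{-3}\).

```python
import numpy as np


def make_init(kind, rng, scale=0.5):
    """Hypothetical initialisations of a [2, 4, 1] net for inputs in [-1, 1]^2.

    Returns (W1, b1, W2, b2); scales and offsets are illustrative choices.
    """
    W1 = scale * rng.normal(size=(4, 2))
    b1 = scale * rng.normal(size=4)
    W2 = scale * rng.normal(size=(1, 4))
    b2 = np.zeros(1)
    if kind == "dead_relu":
        # W1_j x + b1_j <= ||W1_j||_1 + b1_j = -1 on the box: every unit is off
        b1 = -(np.abs(W1).sum(axis=1) + 1.0)
    elif kind == "vanishing":
        W1 = 1e-3 * W1                      # too-small first-layer weights
    elif kind == "saturated":
        # Upper bound on |W2 ReLU(W1 x + b1)| over the box, then offset past it
        h_max = np.abs(W1).sum(axis=1) + np.abs(b1)
        b2 = np.array([8.0 + np.abs(W2[0]) @ h_max])   # pre-output >= 8
    elif kind != "healthy":
        raise ValueError(f"unknown init kind: {kind}")
    return W1, b1, W2, b2


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


X = np.random.default_rng(0).uniform(-1.0, 1.0, size=(256, 2))
for kind in ["healthy", "dead_relu", "vanishing", "saturated"]:
    W1, b1, W2, b2 = make_init(kind, np.random.default_rng(1))
    A = X @ W1.T + b1                        # hidden pre-activations
    pre_out = np.maximum(A, 0.0) @ W2[0] + b2[0]
    s = sigmoid(pre_out)                     # sigmoid head, as in the saturated case
    print(f"{kind:10s} active={np.mean(A > 0):.2f} "
          f"max|W1|={np.abs(W1).max():.1e} max sigmoid'={np.max(s * (1 - s)):.1e}")
```

Deriving the offsets from norm bounds (rather than fixing them by hand) keeps each pathology guaranteed for every seed.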

Tip: Runs in your browser

This lab requires only NumPy, Matplotlib, and SciPy, loaded automatically via Pyodide. Code cells run directly in the page via WebAssembly — no local Python installation needed.

Prefer a local Jupyter environment? Download lab-13-diagnostics-dashboard.ipynb

29.1 Setup

29.2 1. Build the object

For each training sample \((x_i, y_i)\) and current weights \(\theta\), the per-edge discord \(d_e(x_i) = \|(\delta_\sigma c)_e\|^2\) measures how far the pre-activation cochain at \(x_i\) deviates from a global section on edge \(e\). We log this for every edge and every training example at each epoch, producing a 3D tensor (edges \(\times\) samples \(\times\) epochs). The free Laplacian \(L_{\text{free}}(\sigma) = \delta_\Omega(\sigma)^\top \delta_\Omega(\sigma)\) governs the fast-phase convergence rate; its smallest eigenvalue \(\lambda_{\min}\) is the diagnostic for convergence speed.
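The per-edge discord and \(L_{\text{free}}\) for a single sample can be sketched as below. This is a minimal sketch that assumes a specific toy sheaf, which is not necessarily the one in Def. 13.10. The input stalk holds \(x\) (clamped), the hidden stalk holds the pre-activation \(z_1 \in \mathbb{R}^4\) (free), and the output stalk is clamped to the target \(y\). The edge \(e_1\) compares \(z_1\) with \(W^{(1)}x + b^{(1)}\), and \(e_2\) compares \(y\) with \(W^{(2)}D_\sigma z_1 + b^{(2)}\), where \(D_\sigma\) is the ReLU pattern. The discord is evaluated at the least-discord (harmonic) choice of \(z_1\). The function names are hypothetical. The e1 restriction is the identity in this toy, so \(L_{\text{free}} = I + (W^{(2)}D_\sigma)^\top(W^{(2)}D_\sigma)\) and \(\lambda_{\min} = 1\) whenever the hidden layer is wider than the output. Treat the spectrum as illustrative until the book's restriction maps are substituted.

```python
import numpy as np

# ASSUMED toy sheaf for a [2, 4, 1] ReLU net (not necessarily Def. 13.10):
#   e1: (delta c)_e1 = z1 - (W1 x + b1)
#   e2: (delta c)_e2 = y - (W2 D_sigma z1 + b2),  D_sigma = diag(ReLU pattern)
# x and y are clamped; only the hidden pre-activation z1 is free.


def free_coboundary(W2, mask):
    """delta_Omega(sigma): coboundary columns acting on the free z1, shape (5, 4)."""
    w = W2[0] * mask                               # row of W2 @ D_sigma
    return np.vstack([np.eye(mask.size), -w[None, :]])


def free_laplacian(W2, mask):
    """L_free(sigma) = delta_Omega(sigma)^T delta_Omega(sigma)."""
    d = free_coboundary(W2, mask)
    return d.T @ d


def per_edge_discord(W1, b1, W2, b2, x, y):
    """(d_e1, d_e2) at the least-discord choice of z1, for one sample."""
    a = W1 @ x + b1                                # forward hidden pre-activation
    mask = (a > 0).astype(float)                   # activation pattern sigma at x
    d = free_coboundary(W2, mask)
    g = np.concatenate([a, [b2[0] - y]])           # contributions of clamped stalks
    z1, *_ = np.linalg.lstsq(d, g, rcond=None)     # minimise ||delta c||^2 over z1
    r = d @ z1 - g                                 # delta c, stacked as [e1 | e2]
    return float(r[:-1] @ r[:-1]), float(r[-1] ** 2)


rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)
x = np.array([0.3, -0.5])
y = float(x @ x)                                   # assumed paraboloid target
d_e1, d_e2 = per_edge_discord(W1, b1, W2, b2, x, y)
mask = (W1 @ x + b1 > 0).astype(float)
spectrum = np.linalg.eigvalsh(free_laplacian(W2, mask))   # ascending order
print(d_e1, d_e2, spectrum[0])
```

Logging the 3D tensor then amounts to calling `per_edge_discord` for every sample at every epoch and storing the pair in an array of shape (edges, samples, epochs).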

29.3 2. Verify a theorem / run an experiment

The discord heatmap (edges × epochs, averaged over samples) reveals which edges carry the most tension at each stage of training, and the \(\lambda_{\min}\) trajectory shows how the fast-phase convergence rate evolves. We then inject three controlled pathologies and compare their discord signatures. A dead ReLU (all hidden units inactive) produces zero discord on \(e_1\) and maximal discord on \(e_2\). A vanishing gradient (tiny \(W^{(1)}\)) produces near-zero discord everywhere but slow convergence. Output saturation (large pre-output magnitude) produces large \(e_2\) discord that barely decreases under gradient steps.
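The logging loop and the sample-averaged heatmap, with the dead-ReLU case alongside a healthy run, can be sketched as follows. This is a minimal sketch under several assumptions.

- It uses a hypothetical toy sheaf: the input and target are clamped and the hidden pre-activation is free. For that toy, the least-discord hidden cochain has a closed form. With \(u = y - f(x)\) the forward residual and \(s = \|W^{(2)}D_\sigma\|^2\), the \(e_2\) discord is \(u^2/(1+s)^2\) and the \(e_1\) discord is \(s\,u^2/(1+s)^2\).
- Training is plain full-batch gradient descent on mean squared error. This is a stand-in for the book's sheaf training, not that method.
- The data (64 points uniform on \([-1,1]^2\) with \(y = x_1^2 + x_2^2\)), the initial scales, the learning rate and the epoch count are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(64, 2))
Y = np.sum(X ** 2, axis=1)                       # assumed paraboloid target


def discord_batch(W1, b1, W2, b2, X, Y):
    """Per-edge discord, shape (2, N), under the assumed toy sheaf."""
    A = X @ W1.T + b1                            # forward hidden pre-activations
    Wd = W2[0] * (A > 0)                         # row of W2 D_sigma per sample
    s = np.sum(Wd ** 2, axis=1)                  # ||W2 D_sigma||^2
    u = Y - (np.sum(Wd * A, axis=1) + b2[0])     # forward residual y - f(x)
    return np.stack([s * u ** 2 / (1 + s) ** 2, u ** 2 / (1 + s) ** 2])


def gd_step(W1, b1, W2, b2, X, Y, lr):
    """One full-batch gradient step on MSE (a STAND-IN for sheaf training)."""
    A = X @ W1.T + b1
    H = np.maximum(A, 0.0)
    err = H @ W2[0] + b2[0] - Y
    n = len(Y)
    dA = (2.0 / n) * err[:, None] * W2[0] * (A > 0)   # dLoss/dA through the ReLU
    return (W1 - lr * dA.T @ X, b1 - lr * dA.sum(axis=0),
            W2 - lr * (2.0 / n) * (err @ H)[None, :], b2 - lr * 2.0 * err.mean())


def run(params, X, Y, epochs=80, lr=0.05):
    """Log discord as an (edges, samples, epochs) tensor while training."""
    D = np.empty((2, len(Y), epochs))
    for t in range(epochs):
        D[:, :, t] = discord_batch(*params, X, Y)
        params = gd_step(*params, X, Y, lr)
    return D


g = np.random.default_rng(1)
healthy = (0.5 * g.normal(size=(4, 2)), 0.5 * g.normal(size=4),
           0.5 * g.normal(size=(1, 4)), np.zeros(1))
dead_b1 = -(np.abs(healthy[0]).sum(axis=1) + 1.0)   # every unit off on the box
dead = (healthy[0], dead_b1, healthy[2], healthy[3])

D_healthy = run(healthy, X, Y)
D_dead = run(dead, X, Y)

fig, axes = plt.subplots(1, 2, figsize=(10, 2.5))
for ax, D, title in zip(axes, [D_healthy, D_dead], ["healthy", "dead ReLU"]):
    im = ax.imshow(D.mean(axis=1), aspect="auto", cmap="magma")  # edges x epochs
    ax.set_title(title)
    ax.set_xlabel("epoch")
    ax.set_yticks([0, 1])
    ax.set_yticklabels(["e1", "e2"])
    fig.colorbar(im, ax=ax)
plt.show()
```

In the dead-ReLU run the hidden layer receives no gradient, so \(W^{(1)}\) and \(b^{(1)}\) never move. The \(e_1\) row stays identically zero, while the \(e_2\) row tracks \((y - b^{(2)})^2\) as only the output bias adapts.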

29.4 Exercises

  1. Per-sample heatmap. Instead of averaging over samples, plot the full (sample × epoch) discord heatmap for edge \(e_2\). Identify which training points have persistently high discord and check whether they lie near the boundaries of the activation regions.

  2. λ_min and convergence. According to the theory, the fast-phase convergence rate within each activation region is \(\lambda_{\min}(L_{\text{free}})\). Verify this empirically by running a short fast-phase integration from a random cochain and measuring the empirical decay rate. Compare to \(\lambda_{\min}\) computed from \(L_{\text{free}}\).

  3. Dead-ReLU recovery. Starting from the dead-ReLU initialisation, add a small gradient step on \(b_1\) to “unlock” some neurons (increase their bias slightly). Plot how the discord signature evolves as neurons become active.

  4. Spectral flow. Plot the full spectrum of \(L_{\text{free}}(\sigma)\) (all eigenvalues, not just \(\lambda_{\min}\)) at epochs 0, 20, 40, 60, 80. Use a violin plot or ridge plot. Does the spectral gap \(\lambda_2 - \lambda_1\) grow or shrink during training?
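One possible starting point for Exercise 2 follows. It is a sketch under an assumption: the fast phase is taken to be the gradient flow \(\dot z = -\delta_\Omega^\top(\delta_\Omega z - g)\) of the discord energy over the free coordinates, with \(\sigma\) and the weights frozen. The coboundary reuses the hypothetical toy shape (identity block for \(e_1\), a \(-W^{(2)}D_\sigma\) row for \(e_2\)) with arbitrary values for \(W^{(2)}D_\sigma\) and \(g\). Under that assumption, the error \(z(t) - z^\star\) obeys \(\dot e = -L_{\text{free}}\,e\). The late-time slope of \(\log\|z(t) - z^\star\|\) should therefore approach \(-\lambda_{\min}\), provided the random start has a component in the \(\lambda_{\min}\) eigenspace.

```python
import numpy as np
from scipy.integrate import solve_ivp


def empirical_decay_rate(delta, g, z0, T=10.0, n_eval=201):
    """Integrate dz/dt = -delta^T (delta z - g) from z0 and fit the late-time
    decay rate of ||z(t) - z*||; also return lambda_min(delta^T delta)."""
    L = delta.T @ delta                           # L_free for this coboundary
    rhs = delta.T @ g
    z_star = np.linalg.solve(L, rhs)              # least-discord fixed point
    t = np.linspace(0.0, T, n_eval)
    sol = solve_ivp(lambda _t, z: rhs - L @ z, (0.0, T), z0,
                    t_eval=t, rtol=1e-10, atol=1e-12)
    err = np.linalg.norm(sol.y - z_star[:, None], axis=0)
    late = t >= T / 2                             # fit only the asymptotic regime
    slope, _ = np.polyfit(t[late], np.log(err[late]), 1)
    return -slope, np.linalg.eigvalsh(L)[0]


# Hypothetical toy coboundary: identity for e1, -W2 D_sigma row for e2
w = np.array([1.0, 0.5, 0.0, 1.0])                # assumed W2 D_sigma
delta = np.vstack([np.eye(4), -w[None, :]])
g = np.array([0.2, -0.1, 0.4, 0.3, 0.5])          # assumed clamped-stalk terms
z0 = np.random.default_rng(0).normal(size=4)      # random initial cochain
rate, lam_min = empirical_decay_rate(delta, g, z0)
print(f"empirical rate {rate:.4f} vs lambda_min {lam_min:.4f}")
```

Fitting only the second half of the trajectory discards the faster modes (here \(1 + \|w\|^2 = 3.25\)), which otherwise bias the slope upward at early times.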