15 Spectral and Per-Edge Discord Diagnostics
Purpose. Develops the diagnostic toolkit: spectral signatures of the restricted Laplacian and per-edge discord measures localized to layer/operation type.
15.1 Key concepts & results
- Per-edge discord d_e(x) = ‖F_{u⊴e} x_u − F_{v⊴e} x_v‖²; sums to twice the Dirichlet energy.
- Spectrum of L_free: eigenvalue clusters tied to layer types (affine vs ReLU vs output).
- Diagnosing bottlenecks (slow-converging modes localized to specific edges).
- Use during training: monitor per-layer / per-operation discord to detect dead neurons, vanishing gradients, etc.
Prerequisites: Ch. 9, Ch. 12
15.2 Motivating example
Train the [2, 30, 1] paraboloid network of Ch. 12 for a few hundred epochs, but this time record the per-edge discord \(d_e(x) = \|\mathcal{F}_{u \trianglelefteq e} x_u - \mathcal{F}_{v \trianglelefteq e} x_v\|^2\) at every step, organized as a layered heatmap with one row per edge (grouped by operation type: affine vs ReLU vs output) and one column per epoch. Several common pathologies light up immediately in this picture.
Dead ReLU neurons appear as flat zero rows on specific ReLU edges — a neuron whose pre-activation is always negative contributes no discord, because its restriction map \(R_{z^{(\ell)}}\) zeros it out on the post-activation side. Vanishing gradients show up as persistent near-zero discord on affine edges near the input, paired with large discord deeper in the network — the “signal” has nowhere to propagate. Output saturation (sigmoid stuck at \(0\) or \(1\)) leaves a persistent hot stripe on the output edge. One figure, three diagnostics, each automatically labeled by layer and operation type. Compare this to the standard practice of probing gradient norms and activation statistics from a pile of opaque forward/backward buffers.
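The logging loop behind such a heatmap can be sketched in a few lines. This is a schematic, not the Ch. 12 construction: the restriction-map conventions (affine edge compares \(W x + b\) against the next stalk, ReLU edge compares the masked pre-activation against the post-activation) follow the informal description above, the network is shrunk from width 30's training loop to a perturbed consistent state, and all function and variable names (`discords`, `heatmap`, etc.) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def discords(x_in, W1, b1, W2, b2, state):
    """d_e(x) for the three edge types at a given state.

    `state` assigns a vector to each internal vertex (pre1, post1, out);
    a perfectly consistent forward pass gives all-zero discords.
    """
    pre1, post1, out = state
    mask = (pre1 > 0).astype(float)  # ReLU activation pattern
    return {
        "affine-1": np.sum((W1 @ x_in + b1 - pre1) ** 2),
        "relu-1":   np.sum((mask * pre1 - post1) ** 2),
        "output":   np.sum((W2 @ post1 + b2 - out) ** 2),
    }

# Stand-in for training: perturb a consistent state over "epochs" and log
# a (num_edges x num_epochs) array, rows grouped by operation type.
W1, b1 = rng.normal(size=(30, 2)), np.zeros(30)
W2, b2 = rng.normal(size=(1, 30)), np.zeros(1)
x_in = rng.normal(size=2)

heatmap = []
for epoch in range(100):
    noise = 1.0 / (1 + epoch)  # discord shrinking over time
    pre1 = W1 @ x_in + b1 + noise * rng.normal(size=30)
    post1 = np.maximum(pre1, 0) + noise * rng.normal(size=30)
    out = W2 @ post1 + b2 + noise * rng.normal(size=1)
    d = discords(x_in, W1, b1, W2, b2, (pre1, post1, out))
    heatmap.append([d["affine-1"], d["relu-1"], d["output"]])
heatmap = np.array(heatmap).T  # rows = edges, cols = epochs
```

A dead ReLU row would simply stay at zero in `heatmap[1]`; plotting the array (e.g. as an image with one row per edge) gives the layered heatmap described above.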
15.3 Intuition
Dirichlet energy \(E(x) = \tfrac{1}{2} \|\delta x\|^2 = \tfrac{1}{2} \sum_e d_e(x)\) is a single scalar that tells you how far the state is from harmonic — how badly the network’s internal representations disagree across the sheaf. Factor this sum by edge, and you get a “where” instead of a “how much”: which edges carry the discord, and how that attribution redistributes over training. Because each edge in the neural sheaf corresponds to a specific operation (affine, ReLU, or output) at a specific layer, the per-edge discord \(d_e\) is automatically labeled — “layer 3 ReLU is contributing 40% of the total residual energy” — making it the sheaf-side analogue of saliency for training dynamics.
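The identity \(E(x) = \tfrac{1}{2}\|\delta x\|^2 = \tfrac{1}{2}\sum_e d_e(x)\) is purely structural, so it can be checked numerically with arbitrary restriction maps. The sketch below uses random matrices as stand-ins for the Ch. 12 maps; the block layout of the coboundary \(\delta\) (one row block per edge, \((\delta x)_e = \mathcal{F}_{u \trianglelefteq e} x_u - \mathcal{F}_{v \trianglelefteq e} x_v\)) is the only assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

dims = [2, 3, 1]        # vertex stalk dimensions along a path
edge_dims = [3, 1]      # edge stalk dimensions (one per edge)
offsets = np.cumsum([0] + dims)
x = rng.normal(size=sum(dims))  # a state, all vertex vectors stacked

# Assemble the coboundary: row block per edge, (delta x)_e = F_u x_u - F_v x_v.
delta = np.zeros((sum(edge_dims), sum(dims)))
edge_maps, row = [], 0
for e, (u, v) in enumerate([(0, 1), (1, 2)]):
    F_u = rng.normal(size=(edge_dims[e], dims[u]))  # random stand-ins
    F_v = rng.normal(size=(edge_dims[e], dims[v]))
    delta[row:row + edge_dims[e], offsets[u]:offsets[u + 1]] = F_u
    delta[row:row + edge_dims[e], offsets[v]:offsets[v + 1]] = -F_v
    edge_maps.append((F_u, F_v, u, v))
    row += edge_dims[e]

# Per-edge discords and the global energy agree: E = (1/2) sum_e d_e.
d_e = [np.sum((F_u @ x[offsets[u]:offsets[u + 1]]
               - F_v @ x[offsets[v]:offsets[v + 1]]) ** 2)
       for F_u, F_v, u, v in edge_maps]
E = 0.5 * np.sum((delta @ x) ** 2)
assert np.isclose(E, 0.5 * sum(d_e))
```

Because the decomposition holds for any maps and any state, the per-edge diagnostic costs nothing beyond what computing \(E(x)\) already requires.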
The spectrum of the free Laplacian \(L_{\text{free}}\) tells a complementary, pre-training story. Its eigenvalues cluster by edge type (a cluster near \(\|W^{(\ell)}\|^2\) from affine edges, a cluster near \(1\) from ReLU edges, a cluster determined by the output operation), and its slowest-decaying eigenvector localizes on whichever layer has the weakest coupling. Running the heat equation once and watching which modes take longest to die out gives a zero-cost prediction of which layer will bottleneck training. Per-edge discord during training then tells you whether that bottleneck actually materializes or whether it dissolves as \(\theta\) moves.
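The zero-cost prediction amounts to one eigendecomposition and one application of the heat semigroup. A minimal sketch, assuming only that the Laplacian has the form \(L = \delta^\top \delta\): a random coboundary stands in for the Ch. 12 \(L_{\text{free}}\), so the edge-type eigenvalue clusters are not visible here, but the mechanism — the slowest-decaying eigenvector dominates the late-time heat state, and its support marks the bottleneck — is the same.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 6
delta = rng.normal(size=(n, n))  # stand-in coboundary
L = delta.T @ delta              # positive semidefinite by construction

evals, evecs = np.linalg.eigh(L)  # eigenvalues in ascending order
slow_mode = evecs[:, 0]           # mode decaying like exp(-evals[0] * t)

def heat_state(x0, t):
    """x(t) = exp(-t L) x0, computed in the eigenbasis of L."""
    return evecs @ (np.exp(-t * evals) * (evecs.T @ x0))

def alignment(v, w):
    """|cos angle| between two vectors."""
    return abs(v @ w) / (np.linalg.norm(v) * np.linalg.norm(w))

# Run the heat equation once: the late-time state concentrates on the
# slowest mode, so its large entries flag where convergence will stall.
x0 = rng.normal(size=n)
xt = heat_state(x0, t=50.0)
```

The alignment of \(x(t)\) with the slowest mode is monotone nondecreasing in \(t\), which is why a single long heat run suffices as a prediction.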
The whole point is that these diagnostics are not bolted on; they are built into the object. In the feedforward view you have to decide what to probe (gradient norm, activation histogram, weight update magnitude) and then ask what it means. In the sheaf view the object you are already computing with — Dirichlet energy and its edgewise decomposition — is the diagnostic, labeled by operation type for free.
Intuition device (planned): Layered heatmap (layers × time) of per-edge discord, with color-coded operation types.
15.4 Formal development
[TO FILL: formal development — definitions, statements, careful notation]
15.5 Theorem demonstrations
[TO FILL: proofs / proof sketches of the key results named above. Proofs should come *after* the intuition section, as agreed.]
15.6 Worked examples
[TO FILL: worked example(s) carried out by hand]
15.7 Coding lab
lab-13-diagnostics-dashboard —
[TO FILL: one-paragraph description of the lab's goal]
15.8 Exercises
[TO FILL: 3–6 exercises, graded from warm-up to project-level]
15.9 Further reading
[TO FILL: annotated paragraph of 3–6 references]
15.10 FAQ / common misconceptions
[TO FILL: short Q&A for things readers frequently get wrong]