9 Building a Sheaf from a Feedforward ReLU Network
Purpose. Constructs the central object of the paper: the cellular sheaf on the path graph whose vertices are intermediate quantities and whose edges encode affine, ReLU, and output operations.
9.1 Key concepts & results
- Path-graph base: one vertex per intermediate quantity.
- Three edge types: affine edges (weights \(W^{(\ell)}\) with bias \(b^{(\ell)}\) folded in), ReLU edges (activation-pattern-indexed diagonal), output edges (identity or nonlinear).
- Stalk dimensions, restriction maps, the state-dependent sheaf view.
- Pinning the input vertex stalk = setting boundary data.
Prerequisites: Ch 1, Ch 4, Ch 5
9.2 Motivating example
Take the [2, 4, 1] running network from Ch. 1. Its computation graph is a short pipeline of intermediate quantities: input \(x \in \mathbb{R}^2\), pre-activation \(z^{(1)} \in \mathbb{R}^4\), post-activation \(a^{(1)} \in \mathbb{R}^4\), and output \(z^{(2)} \in \mathbb{R}\). Each arrow is a concrete operation — affine (\(x \mapsto W^{(1)} x + b^{(1)}\)), elementwise ReLU, final affine. Lay this pipeline flat as an undirected path graph, attach a vector space at each vertex sized to the corresponding quantity, and encode each operation as a pair of restriction maps from its edge stalk into its two endpoint stalks. You now have a cellular sheaf \(\mathcal{F}\) on the path graph. The striking payoff, which Ch. 8 will prove, is that the forward pass of the network is the unique harmonic extension of the input data on this sheaf — the same Dirichlet problem from Ch. 2, but with \(\mathcal{F}\) in place of the constant sheaf \(\underline{\mathbb{R}}\).
In other words, the network’s sequential “run the layers in order” semantics is recovered as one particular way of solving a single, globally defined sheaf-theoretic equation. This chapter is about writing down that sheaf carefully; subsequent chapters exploit it.
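The pipeline above can be sketched in a few lines of NumPy. This is an illustrative mock-up, not the lab's implementation: the weights, the vertex names (`"x"`, `"z1"`, `"a1"`, `"z2"`), and the edge-type tags are placeholder choices for the [2, 4, 1] running network.

```python
import numpy as np

# Illustrative weights for the [2, 4, 1] running network (random placeholders).
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 2)), rng.standard_normal(4)   # layer-1 affine
W2, b2 = rng.standard_normal((1, 4)), rng.standard_normal(1)   # output affine

# Path-graph data: one vertex per intermediate quantity, one edge per operation.
stalk_dim = {"x": 2, "z1": 4, "a1": 4, "z2": 1}                # vertex stalk sizes
edges = [("x", "z1", "affine"), ("z1", "a1", "relu"), ("a1", "z2", "affine")]

# The ordinary forward pass along the pipeline -- the map the sheaf view
# will re-express as a harmonic extension of the pinned input.
x = np.array([1.0, -0.5])
z1 = W1 @ x + b1
a1 = np.maximum(z1, 0.0)
z2 = W2 @ a1 + b2
```

Nothing sheaf-theoretic has happened yet; the point is only to fix the vertices, stalk dimensions, and edge types that the restriction maps will be attached to.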
9.3 Intuition
A sheaf wraps a directed computation graph inside a symmetric, undirected object. Instead of “input flows into output,” the sheaf view says: “neighboring vertices have to agree across each edge; pick the cochain that makes all those local agreements hold simultaneously.” Each edge \(e = (u, v)\) carries its own vector space \(\mathcal{F}(e)\) — the edge stalk, interpreted as a shared frame in which the two endpoints report their state. Restriction maps \(\mathcal{F}_{u \trianglelefteq e}\) and \(\mathcal{F}_{v \trianglelefteq e}\) push each endpoint’s cochain into that shared frame, and the edge discrepancy \(\mathcal{F}_{u \trianglelefteq e} x_u - \mathcal{F}_{v \trianglelefteq e} x_v\) measures how badly they disagree.
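The edge discrepancy is a one-line computation once the restriction maps are given as matrices. A minimal sketch on a single edge, with made-up restriction maps and cochain values chosen so the two endpoints happen to agree:

```python
import numpy as np

# One edge e = (u, v) with a 2-dimensional edge stalk F(e).
F_u_e = np.eye(2)                           # u-side restriction (identity here)
F_v_e = np.array([[2.0, 0.0],
                  [0.0, 3.0]])              # v-side restriction (example values)

x_u = np.array([2.0, 3.0])                  # u's component of the 0-cochain
x_v = np.array([1.0, 1.0])                  # v's component of the 0-cochain

# Both endpoints report their state in the shared frame F(e);
# the discrepancy is zero exactly when they agree over this edge.
discrepancy = F_u_e @ x_u - F_v_e @ x_v     # -> [0., 0.] for these values
```

A cochain that zeroes the discrepancy on every edge simultaneously is exactly a global section; the forward pass will turn out to be one.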
Three edge types encode the three kinds of operation in an MLP. Affine edges wire the pre-activation \(z^{(\ell)}\) to the previous post-activation \(a^{(\ell-1)}\): the restriction on the \(z\)-side is the identity, and the restriction on the \(a\)-side is the weight matrix \(W^{(\ell)}\) (with the bias folded in as an edge offset). ReLU edges wire \(a^{(\ell)}\) to \(z^{(\ell)}\): the restriction on the \(a\)-side is the identity, and the restriction on the \(z\)-side is the diagonal projection \(R_{z^{(\ell)}}\) from Ch. 1 — a \(0/1\) matrix picking out the active neurons. Output edges are either identity (regression) or nonlinear (sigmoid, softmax — Ch. 10). Pinning the input vertex stalk \(\mathcal{F}(v_x)\) to a concrete input vector is exactly the Dirichlet boundary data of Ch. 2.
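The affine and ReLU edge types can be checked concretely. The sketch below uses made-up layer-1 weights and a made-up input for the [2, 4, 1] network, and verifies that both edge discrepancies vanish on the forward pass (with the bias treated as an edge offset, as described above):

```python
import numpy as np

# Illustrative layer-1 weights and input (not values fixed by the text).
W1 = np.array([[ 1.0, -1.0],
               [ 0.5,  2.0],
               [-2.0,  1.0],
               [ 0.0,  1.0]])
b1 = np.array([0.5, -1.0, 0.0, 2.0])

x  = np.array([1.0, 2.0])
z1 = W1 @ x + b1                        # pre-activation at vertex z^{(1)}

# Affine edge: z-side restriction = identity, a-side restriction = W1,
# bias folded in as an edge offset. Discrepancy vanishes on the forward pass.
affine_gap = z1 - (W1 @ x + b1)

# ReLU edge: a-side restriction = identity, z-side restriction = the
# activation-pattern diagonal R_{z^{(1)}} picking out the active neurons.
pattern = (z1 > 0).astype(float)        # sigma in {0,1}^4 for this z1
R = np.diag(pattern)                    # 0/1 diagonal projection
a1 = np.maximum(z1, 0.0)
relu_gap = a1 - R @ z1                  # also vanishes on the forward pass
```

The identity \(\max(z, 0) = R_z\, z\) is what makes the nonlinear ReLU expressible as a (pattern-indexed) linear restriction map.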
The one subtlety is that the ReLU edge’s restriction map \(R_{z^{(\ell)}}\) depends on the sign of \(z^{(\ell)}\), which depends on the current cochain. So \(\mathcal{F}\) is a state-dependent sheaf: its restriction maps change as the cochain moves. For a fixed activation pattern \(\sigma \in \{0,1\}^N\), the sheaf is piecewise constant; across activation-region boundaries (Ch. 1) it switches. Chs. 8–9 will exploit the fact that within a pattern, the sheaf is linear and well-behaved, and that Dirichlet energy is a common Lyapunov function across all patterns.
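The state-dependence is easy to see in code. A small sketch (hypothetical helper `relu_restriction` and example pre-activations): two cochains in neighboring activation regions induce different diagonal restriction maps, while any two cochains in the same region share one.

```python
import numpy as np

def relu_restriction(z):
    """Diagonal 0/1 map R_z selecting the active coordinates of z."""
    return np.diag((z > 0).astype(float))

z_inside  = np.array([ 1.0, -2.0, 3.0, -0.5])   # one activation region
z_outside = np.array([-1.0, -2.0, 3.0,  0.5])   # a neighboring region

R_in  = relu_restriction(z_inside)
R_out = relu_restriction(z_outside)

# Within a region the sheaf is constant (same R); across the activation
# boundary the restriction map -- and hence the sheaf -- switches.
same_region = np.array_equal(R_in, R_out)        # False for these two points
```

This is the precise sense in which \(\mathcal{F}\) is piecewise constant: `relu_restriction` is locally constant in \(z\), changing only when a coordinate of \(z\) crosses zero.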
Intuition device (planned): Side-by-side wiring diagrams: PyTorch computation graph on the left, cellular sheaf on the path graph on the right, with arrows linking each layer to its edge.
9.4 Formal development
[TO FILL: formal development — definitions, statements, careful notation]
9.5 Theorem demonstrations
[TO FILL: proofs / proof sketches of the key results named above. Proofs should come *after* the intuition section, as agreed.]
9.6 Worked examples
[TO FILL: worked example(s) carried out by hand]
9.7 Coding lab
lab-07-build-neural-sheaf —
[TO FILL: one-paragraph description of the lab's goal]
9.8 Exercises
[TO FILL: 3–6 exercises, graded from warm-up to project-level]
9.9 Further reading
[TO FILL: annotated paragraph of 3–6 references]
9.10 FAQ / common misconceptions
[TO FILL: short Q&A for things readers frequently get wrong]