10  The Forward Pass as a Harmonic Extension

Purpose. Proves the paper’s foundational identification: the forward pass equals the unique harmonic extension of the input data on the neural sheaf.

10.1 Key concepts & results

  • Lemma 3.2 (unitriangular factorization): in coordinates adapted to an activation pattern, the restricted coboundary δ_free is unitriangular; det δ_free = 1 and L_free positive definite for every activation pattern.
  • Proposition 3.4 (forward pass = harmonic extension): H^0(G, U; F) = 0 and the minimizer of ½‖δc‖² matches forward(x).
  • Why the unitriangularity is ‘exactly the topological order of the computation graph.’

Prerequisites: Ch 5, Ch 7

10.2 Motivating example

Take the [2, 4, 1] sheaf from Ch. 7 with the input vertex pinned to a concrete value \(x_{\text{in}} = (1, -1)\). Write down the coboundary \(\delta\) in coordinates adapted to the current activation pattern — the \(0/1\) pattern of signs of \(z^{(1)}\) at \(x_{\text{in}}\). The resulting matrix is block lower-triangular with identity diagonal blocks. Solve \(\delta c = 0\) for the unknown interior cochain by back-substitution, reading values off one vertex at a time. The sequence of values you recover is, layer by layer, exactly what NumPy returns when you run forward(x_in): the pre-activation \(z^{(1)}\), the post-activation \(a^{(1)} = R_{z^{(1)}} z^{(1)}\), the pre-output \(z^{(2)} = W^{(2)} a^{(1)} + b^{(2)}\), and the prediction \(\hat{y} = z^{(2)}\). No iteration, no training loop — a single linear solve whose structure is the forward pass.

This is the content of Proposition 3.4 of [1]: once the sheaf is built, the forward pass is not a separate algorithm; it is the unique harmonic extension of the input data, made computable by the triangular structure of \(\delta\).

10.3 Intuition

A feedforward network is usually described by the order in which it runs: input first, then layer 1, then layer 2, and so on. The sheaf picture erases that order. It replaces the sequence of substitutions with a single global equation — “find the cochain whose edge discrepancies all vanish, subject to the input boundary data” — and then observes that the computation-graph topology makes this equation trivially solvable by back-substitution.

Concretely: label the vertices of the path graph in topological order — input, \(z^{(1)}\), \(a^{(1)}\), \(z^{(2)}\), output. Fix any activation pattern \(\sigma\) and write \(\delta\) in these coordinates. The key observation (Lemma 3.2) is that each edge’s restriction map on its downstream endpoint is the identity (affine edges read \(z^{(\ell)}\) directly; ReLU edges read \(a^{(\ell)}\) directly; the output edge reads \(\hat{y}\) directly). So the diagonal blocks of the restricted coboundary \(\delta_\Omega\) — the coboundary with input pinned and only the free (interior) vertices varying — are all identity matrices. The lower-triangular blocks come from the upstream restriction maps: \(W^{(\ell)}\), \(R_{z^{(\ell)}}\), or the output Jacobian. A block lower-triangular matrix with identity diagonal is unitriangular; its determinant is \(1\), so \(\delta_\Omega\) is invertible, and there is a unique cochain killing the discrepancies. Reading it off row by row is layer-by-layer evaluation.

Two reframes make the identification land. First, harmonic extension is the statement of the forward pass; sequential evaluation is the substitution order the unitriangular structure makes available. They are not two computations that happen to agree; they are the same computation named twice. Second, the determinant-\(1\) identity is not cosmetic — it is the unique-solvability certificate that upgrades “local agreement” to “global forward pass” without any \(\delta^T \delta\)-invertibility hypothesis. Ch. 14 will flag that this identity fails the moment the computation graph stops being a path (residual connections, attention), which is where most open problems live.

Intuition device (planned): Block-triangular animation showing δ assembled layer-by-layer, with diagonal blocks lighting up as identities.

10.4 Formal development

Throughout this section, fix a feedforward ReLU network with identity output (\(\varphi = I\)), its associated neural sheaf \(\mathcal{F}\) from Ch. 7, and a pinned input \(x \in \mathbb{R}^{n_0}\) on the boundary set \(U = \{v_x\}\). Let \(\Omega = V \setminus U\) denote the free vertices, let \(\sigma \in \{0,1\}^N\) be an activation pattern, and let \(\delta(\sigma)\) denote the coboundary of the neural sheaf with ReLU projections frozen at \(\sigma\).

The pinned Dirichlet problem

Definition 10.1 Definition 8.1 (Neural-sheaf Dirichlet problem). Fix \(\sigma\). The pinned Dirichlet problem on \(\mathcal{F}\) asks for a cochain \(c \in C^0(G; \mathcal{F})\) that (i) satisfies the boundary condition \(c_{v_x} = x\) and (ii) minimizes the Dirichlet energy \[E(c; \sigma) = \tfrac{1}{2} \|\delta(\sigma)\, c\|^2 = \tfrac{1}{2} \sum_{e \in E} \|(\delta c)_e\|^2.\]

The restricted coboundary \(\delta_\Omega(\sigma) : C^0(\Omega; \mathcal{F}) \to C^1(G; \mathcal{F})\) is \(\delta(\sigma)\) acting on free coordinates with the pinned coordinate frozen at \(x\); equivalently, \(\delta_\Omega(\sigma)\) is the block of \(\delta(\sigma)\) whose columns correspond to \(\Omega\). By the dimension count of Remark 7.3, \(\delta_\Omega(\sigma)\) is square: \(\dim C^1(G; \mathcal{F}) = \dim C^0(\Omega; \mathcal{F})\).

Lemma 3.2: unitriangular factorization

Order the free vertices in topological order along the path: \(v_{z^{(1)}}, v_{a^{(1)}}, v_{z^{(2)}}, v_{a^{(2)}}, \ldots, v_{z^{(k+1)}}, v_{\hat{y}}\). Order the edges in matching order: \(e_1^{\text{aff}}, e_1^{\text{act}}, \ldots, e_{k+1}^{\text{aff}}, e^{\text{out}}\). In these coordinates, the restricted coboundary has block structure

\[\delta_\Omega(\sigma) = \begin{pmatrix} I & & & & & \\ -R_{\sigma^{(1)}} & I & & & & \\ & -W^{(2)} & I & & & \\ & & -R_{\sigma^{(2)}} & I & & \\ & & & \ddots & \ddots & \\ & & & & -W^{(k+1)} & I \end{pmatrix},\]

where the first column block (not shown) absorbs the pinned contribution \(-W^{(1)} x - b^{(1)}\) into the edge offsets. Every diagonal block is an identity matrix.

Lemma 10.1 Lemma 8.2 (Unitriangular factorization — Lemma 3.2 of [1]). For every activation pattern \(\sigma \in \{0,1\}^N\), the restricted coboundary \(\delta_\Omega(\sigma)\) is block lower-triangular with identity blocks on the diagonal. In particular, \[\det \delta_\Omega(\sigma) = 1.\]

Since a unitriangular matrix is always invertible, \(\delta_\Omega(\sigma)^{-1}\) exists and can be computed by block back-substitution — which is layer-by-layer forward evaluation.

Corollary: the free Laplacian is positive-definite

Definition 10.2 Definition 8.3 (Free Laplacian). The free (pinned, restricted) Laplacian at activation pattern \(\sigma\) is \[L_{\text{free}}(\sigma) := \delta_\Omega(\sigma)^T \delta_\Omega(\sigma) \in \mathbb{R}^{\dim \Omega \times \dim \Omega}.\]

Corollary 10.1 Corollary 8.4. For every \(\sigma\), \(L_{\text{free}}(\sigma)\) is symmetric positive definite, with eigenvalues bounded away from zero by a constant \(\lambda_{\min}^{\text{free}} > 0\) depending only on the network parameters \(\theta\) (uniform over \(\sigma\)).

The corollary follows from Lemma 8.2 (invertibility of \(\delta_\Omega(\sigma)\)) and the finiteness of the set of activation patterns (uniformity over \(\sigma\)). It is the hypothesis that powers the convergence theorem of Ch. 9.

Proposition 3.4: forward pass = harmonic extension

Definition 10.3 Definition 8.5 (Relative cohomology). The relative \(H^0\) of the pinned pair \((G, U)\) is \[H^0(G, U; \mathcal{F}) := \ker \delta_\Omega(\sigma) / \{0\} = \ker \delta_\Omega(\sigma),\] the space of free cochains with zero edge discrepancy subject to the fixed boundary data. A cochain \(c\) with \(c_{v_x} = x\) and \(\delta c = 0\) is called a relative global section.

Proposition 8.6 (Forward pass as harmonic extension — Prop. 3.4 of [1]). Fix \(x \in \mathbb{R}^{n_0}\). Let \(\sigma = \sigma(\texttt{forward}(x))\) be the activation pattern realized at the forward-pass values. Then:

(i) \(H^0(G, U; \mathcal{F}) = 0\), i.e. the only relative global section at \(\sigma\) is zero discrepancy.

(ii) The unique minimizer \(c^\star\) of \(E(\cdot; \sigma)\) subject to \(c_{v_x} = x\) satisfies \(\delta c^\star = 0\), and its components are exactly the forward-pass values: \[c^\star_{v_{z^{(\ell)}}} = z^{(\ell)}(x), \quad c^\star_{v_{a^{(\ell)}}} = a^{(\ell)}(x), \quad c^\star_{v_{\hat{y}}} = \hat{y}(x).\]

(iii) Equivalently, \(c^\star\) is the unique harmonic extension of \(x\) under \(\mathcal{F}\) in the sense of Ch. 5 (Def. 5.4).

Three statements, all equivalent under Lemma 8.2. Part (i) says the Dirichlet problem has zero tension at the forward-pass value; (ii) identifies the minimizer explicitly; (iii) phrases the identification in the harmonic-extension language of Ch. 2/Ch. 5.

Remark 8.7 (Why “topological order” appears). The unitriangular structure of Lemma 8.2 is a direct consequence of the computation graph being a directed acyclic graph admitting a topological order. The identity-on-the-downstream-side property of each restriction map — \(\mathcal{F}_{v_{z^{(\ell)}} \trianglelefteq e_\ell^{\text{aff}}} = I\), \(\mathcal{F}_{v_{a^{(\ell)}} \trianglelefteq e_\ell^{\text{act}}} = I\), \(\mathcal{F}_{v_{\hat{y}} \trianglelefteq e^{\text{out}}} = I\) — aligns exactly with the topological ordering of vertices, making the downstream-column entries identities and the upstream-column entries \(-W^{(\ell)}\) or \(-R_{\sigma^{(\ell)}}\) sit below the diagonal.

Remark 8.8 (What the unitriangular factorization buys that Cor. 8.4 does not). Positive-definiteness alone would already guarantee a unique minimizer (by strict convexity of \(E\)), but would not tell us that the minimizer has zero discrepancy at every edge. Unit determinant — equivalently, bijectivity of \(\delta_\Omega(\sigma)\) — gives \(\ker \delta_\Omega = 0\), hence \(H^0(G, U; \mathcal{F}) = 0\). Without Lemma 8.2, the forward pass and the harmonic extension could differ by a globally non-trivial section.

10.5 Theorem demonstrations

Proof of Lemma 8.2. Order the free vertices along the path as \(v_{z^{(1)}}, v_{a^{(1)}}, \ldots, v_{z^{(k+1)}}, v_{\hat{y}}\) and the edges as \(e_1^{\text{aff}}, e_1^{\text{act}}, \ldots, e_{k+1}^{\text{aff}}, e^{\text{out}}\). The row of \(\delta_\Omega(\sigma)\) corresponding to edge \(e\) has only two non-zero column blocks: the upstream endpoint (if it is in \(\Omega\)) contributes \(-\mathcal{F}_{u \trianglelefteq e}\), and the downstream endpoint contributes \(+\mathcal{F}_{v \trianglelefteq e}\). By the construction of Ch. 7.1, the downstream restriction map of every edge type is an identity: affine edges have \(\mathcal{F}_{v_{z^{(\ell)}} \trianglelefteq e_\ell^{\text{aff}}} = I\), ReLU edges have \(\mathcal{F}_{v_{a^{(\ell)}} \trianglelefteq e_\ell^{\text{act}}} = I\), and the output edge has \(\mathcal{F}_{v_{\hat{y}} \trianglelefteq e^{\text{out}}} = I\). With the chosen ordering, the downstream block of each edge sits on the diagonal; the upstream block sits strictly below. Hence \(\delta_\Omega(\sigma)\) is block lower-triangular with identity diagonal blocks, and the determinant of a block unitriangular matrix equals the product of its diagonal determinants, namely \(1\). \(\square\)

Proof of Cor. 8.4. For any \(\sigma\), Lem. 8.2 gives \(\det \delta_\Omega(\sigma) = 1 \neq 0\), so \(\delta_\Omega(\sigma)\) is invertible and \(L_{\text{free}}(\sigma) = \delta_\Omega(\sigma)^T \delta_\Omega(\sigma)\) is symmetric positive definite: for any nonzero \(v\), \(v^T L_{\text{free}}(\sigma) v = \|\delta_\Omega(\sigma) v\|^2 > 0\). The uniform bound \(\lambda_{\min}^{\text{free}} > 0\) follows because \(\sigma\) ranges over the finite set \(\{0,1\}^N\): taking \(\lambda_{\min}^{\text{free}} := \min_\sigma \lambda_{\min}(L_{\text{free}}(\sigma))\) yields a positive minimum of finitely many positive numbers. The bound depends on \(\theta\) through the weights \(W^{(\ell)}\) but is independent of the cochain \(c\). \(\square\)

Proof of Prop. 8.6. Fix \(x\) and let \(c^\star\) be the forward-pass cochain with components \(z^{(\ell)}(x), a^{(\ell)}(x), \hat{y}(x)\), and let \(\sigma = \sigma(c^\star)\). By construction, each edge agreement condition (Ch. 7.1) — \(W^{(\ell)} a^{(\ell-1)} + b^{(\ell)} = z^{(\ell)}\), \(R_\sigma z^{(\ell)} = a^{(\ell)}\), \(z^{(k+1)} = \hat{y}\) — is the definition of forward propagation, so \(\delta c^\star = 0\). Hence \(E(c^\star; \sigma) = 0\). Since \(L_{\text{free}}(\sigma)\) is positive definite (Cor. 8.4), the quadratic \(E(c; \sigma) = \tfrac{1}{2}(c - c^\star)^T L_{\text{free}}(\sigma) (c - c^\star)\) on the pinned affine subspace \(\{c : c_{v_x} = x\}\) has a unique minimizer, and that minimizer is \(c^\star\). This proves (ii). For (i): \(\ker \delta_\Omega(\sigma) = 0\) because \(\delta_\Omega(\sigma)\) is invertible (Lem. 8.2); translating via the pinned boundary, the space of relative global sections is a single point, and \(c^\star\) realizes \(\delta c^\star = 0\) in it, so \(H^0(G, U; \mathcal{F}) = 0\). For (iii): the harmonic-extension characterization of Def. 5.4 says that a cochain minimizing Dirichlet energy subject to boundary data and satisfying \(\delta c = 0\) is the harmonic extension; (i) and (ii) give exactly that. \(\square\)

10.6 Worked example: solving the Dirichlet problem on the [2, 2, 1] sheaf

Continue the [2, 2, 1] sheaf from the worked example of Ch. 7. Pin the input vertex to \(x = (1, 2)^T\); the realized activation pattern is \(\sigma = (1, 1)\) so \(R_{z^{(1)}} = I_2\).

The restricted coboundary. Stacking the free cochain as \(y = (c_{z^{(1)}},\ c_{a^{(1)}},\ c_{z^{(2)}},\ c_{\hat{y}}) \in \mathbb{R}^6\) and the 1-cochain as \((\delta y)_{e_1^{\text{aff}}},\ (\delta y)_{e_1^{\text{act}}},\ (\delta y)_{e_2^{\text{aff}}},\ (\delta y)_{e^{\text{out}}} \in \mathbb{R}^6\), the restricted coboundary at \(\sigma = (1,1)\) is the \(6 \times 6\) block matrix \[\delta_\Omega(\sigma) = \begin{pmatrix} I_2 & 0 & 0 & 0 \\ -I_2 & I_2 & 0 & 0 \\ 0 & -W^{(2)} & I_1 & 0 \\ 0 & 0 & -I_1 & I_1 \end{pmatrix},\] where the first block-row absorbs the pinned contribution \(-W^{(1)} x - b^{(1)} = -(1, 1)^T\) into the \(e_1^{\text{aff}}\)-edge offset.

Observation. Block lower-triangular with identity diagonals — exactly Lemma 8.2. So \(\det \delta_\Omega = 1\), and a unique solution exists.

Back-substitution. Set \(\delta_\Omega y - (\text{pinned offsets}) = 0\) and solve row by row.

  • Row 1 (\(e_1^{\text{aff}}\)): \(c_{z^{(1)}} - (W^{(1)} x + b^{(1)}) = 0 \Rightarrow c_{z^{(1)}} = (1,1)^T\).
  • Row 2 (\(e_1^{\text{act}}\)): \(c_{a^{(1)}} - R_\sigma c_{z^{(1)}} = 0 \Rightarrow c_{a^{(1)}} = (1,1)^T\).
  • Row 3 (\(e_2^{\text{aff}}\)): \(c_{z^{(2)}} - W^{(2)} c_{a^{(1)}} - b^{(2)} = 0 \Rightarrow c_{z^{(2)}} = 2\).
  • Row 4 (\(e^{\text{out}}\)): \(c_{\hat{y}} - c_{z^{(2)}} = 0 \Rightarrow c_{\hat{y}} = 2\).

The solution is the forward pass. The back-substituted cochain \(c^\star = ((1,2),\ (1,1),\ (1,1),\ 2,\ 2)\) has exactly the component values of \(\texttt{forward}(x)\) from Ch. 7’s worked example. Zero discord at every edge, so \(H^0(G, U; \mathcal{F}) = 0\), matching Prop. 8.6.

Dirichlet energy verification. \(E(c^\star; \sigma) = \tfrac{1}{2} \|\delta c^\star\|^2 = 0\). For any perturbation \(c^\star + \eta\) with \(\eta_{v_x} = 0\), strict convexity of \(E\) gives \(E(c^\star + \eta; \sigma) > 0\) whenever \(\eta \neq 0\) — so \(c^\star\) is the unique minimizer.

What happens at a different pattern? Re-run at \(x = (-1, -1)\). Now \(\sigma = (0, 0)\), so \(R_\sigma = 0\). Back-substitution gives \(c_{z^{(1)}} = (-1, -2)\), \(c_{a^{(1)}} = 0\), \(c_{z^{(2)}} = 0\), \(c_{\hat{y}} = 0\) — again matching the forward pass. Notice the \(\delta_\Omega\) matrix changed (its \(e_1^{\text{act}}\) block-row went from \((-I_2, I_2)\) to \((0, I_2)\)), but remained block lower-triangular with identity diagonals. Lemma 8.2 is uniform in \(\sigma\).

10.7 Coding lab

Lab 08 — Verify Proposition 3.4 — Build the restricted coboundary \(\delta_\Omega(\sigma)\) for a small pretrained MLP (e.g., a [2, 8, 1] regressor) as a dense matrix. Verify Lem. 8.2 numerically (\(\det \delta_\Omega = 1\) up to floating-point noise), solve the pinned Dirichlet problem by one triangular back-substitution, and compare the solution component-by-component against forward(x) from PyTorch. Repeat across 100 random inputs to confirm Prop. 8.6 holds independently of which activation pattern is realized.

10.8 Exercises

  1. (warm-up) For the [2, 2, 1] example, write down \(\delta_\Omega\) at \(\sigma = (1, 0)\) in full and verify that all diagonal blocks are identities.
  2. (warm-up) Compute \(\det \delta_\Omega\) by expanding along the first column for a [3, 3, 2] network with generic weights at a fixed pattern, and confirm it equals \(1\).
  3. (intermediate) Show that if we replace the ReLU edge’s downstream restriction \(I_{n_\ell}\) by any invertible diagonal matrix \(D_\ell\), the unit-determinant conclusion of Lem. 8.2 becomes \(\det \delta_\Omega = \prod_\ell \det D_\ell\). What goes wrong in the derivation of Prop. 8.6?
  4. (intermediate) Using the block structure of \(\delta_\Omega\), derive a closed-form expression for \(L_{\text{free}}(\sigma)^{-1}\) in terms of successive layer matrices \(W^{(\ell)} R_{\sigma^{(\ell-1)}}\). Relate it to the Jacobian of the forward pass.
  5. (advanced) Construct an explicit counterexample — a non-feedforward sheaf on a three-vertex path — where the downstream restriction maps are not identities and \(\delta_\Omega\) has \(\det = 0\). Explain why no ordering of edges can save it.
  6. (project) Quantify the numerical stability of the back-substitution in floating point vs a full LU solve of \(L_{\text{free}}(\sigma) y = b\). Sweep network depth \(k\) from 2 to 20 and plot condition numbers; discuss the result in light of Open Problem 14.6.

10.9 Further reading

[1] §3.1–3.2 is the primary source — Lemma 3.2 and Prop. 3.4 are the namesakes here. For the general theory of harmonic extensions on cellular sheaves, see Ch. 5 of this book (built from [2]) and [3]. The unitriangular / topological-order connection is folklore in automatic differentiation (see any standard AD textbook for the chain-rule-on-a-DAG perspective). For an alternative derivation of the forward pass as fixed-point iteration, see the deep-equilibrium models line of work surveyed in [1] §7.

10.10 FAQ / common misconceptions

Is Prop. 8.6 saying backprop is a harmonic-extension problem? Not quite — it says the forward pass is. Backprop is about parameter gradients and is the subject of Ch. 10 (Prop. 10.5).

Why does \(H^0(G, U; \mathcal{F}) = 0\) matter? Isn’t positive-definiteness of \(L_{\text{free}}\) enough? Positive-definiteness gives a unique minimizer, but doesn’t guarantee the minimizer has zero edge discord. \(H^0 = 0\) upgrades “unique minimizer” to “unique global section,” i.e. zero discord everywhere (Rem. 8.8). For path graphs these are equivalent; in Ch. 14 we see them come apart.

What if two adjacent activation patterns give different harmonic extensions? They can’t — within each pattern the harmonic extension is the forward pass, and the forward pass is uniquely defined per input regardless of the sheaf-theoretic bookkeeping. The only ambiguity is at the measure-zero boundary, handled by the Filippov setup of Ch. 9.

Does the choice of edge-ordering change anything? No. Any topological ordering of edges makes \(\delta_\Omega\) block lower-triangular with identity diagonals; different orderings permute rows but preserve \(\det = 1\).