23 Lab 07 — Build the Neural Sheaf

Anchor chapter: Chapter 7 — Building a Sheaf from a ReLU Network.

Goal. Given a torch.nn.Sequential with ReLU activations, programmatically construct the corresponding cellular sheaf on the path graph.

Construct the neural sheaf for an arbitrary \([n_0, n_1, \ldots, n_{k+1}]\) ReLU MLP as a data structure: vertex list, edge list, edge-type labels, restriction-map callables. Populate it from a trained PyTorch model, evaluate the coboundary \(\delta(\sigma)\) on a concrete input, and verify the dimensional coincidence \(\dim C^1 = \dim C^0 - n_0\) (Rem. 7.3). Includes a side-by-side visualisation of the PyTorch computation graph and the sheaf’s path-graph base.

In-browser execution not available — PyTorch required

This lab imports torch; PyTorch is not yet available in Pyodide. Run locally with the Plan B notebook below.

Plan B — run in Jupyter

Prefer a local Jupyter environment? Download lab-07-build-neural-sheaf.ipynb

Install dependencies: pip install torch numpy matplotlib networkx

23.1 Setup

import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
torch.manual_seed(0)
plt.rcParams['figure.figsize'] = (6, 4)

23.2 1. Build the object

We extract weight matrices and biases from a torch.nn.Sequential and build a NeuralSheaf object that stores vertex stalks \(\mathcal{F}(\ell) = \mathbb{R}^{n_\ell}\) and the activation-pattern-dependent coboundary \(\delta_\sigma\). For a \([n_0, n_1, \ldots, n_{k+1}]\) network, edge \(e_\ell = (\ell{-}1, \ell)\) has stalk \(\mathbb{R}^{n_\ell}\) with restriction maps \(\mathcal{F}_{\ell \trianglelefteq e_\ell} = I_{n_\ell}\) (identity) and \(\mathcal{F}_{(\ell-1) \trianglelefteq e_\ell}(c) = W^{(\ell)} \Sigma^{(\ell-1)} c\) (ReLU-gated weight). The coboundary is lower-triangular with identity blocks on the diagonal, confirming Lemma 3.2: \(\det \delta_\Omega = 1\).

class NeuralSheaf:
    """
    Cellular sheaf on a path graph induced by a ReLU MLP.

    Parameters
    ----------
    dims : list[int]   layer widths [n0, n1, ..., n_{k+1}]
    weights : list of (n_{l}, n_{l-1}) arrays   weight matrices W^(1)...W^(k+1)
    biases  : list of (n_{l},) arrays           bias vectors b^(1)...b^(k+1)
    """
    def __init__(self, dims, weights, biases):
        self.dims    = dims
        self.weights = weights
        self.biases  = biases
        self.n_layers = len(dims) - 1   # number of weight layers

    def forward(self, x):
        """Run the network; return all pre-activations z^(0),...,z^(k+1)."""
        zs = [x]
        for ell in range(self.n_layers):
            W, b = self.weights[ell], self.biases[ell]
            a_prev = np.maximum(0.0, zs[-1]) if ell > 0 else zs[-1]
            zs.append(W @ a_prev + b)
        return zs   # z^(0) = x (no ReLU), z^(1)...z^(k+1) pre-activations

    def activation_pattern(self, zs):
        """Return list of diagonal Σ^(ℓ) matrices from pre-activations."""
        # Σ^(0) = I (input has no ReLU), Σ^(ℓ) for ℓ=1,...,k
        sigmas = [np.eye(self.dims[0])]
        for ell in range(1, self.n_layers):
            sigmas.append(np.diag((zs[ell] > 0).astype(float)))
        return sigmas

    def coboundary_omega(self, x):
        """
        Build δ_Ω (free rows+cols = z^(1),...,z^(k+1)) for input x.
        Returns the square (Σ n_ℓ, ℓ=1..k+1) × (same) matrix.
        """
        zs     = self.forward(x)
        sigmas = self.activation_pattern(zs)
        dims_free = self.dims[1:]          # n1, ..., n_{k+1}
        N = sum(dims_free)
        delta = np.zeros((N, N))
        # offsets into the free-cochain vector
        offsets = np.concatenate([[0], np.cumsum(dims_free)])

        for ell in range(self.n_layers):   # ell = 0..k: edge e_{ell+1}
            n_head = self.dims[ell + 1]    # dimension of head vertex
            r0 = offsets[ell]              # row/col start for this edge's stalk
            c_head = offsets[ell]          # col start for head vertex z^(ell+1)
            # Head vertex ℓ+1: +I_{n_head}
            delta[r0:r0+n_head, c_head:c_head+n_head] += np.eye(n_head)
            # Tail vertex ℓ (free if ell >= 1):
            if ell >= 1:
                W   = self.weights[ell]    # shape (n_head, n_tail)
                Sig = sigmas[ell]          # shape (n_tail, n_tail)
                c_tail = offsets[ell - 1]
                n_tail = self.dims[ell]
                delta[r0:r0+n_head, c_tail:c_tail+n_tail] -= W @ Sig

        return delta

    def boundary_rhs(self, x):
        """
        Right-hand side vector b̃ for the harmonic extension equation
        δ_Ω c_Ω = b̃, where b̃ accounts for the pinned input vertex z^(0)=x.
        """
        dims_free = self.dims[1:]
        N = sum(dims_free)
        rhs = np.zeros(N)
        # Only edge e_1 sees z^(0)=x: contribution = W^(1) z^(0) + b^(1)
        n1 = self.dims[1]
        rhs[:n1] = self.weights[0] @ x + self.biases[0]
        # Remaining edges have bias-only boundary term (z^(0) does not appear)
        offset = n1
        for ell in range(1, self.n_layers):
            n_ell = self.dims[ell + 1]
            rhs[offset:offset + n_ell] = self.biases[ell]
            offset += n_ell
        return rhs


# ── Instantiate a [2, 4, 1] network from a PyTorch Sequential ────────────
net = nn.Sequential(nn.Linear(2, 4), nn.ReLU(), nn.Linear(4, 1))

dims    = [2, 4, 1]
weights = [net[0].weight.detach().numpy(),
           net[2].weight.detach().numpy()]
biases  = [net[0].bias.detach().numpy(),
           net[2].bias.detach().numpy()]

sheaf = NeuralSheaf(dims, weights, biases)

x_test = np.array([0.5, -1.2])
zs     = sheaf.forward(x_test)
delta  = sheaf.coboundary_omega(x_test)
rhs    = sheaf.boundary_rhs(x_test)

print(f"Architecture: {dims}")
print(f"dim C^0 = {sum(dims)},  dim C^1 = {sum(dims[1:])} = dim C^0 - n_0 (Rem. 7.3)")
print(f"δ_Ω shape: {delta.shape}  (should be {sum(dims[1:])} × {sum(dims[1:])})")
print(f"det(δ_Ω) = {np.linalg.det(delta):.6f}  (Lemma 3.2: should be 1)")

23.3 2. Verify a theorem / run an experiment

We verify Remark 7.3 (\(\dim C^1 = \dim C^0 - n_0\)), Lemma 3.2 (\(\det \delta_\Omega = 1\)), and Proposition 3.4 (forward pass = harmonic extension) on a concrete input. The harmonic extension solves \(\delta_\Omega c_\Omega = \tilde{b}\) by triangular back-substitution; the result should match net(x) to floating-point precision. The path-graph visualisation annotates each vertex with its stalk dimension and each edge with its type.

# ── Verify Prop. 3.4: harmonic extension = forward pass ─────────────────
c_harm = np.linalg.solve(delta, rhs)
z2_harm = c_harm[4:]                           # output stalk

z2_torch = net(torch.tensor(x_test, dtype=torch.float32)).detach().numpy()
print(f"\nHarmonic extension output z₂: {z2_harm}")
print(f"PyTorch forward pass f(x):    {z2_torch}")
print(f"Max absolute error:           {abs(z2_harm - z2_torch).max():.2e}")

# ── Check across 100 random inputs ──────────────────────────────────────
errors = []
for _ in range(100):
    xr = rng.standard_normal(2)
    dr = sheaf.coboundary_omega(xr)
    br = sheaf.boundary_rhs(xr)
    cr = np.linalg.solve(dr, br)
    z_harm = cr[dims[1]:]
    z_torch = net(torch.tensor(xr, dtype=torch.float32)).detach().numpy()
    errors.append(abs(z_harm - z_torch).max())

print(f"\nOver 100 random inputs: max error = {max(errors):.2e}")
print(f"Prop. 3.4 confirmed numerically: {max(errors) < 1e-8}")

# ── Visualise the path graph with stalk annotations ─────────────────────
G = nx.path_graph(len(dims))
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

pos = {v: (v, 0) for v in G.nodes()}
ax = axes[0]
nx.draw(G, pos=pos, ax=ax, with_labels=False, node_color='steelblue',
        node_size=800, edge_color='gray', width=2)
for v, (px, py) in pos.items():
    ax.text(px, py + 0.15, f'ℓ={v}', ha='center', va='bottom', fontsize=10, fontweight='bold')
    ax.text(px, py - 0.18, f'ℝ^{dims[v]}', ha='center', va='top', fontsize=9, color='white',
            bbox=dict(boxstyle='round', facecolor='steelblue', alpha=0.7))
for u, v in G.edges():
    mx, my = (u + v) / 2, 0
    ax.text(mx, my + 0.08, f'e_{u+1}\nℝ^{dims[v]}', ha='center', fontsize=8, color='darkred')
ax.set_title(f'Neural sheaf path graph  [{",".join(map(str,dims))}]')
ax.axis('off')

# Discord per edge at the harmonic extension
ax2 = axes[1]
edge_discords = []
n_prev = 0
for ell in range(sheaf.n_layers):
    n_ell = dims[ell + 1]
    row_block = delta[n_prev:n_prev + n_ell, :]
    d_e = abs(row_block @ c_harm - rhs[n_prev:n_prev + n_ell])
    edge_discords.append(np.linalg.norm(d_e))
    n_prev += n_ell
ax2.bar([f'e_{i+1}' for i in range(sheaf.n_layers)], edge_discords, color='steelblue')
ax2.set_ylabel('‖discord‖'); ax2.set_title('Per-edge discord at harmonic extension\n(should be ≈ 0)')
plt.tight_layout(); plt.show()

23.4 Exercises

Deeper network. Build a NeuralSheaf for a [2, 8, 8, 1] network. Verify that \(\det \delta_\Omega = 1\) and that Proposition 3.4 holds for 50 random inputs. How does the condition number of \(\delta_\Omega\) change with depth?
Singular activation patterns. When a neuron is exactly at the switching boundary (\(z_j = 0\)), the activation pattern is ambiguous. Create a synthetic input that puts one neuron on the boundary and compare the two choices of \(\Sigma\) (0 and 1 for that neuron). Do both give valid harmonic extensions?
Dimensional coincidence. Remark 7.3 states \(\dim C^1 = \dim C^0 - n_0\), which is what makes \(\delta_\Omega\) square. For a ResNet-style skip connection from layer 0 to layer 2, this dimensional coincidence fails. Construct such a network, show that \(\delta_\Omega\) is no longer square, and discuss what happens to the harmonic-extension interpretation.
Sheaf from biases. In the harmonic extension formulation the bias \(b^{(\ell)}\) appears as a boundary-forcing term, not in \(\delta_\Omega\) itself. Reformulate the sheaf by introducing a “bias vertex” with stalk \(\mathbb{R}^1\) pinned to 1 and edges carrying \(b^{(\ell)}\) as restriction maps. Does this change \(\det \delta_\Omega\)?