2 Frontmatter, Notation, and How to Read This Book
Purpose. Establishes the reading contract, target audience, notational conventions, and the dependency graph between chapters.
2.1 Who this book is for
This book is written for a graduate student or researcher who knows some linear algebra and real analysis and has seen a feedforward neural network, but who has no prior exposure to cellular sheaves, sheaf cohomology, or nonsmooth dynamics. The goal is to give you enough background to read Bosca and Ghrist [1] fluently, and enough scaffolding that the key results feel inevitable rather than mysterious.
The book is not a survey of sheaf theory or of deep learning. It stays narrow and deep: one construction (a cellular sheaf built from a feedforward ReLU network), four results (Lemma 3.2, Proposition 3.4, Theorem 4.1, and Theorem 4.2 of the source paper), and three consequences (pinned neurons, sheaf-based training, spectral diagnostics). Everything else is prerequisite or illustration.
What you will be able to do after reading this book. You will be able to (1) construct the neural sheaf from a given ReLU network by hand, (2) state and prove the harmonic extension result and the convergence theorems, (3) explain why the forward pass is a special case of sheaf diffusion, (4) use the sheaf framework to reason about pinning, training, and diagnostics, and (5) identify where the path-graph argument breaks down for more complex architectures.
What you will not get here. Computational efficiency; large-scale experiments; sheaf neural network architectures (those run sheaf diffusion as a message-passing layer, which is a different idea — see [2], [3]). The framework’s present value is conceptual and diagnostic, not yet algorithmic at scale.
2.2 Assumed background
| Topic | Depth needed | Where to review |
|---|---|---|
| Linear algebra: inner products, eigenvalues, block matrices | Fluent | Any graduate text |
| Real analysis: ODEs, convergence, continuity | Solid | Rudin, or Tao |
| Graph theory: vertices, edges, paths, adjacency | Light | Ch. 2 sets what we need |
| Neural networks: forward pass, weights, ReLU | One course | The first two pages of [1] |
| Topology / sheaf theory | None required | We build everything |
| Nonsmooth analysis | None required | Ch. 3 covers what we need |
2.3 Chapter dependency graph
The chapters fall into three tiers. Part I (Chs. 1–3) builds prerequisites independently: Chapter 1 covers ReLU networks as CPWA maps, Chapter 2 covers graph Laplacians, and Chapter 3 covers nonsmooth dynamics. Each can be skimmed by a reader who already knows the material. Part II (Chs. 4–6) develops cellular sheaves from scratch, requiring only Chapters 0 and 2. Part III (Ch. 7) assembles Parts I and II into the neural sheaf construction. Parts IV–VI (Chs. 8–14) build on Part III.
Ch. 0
├── Ch. 1 (ReLU / CPWA)
│   └── Ch. 3 (nonsmooth dynamics)
└── Ch. 2 (graph Laplacians)
    └── Ch. 4 (cellular sheaves)
        └── Ch. 5 (sheaf cohomology)
            └── Ch. 6 (sheaf diffusion)
                └── Ch. 7 (neural sheaf) ←── Ch. 1, Ch. 3
                    ├── Ch. 8 (harmonic extension)
                    ├── Ch. 9 (convergence: ReLU) ←── Ch. 3
                    ├── Ch. 10 (convergence: nonlinear output) ←── Ch. 3
                    ├── Ch. 11 (pinned neurons)
                    ├── Ch. 12 (sheaf-based training)
                    ├── Ch. 13 (diagnostics)
                    └── Ch. 14 (frontiers)
Skip recipes.
- “I know feedforward networks cold.” Skim Ch. 1 for the CPWA/activation-pattern vocabulary; skip the rest of that chapter.
- “I’ve done spectral graph theory.” Skim Ch. 2 up to the Dirichlet problem; pay attention to the boundary-conditions framing since it recurs constantly.
- “I know sheaves from algebraic topology.” Read Ch. 4 only to pick up the concrete Hansen–Ghrist graph-level notation; Ch. 5 will be fast.
- “I just want the main results.” Read Ch. 0, then jump to Ch. 7 and follow the backward pointers.
2.4 Notation
Throughout this book we follow the conventions of [1] and [4]. The table below resolves the most collision-prone symbols.
Symbol table
| Symbol | Meaning | First appears | Potential collision |
|---|---|---|---|
| \(\delta\) | Coboundary operator \(C^0 \to C^1\) | Ch. 4 | \(\delta\) as Dirac delta; \(\delta\) as variation |
| \(\partial\) | Clarke subdifferential \(\partial_C f\) | Ch. 3 | \(\partial\) as topological boundary |
| \(\mathcal{L}_\mathcal{F}\) | Sheaf Laplacian \(\delta^T\delta\) | Ch. 5 | \(L\) as loss (Ch. 12); \(L\) as graph Laplacian (Ch. 2) |
| \(\mathcal{F}\) | A cellular sheaf | Ch. 4 | \(F\) as forward map in some DL texts |
| \(\mathcal{F}(v)\) | Stalk at vertex \(v\) | Ch. 4 | — |
| \(\mathcal{F}_{v \leq e}\) | Restriction map from \(v\) to edge \(e\) | Ch. 4 | — |
| \(C^k(G;\mathcal{F})\) | Space of \(k\)-cochains | Ch. 5 | — |
| \(\widetilde{W}^{(\ell)}\) | Extended weight matrix at layer \(\ell\) (absorbs bias) | Ch. 7 | \(W^{(\ell)}\) without tilde = ordinary weight matrix (Chs. 1–6) |
| \(\tilde{a}^{(\ell)}\) | Extended post-activation vector (appended ones) | Ch. 7 | \(a^{(\ell)}\) without tilde = ordinary activation |
| \(z^{(\ell)}\) | Pre-activation vector at layer \(\ell\) | Ch. 1 | — |
| \(R_{z^{(\ell)}}\) | Diagonal ReLU projection for current activation pattern | Ch. 7 | — |
| \(\sigma\) | Activation pattern \(\sigma \in \{0,1\}^N\) | Ch. 1 | \(\sigma(\cdot)\) = sigmoid function (Ch. 10, always with argument) |
| \(\omega\) | Free (non-pinned) 0-cochain coordinates | Ch. 7 | \(\omega\) as angular frequency elsewhere |
| \(u\) | Fixed (pinned) boundary data | Ch. 7 | — |
| \(\lambda^*\) | \(\min_R \lambda_{\min}(\mathcal{L}_\mathcal{F}[\Omega,\Omega])\) — uniform convergence rate | Ch. 9 | \(\lambda_1\) = Fiedler value (Ch. 13) |
| \(\alpha\) | Step-size / rate in sheaf diffusion | Ch. 6 | — |
| \(\beta\) | Weight learning rate in joint dynamics | Ch. 12 | — |
Graph conventions
All graphs \(G = (V, E)\) are finite, connected, and undirected, with a chosen orientation on each edge. An orientation assigns to each edge \(e\) a head vertex \(v^+(e)\) and a tail vertex \(v^-(e)\); reversing an orientation flips the sign of the corresponding row of the coboundary but leaves the Laplacian unchanged. A path graph \(P_n\) has vertices \(v_0, v_1, \ldots, v_n\) and edges \(e_i = v_{i-1} \to v_i\) for \(i = 1, \ldots, n\); note that under this convention \(P_n\) has \(n+1\) vertices and \(n\) edges. The neural sheaf lives on a path graph (Ch. 7).
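The orientation-invariance claim is easy to check numerically. A minimal NumPy sketch (not from the book; the path \(P_2\) with the constant sheaf is chosen purely for concreteness):

```python
import numpy as np

# Coboundary for P_2 with the constant sheaf R on every cell.
# Rows index edges, columns index vertices v0, v1, v2; each row is
# (delta x)_e = x_head - x_tail.
delta = np.array([[-1.0,  1.0, 0.0],   # e1 = v0 -> v1
                  [ 0.0, -1.0, 1.0]])  # e2 = v1 -> v2

# Reverse the orientation of e2 (now v2 -> v1): its row flips sign.
delta_flip = delta.copy()
delta_flip[1] *= -1

# The coboundary changes, but the Laplacian delta^T delta does not.
L = delta.T @ delta
L_flip = delta_flip.T @ delta_flip
assert np.allclose(L, L_flip)
```

The sign flip lives in a single row of \(\delta\), and that row enters \(\delta^T\delta\) only through its outer product with itself, which is sign-independent.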
Sheaf conventions
All sheaves are cellular sheaves on graphs in the sense of [5]. Stalks are finite-dimensional real inner product spaces, identified with \(\mathbb{R}^n\) and the standard inner product. Restriction maps are identified with their representing matrices. The coboundary on an oriented edge \(e = u \to v\) is \[(\delta x)_e = \mathcal{F}_{v \leq e}\, x_v - \mathcal{F}_{u \leq e}\, x_u,\] so the downstream endpoint enters with a positive sign. The sheaf Laplacian is \(\mathcal{L}_\mathcal{F} = \delta^T \delta\), which is always positive semidefinite. The total discrepancy of a 0-cochain \(x\) is \(\|\delta x\|^2 = \langle x, \mathcal{L}_\mathcal{F} x \rangle\); it vanishes exactly when \(x\) is a global section.
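These conventions are concrete enough to compute with directly. The NumPy sketch below (the restriction matrices \(A\) and \(B\) are arbitrary illustrative choices, not from the book) builds the coboundary for a single oriented edge, forms the sheaf Laplacian, and checks both claims: positive semidefiniteness, and that the discrepancy vanishes exactly on global sections.

```python
import numpy as np

# One oriented edge e = u -> v with 2-dimensional stalks.
# F_{u<=e} = A, F_{v<=e} = B, so (delta x)_e = B x_v - A x_u.
A = np.array([[1.0, 2.0], [0.0, 1.0]])  # restriction from u
B = np.array([[1.0, 0.0], [1.0, 1.0]])  # restriction from v

delta = np.block([[-A, B]])   # one block row per edge, shape (2, 4)
L = delta.T @ delta           # sheaf Laplacian, shape (4, 4)

# Positive semidefinite: all eigenvalues >= 0 (up to roundoff).
assert np.all(np.linalg.eigvalsh(L) >= -1e-12)

# x = (x_u, x_v) is a global section iff B x_v = A x_u, i.e.
# <x, L x> = ||delta x||^2 = 0.
x_u = np.array([1.0, 1.0])
x_v = np.linalg.solve(B, A @ x_u)   # force agreement over the edge
x = np.concatenate([x_u, x_v])
assert np.isclose(x @ L @ x, 0.0)
```

The block structure generalizes: for a larger graph, \(\delta\) has one block row per edge with \(-\mathcal{F}_{u \leq e}\) in the tail column and \(+\mathcal{F}_{v \leq e}\) in the head column, zeros elsewhere.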
The “Feedforward ↔ Sheaf” box
From Chapter 7 onward, two-column callout boxes present the same object in both languages: the feedforward-network description on the left and its sheaf-theoretic counterpart on the right.
2.5 How to read the proofs
Proofs follow the convention: intuition → formalism → demonstration. Each theorem is preceded by a paragraph naming the key idea in plain language; the formal proof follows; exercises ask you to verify or extend the argument. Sections marked with \((\star)\) are more advanced and can be skipped on a first reading without losing the thread.
For the convergence theorems (Chs. 9–10), proofs rely on the nonsmooth Lyapunov theory of Ch. 3. We state the Filippov and LaSalle machinery precisely but do not reproduce every detail of Gould’s [6] convergence theory; we cite his results and explain the role each plays in the argument.
2.6 Exercises
0.1 (Warm-up — notation). Let \(G\) be the path graph \(P_2\) with vertices \(v_0, v_1, v_2\) and edges \(e_1 = v_0 \to v_1\), \(e_2 = v_1 \to v_2\). Write out the coboundary matrix \(\delta \in \mathbb{R}^{2 \times 3}\) for the constant sheaf \(\mathcal{F}(v) = \mathcal{F}(e) = \mathbb{R}\) with all restriction maps equal to 1. Compute \(\mathcal{L} = \delta^T\delta\) and verify it equals the standard combinatorial Laplacian for this path.
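A hand computation for this exercise can be checked numerically; the NumPy sketch below effectively writes down the answer, so do the exercise by hand first.

```python
import numpy as np

# Coboundary for P_2 with the constant sheaf R and all restriction
# maps equal to 1. Rows = edges (e1, e2), columns = (v0, v1, v2),
# with (delta x)_e = x_head - x_tail.
delta = np.array([[-1.0,  1.0, 0.0],
                  [ 0.0, -1.0, 1.0]])
L = delta.T @ delta

# Standard combinatorial Laplacian D - A of the path v0 - v1 - v2.
L_combinatorial = np.array([[ 1.0, -1.0,  0.0],
                            [-1.0,  2.0, -1.0],
                            [ 0.0, -1.0,  1.0]])
assert np.allclose(L, L_combinatorial)
```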
0.2 (Notation collision). The symbol table lists three roles for \(\delta\): coboundary operator, Dirac delta, and variation. Explain each role in one paragraph, and state which single role this book reserves \(\delta\) for (cf. the FAQ in Section 2.8).
0.3 (Skip recipe). Suppose you already know spectral graph theory but have never seen a cellular sheaf. Write down the chapter sequence you would follow. Identify the single definition in Ch. 4 that represents the only genuinely new ingredient you need before reading Ch. 5.
0.4 (Project — dependency map). Draw the dependency graph from Section 2.3 as a Hasse diagram. Identify all pairs of chapters that can be read in either order without logical dependence.
2.7 Further reading
The primary reference for this book is [1]. The spectral foundation for cellular sheaves on graphs is developed in [5]; readers wanting a more categorical treatment should start with [7], whose thesis develops sheaves, cosheaves, and their applications from first principles. For the neural network side, [8] studies topological signatures of ReLU activation patterns from a complementary perspective. Ghrist’s Elementary Applied Topology [9] provides an accessible entry to the broader applied topology landscape.
2.8 FAQ / common misconceptions
Q: Is this book about “sheaf neural networks”? No. Sheaf neural networks ([2], [3]) are graph neural network architectures that use sheaf Laplacians as message-passing layers. This book is about something different: taking an existing feedforward network and embedding it inside a sheaf. The network architecture does not change; the sheaf is an interpretive lens.
Q: Do I need to know category theory? No. Everything in this book lives on finite graphs. Stalks are Euclidean spaces, restriction maps are matrices, and cochains are vectors. The category-theoretic formulation is beautiful but not necessary here; [7] develops it for those who want it.
Q: The symbol \(\delta\) appears as both the coboundary and as “small quantity” in analysis. Which is it? In this book, \(\delta\) is always the coboundary operator. We never use \(\delta\) as an infinitesimal or as the Dirac delta. When we need a small positive quantity we write \(\varepsilon\).