28 changes: 28 additions & 0 deletions master-thesis.md
@@ -243,6 +243,34 @@ The stages of package installation
</ol>


### Escaping the Hermetic Boundary: What Automated Dependency Prefetch Tools Miss Across Software Ecosystems
Contact: Aman Sharma

Tools like [Hermeto][herm1] promise hermetic container builds by prefetching all declared dependencies before network isolation kicks in. In theory, the build then runs against a closed, auditable set of inputs. In practice, the hermetic guarantee is partial: Hermeto addresses the *declared dependency layer* (what appears in lockfiles and pinned manifests such as `package-lock.json`, `Cargo.lock`, or `requirements.txt`) but leaves the *toolchain and native dependency layer* to the user. Nix offers a theoretically stronger model: content-addressed derivations, sandboxed builds, and a store that captures the full dependency closure, including compilers and system libraries. An ecosystem of automated translation tools, such as `dream2nix`, `poetry2nix`, and `cargo2nix` [2], attempts to generate these derivations from standard lockfiles, but their actual hermetic coverage has never been systematically measured.

This thesis investigates the *hermetic gap*: the difference between the dependency set that automated hermetic build tools declare and what the build actually consumes when it runs. Using syscall tracing (following the methodology of Zheng et al. [3]), the study will instrument builds of real-world projects across npm, pip, Cargo, Go, and Maven, first through Hermeto with a network-isolated container build and then through auto-generated Nix derivations, and compare observed file accesses against declared inputs. The goal is not to build a new tool but to characterize where and why existing hermetic build approaches fail, and whether Nix's stronger model delivers on its theoretical advantage in practice.
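As a rough illustration of the comparison step, the sketch below extracts successfully opened or executed paths from `strace -f` text output and reports those that fall outside a set of declared input directories. It is a minimal sketch under stated assumptions, not the methodology of Zheng et al.: the function names, the `/deps/` prefix, and the choice to trace only `open`, `openat`, and `execve` are hypothetical, and relative paths opened against a directory file descriptor are ignored.

```python
import re

# Matches open/openat/execve calls in `strace -f` text output and captures
# the first quoted path plus the numeric return value (assumed line format).
_CALL = re.compile(
    r'\b(?:openat|open|execve)\((?:AT_FDCWD, )?"([^"]+)".*\)\s+=\s+(-?\d+)'
)


def observed_paths(trace_lines):
    """Return the set of paths that a traced build opened or executed successfully."""
    paths = set()
    for line in trace_lines:
        match = _CALL.search(line)
        # A negative return value means the call failed (e.g. ENOENT); skip it.
        if match and int(match.group(2)) >= 0:
            paths.add(match.group(1))
    return paths


def hermetic_gap(observed, declared_prefixes):
    """Return observed paths that lie under none of the declared input directories."""
    return {
        path
        for path in observed
        if not any(path.startswith(prefix) for prefix in declared_prefixes)
    }


if __name__ == "__main__":
    # Hypothetical trace excerpt; "/deps/" stands in for the prefetched inputs.
    sample = [
        '1234 openat(AT_FDCWD, "/usr/lib/libssl.so.3", O_RDONLY|O_CLOEXEC) = 3',
        '1234 openat(AT_FDCWD, "/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)',
        '1235 execve("/usr/bin/gcc", ["gcc", "-c", "x.c"], 0x7ffd /* 30 vars */) = 0',
        '1235 openat(AT_FDCWD, "/deps/npm/left-pad/index.js", O_RDONLY) = 4',
    ]
    for path in sorted(hermetic_gap(observed_paths(sample), ["/deps/"])):
        print(path)
```

A real study would likely need a more robust trace format and handling of `chdir`, symlinks, and directory file descriptors; the regular expression here only covers the simplest case.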

**RQ1 (Hermetic Coverage):** What fraction of the build-time and runtime dependencies observed via syscall tracing is captured by automated hermetic build tools, and how does this fraction vary across ecosystems?

**RQ2 (Escape Taxonomy):** When dependencies escape the hermetic boundary, what classes do they fall into — undeclared system libraries, build toolchain leakage, native extension bindings, or implicit platform assumptions — and which classes are addressable by the tool vs. inherent to the ecosystem?
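One possible way to operationalize the two research questions is sketched below: coverage as the share of observed paths that were also declared, and a rule-based tally of the escaped paths into the four classes named in RQ2. Everything here is an assumption made for illustration. The function names, the class labels, the list of toolchain binaries, and the path heuristics are hypothetical placeholders; the actual taxonomy is an intended outcome of the thesis, not an input to it.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical list of toolchain executables, chosen for illustration only.
TOOLCHAIN_BINARIES = {"cc", "gcc", "g++", "clang", "ld", "as", "make", "cmake", "pkg-config"}


def hermetic_coverage(observed, declared):
    """Fraction of observed paths that were declared; None if nothing was observed."""
    if not observed:
        return None
    return len(observed & declared) / len(observed)


def classify_escape(path):
    """Assign an escaped path to one RQ2 class using simple, assumed path heuristics."""
    p = PurePosixPath(path)
    if p.name in TOOLCHAIN_BINARIES:
        return "build_toolchain_leakage"
    # Headers and pkg-config files are treated as signs of native extension builds.
    if path.startswith("/usr/include/") or p.suffix == ".pc":
        return "native_extension_binding"
    if ".so" in p.name and path.startswith(("/lib", "/usr/lib")):
        return "undeclared_system_library"
    if path.startswith(("/etc/", "/proc/", "/sys/")):
        return "implicit_platform_assumption"
    return "unclassified"


def escape_taxonomy(observed, declared):
    """Count escaped (observed but undeclared) paths per class."""
    return Counter(classify_escape(path) for path in observed - declared)


if __name__ == "__main__":
    # Hypothetical build: two declared inputs were used, two paths escaped.
    observed = {
        "/deps/a",
        "/deps/b",
        "/usr/bin/gcc",
        "/usr/lib/x86_64-linux-gnu/libssl.so.3",
    }
    declared = {"/deps/a", "/deps/b", "/deps/c"}
    print(hermetic_coverage(observed, declared))
    print(escape_taxonomy(observed, declared))
```

Comparing exact path sets keeps the metric simple, but it treats every file as one dependency; grouping paths by package before computing coverage may be a better unit of analysis across ecosystems.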

Related Work:

[1] [Hermeto — prefetch CLI for hermetic container builds](https://github.com/hermetoproject/hermeto)

[2] [dream2nix — automated Nix derivation generation from package manager metadata](https://github.com/nix-community/dream2nix)

[3] [Zheng, Adams, Hassan — On Build Hermeticity in Bazel-Based Build Systems, IEEE Software 2025](https://mcislab.github.io/publications/2025/ieeesw-shenyu.pdf)

[4] [Lamb & Zacchiroli — Reproducible Builds: Increasing the Integrity of Software Supply Chains, IEEE Software 2021](https://arxiv.org/pdf/2104.06020)

[5] [SLSA — Supply-chain Levels for Software Artifacts framework](https://slsa.dev/)

[6] [The Design Space of Lockfiles Across Package Managers, Empirical Software Engineering 2025](https://arxiv.org/abs/2505.04834)

[herm1]: https://github.com/hermetoproject/hermeto


### Dependency Fingerprinting: Reconstructing Full Dependency Trees from Partial Observations
Contact: Aman Sharma, Eric Cornelissen
