Skip to content

Crash in aptly db recover#1565

Open
muresan wants to merge 1 commit intoaptly-dev:masterfrom
muresan:fix/aptly-crash-db-recover
Open

Crash in aptly db recover#1565
muresan wants to merge 1 commit intoaptly-dev:masterfrom
muresan:fix/aptly-crash-db-recover

Conversation

@muresan
Copy link
Copy Markdown

@muresan muresan commented Apr 28, 2026

Fixes crash in aptly db recover

Crash

Notes:

  • my Go skills are not high enough to try to fix this by myself so I had to resort to Claude, maybe this helps a maintaner pinpoint the real issue if this one is not it
  • I cannot reproduce the crash any more because the db and pool are deleted now (no time and space available)

After an aptly pod was killed during a publish, it would not start back up and crash with:

Recovering database...
Checking database integrity...
panic: runtime error: invalid memory address or nil pointer dereference [recovered, repanicked]
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1d55e44]

goroutine 1 [running]:
github.com/aptly-dev/aptly/cmd.Run.func1()
	github.com/aptly-dev/aptly/cmd/run.go:17 +0xc5
panic({0x2084f60?, 0x4623850?})
	runtime/panic.go:783 +0x132
github.com/aptly-dev/aptly/deb.(*PackageRefList).ForEach(...)
	github.com/aptly-dev/aptly/deb/reflist.go:83
github.com/aptly-dev/aptly/deb.FindDanglingReferences(...)
	github.com/aptly-dev/aptly/deb/find_dangling.go:15
github.com/aptly-dev/aptly/cmd.checkRepo(0xc0018082a0)
	github.com/aptly-dev/aptly/cmd/db_recover.go:63 +0xa4
github.com/aptly-dev/aptly/cmd.checkIntegrity.(*LocalRepoCollection).ForEach.func1({0xc0000e45a0?, 0xc00088b268?, 0x1?}, {0xc001116000, 0x7e, 0x80})
	github.com/aptly-dev/aptly/deb/local.go:229 +0xa7
github.com/aptly-dev/aptly/database/goleveldb.(*storage).ProcessByPrefix(0xc0008fa000?, {0xc00088b268, 0x1, 0x1}, 0xc0008f9240)
	github.com/aptly-dev/aptly/database/goleveldb/storage.go:114 +0x17d
github.com/aptly-dev/aptly/deb.(*LocalRepoCollection).ForEach(...)
	github.com/aptly-dev/aptly/deb/local.go:222
github.com/aptly-dev/aptly/cmd.checkIntegrity()
	github.com/aptly-dev/aptly/cmd/db_recover.go:51 +0x7f
github.com/aptly-dev/aptly/cmd.aptlyDBRecover(0x41132e?, {0xc000712730?, 0x2?, 0xc00089bdb0?})
	github.com/aptly-dev/aptly/cmd/db_recover.go:27 +0x93
github.com/smira/commander.(*Command).Dispatch(0xc0001f4900, {0xc000712730, 0x0, 0x0})
	github.com/smira/commander@v0.0.0-20140515201010-f408b00e68d5/commands.go:305 +0xd1
github.com/smira/commander.(*Command).Dispatch(0xc0001f5200, {0xc000712730, 0x1, 0x1})
	github.com/smira/commander@v0.0.0-20140515201010-f408b00e68d5/commands.go:283 +0x165
github.com/smira/commander.(*Command).Dispatch(0xc00074aea0, {0xc000712720, 0x2, 0x2})
	github.com/smira/commander@v0.0.0-20140515201010-f408b00e68d5/commands.go:283 +0x165
github.com/aptly-dev/aptly/cmd.Run(0xc00074aea0, {0xc000108280?, 0xc000078740?, 0x3c66368?}, 0x1)

I've ran this through Claude which generated the current PR, and the reasoning:

Now I have a complete picture of the bug. Here's the full cause chain:

Root cause — LoadComplete in deb/local.go:172-173

func (collection *LocalRepoCollection) LoadComplete(repo *LocalRepo) error {
    encoded, err := collection.db.Get(repo.RefKey())
    if err == database.ErrNotFound {
        return nil          // <-- returns nil error WITHOUT initializing repo.packageRefs
    }
    ...
    repo.packageRefs = &PackageRefList{}   // only set when key IS found
    return repo.packageRefs.Decode(encoded)
}

When a repo has no ref key in the DB (empty repo or corrupted key), LoadComplete returns nil without touching repo.packageRefs, leaving it as its zero value: nil.

Propagation — checkRepo in cmd/db_recover.go:63

dangling, err := deb.FindDanglingReferences(repo.RefList(), ...)
//                                           ↑ returns nil packageRefs

Crash — ForEach in deb/reflist.go:83

func (l *PackageRefList) ForEach(...) error {
    for _, p := range l.Refs {  // l is nil → SIGSEGV dereferencing nil pointer

There are two good places to fix this:

Fix 1 (root cause) — initialize packageRefs before the early return in LoadComplete:

func (collection *LocalRepoCollection) LoadComplete(repo *LocalRepo) error {
    repo.packageRefs = &PackageRefList{}   // move this up
    encoded, err := collection.db.Get(repo.RefKey())
    if err == database.ErrNotFound {
        return nil
    }
    if err != nil {
        return err
    }
    return repo.packageRefs.Decode(encoded)
}

Fix 2 (defensive) — nil guard in ForEach:

func (l *PackageRefList) ForEach(handler func([]byte) error) error {
    if l == nil {
        return nil
    }
    ...
}

I'd recommend both: Fix 1 is the correct semantic fix (an unfound ref key means an empty list, not a missing list), and Fix 2 makes ForEach safe to call on nil receivers as defensive programming, since PackageRefList is used widely throughout the codebase.

Requirements

All new code should be covered with tests, documentation should be updated. CI should pass.

Also, to speed up things, if you could kindly "Allow edits and access to secrets by maintainers" in the
PR settings, as this allows us to rebase the PR on master, fix conflicts, run coverage and help with
implementing code and tests.

Description of the Change

Checklist

  • allow Maintainers to edit PR (rebase, run coverage, help with tests, ...)
  • unit-test added (if change is algorithm)
  • functional test added/updated (if change is functional)
  • man page updated (if applicable)
  • bash completion updated (if applicable)
  • documentation updated
  • author name in AUTHORS

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.89%. Comparing base (a20eb68) to head (c5960d4).

Files with missing lines Patch % Lines
deb/reflist.go 0.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1565      +/-   ##
==========================================
- Coverage   77.22%   76.89%   -0.34%     
==========================================
  Files         161      161              
  Lines       15080    15082       +2     
==========================================
- Hits        11646    11597      -49     
- Misses       2291     2344      +53     
+ Partials     1143     1141       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant