Skip to content

Aggregated block proof - devnet5#717

Merged
tcoratger merged 22 commits into
mainfrom
aggregated-block-proof
May 20, 2026
Merged

Aggregated block proof - devnet5#717
tcoratger merged 22 commits into
mainfrom
aggregated-block-proof

Conversation

@anshalshukla
Copy link
Copy Markdown
Collaborator

@anshalshukla anshalshukla commented May 14, 2026

🗒️ Description

Implements multi-message recursive block signature aggregation. All the attestation and block signatures are coalesced into a single block proof.

The blocks are deconstructed for additional attestation_data when they can help move the target. These new payloads are only aggregated when the chain is in sync as catching up nodes should not spam the network.

coverage-gate, fill-tests, and interop-tests all switched ubuntu → macos in this PR, and Justfile dropped -n auto from fill-ci. This might be due to inapt compilation target of linux bindings via leanMultisig-py which lead to 4x CI runs in linux based systems. Will be looked upon in followup PR.

✅ Checklist

  • Ran local quality checks to avoid unnecessary CI fails:
    just check
  • Considered adding appropriate tests for the changes.
  • Considered updating the online docs in the ./docs/ directory.

@anshalshukla anshalshukla force-pushed the aggregated-block-proof branch from 2069744 to c7c93d5 Compare May 14, 2026 04:45
@anshalshukla anshalshukla force-pushed the aggregated-block-proof branch from 2c4babc to 4d0630b Compare May 15, 2026 05:17
@anshalshukla anshalshukla marked this pull request as ready for review May 16, 2026 17:52
Comment thread src/lean_spec/forks/lstar/containers/block/block.py Outdated
Comment thread src/lean_spec/forks/lstar/spec.py
Comment thread src/lean_spec/subspecs/sync/service.py Outdated
Copy link
Copy Markdown
Contributor

@g11tech g11tech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is my major review of this PR:

https://github.com/leanEthereum/leanSpec/pull/717/changes#r3257938656

I think split and restitching belongs with proposer and not an aggregator, or atleast needs to be backed up by proposer as well incase it didn't see aggregator's restitched payload which could be a better solution I guess

so I recommend such split and restitching in block proposal build block as well

@anshalshukla
Copy link
Copy Markdown
Collaborator Author

The only problem in restitching in block proposal as well is that it will further slow the block building process which will eventually lead to missed slots. Or we will have to bump up the spec for validators.

@g11tech
Copy link
Copy Markdown
Contributor

g11tech commented May 18, 2026

The only problem in restitching in block proposal as well is that it will further slow the block building process which will eventually lead to missed slots. Or we will have to bump up the spec for validators.

we still need it as backup otherwise block proposal will fail if proposer can't repack attestations from a branch which moved justification/finalization

so we can still have / expect aggregators to do this job and do heavy lifting but in case of anything that is missing, we still need proposer to be able to do this.

expecting proposers to have a bit more compute itsn't too unrealistic imo, by 2028/2029 we might have more compute as well as optimizations available on a commodity hardware

@anshalshukla
Copy link
Copy Markdown
Collaborator Author

Okay, I've updated it to do deconstruction in onBlock even in case of validators and not just in case of aggregators. I think it is better to continue to do it at the same place and update the store than to look for blocks which can help justify in the build_block function

Copy link
Copy Markdown
Collaborator

@tcoratger tcoratger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @anshalshukla, thanks a lot for this. I've just went through a first careful review through a Claude discussion. I've noticed a couple of points that I think are valid. Please let me know what you think about them. I've let Claude pushed the comments since he is very clear and precise in his explanations.


the 8f9bdc1 move out of the aggregator-only branch is good, but there's a small bug in the new gate (see the inline at sync/service.py:743). Also that helper currently only fires on the gossip path — blocks that arrive via sync/backfill skip it — so a node catching up still doesn't end up with per-attestation Type-1s.

The verify_signatures binding — the proof binds (message_i, slot_i, pubkeys_i) per component internally, but the spec never compares those internal (message_i, slot_i) against hash_tree_root(att_i.data) / att_i.data.slot from the block body. The snark verifies that the supplied pubkeys signed the proof's internal messages — not that those messages match the body. Details inline at spec.py:913. Worth a sync with the lean-multisig folks on whether they'd rather take a verify_type_2_with_messages(...) upstream, or expose info after decompress_without_pubkeys so the spec can compare in Python.

The latest_known_aggregated_payloads seedingon_block still inserts empty sets per attestation (spec.py:1296-1299) and the comment block above still describes the pre-PR behavior. The local write-back you added in _maybe_publish_reaggregated_attestations_from_block lands in latest_new_aggregated_payloads, not latest_known, so on import the block-imported votes still don't feed fork-choice weight on the receiver. Inline at spec.py:1292.

Test coveragetest_reaggregate_from_block.py's own header acknowledges the happy path can't be exercised because split_type_2_by_msg isn't in test-mode bindings yet. The three test bodies all assert published == [], so the ~50 lines that actually do the split/merge are uncovered. Might be worth a line in the PR description so it doesn't slip through as "tested".

CIcoverage-gate, fill-tests, interop-tests switched to macos-latest and Justfile dropped -n auto from fill-ci. GH Actions macos runners are roughly 10× the ubuntu cost — if the switch was working around a flake or OOM, would be great to capture the why in the PR description (xdist + setup_prover's global side-effects in lean_multisig_py is the likely suspect).

The rest are smaller — comments/docstrings that drifted and a couple of typos. Left inline so they're easy to scroll through.

Comment thread src/lean_spec/forks/lstar/spec.py Outdated
Comment thread src/lean_spec/forks/lstar/spec.py Outdated
Comment thread src/lean_spec/forks/lstar/spec.py Outdated
Comment thread src/lean_spec/forks/lstar/spec.py Outdated
Comment thread src/lean_spec/subspecs/sync/service.py Outdated
Comment thread .github/workflows/ci.yml
Comment thread Justfile
Comment thread src/lean_spec/subspecs/xmss/aggregation.py
Comment thread src/lean_spec/forks/lstar/store.py Outdated
@anshalshukla
Copy link
Copy Markdown
Collaborator Author

@tcoratger you can take a look again

tcoratger and others added 3 commits May 19, 2026 19:35
Resolve conflicts in src/lean_spec/subspecs/sync/service.py against
main's recent readability sweep:

- Drop set_publish_agg_fn; reuse main's publish_aggregated_attestation
  as a direct dataclass field on SyncService.
- Drop the standalone _check_sync_trigger; main inlined the SYNCING
  trigger into on_peer_status.
- Drop ZERO_HASH and _ancestor_set imports; main inlined the ancestor
  walk in _default_process_block.
- Keep _pending_block_aggregates, _deconstruct_block_into_store, and
  _publish_pending_block_aggregates from this branch; restore the
  publish-drain call after _replay_pending_attestations in
  on_gossip_block.

While reconciling the deconstruction logic, take the opportunity to:

- Remove the validator_id is None early-return so non-validator nodes
  also accrue block-imported attestation weight after the next
  acceptance tick.
- Gate _pending_block_aggregates.extend on self.is_aggregator so
  non-aggregators no longer republish to gossip.
- Hoist the per-message pubkey layout out of the per-attestation loop.
- Narrow the post-decode except clause to (ValueError, IndexError).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Fix the broken sentence in LstarSpec.verify_signatures' docstring
  (dangling colon now reads as a coherent description of what the
  Type-2 proof binds).
- Delete test_noop_when_not_a_validator: the validator-id early-return
  it exercised was removed so non-validator nodes also accrue
  block-imported attestation weight.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes flagged in the second round of consensus and Python architecture
review:

- sync/service.py: widen the decode_bytes catch to (SSZError,
  ValueError, IndexError). Pre-fix narrowing missed the SSZ exception
  family raised by Container deserialization, so a malformed proof
  would crash _process_block_wrapper instead of being demoted to a
  debug log.
- sync/service.py: align the per-attestation loop variable name with
  the second loop.
- validator/service.py: validate each participant index against the
  active validator set before indexing validators[vid] in _sign_block,
  raising a clear ValueError instead of a deep KeyError if a stale
  partial aggregate is passed in.
- spec.py: document the one-slot deferral of fork-choice weight from
  block-imported attestations.  The empty-set seeding plus the
  subsequent acceptance-tick migration is intentional and not a
  missed update_head call.
- test_aggregation.py: add five Type-2 unit tests covering the
  aggregate-rejects paths, the verify round-trip, the message-swap
  rejection (pins the per-component binding security property), and
  the length-mismatch rejection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tcoratger tcoratger merged commit ac5f259 into main May 20, 2026
24 of 25 checks passed
MegaRedHand added a commit to lambdaclass/ethlambda that referenced this pull request May 26, 2026
…370)

Branched from #378

## Summary

- Bump `lean-multisig` / `leansig_wrapper` to devnet5 HEAD (`0242c909`)
and rewrite `ethlambda-crypto` on the new Type-1 / Type-2 API.
- Align the on-wire block proof with [leanSpec PR
#717](leanEthereum/leanSpec#717) —
`SignedBlock.proof` carries the SSZ-encoded `TypeTwoMultiSignature {
proof: ByteList512KiB }` container, which collapses to `[4-byte LE
offset = 4][type2_wire]` on the wire.
- Port leanSpec PR #717's `SyncService._deconstruct_block_into_store` to
the actor: imported blocks are SNARK-split per attestation, merged with
local partial Type-1s, and (on aggregators) re-published on gossip.
- Re-fixture against leanSpec `d9d2e67` (just past PR #717) and migrate
the prod_scheme key JSON shape so signature and forkchoice spec tests
cover the real cryptographic verifier end-to-end.

## Branch commit list

| sha | what |
|--|--|
| `1cd80dd` | Crate-level integration: new Type-1 / Type-2 wrappers,
real merge in `propose_block`, real `verify_type_2` in
`verify_block_signatures`. |
| `2c9dec0` | First pass at PR #717 envelope: `TypeOneInfo {
participants, proof }`, `TypeTwoMultiSignature { info, proof }`, drop
`bytecode_claim`. `split_type_2_signature(index)` →
`split_type_2_by_message(message)`. |
| `5361136` | Strip per-component Type-1 bytes when packing the Type-2
envelope — real Type-1s are ~225 KiB, N+1 copies blow the (old) 1 MiB
`ByteListMiB` cap. |
| `3199e7d` → `70c7cdb` | Experimental `--crypto-merge-t1-into-t2` flag,
**reverted**. The merge runs synchronously on the actor today; moving it
off-thread is a follow-up. |
| `604ea4c` | Plan B: flatten `TypeOneMultiSignature` to `{
participants, proof }`, delete `TypeOneInfo` / `TypeOneInfos` / Rust
`TypeTwoMultiSignature` wrappers, rename `ByteListMiB` →
`ByteList512KiB`. |
| `cc3df59` | Plan A: `reaggregate_from_block` module + actor hook. Caps
`MAX_REAGGREGATIONS_PER_BLOCK = 4`, skips attestations behind the
store's justified checkpoint, runs only when the chain is in sync.
Aggregator-only republish on gossip. |
| `4238a94` | Bump pinned `LEAN_SPEC_COMMIT_HASH` to `d9d2e67`, switch
fixture generation to `--fork Lstar --scheme=test/prod`, port fixture
parsers to the PR #717 schema (`signedBlock.proof.data` blob,
`attestation.proof.proof.data` for gossip aggregates). |
| `961aba4` | Restore the thin SSZ container header in front of the
merged proof: `SignedBlock::wrap_merged_proof` / `merged_proof_bytes`
helpers; lower `MAX_ATTESTATIONS_DATA` from 16 to 8 to match leanSpec PR
#717. |

## Crypto crate API

| function | wraps | notes |
|--|--|--|
| `aggregate_signatures(pks, sigs, msg, slot)` | `aggregate_type_1([],
raw_xmss, …)` | Type-1 from raw XMSS only |
| `aggregate_mixed(children, raw_pks, raw_sigs, msg, slot)` |
`aggregate_type_1(children, raw_xmss, …)` | mixed Type-1 children + raw
XMSS |
| `aggregate_proofs(children, msg, slot)` | `aggregate_type_1(children,
[], …)` | recursive Type-1 merge |
| `verify_aggregated_signature(proof, pks, msg, slot)` | `verify_type_1`
| Type-1 SNARK verify + explicit binding check |
| `merge_type_1s_into_type_2(parts)` | `merge_many_type_1` | bundle N
Type-1s into a Type-2 |
| `verify_type_2_signature(proof_bytes, pks_per_component,
expected_bindings)` | `verify_type_2` | Type-2 SNARK verify +
per-component binding check; takes `&[u8]` after envelope strip |
| `split_type_2_by_message(proof_bytes, pks_per_component, message)` |
`split_type_2` (after locating index by message) | disaggregate to one
Type-1; mirrors leanSpec `split_by_msg` |

Type-1 / Type-2 proof bytes are `compress_without_pubkeys()` form
throughout. `verify_type_2_signature` and `split_type_2_by_message` take
`&[u8]` so callers feed the raw bytes (post-envelope-strip) directly.

## Wire format

```
TypeOneMultiSignature  { participants: AggregationBits, proof: ByteList512KiB }
SignedBlock.proof bytes: [4-byte LE offset = 4][raw lean-multisig Type-2 wire]
```

The 4-byte prefix is the SSZ Container-with-one-varlen-field offset
header — the spec's `TypeTwoMultiSignature { proof: ByteList512KiB }`
container. `SignedBlock::merged_proof_bytes()` /
`SignedBlock::wrap_merged_proof()` keep the magic number off the call
sites. No Rust struct for `TypeTwoMultiSignature` — per-component
participants come from `block.body.attestations[i].aggregation_bits` and
`block.proposer_index`, not the envelope.

`MAX_ATTESTATIONS_DATA = 8` (down from 16, matching leanSpec PR #717).
The merged Type-2 binds `MAX_ATTESTATIONS_DATA + 1 = 9` components,
within lean-multisig's `MAX_RECURSIONS = 16`.

## Reaggregate-from-block

New `crates/blockchain/src/reaggregate.rs`. After `process_block`
succeeds and the chain is in sync, the actor:

1. Selects up to 4 attestations whose target outruns the store's
justified checkpoint AND whose participants extend the local coverage.
2. `split_type_2_by_message`-splits each one out of the block's merged
Type-2 proof.
3. Merges with locally-held partial Type-1s via `aggregate_proofs`.
4. Writes the combined proof into `latest_new_aggregated_payloads`.
Aggregator-role nodes republish on gossip.

5 unit tests cover the candidate-selection rules without paying SNARK
cost (target-slot gate, participant subset gate, hard cap, priority
ordering).

Each `split_type_2_by_message` runs a fresh SNARK; it currently executes
synchronously on the actor thread. Moving to an off-thread worker
mirroring `aggregation::run_aggregation_worker` is a natural follow-up
if profiling shows it bleeding into the slot budget.

## Test coverage

| Suite | Result |
|---|---|
| `signature_spectests` | 13 / 13 |
| `forkchoice_spectests` | 84 / 84 |
| `stf_spectests` | 35 / 35 |
| `ssz_spectests` | 51 / 51 |
| `test_driver_e2e` (Hive) | 8 / 8 |
| `ethlambda-blockchain` unit | 24 / 24 (incl. 5 new
reaggregate-from-block) |
| Workspace total | 419 / 0 |

Fixture regeneration: `LEAN_SPEC_COMMIT_HASH = d9d2e67`, generated with
`make leanSpec/fixtures` (`uv run fill --fork Lstar --scheme=prod -o
fixtures`). The prod_scheme key JSON shape upstream still has the
pre-#725 flat layout (`attestation_public` / `attestation_secret` at top
level) which the post-#725 `keys.py:395` no longer reads; this branch
carries a one-shot local migration to the nested shape
(`attestation_keypair.public_key` etc.) so the prod-scheme fixture
filler runs. A small upstream PR converting the 12 `prod_scheme/*.json`
files would let us drop the local step.

## Devnet validation

Pending — re-run on a multi-node devnet now that `verify_type_2`
actually executes on the import path. Expected to surface latency cliffs
that the `--crypto-merge-t1-into-t2` flag previously hid.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants