This is a full implementation of Erlay. Its purpose is to check the integrity and correctness of the implementation against changes/additions that may originate from the review process and/or rebases on top of newer functionality.
This is not to be merged. Functionality will be spread across multiple smaller PRs to ease the review process.
Approach
The implementation approach builds on the following assumptions:
Fanout (the current relay method) is faster than Erlay, but less bandwidth-efficient
Fanout is optimal if the node we want to announce a certain transaction to doesn’t know about it (but of course, we don’t have that information)
The general approach works as follows:
Reconciliation is used alongside fanout to relay transactions across the network. For Erlay nodes, the relay method will be decided per-transaction, instead of per connection, meaning that Erlay connections will do both fanout and reconciliation depending on the transaction (legacy connections will do only fanout, obviously).
The parameters selected for fanout are minimized to maximize the bandwidth savings. The currently selected defaults are 4 outbound peers and 10% of inbounds. The relay logic depends on the type of connection and how the transaction has been received:
For outbound connections, if the transaction was received via fanout (or originates in us), we fanout until up to N peers know about it. We do this by checking the m_tx_inventory_known_filter, so their announcements also count. If the transaction was received via reconciliation, we simply reconcile with the rest of our peers.
For inbounds, we select 10% of our connections and rotate that selection periodically.
The reasoning is that we are trying to guess how far the transaction has made it into the network with imperfect information. Knowing that fanout is faster than reconciliation, we want a higher fanout rate at the very beginning of the propagation, to get as far as we can while fanout is still fully efficient. This can be tied to how many of our peers already know about the transaction. Once the transaction is sufficiently spread, we can just reconcile it with the rest of our peers.
This does not apply to inbounds, as they are not trusted and the metric could easily be abused; it could also be used to leak transaction origin information. For them, we just keep a low fanout rate.
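The per-transaction decision described above can be sketched roughly as follows. This is a simplified Python model, not the PR's C++ code: `Peer`, `should_fanout_to`, and the `knows_tx` flag (a stand-in for a lookup in `m_tx_inventory_known_filter`) are hypothetical names, and counting only outbound peers towards the threshold is an assumption.

```python
from dataclasses import dataclass

# Defaults quoted in the description above; the constant names are hypothetical.
OUTBOUND_FANOUT_THRESHOLD = 4
INBOUND_FANOUT_FRACTION = 0.10  # used when picking the rotating inbound subset


@dataclass
class Peer:
    peer_id: int
    is_inbound: bool
    supports_erlay: bool
    knows_tx: bool = False  # stand-in for m_tx_inventory_known_filter


def should_fanout_to(peer, peers, received_via_recon, inbound_fanout_targets):
    """Return True to fanout to `peer`, False to reconcile with it instead."""
    if not peer.supports_erlay:
        return True  # legacy connections only do fanout
    if peer.is_inbound:
        # Inbounds: fanout only to the currently selected (rotating) subset.
        return peer.peer_id in inbound_fanout_targets
    if received_via_recon:
        return False  # received via reconciliation: just reconcile
    # Outbounds: fanout until enough outbound peers know the transaction
    # (assumption: only outbound peers count towards the threshold).
    knowing = sum(1 for p in peers if not p.is_inbound and p.knows_tx)
    return knowing < OUTBOUND_FANOUT_THRESHOLD
```

In this model, a peer's own announcement would set `knows_tx`, so it counts towards the threshold just like one of our own announcements.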
Testing and simulating
The last two commits of this PR are currently for simulation only. They make it easy to configure the inbound/outbound fanout rates without having to recompile the code, and make full reconciliation more efficient.
DrahtBot
commented at 5:18 pm on June 12, 2024:
contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
See the guideline for information on the review process.
A summary of reviews will appear here.
Conflicts
Reviewers, this pull request conflicts with the following ones:
#32189 (refactor: Txid type safety (parent PR) by marcofleon)
#30116 (p2p: Fill reconciliation sets (Erlay) attempt 2 by sr-gi)
#29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)
#28463 (p2p: Increase inbound capacity for block-relay only connections by mzumsande)
#27826 (validation: log which peer sent us a header by Sjors)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
LLM Linter (✨ experimental)
Possible typos and grammar issues:
“annoincing” -> “announcing” [typo in comment slows comprehension]
“for out own transactions” -> “for our own transactions” [typo in comment]
“based simulation results” -> “based on simulation results” [missing “on”]
“No transaction were created” -> “No transactions were created” [plural agreement]
“REQRXRCNCL” -> “REQTXRCNCL” [message name typo in test log]
sr-gi marked this as a draft
on Jun 12, 2024
sr-gi force-pushed
on Jun 12, 2024
DrahtBot added the label
CI failed
on Jun 13, 2024
DrahtBot
commented at 4:17 am on June 13, 2024:
contributor
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
sr-gi force-pushed
on Apr 14, 2025
sr-gi force-pushed
on Apr 15, 2025
sr-gi force-pushed
on Apr 15, 2025
sr-gi force-pushed
on Apr 15, 2025
sr-gi force-pushed
on Apr 16, 2025
sr-gi force-pushed
on Apr 16, 2025
DrahtBot removed the label
CI failed
on Apr 16, 2025
sr-gi force-pushed
on Apr 18, 2025
sr-gi force-pushed
on Apr 18, 2025
sr-gi force-pushed
on Apr 18, 2025
DrahtBot added the label
CI failed
on Apr 18, 2025
DrahtBot
commented at 8:13 pm on April 18, 2025:
contributor
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
sr-gi force-pushed
on Apr 18, 2025
sr-gi force-pushed
on Apr 21, 2025
sr-gi force-pushed
on Apr 21, 2025
sr-gi force-pushed
on Apr 21, 2025
DrahtBot removed the label
CI failed
on Apr 21, 2025
sr-gi force-pushed
on Apr 22, 2025
DrahtBot added the label
CI failed
on Apr 22, 2025
DrahtBot removed the label
CI failed
on Apr 24, 2025
DrahtBot
commented at 11:09 pm on April 29, 2025:
contributor
🐙 This pull request conflicts with the target branch and needs rebase.
DrahtBot added the label
Needs rebase
on Apr 29, 2025
sr-gi force-pushed
on Jun 18, 2025
dergoegge
commented at 3:43 pm on June 20, 2025:
member
I was fuzzing this branch with fuzzamoto and it looks like it actually found a remotely reachable assertion.
I’m surprised our existing fuzz tests do not catch this (process_messages might but I haven’t tried), but it looks like we actually never call Minisketch::Deserialize with bytes straight from the fuzzer but rather only with result from Minisketch::Serialize (maybe to avoid the assertion? not sure):
Base64 of the bytes passed to Deserialize: 0NDf7u7u0NDQ7u7u7u7u7u7Q7u7u7u7qKSm40NDQ0NDQ
sr-gi force-pushed
on Jun 20, 2025
refactor: remove legacy comments
These comments became irrelevant in one of the previous code changes.
They simply don't make sense anymore.
78b13474df
p2p: Functions to add/remove wtxids to tx reconciliation sets
They will be used later on.
2a57e290b2
p2p: Make short id collision detectable when adding wtxids to tx reconciliation sets
d83a72a382
p2p: Add PeerManager method to count the amount of inbound/outbounds fanout peers
cabb66c12b
p2p: Cache inbound reconciling peers count
It helps to avoid recomputing every time we consider
a transaction for fanout/reconciliation.
Co-authored-by: Sergi Delgado Segura <sergi.delgado.s@gmail.com>
8eac55df27
p2p: Add method to decide whether to fanout or reconcile a transaction
Fanout or reconciliation is decided on a transaction basis, based on the following criteria:
If the peer is inbound, we fanout to a pre-defined subset of peers (which is rotated periodically).
If the peer is outbound, we will reconcile the transaction if we received it via reconciliation, or
defer the decision to relay time otherwise. At relay time, we will fanout to outbounds until a threshold is met
(selecting peers in the order their timers go off) and reconcile with the rest.
With this approach we try to fanout when we estimate to be early in the propagation of the transaction,
and reconcile otherwise. Notice these heuristics don't apply to inbound peers, since they would be easily
exploitable. For inbounds we just aim for a target subset picked at random.
6e94c65f6a
p2p: Add transactions to reconciliation sets
Transactions eligible for reconciliation are added to the reconciliation sets. For the remaining txs, low-fanout is used.
Co-authored-by: Gleb Naumenko <naumenko.gs@gmail.com>
749b586641
p2p: Add consider_fanout to RelayTransaction
This can be squashed into the previous commit, it is split for now to ease review
When scheduling the relay of a transaction (RelayTransaction) we should consider
whether it is worth fanning it out, or only reconciling it. This depends, partially,
on how the transaction was received.
For non-Erlay peers, we always consider_fanout (in fact, we only fanout).
For Erlay peers, if the peer is inbound, we always consider fanout, and defer the
decision of whom to fanout to until relay time. If the peer is outbound, we consider fanout if the
transaction was received via fanout, and only reconcile if it was received via reconciliation
(who to fanout to is also deferred to relay time).
Until the Erlay P2P flow is merged, consider_fanout is always true.
c3cce43d0c
p2p: Add helper to compute reconciliation tx short ids and a cache of short ids to wtxids
475f845e21
p2p: Deal with shortid collisions for reconciliation sets
If a transaction to be added to a peer's recon set has a short id collision (a previously
added wtxid maps to the same short id), both transactions should be fanout, given
our peer may have added the opposite transaction to our recon set, and these two
transactions won't be reconciled.
422085b919
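A rough Python sketch of the collision handling described in this commit message follows. `ReconSet` and its methods are hypothetical names, the short id function is passed in rather than derived from the negotiated salts, and removing the earlier entry from the set on collision is an assumption made for illustration.

```python
class ReconSet:
    """Toy reconciliation set indexed by short id (not the PR's class)."""

    def __init__(self, short_id_fn):
        self._short_id_fn = short_id_fn  # maps a wtxid to its short id
        self._by_short_id = {}           # short id -> wtxid

    def __len__(self):
        return len(self._by_short_id)

    def add(self, wtxid):
        """Add `wtxid`; return the wtxids that must be fanout instead.

        The returned list is empty unless a different wtxid with the same
        short id is already in the set.
        """
        short_id = self._short_id_fn(wtxid)
        existing = self._by_short_id.get(short_id)
        if existing is None:
            self._by_short_id[short_id] = wtxid
            return []
        if existing == wtxid:
            return []  # already present, nothing to do
        # Collision: neither transaction can be reconciled reliably, so
        # both are handed back to the caller for fanout (assumption: the
        # earlier entry is also dropped from the set).
        del self._by_short_id[short_id]
        return [existing, wtxid]
```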
p2p: Add peers to reconciliation queue on negotiation
When we're finalizing negotiation, we should add the peers
for which we will initiate reconciliations to the queue.
Co-authored-by: Sergi Delgado Segura <sergi.delgado.s@gmail.com>
3f38c76b9f
p2p: Track reconciliation requests schedule
We initiate reconciliation by looking at the queue periodically
with equal intervals between peers to achieve efficiency.
This will be later used to see whether it's time to initiate.
40f3bfb7a4
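One way to picture the evenly spaced schedule from this commit message is the toy Python model below. `ReconQueue`, `next_peer_to_reconcile`, and the 8 second interval are assumptions made for illustration; they are not taken from the PR.

```python
from collections import deque

RECON_REQUEST_INTERVAL = 8.0  # seconds between requests to the same peer (hypothetical)


class ReconQueue:
    """Round-robin queue of peers we initiate reconciliation with."""

    def __init__(self):
        self._queue = deque()
        self._next_request_time = 0.0

    def add_peer(self, peer_id):
        self._queue.append(peer_id)

    def next_peer_to_reconcile(self, now):
        """Return the peer to reconcile with now, or None if it is too early."""
        if not self._queue or now < self._next_request_time:
            return None
        peer_id = self._queue.popleft()
        self._queue.append(peer_id)  # move the peer to the back of the queue
        # Equal spacing: with n peers, one request every interval / n seconds.
        self._next_request_time = now + RECON_REQUEST_INTERVAL / len(self._queue)
        return peer_id
```

With two peers queued, this model issues a request every 4 seconds, so each peer is visited once per 8 second interval.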
p2p: Initiate reconciliation round
When the time comes for the peer, we send a
reconciliation request with the parameters which
will help the peer to construct a (hopefully) sufficient
reconciliation sketch for us. We will then use that
sketch to find missing transactions.
Co-authored-by: Sergi Delgado Segura <sergi.delgado.s@gmail.com>
Store the parameters the peer sent us inside the
reconciliation request.
15cd93cd76
p2p: Add helper to compute sketches for tx reconciliation
09c8da4619
p2p: Respond to a reconciliation request
When the time comes, we should send a sketch of our
local reconciliation set to the reconciliation initiator.
Co-authored-by: Sergi Delgado Segura <sergi.delgado.s@gmail.com>
b94b43739a
p2p: Add a function to identify local/remote missing txs
When the sketches from both sides are combined successfully,
the diff is produced. Then this diff can (together with the local txs)
be used to identify which transactions are missing locally and remotely.
c813c9aa9b
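As a minimal Python illustration of this step, assume the decoded diff is a list of short ids and the local set is a mapping from short id to wtxid. `split_difference` is a hypothetical helper, not a function from the PR.

```python
def split_difference(diff_short_ids, local_by_short_id):
    """Split a decoded set difference into locally and remotely missing txs.

    diff_short_ids: short ids present on exactly one side.
    local_by_short_id: dict mapping short id -> wtxid for our local set.
    Returns (local_missing, remote_missing): short ids we must request from
    the peer, and wtxids we hold that the peer lacks.
    """
    local_missing = []
    remote_missing = []
    for short_id in diff_short_ids:
        wtxid = local_by_short_id.get(short_id)
        if wtxid is None:
            local_missing.append(short_id)  # only the peer has it
        else:
            remote_missing.append(wtxid)    # only we have it
    return local_missing, remote_missing
```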
Use txid/uint256 in CompareInvMempoolOrder
This will help to reuse the code later on in the function
to announce transactions.
3bf5ec5454
p2p: Handle reconciliation sketch and successful decoding
When computing sketches, the capacity was derived from the size of the received data, but it was never checked that the received data size was a multiple of the sketch element size {BYTES_PER_SKETCH_CAPACITY}. Therefore, a sketch could be created such that the capacity was smaller than the data to be decoded into it, making it crash.
Happy to run the fuzzer over the new code in case I’ve missed anything.
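The missing check described in this comment amounts to something like the Python sketch below. The value 4 for `BYTES_PER_SKETCH_CAPACITY` (32-bit sketch elements) is an assumption, and `sketch_capacity_from_size` is a hypothetical helper rather than the function changed in the PR.

```python
BYTES_PER_SKETCH_CAPACITY = 4  # assumption: 32-bit sketch elements


def sketch_capacity_from_size(sketch_bytes):
    """Derive the sketch capacity from the received data, or None if malformed.

    Data whose length is not a multiple of the element size must be rejected
    before a sketch is built; otherwise the derived capacity would be smaller
    than the data deserialized into it.
    """
    size = len(sketch_bytes)
    if size % BYTES_PER_SKETCH_CAPACITY != 0:
        return None
    return size // BYTES_PER_SKETCH_CAPACITY
```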
sr-gi force-pushed
on Jun 20, 2025
dergoegge
commented at 4:45 pm on June 23, 2025:
member
Have been running the fuzzer all day and the bug appears to be fixed (and no other bugs so far).
SQUASH-ME: Flags tx as received via recon if it was requested via recondiff
TODO: We may be OK defining a smaller m_recently_requested_short_ids, since
its contents only really matters for less than a minute
58d5ffa897
p2p: Request extension if decoding failed
If after decoding a reconciliation sketch it turned out
to be insufficient to find set difference, request extension.
Co-authored-by: Sergi Delgado Segura <sergi.delgado.s@gmail.com>
6d518abe11
p2p: Be ready to receive sketch extension
Store the initial sketches so that we are able to process
extension sketch while avoiding transmitting the same data.
Co-authored-by: Sergi Delgado Segura <sergi.delgado.s@gmail.com>
09e60e684c
p2p: Prepare for sketch extension request
To be ready to respond to a sketch extension request
from our peer, we should store a snapshot of our state
and capacity of the initial sketch, so that we compute
extension of the same size and over the exact same
transactions.
Transactions arriving during this reconciliation will
be instead stored in the regular set.
Co-authored-by: Gleb Naumenko <naumenko.gs@gmail.com>
5a3d23576b
p2p: Keep track of announcements during txrcncl extension
6954ad5e02
p2p: Handle reconciliation extension request
If peer failed to reconcile based on our initial response sketch,
they will ask us for a sketch extension. Store this request to respond later.
98e7d34573
p2p: Respond to sketch extension request
Sending an extension may allow the peer to reconcile
transactions, because now the full sketch has twice
as much capacity.
443dec95e7
p2p: Handle sketch extension
If a peer sent us an extension sketch, we should
reconstruct a full sketch from it with the snapshot
we stored initially, and attempt to decode the difference.
Co-authored-by: Sergi Delgado Segura <sergi.delgado.s@gmail.com>
ca7d5972be
p2p: Add a finalize incoming reconciliation function
This currently unused function is supposed to be used once
a reconciliation round is done. It cleans the state corresponding
to the passed reconciliation.
f3394738fa
p2p: Handle reconciliation finalization message
Once a peer tells us reconciliation is done, we should behave as follows:
- if it was successful, just respond to them with the transactions they asked for
by short ID.
- if it was a full failure, respond with all local transactions from the reconciliation
set snapshot
- if it was a partial failure (only the low or high part failed after a bisection),
respond with all transactions which were asked for by short id,
and announce local txs which belong to the failed chunk.
ced49517f1
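The three cases in this commit message could be modelled as in the Python sketch below. `ReconResult` and `wtxids_to_announce` are hypothetical names. How the failed chunk is determined after a bisection is not modelled: its short ids are simply passed in.

```python
from enum import Enum


class ReconResult(Enum):
    SUCCESS = 1
    FULL_FAILURE = 2
    PARTIAL_FAILURE = 3


def wtxids_to_announce(result, snapshot, requested_short_ids,
                       failed_chunk_short_ids=()):
    """Pick the wtxids to announce once the peer finalizes a reconciliation.

    snapshot: dict mapping short id -> wtxid, taken from the reconciliation
    set snapshot.
    """
    if result is ReconResult.FULL_FAILURE:
        # Full failure: announce everything in the snapshot.
        return set(snapshot.values())
    # Success or partial failure: answer what was asked for by short id.
    announce = {snapshot[s] for s in requested_short_ids if s in snapshot}
    if result is ReconResult.PARTIAL_FAILURE:
        # Additionally announce local txs that fall in the failed chunk.
        announce |= {snapshot[s] for s in failed_chunk_short_ids if s in snapshot}
    return announce
```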
p2p, test: Add tx reconciliation functional tests
We may still need to add more tests, especially around extensions (if we keep them)
Co-authored-by: Gleb Naumenko <naumenko.gs@gmail.com>
64010cdf35
REMOVE-ME, SIMS-ONLY: Adds options to configure in/out fanout rates
3f0ab2c22a
REMOVE-ME, SIMS-ONLY: shortcut full-recon
If the fanout rates are set to zero, simply shortcut ShouldFanoutTo
b257252a46
sr-gi force-pushed
on Jun 23, 2025
maflcko
commented at 3:34 pm on June 26, 2025:
member
I’m surprised our existing fuzz tests do not catch this (process_messages might but I haven’t tried), but it looks like we actually never call Minisketch::Deserialize with bytes straight from the fuzzer but rather only with result from Minisketch::Serialize (maybe to avoid the assertion? not sure):
I tried this by starting 8 fuzz processes 5 days ago. 7 still run and one of them crashed after two days. I minimized the input:
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
generated: 2025-06-30 12:13 UTC