meta: Isolated fuzzing of net processing

dergoegge commented at 4:06 pm on April 20, 2023: member

Efficient isolated fuzzing of our message processing code (net processing) would be very valuable. However, to make that deterministic, fast and fuzzer friendly it appears that extensive refactoring is required. There are three main blockers: module separation (net/net processing/validation split), determinism, performance.

I am gonna use this issue to track and motivate the work that needs to be done. It would be great if we can achieve the same outcome with less refactoring but I don’t see how at the moment. Open to suggestions and feedback.

Module separation

Once we have a clean separation between our net, net processing and validation modules, we can fuzz/test them in isolation to maximize the bug yield (🐛, 🪲, 🪳). By design, most fuzzing engines are not great at finding bugs when the scope of the targets is too large (e.g. fuzzing net, net processing and validation all at once).

net / net processing split (https://github.com/bitcoin/bitcoin/pull/28252)
- Move application layer data from net to net processing (#19398)
  - #26621
- Define a clean interface between net processing <-> net
  - #25268
    - #25500
    - #25572
  - #27257
net processing / validation split (lots of overlap with kernel)
- Reduce cs_main usage in net processing
  - #26151
  - #26140
  - Kill CNodeState
- Define a clean interface between net processing <-> validation

Determinism

Non-deterministic fuzz targets are less efficient at finding bugs and debugging non-reproducible bugs is annoying/costly.

All required components should be newly setup for each target execution
- Can be done currently using our testing setup suite but that’s waaaay too slow for fuzzing.
Avoid GetRand() (mock it?)
Avoid globals (are there any globals left in net processing?)

Performance

Fuzzing is a search, the faster the search - the better.

LevelDB (is in memory but still slow?)
- Probably solved by mocking out validation?
PeerManager construction is expensive because it requires multi megabyte allocations each time (mostly due to large bloom filters)
Disk usage
- Block storage
  - #27125
  - Create a mock of BlockManager for in memory block storage?
- BanMan (banlist is written to disk)

Misc

Decouple net processing from ArgsMan (args setup is slow)
- Allows to fuzz the peerman options without depending on the global args man
- #27499

Please let me know if you think there are PRs that aren’t listed here but should be.

MarcoFalke commented at 4:27 pm on April 20, 2023: member

I don’t think that #https://github.com/bitcoin/bitcoin/pull/27499 is going to increase fuzz performance. Parsing should currently only be done at the beginning once, before any fuzzing happens. Even if parsing were in the hot loop, it probably wouldn’t be noticeable. And if you want to construct a PeerManager for each fuzzing iteration, it seems too slow either way, unless you find a way to skip the memory allocations (as you said yourself).

MarcoFalke added the label Brainstorming on Apr 20, 2023

MarcoFalke added the label Tests on Apr 20, 2023

dergoegge commented at 4:44 pm on April 20, 2023: member

I don’t think that #https://github.com/bitcoin/bitcoin/pull/27499 is going to increase fuzz performance.

I think the separation from gArgs makes sense either way, but yea not really a performance improvement. I added the args stuff because I was investigating performance for a target that creates a new TestingSetup each iteration. The argsman setup with all arguments actually was one of the slower things (among block tree db, chainman setup and blockfile creation).

And if you want to construct a PeerManager for each fuzzing iteration, it seems too slow either way, unless you find a way to skip the memory allocations (as you said yourself).

Afaict the allocations are mostly due to one or two large bloom filters. I think we could lazily initialize the filters on demand, so executions that don’t reach code that use the filters would benefit. Or alternatively, we could allow to set the size of these filters when creating a peerman (i think that would be fine for tests).

ajtowns commented at 1:56 am on August 14, 2023: contributor

What is this going to fuzz exactly? The possible behaviour of an individual p2p message in isolation? The p2p protocol state machine for a single peer? Is this intended to be abstracted away from validation/mempool behaviours (ie, stubbing them out)?

dergoegge commented at 12:50 pm on August 14, 2023: member

What is this going to fuzz exactly? Is this intended to be abstracted away from validation/mempool behaviours (ie, stubbing them out)?

Mostly all code in net_processing.cpp (and relevant modules like orphanage, tx tracker) but excluding net, validation or mempool logic. Stubbing the latter out gives us control over testing net processing behavior related to these adjacent modules by being able to mock their behavior and state.

I’m not saying that we shouldn’t fuzz the other modules, just that it makes more sense to do in isolation for the type of fuzzing we currently utilize. We can also fuzz the integration of them all but snapshot fuzzing is probably much better for that due to quite heavy initial state requirements (I have been working on this as well using https://nyx-fuzz.com/).

The possible behaviour of an individual p2p message in isolation? The p2p protocol state machine for a single peer?

Yes and yes, each would be its own target. Another example would be the processing of a sequence of messages e.g. a target per protocol flow we are interested in testing: version handshake, compact block relay, tx relay, etc. All of this can happen for a single peer or multiple.

ajtowns commented at 1:52 am on August 17, 2023: contributor

Mostly all code in net_processing.cpp (and relevant modules like orphanage, tx tracker) but excluding net, validation or mempool logic. Stubbing the latter out gives us control over testing net processing behavior related to these adjacent modules by being able to mock their behavior and state.

I guess the thing I’m wondering is if/how you get coverage on behaviours like “tx is missing witness data”, “we got a tx via txid so can cancel out requests via wtxid”, “compact block is invalid”, “header chain does/doesn’t have sufficient work”, “parent is (not) in mempool”, etc?

I wonder if it would make sense to have a … PeerManChainPoolInterface that (when not stubbed) has the mempool/chainman references instead of PeerManager having them, and (as a result) all of PeerManagerImpls calls to mempool/chainman go through it? After #27675 there’s only a single trivial LOCK(m_mempool.cs) left in net_proc, so I think it would just be a bunch of search-and-replaces to abstract those out? That seems like it would be handy for stubbing, but also make it a lot clearer what parts of validation/mempool are actually needed by net_processing, which might help with reducing cs_main or otherwise cleaning up validation/mempool?

dergoegge commented at 1:54 pm on August 17, 2023: member

I wonder if it would make sense to have a … PeerManChainPoolInterface

Yea that is what I had in mind as well and what I was alluding to by “net processing / validation split” in the PR description. Alternatively we could work on making the actual mempool/chainman interfaces mockable/more abstract but that might be more work and would also interfere with kernel.

For the PeerManChainPoolInterface approach, I would probably go with a separate interface for mempool and chainman but that’s just a minor detail. I think abstracting out chainman would be most of the work but also the biggest gain for fuzzing since that would avoid chainman setup/teardown.

I can code up a parent PR/branch (similar to #28252) for this (I already had some of this done while working on a block download module).

ajtowns commented at 2:04 pm on August 17, 2023: contributor

I think even just having it as documentation of what parts of mempool/validation are used by net would be interesting – I don’t think the boundary there is entirely clean; eg the CompareInvMempoolOrder stuff could probably be moved to txmempool.

meta: Isolated fuzzing of net processing #27502

Module separation

Determinism

Performance

Misc