Separate Contextual checks and handling & switch on enum in net_processing.cpp #10145

pull JeremyRubin wants to merge 1 commits into bitcoin:master from JeremyRubin:netprocessing_enum_rebased changing 1 files +261 −102
  1. JeremyRubin commented at 10:13 pm on April 3, 2017: contributor

    edit 0: Updated to reflect updates mentioned in #10145 (comment)

    This PR separates ProcessMessage into two functions, ContextualProcessMessage and ~ProcessMessage~ Process*Message. This helps with readability, verifiability, and maintainability of the code.

    ContextualProcessMessage generates a list of context dependent “whitelists”, all of which must pass for the incoming message before a call to ProcessMessage may be made. If the whitelists fail, the code that follows should be identical to the previous behavior. The choice of whitelists over blacklists is because it is better to explicitly enable the behaviors desired, rather than to try to block the potential bad features (e.g., adding something unsafe and new won’t be permitted in unstudied contexts). This design should be extensible for adding new features (~up to 64 netmsgs total~ unlimited network messages) as well as new contexts (easy to add new whitelists). There should be very little overhead to check these whitelists as it is all ~bitwise~ bool array lookups.

    ~ProcessMessage now uses an enum to switch to~ Dispatch is now done using a std::map lookup to get the appropriate handler, and is semi “stateless” (the map is const). This makes it easier to verify the code and make dispatch more modular.

    I haven’t benchmarked that the conversion from string->~enum~std::pair<handler_t, whitelist_index> has any performance implication, negative or positive. In theory this code could be faster given fewer branch mispredictions due to the ~switch~function pointer call. Another PR could improve the lookup algorithm ~(trivially, inlining getAllNetMessageTypes might help the compiler a lot)~, but unless it is exotic it should be compatible with this design by replacing the map with the desired scheme. ~I didn’t think there was something obviously faster than the linear lookup, because n is small.~ A std::map lookup should be fairly fast, but perhaps a custom map could be faster.

    ~The correctness of this code is dependent on NetMsgTypeEnum::tag and allNetMessageTypes having the same index order. It would be nice to verify this property at compile time, which should be possible with some recursive constexpr static_assert magic. The default return of ProcessNewMessage is now also false, because the last return is unreachable.~

    See #9608 and https://github.com/theuni/bitcoin/commit/f1e4e281e3f1eb884f8010ac941c82752174bdbe for related work/alternatives.

  2. gmaxwell commented at 10:23 pm on April 3, 2017: contributor
    I don’t get the bitfield stuff here. Why? It adds a lot of code with indirect effects, and means that we cannot use a perfect hash to set the enum value (e.g. stuck with a map of strings at best, though this code doesn’t do that).
  3. theuni commented at 10:39 pm on April 3, 2017: member

    @gmaxwell I agree. I asked @JeremyRubin to check my logic in https://github.com/theuni/bitcoin/commit/f1e4e281e3f1eb884f8010ac941c82752174bdbe, and after reviewing, he wanted to take a stab at a more direct mapping for the initial filter.

    This is an interesting approach, but I think this is much less clear than f1e4e281e3f1eb884f8010ac941c82752174bdbe, and it tangles the rules up with the enum values.

  4. JeremyRubin commented at 10:41 pm on April 3, 2017: contributor

    It’s actually trivial to make this work very well with a perfect hash and this implementation doesn’t add any network dependency on the order. If you have a perfect hash available I could demonstrate :)

    The bitwise stuff is the set of current policies. It’s much more explicit about which operations are allowed in which contexts compared to a list of conditionals. There might be a better way to aggregate those rules together, but those are the rules. Blacklists would be shorter (I used one for the bloom stuff, now that I think of it), but it’s generally easier to audit what is permitted rather than what is not.

  5. gmaxwell commented at 11:16 pm on April 3, 2017: contributor

    If you have a perfect hash available

    gperf works in a pinch

    cat protocol.cpp | grep ‘^const char *’ | cut -d’"’ -f2 | sort | gperf -lCcE

    . It’s much more explicit about which operations are allowed in which contexts compared to a list of conditionals.

    /Generally/ moving function preconditions far away from their code results in defects. When its something that applies basically universally (like the check for the version handshake finishing), then it can make sense… but I think having if (importing) return; at the top of a message handling function is a lot more maintainable than what is effectively an additional state machine.

  6. fanquake added the label P2P on Apr 3, 2017
  7. JeremyRubin commented at 2:12 am on April 4, 2017: contributor

    cat protocol.cpp | grep ‘^const char *’ | cut -d’"’ -f2 | sort | gperf -lCcE

    As promised, perfect hashing patch in https://github.com/JeremyRubin/bitcoin/tree/perfect_hashing (a little bit less clean than it could be, because I didn’t want to modify protocol.h, but you get the idea).

    It’s like 8 lines of code. At initialize you just fill up a translation table to map the perfect hashes. Note that I slightly tweaked the API of the gperf hash to return the hash key and max+1 on fail to make this design easier.

    What’s nice about the translation table is it could be modified to store the function pointers to the handlers directly as well, skipping the jump table/switch.

    /Generally/ moving function preconditions far away from their code results in defects. When its something that applies basically universally (like the check for the version handshake finishing), then it can make sense… but I think having if (importing) return; at the top of a message handling function is a lot more maintainable than what is effectively an additional state machine.

    I’m mixed on this one. I agree that having preconditions closer to the code can be good; but it’s also good to have a non-exposed function which only takes sanitized inputs, and do the sanitizing elsewhere. I agree with the state machine comment, but that was the existing state of the code: there is currently an implicit state machine on what order messages were allowed to come in. This PR makes it more explicit; if you want to make it even more explicit (e.g., ProtocolStateMachine class I think that would be great :)). This at least makes it really easy to do whatever you want with the actual dispatch as it is state independent.

    I don’t think it’s more maintainable to have repeated preconditions throughout the code, because then it is easier to forget to check a precondition in a handler and you repeat precondition checking code, leading to more opportunity for error. It’s a trade off.

    It is possible to re-check these preconditions if critical, which would maybe be “the best/worst” of both worlds.

  8. laanwj commented at 6:27 am on April 4, 2017: member

    A serious drawback of perfect hashing is that it makes it hard to add message types, especially in PRs or third-party patches. It reduces flexibility.

    Unless it can be clearly shown from performance results that matching a small string in e.g. a sorted table or run-time constructed hash is really a performance sink, and given the small number of small messages I would be really surprised (12 bytes isn’t even two 64-bit words!), I’d prefer if adding a message type was just adding a line.

    ProtocolStateMachine

    If you go this way, I don’t think there should be one protocol state machine. Different concerns (e.g. initial negotiation, handling pings, handling transactions, handling blocks, handling filters) could be separate state machines that handle groups of messages (this was @sipa’s idea). Creating a separate handler class for every message type would be typical OOP overkill, but for separate concerns I think it’d make sense and would help untangle the current labyrinth.

  9. sipa commented at 7:07 am on April 4, 2017: member
    I think using a perfect hash is unnecessary overkill here.
  10. JeremyRubin force-pushed on Apr 4, 2017
  11. JeremyRubin force-pushed on Apr 4, 2017
  12. JeremyRubin commented at 9:09 pm on April 4, 2017: contributor

    Ultimately if we want to check preconditions, it can’t hurt to check some general preconditions in ContextualProcessMessage which apply to either the ProcessMessage state machine (e.g., that only one version may be received) or affect more than one handler. Individual handlers that rely on global preconditions should also re-check these global preconditions to harden them against a failure in ContextualProcessMessage.

    I pushed up a better version.

    More Modular:

    • there is no external dependency on the value of message types in the enum
    • Each handler is a separate function (this duplicates/replaces some of @jtimon’s work)

    Cleaner:

    • The whitelists are now a std::array<bool, …> rather than (previously) a uint64_t, so no masking just indexing
    • Identifiers are renamed to be a bit more clear
    • whitelists are now separated into global and connection specific
    • diff is easier to read

    More Extensible:

    • The whitelists are now a std::array<bool, …> rather than (previously) a uint64_t, so no refactoring needed to have more than 64 messages. Adding new whitelists is also easier/cleaner.
    • The use of the function pointers/whitelist arrays makes future work easier, including having per-node handler tables/whitelists (e.g., if a client says it is SPV disable certain operations).

    old version exists at https://github.com/jeremyrubin/bitcoin/tree/netprocessing_enum_backup for reference.

  13. jtimon commented at 4:44 pm on April 5, 2017: contributor

    I only looked fast at the changes and read the rest of the comments, but this does way more than #9608 , even if #9608 would rebase on top of https://github.com/theuni/bitcoin/commit/f1e4e281e3f1eb884f8010ac941c82752174bdbe (which at a glance look like something good to do before #9608, but I haven’t reviewed it deeply because I’m not that familiar with the network code).

    #9608 is very easy to review and trivial to rebase (or re-write) and personally I think it makes the network code easier to read for people like me (perhaps combined with https://github.com/theuni/bitcoin/commit/f1e4e281e3f1eb884f8010ac941c82752174bdbe even more, I haven’t tried combining them). In contrast, it seems like this PR would make harder for me to read at a glance (or maybe not, or maybe it is still worth it, I honestly don’t know). Although I’ve written #9608 (which as said is trivial to write IMO, just a little bit painful [could have been lesss painful if I still used eclipse http://help.eclipse.org/neon/index.jsp?topic=%2Forg.eclipse.jdt.doc.user%2FgettingStarted%2Fqs-ExtractMethod.htm ]), I don’t feel very capacitated to review this PR or even https://github.com/theuni/bitcoin/commit/f1e4e281e3f1eb884f8010ac941c82752174bdbe which looks much simpler.

  14. Separate Contextual checks and handling in net_processing.cpp. 1b389b0e7e
  15. JeremyRubin force-pushed on Sep 5, 2017
  16. JeremyRubin commented at 11:34 pm on September 5, 2017: contributor
    rebased to 1b389b0
  17. in src/net_processing.cpp:1175 in 1b389b0e7e
    1193-        } else {
    1194-            pfrom->fDisconnect = true;
    1195-            return false;
    1196-        }
    1197-    }
    1198+static bool ContextualProcessMessage(CNode* pfrom, const std::string&
    


    ryanofsky commented at 5:24 pm on September 6, 2017:
    Maybe leave this as ProcessMessage instead of renaming to ContextualProcessMessage. “Contextual” doesn’t seem that elucidating, and the PR could be a little smaller without this change.
  18. in src/net_processing.cpp:1179 in 1b389b0e7e
    1197-    }
    1198+static bool ContextualProcessMessage(CNode* pfrom, const std::string&
    1199+        strCommand, CDataStream& vRecv, int64_t nTimeReceived, const
    1200+        CChainParams& chainparams, CConnman& connman, const std::atomic<bool>&
    1201+        interruptMsgProc);
    1202+namespace ProcessMessageHandler {
    


    ryanofsky commented at 5:28 pm on September 6, 2017:
    Style guide probably changed since this PR was written, but currently says to use a snake_case name for a namespace, put the brace on the next line, and not indent the contents. Could maybe name this “handlers” instead of “handler” too since it is a collection of handlers.
  19. in src/net_processing.cpp:1206 in 1b389b0e7e
    1199@@ -1218,19 +1200,12 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
    1200                 LogPrint(BCLog::NET, "Unparseable reject message received\n");
    1201             }
    1202         }
    1203+        return true;
    1204     }
    1205 
    1206-    else if (strCommand == NetMsgType::VERSION)
    1207+static bool ProcessVersionMessage(CNode* pfrom, CDataStream& vRecv, int64_t nTimeReceived, const CChainParams& chainparams, CConnman& connman, const std::atomic<bool>& interruptMsgProc)
    


    ryanofsky commented at 5:47 pm on September 6, 2017:

    You might be able to cut down on the size of the PR by making these handlers classes instead of functions. E.g. you wouldn’t need to repeat these pfrom, vRecv, nTimeReceived declarations because they could be members initialized by an inherited constructor. strCommand variables could be static members, so PushMessage calls wouldn’t need to change. And you could replace handler_registry table with initialization from a flat list of handler classes.

    I could imagine other reviewers liking functions more than classes though, and the differences are pretty superficial, so should just consider this a tentative suggestion.

  20. in src/net_processing.cpp:2627 in 1b389b0e7e
    2626+        GETBLOCKTXN,
    2627+        BLOCKTXN,
    2628+        // Unknown Index
    2629+        MSG_TYPE_UNKNOWN,
    2630+        // Size
    2631+        INDEX_COUNT,
    


    ryanofsky commented at 5:57 pm on September 6, 2017:
    Maybe add a comment here that this must be last.
  21. in src/net_processing.cpp:2673 in 1b389b0e7e
    2672 
    2673 
    2674 
    2675+    //! Message Processing Whitelists (default init 0)
    2676+    //! Not const for initialization
    2677+    static std::array<bool, INDEX_COUNT> KNOWN_MESSAGES {};
    


    ryanofsky commented at 6:05 pm on September 6, 2017:
    No good reason not to make these constant, I think. You can assign them values at initialization by returning from a function or lambda.

    ryanofsky commented at 6:39 pm on September 6, 2017:

    Would be good to have comments describing what each whitelist does, and maybe the types of messages make sense to have in the whitelist. For example, it’s not immediately obvious which messages should go in KNOWN_MESSAGES (all the message types, or only the messages types with handlers).

    Alternately could add an overall comment here pointing to the ContextualProcessMessage function for specifics on what the whitelists do.

  22. in src/net_processing.cpp:2681 in 1b389b0e7e
    2680+    static std::array<bool, INDEX_COUNT> BEFORE_VERSION {};
    2681+    static std::array<bool, INDEX_COUNT> AFTER_VERACK {};
    2682+    static std::array<bool, INDEX_COUNT> NO_REQUIRE_BLOOM {};
    2683+
    2684+    bool init_whitelists() {
    2685+        for (auto x : { VERSION, VERACK, ADDR, INV, GETDATA, MERKLEBLOCK,
    


    ryanofsky commented at 6:36 pm on September 6, 2017:
    Could you initialize this from the registry table instead of listing all the messages again?
  23. ryanofsky commented at 7:17 pm on September 6, 2017: member

    Light utACK 1b389b0e7e57534a2a4b18a2fead70c24dd69ced. Light because I didn’t spend as much time as I would like verifying changes in this PR don’t change current behavior. Will spend more time when there are concept acks for the overall approach.

    I think this is a nice improvement. The logic and explanations in ContextProcessMessage make it easier to understand why various checks are being done. The centralized whitelists should make it easier to add new message types with correct checking in the future because you can see at a glance how all the existing message types behave. The separate message handling functions make code more navigable.

  24. theuni commented at 10:25 pm on September 22, 2017: member

    I’ve thought about this a good bit lately, sorry for the delayed response.

    After rewriting this about a dozen different ways now, I’m now convinced that the approach in #9608 is the way to go.

    A dispatcher with registered functions and a state checker (like you’ve done here, and like I attempted as well) seems like the obviously correct approach, but once implemented, it’s far less straightforward than the simplistic #9608.

    We can avoid some duplication and keep some of the whitelist functionality here with a few helper functions like CanProcessWhileImporting(command).

    A dumb if/then/else parser seems icky, but I think it’s the least likely to cause us issues in the future :(

  25. JeremyRubin commented at 0:16 am on September 23, 2017: contributor

    A dumb if/then/else parser seems icky, but I think it’s the least likely to cause us issues in the future :(

    Can you clarify the kinds of future issues you’re worried about? I think something like #9608 is more likely to cause issues. I think verifying correctness of changes to the processing logic is more complicated under #9608. I’d love to understand what other future considerations you’re making though.

    A dispatcher with registered functions and a state checker (like you’ve done here, and like I attempted as well) seems like the obviously correct approach, but once implemented, it’s far less straightforward than the simplistic #9608.

    I think that there may be a similarity bias. #9608 is more similar to the existing code, which makes it more straightforward. But my guess would be that #10145 is much easier to understand for anyone new to the project/seeing the code for the first time.

    Personally, I think this design has the following benefits over the ‘simpler’ design:

    • Easier to ‘prove correct’ design/write exhaustive tests for
    • Extensible to per connection handler tables
    • Extensible to middlewares (e.g., a statistics middleware or maybe an encryption one)
    • Easier to add new message types & less chance of hitting an unforseen edge case
    • Lower latency dispatch
  26. theuni commented at 8:26 pm on September 25, 2017: member

    Can you clarify the kinds of future issues you’re worried about? I think something like #9608 is more likely to cause issues. I think verifying correctness of changes to the processing logic is more complicated under #9608. I’d love to understand what other future considerations you’re making though.

    Because of all of the pesky interactions with our own state, we can only go so far in laying out the rules up-front.

    For example, someone could be forgiven for assuming (based on the whitelist here) that INVs are processed while importing/reindexing, though in reality, there’s only a tiny wallet interaction.

    Similarly, IsInitialBlockDownload() isn’t handled by the dispatcher, leaving it up to individual messages to decide. And that’s as it should be, because each message will have their own criteria for that.

    So if the rules can’t be completely enumerated, we’re only spreading them out and adding more places to check (or miss). The current behavior (and that of #9608) is to only check for proper handshake behavior outside of individual message parsing. That basically boils down to:

    0if (!node->nVersion && strCommand != NetMsgType::VERSION) return;
    1else if (!node->fSuccessfullyConnected && strCommand != NetMsgType::VERACK) return;
    2Dispatch(strCommand);
    

    I might be convinced that a slimmed-down version of this that only performed ^^ checks before dispatching would be reasonable.

  27. MarcoFalke added the label Needs rebase on Jun 6, 2018
  28. laanwj referenced this in commit 021dce935a on Aug 25, 2018
  29. MarcoFalke commented at 4:31 pm on August 25, 2018: member
    Closing for now. Let me know when you want to continue working on this
  30. MarcoFalke closed this on Aug 25, 2018

  31. JeremyRubin commented at 6:43 pm on September 16, 2018: contributor

    @MarcoFalke my read was that there was not consensus that this approach was welcome.

    If it’s worth refreshing, I’d be happy to.

  32. laanwj removed the label Needs rebase on Oct 24, 2019
  33. PastaPastaPasta referenced this in commit c261173669 on Jan 25, 2020
  34. UdjinM6 referenced this in commit 31066138c0 on Jan 31, 2020
  35. UdjinM6 referenced this in commit e5e3572e9d on Feb 3, 2020
  36. FornaxA referenced this in commit fa74145cee on Jul 6, 2020
  37. ckti referenced this in commit 77002d0909 on Mar 28, 2021
  38. MarcoFalke locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me