net: fix remaining net assertions #9609

pull theuni wants to merge 5 commits into bitcoin:master from theuni:net-version changing 4 files +102 −111
  1. theuni commented at 6:54 am on January 21, 2017: member

    First change is from @TheBlueMatt.

    https://github.com/bitcoin/bitcoin/pull/8708/commits/462965635e8bf9c2754c27946f8b05560b8ddd84 added assertions to ensure that we never send out any messages before version negotiation has completed, but a few issues remain.

    In cases where the version message failed to deserialize fully, we can end up with only some of the vars set, leading to sending messages that shouldn’t be sent.

    Additionally, the ForEach* functions allow the caller to send messages to all nodes, including those that aren’t yet fully connected. To prevent that, filter out nodes that definitely shouldn’t be sending/processing messages.

    As a follow-up, I’d like to remove the assert, as I think it’s done its job, but I’ll save that discussion for another PR because I think @TheBlueMatt mentioned that he’d prefer to keep it.

  2. fanquake added the label P2P on Jan 21, 2017
  3. MarcoFalke added this to the milestone 0.14.0 on Jan 21, 2017
  4. TheBlueMatt commented at 3:14 pm on January 21, 2017: member

    (fixes #9212)

    On January 21, 2017 1:54:26 AM EST, Cory Fields notifications@github.com wrote:

    First change is from @TheBlueMatt.

    https://github.com/bitcoin/bitcoin/pull/8708/commits/462965635e8bf9c2754c27946f8b05560b8ddd84 added assertions to ensure that we never send out any messages before version negotiation has completed, but a few issues remain.

    In cases where the version message failed to deserialize fully, we can end up with only some of the vars set, leading to sending messages that shouldn’t be sent.

    Additionally, the ForEach* functions allow the caller to send messages to all nodes, including those that aren’t yet fully connected. To prevent that, filter out nodes that definitely shouldn’t be sending/processing messages.

    As a follow-up, I’d like to remove the assert, as I think it’s done its job, but I’ll save that discussion for another PR because I think @TheBlueMatt mentioned that he’d prefer to keep it. You can view, comment on, or merge this pull request online at:

    #9609

    – Commit Summary –

    • Dont deserialize nVersion into CNode, should fix #9212
    • net: deserialize the entire version message locally
    • net: don’t run callbacks on nodes that haven’t completed the version handshake

    – File Changes –

    M src/net.cpp (7) M src/net.h (38) M src/net_processing.cpp (54)

    – Patch Links –

    https://github.com/bitcoin/bitcoin/pull/9609.patch https://github.com/bitcoin/bitcoin/pull/9609.diff

    – You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/bitcoin/bitcoin/pull/9609

  5. in src/net_processing.cpp: in bd3725ef6c outdated
    1246         if (!vRecv.empty()) {
    1247-            vRecv >> pfrom->nStartingHeight;
    1248+            vRecv >> nStartingHeight;
    1249         }
    1250         {
    1251             LOCK(pfrom->cs_filter);
    


    gmaxwell commented at 6:36 pm on January 21, 2017:
    This block no longer accesses a cs_filter protected variable. The lock and braces should be removed.
  6. in src/net_processing.cpp: in bd3725ef6c outdated
    1257@@ -1254,7 +1258,19 @@ bool static ProcessMessage(CNode* pfrom, string strCommand, CDataStream& vRecv,
    1258             return true;
    1259         }
    1260 
    1261+        pfrom->nServices = nServices;
    1262         pfrom->addrLocal = addrMe;
    1263+        pfrom->strSubVer = strSubVer;
    1264+        pfrom->cleanSubVer = SanitizeString(strSubVer);
    1265+        pfrom->nStartingHeight = nStartingHeight;
    1266+        pfrom->fRelayTxes = fRelay;
    


    gmaxwell commented at 6:36 pm on January 21, 2017:
    This was protected by cs_filter above.
  7. in src/net_processing.cpp: in bd3725ef6c outdated
    1266+        pfrom->fRelayTxes = fRelay;
    1267+        pfrom->fClient = !(nServices & NODE_NETWORK);
    1268+
    1269+        // Change version
    1270+        pfrom->SetSendVersion(nSendVersion);
    1271+        pfrom->nVersion = nVersion;
    


    gmaxwell commented at 6:44 pm on January 21, 2017:
    nVersion should be made atomic.
  8. theuni commented at 6:47 pm on January 21, 2017: member
    @gmaxwell ACK all of those, will fix.
  9. theuni force-pushed on Jan 21, 2017
  10. in src/net_processing.cpp: in 3769071e1b outdated
    1290@@ -1278,11 +1291,7 @@ bool static ProcessMessage(CNode* pfrom, string strCommand, CDataStream& vRecv,
    1291         UpdatePreferredDownload(pfrom, State(pfrom->GetId()));
    1292         }
    1293 
    1294-        // Change version
    1295         connman.PushMessage(pfrom, CNetMsgMaker(INIT_PROTO_VERSION).Make(NetMsgType::VERACK));
    


    gmaxwell commented at 8:00 am on January 22, 2017:
    If we set succesfully connected and nversion before sending VERACK is it possible that some other thread will send some other message to this peer before VERACK has been sent?

    theuni commented at 6:10 pm on January 22, 2017:

    Yes, and that’s ok. It’s seen on the current network.

    What’s not ok is that messages before VERACK could possibly end up being sent with the upgraded version. So i suppose setting the sendversion should stay as-is, after the VERACK.


    theuni commented at 6:52 pm on January 22, 2017:

    Er, that doesn’t work, it just reintroduces the problem that we’re attempting to solve.

    A mutex for a group of these attributes may be unavoidable. I’ll poke at it some more.


    gmaxwell commented at 7:09 am on January 23, 2017:
    the setsetversion can be run on based on nversion variable. So there is no reason it can’t be done early with the nversion set later, as far as I can tell.
  11. MarcoFalke renamed this:
    net: fix remaining net assertions (fixes #9212)
    net: fix remaining net assertions
    on Jan 22, 2017
  12. theuni force-pushed on Jan 23, 2017
  13. theuni commented at 0:56 am on January 24, 2017: member
    Ok, I think this is solid now.
  14. morcos commented at 3:46 pm on January 24, 2017: member

    tenuous utACK Read through it and makes sense to me, but not too familiar with this code.

    I’m also not very familiar with feeler connections but would it make sense to move the feeler check before the fSuccessfullyConnected, so that we set fDisconnect first, and then won’t send anything to the feeler peers? Are we meant to be sending addr and GetAddr messages to the feeler peers?

  15. gmaxwell commented at 6:26 pm on January 24, 2017: contributor
    @morcos yes we should send addr to feelers– their purpose is to improve knowledge about the node addresses all around. I agree we should probably set Successfully after the disconnect flag is set for them.
  16. in src/net.h: in ae6779968d outdated
    165@@ -166,7 +166,7 @@ class CConnman
    166     {
    167         LOCK(cs_vNodes);
    168         for (auto&& node : vNodes)
    169-            if(!func(node))
    170+            if (NodeFullyConnected(node) && !func(node))
    


    TheBlueMatt commented at 10:10 pm on January 24, 2017:
    Can we just remove unused functions instead of fixing them?
  17. in src/net_processing.cpp: in ae6779968d outdated
    1279+        }
    1280+
    1281+        // Change version
    1282+        pfrom->SetSendVersion(nSendVersion);
    1283+        pfrom->nVersion = nVersion;
    1284+        pfrom->fSuccessfullyConnected = true;
    


    TheBlueMatt commented at 10:54 pm on January 24, 2017:

    If we’re gonna be using fSuccessfullyConnected (which I think we should), can we set it in VERACK instead and use it in SendMessages? I looked over where we check nVersion for “are we connected” and it all looks like that would leave it at just ProcessMessage, which is right.

    This would mean we at least dont do anything but respond to messages if we haven’t gotten the peer’s verack yet.


    gmaxwell commented at 11:47 pm on January 24, 2017:
    what if someone sends us a VERACK without anything else having happened? :P

    theuni commented at 0:46 am on January 25, 2017:
    @TheBlueMatt That disagrees with my intended future usage of fSuccessfullyConnected (net layer only), but I agree that it makes more sense that way for now. I’ll make that change, and we can discuss further post-0.14.
  18. TheBlueMatt commented at 0:13 am on January 25, 2017: member

    Then we can catch it with our nVersion != 0 check :)

    On January 24, 2017 6:47:38 PM EST, Gregory Maxwell notifications@github.com wrote:

    gmaxwell commented on this pull request.

    •    if((pfrom->nServices & NODE_WITNESS))
      
    •    pfrom->nServices = nServices;
      
    •    pfrom->addrLocal = addrMe;
      
    •    pfrom->strSubVer = strSubVer;
      
    •    pfrom->cleanSubVer = SanitizeString(strSubVer);
      
    •    pfrom->nStartingHeight = nStartingHeight;
      
    •    pfrom->fClient = !(nServices & NODE_NETWORK);
      
    •    {
      
    •        LOCK(pfrom->cs_filter);
      
    •        pfrom->fRelayTxes = fRelay; // set to true after we get
      

    the first filter* message

    •    }
      
    •    // Change version
      
    •    pfrom->SetSendVersion(nSendVersion);
      
    •    pfrom->nVersion = nVersion;
      
    •    pfrom->fSuccessfullyConnected = true;
      

    what if someone sends us a VERACK without anything else having happened? :P

    – You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/bitcoin/bitcoin/pull/9609

  19. gmaxwell commented at 6:29 am on January 25, 2017: contributor
    ACK a79e00b8b251ab3744bd2f1bd48f265c1a7d7912
  20. laanwj commented at 9:02 am on January 26, 2017: member

    ACK a79e00b8b251ab3744bd2f1bd48f265c1a7d7912

    That commit is not part of the repository :)

  21. gmaxwell commented at 9:51 am on January 26, 2017: contributor
    It’s the head I reviewed and tested however. :P (I wondered if someone would notice.) (it sounds like cfields intends to change this more)
  22. theuni force-pushed on Jan 26, 2017
  23. theuni commented at 8:21 pm on January 26, 2017: member

    Fixed up as requested. fSuccessfullyConnected is set last thing when processing VERACK so that we push out SENDCMPCT/SENDHEADERS first.

    We will now avoid doing anything in SendMessages() until the handshake is complete. @morcos That fixes the feeler issue as well, since after VERSION, no more messages will be processed, and nothing else will go out. @gmaxwell VERACK can’t be seen before VERSION, as we refuse to process anything before version has been set.

    Also removed the unused ForEach* in CConnman as requested by @TheBlueMatt.

  24. gmaxwell commented at 8:25 pm on January 27, 2017: contributor
    utACK. Will test. You should consider making a brief release note that some broken P2P implementations (that never send VERACK) may stop working.
  25. in src/net.h: in ae6779968d outdated
    212@@ -213,36 +213,43 @@ class CConnman
    213     void ForEachNode(Callable&& func)
    214     {
    215         LOCK(cs_vNodes);
    216-        for (auto&& node : vNodes)
    217-            func(node);
    218+        for (auto&& node : vNodes) {
    


    TheBlueMatt commented at 5:22 pm on January 28, 2017:
    I know it was already there, but can we remove the auto here? It doesnt save any space, and during review I always find myself thinking “OK, gonna go grep for CNode to find every use of it in the codebase”, which our ever-increasing use of auto breaks.

    theuni commented at 6:01 pm on January 28, 2017:

    It’s conducive to large type changes, though. With this one in particular, when CNode becomes a shared_ptr<CNode>, this code requires no changes.

    For exactly that reason, I actually have local changes which switch all loop operations on vNodes to for (auto&& node : vNodes) :\

  26. TheBlueMatt commented at 5:29 pm on January 28, 2017: member
    Other than my general objection to increased use of auto in the codebase (which I’ll probably go through and get upset and rip out soon enough), utACK 47134d7de2ae57fbcb0715922094eca624ccb0fb
  27. TheBlueMatt commented at 6:28 pm on January 28, 2017: member

    I’m much happier reviewing a massive pile of sed-generated diff than having auto everywhere, but maybe I’m alone… I’ll bring it up at the next meeting (because we didn’t /just/ have a discussion of style or anything…).

    On January 28, 2017 1:01:29 PM EST, Cory Fields notifications@github.com wrote:

    theuni commented on this pull request.

    @@ -213,36 +213,43 @@ class CConnman void ForEachNode(Callable&& func) { LOCK(cs_vNodes);

    •    for (auto&& node : vNodes)
      
    •        func(node);
      
    •    for (auto&& node : vNodes) {
      

    It’s conducive to large type changes, though. With this one in particular, when CNode becomes a shared_ptr<CNode>, this code requires no changes.

    For exactly that reason, I actually have local changes which switch all loop operations on vNodes to for (auto&& node : vNodes) :\

    – You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/bitcoin/bitcoin/pull/9609

  28. in src/net_processing.cpp: in 5e35c8ee1f outdated
    2720@@ -2721,8 +2721,8 @@ bool SendMessages(CNode* pto, CConnman& connman, std::atomic<bool>& interruptMsg
    2721 {
    2722     const Consensus::Params& consensusParams = Params().GetConsensus();
    2723     {
    2724-        // Don't send anything until we get its version message
    2725-        if (pto->nVersion == 0 || pto->fDisconnect)
    2726+        // Don't send anything until the version handshake is complete
    2727+        if (!pto->fSuccessfullyConnected || pto->fDisconnect)
    


    gmaxwell commented at 11:03 pm on January 28, 2017:
    Why isn’t this !NodeFullyConnected(pto) ?

    TheBlueMatt commented at 0:21 am on January 29, 2017:
    I believe because Cory wanted NodeFullyConnected to be a net-specific thing, and not a net_processing thing…its all a bit intermixed at the moment but I believe @theuni wants to fix it up soon.

    theuni commented at 0:58 am on January 29, 2017:
    Yep, exactly that. I have no problem with changing it for the time-being though if you think it’d be more clear.

    gmaxwell commented at 7:24 pm on February 2, 2017:
    I don’t care, just looked like an obvious omission.
  29. MarcoFalke commented at 12:09 pm on January 30, 2017: member
    Needs rebase after #9626
  30. sdaftuar commented at 4:45 pm on February 1, 2017: member
    I’ll echo the tenuous utACK (though this needs to be rebased). I spent some time trying to find ways to break this code and wasn’t able to come up with anything interesting. I do think that we should remove the asserts for the 0.14 release, but I’m indifferent to whether we keep them around in master.
  31. TheBlueMatt commented at 4:54 pm on February 1, 2017: member
    Yes, I’d agree with @sdaftuar and remove the assertion in the 0.14 branch, but leave it in in master.
  32. laanwj commented at 8:50 am on February 2, 2017: member

    I really hate the assertions in the net code. They can and will lead to remotely-triggered DoS attacks. It’s just not acceptable to do debugging in this way, sorry.

    They should be replaced with logging, or maybe an option ‘stop on network errors’ that needs to be enabled explicitly while debugging network code. But this should never be the default behavior.

  33. TheBlueMatt commented at 2:00 pm on February 2, 2017: member

    We use assertions in many places to ensure our logic is functioning - here we’re asserting that the peer we’re accessing has completed the handshake, a pretty important cleanup IMO. Indeed, this assertion is somewhat shit due to the half-cleaned-up state of net at the moment, but I don’t think that means we should remove the assertion in 0.15. Let’s remove it after the branch and open an issue to revisit before 0.15 release.

    On February 2, 2017 3:50:40 AM EST, “Wladimir J. van der Laan” notifications@github.com wrote:

    I really hate the assertions in the net code. They can and will lead to remotely-triggered DoS attacks. It’s just not acceptable to do debugging in this way, sorry.

    They should be replaced with logging, or maybe an option ‘stop on network errors’ that needs to be enabled explicitly while debugging network code.

    – You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/bitcoin/bitcoin/pull/9609#issuecomment-276900251

  34. jtimon commented at 2:37 pm on February 2, 2017: contributor

    I see the way we use asserts as equivalent to exceptions, but making sure they cannot be catched. Perhas activating them optionally as wumpus suggests is interesting. I really don’t understand the reasoning for doing this in 0.14 but not in master. If it needs to be revisited, fine, it can be done after this change. I really dislike that version branches differ from master more than they need, specially when it’s for no good reason like it seems is tje case here. IMO it should be only version change, release notes, etc. And bug fixes that we were NOT aware of before doing the branch.

    On 2 Feb 2017 15:00, “Matt Corallo” notifications@github.com wrote:

    We use assertions in many places to ensure our logic is functioning - here we’re asserting that the peer we’re accessing has completed the handshake, a pretty important cleanup IMO. Indeed, this assertion is somewhat shit due to the half-cleaned-up state of net at the moment, but I don’t think that means we should remove the assertion in 0.15. Let’s remove it after the branch and open an issue to revisit before 0.15 release.

    On February 2, 2017 3:50:40 AM EST, “Wladimir J. van der Laan” < notifications@github.com> wrote:

    I really hate the assertions in the net code. They can and will lead to remotely-triggered DoS attacks. It’s just not acceptable to do debugging in this way, sorry.

    They should be replaced with logging, or maybe an option ‘stop on network errors’ that needs to be enabled explicitly while debugging network code.

    – You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/bitcoin/bitcoin/pull/9609#issuecomment-276900251

    — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bitcoin/bitcoin/pull/9609#issuecomment-276964566, or mute the thread https://github.com/notifications/unsubscribe-auth/AA9jSg6dazVJUu6tZvFzCK6zEbW9N6Q5ks5rYeGYgaJpZM4Lp_La .

  35. TheBlueMatt commented at 3:03 pm on February 2, 2017: member
    @jtimon the reason for not doing it on master is that the assert, if it fires, actually is an indication of a bug. On the flip side, because of the relative youth of this code (and the fact that such bugs are long-standing issues, not regressions), its probably better to not ship something with the assert in just yet. I dont see any problem having master and 0.14 differ by one line…
  36. laanwj commented at 3:10 pm on February 2, 2017: member

    We use assertions in many places to ensure our logic is functioning

    In places where there is no (direct) interface with the outside world it’s somewhat more acceptable. At least if there’d be crashes it is not remotely triggerable. It’s assertions in net code that I have a problem with.

    Let’s remove it after the branch and open an issue to revisit before 0.15 release.

    What I am afraid of is that that will be forgotten and they somehow will make it into a release anyway. I think leaving master even temporarily in a “dangerous” state is very risky.

  37. TheBlueMatt commented at 4:08 pm on February 2, 2017: member
    @laanwj ok, fair… @sdaftuar points out that its probably time to rethink our assertion infrastructure - get more agressive with assertions when not running a production build and get weaker on a few of them when doing so (and probably dont use assert(), since it is rarely what we want, but maybe log such errors in production with a big fat “PLEASE REPORT THIS” tag).
  38. theuni commented at 4:35 pm on February 2, 2017: member

    (Back in town, sorry for slow response)

    I agree with removing the assertion. It probably never should have been there, as it introduces the possibility of remote crashers. I must admit that I’m glad that it forced us to fully address this problem, though.

    I’ll rebase and turn the assertion into a log.

  39. Dont deserialize nVersion into CNode, should fix #9212 80ff0344ae
  40. net: deserialize the entire version message locally
    This avoids having some vars set if the version negotiation fails.
    
    Also copy it all into CNode at the same site. nVersion and
    fSuccessfullyConnected are set last, as they are the gates for the other vars.
    Make them atomic for that reason.
    2046617b5e
  41. net: don't run callbacks on nodes that haven't completed the version handshake
    Since ForEach* are can be used to send messages to  all nodes, the caller may
    end up sending a message before the version handshake is complete. To limit
    this, filter out these nodes. While we're at it, may as well filter out
    disconnected nodes as well.
    
    Delete unused methods rather than updating them.
    12752af0cc
  42. net: Disallow sending messages until the version handshake is complete
    This is a change in behavior, though it's much more sane now than before.
    7a8c251901
  43. net: log an error rather than asserting if send version is misused
    Also cleaned up the comments and moved from the header to the .cpp so that
    logging headers aren't needed from net.h
    08bb6f4ed4
  44. theuni force-pushed on Feb 2, 2017
  45. laanwj commented at 2:37 pm on February 3, 2017: member
    utACK 08bb6f4
  46. TheBlueMatt commented at 4:56 pm on February 3, 2017: member
    utACK 08bb6f4ed48359aedd869450b99799b9c734084b
  47. gmaxwell commented at 11:38 pm on February 3, 2017: contributor
    ACK 08bb6f4
  48. sipa commented at 0:39 am on February 4, 2017: member
    utACK 08bb6f4ed48359aedd869450b99799b9c734084b, testing with -fsanitizes.
  49. laanwj merged this on Feb 4, 2017
  50. laanwj closed this on Feb 4, 2017

  51. laanwj referenced this in commit 496691741d on Feb 4, 2017
  52. in src/net_processing.cpp: in 08bb6f4ed4
    1198@@ -1199,50 +1199,51 @@ bool static ProcessMessage(CNode* pfrom, std::string strCommand, CDataStream& vR
    1199         CAddress addrFrom;
    1200         uint64_t nNonce = 1;
    1201         uint64_t nServiceInt;
    1202-        vRecv >> pfrom->nVersion >> nServiceInt >> nTime >> addrMe;
    


    rebroad commented at 9:57 am on February 10, 2017:
    this has been in the code forever… is it really needing to be changed now?

    laanwj commented at 10:03 am on February 10, 2017:
    Unfortunately, many of the race conditions have been in the code forever and are only now being discovered and solved.
  53. theuni deleted the branch on Feb 11, 2017
  54. HashUnlimited referenced this in commit 27c7dac826 on Feb 27, 2018
  55. random-zebra referenced this in commit 777638e7bc on Aug 27, 2020
  56. random-zebra referenced this in commit cbd9271afb on Sep 7, 2020
  57. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me