net: RecordBytesSent under cs_vSend lock #18784

pull MarcoFalke wants to merge 1 commits into bitcoin:master from MarcoFalke:2004-netLockRecordBytesSent changing 1 files +5 −7
  1. MarcoFalke commented at 5:24 pm on April 27, 2020: member

    The CNode member nSendBytes is incremented under the node’s lock cs_vSend. However, RecordBytesSent is not. An RPC thread that acquires the cs_vSend lock puts the message handler thread on hold. While the thread is on hold, getnettotals returns “stale” values or values that don’t add up.

    This can be fixed by making cs_vSend a “write lock” for the total bytes sent in connman.

  2. DrahtBot added the label P2P on Apr 27, 2020
  3. MarcoFalke added this to the milestone 0.21.0 on Apr 28, 2020
  4. net: RecordBytesSent under cs_vSend lock
    The CNode member nSendBytes is incremented under the node's lock
    cs_vSend. However, RecordBytesSent is not. An RPC thread that acquires
    the cs_vSend lock puts the message handler thread on hold. While the
    thread is on hold, getnettotals returns "stale" values or values that
    don't add up.
    
    This can be fixed by making cs_vSend a "write lock" for the total bytes
    sent in connman.
    
    After this commit, both calls to RecordBytesSent are done under the
    LOCK(pnode->cs_vSend);
    faae6ca9e8
  5. MarcoFalke force-pushed on May 13, 2020
  6. DrahtBot commented at 1:41 am on July 23, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #19673 (Move cs_vSend into SocketSendData and resolve RecordBytesSent lock inconsistency by troygiorshev)
    • #19509 (Per-Peer Message Capture by troygiorshev)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  7. troygiorshev commented at 12:51 pm on July 24, 2020: contributor

    ACK faae6ca9e8a3f67c737c03e122fd315cef5e66d7

    Reviewed, ran tests.

    This is a nice readability improvement too 📖

    Changed my mind, see below.

  8. jnewbery commented at 4:36 pm on July 31, 2020: member

    getnettotals returns “stale” values or values that don’t add up

    Can you explain this a bit? What do the values not add up to? As far as I can tell, getnettotals simply returns the totalbytessent and totalbytesreceived at some point in time. There are not consistency guarantees with subsequent calls to the getpeerinfo values.

  9. troygiorshev commented at 9:44 pm on August 5, 2020: contributor

    I’ve changed my mind on this one. getnettotals and getpeerinfo are two separate rpcs, we shouldn’t try and make them sync up. nTotalBytesSent is separate from a node’s nSendBytes. Looking at the analogous structures on the receiving side, nTotalBytesRecv is clearly separate from nRecvBytes.

    However, I think a PR like this is needed to bring consistency between the two calls to RecordBytesSent (as it stands before this PR, one is under cs_vSend and the other is not). I can see a few more improvements available too.

  10. promag commented at 0:23 am on August 15, 2020: member

    Concept ACK.

    What is the motivation to have the lock when calling RecordBytesSent? got it.

  11. MarcoFalke commented at 3:09 pm on August 27, 2020: member

    I’ve changed my mind on this one. getnettotals and getpeerinfo are two separate rpcs, we shouldn’t try and make them sync up. nTotalBytesSent is separate from a node’s nSendBytes. Looking at the analogous structures on the receiving side, nTotalBytesRecv is clearly separate from nRecvBytes.

    Why are they separate? Let’s assume we connect to a peer, send 100 bytes, disconnect. Send 100 bytes to another peer, then the total bytes sent should simply equal 200 bytes.

  12. troygiorshev commented at 3:35 pm on August 27, 2020: contributor

    Why are they separate? Let’s assume we connect to a peer, send 100 bytes, disconnect. Send 100 bytes to another peer, then the total bytes sent should simply equal 200 bytes.

    Maybe I’m completely off-base here, I’m not trying to make a big statement.

    Isn’t it impossible to keep these RPCs synced? They are two separate calls, something might happen in between them.

    1. Call getnettotals, note the value of totalbytesrecv
    2. node receives a message from another peer, of size 100 bytes
    3. Call getpeerinfo, see that the sum of bytesrecv for each node is 100 bytes more than totalbytesrecv from above
  13. MarcoFalke commented at 3:51 pm on August 27, 2020: member

    Ok, I see. The reason I created this pull was that a call to

    0        net_totals_before = self.nodes[0].getnettotals()
    1        peer_info = self.nodes[0].getpeerinfo()
    2        net_totals_after = self.nodes[0].getnettotals()
    

    and then asserting that before <= sum <= after would sometimes fail.

    I haven’t looked into #19673 so I am wondering if #19673 also fixes that issue.

  14. troygiorshev commented at 3:57 pm on August 27, 2020: contributor

    ^

    Woah that shouldn’t happen. Will investigate.

  15. MarcoFalke commented at 4:14 pm on August 27, 2020: member
  16. jnewbery commented at 9:00 am on October 28, 2020: member

    a call to

       net_totals_before = self.nodes[0].getnettotals()
       peer_info = self.nodes[0].getpeerinfo()
       net_totals_after = self.nodes[0].getnettotals()
    

    and then asserting that before <= sum <= after would sometimes fail.`

    What is sum here? Are you saying that getnettotals.totalbytessent can go down?

    I don’t understand the problem you’re trying to solve.

  17. MarcoFalke commented at 9:19 am on October 28, 2020: member

    Are you saying that getnettotals.totalbytessent can go down?

    No, it is returning the value it returned in a previous call. But getpeerinfo already returned the new correct value.

    #17107 (comment)

  18. jnewbery commented at 9:34 am on October 28, 2020: member

    I agree with Wlad’s assessment in that issue:

    these are meant to be overall statistics, it doesn’t seem super important that the net totals are immediately up to date.

  19. MarcoFalke commented at 9:42 am on October 28, 2020: member

    Ok, then the test is wrong and should be removed:

     0diff --git a/test/functional/rpc_net.py b/test/functional/rpc_net.py
     1index 03c858c694..afb67891af 100755
     2--- a/test/functional/rpc_net.py
     3+++ b/test/functional/rpc_net.py
     4@@ -104,29 +104,15 @@ class NetTest(BitcoinTestFramework):
     5 
     6     def test_getnettotals(self):
     7         self.log.info("Test getnettotals")
     8-        # getnettotals totalbytesrecv and totalbytessent should be
     9-        # consistent with getpeerinfo. Since the RPC calls are not atomic,
    10-        # and messages might have been recvd or sent between RPC calls, call
    11-        # getnettotals before and after and verify that the returned values
    12-        # from getpeerinfo are bounded by those values.
    13-        net_totals_before = self.nodes[0].getnettotals()
    14         peer_info = self.nodes[0].getpeerinfo()
    15-        net_totals_after = self.nodes[0].getnettotals()
    16-        assert_equal(len(peer_info), 2)
    17-        peers_recv = sum([peer['bytesrecv'] for peer in peer_info])
    18-        peers_sent = sum([peer['bytessent'] for peer in peer_info])
    19-
    20-        assert_greater_than_or_equal(peers_recv, net_totals_before['totalbytesrecv'])
    21-        assert_greater_than_or_equal(net_totals_after['totalbytesrecv'], peers_recv)
    22-        assert_greater_than_or_equal(peers_sent, net_totals_before['totalbytessent'])
    23-        assert_greater_than_or_equal(net_totals_after['totalbytessent'], peers_sent)
    24+        net_totals = self.nodes[0].getnettotals()
    25 
    26         # test getnettotals and getpeerinfo by doing a ping
    27         # the bytes sent/received should change
    28         # note ping and pong are 32 bytes each
    29         self.nodes[0].ping()
    30-        self.wait_until(lambda: (self.nodes[0].getnettotals()['totalbytessent'] >= net_totals_after['totalbytessent'] + 32 * 2), timeout=1)
    31-        self.wait_until(lambda: (self.nodes[0].getnettotals()['totalbytesrecv'] >= net_totals_after['totalbytesrecv'] + 32 * 2), timeout=1)
    32+        self.wait_until(lambda: (self.nodes[0].getnettotals()['totalbytessent'] >= net_totals['totalbytessent'] + 32 * 2), timeout=1)
    33+        self.wait_until(lambda: (self.nodes[0].getnettotals()['totalbytesrecv'] >= net_totals['totalbytesrecv'] + 32 * 2), timeout=1)
    34 
    35         peer_info_after_ping = self.nodes[0].getpeerinfo()
    36         for before, after in zip(peer_info, peer_info_after_ping):
    
  20. jnewbery commented at 10:19 am on October 28, 2020: member

    the test is wrong and should be removed:

    Done: #20258

  21. MarcoFalke commented at 10:21 am on October 28, 2020: member
    thanks
  22. MarcoFalke closed this on Oct 28, 2020

  23. MarcoFalke deleted the branch on Oct 28, 2020
  24. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-18 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me