This continues the work of moving application layer data into net_processing, by moving all block inventory state into the new Peer object added in #19607.
For motivation, see #19398.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
```diff
@@ -488,6 +488,9 @@ struct Peer {
     /** Whether this peer should be disconnected and marked as discouraged (unless it has the noban permission). */
     bool m_should_discourage GUARDED_BY(m_misbehavior_mutex){false};
 
+    /** Starting height of this peer */
+    std::atomic<int> nStartingHeight{-1};
```
"Discovered at version message exchange. Used at tip update, to decide if our chain is fresh enough compared to peer's one to be worthy of headers announcement." ?
The comment doesn't need to document everything that the variable is used for. That's what the code does.
921 | @@ -928,9 +922,6 @@ class CNode 922 | // m_tx_relay == nullptr if we're not relaying transactions with this peer 923 | std::unique_ptr<TxRelay> m_tx_relay; 924 | 925 | - // Used for headers announcements - unfiltered blocks to relay
What's the distinction you lose by dropping "unfiltered blocks to relay"?
I guess this is a reference to the sort done at headers announcement, at L4218 in src/net_processing.cpp: https://github.com/bitcoin/bitcoin/blob/1cf73fb8eba3b89a30d5e943deaec5052a8b21b7/src/net_processing.cpp#L4218
Before announcing headers, we check both against the peer's context ("Does this header connect to the peer's known tip?") and our context ("Is this header part of the best chain?"). It wasn't a clear comment, so yes, either drop it or extend it?
I think drop it.
```diff
+    RecursiveMutex m_block_inv_mutex;
+    /** List of block ids we still have announce.
+     * There is no final sorting before sending, as they are always sent
+     * immediately and in the order requested. */
+    std::vector<uint256> vInventoryBlockToSend GUARDED_BY(m_block_inv_mutex);
+    /** List of headers to relay */
```
vBlockHashesToAnnounce is a list of block hashes, not headers? With regard to the renaming follow-up commit, I would lean towards the most verbose option m_blockids_for_inv_relay/m_blockids_for_headers_relay to denote that these vectors contain an identifier, even if the end goal is to relay the identified object (either a block or headers).
Ok, I'll update the comment to "List of block ids to announce headers for".
I would lean towards the most verbose option m_blockids_for_inv_relay/m_blockids_for_headers_relay
Sure. I'll update.
Just calls for better comments.
Rebased
Concept ACK
rebased
Concept ACK.
```diff
@@ -488,6 +488,14 @@ struct Peer {
     /** Whether this peer should be disconnected and marked as discouraged (unless it has the noban permission). */
     bool m_should_discourage GUARDED_BY(m_misbehavior_mutex){false};
 
+    /** Protects block inventory data members */
+    RecursiveMutex m_block_inv_mutex;
```
e04fb9a2df9f4d77c587bd25c608ce806c97ea45
Could using the RecursiveMutex type be avoided?
(Previously it was `Mutex cs_inventory;`.)
Good catch! cs_inventory was a RecursiveMutex when I opened this PR, and I missed the change when I rebased. Fixed.
:flushed:
```diff
+            peer_copies.push_back(peer.second);
+        }
+    }
+
+    for (auto&& peer : peer_copies) {
+        func(peer);
```
575c5f55cc44348f0ebc7740c90013b96bdb8c27
It is good for concurrency to limit the lock scope. But is it safe to call func on potentially stale peer_copies?
Yes. peer_copies holds shared_ptr references, so the objects pointed to can't be destructed while it's in scope.
I mean not language safety rather client behavior safety: if another thread removes a peer from the g_peer_map, can we rely on result of func(peer)?
Yes, because peer still exists in that case.
ACK 06841521613f1e7312f3f51597b43a20c193a363, tested on Linux Mint 20 (x86_64).
rebased
rebased
rebased
```diff
@@ -1542,6 +1582,7 @@ static void RelayAddress(const CAddress& addr, bool fReachable, const CConnman&
 
 void static ProcessGetBlockData(CNode& pfrom, const CChainParams& chainparams, const CInv& inv, CConnman& connman)
 {
+    PeerRef peer = GetPeerRef(pfrom.GetId());
```
In 8d3a068 I suppose this is placed at the top of the function, instead of where it is used 150 lines below, as you plan to add more uses of it to this function.
Correct. Future commits will use peer throughout the function, and there's no downside to holding on to the object for a bit longer.
```diff
@@ -913,6 +948,10 @@ void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
     LOCK(cs_main);
     int misbehavior{0};
     {
+        // We remove the Peer object from g_peer_map here, but it's not guaranteed that
+        // we'll destruct Peer here since another thread could potentially still hold
+        // a reference to the object. Be careful not to do any processing here that
+        // assumes Peer won't be changed before it's destructed.
         PeerRef peer = GetPeerRef(nodeid);
         assert(peer != nullptr);
```
In commit 85e7deb, perhaps go a bit further in the added comment to clarify that the refcount is often > 1 here.
In my testing, adding assert(peer.use_count() <= 1); after this line as suggested in #19607 (review) causes bitcoind on mainnet to halt nearly immediately after net processing begins and 1-2 peers start to connect.
Done
```diff
@@ -502,6 +519,24 @@ using PeerRef = std::shared_ptr<Peer>;
 Mutex g_peer_mutex;
 static std::map<NodeId, PeerRef> g_peer_map GUARDED_BY(g_peer_mutex);
 
+/** Get a reference to each Peer, then call a function on each of them */
+template<typename Callable>
+void ForEachPeer(Callable&& func)
```
cff5d286 What are your plans for ForEachPeer() in your roadmap: is it the end-game interface, or more of an intermediary step?
I don't have a strong opinion. I've done it this way here to minimize the difference in logic, but if people want to change the internal implementation later, I'm fine with that.
```diff
@@ -1148,6 +1147,8 @@ void RPCConsole::updateNodeDetail(const CNodeCombinedStats *stats)
         ui->peerCommonHeight->setText(QString("%1").arg(stats->nodeStateStats.nCommonHeight));
     else
         ui->peerCommonHeight->setText(tr("Unknown"));
+
+    ui->peerHeight->setText(QString::number(stats->nodeStateStats.m_starting_height));
```
9b14eb372 verified manually that this displays as before, as there is no GUI test coverage. Unfortunately, it is broken at this commit and returns only the initial value of -1 until 3 commits later at f27a1e221d.
Oops. I think this was a bad rebase. Should be fixed now.
```diff
         // banscore is deprecated in v0.21 for removal in v0.22
         obj.pushKV("banscore", statestats.m_misbehavior_score);
     }
     obj.pushKV("synced_headers", statestats.nSyncHeight);
     obj.pushKV("synced_blocks", statestats.nCommonHeight);
+    obj.pushKV("startingheight", statestats.m_starting_height);
```
9b14eb37 verified this functions as before per ./src/bitcoin-cli getpeerinfo | grep 'startingheight' | sort, as there appear to be no functional tests for getpeerinfo#startingheight. As above, it is broken at this commit and returns only the initial value of -1. Functionality returns 3 commits later at f27a1e221d.
As above, should now be fixed.
```diff
+    std::vector<uint256> m_blocks_for_inv_relay GUARDED_BY(m_block_inv_mutex);
+    /** List of headers to relay */
+    std::vector<uint256> m_blocks_for_headers_relay GUARDED_BY(m_block_inv_mutex);
+
+    /** Starting height of this peer */
+    std::atomic<int> m_starting_height{-1};
```
9b14eb372 m_starting_height presumably still needs to be std::atomic_int?
If you retouch, perhaps s/height/block height/ for the Doxygen doc.
m_starting_height presumably still needs to be std::atomic_int?
I don't understand. It's a std::atomic<int> before and after this PR.
If you retouch, perhaps s/height/block height/ for the Doxygen doc.
I've updated the Doxygen comment to be clearer.
```diff
@@ -2334,6 +2375,7 @@ void PeerManager::ProcessMessage(CNode& pfrom, const std::string& msg_type, CDat
         return;
     }
 
+    PeerRef peer = GetPeerRef(pfrom.GetId());
```
In general, is there a plan to eventually no longer need to set PeerRef peer = GetPeerRef(...) in many net processing methods?
Yes, potentially we could just pass a Peer& to many of the private functions (just as CNodeState* is passed to some of the functions now).
```diff
@@ -4196,7 +4239,7 @@ bool PeerManager::SendMessages(CNode* pto)
         // Try to find first header that our peer doesn't have, and
         // then send all headers past that one. If we come across any
         // headers that aren't on ::ChainActive(), give up.
-        for (const uint256 &hash : pto->vBlockHashesToAnnounce) {
+        for (const uint256 &hash : peer->m_blocks_for_headers_relay) {
```
f27a1e221 nit, while touching this and lines 4334 and 4366 (3 places), if so inclined:
```cpp
for (const uint256& hash : peer->m_blocks_for_headers_relay) {
```
done
```diff
@@ -940,8 +940,6 @@ class CNode
     mapMsgCmdSize mapRecvBytesPerMsgCmd GUARDED_BY(cs_vRecv);
 
 public:
-    uint256 hashContinue;
-    std::atomic<int> nStartingHeight{-1};
 
```
8d3a068 nit, remove extra line created by this deletion
done
```diff
+    /** List of headers to relay */
+    std::vector<uint256> m_blocks_for_headers_relay GUARDED_BY(m_block_inv_mutex);
+
+    /** Starting height of this peer */
+    std::atomic<int> m_starting_height{-1};
+    /** The final block hash in a INV message. When the peer requests this
```
8d3a068 pico-nits, s/a INV/an INV/ and s/them/it|the peer/
fixed
```diff
@@ -1399,11 +1439,11 @@ void PeerManager::UpdatedBlockTip(const CBlockIndex *pindexNew, const CBlockInde
         }
     }
     // Relay inventory, but don't relay old inventory during initial block download.
-    m_connman.ForEachNode([nNewHeight, &vHashes](CNode* pnode) {
-        LOCK(pnode->cs_inventory);
-        if (nNewHeight > (pnode->nStartingHeight != -1 ? pnode->nStartingHeight - 2000 : 0)) {
+    ForEachPeer([nNewHeight, &vHashes](const PeerRef& peer) {
+        LOCK(peer->m_block_inv_mutex);
```
In 9b14eb37 is it safe to remove LOCK(pnode->cs_inventory); this early? Another lock doesn't replace it until 3 commits later in f27a1e22.
Oops. Bad rebase. Should be fixed now.
First pass through this follow-up to #19607. A few comments below; feel free to ignore the nits. My main concern is that several commits aren't hygienic as mentioned below and that we're missing a test that exhibits this without manually looking at getpeerinfo or the GUI peer window.
```diff
@@ -971,6 +1011,7 @@ bool GetNodeStateStats(NodeId nodeid, CNodeStateStats &stats) {
     PeerRef peer = GetPeerRef(nodeid);
     if (peer == nullptr) return false;
     stats.m_misbehavior_score = WITH_LOCK(peer->m_misbehavior_mutex, return peer->m_misbehavior_score);
+    stats.m_starting_height = peer->m_starting_height;
```
0a9839058c
```
net_processing.cpp: In function ‘bool GetNodeStateStats(NodeId, CNodeStateStats&)’:
net_processing.cpp:982:11: error: ‘struct CNodeStateStats’ has no member named ‘m_starting_height’; did you mean ‘nStartingHeight’?
  982 |     stats.m_starting_height = peer->m_starting_height;
      |           ^~~~~~~~~~~~~~~~~
      |           nStartingHeight
```
fixed
The new changes (git diff 8d3a068 1a9e191) look good. One rebase fixup in 0a9839058c.
commits f9136f4bf, 7199d81968, 8ff3e9319c11 and 40ea4a42d9
```
net_processing.cpp: In function ‘bool GetNodeStateStats(NodeId, CNodeStateStats&)’:
net_processing.cpp:982/1000/1009: error: ‘struct CNodeStateStats’ has no member named ‘nStartingHeight’; did you mean ‘m_starting_height’?
  982 |     stats.nStartingHeight = peer->nStartingHeight;
      |           ^~~~~~~~~~~~~~~
      |           m_starting_height
```
(we'll get there!)
Oops. Sorry Jon. Should be good now.
Rebased
Rebased
This is a slightly difficult rebase since the commits need to be structured a little differently to all build. Will work on it tomorrow.
Rebased. This is now ready for review.
Running script for: fe391ca326f73337275d726f7de9db384ac35466
```shell
sed -i 's/vBlockHashesToAnnounce/m_blocks_for_headers_relay/g' src/net_processing.cpp
sed -i 's/vInventoryBlockToSend/m_blocks_for_inv_relay/g' src/net_processing.cpp
```
```diff
diff --git a/src/net_processing.h b/src/net_processing.h
index 1ad40982b..4998a1e04 100644
--- a/src/net_processing.h
+++ b/src/net_processing.h
@@ -66,9 +66,9 @@ struct Peer {
     /** List of block ids we still have announce.
      * There is no final sorting before sending, as they are always sent
      * immediately and in the order requested. */
-    std::vector<uint256> m_blocks_for_inv_relay GUARDED_BY(m_block_inv_mutex);
+    std::vector<uint256> vInventoryBlockToSend GUARDED_BY(m_block_inv_mutex);
     /** List of headers to relay */
-    std::vector<uint256> m_blocks_for_headers_relay GUARDED_BY(m_block_inv_mutex);
+    std::vector<uint256> vBlockHashesToAnnounce GUARDED_BY(m_block_inv_mutex);
     /** This peer's reported block height when we connected */
     std::atomic<int> m_starting_height{-1};
```
Failed
Thanks @MarcoFalke. Should be fixed now.
```diff
-        LOCK(pnode->cs_inventory);
-        for (const uint256& hash : reverse_iterate(vHashes)) {
-            pnode->vBlockHashesToAnnounce.push_back(hash);
+    {
+        LOCK(m_peer_mutex);
+        for (auto& it : m_peer_map) {
```
a50dee50fd2aee93b41bfb3b04b24939d0c2f6c1: ForEachNode performs an additional check, NodeFullyConnected (`pnode->fSuccessfullyConnected && !pnode->fDisconnect`).
Not sure if that matters much here. Could we prematurely send headers before VERACK?
This is an excellent observation. Block hashes could be pushed into m_blocks_for_headers_relay before the version/verack handshake is complete. However, we'll never send any block inventory before we're fully connected, due to the guard clause at the top of SendMessages().
I think this behaviour is ok. The version-verack handshake usually lasts no more than a second or two, and in the worst case times out after a minute, so we're never going to push many block hashes onto this vector.
a50dee50fd2aee93b41bfb3b04b24939d0c2f6c1
Any reason to not have a ForEachPeer member function?
I originally had one but removed it because it didn't seem worth it. We can revisit that decision later if it seems useful.
ForEachNode was useful because it allowed passing a lambda into CConnman which would be executed while the cs_vNodes lock was held. Here, the m_peer_mutex is in PeerMan, so it can be held directly.
tACK b5b27ea modulo NodeFullyConnected clarification
```diff
         // banscore is deprecated in v0.21 for removal in v0.22
         obj.pushKV("banscore", statestats.m_misbehavior_score);
     }
     obj.pushKV("synced_headers", statestats.nSyncHeight);
     obj.pushKV("synced_blocks", statestats.nCommonHeight);
+    obj.pushKV("startingheight", statestats.nStartingHeight);
```
4ac61d85f1624b1cd7af4a85c8bca0bd430801f0
Position of "startingheight" in RPCHelpMan should be changed accordingly.
Will fix.
Fixed
Not sure if it is done in the latest push (58f4c8567ba77fbdbd65a12c74628505aacd0d3c).
```diff
@@ -61,6 +61,15 @@ struct Peer {
     /** Whether this peer should be disconnected and marked as discouraged (unless it has the noban permission). */
     bool m_should_discourage GUARDED_BY(m_misbehavior_mutex){false};
 
+    /** Protects block inventory data members */
+    Mutex m_block_inv_mutex;
+    /** List of block ids we still have announce.
```
a50dee50fd2aee93b41bfb3b04b24939d0c2f6c1
I understand that comment text has been preserved, but maybe
```cpp
/** List of block ids we still have to announce.
```
?
Agree. Will fix.
Fixed
utACK b5b27ea052905ff7f26a962baa961fcc762c5ee0
Is m_hash_continue protected by any locks? A "this is only accessed by the singular net_processing thread" rationale would be sufficient, but if there happens to be a lock that's held in all access locations, perhaps it's easy to add?
ACK b5b27ea052905ff7f26a962baa961fcc762c5ee0, tested on Linux Mint 20 (x86_64), I've verified getpeerinfo RPC output, and detailed peer info in the "Peers" tab of the "Node window".
```diff
-            pnode->vBlockHashesToAnnounce.push_back(hash);
+    {
+        LOCK(m_peer_mutex);
+        for (auto& it : m_peer_map) {
+            Peer& peer = *it.second;
+            LOCK(peer.m_block_inv_mutex);
```
I think this is the first place where we are acquiring a lock while already holding m_peer_mutex. Any thoughts on what is the best way to document this design, so that future code writers know that acquiring m_peer_mutex while holding a peer's m_block_inv_mutex is not allowed?
Or, would it be better to first copy all the Peer objects to a temporary place, release the m_peer_mutex lock, and then grab the per-peer m_block_inv_mutex locks to avoid holding both at the same time? (I don't know or have an opinion; just trying to figure out the locking design here.)
Would a comment next to m_peer_mutex suffice here? This design (take m_peer_mutex first, then lock the mutexes protecting the individual data members) is analogous to what ForEachNode() was doing before (take cs_vNodes first, then lock the mutexes protecting the individual data members in the lambda, here cs_inventory).
Added comments to the first commit. Let me know if you think more is required.
```diff
@@ -72,6 +72,11 @@ struct Peer {
 
     /** This peer's reported block height when we connected */
     std::atomic<int> m_starting_height{-1};
+    /** The final block hash in an INV message. When the peer requests this
+     * block, we send a BLOCK message to trigger the peer to request the next
```
We send an INV message, not a BLOCK message.
Yes, will fix.
Done.
Concept ACK, just one question about locking and a doc nit.
Is m_hash_continue protected by any locks? A "this is only accessed by the singular net_processing thread" rationale would be sufficient, but if there happens to be a lock that's held in all access locations, perhaps it's easy to add? @sipa could we just make it a std::atomic? It doesn't need to be synchronized with any other data, so as long as we're guaranteeing that reads/writes are atomic, I think that's enough?
```diff
@@ -1843,7 +1844,8 @@ void PeerManager::ProcessHeadersMessage(CNode& pfrom, const std::vector<CBlockHe
         // Headers message had its maximum size; the peer may have more headers.
         // TODO: optimize: if pindexLast is an ancestor of ::ChainActive().Tip or pindexBestHeader, continue
         // from there instead.
-        LogPrint(BCLog::NET, "more getheaders (%d) to end to peer=%d (startheight:%d)\n", pindexLast->nHeight, pfrom.GetId(), pfrom.nStartingHeight);
+        PeerRef peer = GetPeerRef(pfrom.GetId());
```
4ac61d85f1624b1cd7af4a85c8bca0bd430801f0:
Can you explain why a nullptr-Assert is not needed here? Also, why isn't the peer passed in from outside, where it already exists?
Yes, passing in from outside is a much better idea. I've done that now.
```diff
+    Mutex m_block_inv_mutex;
+    /** List of block ids we still have announce.
+     * There is no final sorting before sending, as they are always sent
+     * immediately and in the order requested. */
+    std::vector<uint256> vInventoryBlockToSend GUARDED_BY(m_block_inv_mutex);
+    /** List of headers to relay */
```
a50dee50fd2aee93b41bfb3b04b24939d0c2f6c1
The list is unfiltered, as was mentioned in the previous comment. Any reason to remove that?
Ok, I've added the word unfiltered back.
```diff
@@ -66,9 +66,9 @@ struct Peer {
     /** List of block ids we still have announce.
      * There is no final sorting before sending, as they are always sent
      * immediately and in the order requested. */
-    std::vector<uint256> vInventoryBlockToSend GUARDED_BY(m_block_inv_mutex);
+    std::vector<uint256> m_blocks_for_inv_relay GUARDED_BY(m_block_inv_mutex);
     /** List of headers to relay */
-    std::vector<uint256> vBlockHashesToAnnounce GUARDED_BY(m_block_inv_mutex);
+    std::vector<uint256> m_blocks_for_headers_relay GUARDED_BY(m_block_inv_mutex);
```
22c2b3eff601cbc1aed2e466502bdc5549e11ae2
I don't think this name is correct. Headers relay isn't implied. It might very well fall back to inv relay. Most importantly when the peer asked us to. Though, there are other reasons, too.
We hope to announce these blocks via headers. If we can't, we'll move the hashes into m_blocks_for_inv_relay and then announce via inv.
I've updated the comment. Let me know if you think it's clear enough now.
Thanks. Looks good now.
```diff
@@ -72,6 +72,11 @@ struct Peer {
 
     /** This peer's reported block height when we connected */
     std::atomic<int> m_starting_height{-1};
+    /** The final block hash in an INV message. When the peer requests this
```
b5b27ea052905ff7f26a962baa961fcc762c5ee0
I don't think this is true either. m_hash_continue is a purely internal helper, never sent over the wire directly. Instead, the tip hash is sent to trigger the peer to ask further.
I've updated the comment to be "The final block hash that we sent in an inv message to this peer." Let me know if you have suggestions for better wording.
Thanks. Looks good now.
review ACK b5b27ea052905ff7f26a962baa961fcc762c5ee0. I reviewed the code and it probably won't crash, but I had a few questions 🚟
Thanks for the review @MarcoFalke . There are enough review comments here that it makes sense to address them in this PR instead of a follow-up. It also looks to me like this isn't entirely a clean merge, so I'm going to rebase on master.
Suggested here:
- https://github.com/bitcoin/bitcoin/pull/19607#discussion_r467071878
- https://github.com/bitcoin/bitcoin/pull/19829#discussion_r546116786
Addressed all comments.
Is m_hash_continue protected by any locks? A "this is only accessed by the singular net_processing thread" rationale would be sufficient, but if there happens to be a lock that's held in all access locations, perhaps it's easy to add? @sipa I've added a commit to guard m_hash_continue with m_block_inv_mutex.
```diff
@@ -1764,7 +1775,9 @@ void PeerManager::SendBlockTransactions(CNode& pfrom, const CBlock& block, const
     m_connman.PushMessage(&pfrom, msgMaker.Make(nSendFlags, NetMsgType::BLOCKTXN, resp));
 }
 
-void PeerManager::ProcessHeadersMessage(CNode& pfrom, const std::vector<CBlockHeader>& headers, bool via_compact_block)
+void PeerManager::ProcessHeadersMessage(CNode& pfrom, const Peer& peer,
+                                        const std::vector<CBlockHeader>& headers,
+                                        bool via_compact_block)
```
2f023a8 nit, the diff would be clearer without the line break changes, and I also find the code less readable here than on one line
This is permitted by our clang formatter. I find it easier to read lines that are less than 100 columns, so I think this is a matter of taste.
```diff
@@ -1854,7 +1867,8 @@ void PeerManager::ProcessHeadersMessage(CNode& pfrom, const std::vector<CBlockHe
         // Headers message had its maximum size; the peer may have more headers.
         // TODO: optimize: if pindexLast is an ancestor of ::ChainActive().Tip or pindexBestHeader, continue
         // from there instead.
-        LogPrint(BCLog::NET, "more getheaders (%d) to end to peer=%d (startheight:%d)\n", pindexLast->nHeight, pfrom.GetId(), pfrom.nStartingHeight);
+        LogPrint(BCLog::NET, "more getheaders (%d) to end to peer=%d (startheight:%d)\n",
+                 pindexLast->nHeight, pfrom.GetId(), peer.m_starting_height);
```
2f023a8 nit, as above, easier diff to review and more readable without the added line break
Again, this is personal taste, so I'm going to resolve this.
```diff
@@ -180,7 +202,9 @@ class PeerManager final : public CValidationInterface, public NetEventsInterface
 
     void ProcessOrphanTx(std::set<uint256>& orphan_work_set) EXCLUSIVE_LOCKS_REQUIRED(cs_main, g_cs_orphans);
     /** Process a single headers message from a peer. */
-    void ProcessHeadersMessage(CNode& pfrom, const std::vector<CBlockHeader>& headers, bool via_compact_block);
+    void ProcessHeadersMessage(CNode& pfrom, const Peer& peer,
```
2f023a8 nit, as above, easier diff to review and more readable here without the line break changes
as above
```diff
         // banscore is deprecated in v0.21 for removal in v0.22
         obj.pushKV("banscore", statestats.m_misbehavior_score);
     }
     obj.pushKV("synced_headers", statestats.nSyncHeight);
     obj.pushKV("synced_blocks", statestats.nCommonHeight);
+    obj.pushKV("startingheight", statestats.m_starting_height);
```
In 2f023a8c0e445985a20, startingheight should remain before synced_headers and synced_blocks (here and in the help) for consistency with the GUI peers info (or maybe less preferably, the GUI should be updated to follow this change)
done
```diff
+    std::vector<uint256> m_blocks_for_headers_relay GUARDED_BY(m_block_inv_mutex);
+    /** The final block hash that we sent in an `inv` message to this peer.
+     * When the peer requests this block, we send an `inv` message to trigger
+     * the peer to request the next sequence of block hashes.
+     * Most peers use headers-first syncing, which doesn't use this mechanism */
+    uint256 m_hash_continue GUARDED_BY(m_block_inv_mutex) {};
```
25ccfd3b while renaming hashContinue, I wonder if it can be better named (but don't have one to propose)
Changed to m_continuation_block
ACK 9aad3e4f3347990641da23c0d9a3f05e1669cabf modulo the RPC fields/help order (see comment below)
Concept ACK
```diff
@@ -46,6 +46,8 @@ struct CNodeStateStats {
  * Memory is owned by shared pointers and this object is destructed when
  * the refcount drops to zero.
  *
+ * Mutexes inside this struct must not be held when locking m_peer_mutex.
```
717a374e74b64b7b90bc1b2995e8900212bd0bfe
nit to avoid the inversion and give a rationale:
"m_peer_mutex should be locked before locking a mutex inside this struct to avoid lock order violations"
This will also weaken the requirement, because for some mutexes it could make sense to not require m_peer_mutex beforehand.
That isn't precise enough. It's fine to lock m_peer_mutex, take a shared_ptr to one of the Peer objects, release m_peer_mutex and later lock one of the inner mutexes. The only rule is that we don't want to lock m_peer_mutex while already holding a lock on one of the inner mutexes.
This is the same as cs_vNodes and the inner mutexes in CNode.
Ah right. I didn't parse this correctly. Thanks for the explanation.
```diff
@@ -210,7 +212,8 @@ class PeerManager final : public CValidationInterface, public NetEventsInterface
      * on extra block-relay-only peers. */
     bool m_initial_sync_finished{false};
 
-    /** Protects m_peer_map */
+    /** Protects m_peer_map. This mutex must not be locked while holding a lock
+     * on any of the mutexes inside a Peer object. */
```
same
as above
```diff
-        // wait for other stuff first.
-        std::vector<CInv> vInv;
-        vInv.push_back(CInv(MSG_BLOCK, ::ChainActive().Tip()->GetBlockHash()));
-        connman.PushMessage(&pfrom, msgMaker.Make(NetMsgType::INV, vInv));
-        peer.m_hash_continue.SetNull();
+        LOCK(peer.m_block_inv_mutex); {
```
What syntax is this? I don't think we use LOCK(); { ... } anywhere in the codebase
Also, doesn't this lock violate the documentation you added in the first commit?
This is my brain not working properly. I've fixed the syntax.
This doesn't violate the documentation. That says that it's not permitted to lock m_peer_mutex while holding a lock on an inner mutex. This is locking an inner mutex.
review ACK 9aad3e4f33 🔱
Changes since previous review:
* Add lock-order docs
* Pass in Peer& to ProcessHeadersMessage
* Shuffle order in RPC doc
* Rebase
* Doc fixups
* New commit with a lock
Not done as a scripted diff to avoid misnaming the local variable in
ProcessMessage().
```shell
-BEGIN VERIFY SCRIPT-
sed -i 's/vBlockHashesToAnnounce/m_blocks_for_headers_relay/g' src/net_processing.*
sed -i 's/vInventoryBlockToSend/m_blocks_for_inv_relay/g' src/net_processing.*
-END VERIFY SCRIPT-
```
Also rename to m_continuation_block to better communicate meaning.
Thanks for the latest reviews @jonatack and @MarcoFalke. I've addressed/responded to all of your comments.
I'd like to stabilize this PR now unless any functional issues are found. We have ACKs on the code change at various times from @MarcoFalke, @jonatack, @hebasto, @sipa and @Sjors. If there are any further style suggestions, I can address them in the next PR in this series.
re-ACK 3002b4af2b4fde63026f8f7c575452a5c989c662 🌓
only changes:
* Move RPC help once more
* Fix LOCK scope {} nit
* Rename a member
re-ACK 3002b4af2b4fde63026f8f7c575452a5c989c662 per git diff 9aad3e4 3002b4a
Code review re-ACK 3002b4af2b4fde63026f8f7c575452a5c989c662