p2p: Try to preserve outbound block-relay-only connections during restart

hebasto commented at 10:46 pm on November 9, 2019: member

This is an implementation of #17326:

all (currently 2) outbound block-relay-only connections (#15759) are dumped to anchors.dat file
on restart a node tries to connect to the addresses from anchors.dat

This PR prevents a type of eclipse attack when an attacker exploits a victim node restart to force it to connect to new, probably adversarial, peers.

fanquake added the label P2P on Nov 9, 2019

DrahtBot commented at 11:42 pm on November 9, 2019: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#19869 (Better intervals between feelers by naumenkogs)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

laanwj commented at 11:03 am on November 10, 2019: member

Don’t think this should get in the way of implementing something like this, but have to note here, as it’s a significant change of behavior from before: reconnecting to peers “aggressively”, was, up to now, the reserve of spy nodes and mass-connectors and such. For normal nodes the chance of making the same outgoing connection twice was pretty low. Historically this has been one of the ways used to detect these.

gmaxwell commented at 6:11 pm on November 13, 2019: contributor

It would be really good to limit it so that an honest node won’t attempt to connect twice to the same peer through this mechanism for some prudent amount of time… like two minutes.

I don’t think it would cause any harm to the intent of this mechanism to not bring them up as fast as possible, and it would avoid tripping up detection of aggressive mass connectors.

sdaftuar commented at 8:49 pm on November 14, 2019: member

Not sure if I’m misreading this code (I haven’t tested it), but it looks like on restart we would only attempt to connect once to each anchor, and then it gets removed from consideration. Does that satisfy the concern about rapid reconnects? I guess if a user rapidly restarted their node, we could see rapid connections to the same peers, but that seems pretty weird.

Anyway I just skimmed the discussion in #17326. I think Greg or Matt commented that we shouldn’t do this with all our block-relay-only connections, but since we only have 2 at the moment, perhaps it’s fine for now. And then in the future if we expand to say 8 block-relay peers we might still cap the anchors at 2. Does that reflect the thinking of others as well?

in src/net.cpp:1867 in 99d6f6fc6c outdated

1862@@ -1863,11 +1863,28 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
1863             // well for sanity.)
1864             bool block_relay_only = nOutboundBlockRelay < m_max_outbound_block_relay && !fFeeler && nOutboundFullRelay >= m_max_outbound_full_relay;
1865 
1866+            if (block_relay_only && !m_anchors.empty()) {
1867+                addrConnect = m_anchors.back();

sdaftuar commented at 8:52 pm on November 14, 2019:

I think by putting this logic down here, you’re bypassing some important filtering that happens in the loop above, where we skip over some entries that addrMan gives us (for instance, the `IsReachable()‘ check). I think the best thing would be to move this logic up to that loop (similar to the tried table collision selector) so that all those other criteria are still enforced.

hebasto commented at 9:53 am on November 21, 2019:

Done.

hebasto force-pushed on Nov 21, 2019

hebasto commented at 9:56 am on November 21, 2019: member

@sdaftuar

Thank you for your review. Your comment has been addressed.

Also, an additional safety check added to “p2p: Integrate ReadAnchors() into CConnman()” commit in the latest push.

in src/net.cpp:2190 in ea5a307eaf outdated

2186     Init(connOptions);
2187+
2188+    m_anchors = ReadAnchors(m_anchors_db_path);
2189+    if (m_anchors.size() > m_max_outbound_block_relay) {
2190+        // Keep our policy safe.
2191+        m_anchors.clear();

ariard commented at 7:14 pm on January 15, 2020:

Why clear the whole instead of picking randomly m_max_outbound_block_relay among the anchors set? If one of our most recent anchors isn’t reliable we should be allowed to pick up among older anchors to still of the counter-measure.

hebasto commented at 3:53 pm on January 26, 2020:

The worst case is assumed. If condition m_anchors.size() > m_max_outbound_block_relay is satisfied, we could suppose that the anchors.dat file is wrong or corrupted.

ariard commented at 1:12 am on January 30, 2020:

No you’re melting 2 different problems, tracking the anchors set and connecting to peers among this one. I don’t understand why reaching the m_max_outbound_block_relay means we have to delete our whole repository of maybe-good anchors peers.

In the scenario you’re aiming, if victim restarts and the 2 anchors peers are honest but unreliable, you’re going to drop them and pick maybe new outbound peers from attacker-provided ones. You can avoid this if you have a bigger set of potential anchor peers.

amitiuttarwar commented at 10:19 pm on February 27, 2020:

the logic implemented in this PR is minimal, and I think that makes sense for keeping it simple to reason about.

with this implementation, its possible for an attacker to poison the anchors and execute an eclipse attack, its just increases the cost of doing so.

until we develop more robust logic around ensuring our anchors are not malicious, I don’t think it makes sense to give priority to all historic anchors. I think it could sometimes be beneficial but other times degrade security, eg. if you historically were connected to a malicious peer but now your addrman is clear.

in src/net.cpp:2187 in ea5a307eaf outdated

2183     SetTryNewOutboundPeer(false);
2184 
2185     Options connOptions;
2186     Init(connOptions);
2187+
2188+    m_anchors = ReadAnchors(m_anchors_db_path);

ariard commented at 7:17 pm on January 15, 2020:

nit: isn’t cleaner to avoid read() in CConnman constructor and instead do it in CConnman::Start like we do for peers.dat?

hebasto commented at 3:46 pm on January 26, 2020:

~~I’d prefer to initialize m_anchors member in constructor.~~

hebasto commented at 7:24 pm on January 26, 2020:

Done in the latest push.

in src/net.cpp:1898 in d405145e8f outdated

1806@@ -1805,11 +1807,24 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
1807             if (nTries > 100)
1808                 break;
1809 
1810-            CAddrInfo addr = addrman.SelectTriedCollision();
1811+            CAddress addr;
1812+            if (!m_anchors.empty() && (nOutboundBlockRelay < m_max_outbound_block_relay) && !fFeeler) {

ariard commented at 7:48 pm on January 15, 2020:

May we check here also nOutboundFullRelay, and so keep same connection order, i.e full-relay then block-only or is there any advantage to invert order ?

hebasto commented at 6:31 pm on January 26, 2020:

The order is not inverted. The first target “anchors” is added, and was not present before, i.e., the new order is “anchors” then “full-relay” then “block-relay-only”. If anchors would not the first, they could be connected as other type nodes, and anchoring could not happen.

ariard commented at 1:22 am on January 30, 2020:

But anchors nodes are block-only, so now you have block-only-as-anchors, full-relay, block-only-classics. You can still test addresses returned from addrman against m_anchors andon continue in case of equality to avoid the problem you’re describing, but don’t think that changes something, do we favor first-connected nodes in anyway ?

in src/net_processing.cpp:2092 in a41cf3dedb outdated

2087@@ -2088,6 +2088,9 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
2088                       pfrom->nVersion.load(), pfrom->nStartingHeight,
2089                       pfrom->GetId(), (fLogIPs ? strprintf(", peeraddr=%s", pfrom->addr.ToString()) : ""),
2090                       pfrom->m_tx_relay == nullptr ? "block-relay" : "full-relay");
2091+            if (!pfrom->m_tx_relay) {
2092+                DumpAnchors(connman->m_anchors_db_path, connman->GetBlockRelayNodeAddresses());

ariard commented at 8:05 pm on January 15, 2020:

What do you think about waiting some time like 24hours or one block announcement before to qualify node as anchor? That would require at least some good behavior from it, that said and may favor too much well-reliable nodes and decay network-wide connectivity..

nit: you may use scheduler for DumpAnchors like we do for DumpAddresses, that would avoid file syscalls while holding cs_main

jamesob commented at 8:36 pm on January 22, 2020:

I like the idea of moving this onto the scheduler thread, but not a huge deal given this only happens once per blocks-only connection.

hebasto commented at 6:52 pm on January 26, 2020:

nit: you may use scheduler for DumpAnchors like we do for DumpAddresses, that would avoid file syscalls while holding cs_main

I like the idea of moving this onto the scheduler thread, but not a huge deal given this only happens once per blocks-only connection.

Agree that using CScheduler.scheduleFromNow() for DumpAnchors is an improvement. Let’s leave it for follow ups.

ariard commented at 8:12 pm on January 15, 2020: member

Concept ACK on the idea but see my concerns in #17326 (comment).

IMO before to go forward we need to agree on a) the scope of anchors: full-relay, block-relay, both, b) anchors selection & lifecyle, i.e after how much time do we select them or based on any metrics, do we prune them from anchors.dat, do we keep a bigger file and select randomly from it

laanwj added this to the milestone 0.20.0 on Jan 20, 2020

laanwj commented at 8:09 pm on January 20, 2020: member

Added 0.20 milestone as this mitigates a security issue so it would be good to have.

in src/addrdb.cpp:173 in 161dfb8aa4 outdated

164@@ -165,3 +165,10 @@ std::vector<CAddress> ReadAnchors(const fs::path& anchors_db_path)
165 
166     return ret;
167 }
168+
169+void DumpAnchors(const fs::path& anchors_db_path, const std::vector<CAddress>& anchors)
170+{
171+    int64_t start = GetTimeMillis();
172+    SerializeFileDB("anchors", anchors_db_path, anchors);
173+    LogPrintf("Flushed %d anchor outbound peer addresses to anchors.dat in %d ms\n", anchors.size(), GetTimeMillis() - start);

jamesob commented at 8:21 pm on January 22, 2020:

Can use logging/timer.h if you care to. (example)

hebasto commented at 7:25 pm on January 26, 2020:

Thanks. Done in the latest push.

jamesob commented at 9:00 pm on January 22, 2020: member

d405145e8f935b454ba77e23cf7327515b85151e (jamesob/ackr/17428.1.hebasto.p2p_try_to_preserve_outb)

Code looks fine, but seeing some strange behavior locally.

Built, started bitcoind, and noted the current blocks-only peers:

0$ ./src/bitcoin-cli -datadir=/data2/bitcoin getpeerinfo | jq '.[] | select(.relaytxes==false) | .addr'
1
2"185.217.241.142:8333"
3"2.203.128.83:8333"

Verified the creation of anchors.dat:

0$ du -hs /data2/bitcoin/anchors.dat
1
24.0K    /data2/bitcoin/anchors.dat

Shutdown bitcoind, restarted, checked blocks-only peers:

0$ ./src/bitcoin-cli -datadir=/data2/bitcoin getpeerinfo | jq '.[] | select(.relaytxes==false) | .addr'
1
2"2.203.128.83:8333"
3"95.216.198.89:8333"

One of the peers has persisted from the first run, but the second is different. Unexpectedly, my logs don’t indicate the anchor connections were ever reused - the only occurrence of anchor in debug.log is when flushing to anchors.dat.

0Tested on Linux-4.15.0-47-generic-x86_64-with-Ubuntu-18.04-bionic
1
2Configured with ./configure LDFLAGS=-L/home/james/src/bitcoin/db4/lib/ CPPFLAGS=-I/home/james/src/bitcoin/db4/include/ CXXFLAGS=-std=c++11 -Wthread-safety-analysis -Werror -Wthread-safety-analysis -Werror --enable-wallet --enable-debug --with-daemon CXX=/usr/bin/clang++-4.0 CC=/usr/bin/clang-4.0 PYTHONPATH= --disable-shared --with-pic --enable-benchmark=no --with-bignum=no --enable-module-recovery --disable-jni --disable-shared --with-pic --enable-benchmark=no --with-bignum=no --enable-module-recovery --disable-jni --no-create --no-recursion
3
4Compiled with /usr/bin/ccache /usr/bin/clang++-4.0 -std=c++11 -mavx -mavx2 -std=c++11 -Wthread-safety-analysis -Werror -Wthread-safety-analysis -Werror -O0 -g3 -ftrapv  -Wstack-protector -fstack-protector-all -msse4 -msha -msse4.1 -msse4.2  i
5
6Compiler version: clang version 4.0.1-10 (tags/RELEASE_401/final)

hebasto commented at 1:09 pm on January 26, 2020: member

@ariard

The only idea of this PR is to make a node to try restore outbound block-relay-only connections as they were just before node restart. It is expected that this change in node behavior mitigates restart-based eclipse attacks without significant other tradeoffs.

IMO before to go forward we need to agree on a) the scope of anchors: full-relay, block-relay, both…

Anchoring as a new procedure, or anchors as “seemingly safe” nodes are out of the scope of this PR.

b) anchors selection & lifecyle, i.e after how much time do we select them or based on any metrics, do we prune them from anchors.dat, do we keep a bigger file and select randomly from it

With an idea to make a node to try restore outbound block-relay-only connections as they were just before node restart, anchors are the last known outbound block-relay-only connections only.

#17428 (review)

What do you think about waiting some time like 24hours or one block announcement before to qualify node as anchor? That would require at least some good behavior from it, that said and may favor too much well-reliable nodes and decay network-wide connectivity..

In this PR anchors are considered as the last known outbound block-relay-only connections, not as “presumable safe” ones.

hebasto force-pushed on Jan 26, 2020

hebasto commented at 7:37 pm on January 26, 2020: member

Updated d405145e8f935b454ba77e23cf7327515b85151e -> f2b4cfdc75952e61ba98eeffe26b2f86a78a45c9 (pr17428.02 -> pr17428.03, compare). @ariard @jamesob

Thank you for your reviews. Your comments have been addressed. @jamesob

… seeing some strange behavior locally.

Added additional logging when connecting to an anchor fails.

hebasto force-pushed on Jan 26, 2020

hebasto commented at 8:38 pm on January 26, 2020: member

Pushed a fixup. Ready for testing and (re)reviewing now.

jamesob approved

jamesob commented at 4:14 pm on January 28, 2020: member

ACK 37671dd540b919f2b1bd4defeed701b6021c933e (jamesob/ackr/17428.2.hebasto.p2p_try_to_preserve_outb)

Built locally, verified that anchors were connected to on restart using the same test as above.

0Tested on Linux-4.15.0-47-generic-x86_64-with-Ubuntu-18.04-bionic
1
2Configured with ./configure LDFLAGS=-L/home/james/src/bitcoin/db4/lib/ CPPFLAGS=-I/home/james/src/bitcoin/db4/include/ CXXFLAGS=-std=c++11 -Wthread-safety-analysis -Werror -Wthread-safety-analysis -Werror --enable-wallet --enable-debug --with-daemon CXX=/usr/bin/clang++-4.0 CC=/usr/bin/clang-4.0 PYTHONPATH= --disable-shared --with-pic --enable-benchmark=no --with-bignum=no --enable-module-recovery --disable-jni --disable-shared --with-pic --enable-benchmark=no --with-bignum=no --enable-module-recovery --disable-jni --no-create --no-recursion
3
4Compiled with /usr/bin/ccache /usr/bin/clang++-4.0 -std=c++11 -mavx -mavx2 -std=c++11 -Wthread-safety-analysis -Werror -Wthread-safety-analysis -Werror -O0 -g3 -ftrapv  -Wstack-protector -fstack-protector-all -msse4 -msha -msse4.1 -msse4.2  i
5
6Compiler version: clang version 4.0.1-10 (tags/RELEASE_401/final)

luke-jr commented at 3:53 am on January 29, 2020: member

Couldn’t there be a risk here, that if one of the anchor peers is malicious and knows a DoS vulnerability, this ensures they can keep repeatedly crashing your node, even if you setup a watchdog to restart it?

DrahtBot added the label Needs rebase on Jan 29, 2020

hebasto force-pushed on Jan 29, 2020

hebasto commented at 8:13 pm on January 29, 2020: member

Rebased after #16702 has been merged.

hebasto commented at 8:14 pm on January 29, 2020: member

@luke-jr

Couldn’t there be a risk here, that if one of the anchor peers is malicious and knows a DoS vulnerability, this ensures they can keep repeatedly crashing your node, even if you setup a watchdog to restart it?

~~Could manual deletion of the anchors.dat file be a remedy?~~ See: #17428 (comment)

DrahtBot removed the label Needs rebase on Jan 29, 2020

ariard commented at 1:50 am on January 30, 2020: member

@hebasto

Concept ACK, reading your comment I do understand that this PR is really-scoped, so don’t mind #17326 (comment) and #17428 (review) but would happily review future work to make anchors better (sorry for the noise)

A note explaining the exact attack scenario mitigated near to m_anchors or somewhere else would be cool to avoid future reviewers relying on wrong-scoped security assumptions.

Goign to test PR soon.

gmaxwell commented at 8:15 am on January 30, 2020: contributor

@luke-jr oy. uh. forget them if you uncleanly exit?

DrahtBot added the label Needs rebase on Feb 5, 2020

hebasto force-pushed on Feb 5, 2020

hebasto commented at 9:36 pm on February 5, 2020: member

Rebased after #18023 has been merged.

DrahtBot removed the label Needs rebase on Feb 5, 2020

hebasto commented at 12:49 pm on February 6, 2020: member

@luke-jr

Couldn’t there be a risk here, that if one of the anchor peers is malicious and knows a DoS vulnerability, this ensures they can keep repeatedly crashing your node, even if you setup a watchdog to restart it?

If an exploitable DoS vulnerability would exist, this risk would come true, unfortunately. As a DoS attack timing is unpredictable, it seems infeasible to prevent such a risk in the scope of this PR. This might be prevented by means of detecting a particular DoS attack, no?

luke-jr commented at 1:33 pm on February 6, 2020: member

If an exploitable DoS vulnerability would exist, this risk would come true, unfortunately.

This is probably among the more common vulnerability types.

(Let’s go with @gmaxwell’s suggestion to address it.)

hebasto commented at 1:34 pm on February 8, 2020: member

After a consideration of @luke-jr’s concerns and @gmaxwell’s suggestion, I’m going to close this PR.

The initial idea to mitigate eclipse attacks based on node restarts, included ones initiated by adversary, introduces a new risk. @gmaxwell’s suggestion nullifies the initial idea by making anchoring usable only for clean (user initiated) restarts. The long-term impact of such changes on the entire p2p network is unclear, and requires further study.

hebasto closed this on Feb 8, 2020

gmaxwell commented at 2:42 pm on February 8, 2020: contributor

I think closing this is a mistake. Not saving state across crashes in no way nullifies the advantage of preserving connections, and it’s also a natural implementation– e.g. the software will fail to write out the latest peers.dat across a crash.

hebasto reopened this on Feb 8, 2020

hebasto commented at 2:49 pm on February 8, 2020: member

@gmaxwell

I think closing this is a mistake. Not saving state across crashes in no way nullifies the advantage of preserving connections, and it’s also a natural implementation– e.g. the software will fail to write out the latest peers.dat across a crash.

Reopened. Going to implement your suggestion soon.

Definitely, the OP requires an update.

naumenkogs commented at 8:35 pm on February 24, 2020: member

Concept ACK. Looking forward to an update to this PR :)

instagibbs commented at 3:58 pm on February 25, 2020: member

to be clear, the scope of this PR is to attempt to persist connections across clean restarts triggered for whatever reason?

amitiuttarwar commented at 4:55 pm on February 25, 2020: contributor

@gmaxwell

Not saving state across crashes in no way nullifies the advantage of preserving connections

my understanding is this PR was intended to mitigate a restart-based eclipse attack. so, if an attacker crashes a victim node, the anchors help reduce the chance of victim being eclipsed. if we change to only persisting connections on clean restarts, we lose the benefit of that scenario. ( I understand the risk of implementing that solution )

I’m still thinking it through & reading relevant material, but wondering.. what are the advantages you see of having anchors for clean restarts? (vs not having anchors). Is it to mitigate a smaller subset of eclipse attacks that occur with users intentionally restarting their nodes?

mzumsande commented at 9:16 pm on February 25, 2020: member

I was thinking along the same lines as Amiti and would like to understand if the DoS attack possibility mentioned above is serious enough to give up the anchors in a unclean restart, because it seems to be especially desirable to have them in this scenario:

I think that the suggestion to delete anchor.dat on crashes would just change the big picture in case of a subset of DoS attacks working only on the victim’s outbound connections - for more general DoS attacks, the attacker could just as well connect to us after we have restarted (if we allow incoming connections) and attack again.
Such a repeated DoS attack would be tough to apply in a targeted way, because the connection is outgoing from the victim, reducing the incentives.
If an eclipse attacker somehow manages to control our addrman for an extended time (long enough to wait for our clean restart) it seems to me that we are pretty much screwed anyway: Even if anchors protect us from our restart, the attacker could also choose to wait until one by one, our anchors disconnect from us via clean restarts and are replaced by attacker nodes.
On the other hand, an attack on our addrman that would only be effective for a short while and recoverable long-time, combined with a DoS attack that forces us to restart now uncleanly, would be a scenario in which having anchors would seem really great.

ajtowns commented at 1:03 am on February 26, 2020: member

Couldn’t there be a risk here, that if one of the anchor peers is malicious and knows a DoS vulnerability, this ensures they can keep repeatedly crashing your node, even if you setup a watchdog to restart it?

If an exploitable DoS vulnerability would exist, this risk would come true, unfortunately. As a DoS attack timing is unpredictable, it seems infeasible to prevent such a risk in the scope of this PR. This might be prevented by means of detecting a particular DoS attack, no?

If there was a known remote crash vulnerability and the behaviour was “okay, in case of a crash, we don’t try anchors” wouldn’t an attacker trying to eclipse a node use that exploit to ensure they can completely eclipse the node?

If your node is remote crashable, you want to fix it pretty quickly anyway because that often means you’re vulnerable to remote code execution as well; so not sure the fix here is actually better?

hebasto commented at 2:20 pm on February 26, 2020: member

An adversary has two distinct ways to force a victim node to restart:

online activity that exploits a DoS vulnerability (i.e., a remote-crashable node)
offline activity: e.g., power failures, node operator’s delusion via social engineering etc.

The current PR implementation works well against the latter adversary activity.

Regarding the former case, I agree with @ajtowns:

If your node is remote crashable, you want to fix it pretty quickly anyway…

Also an adversary could just wait for a user-initiated (clean) restart of a victim node, which includes restart after a node upgrade. In such cases the current PR implementation works well too.

hebasto force-pushed on Feb 27, 2020

hebasto commented at 7:33 pm on February 27, 2020: member

Updated 78b2f950350688f11b544d24ec44816554c67631 -> 780107e41fe86e31392b3ef76b917e623483259f (pr17428.06 -> pr17428.07, compare):

a new “anchors.dat” file is created only during clean shutdown now

Also see:

amitiuttarwar commented at 7:47 pm on February 27, 2020: contributor

hm, if my understanding of the code is correct, I think we also need to wipe the anchors.dat file on startup to prevent the malicious-anchor-connection-keeps-crashing-your-node issue.

my understanding is that currently you only (potentially) remove connections from anchors.dat when you next invoke DumpAnchors() and replace them. But I might be missing something?

scenario: one of your outbound blocks-relay-only peers is malicious, but patient. eventually you restart your node for whatever reason. they are persisted to your anchors. when you startup you reconnect to them & they crash your node. and then you’re caught in the same loop?

hebasto commented at 8:16 pm on February 27, 2020: member

@amitiuttarwar

hm, if my understanding of the code is correct, I think we also need to wipe the anchors.dat file on startup to prevent the malicious-anchor-connection-keeps-crashing-your-node issue.

my understanding is that currently you only (potentially) remove connections from anchors.dat when you next invoke DumpAnchors() and replace them. But I might be missing something?

scenario: one of your outbound blocks-relay-only peers is malicious, but patient. eventually you restart your node for whatever reason. they are persisted to your anchors. when you startup you reconnect to them & they crash your node. and then you’re caught in the same loop?

It seems we need to re-write the anchors.dat file every time when another anchor is dropped in m_anchors.pop_back();, no?

in src/net.cpp:2321 in 780107e41f outdated

2314@@ -2278,6 +2315,14 @@ bool CConnman::Start(CScheduler& scheduler, const Options& connOptions)
2315         }
2316     }
2317 
2318+    m_anchors_db_path = GetDataDir() / "anchors.dat";
2319+    m_anchors = ReadAnchors(m_anchors_db_path);
2320+    LogPrintf("Loaded %i addresses from %s\n", m_anchors.size(), m_anchors_db_path.filename());
2321+    if (m_anchors.size() > (size_t)m_max_outbound_block_relay) {

amitiuttarwar commented at 9:56 pm on February 27, 2020:

consider introducing a new constant (eg. m_max_anchors) that currently matches the value of m_max_outbound_block_relay, so if we change the number of block-relay-only connections in the future we have to explicitly decide if we want to increase our anchor connections as well

-> limiting max anchors mentioned here, potential downside of anchoring all blocks-only connections mentioned here

hebasto commented at 11:03 am on May 31, 2020:

Updated.

in src/net.cpp:2438 in 780107e41f outdated

2434@@ -2390,6 +2435,7 @@ void CConnman::Stop()
2435     if (fAddressesInitialized)
2436     {
2437         DumpAddresses();
2438+        DumpAnchors(m_anchors_db_path, GetBlockRelayNodeAddresses());

amitiuttarwar commented at 9:58 pm on February 27, 2020:

and pass the first m_max_anchors elements of GetBlockRelayNodeAddresses here.

in src/net.cpp:2415 in 780107e41f outdated

2314@@ -2278,6 +2315,14 @@ bool CConnman::Start(CScheduler& scheduler, const Options& connOptions)
2315         }
2316     }
2317 
2318+    m_anchors_db_path = GetDataDir() / "anchors.dat";
2319+    m_anchors = ReadAnchors(m_anchors_db_path);

amitiuttarwar commented at 10:00 pm on February 27, 2020:

It seems we need to re-write the anchors.dat file every time when another anchor is dropped in m_anchors.pop_back();, no?

I think it would be simpler to delete the contents of the whole file once you have loaded the anchors into memory. Eg. After you read anchors here, invoke a method that wipes anchors.dat.

This way you can avoid adding per-anchor logic.

hebasto commented at 7:08 am on May 31, 2020:

If a node crash happens between wiping anchors.dat and creating a new one, the anchors data would be lost.

amitiuttarwar commented at 10:39 pm on February 27, 2020: contributor

Concept ACK on using the two blocks-only connections as anchors & trying to reconnect to them on startup.

I’m working to better understand addrman management and the blocks-only connection logic, since that underpins the security provided by this implementation of anchors.

But this seems like a strict improvement because it increases the difficulty of executing an eclipse attack, and avoiding the major downsides of too much anchoring / increasing static-ness of transaction relay topology.

RE: malicious anchors & persisting anchors on crashes (or not)

I am still unconvinced that we shouldn’t persist anchors on crashes. I understand the risk is that a malicious anchor could exploit a DoS vulnerability (which is possibly more likely than executing an eclipse attack) and prevent the victim from being able to sync their node. I feel that if the node is DoS-attackable, it should be fixed immediately regardless. So something that prevents using it as a normal node doesn’t seem like a huge downside. And obviously the upside of persisting anchors across crashes is the extra protection against certain types of eclipse attacks.

But scoping this PR to focus only on persisting anchors on clean restarts seems reasonable for getting an improvement merged since it is less controversial.

naumenkogs commented at 4:43 pm on February 28, 2020: member

I am still unconvinced that we shouldn’t persist anchors on crashes. I understand the risk is that a malicious anchor could exploit a DoS vulnerability (which is possibly more likely than executing an eclipse attack) and prevent the victim from being able to sync their node. I feel that if the node is DoS-attackable, it should be fixed immediately regardless.

I agree with @amitiuttarwar right now. Additionally, it’s very difficult to imagine a DoS vulnerability, which is opened only to outbound blocks-only connections. So, if it is broad DoS issue an attacker will be able to shut down all the reachable nodes, because connecting to them in-bound is free. Much bigger trouble than persistent-anchor attack. Okay, if it applies to only outbound connections, then it’s also very easy: an attacker spawns another 2,000 of reachable sybil nodes, and thus has a very high chance of a random node to connect to an attacker as a regular connection (not block-relay-only), and then restarts it. Also bigger trouble than persistent-anchor attack.

Considering these observations, I think such a specific risk (specifically block-relay-only outbound links are able to crush a node, and none other can) should not prevent us from getting benefits of anchor-persistence after crashing. (If we are really worried, one can try counting restarts ). And those benefits (anti-eclipse) are quite important I think, as this scenario (restart-to-attempt-eclipse) seems much more feasible and dangerous then just restart-node-repetitively.

But scoping this PR to focus only on persisting anchors on clean restarts seems reasonable for getting an improvement merged since it is less controversial.

I would prefer to keep the discussion going. I think we may miss an important improvement here, if we just merge clean-restarts, because the conversation will die.

in src/addrdb.h:98 in 780107e41f outdated

94@@ -95,4 +95,10 @@ class CBanDB
95     bool Read(banmap_t& banSet);
96 };
97 
98+/** Read the anchor IP address database (anchors.dat) */

fjahr commented at 10:20 pm on March 2, 2020:

nit: would have been nice to add a little more context information on the purpose of these anchors

hebasto commented at 11:03 am on May 31, 2020:

Updated.

fjahr commented at 10:41 pm on March 2, 2020: member

ACK 780107e41fe86e31392b3ef76b917e623483259f

Reviewed code and tested locally.

I think this can be merged as a strict improvement and persisting anchors across crashes can be done as a follow-up. I don’t think the conversation would die if the follow-up PR is opened soon. But I also fail to see an issue with persisting across crashes so far and I am also happy to re-review if it gets added as part of this PR as well.

laanwj removed this from the milestone 0.20.0 on Mar 26, 2020

laanwj added this to the milestone 0.21.0 on Mar 26, 2020

hebasto commented at 2:02 pm on May 30, 2020: member

From the discussions (#17326, this PR, https://bitcoincore.reviews/17428.html, https://diyhpl.us/wiki/transcripts/la-bitdevs/2020-04-16-amiti-uttarwar-attacking-bitcoin-core/) it follows that the worst case could happen in the following setup:

a remote crash vulnerability is known to an adversary
that adversary is a block-relay-only anchor peer of the victim node
there is a watch-dog script that restarts the victim node in case of its crash

Please note that the message set accepted by a node from a block-relay-only peer is a strict subset of messages accepted from a full-relay outgoing connection. Therefore, the mentioned in (1) vulnerability could affect any node, regardless of anchoring. Having a such vulnerability makes other attack vectors, including all types of eclipsing, much less dangerous in comparison.

The mentioned in (3) script could be optimized to prevent continuous restarts, e.g. from @naumenkogs’ comment:

If we are really worried, one can try counting restarts

Therefore, I believe the described worst case is out of scope of this PR.

Going to update this pull soon.

hebasto force-pushed on May 31, 2020

hebasto commented at 10:42 am on May 31, 2020: member

Updated 780107e41fe86e31392b3ef76b917e623483259f -> dacb155eaaa1775385bc4a2d82a3e0307f68ce7d (pr17428.07 -> pr17428.09, diff):

addressed @amitiuttarwar’s comment:

consider introducing a new constant (eg. m_max_anchors) that currently matches the value of m_max_outbound_block_relay, so if we change the number of block-relay-only connections in the future we have to explicitly decide if we want to increase our anchor connections as well

addressed @naumenkogs’s comment:

So, if it is broad DoS issue an attacker will be able to shut down all the reachable nodes, because connecting to them in-bound is free. Much bigger trouble than persistent-anchor attack.

addressed @fjahr’s comment:

nit: would have been nice to add a little more context information on the purpose of these anchors

addressed @luke-jr’s comment:

wonders if anchors should be reduced/skipped entirely if the user has manually configured addnode peers

Currently, the anchors.dat contains IP addresses of the block-relay-only peers to which the node is connected, and IP addresses that were loaded from anchors.dat into memory at startup and that have not been tried to re-connect to yet.

hebasto force-pushed on May 31, 2020

hebasto commented at 11:01 am on May 31, 2020: member

Rebased dacb155eaaa1775385bc4a2d82a3e0307f68ce7d -> 293065610dd1822c9f8585526031b5a38996f695 (pr17428.09 -> pr17428.10) due to the conflicts with #18088 and #18910.

DrahtBot added the label Needs rebase on Jun 4, 2020

hebasto force-pushed on Jun 5, 2020

hebasto commented at 6:40 am on June 5, 2020: member

Rebased 293065610dd1822c9f8585526031b5a38996f695 -> 1d9264e296f1a03abbfa6c30363628a85cf33ce4 (pr17428.10 -> pr17428.11) due to the conflict with #19053.

DrahtBot removed the label Needs rebase on Jun 5, 2020

in src/net.cpp:1954 in cfe30185df outdated

1950@@ -1951,6 +1951,19 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
1951     }
1952 }
1953 
1954+std::vector<CAddress> CConnman::GetBlockRelayNodeAddresses() const

naumenkogs commented at 10:24 am on July 1, 2020:

This function name doesn’t reflect that it’s inbound-only

naumenkogs commented at 10:25 am on July 1, 2020:

Maybe it actually should be named “GetCurrentAnchors”

hebasto commented at 12:25 pm on July 3, 2020:

This function name doesn’t reflect that it’s inbound-only

You mean “outbound”?

hebasto commented at 12:52 pm on July 3, 2020:

Maybe it actually should be named “GetCurrentAnchors”

Updated.

in src/net_processing.cpp:2416 in bd39249ea3 outdated

2410@@ -2411,6 +2411,10 @@ bool ProcessMessage(CNode& pfrom, const std::string& msg_type, CDataStream& vRec
2411                       pfrom.nVersion.load(), pfrom.nStartingHeight,
2412                       pfrom.GetId(), (fLogIPs ? strprintf(", peeraddr=%s", pfrom.addr.ToString()) : ""),
2413                       pfrom.m_tx_relay == nullptr ? "block-relay" : "full-relay");
2414+            if (!pfrom.m_tx_relay) {
2415+                std::vector<CAddress> anchors_to_save = connman->GetBlockRelayNodeAddresses();
2416+                DumpAnchors(connman->GetAnchorsDbPath(), anchors_to_save);

naumenkogs commented at 10:27 am on July 1, 2020:

Wait, why is this done on VERACK? Even if this was discussed in the PR, a comment would help.

hebasto commented at 12:06 pm on July 3, 2020:

Wait, why is this done on VERACK?

It is a point where we get know that an outbound connection to a block-relay-only peer has been established.

Even if this was discussed in the PR, a comment would help.

What kind of comment do you expect to see here?

in src/net_processing.cpp:2416 in 1d9264e296 outdated

2410@@ -2411,6 +2411,12 @@ bool ProcessMessage(CNode& pfrom, const std::string& msg_type, CDataStream& vRec
2411                       pfrom.nVersion.load(), pfrom.nStartingHeight,
2412                       pfrom.GetId(), (fLogIPs ? strprintf(", peeraddr=%s", pfrom.addr.ToString()) : ""),
2413                       pfrom.m_tx_relay == nullptr ? "block-relay" : "full-relay");
2414+            if (!pfrom.m_tx_relay) {
2415+                std::vector<CAddress> anchors_to_save = connman->GetBlockRelayNodeAddresses();
2416+                const std::vector<CAddress> anchors_not_tried = connman->GetAnchors();

naumenkogs commented at 11:36 am on July 1, 2020:

I’m a bit confused of what’s going on here. I guess this is supposed to address the issue of reconnecting to the same anchor on many restarts, but it would really use some extra comments.

hebasto commented at 12:17 pm on July 3, 2020:

I’ve tried to avoid any confusion by meaningful naming variables, but failed :) The node dumps:

anchors that are connected right now
anchors that were known at startup but have not tried to connect to yet

The latter set quickly gets empty after startup.

hebasto commented at 12:53 pm on July 3, 2020:

Updated.

naumenkogs commented at 11:36 am on July 1, 2020: member

The implementation is very simple and fairly straightforward, which is good.

After all discussions I also see that this version of the PR falls into the broader idea of anti-eclipse measures (many more to come).

It’s true that fresh connections are often a good idea, but in this case we should achieve them by peer rotation and proactive feeler tip sync, not by relying on random restarts. Maybe this can be delayed by then, but I don’t see why. This PR currently doesn’t bring any concerns (in my opinion): spying, restart attack and whatnot.

utACK 1d9264e296f1a03abbfa6c30363628a85cf33ce4 modulo those minor things I commented.

hebasto force-pushed on Jul 3, 2020

hebasto commented at 12:50 pm on July 3, 2020: member

Updated 1d9264e296f1a03abbfa6c30363628a85cf33ce4 -> 0a74c7c3686d4b03a27815c6107b3300954641cc (pr17428.11 -> pr17428.13, diff):

addressed the recent @naumenkogs’ comments
small style improvements

in src/net.cpp:2054 in 0a74c7c368 outdated

1983+        if (!pnode->fInbound && !pnode->m_manual_connection && !pnode->m_tx_relay) {
1984+            ret.push_back(pnode->addr);
1985         }
1986     }
1987+
1988+    return ret;

pinheadmz commented at 4:07 pm on July 3, 2020:

2 questions about this:

Is there a possible race condition where GetCurrentAnchors() is called before the connection to the anchor node is complete? Or before the handshake is complete? Like is it possible that the test succeeds, no message is printed to the log, but then the connection still fails due to the peer not responding, etc.?
(and maybe this answers my first question) is there a better way of determining connection success than locking cs_vNodes and searching for the addr? Like, is there any sense in modifying OpenNetworkConnection() to return a success value?

hebasto commented at 5:23 pm on July 3, 2020:

The CConnman::OpenNetworkConnection() call is synchronous, therefore there is no race with the following CConnman::GetCurrentAnchors() call, and I couldn’t see a reason to modify it.

pinheadmz commented at 6:05 pm on July 3, 2020:

What do you think about (2)? Could you get what you need if OpenNetworkConnection() returned true on success?

hebasto commented at 6:29 pm on July 3, 2020:

CConnman::OpenNetworkConnection() has 5 call sites. What benefits of the bool return type for all callers? Unsuccessful connection is not an error.

And it seems orthogonal to this PR :)

hebasto commented at 6:32 pm on July 3, 2020:

… is there a better way of determining connection success than locking cs_vNodes and searching for the addr?

The vector being searching in is very small :)

pinheadmz commented at 7:01 pm on July 3, 2020:

gotcha, thanks!

in src/net.cpp:2402 in 0a74c7c368 outdated

2397+        m_anchors.clear();
2398+    } else {
2399+        const size_t anchor_num_allowed = static_cast<size_t>(connOptions.m_max_block_relay_anchor) - connOptions.m_added_nodes.size();
2400+        if (anchor_num_allowed < m_anchors.size()) m_anchors.resize(anchor_num_allowed);
2401+    }
2402+    LogPrintf("%i block-relay-only anchors will be tried to connect to.\n", m_anchors.size());

pinheadmz commented at 4:17 pm on July 3, 2020:

nit: a bit awkward, maybe "%i block-relay-only anchors will be tried for connections.\n" ?

hebasto commented at 5:35 pm on July 3, 2020:

@pinheadmz Thanks for improving my English :)

Updated.

pinheadmz commented at 5:17 pm on July 3, 2020: member

Concept ACK, just a few questions above about execution.

This was tricky to test because on regtest all nodes have the same IP and my development environment doesn’t have a synced chain. I don’t think the extra 2 blocks-only nodes are connected during IBD (?)

But I hot wired bitcoind with

static const int MAX_OUTBOUND_FULL_RELAY_CONNECTIONS = 0;

…forcing my only connections to be the two anchors. Verified the anchors.dat file, tried corrupting it and tested for that error (check!) and restarted bitcoind to check that the anchor connections were loaded and connected correctly (check!).

debug.log | grep anchor:

 02020-07-03T16:54:59Z DumpAnchors: Flush 1 outbound block-relay-only peer addresses to anchors.dat started
 12020-07-03T16:54:59Z DumpAnchors: Flush 1 outbound block-relay-only peer addresses to anchors.dat completed (0.01s)
 22020-07-03T16:55:16Z DumpAnchors: Flush 2 outbound block-relay-only peer addresses to anchors.dat started
 32020-07-03T16:55:16Z DumpAnchors: Flush 2 outbound block-relay-only peer addresses to anchors.dat completed (0.06s)
 42020-07-03T16:55:32Z DumpAnchors: Flush 2 outbound block-relay-only peer addresses to anchors.dat started
 52020-07-03T16:55:32Z DumpAnchors: Flush 2 outbound block-relay-only peer addresses to anchors.dat completed (0.05s)
 62020-07-03T16:57:42Z Loaded 2 addresses from "anchors.dat"
 72020-07-03T16:57:42Z 2 block-relay-only anchors will be tried to connect to.
 82020-07-03T16:57:43Z Trying to make an anchor connection to 120.150.206.137:8333
 92020-07-03T16:57:43Z DumpAnchors: Flush 2 outbound block-relay-only peer addresses to anchors.dat started
102020-07-03T16:57:43Z DumpAnchors: Flush 2 outbound block-relay-only peer addresses to anchors.dat completed (0.00s)
112020-07-03T16:57:48Z Trying to make an anchor connection to 174.138.35.229:8333
122020-07-03T16:57:48Z DumpAnchors: Flush 2 outbound block-relay-only peer addresses to anchors.dat started
132020-07-03T16:57:48Z DumpAnchors: Flush 2 outbound block-relay-only peer addresses to anchors.dat completed (0.01s)

hebasto force-pushed on Jul 3, 2020

hebasto commented at 5:34 pm on July 3, 2020: member

Updated 0a74c7c3686d4b03a27815c6107b3300954641cc -> ceb9d212716fe1134f9ac179d46e14e4847dd8b1 (pr17428.13 -> pr17428.14, diff):

addressed @pinheadmz’ comment

nit: a bit awkward, maybe "%i block-relay-only anchors will be tried for connections.\n" ?

pinheadmz approved

pinheadmz commented at 7:02 pm on July 3, 2020: member

ACK ceb9d212716fe1134f9ac179d46e14e4847dd8b1

 0-----BEGIN PGP SIGNED MESSAGE-----
 1Hash: SHA256
 2
 3ACK ceb9d212716fe1134f9ac179d46e14e4847dd8b1
 4-----BEGIN PGP SIGNATURE-----
 5
 6iQIzBAEBCAAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAl7/gI4ACgkQ5+KYS2KJ
 7yTqwOA//WN7fSijzEJXxIBImvSn9JloIV+La/INpckzxYNfT3TxfqNYZ3JU7XToT
 8yfBuMK6GjXcP3G7Ol46uOESmiJn7yAlJJinZwhmmERjZfRFFGFaXHw83LxY/qS13
 98KqFipTNTG+zBsW8BmGsEfikqLdrwWkejcIf1DaAZhct/dehe2V45L3ih5gOI7yr
10drS+dc+6wC7riA15C97Q851CpFe2F+y1aRc57Ii4M7O/u1O01TSPSI1Ip5YocA7/
11LmkHHnLXnaPTCSQIeWOjIURcIsqtMFfPJu29pBrLxI+M10QZyepqxNkB+b3oqBud
12XqyNXuayyJWyqVcZg/zR5G5pdeY5q4YInPVkF5nOt2KQHGMYKK2xVwU2TaOLqw4e
13fd7s2BKl7ZO3a0NhQKAzXhLNilXXbdbwFN1mk8TG29JhsUdvAw9r4cK/LkxkNCdh
14wn6kOW3XH2u1fmEqUpo1fkO84TzqPmrKiRSGHt94+yDUimYqqULsEUtXbUV8ISlv
1533KbJY+2glLUBx+vEnla/JB8cSlG5NHE++HfIZUTOwIkl0Z/6L39mYdwSiZ6sJA0
165ZaNaZeCFbygrNOtOiLCihmy7iPlhB+0qWscQzFO/Tufal0p5VLcqhrl+z9gj7rx
17tnksSx8rdr16O7GsDDNkpsm6NXjUSHn0+XKf1+4fHumM6BfrEu8=
18=Bw61
19-----END PGP SIGNATURE-----

pinheadmz’s public key is on keybase

DrahtBot added the label Needs rebase on Jul 9, 2020

hebasto force-pushed on Jul 10, 2020

hebasto commented at 7:42 am on July 10, 2020: member

Rebased ceb9d212716fe1134f9ac179d46e14e4847dd8b1 -> 005a46dc3a4b84b045d3877ee757a162acf1f5ca (pr17428.14 -> pr17428.15) due to the conflict with #19314.

DrahtBot removed the label Needs rebase on Jul 10, 2020

pinheadmz commented at 3:03 pm on July 14, 2020: member

ACK 005a46dc3a4b84b045d3877ee757a162acf1f5ca

 0-----BEGIN PGP SIGNED MESSAGE-----
 1Hash: SHA256
 2
 3ACK 005a46dc3a4b84b045d3877ee757a162acf1f5ca
 4-----BEGIN PGP SIGNATURE-----
 5
 6iQIzBAEBCAAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAl8NyM8ACgkQ5+KYS2KJ
 7yToIBhAAgVTXUzjuIrkugUhWiSYy5daBqBCvWtc1CC9P9h6vY+MfdLdvA/KuBlF4
 8dU8oQ7Dzz0sdTRCFvXSHDrYQSjF8hKd1JY3KrYVWgdMGIwITK7VMDpMWk5SCuWBo
 9VQsaFPZmEn8x4HV2+OkCOMM+f02fKrERJ4l5QKEX76CehIt/eJN6NaSOGo2Y7JcQ
100bxoK8iWFIvKkqQV6fatraxwig40bw6Jxuj7OhlN25xnWgI2TwSGL54d+r/9viHV
112OORkqF3frgqQNP6DODQw6ROcQhhyJdfAoUWefrD2qbbaBQ4nV6sovTs91HQq5lr
12oyIX9UMSVPewMdiVsS9YO6tcL5WKq2CVu63KmGOt+jZ6WZl8Y9Xb5oehJdc9BwiL
13wkJL2LbB40Me3eA8pszIAknFrLiyDTl1zMkuooGMQ61jtHJi9ILT+uLYTdNIdEH6
14iQZkX+HujAW0RUWln3ZMeummxgd/VBF/xnxZ35QMSg8ktIzsqrKNgkshmDfNzWOF
15glh99AFuBMTUmM11ShG5icEd07h4Jh3SQ3eOwzXVXcDfGXo5DYpbs7GXGo5o109E
165jeJyw0gIpWMeJoU/26VhD5VyUmRFLUaVDiWCJlvD4GyzY2FOSf30YvAxospjCTu
17uI8+thlo+0GSrKuRwTRXoRKXpp2tlA3UC1CKkv/Sy8iZdS7PzZY=
18=mG97
19-----END PGP SIGNATURE-----

pinheadmz’s public key is on keybase

Confirmed rebase is minimal

 0
 11:  529827588 = 1:  3370328ac p2p: Integrate ReadAnchors() into CConnman()
 22:  e3b817832 = 2:  64d1887fd p2p: Add CConnman::GetCurrentAnchors()
 33:  4f0a4c93e ! 3:  27864da6a p2p: Add DumpAnchors()
 4    @@ Commit message
 5
 6      ## src/addrdb.cpp ##
 7     @@
 8    - #include <chainparams.h>
 9      #include <clientversion.h>
10    + #include <cstdint>
11      #include <hash.h>
12     +#include <logging/timer.h>
13      #include <random.h>
144:  a4292fdd0 ! 4:  8e6eff7e4 p2p: Integrate DumpAnchors() into ProcessMessage()
15    @@ Commit message
16         p2p: Integrate DumpAnchors() into ProcessMessage()
17
18      ## src/net_processing.cpp ##
19    -@@ src/net_processing.cpp: bool ProcessMessage(CNode& pfrom, const std::string& msg_type, CDataStream& vRec
20    +@@ src/net_processing.cpp: void ProcessMessage(
21                            pfrom.nVersion.load(), pfrom.nStartingHeight,
22                            pfrom.GetId(), (fLogIPs ? strprintf(", peeraddr=%s", pfrom.addr.ToString()) : ""),
23                            pfrom.m_tx_relay == nullptr ? "block-relay" : "full-relay");
245:  e69c4d048 = 5:  9c17f92db p2p: Fix off-by-one error in fetching address loop
256:  ceb9d2127 ! 6:  005a46dc3 p2p: Try to connect to anchors once
26    @@ src/net.cpp
27
28     +#include <algorithm>
29     +#include <cmath>
30    + #include <cstdint>
31      #include <unordered_map>
32     -
33     -#include <math.h>
34    @@ src/net.cpp: void CConnman::ThreadOpenConnections(const std::vector<std::string>
35      }
36
37      ## src/net_processing.cpp ##
38    -@@ src/net_processing.cpp: bool ProcessMessage(CNode& pfrom, const std::string& msg_type, CDataStream& vRec
39    +@@ src/net_processing.cpp: void ProcessMessage(
40                            pfrom.GetId(), (fLogIPs ? strprintf(", peeraddr=%s", pfrom.addr.ToString()) : ""),
41                            pfrom.m_tx_relay == nullptr ? "block-relay" : "full-relay");
42                  if (!pfrom.m_tx_relay) {

naumenkogs commented at 10:11 am on July 15, 2020: member

The implementation looks correct, but I think the high-level idea requires more conceptual discussion. It’s not about me, I expressed my general support above.

DrahtBot added the label Needs rebase on Jul 21, 2020

hebasto force-pushed on Jul 21, 2020

hebasto commented at 7:09 pm on July 21, 2020: member

Rebased 005a46dc3a4b84b045d3877ee757a162acf1f5ca -> 9445a3411ed589d38cd4eee6714015d065f2b180 (pr17428.15 -> pr17428.16) due to the conflicts with #19174 and #19217.

DrahtBot removed the label Needs rebase on Jul 21, 2020

DrahtBot added the label Needs rebase on Aug 3, 2020

in src/init.cpp:1910 in 9445a3411e outdated

1888@@ -1889,6 +1889,7 @@ bool AppInitMain(const util::Ref& context, NodeContext& node)
1889     connOptions.nMaxConnections = nMaxConnections;
1890     connOptions.m_max_outbound_full_relay = std::min(MAX_OUTBOUND_FULL_RELAY_CONNECTIONS, connOptions.nMaxConnections);
1891     connOptions.m_max_outbound_block_relay = std::min(MAX_BLOCK_RELAY_ONLY_CONNECTIONS, connOptions.nMaxConnections-connOptions.m_max_outbound_full_relay);
1892+    connOptions.m_max_block_relay_anchor = std::min(MAX_BLOCK_RELAY_ONLY_ANCHORS, connOptions.m_max_outbound_block_relay);

jnewbery commented at 2:44 pm on August 4, 2020:

Is there a reason this needs to be a CConnMan::Options option? Can this logic exist entirely within CConnMan?

hebasto commented at 8:17 am on August 21, 2020:

Is there a reason this needs to be a CConnMan::Options option?

To be consistent with logic related to other connection limits, i.e., MAX_OUTBOUND_FULL_RELAY_CONNECTIONS and MAX_BLOCK_RELAY_ONLY_CONNECTIONS.

jnewbery commented at 10:28 am on September 3, 2020:

I don’t think we should copy bad patterns just because they already exist in the codebase. Options is used to pass parameters into CConnman’s initialize function, but CConnman already has everything it needs to calculate the max block relay anchor connections (the MAX_BLOCK_RELAY_ONLY_ANCHORS constant is defined in net, and m_max_outbound_block_relay is already passed in).

Adding this here is essentially placing logic for how to construct CConnman inside init.

hebasto commented at 11:06 am on September 7, 2020:

Updated.

hebasto force-pushed on Aug 6, 2020

hebasto commented at 2:13 am on August 6, 2020: member

Rebased 9445a3411ed589d38cd4eee6714015d065f2b180 -> d1ef06a1ac45e103eae1bfa5b9162e5cae64e73b (pr17428.16 -> pr17428.17) due to the conflict with #18991.

DrahtBot removed the label Needs rebase on Aug 6, 2020

DrahtBot added the label Needs rebase on Aug 12, 2020

in src/net_processing.cpp:2485 in d1ef06a1ac outdated

2481@@ -2482,6 +2482,15 @@ void ProcessMessage(
2482                       pfrom.nVersion.load(), pfrom.nStartingHeight,
2483                       pfrom.GetId(), (fLogIPs ? strprintf(", peeraddr=%s", pfrom.addr.ToString()) : ""),
2484                       pfrom.m_tx_relay == nullptr ? "block-relay" : "full-relay");
2485+            if (!pfrom.m_tx_relay) {

amitiuttarwar commented at 8:15 pm on August 12, 2020:

I think after #19316 we should be able to check directly for the block-relay connection type?

hebasto commented at 8:15 am on August 21, 2020:

Updated.

adamjonas commented at 2:35 pm on August 13, 2020: member

Summary:

This PR is an anti-eclipse attack measure for the scenario when an attacker exploits a victim node restart to force it to connect to new, probably adversarial, peers. Other discussion surrounding this issue can be found on #17326. This patch helps a node attempt to restore pre-restart outbound block-relay-only connections by dumping the (currently 2) outbound block-relay-only connections to an anchors.dat file and connecting to those addresses on start-up.

laanwj noted:

…reconnecting to peers ‘aggressively’, was, up to now, the reserve of spy nodes and mass-connectors and such… Historically this has been one of the ways used to detect these.

gmaxwell responded:

It would be really good to limit it so that an honest node won’t attempt to connect twice to the same peer through this mechanism for some prudent amount of time… like two minutes." But that “I don’t think it would cause any harm to the intent of this mechanism to not bring them up as fast as possible, and it would avoid tripping up detection of aggressive mass connectors.

gmaxwell and theBlueMatt commented (on #17326) that we shouldn’t do this with all our block-relay-only connections because it would guarantee capture persistence. sdaftuar noted that since we only have two of those connections at the moment, perhaps it’s fine for now.

luke-jr wondered:

Couldn’t there be a risk here, that if one of the anchor peers is malicious and knows a DoS vulnerability, this ensures they can keep repeatedly crashing your node, even if you setup a watchdog to restart it?

To that, gmaxwell suggested that one might forget the anchors if you uncleanly exit, which luke-jr supported.

At that point (Feb 2020), the author closed the PR because:

…forgetting the peers nullifies the initial idea by making anchoring usable only for clean (user initiated) restarts.

gmaxwell however encouraged the PR to be re-opened:

not saving state across crashes in no way nullifies the advantage of preserving connections, and it’s also a natural implementation– e.g. the software will fail to write out the latest peers.dat across a crash.

Since then others made arguments for reconnecting to anchor connections after unclean restarts:

The risk is that a malicious anchor could exploit a DoS vulnerability (which is possibly more likely than executing an eclipse attack) and prevent the victim from being able to sync their node. However, if the node is DoS-attackable, it should be fixed immediately regardless. Something that prevents using it as a normal node doesn’t seem like a huge downside. However, the upside of persisting anchors across crashes is the extra protection against certain types of eclipse attacks. (adapted from amitiuttarwar)

Additionally, it’s very difficult to imagine a DoS vulnerability opened only to outbound blocks-only connections. If there is a broad DoS issue, an attacker will be able to shut down all the reachable nodes with ease because in-bound connections are free. If a vulnerability only applies to outbound connections, then it’s also easy: an attacker spawns 2,000 reachable sybil nodes, and thus has a high chance of becoming a regular outbound connection (not just a block-relay-only). The specific risk of block-relay-only outbound links being able to crash a node while no other connection can, should not prevent us from getting benefits of anchor-persistence after crashing. (If we are worried, we can try counting restarts). The anti-eclipse benefits of reconnecting after an unclean exit are quite valuable since the restart-to-attempt-eclipse scenario seems much more feasible and dangerous then just restart-node-repetitively scenario. (adapted from naumenkogs)

In its current state, the anchors are persisted and reconnected to on unclean shutdowns. (h/t amitiuttarwar and hebasto for clarifying)

In sum, there appears to be broad support for the concept of outbound block-relay-only anchor connections. However, opinions differ on the approach around whether connections are persisted and connected to after unclean restarts. ajtowns, amitiuttarwar, naumenkogs, mzumsande support keeping the connections while luke-jr and gmaxwell are in favor of forgetting them.

Additionally, there were (now invalided) ACKs from fjahr, jamesob, pinheadmz and an utACK from naumenkogs.

hebasto force-pushed on Aug 21, 2020

hebasto commented at 8:14 am on August 21, 2020: member

Rebased d1ef06a1ac45e103eae1bfa5b9162e5cae64e73b -> d7482556c98b76e73f64d04400a703fdbb9c189f (pr17428.17 -> pr17428.18) due to the conflicts with #19658 and #19316.

DrahtBot removed the label Needs rebase on Aug 21, 2020

ariard commented at 6:38 pm on August 23, 2020: member

On the eclipses scenario mitigated, as a first step, it assumes the attacker to perform a addrman takover thus controlling the set of futures peers that the victim node will be able to connect to. This takeover may be either short-lived/long-lived or even persistent at whim of attacker, depending on the theoritical addr-relay vulnerability leveraged. To effectively activate the eclipse, an attacker needs to benefit from a favorable event, a) a restart or b) a full rotation of your outbounds (stale tip, lagging chain, eviction logic of connected peers, …). In case of a), independently of the restart source (spontaneous or attacker triggered), anchors connections will prevent your attacker to win control of your network view, and so are worthy on this ground.

On the peer rotation cases, especially leverage of the eviction logic, I think it would bypass this new counter-measure. The connection starvation tactic could be used to hijack your inbound slots towards your selected anchors peers and thus forcing you to initiate new connections to malicious peers. But a) it’s likely to be harder after #17428, as those anchors are going to be protected as block-relay peers and b) whereas lower bound of restart-eclipse presume for an attacker only injecting p2p messages and leveraging a natural event, starvation requires from it to actively occupy network slots and game on eviction criterias. Really when we analyze eclipse attacks, we should have a hierarchy of them classified with regards to resources they require from the attacker to successfully perform them. The fact that a stronger attack A can easily bypass counter-measure for attack B doesn’t invalidate worthiness of counter-measure B. At the end of the day, any of our actual eclipse counter-measure is likely to perfom poorly with regards to active infrastructure attacks.

On the DoS issue, I don’t think that granting an anchor privilege to our outbound-block-relay-only give any new DoS privilege to them. The set of messages we talk with them is restrained, thus theorically reducing the attack surface we expose to them. At the limit, we may favor them to upgrade them as compact blocks peers, and they may opportunity to exploit compact code. But that’s only exception I can see ? Not persisting anchors through unclear restart will actually reduce effectiveness of counter-measure, and I think that silent eclipsing might be more worrying than noisy DoS. A crash is likely to be notified more or less quickly by the operator and should halt any higher application logic whereas keeping operating on a attacker-controlled view of the network can be even more harmful funds-loss wise.

On the network staticness increase, I think gmaxwell made the point in the other thread that persitence may favor physically-near peers on the long term as the far from us one as more likely to be dropped from they anchor privilege due to network/link layer aleas. That way slightly degrading network robustness with regards to network-partition geographically based. It should be balanced with the fact this anchor, physically-near peers will also known natural restart or other event, likely reshuflling the cards to anyone. This should be something interesting to monitor by coupling anchor rotations with ASN infos provided by ASMAP. And if there is any meaningful geographical bias introduced.

That said a concern I may still have with proposed implementation is with regards to hindering non-observability of block-relay-only. By persisting connections across restarts, given we already protect this connections type from stale tip/lagging logic and the fact block-relay connections are going to be even more stable after #17428, you might improve a potential oracle to guess their presence. An attacker with connection-starvation/tx-relay capabilities could a) map your full-relay outbound with some conflicts b) evict those outbound from your connected peers by leveraging eviction c) observe you never pick new outbounds in some netgroup because this one is already occupied by your anchors.

I would be more confident if we either a) restrain anchors to one of our outbound-block-relay peers or b) add 2 news anchors peers outbound-block-relay-only. We should try to keep the non-observability of our block-relay-only peers as good we can, at least for a subset of them.

naumenkogs commented at 8:22 am on August 24, 2020: member

An attacker with connection-starvation/tx-relay capabilities could a) map your full-relay outbound with some conflicts b) evict those outbound from your connected peers by leveraging eviction c) observe you never pick new outbounds in some netgroup because this one is already occupied by your anchors.

It feels to be a way too advanced threat model (especially since (c) should be really hard) to deal with it right now. We can address it later once we’re done with more basic issues like those discussed above :)

Btw, another solution to (c)-like issues might be to choose the tx-relay connections (semi-?)independently from block-relay-only connections (in terms of netgroups/asmap).

in src/net_processing.cpp:2513 in d7482556c9 outdated

2505@@ -2506,6 +2506,15 @@ void ProcessMessage(
2506                       pfrom.nVersion.load(), pfrom.nStartingHeight,
2507                       pfrom.GetId(), (fLogIPs ? strprintf(", peeraddr=%s", pfrom.addr.ToString()) : ""),
2508                       pfrom.m_tx_relay == nullptr ? "block-relay" : "full-relay");
2509+            if (pfrom.IsBlockOnlyConn()) {
2510+                // The node dumps:
2511+                // - current anchors; and
2512+                // - anchors that were known at startup but have not tried for connections yet
2513+                auto anchors_to_save = connman.GetCurrentAnchors();

jnewbery commented at 10:33 am on September 3, 2020:

This logic shouldn’t be in net_processing. It’s all bound up in connection-layer logic rather than application-layer logic. There’s nothing here that’s specific to the pnode that we’re in the context of in this ProcessMessages() call. I think it makes sense for this to be in CConnman rather than here.

hebasto commented at 11:06 am on September 7, 2020:

Thanks! Updated.

DrahtBot added the label Needs rebase on Sep 3, 2020

hebasto force-pushed on Sep 7, 2020

hebasto commented at 11:06 am on September 7, 2020: member

Updated d7482556c98b76e73f64d04400a703fdbb9c189f -> 4eff692eeb861018f1f44b6284891a1d0599dd88 (pr17428.18 -> pr17428.20):

rebased due to the conflicts with #19704 and #19724
addressed @jnewbery’s comment:

I don’t think we should copy bad patterns just because they already exist in the codebase. Options is used to pass parameters into CConnman’s initialize function, but CConnman already has everything it needs to calculate the max block relay anchor connections (the MAX_BLOCK_RELAY_ONLY_ANCHORS constant is defined in net, and m_max_outbound_block_relay is already passed in).

Adding this here is essentially placing logic for how to construct CConnman inside init.

addressed @jnewbery’s comment:

This logic shouldn’t be in net_processing. It’s all bound up in connection-layer logic rather than application-layer logic. There’s nothing here that’s specific to the pnode that we’re in the context of in this ProcessMessages() call. I think it makes sense for this to be in CConnman rather than here.

DrahtBot removed the label Needs rebase on Sep 7, 2020

in src/net.h:541 in 4eff692eeb outdated

537@@ -533,6 +538,8 @@ class CConnman
538     NetEventsInterface* m_msgproc;
539     /** Pointer to this node's banman. May be nullptr - check existence before dereferencing. */
540     BanMan* m_banman;
541+    fs::path m_anchors_db_path;

jnewbery commented at 4:36 pm on September 8, 2020:

Do we need to store this? Seems straightforward enough to call GetDataDir() / "anchors.dat" in the two places we need it (and add a constant for "anchors.dat")

hebasto commented at 8:50 pm on September 8, 2020:

Thanks! Updated.

in src/net.cpp:2420 in 4eff692eeb outdated

2410@@ -2377,6 +2411,25 @@ bool CConnman::Start(CScheduler& scheduler, const Options& connOptions)
2411         }
2412     }
2413 
2414+    m_anchors_db_path = GetDataDir() / "anchors.dat";
2415+    m_anchors = ReadAnchors(m_anchors_db_path);
2416+    LogPrintf("Loaded %i addresses from %s\n", m_anchors.size(), m_anchors_db_path.filename());
2417+    if (m_anchors.size() > static_cast<size_t>(MAX_BLOCK_RELAY_ONLY_ANCHORS)) {
2418+        // Keep our policy safe.

jnewbery commented at 4:36 pm on September 8, 2020:

Why not just limit the anchors in this case rather than throwing them all away (ie m_anchors.resize(static_cast<size_t>(MAX_BLOCK_RELAY_ONLY_ANCHORS))

hebasto commented at 6:17 pm on September 8, 2020:

I do not trust anchors.dat that appears greater than it should be.

Also: #17428 (review)

jnewbery commented at 10:22 am on September 9, 2020:

I think we have to assume that our data directory is not compromised. Otherwise anything could happen.

hebasto commented at 1:58 pm on September 10, 2020:

Thanks! Updated.

in src/net.cpp:2424 in 4eff692eeb outdated

2419+        LogPrintf("%s is too large for this node policy. Skipped anchoring.\n", m_anchors_db_path.filename());
2420+        m_anchors.clear();
2421+    } else {
2422+        // Peers added via -addnode are permanent connections too, therefore we could skip the same number of anchors.
2423+        const size_t max_anchor_num = std::min(MAX_BLOCK_RELAY_ONLY_ANCHORS, connOptions.m_max_outbound_block_relay);
2424+        if (max_anchor_num >= connOptions.m_added_nodes.size()) {

jnewbery commented at 4:38 pm on September 8, 2020:

I don’t understand this logic. If max_anchor_num is 2 and added nodes is 1, then we’ll add 1 anchor, but if max_anchor_num is 2 and added nodes is 3, we’ll add both anchors. Is that intentional?

hebasto commented at 6:35 pm on September 8, 2020:

This is unintentional change during the recent rebasing. Many thanks!

hebasto commented at 8:51 pm on September 8, 2020:

Updated.

in src/net_processing.cpp:2545 in 4eff692eeb outdated

2541@@ -2542,13 +2542,18 @@ void PeerLogicValidation::ProcessMessage(CNode& pfrom, const std::string& msg_ty
2542         pfrom.SetRecvVersion(std::min(pfrom.nVersion.load(), PROTOCOL_VERSION));
2543 
2544         if (!pfrom.IsInboundConn()) {
2545+            {

jnewbery commented at 4:48 pm on September 8, 2020:

If this block scope should be indented. Even better, just use the WITH_LOCK macro to limit the locking to a single statement:

WITH_LOCK(cs_main, State(pfrom.GetId())->fCurrentlyConnected = true);

hebasto commented at 6:30 pm on September 8, 2020:

Don’t calls within LogPrintf() require ::cs_main lock?

hebasto commented at 6:31 pm on September 8, 2020:

Oh, I see. They don’t.

hebasto commented at 8:51 pm on September 8, 2020:

Updated.

in src/net_processing.cpp:2551 in 4eff692eeb outdated

2549             LogPrintf("New outbound peer connected: version: %d, blocks=%d, peer=%d%s (%s)\n",
2550                       pfrom.nVersion.load(), pfrom.nStartingHeight,
2551                       pfrom.GetId(), (fLogIPs ? strprintf(", peeraddr=%s", pfrom.addr.ToString()) : ""),
2552                       pfrom.m_tx_relay == nullptr ? "block-relay" : "full-relay");
2553+            }
2554+            if (pfrom.IsBlockOnlyConn()) {

jnewbery commented at 4:49 pm on September 8, 2020:

style: could do this on one line if you prefer:

if (pfrom.IsBlockOnlyConn()) m_connman.DumpAnchors();

I think calling through to connman to dump the anchors is a much cleaner design. Thanks for doing that!

hebasto commented at 6:20 pm on September 8, 2020:

m_connman.DumpAnchors(); is not a trivial expression, insofar I’d prefer to place it on its own line for better debugging.

jnewbery commented at 10:13 am on September 9, 2020:

ok

in src/net.cpp:2011 in 4eff692eeb outdated

2006+            }
2007+        }
2008+    }
2009+}
2010+
2011+std::vector<CAddress> CConnman::GetCurrentAnchors() const

jnewbery commented at 4:52 pm on September 8, 2020:

Would GetCurrentBlockRelayOnlyConns() be a more accurate name here?

hebasto commented at 6:22 pm on September 8, 2020:

Sure! Going to rename it.

hebasto commented at 8:40 pm on September 8, 2020:

Btw, #17428 (review)

hebasto commented at 8:52 pm on September 8, 2020:

Updated.

in src/net.cpp:1941 in 4eff692eeb outdated

1928@@ -1924,11 +1929,30 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
1929         int nTries = 0;
1930         while (!interruptNet)
1931         {
1932-            CAddrInfo addr = addrman.SelectTriedCollision();
1933+            // If we didn't find an appropriate destination after trying 100 addresses fetched from addrman,
1934+            // stop this loop, and let the outer loop run again (which sleeps, adds seed nodes, recalculates
1935+            // already-connected network ranges, ...) before trying new addrman addresses.
1936+            nTries++;

jnewbery commented at 4:54 pm on September 8, 2020:

Would it be better just to put the loop counting into a for statement? (perhaps in a different PR)

hebasto commented at 8:52 pm on September 8, 2020:

(perhaps in a different PR)

Yes :)

jnewbery commented at 10:14 am on September 9, 2020:

ok, I think this whole function could be cleaned up, but this change doesn’t make things too much worse, so lets save it for a different PR.

in src/net.cpp:1991 in 4eff692eeb outdated

1985@@ -1973,8 +1986,9 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
1986         }
1987 
1988         if (addrConnect.IsValid()) {
1989-
1990-            if (fFeeler) {
1991+            if (anchor) {

jnewbery commented at 4:57 pm on September 8, 2020:

Why are these specialized logs required? Wouldn’t it be better to do generic logging for all connection types?

hebasto commented at 6:26 pm on September 8, 2020:

Why are these specialized logs required?

Because anchors are specialized connections by their nature. As a secure feature I’d keep logs as they are.

jnewbery commented at 10:26 am on September 9, 2020:

I really don’t like this specialized logging for anchor connections (or the feeler connections below). All connection types are specialized in some way? Why not just log unconditionally for all connection types?

hebasto commented at 10:37 am on September 9, 2020:

Why not just log unconditionally for all connection types?

It could be very noisy in logs. Anyway, touching logging for all connections seems out of this PR scope.

jnewbery commented at 11:42 am on September 9, 2020:

oh, I didn’t realise that these are unconditional logs. I definitely think they should be in the NET category.

hebasto commented at 1:58 pm on September 10, 2020:

Thanks! Updated.

sdaftuar commented at 3:37 pm on September 10, 2020:

We could move this whole LogPrint() up to where the anchor is selected, and get rid of the anchor boolean, IMO.

hebasto commented at 3:48 pm on September 12, 2020:

Updated.

in src/net.cpp:1941 in 4eff692eeb outdated

1937+            if (nTries > 100)
1938+                break;
1939+
1940+            CAddress addr;
1941+            if (anchor) {
1942+                addr = m_anchors.back();

jnewbery commented at 4:59 pm on September 8, 2020:

What happens in the logic below if we pick an anchor connection that isn’t in our addrman? (ie what happens at the if (!fFeeler && !HasAllDesirableServiceFlags(addr.nServices)) line below)?

hebasto commented at 8:53 pm on September 8, 2020:

Thanks! Reworked with a smaller diff.

in src/net.cpp:2969 in 4eff692eeb outdated

2960@@ -2908,3 +2961,13 @@ uint64_t CConnman::CalculateKeyedNetGroup(const CAddress& ad) const
2961 
2962     return GetDeterministicRandomizer(RANDOMIZER_ID_NETGROUP).Write(vchNetGroup.data(), vchNetGroup.size()).Finalize();
2963 }
2964+
2965+void CConnman::DumpAnchors() const
2966+{
2967+    // The node dumps:
2968+    // - current anchors; and
2969+    // - anchors that were known at startup but have not tried for connections yet

jnewbery commented at 6:13 pm on September 8, 2020:

slight wording clarification: s/have not tried for connections yet/we have not yet tried to connect to/

hebasto commented at 6:28 pm on September 8, 2020:

Thanks! You know English is not my strong point :) Going to fix it.

hebasto commented at 8:41 pm on September 8, 2020:

Btw, #17428 (review)

hebasto commented at 8:54 pm on September 8, 2020:

Updated.

hebasto force-pushed on Sep 8, 2020

hebasto commented at 8:48 pm on September 8, 2020: member

Updated 4eff692eeb861018f1f44b6284891a1d0599dd88 -> 922c65cfd9ea31b00fdc76514dbdd68129169467 (pr17428.20 -> pr17428.21, diff):

addressed @jnewbery’s comments

hebasto force-pushed on Sep 10, 2020

hebasto commented at 1:57 pm on September 10, 2020: member

Updated 922c65cfd9ea31b00fdc76514dbdd68129169467 -> 06c96f27e70a7d397b5980497dfe615a1f57aeb4 (pr17428.21 -> pr17428.22, diff).

Addressed more @jnewbery’s comments:

#17428 (review):

I think we have to assume that our data directory is not compromised. Otherwise anything could happen.

#17428 (review):

oh, I didn’t realise that these are unconditional logs. I definitely think they should be in the NET category.

in src/addrdb.h:85 in 06c96f27e7 outdated

80+ * tried to re-connect to on startup.
81+ */
82+std::vector<CAddress> ReadAnchors(const fs::path& anchors_db_path);
83+
84+/** Dump the anchor IP address database (anchors.dat) */
85+void DumpAnchors(const fs::path& anchors_db_path, const std::vector<CAddress> anchors);

jnewbery commented at 2:21 pm on September 10, 2020:

How about passing anchors by const ref to save a copy?

hebasto commented at 3:47 pm on September 12, 2020:

Thanks! Updated.

in src/net.cpp:2418 in 06c96f27e7 outdated

2412@@ -2377,6 +2413,22 @@ bool CConnman::Start(CScheduler& scheduler, const Options& connOptions)
2413         }
2414     }
2415 
2416+    const fs::path& anchors_db_path = GetDataDir() / ANCHORS_DATABASE_FILENAME;
2417+    m_anchors = ReadAnchors(anchors_db_path);
2418+    LogPrintf("Loaded %i addresses from %s\n", m_anchors.size(), ANCHORS_DATABASE_FILENAME);

jnewbery commented at 2:23 pm on September 10, 2020:

What do you think about moving this logging into the ReadAnchors() function (to be consistent with the DumpAnchors() logging)?

hebasto commented at 3:47 pm on September 12, 2020:

Thanks! Updated.

in src/net.cpp:2006 in 06c96f27e7 outdated

1998@@ -1983,10 +1999,30 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
1999             }
2000 
2001             OpenNetworkConnection(addrConnect, (int)setConnected.size() >= std::min(nMaxConnections - 1, 2), &grant, nullptr, conn_type);
2002+
2003+            if (anchor) {
2004+                const auto v = GetCurrentBlockRelayOnlyConns();
2005+                if (std::find(v.begin(), v.end(), addrConnect) == v.end()) {
2006+                    LogPrintf("The anchor connection to %s FAILED\n", addrConnect.ToString());

jnewbery commented at 2:39 pm on September 10, 2020:

Make NET category?

hebasto commented at 3:51 pm on September 12, 2020:

After addressing #17428 (review), it’s irrelevant now.

in src/net.cpp:2396 in a756f5fc18 outdated

2391+        if (anchor_num_allowed < m_anchors.size()) {
2392+            m_anchors.resize(anchor_num_allowed);
2393+        }
2394+    }
2395+
2396+    LogPrintf("%i block-relay-only anchors will be tried for connections.\n", m_anchors.size());

sdaftuar commented at 2:39 pm on September 10, 2020:

Perhaps we should not print this message if -connect=0 is set?

hebasto commented at 3:46 pm on September 12, 2020:

Thanks! Updated.

in src/net.cpp:2385 in a756f5fc18 outdated

2378@@ -2377,6 +2379,22 @@ bool CConnman::Start(CScheduler& scheduler, const Options& connOptions)
2379         }
2380     }
2381 
2382+    const fs::path& anchors_db_path = GetDataDir() / ANCHORS_DATABASE_FILENAME;
2383+    m_anchors = ReadAnchors(anchors_db_path);
2384+    LogPrintf("Loaded %i addresses from %s\n", m_anchors.size(), ANCHORS_DATABASE_FILENAME);
2385+    // Peers added via -addnode are permanent connections too, therefore we could skip the same number of anchors.

sdaftuar commented at 2:48 pm on September 10, 2020:

I’m skeptical of this interaction between -addnode peers and the anchor peers we’re introducing. Basically this gives users a way to footgun the protection we’re trying to give with these anchors, by starting up bitcoind once with -addnode on the command line.

Given that using -addnode doesn’t reduce our outbound or block-relay-only connections today, I don’t think it should have any interaction with these anchor peers either.

hebasto commented at 3:46 pm on September 12, 2020:

Thanks! Updated.

in src/net.cpp:2044 in 1509132268 outdated

1988@@ -1989,6 +1989,19 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
1989     }
1990 }
1991 
1992+std::vector<CAddress> CConnman::GetCurrentBlockRelayOnlyConns() const

sdaftuar commented at 3:24 pm on September 10, 2020:

This will have an interaction with #19858, so just wanted to flag it – if this PR is merged before #19858, then I’d change this logic to only return the oldest N block-relay-only peers (so that if we shutdown while connected to an extra one, the extra one is ignored).

in src/net.cpp:2006 in 06c96f27e7 outdated

1998@@ -1985,6 +1999,13 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
1999             }
2000 
2001             OpenNetworkConnection(addrConnect, (int)setConnected.size() >= std::min(nMaxConnections - 1, 2), &grant, nullptr, conn_type);
2002+
2003+            if (anchor) {
2004+                const auto v = GetCurrentBlockRelayOnlyConns();
2005+                if (std::find(v.begin(), v.end(), addrConnect) == v.end()) {
2006+                    LogPrintf("The anchor connection to %s FAILED\n", addrConnect.ToString());

sdaftuar commented at 3:39 pm on September 10, 2020:

I think we should just delete this whole if (anchor) block. This kind of logging is perhaps helpful while debugging the PR, but I think an unconditional log message here is not very helpful to most users – we should expect that it is completely normal that this might happen, and having a log message that looks concerning for normal behavior trains users to learn to ignore seemingly scary log messages.

For debugging/analysis/research purposes, I would not object to adding a bool to CNode() so that we can track when anchor connections are initiated and when they get disconnected. No need to do that in this PR, but something like that would be much more helpful than this log message, I think.

hebasto commented at 3:47 pm on September 12, 2020:

Updated.

sdaftuar changes_requested

sdaftuar commented at 3:55 pm on September 10, 2020: member

I disagree with the decision to call DumpAnchors on every block-relay-only connection. I largely agree with the reasoning given by @luke-jr and @gmaxwell; if the software is not working correctly and we have an unclean shutdown, it’s better to reset the state by throwing away old connections than it is to just resume our old connections.

Obviously if there’s a bug in our code that can lead to a crash, that’s something we’d want to fix ASAP. But if you’re a node operator, being able to reset state in the event of the unforeseen in order to mitigate a problem while you implement a better fix (like upgrading your code, if a fix is available) just seems like a common sense behavior to have. This is still strictly better from an eclipse-attack-prevention standpoint than what we have today, and in some circumstances may provide better usability in the event of an unknown future problem than what is proposed here.

I also disagree with having an interaction between -addnode and anchors, which I commented on separately.

If we resolve those two issues I’d be on board with the rest of the behavior proposed here (though I still plan to test).

hebasto commented at 1:24 pm on September 12, 2020: member

@sdaftuar

Thank you for reviewing.

But if you’re a node operator, being able to reset state in the event of the unforeseen in order to mitigate a problem while you implement a better fix (like upgrading your code, if a fix is available) just seems like a common sense behavior to have.

A node operator could just remove anchors.dat file to mitigate a problem. If such behavior could not be considered as a remedy, please consider the following.

Another approach is to introduce a configure option, say -keepanchors, with the default value -keepanchors=clean_shutdown_only, and a node operator could opt in -keepanchors=always.

UPDATE: This suggestion could be leaved for follow ups.

hebasto force-pushed on Sep 12, 2020

hebasto commented at 3:45 pm on September 12, 2020: member

Updated 06c96f27e70a7d397b5980497dfe615a1f57aeb4 -> 21d9d4e29ac09cb74afa24bed0509a928210d974 (pr17428.22 -> pr17428.23, diff).

Addressed @sdaftuar’s comments:

I disagree with the decision to call DumpAnchors on every block-relay-only connection. I largely agree with the reasoning given by @luke-jr and @gmaxwell; if the software is not working correctly and we have an unclean shutdown, it’s better to reset the state by throwing away old connections than it is to just resume our old connections.

I also disagree with having an interaction between -addnode and anchors, which I commented on separately.

Also other @sdaftuar’s and @jnewbery’s comments are addressed.

When feature freeze for 0.21.0 is around the coner the non-controversial and the safest approach seems the most reasonable. My suggestion could be considered later.

hebasto force-pushed on Sep 12, 2020

hebasto commented at 4:52 pm on September 12, 2020: member

Updated 21d9d4e29ac09cb74afa24bed0509a928210d974 -> be82629cc00961325182f03493d84f704b709020 (pr17428.23 -> pr17428.24, diff):

added anchors.dat to files.md

in src/net.cpp:2410 in be82629cc0 outdated

2405@@ -2377,6 +2406,17 @@ bool CConnman::Start(CScheduler& scheduler, const Options& connOptions)
2406         }
2407     }
2408 
2409+    m_anchors = ReadAnchors(GetDataDir() / ANCHORS_DATABASE_FILENAME);
2410+    Assert(MAX_BLOCK_RELAY_ONLY_ANCHORS >= 0);

jnewbery commented at 9:07 am on September 16, 2020:

Why add a runtime assert on the value of a constexpr which is known at compilation time? Just add a static_assert if you need to do this (but I don’t think this is required at all).

hebasto commented at 10:57 am on September 16, 2020:

Thanks. Updated.

in src/net.h:67 in be82629cc0 outdated

62@@ -63,6 +63,9 @@ static const int MAX_OUTBOUND_FULL_RELAY_CONNECTIONS = 8;
63 static const int MAX_ADDNODE_CONNECTIONS = 8;
64 /** Maximum number of block-relay-only outgoing connections */
65 static const int MAX_BLOCK_RELAY_ONLY_CONNECTIONS = 2;
66+/** Maximum number of block-relay-only anchor connections */
67+static constexpr int MAX_BLOCK_RELAY_ONLY_ANCHORS = 2;

jnewbery commented at 9:08 am on September 16, 2020:

This constant is only used internally within net.cpp, so I think it should be moved there, rather than be part of net’s interface to other components.

hebasto commented at 10:57 am on September 16, 2020:

Thanks. Updated.

in src/addrdb.h:85 in be82629cc0 outdated

80+ * Reads the anchor IP address database (anchors.dat)
81+ *
82+ * Anchors are last known outgoing block-relay-only peers that are
83+ * tried to re-connect to on startup.
84+ */
85+std::vector<CAddress> ReadAnchors(const fs::path& anchors_db_path, bool remove_after_read = true);

jnewbery commented at 9:10 am on September 16, 2020:

What is the remove_after_read param for? It’s not used in the one place this is called, so should be removed.

hebasto commented at 10:57 am on September 16, 2020:

Updated.

in src/net.h:575 in be82629cc0 outdated

536@@ -533,6 +537,7 @@ class CConnman
537     NetEventsInterface* m_msgproc;
538     /** Pointer to this node's banman. May be nullptr - check existence before dereferencing. */
539     BanMan* m_banman;
540+    std::vector<CAddress> m_anchors;

jnewbery commented at 9:12 am on September 16, 2020:

I think it’d be good to add a doxygen comment to this new member variable. This vector stores addresses that we should attempt to make block-relay-only connections to, not connections that we’ll necessarily store as anchor connections next time we shut down.

hebasto commented at 10:58 am on September 16, 2020:

Thanks! Updated.

jnewbery commented at 5:12 pm on September 16, 2020:

Thanks for adding this. Sorry I wasn’t clearer, but I didn’t mean you should add “not connections that we’ll necessarily store as anchor connections next time we shut down” as a comment. That was me explaining to you why a comment might be helpful. In general, I don’t think comments should explain what the code_isn’t_ doing.

If you have to touch this PR again, I’d suggest “Addresses that were saved during the previous shutdown. We’ll attempt to make block-relay-only connections to them.”

hebasto commented at 5:40 pm on September 16, 2020:

Improved.

in src/net.cpp:2411 in be82629cc0 outdated

2405@@ -2377,6 +2406,17 @@ bool CConnman::Start(CScheduler& scheduler, const Options& connOptions)
2406         }
2407     }
2408 
2409+    m_anchors = ReadAnchors(GetDataDir() / ANCHORS_DATABASE_FILENAME);
2410+    Assert(MAX_BLOCK_RELAY_ONLY_ANCHORS >= 0);
2411+    constexpr size_t max_anchor_num = static_cast<size_t>(MAX_BLOCK_RELAY_ONLY_ANCHORS);

jnewbery commented at 9:14 am on September 16, 2020:

Why not just make the constexpr MAX_BLOCK_RELAY_ONLY_ANCHORS a size_t rather than use this cast?

hebasto commented at 10:58 am on September 16, 2020:

Updated.

in src/net.cpp:2537 in be82629cc0 outdated

2531@@ -2492,6 +2532,14 @@ void CConnman::StopNodes()
2532     if (fAddressesInitialized) {
2533         DumpAddresses();
2534         fAddressesInitialized = false;
2535+
2536+        auto anchors_to_dump = GetCurrentBlockRelayOnlyConns();

jnewbery commented at 9:18 am on September 16, 2020:

I suggest you add a brief comment here to say that anchor connections are only dumped during clean shutdown.

hebasto commented at 10:58 am on September 16, 2020:

Thanks! Updated.

hebasto force-pushed on Sep 16, 2020

hebasto commented at 10:56 am on September 16, 2020: member

Updated be82629cc00961325182f03493d84f704b709020 -> 24129a6b1966d71c7bd77ea029f17fb7796e2bc2 (pr17428.24 -> pr17428.25, diff):

addressed all recent @jnewbery’s comments

hebasto force-pushed on Sep 16, 2020

in src/net.cpp:2456 in ee6a5002e7 outdated

2408@@ -2377,6 +2409,15 @@ bool CConnman::Start(CScheduler& scheduler, const Options& connOptions)
2409         }
2410     }
2411 
2412+    m_anchors = ReadAnchors(GetDataDir() / ANCHORS_DATABASE_FILENAME);
2413+    if (m_anchors.size() > MAX_BLOCK_RELAY_ONLY_ANCHORS) {
2414+        m_anchors.resize(MAX_BLOCK_RELAY_ONLY_ANCHORS);
2415+    }
2416+
2417+    if (m_use_addrman_outgoing) {

jnewbery commented at 5:31 pm on September 16, 2020:

I wonder if the call to ReadAnchors() should be within this if block. If ReadAnchors() is outside the block, then when the node is started with -noconnect, not only will the node not connect to the anchors, it’ll also throw away the file, so that when it’s restarted without -noconnect, it won’t try to reconnect to the old anchors. Is that what we want?

hebasto commented at 5:33 pm on September 16, 2020:

Is that what we want?

I guess not.

hebasto commented at 5:40 pm on September 16, 2020:

Thanks! Updated.

jnewbery commented at 9:36 am on September 17, 2020:

hmm, actually this still won’t work, since if we start with -noconnect, then on shutdown we’ll still write a new anchors.dat file without any peer addresses.

hebasto force-pushed on Sep 16, 2020

hebasto commented at 5:39 pm on September 16, 2020: member

Updated 24129a6b1966d71c7bd77ea029f17fb7796e2bc2 -> 68870383c35922d3a5aad1cedd9df002568995a0 (pr17428.25 -> pr17428.27, diff):

improved the comment as suggested by @jnewbery:

I’d suggest “Addresses that were saved during the previous shutdown. We’ll attempt to make block-relay-only connections to them.”

addressed @jnewbery’s comment:

I wonder if the call to ReadAnchors() should be within this if block. If ReadAnchors() is outside the block, then when the node is started with -noconnect, not only will the node not connect to the anchors, it’ll also throw away the file, so that when it’s restarted without -noconnect, it won’t try to reconnect to the old anchors. Is that what we want?

jnewbery commented at 9:55 am on September 17, 2020: member

I’ve tested the following manually:

start bitcoind, leave it long enough to make block-relay-only connections, stop, manually inspect anchors.dat, restart and observe that bitcoind connects to the same block-relay-only connections :heavy_check_mark:
corrupt anchors.dat file and make sure that it’s ignored, and other addresses are chosen for block-relay-only connections :heavy_check_mark:
start bitcoind with -noconnect -> anchors.dat file doesn’t get read, but gets overwritten with a new file containing no connections on shutdown. I think we might want to change this behaviour to ensure anchors aren’t thrown away unnecessarily (although this scenario seems quite unlikely)

I think this is ready for merge now, and smaller improvements can be added later. It’ll also be great to add regression tests once #19315 has been merged.

Tested ACK 68870383c3

DrahtBot added the label Needs rebase on Sep 21, 2020

hebasto force-pushed on Sep 21, 2020

hebasto commented at 6:53 pm on September 21, 2020: member

Updated 68870383c35922d3a5aad1cedd9df002568995a0 -> 27113fc53d0537393fc6c61a8a3dea0eced7f588 (pr17428.27 -> pr17428.28, diff):

rebased due to the conflict with #19697
addressed @jnewbery’s comment:

start bitcoind with -noconnect -> anchors.dat file doesn’t get read, but gets overwritten with a new file containing no connections on shutdown. I think we might want to change this behaviour to ensure anchors aren’t thrown away unnecessarily (although this scenario seems quite unlikely)

DrahtBot removed the label Needs rebase on Sep 21, 2020

in doc/files.md:52 in 27113fc53d outdated

48@@ -49,6 +49,7 @@ Subdirectory       | File(s)               | Description
49 `indexes/blockfilter/basic/db/` | LevelDB database      | Blockfilter index LevelDB database for the basic filtertype; *optional*, used if `-blockfilterindex=basic`
50 `indexes/blockfilter/basic/`    | `fltrNNNNN.dat`<sup>[\[2\]](#note2)</sup> | Blockfilter index filters for the basic filtertype; *optional*, used if `-blockfilterindex=basic`
51 `wallets/`         |                       | [Contains wallets](#multi-wallet-environment); can be specified by `-walletdir` option; if `wallets/` subdirectory does not exist, a wallet resides in the data directory
52+`./`               | `anchors.dat`         | Anchor IP address database. Anchors are last known outgoing block-relay-only peers that are tried to re-connect to on startup

ariard commented at 2:49 pm on September 25, 2020:

You can precise that this file is deleted after init to avoid user searching indefinitely for it while the node is operating.

hebasto commented at 4:58 pm on September 28, 2020:

Thanks! Updated.

in src/addrdb.h:80 in 27113fc53d outdated

78+
79+/**
80+ * Reads the anchor IP address database (anchors.dat)
81+ *
82+ * Anchors are last known outgoing block-relay-only peers that are
83+ * tried to re-connect to on startup.

ariard commented at 2:53 pm on September 25, 2020:

You can precise that deleting of anchors.dat isn’t accidental but intentional as it avoids renewed peering to anchors after a gross shutdown and thus potential exploitation of the anchor peer policy.

hebasto commented at 4:59 pm on September 28, 2020:

Updated.

in src/net.cpp:1933 in 27113fc53d outdated

1898@@ -1893,6 +1899,7 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
1899 
1900         ConnectionType conn_type = ConnectionType::OUTBOUND_FULL_RELAY;
1901         int64_t nTime = GetTimeMicros();
1902+        bool anchor = false;

ariard commented at 2:53 pm on September 25, 2020:

Don’t we have a coding style convention to prefix flag by f ? What about fAnchorAddr ?

hebasto commented at 4:02 pm on September 25, 2020:

The project has long ceased to use Hungarian notation in new code :)

in src/net.cpp:2456 in 27113fc53d outdated

2409@@ -2378,6 +2410,14 @@ bool CConnman::Start(CScheduler& scheduler, const Options& connOptions)
2410         }
2411     }
2412 
2413+    if (m_use_addrman_outgoing) {

ariard commented at 2:55 pm on September 25, 2020:

nit: “Load addresses from anchors.dat”

hebasto commented at 5:00 pm on September 28, 2020:

Updated.

in src/net.h:573 in 27113fc53d outdated

538@@ -538,6 +539,12 @@ class CConnman
539     /** Pointer to this node's banman. May be nullptr - check existence before dereferencing. */
540     BanMan* m_banman;
541 
542+    /**
543+     * Addresses that were saved during the previous clean shutdown. We'll
544+     * attempt to make block-relay-only connections to them.

ariard commented at 2:57 pm on September 25, 2020:

I think comment above ConnectionType::BLOCK_RELAY (`src/net.h L160) should be updated towards something like

0/*
1 * "We automatically attempt to open MAX_BLOCK_RELAY_ONLY_ANCHORS using addresses from  
2 * our anchors.dat. Then addresses from our AddrMan if MAX_BLOCK_RELAY_ONLY_CONNECTIONS 
3 * isn't reached yet
4*/

ariard commented at 3:49 pm on September 25, 2020:

I also wonder if saving anchor don’t introduce a slight fingerprint if a node is started on IPV4 then moved to Tor and connect to the same anchors. A spying anchor, by coupling with other infos such as best advertised tip could guess the networks mapping.

I think we should stamp the anchor.dat with last network type connected on and remove then if it has changed compared to actual one.

What’s your opinion ?

hebasto commented at 3:22 pm on September 28, 2020:

I think it is a right way to only allow connection to an anchor of the network type, which is allowed to connect to by the current policy of the node. Good topic for a follow up.

hebasto commented at 5:00 pm on September 28, 2020:

Thanks! Updated.

in src/net.cpp:1978 in 27113fc53d outdated

1946+                const CAddress addr = m_anchors.back();
1947+                m_anchors.pop_back();
1948+                if (!addr.IsValid() || IsLocal(addr) || !IsReachable(addr)) continue;
1949+                addrConnect = addr;
1950+                LogPrint(BCLog::NET, "Trying to make an anchor connection to %s\n", addrConnect.ToString());
1951+                break;

ariard commented at 3:33 pm on September 25, 2020:

I think the relation between connections opening order (anchors -> regular outbounds) and the enforcement of peer diversity should be underscored better. With this patchset, I believe outbound peer diversity is enforced by the fact that OUTBOUND_FULL_RELAY/BLOCK_RELAY are opened after anchors and thus those regular outbound addresses will be checked against the anchor ones.

It could be better to exist this loop normally to let other checks happening while avoiding to call AddrMan::Select thanks to the newanchor flag ? And add a comment above the connection type selection loop underscoring the order importance to avoid future work modifying it. E.g opening an OUTBOUND_FULL_RELAY may evict inadvertently an anchor in the same netgroup, thus diminishing effectiveness of anchors.

It would also enforce HasDesirableServiceFlags() on anchors. It’s unlikely we learn change of their services before we connect to them but our desirability may have changed between a restart. Further, in case of a buggy/malicious anchors.dat we should check network groups distinctness of anchors among themselves.

hebasto commented at 5:04 pm on September 28, 2020:

Thanks! Updated.

Keeping the added code in its own block for the following reasons:

some checks in the loop require CAddrInfo type but m_anchors contains only CAddress instances
smaller diff

in src/net.cpp:1946 in 27113fc53d outdated

1910@@ -1904,7 +1911,10 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
1911         // these conditions are met, check the nNextFeeler timer to decide if
1912         // we should open a FEELER.
1913 
1914-        if (nOutboundFullRelay < m_max_outbound_full_relay) {
1915+        if (!m_anchors.empty() && (nOutboundBlockRelay < m_max_outbound_block_relay)) {

ariard commented at 3:38 pm on September 25, 2020:

I think comment above isn’t accurate anymore as it clearly mentions OUTBOUND_FULL_RELAY has the highest priority which isn’t true after this patchset.

hebasto commented at 5:05 pm on September 28, 2020:

Thanks! Updated.

ariard commented at 3:52 pm on September 25, 2020: member

I think this and this are worthy to fix to avoid future code changes silently breaking peer diversity as post-anchor the enforcement of distinct outbound network groups isn’t that straightforward, IMO.

Also, it may introduce a slight fingerprint which could be worthy to solve as a follow-up.

Otherwise, review leftovers are just nice-to-fix.

hebasto force-pushed on Sep 28, 2020

hebasto commented at 4:58 pm on September 28, 2020: member

Updated 27113fc53d0537393fc6c61a8a3dea0eced7f588 -> b397e49b0c390970713b4387e6dd254b926c70d0 (pr17428.28 -> pr17428.30, diff):

fixed bug when an anchor does not pass all checks
addressed @ariard’s comments

practicalswift commented at 7:27 pm on September 28, 2020: contributor

Concept ACK

hebasto closed this on Sep 30, 2020

hebasto reopened this on Sep 30, 2020

in src/addrdb.h:88 in b397e49b0c outdated

83+
84+/**
85+ * Read the anchor IP address database (anchors.dat)
86+ *
87+ * Deleting of anchors.dat is intentional as it avoids renewed peering to anchors after
88+ * a gross shutdown and thus potential exploitation of the anchor peer policy.

jnewbery commented at 8:51 am on October 9, 2020:

s/a gross/an unclean/

hebasto commented at 9:43 am on October 9, 2020:

Thanks. Updated.

in src/addrdb.h:87 in b397e49b0c outdated

82+void DumpAnchors(const fs::path& anchors_db_path, const std::vector<CAddress>& anchors);
83+
84+/**
85+ * Read the anchor IP address database (anchors.dat)
86+ *
87+ * Deleting of anchors.dat is intentional as it avoids renewed peering to anchors after

jnewbery commented at 8:51 am on October 9, 2020:

/Deleting of anchors.dat/Deleting anchors.dat/

hebasto commented at 9:43 am on October 9, 2020:

Thanks. Updated.

in src/net.cpp:2543 in b397e49b0c outdated

2536@@ -2493,6 +2537,15 @@ void CConnman::StopNodes()
2537     if (fAddressesInitialized) {
2538         DumpAddresses();
2539         fAddressesInitialized = false;
2540+
2541+        if (m_use_addrman_outgoing) {
2542+            // Anchor connections are only dumped during clean shutdown.
2543+            auto anchors_to_dump = GetCurrentBlockRelayOnlyConns();

jnewbery commented at 9:07 am on October 9, 2020:

I think explicitly writing the type here (std::vector<CAddress>) is clearer here.

hebasto commented at 9:43 am on October 9, 2020:

Updated.

in doc/files.md:52 in b397e49b0c outdated

48@@ -49,6 +49,7 @@ Subdirectory       | File(s)               | Description
49 `indexes/blockfilter/basic/db/` | LevelDB database      | Blockfilter index LevelDB database for the basic filtertype; *optional*, used if `-blockfilterindex=basic`
50 `indexes/blockfilter/basic/`    | `fltrNNNNN.dat`<sup>[\[2\]](#note2)</sup> | Blockfilter index filters for the basic filtertype; *optional*, used if `-blockfilterindex=basic`
51 `wallets/`         |                       | [Contains wallets](#multi-wallet-environment); can be specified by `-walletdir` option; if `wallets/` subdirectory does not exist, a wallet resides in the data directory
52+`./`               | `anchors.dat`         | Anchor IP address database, created on shutdown and deleted at start. Anchors are last known outgoing block-relay-only peers that are tried to re-connect to on startup

jnewbery commented at 9:08 am on October 9, 2020:

s/deleted at start/deleted at startup/

hebasto commented at 9:43 am on October 9, 2020:

Thanks. Updated.

in src/net.h:317 in b397e49b0c outdated

313@@ -312,6 +314,7 @@ class CConnman
314      * call the function without a parameter to avoid using the cache.
315      */
316     std::vector<CAddress> GetAddresses(CNode& requestor, size_t max_addresses, size_t max_pct);
317+    std::vector<CAddress> GetCurrentBlockRelayOnlyConns() const;

jnewbery commented at 9:17 am on October 9, 2020:

This can be private. Can you also add a short doxygen comment eg “Return vector of current BLOCK_RELAY peers”

hebasto commented at 9:44 am on October 9, 2020:

Thanks. Updated.

jnewbery commented at 9:17 am on October 9, 2020: member

utACK b397e49b0c390970713b4387e6dd254b926c70d0

Just a few small style comments inline.

hebasto force-pushed on Oct 9, 2020

hebasto commented at 9:42 am on October 9, 2020: member

Updated b397e49b0c390970713b4387e6dd254b926c70d0 -> d77ae88b5e7f8d6ea817675b7e4a28f80ed5081f (pr17428.30 -> pr17428.31, diff):

addressed all of the recent @jnewbery’s comments

jnewbery commented at 9:56 am on October 9, 2020: member

Code review ACK d77ae88b5e

DrahtBot added the label Needs rebase on Oct 9, 2020

p2p: Add DumpAnchors() 567008d2a0

p2p: Add ReadAnchors() c29272a157

p2p: Add CConnman::GetCurrentBlockRelayOnlyConns() bad16aff49

p2p: Integrate DumpAnchors() and ReadAnchors() into CConnman 4170b46544

p2p: Fix off-by-one error in fetching address loop

This is a move-only commit.

5543c7ab28

p2p: Try to connect to anchors once 0a85e5a7bc

doc: Add anchors.dat to files.md a490d074b3

hebasto force-pushed on Oct 9, 2020

hebasto commented at 11:32 am on October 9, 2020: member

Rebased d77ae88b5e7f8d6ea817675b7e4a28f80ed5081f -> a490d074b3491427afbd677f5fa635b910f8bb34 (pr17428.31 -> pr17428.32) due to the conflict with #20076.

jnewbery commented at 12:21 pm on October 9, 2020: member

code review ACK a490d074b3

DrahtBot removed the label Needs rebase on Oct 9, 2020

laanwj commented at 6:19 pm on October 15, 2020: member

Code review ACK a490d074b3491427afbd677f5fa635b910f8bb34

laanwj merged this on Oct 15, 2020

laanwj closed this on Oct 15, 2020

hebasto deleted the branch on Oct 15, 2020

sidhujag referenced this in commit f5fa561d87 on Oct 16, 2020

MarcoFalke commented at 9:10 am on January 8, 2021: member

There is a TODO to write release notes for this: https://github.com/bitcoin-core/bitcoin-devwiki/wiki/0.21.0-Release-Notes-Draft#p2p-and-network-changes

hebasto commented at 9:21 am on January 8, 2021: member

@MarcoFalke

There is a TODO to write release notes for this: https://github.com/bitcoin-core/bitcoin-devwiki/wiki/0.21.0-Release-Notes-Draft#p2p-and-network-changes

Done.

fanquake referenced this in commit 2968417948 on May 24, 2021

Fabcien referenced this in commit bc785db129 on Nov 24, 2021

Fabcien referenced this in commit a839a3353d on Nov 24, 2021

Fabcien referenced this in commit a99311050a on Nov 24, 2021

Fabcien referenced this in commit b465c53352 on Nov 24, 2021

Fabcien referenced this in commit e569d7a78f on Nov 24, 2021

Fabcien referenced this in commit e6255ed057 on Nov 24, 2021

Fabcien referenced this in commit 415bb33df2 on Nov 24, 2021

DrahtBot locked this on Aug 16, 2022

p2p: Try to preserve outbound block-relay-only connections during restart #17428

Conflicts