Make CAddrman::Select_ select buckets, not positions, first #23140

sipa commented at 8:29 PM on September 29, 2021: member

The original CAddrMan behaviour (before #5941) was to pick a uniformly random non-empty bucket, and then pick a random element from that bucket. That commit, which introduced deterministic placement of entries in buckets, changed this to picking a uniformly random non-empty bucket position instead.

I believe that was a mistake. Buckets are our best metric for spreading out requests across independently-controlled nodes. That does mean that if a bucket has fewer entries, its entries are supposed to be picked more frequently.

This PR reverts to the original high-level behavior, but on top of the deterministic placement logic.

sipa force-pushed on Sep 29, 2021

DrahtBot added the label P2P on Sep 29, 2021

amitiuttarwar commented at 9:38 PM on September 29, 2021: contributor

concept ACK

naumenkogs commented at 11:10 AM on September 30, 2021: member

Okay so here's what I think is happening:

Before this PR, we pick bucket+position randomly. If that's empty, we pick bucket+position randomly again. After this PR, we pick bucket+position randomly. If that's empty, we stay in the same bucket and go from randomly_picked_position to the end of the bucket. If haven't found anything, pick bucket+position randomly again.

That does mean that if a bucket has fewer entries, its entries are supposed to be picked more frequently.

I think this is a good intuitive high-level justification of the proposed change.

So, I don't think this can be directly exploited at least easily (because we randomly hash), but it just makes the connectivity better overall, for a given node (and perhaps for the network overall)? Since a bucket is a means of diversification, making them all equal makes sense.

But also, after this change, ending up in a less-populated bucket is "luckier" for that node. That probably evens out across all nodes in the network, so that should be fine.

Concept ACK.

in src/addrman.cpp:745 in 8fd67d88d1 outdated

 743 | -                nUBucket = (nUBucket + insecure_rand.randbits(ADDRMAN_NEW_BUCKET_COUNT_LOG2)) % ADDRMAN_NEW_BUCKET_COUNT;
 744 | -                nUBucketPos = (nUBucketPos + insecure_rand.randbits(ADDRMAN_BUCKET_SIZE_LOG2)) % ADDRMAN_BUCKET_SIZE;
 745 | +            // Iterate over the positions of that bucket.
 746 | +            int i;
 747 | +            for (i = 0; i < ADDRMAN_BUCKET_SIZE; ++i) {
 748 | +                if (vvNew[nUBucket][nUBucketPos ^ i] != -1) break;

naumenkogs commented at 11:13 AM on September 30, 2021:

nit: I think it's not particularly intuitive that this loop with nUBucketPos ^ i exhausts all values [0; ADDRMAN_BUCKET_SIZE). Perhaps we could use something more explicit/readable?

sipa commented at 6:31 PM on September 30, 2021:

Made some changes; perhaps it's clearer now.

jnewbery commented at 9:33 AM on October 1, 2021:

Hmmm, maybe just personal taste, but I think the old version is better. Having a separate counter (i) from the thing being incremented (nUBucketPos) seems unusual and a bit jarring. Both work, but I have a mild preference for the original.

in src/addrman.cpp:722 in 8fd67d88d1 outdated

 720 | -                nKBucket = (nKBucket + insecure_rand.randbits(ADDRMAN_TRIED_BUCKET_COUNT_LOG2)) % ADDRMAN_TRIED_BUCKET_COUNT;
 721 | -                nKBucketPos = (nKBucketPos + insecure_rand.randbits(ADDRMAN_BUCKET_SIZE_LOG2)) % ADDRMAN_BUCKET_SIZE;
 722 | +            // Iterate over the positions of that bucket.
 723 | +            int i;
 724 | +            for (i = 0; i < ADDRMAN_BUCKET_SIZE; ++i) {
 725 | +                if (vvTried[nKBucket][nKBucketPos ^ i] != -1) break;

rajarshimaitra commented at 3:20 PM on September 30, 2021:

It seems to me this operation will iterate over all the elements of a bucket, but not serially. So if a bucket is non empty it will always find an address in that bucket.

Is the purpose of the above Xor is to add randomness into the search?

That does mean that if a bucket has fewer entries, its entries are supposed to be picked more frequently.

I am also not understanding how the search will ensure the above? It seems irrespective of the bucket size, it will always find an entry if the bucket is non-empty.

sipa commented at 4:48 PM on September 30, 2021:

The xor is just a simple way of iterating over the positions in the bucket. It's not more or less predictable than starting in a random position and incrementing until it wraps around and reaches the starting position again - just less code for doing something similar.

I am also not understanding how the search will ensure the above? It seems irrespective of the bucket size, it will always find an entry if the bucket is non-empty.

That's exactly what it does. The old code will pick a uniformly random bucket/position combination, among all of those that are non-empty. The new code will pick a uniformly random non-empty bucket, and then a random position in that bucket. That means that relatively speaking, if bucket A has 1 entry, and bucket B as 4 entries, and all other buckets are empty, A's entry will have 50% chance of being picked, and B's 4 entries will have 12.5% each. In the old code, each of the 5 entries would have 20%.

rajarshimaitra commented at 6:12 PM on September 30, 2021:

Thanks @sipa for clarifying. Its natural that if probability of a bucket being chosen is P(bucket) and it has n addresses, then probability of a particular address being picked from a particular bucket is P(bucket) / n. That satisfies your example above.

But to me, that doesn't seem like satisfying your above statement, " if a bucket has fewer entries, its entries are supposed to be picked more frequently."

Because bucket choices are uniformly random, for m non-empty buckets, probability of a particular bucket being chosen is 1/m. And, then because the search ensures that a non-empty bucket will always yield an address, so probability of getting any address from that bucket is 1.

Following this logic, we have, for m buckets, probability that any address from a particular bucket to be chosen is always 1/m, uniformly distributed over all the buckets, and independent of size of buckets.

So mapping back to your example, it would say, probability of choosing an address from Bucket A == Probability of choosing an address from Bucket B == 50%. The address count doesn't play a role anymore, because we will always choose an address from a bucket if it has any.

To compare against the old code:
In the old code, the probability is uniformly distributed across all addresses, irrespective of their buckets. In the new code the probability is now uniformly distributed over all buckets, irrespective of their contents.

Once the bucket is chosen, then there is no probability anymore, as we will always get an address from it if its not empty.

Thus to me it seems like the probability is independent of bucket size.

sipa commented at 6:32 PM on September 30, 2021:

So mapping back to your example, it would say, probability of choosing an address from Bucket A == Probability of choosing an address from Bucket B == 50%. The address count doesn't play a role anymore, because we will always choose an address from a bucket if it has any.

Right, that's the case. But that is exactly the same as saying that addresses in buckets with more addresses are relatively less frequently going to be picked.

rajarshimaitra commented at 6:44 PM on September 30, 2021:

But I do understand your point, that probability of a particular address being chosen from a particular bucket, decreases with number of addresses in that bucket. But that's just natural consequence, and the code doesn't favor smaller buckets in any way. Which I thought was the case by your statement.

sipa commented at 7:49 PM on September 30, 2021:

Ah, no my statement, reformulated, is: "if you look at the resulting change of per-address probabilities caused by this PR, you'll see it now favors those in smaller buckets more. However, that is desirable."

rajarshimaitra commented at 3:22 PM on September 30, 2021: contributor

Concept ACK.

Below is a conceptual question for my own understanding.

sipa force-pushed on Sep 30, 2021

in src/addrman.cpp:726 in 0f47d86f13 outdated

 724 | +                if (vvTried[nKBucket][nKBucketPos] != -1) {
 725 | +                    int nId = vvTried[nKBucket][nKBucketPos];
 726 | +                    const auto it_found{mapInfo.find(nId)};
 727 | +                    assert(it_found != mapInfo.end());
 728 | +                    const CAddrInfo& info{it_found->second};
 729 | +                    if (insecure_rand.randbits(30) < fChanceFactor * info.GetChance() * (1 << 30)) return info;

naumenkogs commented at 7:17 AM on October 1, 2021:

nit: while you touching this, we could also make this more readable by avoiding bit operations :)

sipa commented at 3:21 PM on October 1, 2021:

Any suggestions?

naumenkogs commented at 8:33 AM on October 4, 2021:

I meant insecure_rand.randrange(XYZ) < fChanceFactor * info.GetChance() * XYZ and defining XYZ should just work?

in src/addrman.cpp:733 in 0f47d86f13 outdated

 738 | -            assert(it_found != mapInfo.end());
 739 | -            const CAddrInfo& info{it_found->second};
 740 | -            if (insecure_rand.randbits(30) < fChanceFactor * info.GetChance() * (1 << 30))
 741 | -                return info;
 742 | -            fChanceFactor *= 1.2;
 743 | +            // Either the chosen bucket was empty, or the GetChance()-based bias rejected

naumenkogs commented at 7:22 AM on October 1, 2021:

nit: If you move this comment right before the // Iterate over the positions of that bucket. comment, next readers won't be like me trying to figure out what exactly this bias tries to achieve.

sipa commented at 3:21 PM on October 1, 2021:

I've changed it back to the original approach, but with iteration instead of xor.

naumenkogs commented at 7:23 AM on October 1, 2021: member

ACK 0f47d86f13ba90d9a9f7a6a67d65cefaa70c4303

I think the code is correct, I haven't made any experiments to verify probabilities.

sipa force-pushed on Oct 1, 2021

jnewbery commented at 5:31 PM on October 1, 2021: member

When we learn of a new address from a source peer, I believe we essentially:

select 64 of the 1024 new buckets based on the source peer's netgroup
then select the specific bucket out of those 64 based on the destination's netgroup
then select the bucket position (out of 64) based on the destination's network address (and the bucket index)

That means that if we have one or more source peers that are filling up our new table with addresses, each one will be able to fill up to 64 new buckets with 64 addresses each (4096 addresses).

This proposed change means that if a minority of source peers are sending a disproportionately high number of address records, the addresses that they're providing won't get disproportionately chosen by Select(). I think it also means that we're more likely to try to connect to addresses that we learn about via self-broadcast gossiping from inbound nodes than currently. I expect that (initially at least) our new tables are dominated by addresses that we learn from DNS seeds or from getaddr responses, but that those records are concentrated into specific buckets. by selecting bucket first, then position, we're more likely to select from an under-filled bucket, which would have been populated by address gossip.

Those seems like good changes to me.

When we promote an address to the tried table, we:

select 8 of the 256 buckets based on the destination's netgroup
then select the specific bucket based on the destination's network address
then select the bucket position based on the destination's network address (and the bucket index)

That means that if we successfully connect to a peer in an under-represented netgroup, it'll be placed in a more sparsely-populated tried bucket than if we successfully connect to a peer in a more common netgroup. With this change, we're more likely to choose that address again when Select() is called. If this change is applied to nodes across the network, then nodes on less well represented netgroups will receive slightly more inbound requests on average than those on common netgroups.

So, changing the way that Select() picks entries from the new and tried tables does two slightly different things:

for the new tables, it means that each address source netgroup is (closer to) equally likely to be picked. That's an improvement because we'll no longer favour addresses from spammy address sources.
for the tried tables, it means that each destination netgroup is (closer to) equally like to be be picked, but this is only for addresses that we've connected to at some point in the past.

As far as I can understand, the justification for the PR describes (2), but I think (1) is far more significant, since there are always more entries in the new tables, and because for feeler connections we only select from the new tables.

DrahtBot commented at 6:23 PM on October 1, 2021: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

DrahtBot cross-referenced this on Oct 1, 2021 from issue [p2p] Pimpl AddrMan to abstract implementation details by amitiuttarwar

DrahtBot cross-referenced this on Oct 1, 2021 from issue log: improve addrman logging by mzumsande

naumenkogs commented at 8:34 AM on October 4, 2021: member

ACK 318a8b2792f8183397067e8b3633b16e300db122

I was mainly thinking about new nodes, where the benefit is clear because bucket is selected based on the source.

W.r.t tried, I'm not even sure this makes any practical sense. We won't connect to the same netgroup anyway, right? So, "destination netgroup is (closer to) equally like to be be picked" won't make any real difference (there will be some difference, but i think it doesn't matter).

DrahtBot cross-referenced this on Oct 4, 2021 from issue [fuzz] Use public methods in addrman fuzz tests by jnewbery

DrahtBot added the label Needs rebase on Oct 5, 2021

Make CAddrman::Select_ select buckets, not positions, first

The original CAddrMan behaviour (before commit
e6b343d880f50d52390c5af8623afa15fcbc65a2) was to pick a uniformly
random non-empty bucket, and then pick a random element from that
bucket. That commit, which introduced deterministic placement
of entries in buckets, changed this to picking a uniformly random
non-empty bucket position instead.

This commit reverts the original high-level behavior, but in the
deterministic placement model.

632aad9e6d

sipa force-pushed on Oct 5, 2021

sipa commented at 3:49 PM on October 5, 2021: member

Rebased.

jnewbery commented at 4:51 PM on October 5, 2021: member

W.r.t tried, I'm not even sure this makes any practical sense. We won't connect to the same netgroup anyway, right? So, "destination netgroup is (closer to) equally like to be be picked" won't make any real difference (there will be some difference, but i think it doesn't matter). @naumenkogs - I agree. The logic in ThreadOpenConnections prevents us from making multiple outbound connections to peers in the same network group. @sipa - what do you think about removing the change to picking a tried entry, and only having this logic for picking a new entry?

sipa commented at 5:08 PM on October 5, 2021: member

@jnewbery I prefer being consistent, unless there is actually a reason why this is worse for the tried table?

DrahtBot removed the label Needs rebase on Oct 5, 2021

in src/addrman.cpp:722 in 632aad9e6d

 721 | +            for (i = 0; i < ADDRMAN_BUCKET_SIZE; ++i) {
 722 | +                if (vvTried[nKBucket][(nKBucketPos + i) % ADDRMAN_BUCKET_SIZE] != -1) break;
 723 |              }
 724 | -            int nId = vvTried[nKBucket][nKBucketPos];
 725 | +            // If the bucket is entirely empty, start over with a (likely) different one.
 726 | +            if (i == ADDRMAN_BUCKET_SIZE) continue;

theuni commented at 6:29 PM on October 5, 2021:

Is this an infinite loop if all buckets are empty? It's unclear to me if that can be the case here.

sipa commented at 7:32 PM on October 5, 2021:

It would be, if the function didn't start with testing if the relevant table was empty.

theuni commented at 7:43 PM on October 5, 2021:

Ah, ok, that relationship was unclear from the local code alone, but makes sense after a quick refresher of how vRandom works.

sipa commented at 8:02 PM on October 5, 2021:

Right, it's a bit obscure.

jnewbery commented at 9:53 AM on October 6, 2021: member

utACK 632aad9e6d8369750f4327a886ca5b3d3fed89bd

@jnewbery I prefer being consistent, unless there is actually a reason why this is worse for the tried table?

No, I don't think it would be any worse for the tried table. I can see the benefit of being consistent for the two tables.

naumenkogs commented at 10:13 AM on October 6, 2021: member

ACK 632aad9e6d8369750f4327a886ca5b3d3fed89bd

in src/addrman.cpp:718 in 632aad9e6d

 716 | -                nKBucket = (nKBucket + insecure_rand.randbits(ADDRMAN_TRIED_BUCKET_COUNT_LOG2)) % ADDRMAN_TRIED_BUCKET_COUNT;
 717 | -                nKBucketPos = (nKBucketPos + insecure_rand.randbits(ADDRMAN_BUCKET_SIZE_LOG2)) % ADDRMAN_BUCKET_SIZE;
 718 | +            // Iterate over the positions of that bucket, starting at the initial one,
 719 | +            // and looping around.
 720 | +            int i;
 721 | +            for (i = 0; i < ADDRMAN_BUCKET_SIZE; ++i) {

mzumsande commented at 7:30 PM on October 20, 2021:

I think that this way of iterating does not select all items in a bucket with the same probability when the bucket is partially filled. In the most extreme case of just two items in adjacent positions, one item would get picked with probability 63/64, the other one with probability 1/64 because we start at a random position and then move from left to right until we find something. Do you think that this is a problem? Was the older version of this PR with nUBucketPos ^ i maybe better because of this?

sipa commented at 7:46 PM on October 20, 2021:

They're both equally non-uniform in this regard. I can try adding something to make it actually uniform if there is a concern, but given that the contents of the buckets is secret and assumed not to be under attacker control, I don't think it matters very much.

mzumsande commented at 8:29 PM on October 20, 2021:

Yes, I agree that this cannot be abused. Just a small quirk that in a given addrman configuration, the selection is not always perfectly uniform.

mzumsande commented at 7:43 PM on October 20, 2021: contributor

Concept ACK, one question about the bucket iteration logic below.

W.r.t tried, I'm not even sure this makes any practical sense. We won't connect to the same netgroup anyway, right? So, "destination netgroup is (closer to) equally like to be be picked" won't make any real difference (there will be some difference, but i think it doesn't matter).

The small difference might be that before, there was a tendency to select more tried items from the same netgroup, leading to retries in ThreadOpenConnections if the netgroup was already used for an outbound, and therefore leading to a slight bias towards selecting entries from new vs tried (because the next Select() call would again choose from new with p=0.5). But I agree that this is not material, and even if it was, the new behavior makes more sense.

mzumsande commented at 8:30 PM on October 20, 2021: contributor

ACK 632aad9e6d8369750f4327a886ca5b3d3fed89bd

laanwj merged this on Oct 22, 2021

laanwj closed this on Oct 22, 2021

sidhujag referenced this in commit 4cd7e20142 on Oct 22, 2021

Fabcien referenced this in commit 55e57736b1 on Oct 21, 2022

bitcoin locked this on Oct 30, 2022