addrman: Fix new table bucketing during unserialization #20557

jnewbery commented at 11:38 AM on December 3, 2020: member

This fixes three issues in addrman unserialization.

An addrman entry can appear in up to 8 new table buckets. We store this entry->bucket indexing during shutdown so that on restart we can restore the entries to their correct buckets. Commit ec45646de9e62b3d42c85716bfeb06d8f2b507dc broke the deserialization code so that each entry could only be put in up to one new bucket.
Unserialization may result in an entry appearing in a 9th bucket. If the entry already appears in 8 buckets don't try to place it in another bucket.
We unnecessarily rebucket when reading a peers.dat with file version 1. Don't do that.

jnewbery commented at 11:39 AM on December 3, 2020: member

The code currently rebuckets entries (losing the additional bucket positions) if the asmap checksum changes. I'm not sure if that's necessary and don't see any downside to keeping the same bucketing if restarting with a new asmap. If others agree, I can update this PR to not do that rebucketing.

jnewbery cross-referenced this on Dec 3, 2020 from issue Avoid rebucketing on restart when it's not necessary by naumenkogs

jnewbery commented at 11:43 AM on December 3, 2020: member

We should also add a regression test for this. I think a good test would be to add a round trip serialize/unserialize to the addrman fuzz test, and compare the addrman before and after that round trip. There's some internal state that isn't reconstructed from serialize/unserialize (eg the vRandom ordering), but a comparator function that checked number of entries and bucket->entry pairs should work.

If anyone wants to add that fuzz test or other tests, I'm happy to add them to this PR.

laanwj added the label P2P on Dec 3, 2020

in src/addrman.h:530 in 595e9e56d5 outdated

 539 | +        uint256 serialized_asmap_checksum;
 540 |          if (format >= Format::V2_ASMAP) {
 541 | -            s >> serialized_asmap_version;
 542 | +            s >> serialized_asmap_checksum;
 543 |          }
 544 | +        const bool restore_bucketing{nUBuckets == ADDRMAN_NEW_BUCKET_COUNT &&

MarcoFalke commented at 12:55 PM on December 3, 2020:

        const bool restore_bucketing{nUBuckets == ADDRMAN_NEW_BUCKET_COUNT &&

jnewbery commented at 12:59 PM on December 3, 2020:

fixed

jnewbery added the label Bug on Dec 3, 2020

jnewbery force-pushed on Dec 3, 2020

DrahtBot commented at 1:28 PM on December 3, 2020: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

DrahtBot cross-referenced this on Dec 3, 2020 from issue addrman: improve performance by using more suitable containers by vasild

naumenkogs commented at 1:57 AM on December 4, 2020: member

Concept ACK.

The code currently rebuckets entries (losing the additional bucket positions) if the asmap checksum changes. I'm not sure if that's necessary and don't see any downside to keeping the same bucketing if restarting with a new asmap. If others agree, I can update this PR to not do that rebucketing.

That would mean that asmap is applied to only newly learned addresses, which is a downside (assuming asmap is an improvement). On the other hand, losing the existing 8 buckets is unfortunate. I'm fine either way.

We could do the 8 existing + 1 asmap bucket if we really want... but that also has some consequences of potentially overriding existing buckets etc.

Also, we should keep in mind that non-rebucketing on a new asmap may allow existing records to appear in 16 buckets now instead of 8 (probably all existing records on first asmap appearance). Which is... also not particularly terrible but should be considered?

jnewbery commented at 9:39 AM on December 4, 2020: member

That would mean that asmap is applied to only newly learned addresses, which is a downside (assuming asmap is an improvement)

Currently the bucket positions for existing new addresses are lost entirely when changing asmap, so I don't understand how this is a downside. The choice is between throwing away those bucket positions or not.

Also, we should keep in mind that non-rebucketing on a new asmap may allow existing records to appear in 16 buckets now instead of 8 (probably all existing records on first asmap appearance). Which is... also not particularly terrible but should be considered?

No. We won't add an entry to more than 8 buckets: https://github.com/bitcoin/bitcoin/blob/257cf05f9b841ba30202f23a94bcdb1743feded2/src/addrman.cpp#L294-L296.

naumenkogs commented at 5:38 PM on December 4, 2020: member

Currently the bucket positions for existing new addresses are lost entirely when changing asmap, so I don't understand how this is a downside. The choice is between throwing away those bucket positions or not.

I expressed my agreement with you in the following sentence: "On the other hand, losing the existing 8 buckets is unfortunate.". Still, ideally, we would apply a new asmap to existing addrs without losing the data. Unfortunately, since we don't store addr source, we have to make this choice between the two. And I agree that of these 2 choices, the one you're suggesting is probably better.

No. We won't add an entry to more than 8 buckets:

Good to know! That was just my wrong assumption without looking deep into code :)

in src/addrman.h:352 in 3d438273fd outdated

 348 | @@ -348,8 +349,8 @@ friend class CAddrManTest;
 349 |       * Notice that vvTried, mapAddr and vVector are never encoded explicitly;
 350 |       * they are instead reconstructed from the other information.
 351 |       *
 352 | -     * vvNew is serialized, but only used if ADDRMAN_UNKNOWN_BUCKET_COUNT didn't change,
 353 | -     * otherwise it is reconstructed as well.
 354 | +     * vvNew is serialized, but only used if ADDRMAN_UNKNOWN_BUCKET_COUNT and the asmap checksum

naumenkogs commented at 8:40 AM on December 8, 2020:

This comment is weird in general. I don't think we have anything like ADDRMAN_UNKNOWN_BUCKET_COUNT anymore?

naumenkogs commented at 8:41 AM on December 8, 2020:

Should be talking about ADDRMAN_NEW_BUCKET_COUNT, ADDRMAN_NEW_BUCKETS_PER_ADDRESS.

jnewbery commented at 10:57 AM on December 10, 2020:

Thanks, changed to ADDRMAN_NEW_BUCKET_COUNT.

in src/addrman.h:542 in 75c4f0f3f4 outdated

 536 |              CAddrInfo& info = mapInfo[entry_index];
 537 | +
 538 | +            // The entry shouldn't appear in more than
 539 | +            // ADDRMAN_NEW_BUCKETS_PER_ADDRESS. If it has already, just skip
 540 | +            // this bucket_entry.
 541 | +            if (info.nRefCount >= ADDRMAN_NEW_BUCKETS_PER_ADDRESS) continue;

naumenkogs commented at 8:44 AM on December 8, 2020:

So this happens if a new version of Bitcoin Core reduced ADDRMAN_NEW_BUCKETS_PER_ADDRESS, and in this case we completely forget the extra addrs.

Why do you think it's more reasonable? Assuming addrman is less-than-half-full, there's a good chance this odd addr can find a new place without evicting anything.

jnewbery commented at 10:58 AM on December 10, 2020:

Because ADDRMAN_NEW_BUCKETS_PER_ADDRESS should be an invariant in addrman. No entry should appear in no more than ADDRMAN_NEW_BUCKETS_PER_ADDRESS buckets.

vasild commented at 4:51 PM on January 25, 2021:

@naumenkogs if a new Bitcoin Core reduces ADDRMAN_NEW_BUCKETS_PER_ADDRESS from 8 to e.g. 6 and this condition becomes true in the new version, then the address is already in 6 buckets in memory. So we would not completely forget it, or do I miss something?

ryanofsky commented at 1:51 AM on February 9, 2021:

In commit "[addrman] Don't rebucket new table entries unnecessarily" (a5c9b04959f443372400f9a736c6eaf5502284a1)

It would be good if comment could say when this condition happens. Particularly if there is any case where it's expected to happen, or if it can only happen loading bad data.

Also, discarding entries after max buckets reached seems like another subtle change in behavior that could benefit from a simple unit test. (Or if this an unimportant edge case that doesn't merit a test, maybe it also doesn't merit special handling in the code.)

jnewbery commented at 10:14 AM on February 9, 2021:

I'll expand this comment in a follow-up.

in src/addrman.h:520 in 75c4f0f3f4 outdated

 516 | @@ -517,6 +517,8 @@ friend class CAddrManTest;
 517 |              }
 518 |          }
 519 |  
 520 | +        // Attempt to restore the entry's new buckets if the bucket count and asmap

naumenkogs commented at 8:47 AM on December 8, 2020:

I find it hard to parse "restore the entry's new buckets".

Not only buckets, but also positions.
What exactly is "restore" is also a bit unclear, I'm thinking if there's a better verb. Perhaps just "use the serialized [...]"

jnewbery commented at 11:01 AM on December 10, 2020:

Changed wording to "If the bucket count and asmap checksum haven't changed, then attempt to restore the entries to the buckets/positions they were in before serialization." Hopefully that's clearer!

jnewbery force-pushed on Dec 10, 2020

jnewbery commented at 11:02 AM on December 10, 2020: member

Thanks for the review @naumenkogs. I've addressed all your comments.

laanwj commented at 1:08 PM on December 10, 2020: member

Concept ACK, and the code change looks overall good to me.

#20557 (comment)

Agree that a test would be useful, to make it easier to see that this fixes the mentioned problems and prevent future regression.

naumenkogs commented at 7:48 AM on December 11, 2020: member

utACK dbdf63c6d82ae8d1ca73f11d1b8844614f0bf8e3

in src/addrman.h:509 in 41f2fd90eb outdated

 512 | -                s >> nIndex;
 513 | -                if (nIndex >= 0 && nIndex < nNew) {
 514 | -                    bucket_entries.emplace_back(bucket, nIndex);
 515 | +        for (int bucket = 0; bucket < nUBuckets; ++bucket) {
 516 | +            int bucket_size{0};
 517 | +            s >> bucket_size;

glozow commented at 1:28 AM on January 14, 2021:

Just to clarify... this is the number of entries in this bucket, right? Not the capacity of the bucket, which is ADDRMAN_BUCKET_SIZE?

jnewbery commented at 1:11 PM on January 14, 2021:

Exactly (see the comment above about serialization format: https://github.com/jnewbery/bitcoin/blob/dbdf63c6d82ae8d1ca73f11d1b8844614f0bf8e3/src/addrman.h#L341)

glozow commented at 11:56 PM on January 15, 2021:

I just thought both being called "size" is a bit confusing so num_entries would have been my choice for naming, but it's the nittiest of nits, non-blocking

jnewbery commented at 1:23 PM on January 18, 2021:

I agree that this is confusing. I've renamed to num_entries.

in src/addrman.h:530 in 03002228b1 outdated

 539 | +            // ADDRMAN_NEW_BUCKETS_PER_ADDRESS. If it has already, just skip
 540 | +            // this bucket_entry.
 541 | +            if (info.nRefCount >= ADDRMAN_NEW_BUCKETS_PER_ADDRESS) continue;
 542 | +
 543 |              int nUBucketPos = info.GetBucketPosition(nKey, true, bucket);
 544 | -            if (format >= Format::V2_ASMAP && nUBuckets == ADDRMAN_NEW_BUCKET_COUNT && vvNew[bucket][nUBucketPos] == -1 &&

glozow commented at 1:46 AM on January 14, 2021:

So you're removing the format >= Format::V2_ASMAP condition here because otherwise it'd be impossible for serialized_asmap_checksum == supplied_asmap_checksum yes?

jnewbery commented at 1:13 PM on January 14, 2021:

Correct. If format == V1_DETERMINISTIC, then serialized_asmap_checksum must be 0.

glozow commented at 1:57 AM on January 14, 2021: member

code review ACK https://github.com/bitcoin/bitcoin/commit/dbdf63c6d82ae8d1ca73f11d1b8844614f0bf8e3

Can see that it fixes the bug from https://github.com/bitcoin/bitcoin/commit/ec45646de9e62b3d42c85716bfeb06d8f2b507dc, previously looped over each entry, retrieving the first bucket we recorded in entryToBucket, and now creates a list of <bucket, index> pairs and loops over all of those instead. Some clarifying questions because I'm super unfamiliar with this code and stared at it for a long time.

laanwj added this to the "Bugfixes" column in a project

jnewbery force-pushed on Jan 18, 2021

[addrman] Fix new table bucketing during unserialization

An addrman entry can appear in up to 8 new table buckets. We store this
entry->bucket indexing during shutdown so that on restart we can restore
the entries to their correct buckets.

Commit ec45646de9e62b3d42c85716bfeb06d8f2b507dc broke the
deserialization code so that each entry could only be put in up to one
new bucket. Fix that.

b4c5fda417

[addrman] Improve variable naming/code style of touched code. 009b8e0fdf

[addrman] Rename asmap version to asmap checksum

Version implies that higher numbers take precendence. This is really a
checksum, to check whether the provided asmap is the same as the one
used when the peers.dat file was serialized.

Also update the comments to explain where/why this is used.

8062d928ce

[addrman] Don't rebucket new table entries unnecessarily

Only rebucket if the asmap checksum has changed, not if the file format
has changed but no asmap is provided.

Also, don't try to add an entry to another bucket if it already appears
in ADDRMAN_NEW_BUCKETS_PER_ADDRESS buckets.

a5c9b04959

[addrman] Improve variable naming/code style of touched code. ac3547eddd

jnewbery commented at 1:25 PM on January 18, 2021: member

Improved the naming of one of the variables as suggested in #20557 (review) : https://github.com/bitcoin/bitcoin/compare/dbdf63c6d82ae8d1ca73f11d1b8844614f0bf8e3..36f3d9d8a48541f0b44ea4cff52a770cab43f2f0.

jnewbery force-pushed on Jan 18, 2021

jnewbery commented at 2:36 PM on January 18, 2021: member

Rebased on master

naumenkogs commented at 7:11 AM on January 19, 2021: member

ACK ac3547eddd8a7d67b4103508f30d5d02a9c1f148

in src/addrman.h:353 in ac3547eddd outdated

 348 | @@ -348,8 +349,8 @@ friend class CAddrManTest;
 349 |       * Notice that vvTried, mapAddr and vVector are never encoded explicitly;
 350 |       * they are instead reconstructed from the other information.
 351 |       *
 352 | -     * vvNew is serialized, but only used if ADDRMAN_UNKNOWN_BUCKET_COUNT didn't change,
 353 | -     * otherwise it is reconstructed as well.
 354 | +     * vvNew is serialized, but only used if ADDRMAN_NEW_BUCKET_COUNT and the asmap checksum
 355 | +     * didn't change, otherwise it is reconstructed as well.

vasild commented at 5:36 AM on January 26, 2021:

vVector does not exist. More importantly - none of the members is serialized explicitly/directly. So, I think the following changes would make a better description of the source code:

diff --git i/src/addrman.h w/src/addrman.h
index cde864f25..983a1f4fd 100644
--- i/src/addrman.h
+++ w/src/addrman.h
@@ -332,29 +332,26 @@ public:
      *     (format=5, lowest_compatible=5) and so any versions that do not know how to parse
      *     format=5 will not try to read the file.
      * * nKey
      * * nNew
      * * nTried
      * * number of "new" buckets XOR 2**30
-     * * all nNew addrinfos in vvNew
-     * * all nTried addrinfos in vvTried
-     * * for each bucket:
+     * * all new addresses (total count: nNew)
+     * * all tried addresses (total count: nTried)
+     * * for each new bucket:
      *   * number of elements
-     *   * for each element: index
+     *   * for each element: index in the serialized "all new addresses"
      * * asmap checksum
      *
      * 2**30 is xorred with the number of buckets to make addrman deserializer v0 detect it
      * as incompatible. This is necessary because it did not check the version number on
      * deserialization.
      *
-     * Notice that vvTried, mapAddr and vVector are never encoded explicitly;
+     * Notice that vvNew, vvTried, mapAddr, mapInfo and vRandom are never encoded explicitly;
      * they are instead reconstructed from the other information.
      *
-     * vvNew is serialized, but only used if ADDRMAN_NEW_BUCKET_COUNT and the asmap checksum
-     * didn't change, otherwise it is reconstructed as well.
-     *
      * This format is more complex, but significantly smaller (at most 1.5 MiB), and supports
      * changes to the ADDRMAN_ parameters without breaking the on-disk structure.
      *
      * We don't use SERIALIZE_METHODS since the serialization and deserialization code has
      * very little in common.
      */

Why vvNew is not serialized? Because vvNew[i][j] contains the id under which the element can be found in mapInfo. We serialize a condensed version of vvNew (without the -1s), but instead of that id we store an index in the serialized list of CAddrInfo which exists only on disk.

jnewbery commented at 10:31 AM on January 29, 2021:

Good suggestions. I've added a new commit that makes these changes.

vasild commented at 10:54 AM on January 29, 2021:

Thanks! s/vVector/vRandom/ in that commit - there is no member named vVector. Also, consider adding mapInfo to the list of members that "are never encoded explicitly" - the ids (the keys of that map) are not serialized, only the values (and in different order than in the map).

jnewbery commented at 12:40 PM on January 29, 2021:

Thanks. I applied the diff by hand and missed those. Now fixed!

in src/addrman.h:545 in ac3547eddd outdated

 560 | +            // ADDRMAN_NEW_BUCKETS_PER_ADDRESS. If it has already, just skip
 561 | +            // this bucket_entry.
 562 | +            if (info.nRefCount >= ADDRMAN_NEW_BUCKETS_PER_ADDRESS) continue;
 563 | +
 564 | +            int bucket_position = info.GetBucketPosition(nKey, true, bucket);
 565 | +            if (restore_bucketing && vvNew[bucket][bucket_position] == -1) {

vasild commented at 5:48 AM on January 26, 2021:

Under what conditions could it happen that restore_bucketing is false and vvNew[bucket][bucket_position] != -1 (occupied)?

Is it only if ADDRMAN_BUCKET_SIZE has been changed? If that has happened then we shouldn't try to restore the bucketing (i.e. should set restore_bucketing to false), but unfortunately the old ADDRMAN_BUCKET_SIZE is not saved on disk, so we can only indirectly observe that the file was written with a different ADDRMAN_BUCKET_SIZE.

(the above is not a suggestion for a change, just a few questions to confirm my understanding is correct)

jnewbery commented at 10:23 AM on January 29, 2021:

If both restore_bucketing is false and vvNew[bucket][bucket_position] != -1 then it means that two addresses hash into the same position in the same new bucket, which is possible based on the source address. restore_bucketing would be false if the number of new buckets had changed or the as map had changed.

vasild commented at 11:14 AM on January 29, 2021:

Yes! So it follows that (correct me if I am wrong):

If restore_bucketing is true then vvNew[bucket][bucket_position] == -1 is also true (position is free, not occupied). So the if condition can be reduced to just if (restore_bucketing) {.
If restore_bucketing is false then vvNew[bucket][bucket_position] == -1 may be true or false depending on the hashing. In this case too, the if condition is equivalent to just if (restore_bucketing) {.

jnewbery commented at 12:30 PM on January 29, 2021:

I think vvNew[bucket][bucket_position] == -1 is just a belt-and-suspenders. It shouldn't be hit (if so, then something has gone wrong with the position serializing/deserializing to place two addresses in the same bucket/position). I think you're right that we can just remove it, but can we leave that as a follow-up? We can't fix all of addrman's problems in one PR :)

vasild commented at 2:30 PM on January 29, 2021:

Alright, so either we are both correct that this is redundant and not needed, or we are both wrong!

If we are wrong, and it can happen that restore_bucketing is true and vvNew[bucket][bucket_position] == -1 is false (occupied!) then moving the LogPrint() call was wrong because it would have been executed inside the else branch and now it will not be executed because now it is conditional only on restore_bucketing.

in src/addrman.h:552 in ac3547eddd outdated

 570 | +                ++info.nRefCount;
 571 |              } else {
 572 | -                // In case the new table data cannot be used (format unknown, bucket count wrong or new asmap),
 573 | +                // In case the new table data cannot be used (bucket count wrong or new asmap),
 574 |                  // try to give them a reference based on their primary source address.
 575 |                  LogPrint(BCLog::ADDRMAN, "Bucketing method was updated, re-bucketing addrman entries from disk\n");

vasild commented at 6:04 AM on January 26, 2021:

If restore_bucketing is false, then this will be printed for every address, possibly many times. But it does not include the address itself. What about printing the address:

                LogPrint(BCLog::ADDRMAN, "Bucketing method was updated, re-bucketing addrman entry from disk for %s\n", info.ToString());

or print the generic message once:

                if (!printed) {
                    LogPrint(BCLog::ADDRMAN, "Bucketing method was updated, re-bucketing addrman entries from disk\n");
                    printed = true;
                }

jnewbery commented at 10:27 AM on January 29, 2021:

Good catch! I'm just going to move the log to outside the loop to avoid it printing many times. Only printing once seems to be the original intent of this log. Improved logging can be left for a future PR.

vasild commented at 11:16 AM on January 29, 2021:

s/restore_bucketing/!restore_bucketing/ in 9a0760db4?

jnewbery commented at 12:31 PM on January 29, 2021:

:man_facepalming: You're right of course. I'll fix.

vasild commented at 4:42 PM on January 26, 2021: contributor

I think a good test would be to add a round trip serialize/unserialize to the addrman fuzz test, and compare the addrman before and after that round trip. There's some internal state that isn't reconstructed from serialize/unserialize (eg the vRandom ordering), but a comparator function that checked number of entries and bucket->entry pairs should work.

Created such test at: https://github.com/jnewbery/bitcoin/pull/18.

~~It does not compare bucket->entry though because the ids (the keys into mapInfo) could be different after ser/unser.~~

jnewbery commented at 10:32 AM on January 29, 2021: member

Thanks for the review @vasild. I've resolved your review comments.

I haven't taken your fuzz test yet since it seems like there's a bit more work to make sure that it's getting reliable coverage.

jnewbery force-pushed on Jan 29, 2021

[addrman] Improve serialization comments

Thanks to Vasil Dimov <vd@FreeBSD.org> for these suggestions

436292367c

[addrman] Don't repeat "Bucketing method was updated" log multiple times

Thanks to Vasil Dimov <vd@FreeBSD.org> for these suggestions

4676a4fb5b

jnewbery force-pushed on Jan 29, 2021

vasild commented at 2:24 PM on January 29, 2021: contributor

ACK 4676a4fb5be0f6ef0b3f71c1f4361c20f7cb0e0b

The addition of a fuzzing test (https://github.com/jnewbery/bitcoin/pull/18) is not a show stopper for this PR.
To my understanding, the unneeded vvNew[bucket][bucket_position] == -1 can be removed. It was part of the code before this PR and is doing no harm (other than confusion). Not a show stopper either.

vasild approved

glozow commented at 11:28 PM on January 29, 2021: member

re-ACK https://github.com/bitcoin/bitcoin/commit/4676a4fb5be0f6ef0b3f71c1f4361c20f7cb0e0b, changes were a rename, comments, and removing repeat-logging.

naumenkogs commented at 9:04 AM on January 30, 2021: member

ACK 4676a4f

jnewbery commented at 11:13 AM on February 3, 2021: member

@MarcoFalke or @laanwj - this PR now has 3 ACKs. If one of you re-reviewed, then it might be ready for merge.

jnewbery cross-referenced this on Feb 8, 2021 from issue fuzz: check that ser+unser produces the same AddrMan by vasild

in src/addrman.h:530 in b4c5fda417 outdated

 524 | @@ -523,9 +525,10 @@ friend class CAddrManTest;
 525 |              s >> serialized_asmap_version;
 526 |          }
 527 |  
 528 | -        for (int n = 0; n < nNew; n++) {
 529 | -            CAddrInfo &info = mapInfo[n];
 530 | -            int bucket = entryToBucket[n];
 531 | +        for (auto bucket_entry : bucket_entries) {
 532 | +            int bucket{bucket_entry.first};
 533 | +            const int n{bucket_entry.second};

ryanofsky commented at 9:50 PM on February 8, 2021:

In commit "[addrman] Fix new table bucketing during unserialization" (b4c5fda417dd9ff8bf9fe24a87d384a649e3730d)

Could avoid some verbosity here with

-        for (auto bucket_entry : bucket_entries) {
-            int bucket{bucket_entry.first};
-            const int n{bucket_entry.second};
+        for (auto [bucket, n] : bucket_entries) {

jnewbery commented at 10:24 AM on February 9, 2021:

:heart: structured bindings. Good suggestion. I'll do in a follow-up.

in src/addrman.h:505 in b4c5fda417 outdated

 499 | @@ -500,7 +500,9 @@ friend class CAddrManTest;
 500 |          nTried -= nLost;
 501 |  
 502 |          // Store positions in the new table buckets to apply later (if possible).
 503 | -        std::map<int, int> entryToBucket; // Represents which entry belonged to which bucket when serializing
 504 | +        // An entry may appear in up to ADDRMAN_NEW_BUCKETS_PER_ADDRESS buckets,
 505 | +        // so we store all bucket-entry_index pairs to iterate through later.
 506 | +        std::vector<std::pair<int, int>> bucket_entries;

ryanofsky commented at 9:55 PM on February 8, 2021:

In commit "[addrman] Fix new table bucketing during unserialization" (b4c5fda417dd9ff8bf9fe24a87d384a649e3730d)

Seems like having a data structure here is superfluous if it's just going to be filled up then immediately iterated over and discarded. Maybe it would make sense to remove this vector later.

vasild commented at 10:07 AM on February 9, 2021:

I think that vector could be avoided only if we could seek into the stream. Then we could change the current:

read new buckets
read asmap checksum
process the new buckets that were read, depending on the asmap checksum

to

seek forward and read the asmap checksum (last 32 bytes)
seek back and read+process the new bucketing data, depending on the asmap checksum

jnewbery commented at 10:09 AM on February 9, 2021:

We only know how we're going to use the vector after we've deserialized the asmap_checksum field, so I don't think we can get rid of it.

in src/addrman.h:514 in b4c5fda417 outdated

 510 | @@ -509,7 +511,7 @@ friend class CAddrManTest;
 511 |                  int nIndex = 0;
 512 |                  s >> nIndex;
 513 |                  if (nIndex >= 0 && nIndex < nNew) {
 514 | -                    entryToBucket[nIndex] = bucket;
 515 | +                    bucket_entries.emplace_back(bucket, nIndex);

ryanofsky commented at 10:07 PM on February 8, 2021:

In commit "[addrman] Fix new table bucketing during unserialization" (b4c5fda417dd9ff8bf9fe24a87d384a649e3730d)

It would be good to have some test coverage for this. A functional test might be hard but I would think a c++ unit test should be straightforward. Past bugs predict future bugs so a test here could steer coverage in a useful direction.

vasild commented at 10:09 AM on February 9, 2021:

A test is in the baking at https://github.com/jnewbery/bitcoin/pull/18.

jnewbery commented at 10:10 AM on February 9, 2021:

I definitely agree. @vasild has a test here: https://github.com/jnewbery/bitcoin/pull/18 which can be added after this PR.

in src/addrman.h:528 in b4c5fda417 outdated

 524 | @@ -523,9 +525,10 @@ friend class CAddrManTest;
 525 |              s >> serialized_asmap_version;
 526 |          }
 527 |  
 528 | -        for (int n = 0; n < nNew; n++) {
 529 | -            CAddrInfo &info = mapInfo[n];
 530 | -            int bucket = entryToBucket[n];
 531 | +        for (auto bucket_entry : bucket_entries) {

dhruv commented at 1:43 AM on February 9, 2021:

The old code iterates over mapInfo[0..nNew], so we were guaranteed that all CAddrInfos in the new table make their way into vvNew (either in the old buckets, or newly assigned ones in the else block below). With the new code, we have an implicit assumption that bucket_entries.size() == nNew. Is that safe to assume? I think it is but just wanted to ask since we are iterating over a different part of the serialized data now.

vasild commented at 9:46 AM on February 9, 2021:

I think bucket_entries.size() >= nNew, let me use an example:

mapInfo[id=3] = 1.1.1.1
// for the example, I pick up some arbitrary indexes in vvNew
vvNew[2][5] = id=3
vvNew[4][9] = id=3
nNew = 1

This would be serialized on disk as:

1 (nNew)
0 (nTried)
1024 (new_bucket_count)
1.1.1.1 (list of new addresses) (*)
(zero tried addresses)
0 (number of elements in vvNew[0])
0 (number of elements in vvNew[1])
1 (number of elements in vvNew[2])
0 (the index in (*) of the first non empty element in vvNew[2])
0 (number of elements in vvNew[3])
1 (number of elements in vvNew[4])
0 (the index in (*) of the first non empty element in vvNew[4])
0 (number of elements in vvNew[5])
...
0 (number of elements in vvNew[1023])

During unserialization, in the new code from this PR, this will result in:

bucket_entries[0] = (bucket = 2, index in (*) = 0)
bucket_entries[1] = (bucket = 4, index in (*) = 0)

So, bucket_entries.size() = 2 while nNew = 1.

jnewbery commented at 10:01 AM on February 9, 2021:

This was part of the bug that this PR fixes. nNew is the total number of new addrs. Each new addr can appear in more than one new bucket, so the number of entries in bucket_entries could be higher than nNew. The old code was potentially throwing away these additional placements.

in src/addrman.h:513 in 009b8e0fdf outdated

 516 | +            int num_entries{0};
 517 | +            s >> num_entries;
 518 | +            for (int n = 0; n < num_entries; ++n) {
 519 | +                int entry_index{0};
 520 | +                s >> entry_index;
 521 | +                if (entry_index >= 0 && entry_index < nNew) {

dhruv commented at 1:45 AM on February 9, 2021:

(nit) since entry_index can be confused with the for-loop index, perhaps entry_id is better (to match nIdCount)?

vasild commented at 9:50 AM on February 9, 2021:

That variable entry_index describes the index of the address in the disk-only list of new addresses (marked as (*) in the comment above).

I think that should not be confused with the in-memory-only "id" and entry_index describes it well.

jnewbery commented at 10:04 AM on February 9, 2021:

I think that's a reasonable suggestion, but not worth invalidating the existing ACKs.

in src/addrman.h:419 in 8062d928ce outdated

 413 | @@ -413,13 +414,13 @@ friend class CAddrManTest;
 414 |                  }
 415 |              }
 416 |          }
 417 | -        // Store asmap version after bucket entries so that it
 418 | +        // Store asmap checksum after bucket entries so that it
 419 |          // can be ignored by older clients for backward compatibility.
 420 | -        uint256 asmap_version;
 421 | +        uint256 asmap_checksum;

dhruv commented at 1:47 AM on February 9, 2021:

Thank you! I actually tripped over this while reading the old code.

(nit) could this is a scripted-diff?

jnewbery commented at 10:05 AM on February 9, 2021:

(nit) could this is a scripted-diff?

Scripted diffs are useful when there are many occurrences of a string across multiple files. This string is used in 7 places, so is easy enough to review by eye.

in src/addrman.h:352 in 436292367c outdated

 352 |       *
 353 | -     * Notice that vvTried, mapAddr and vVector are never encoded explicitly;
 354 | +     * vvNew, vvTried, mapInfo, mapAddr and vRandom are never encoded explicitly;
 355 |       * they are instead reconstructed from the other information.
 356 |       *
 357 | -     * vvNew is serialized, but only used if ADDRMAN_NEW_BUCKET_COUNT and the asmap checksum

dhruv commented at 2:02 AM on February 9, 2021:

In the format above aren't lines 340-342 an encoding of vvNew? If so, why remove this comment?

vasild commented at 9:54 AM on February 9, 2021:

No, because the memory-only id may not equal to the disk-only index in the serialized "all new addresses".

jnewbery commented at 10:06 AM on February 9, 2021:

See the line above. vvNew isn't encoded explicitly but is reconstructed.

dhruv commented at 2:08 AM on February 9, 2021: contributor

ACK 4676a4fb5be0f6ef0b3f71c1f4361c20f7cb0e0b

A couple nits and questions below.

in src/addrman.h:544 in a5c9b04959 outdated

 541 | +            if (info.nRefCount >= ADDRMAN_NEW_BUCKETS_PER_ADDRESS) continue;
 542 | +
 543 |              int nUBucketPos = info.GetBucketPosition(nKey, true, bucket);
 544 | -            if (format >= Format::V2_ASMAP && nUBuckets == ADDRMAN_NEW_BUCKET_COUNT && vvNew[bucket][nUBucketPos] == -1 &&
 545 | -                info.nRefCount < ADDRMAN_NEW_BUCKETS_PER_ADDRESS && serialized_asmap_checksum == supplied_asmap_checksum) {
 546 | +            if (restore_bucketing && vvNew[bucket][nUBucketPos] == -1) {

ryanofsky commented at 2:19 AM on February 9, 2021:

In commit "[addrman] Don't rebucket new table entries unnecessarily" (a5c9b04959f443372400f9a736c6eaf5502284a1)

Just some notes to make sure I'm following this correctly.

The restore_bucketing change here is aesthetic and the only actual change this line is dropping the format >= Format::V2_ASMAP condition.
The effect of dropping format condition is this will now keep existing bucket positions instead of rebucketing when an old format is loaded and there is no asmap. ("Only rebucket if" in commit description means "Use existing bucket positions if")
In the case where an old format is loaded an there is an asmap, this logic isn't triggered because the serialized_asmap_checksum == supplied_asmap_checksum condition is false.
Maybe could be a unit test checking old format number + no asmap uses existing bucket positions

jnewbery commented at 10:23 AM on February 9, 2021:

The restore_bucketing change here is aesthetic and the only actual change this line is dropping the format >= Format::V2_ASMAP condition.

Right, this condition doesn't change during loop iterations, so it's clearer to move it out of the loop, mark it const, and only do a conditional check inside the loop on what can change over iterations.

The effect of dropping format condition is this will now keep existing bucket positions instead of rebucketing when an old format is loaded and there is no asmap. ("Only rebucket if" in commit description means "Use existing bucket positions if")

That's right. If the file format is old (no asmap), and we don't have an asmap, then we gain nothing by dropping new bucket positions.

In the case where an old format is loaded an there is an asmap, this logic isn't triggered because the serialized_asmap_checksum == supplied_asmap_checksum condition is false.

Right.

Maybe could be a unit test checking old format number + no asmap uses existing bucket positions

Perhaps. It's a bit difficult to test because we can't easily serialize addrman in the old format.

ryanofsky approved

ryanofsky commented at 2:43 AM on February 9, 2021: contributor

Code review ACK 4676a4fb5be0f6ef0b3f71c1f4361c20f7cb0e0b. I'm not previously familiar with this code but all the changes here do make sense and seem like improvements. Left some notes and comments, but they aren't important so feel to ignore.

jnewbery commented at 10:24 AM on February 9, 2021: member

Thanks @dhruv and @ryanofsky for the reviews. I think this is ready for merge now, and the remaining review comments can be done in follow-ups:

Add test (https://github.com/bitcoin/bitcoin/pull/20557#issuecomment-737890278, also see https://github.com/jnewbery/bitcoin/pull/18)
Use structured bindings to iterate bucket_entries (https://github.com/bitcoin/bitcoin/pull/20557#discussion_r572398580)
Expand comment about info.nRefCount >= ADDRMAN_NEW_BUCKETS_PER_ADDRESS (https://github.com/bitcoin/bitcoin/pull/20557#discussion_r572509364)

laanwj commented at 11:36 AM on February 9, 2021: member

Code review ACK 4676a4fb5be0f6ef0b3f71c1f4361c20f7cb0e0b

laanwj merged this on Feb 9, 2021

laanwj closed this on Feb 9, 2021

sidhujag referenced this in commit 0702be7159 on Feb 9, 2021

laanwj removed this from the "Bugfixes" column in a project

PastaPastaPasta referenced this in commit 5c455421d5 on Jun 27, 2021

PastaPastaPasta referenced this in commit 79690d5554 on Jun 28, 2021

PastaPastaPasta referenced this in commit 54a05bfc58 on Jun 29, 2021

jnewbery deleted the branch on Jul 1, 2021

PastaPastaPasta referenced this in commit d6b893c36a on Jul 1, 2021

PastaPastaPasta referenced this in commit 8e73c5cc96 on Jul 1, 2021

PastaPastaPasta referenced this in commit 09e574156a on Jul 15, 2021

PastaPastaPasta referenced this in commit cd3bedaf1c on Jul 16, 2021

gades referenced this in commit 0bff433173 on May 8, 2022

bitcoin locked this on Aug 16, 2022