I compared the performance of the PR branch with @vasild's suggested optimization ("VD") a bit more closely:
Test 1: Reasonably full addrman (15k addresses): ./bench_bitcoin -filter=AddrManSelect -min-time=5000
=> The PR is much faster, since VD has a constant overhead due to the shuffling.
<details>
<summary>Details</summary>
PR:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 142.41 | 7,021,818.47 | 0.2% | 5.50 | AddrManSelect
| 145.75 | 6,860,921.17 | 0.4% | 5.63 | AddrManSelect
| 145.91 | 6,853,404.83 | 1.4% | 5.55 | AddrManSelect
| 149.78 | 6,676,385.78 | 3.7% | 5.63 | AddrManSelect
| 144.37 | 6,926,771.77 | 0.3% | 5.49 | AddrManSelect
VD:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 12,716.43 | 78,638.44 | 0.1% | 5.49 | AddrManSelect
| 12,790.84 | 78,180.95 | 0.8% | 5.36 | AddrManSelect
| 12,884.37 | 77,613.41 | 0.7% | 5.53 | AddrManSelect
| 12,815.30 | 78,031.72 | 0.9% | 5.52 | AddrManSelect
| 12,961.91 | 77,149.10 | 1.3% | 5.55 | AddrManSelect
</details>
<br />
Test 2: A single address in AddrMan:
bench_bitcoin -filter=AddrManSelectFromAlmostEmpty -min-time=5000
=> VD is faster by a factor of 1.8
<details>
<summary>Details</summary>
PR:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 93,286.28 | 10,719.69 | 2.2% | 5.42 | AddrManSelectFromAlmostEmpty
| 90,781.42 | 11,015.47 | 1.0% | 5.52 | AddrManSelectFromAlmostEmpty
| 92,758.06 | 10,780.73 | 1.2% | 5.57 | AddrManSelectFromAlmostEmpty
| 90,026.20 | 11,107.88 | 0.7% | 5.52 | AddrManSelectFromAlmostEmpty
| 91,946.89 | 10,875.84 | 1.1% | 5.47 | AddrManSelectFromAlmostEmpty
VD:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 55,306.93 | 18,080.92 | 0.7% | 5.50 | AddrManSelectFromAlmostEmpty
| 55,209.86 | 18,112.71 | 0.4% | 5.48 | AddrManSelectFromAlmostEmpty
| 54,989.61 | 18,185.25 | 0.5% | 5.51 | AddrManSelectFromAlmostEmpty
| 56,727.70 | 17,628.07 | 2.1% | 5.57 | AddrManSelectFromAlmostEmpty
| 55,732.54 | 17,942.84 | 0.5% | 5.50 | AddrManSelectFromAlmostEmpty
</details>
Test 3: Three addresses in AddrMan: set NUM_SOURCES = 3, NUM_ADDRESSES_PER_SOURCE=1 then run
./bench_bitcoin -filter=AddrManSelect -min-time=5000
=> PR is slightly faster.
<details>
<summary>Details</summary>
PR:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 30,034.99 | 33,294.50 | 0.7% | 5.51 | AddrManSelect
| 30,370.28 | 32,926.92 | 1.1% | 5.17 | AddrManSelect
| 30,574.52 | 32,706.98 | 2.1% | 5.20 | AddrManSelect
| 29,883.53 | 33,463.25 | 0.7% | 5.52 | AddrManSelect
| 29,944.31 | 33,395.33 | 0.5% | 5.52 | AddrManSelect
VD:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 34,126.18 | 29,303.02 | 0.7% | 5.30 | AddrManSelect
| 34,660.26 | 28,851.49 | 0.9% | 5.32 | AddrManSelect
| 34,075.81 | 29,346.33 | 0.7% | 5.30 | AddrManSelect
| 34,628.06 | 28,878.31 | 1.3% | 5.36 | AddrManSelect
| 34,300.57 | 29,154.03 | 0.6% | 5.31 | AddrManSelect
</details>
Test 4:
Query for a single network-specific address in an AddrMan filled with 15k IPV4 addresses we don't want
./bench_bitcoin -filter=AddrManSelectByNetwork -min-time=5000
=> VD is faster by a factor of ~2.1.
<details>
<summary>Details</summary>
PR:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 1,256,860.84 | 795.63 | 4.4% | 5.40 | AddrManSelectByNetwork
| 1,269,322.70 | 787.82 | 2.6% | 5.50 | AddrManSelectByNetwork
| 1,242,639.25 | 804.74 | 1.9% | 5.56 | AddrManSelectByNetwork
| 1,246,311.65 | 802.37 | 2.9% | 5.69 | AddrManSelectByNetwork
| 1,265,535.68 | 790.18 | 3.4% | 5.59 | AddrManSelectByNetwork
VD
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 560,818.30 | 1,783.11 | 1.0% | 5.56 | AddrManSelectByNetwork
| 558,351.91 | 1,790.99 | 0.9% | 5.53 | AddrManSelectByNetwork
| 600,536.33 | 1,665.18 | 4.8% | 5.38 | AddrManSelectByNetwork
| 569,208.08 | 1,756.83 | 1.7% | 5.45 | AddrManSelectByNetwork
| 581,860.03 | 1,718.63 | 2.3% | 5.47 | AddrManSelectByNetwork
</details>
Test 5:
Query for 20 network-specific address in an AddrMan filled with 15k IPV4 addresses we don't want (changing the code a bit)
./bench_bitcoin -filter=AddrManSelectByNetwork -min-time=5000
=> Performance is approximately the same.
<details>
<summary>Details</summary>
PR:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 109,388.10 | 9,141.76 | 1.6% | 5.40 | AddrManSelectByNetwork
| 108,804.63 | 9,190.79 | 2.1% | 5.56 | AddrManSelectByNetwork
| 111,333.16 | 8,982.05 | 2.0% | 5.61 | AddrManSelectByNetwork
| 110,827.61 | 9,023.02 | 1.4% | 5.55 | AddrManSelectByNetwork
| 122,518.06 | 8,162.06 | 1.4% | 5.47 | AddrManSelectByNetwork
VD
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 106,806.78 | 9,362.70 | 2.5% | 5.51 | AddrManSelectByNetwork
| 109,221.66 | 9,155.69 | 3.1% | 5.59 | AddrManSelectByNetwork
| 107,870.77 | 9,270.35 | 3.8% | 5.41 | AddrManSelectByNetwork
| 105,131.17 | 9,511.93 | 0.5% | 5.47 | AddrManSelectByNetwork
| 106,080.47 | 9,426.81 | 1.2% | 5.48 | AddrManSelectByNetwork
</details>
In conclusion, my results indicate that the suggested optimization is only faster in some specific scenarios because the pre-shuffling of the buckets adds a constant overhead that is not neglible:
In the case of non-network specific queries (which should be the bulk even after #27213 ), the PR is already faster when AddrMan has more than 3 addresses (which should virtually always be the case!)
In the case of network-specific queries, the tipping point is (because of the additional cost of looking up addrs in buckets that we can't use) at ~20 network-specific addresses (with 15k IPV4 addresses). This is a bit higher, but I would imagine that we should get more than 20 addresses of most alternative networks rather quickly.
To summarize, my results indicate that adding pre-shuffling AddrMan buckets to prevent repeated visits appears to be no clear performance improvement - in most cases, performance would go down.
@vasild: Do you see similar numbers?