addrman: bump peers.dat format version number #23382

pull vasild wants to merge 1 commits into bitcoin:master from vasild:addrman_format_bump_multiaddr changing 2 files +11 −2
  1. vasild commented at 2:43 PM on October 28, 2021: member

    After 92617b7a758c0425330fba4b886296730567927c (https://github.com/bitcoin/bitcoin/pull/23306) we allow multiple entries in addrman with the same address that only differ by port.

    Such peers.dat trips versions before the above commit and they fail to read it with an obscure "corruption" error:

    Corrupt data. Consistency check failed with code -5
    

    So, if we detect that there are such "duplicate" entries, write the file with a newver format version, so that old software emits:

    Unsupported format of addrman database: 4. It is compatible with formats >=4, but the maximum supported by this version of Bitcoin Core is 3.
    
  2. addrman: bump peers.dat format version number
    After 92617b7a758c0425330fba4b886296730567927c
    (https://github.com/bitcoin/bitcoin/pull/23306) we allow multiple entries
    in addrman with the same address that only differ by port.
    
    Such `peers.dat` trips versions before the above commit and they fail to
    read it with an obscure "corruption" error:
    
    ```
    Corrupt data. Consistency check failed with code -5
    ```
    
    So, if we detect that there are such "duplicate" entries, write the file
    with a newver format version, so that old software emits:
    
    ```
    Unsupported format of addrman database: 4. It is compatible with formats >=4, but the maximum supported by this version of Bitcoin Core is 3.
    ```
    1a8a382787
  3. in src/addrman.cpp:172 in 1a8a382787
     168 | @@ -169,9 +169,17 @@ void AddrManImpl::Serialize(Stream& s_) const
     169 |  
     170 |      s << static_cast<uint8_t>(FILE_FORMAT);
     171 |  
     172 | +    std::set<CNetAddr> dup_guard;
    


    vasild commented at 2:44 PM on October 28, 2021:

    Checking 70k addresses takes 0.032 sec. If changed to unordered_map then it takes 0.017 sec. While that is ~2x speedup, I think it is insignificant during Serialize() and not worth introducing and maintaining CNetAddr hasher.

    <details> <summary>CNetAddrHash</summary>

    class CNetAddrHash
    {
    public:
        size_t operator()(const CNetAddr& a) const noexcept
        {
            CSipHasher hasher(m_salt_k0, m_salt_k1);
            hasher.Write(a.m_net);
            hasher.Write(a.m_addr.data(), a.m_addr.size());
            return static_cast<size_t>(hasher.Finalize());
        }
    
    private:
        const uint64_t m_salt_k0 = GetRand(std::numeric_limits<uint64_t>::max());
        const uint64_t m_salt_k1 = GetRand(std::numeric_limits<uint64_t>::max());
    };
    

    </details>


    sipa commented at 2:50 PM on October 28, 2021:

    std::sort + std::adjacent_find may be faster than both.


    vasild commented at 3:42 PM on October 28, 2021:

    Inserting the elements in std::set<CNetAddr> is essentially sorting them (by address only). Inserting N elements would be O(N*log(N)). Same as std::sort, except that the std::set variant could quit early if a duplicate is found.

    The std::unordered_set variant is O(N) (time and memory).


    sipa commented at 3:53 PM on October 28, 2021:

    @vasild Yes, but both set/unordered_set will require an allocation per element. I would expect that to outweigh the complexity difference for the sizes we're talking about.


    vasild commented at 9:17 AM on October 29, 2021:

    I played with this out of curiosity. Speed is not so important in this case. But anyway:

    The "sort + adjacent_find":

        std::vector<CNetAddr> dup_guard;
        dup_guard.reserve(mapAddr.size());
        for (const auto& [addr, id] : mapAddr) {
            dup_guard.push_back(addr);
        }
        std::sort(dup_guard.begin(), dup_guard.end());
        const bool duplicates_by_addr =
            std::adjacent_find(dup_guard.begin(), dup_guard.end()) != dup_guard.end();
    

    Performs about the same as the unordered_set variant. The reserve() call is very important - if missing then this is slower than unordered_set.

    However, the unordered_set also benefits from a reserve() call:

        std::unordered_set<CNetAddr, CNetAddrHash> dup_guard;
        dup_guard.reserve(mapAddr.size());
    

    This makes it the fastest on Earth - 0.011 sec.

    The above is for about 70k entries without duplicates (worst case). If there are duplicates, then the unordered_set is even faster as it would cancel the sorting somewhere in the middle (sorting is the slowest operation). In my case, with just 73 duplicates from 70k entries, the timings from a few runs are: 0.0013, 0.00077, 0.002, 0.00064.

    Clang 13, -O2

  4. sipa commented at 2:46 PM on October 28, 2021: member

    Duplicate of #23354, though this variant also uses V3 when possible.

  5. vasild commented at 2:55 PM on October 28, 2021: member

    Duplicate of #23354, though this variant also uses V3 when possible.

    Doh, closing in favor of that one. I wasn't aware of it. Sorry for the noise.

    Feel free to pick the use-V3-if-no-duplicates tweak into #23354.

  6. vasild closed this on Oct 28, 2021

  7. DrahtBot locked this on Oct 30, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-14 18:14 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me