Different numbers regarding the size of the peer database #24278

issue BrianPfitz opened this issue on February 6, 2022
  1. BrianPfitz commented at 1:42 pm on February 6, 2022: none

    Dear Bitcoin Developers,

    I am currently working on a project where I am trying to create long term profiles of bitcoin nodes using peer information. For this purpose I need the size of the peer database. The maximum size is 81,920 due to the buckets, but in order to determine a realistic size, I have been running my own Bitcoin node for a few months. However, this now gives me 2 completely different numbers regarding the peer database, which I cannot explain.
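The 81,920 ceiling mentioned above can be reproduced from the address manager's table dimensions. The following is a minimal sketch, assuming 1024 "new" buckets, 256 "tried" buckets, and 64 entries per bucket; these values are recalled from Bitcoin Core's addrman and should be checked against the version you run. The Python names are illustrative, not the C++ identifiers.

```python
# Assumed addrman table dimensions (verify against your Bitcoin Core version).
NEW_BUCKET_COUNT = 1 << 10    # 1024 buckets in the "new" table
TRIED_BUCKET_COUNT = 1 << 8   # 256 buckets in the "tried" table
BUCKET_SIZE = 1 << 6          # 64 entries per bucket


def max_addrman_slots():
    """Upper bound on the number of slots across both tables."""
    return (NEW_BUCKET_COUNT + TRIED_BUCKET_COUNT) * BUCKET_SIZE


print(max_addrman_slots())  # 81920
```

This counts slots, not unique addresses. The same address may occupy more than one slot in the "new" table, so the number of distinct addresses a node can hold may be lower than this bound.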

    The first number comes from my debug.log file, which contains all logs about incoming addresses. The following is one of the latest entries:

    2021-11-25T08:57:56Z Added 1 addresses (of 1) from 2001:41d0:a:69a2::1: 3678 tried, 58574 new

    This entry basically gives me the information that my node currently manages more than 61,000 unique (no duplicates) addresses.

    Another way to get the number of peers known to my node is via the bitcoin-cli commands “addrinfo” and “getnodeaddresses”. However, these commands report far fewer addresses (about 47,000).
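To put the two figures side by side, the tried/new counters can be extracted from a debug.log line like the one quoted above. This is a rough sketch: the regular expression is an assumption based only on that single sample line, and the function name is hypothetical.

```python
import re

# Pattern inferred from the one sample line in this thread; other
# Bitcoin Core versions may word the message differently.
LOG_RE = re.compile(r"Added \d+ addresses \(of \d+\) from .*: (\d+) tried, (\d+) new")


def addrman_totals(line):
    """Return (tried, new, tried + new) for a matching log line, else None."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    tried, new = int(match.group(1)), int(match.group(2))
    return tried, new, tried + new


sample = (
    "2021-11-25T08:57:56Z Added 1 addresses (of 1) "
    "from 2001:41d0:a:69a2::1: 3678 tried, 58574 new"
)
print(addrman_totals(sample))  # (3678, 58574, 62252)
```

The sum of 62,252 is the figure to compare against the roughly 47,000 addresses reported through the CLI.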

    My question now is how this difference can be explained and what the different numbers (i.e. the one from the debug file and the one from the cli commands) tell me about my peer database.

    Thanks in advance for your help,

    Brian

  2. MarcoFalke commented at 1:48 pm on February 6, 2022: member
    Usually the issue tracker is used to track technical issues related to the Bitcoin Core code base. General bitcoin questions and/or support requests are best directed to the Bitcoin StackExchange or the #bitcoin IRC channel on Libera Chat.
  3. MarcoFalke closed this on Feb 6, 2022

  4. BrianPfitz commented at 11:05 pm on February 16, 2022: none

    > Usually the issue tracker is used to track technical issues related to the Bitcoin Core code base. General bitcoin questions and/or support requests are best directed to the Bitcoin StackExchange or the #bitcoin IRC channel on Libera Chat.

    Hi Marco, I have asked in the Bitcoin forum and unfortunately have not received an answer yet. I am also not sure if my question is perhaps a bug or an error in the response of the API. I just can’t explain the two different numbers and think that the answer of the API and the numbers in the debug.log file should be the same.

  5. jonatack commented at 11:14 pm on February 16, 2022: member
    If memory serves, peers for which IsTerrible (grep for the function in the codebase) is true are not included in the rpc/cli calls.
  6. BrianPfitz commented at 11:34 pm on February 16, 2022: none

    Okay, that sounds really good! That would suit my work better as well, since peers for which IsTerrible is true are not delivered in response to GetAddr requests. So I would only need the much smaller number from the rpc/cli call anyway, as only this is relevant for possible attackers. I assume that most peers for which IsTerrible is true will remain true, since they are most likely unreachable nodes.

    Thanks for the answer, that really helped me!

  7. mzumsande commented at 6:42 pm on February 17, 2022: member

    > I assume that most peers for which IsTerrible is true will remain true, since they are most likely unreachable nodes.

    They may also be just old (code). For example, if you turn off your node for a month, all of the addresses in your database will be terrible at the next startup. Note that these “terrible” peers are still eligible when trying to make outgoing connections (they are just not included in responses to GetAddr requests, nor in the RPC calls, which use the same code), so depending on your use case they may still be relevant to include.
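The "just old" case can be illustrated with a small Python model of the IsTerrible check. This is a paraphrase written from memory of the C++ function, not the actual implementation: the field names are hypothetical, and every threshold (30-day horizon, 3 retries, 10 failures within 7 days) is an assumption that should be verified against addrman in your Bitcoin Core version.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60

# Assumed thresholds; verify against the addrman source.
HORIZON_DAYS = 30    # not seen for longer than this counts as terrible
RETRIES = 3          # failed attempts allowed for a never-successful address
MIN_FAIL_DAYS = 7    # window for counting successive failures
MAX_FAILURES = 10    # failures tolerated within that window


@dataclass
class AddrEntry:
    last_seen: int         # timestamp the address was last advertised
    last_try: int = 0      # last connection attempt (0 = never)
    last_success: int = 0  # last successful connection (0 = never)
    attempts: int = 0      # attempts since the last success


def is_terrible(entry, now):
    """Approximate model of the IsTerrible filter described in this thread."""
    if entry.last_try and entry.last_try >= now - 60:
        return False  # tried within the last minute: keep for now
    if entry.last_seen > now + 10 * 60:
        return True   # timestamp lies in the future
    if entry.last_seen == 0 or now - entry.last_seen > HORIZON_DAYS * DAY:
        return True   # too old
    if entry.last_success == 0 and entry.attempts >= RETRIES:
        return True   # never connected despite several tries
    if now - entry.last_success > MIN_FAIL_DAYS * DAY and entry.attempts >= MAX_FAILURES:
        return True   # many failures and no recent success
    return False


now = 1_700_000_000
fresh = AddrEntry(last_seen=now - DAY)
stale = AddrEntry(last_seen=now - 31 * DAY)
print(is_terrible(fresh, now), is_terrible(stale, now))  # False True
```

Under these assumptions, an address whose last-seen timestamp is more than a month old is filtered out even if the node behind it is reachable, which matches the restart scenario described above.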

  8. BrianPfitz commented at 7:13 pm on February 17, 2022: none

    Hi mzumsande, thanks for your comment. My goal is to create profiles of the different nodes using the peers they manage. For this I need a certain number of addresses that appear in the responses to 2 (or more) different GetAddr requests. In order to determine how many such duplicates I need in theory so that the node can be uniquely identified, I need the size of the peer database with which the node can respond to GetAddr requests. So far I had (unknowingly) neglected the isTerrible peers, but 15,000 fewer peers can make a difference in the identification.

    I hadn’t noticed that a node that goes back online after a month hardly delivers any old peers … I will probably only be able to find out to what extent this hinders identification after a long test. The question here is, how long it will take to make most of the peers isTerrible=false …

  9. sidhujag referenced this in commit f24cfd11ee on Feb 22, 2022
  10. DrahtBot locked this on Feb 17, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-22 06:12 UTC