configurable ZMQ message limit #13315

pull Alex-CodeLab wants to merge 3 commits into bitcoin:master from Alex-CodeLab:master changing 3 files +9 −0
  1. Alex-CodeLab commented at 2:10 PM on May 24, 2018: none

    ZMQ pub/sub has a default limit of 1000 buffered messages. When it hits the limit messages are dropped. This happens sometimes when a new block is found and ~2500messages ( 2.5MB) are send (almost) at once.
    50-800 usually got lost on my (cheap) node. ZMQ_SNDHWM

    The PUBlisher limit, SNDHWM, is disabled by setting it to 0. On very slow networks it might fill up memory a bit, but that is hardly noticeable since the queue is usually emptied in a few hundred milliseconds.

    Same applies to the SUBscriber (although the bottleneck there is only the local processing, not the network throughput), so it would be good practice to set RCV_HWM to zero as well for some client script, although I was only able to hit the limit on the subscriber side after adding sleep(1) on every rawtx.

    I've tested for about a day and had a bunch of subscribers running on different computers, and not a single message got dropped. ( also while using swap and 100% cpu stress --vm-bytes $(awk '/MemAvailable/{printf "%d\n", $2 * 1.9;}' < /proc/meminfo)k --vm-keep -m 1 -c 1 -i 2 ) I forced one subscriber to lag about 10000 messages , and they were still not dropped.

    I'd appreciate it if others test their performance and load. You can just use this script and check for missing messages while bitcoind is running. (for testing max throughput, stop bitcoind for a while, then restart it; it will send lots of messages while syncing)

    subscriber testscript: https://gist.github.com/FeedTheWeb/e6243410cb463759d5b29fae24550f36

  2. remove ZMQ message limit 391a7b7dbf
  3. fanquake added the label RPC/REST/ZMQ on May 24, 2018
  4. promag commented at 2:15 PM on May 24, 2018: member

    Not sure about disabling the limit.

  5. jonasschnelli commented at 2:18 PM on May 24, 2018: contributor

    I think a viable long term alternative is replacing ZMQ with a stateless longpolling similar to #13262. Something like a list_blocks_since_block where it idles if the newest block was requested. Seems to be much saner then hoping ZMQ messages arrive.

  6. Alex-CodeLab commented at 2:19 PM on May 24, 2018: none

    @promag I tried a 10000 limit first, and that also works (for now). But the default of 1000 seems too low.

    In older versions of ZMQ the default was "no limit" anyway.

    The queue is per SUBscriber, so if you have thousands of listeners to the node you might see some increase in memory usage. I was not able to see an increase in memory usage on my system with 6 subscribers on different computers.

    Unfortunately the setting is not for a limit in MB , that would have made it easier to configure. every value now seems somewhat arbitrary, and it highly depends on how many subscribers are connected -edit- I did some more testing and these peaks are about 3000messages and 3MB in size and happen when a block is found (or synced). So, in worst case scenario of a totally failing router or nic on your network the queue would grow at about 3MB per block. So if I want to limit memory usage to 500Mb, the SNDHWM value should be 500000 / number of subscribers.

  7. promag commented at 2:24 PM on May 24, 2018: member

    Make it an option?

  8. practicalswift commented at 2:53 PM on May 24, 2018: contributor

    @FeedTheWeb Thanks for addressing this issue. Messages being dropped silently under normal circumstances when running with default settings sounds very confusing for the user.

    Would it make sense to bump the limit to say 2 * "expected maximum under normal operations" instead of removing the limit?

    That way we won't drop messages under normal circumstances while still getting the "avoid going berserk" safety that the hard upper limit is meant to provide.

  9. promag commented at 3:00 PM on May 24, 2018: member

    What is the "expected maximum under normal operations"?

  10. practicalswift commented at 3:25 PM on May 24, 2018: contributor

    @promag Sorry for being vague. I meant the maximum number of buffered messages that can be expected under normal operations :-)

  11. promag commented at 3:34 PM on May 24, 2018: member

    @practicalswift reorgs and rescans will trigger the notifiers so I don't know if we should go that way.

    Also, from the docs:

    If this limit has been reached the socket shall enter an exceptional state and depending on the socket type, 0MQ shall take appropriate action such as blocking or dropping sent messages

    IMO making this configurable is the best bet and ideally should be different per notifier.

  12. laanwj commented at 6:54 AM on May 31, 2018: member

    Ping @jmcorgan

  13. Alex-CodeLab commented at 9:05 AM on June 6, 2018: none

    @promag, it could only be made configurable per socket (not per notifier), but tbh I don't see why that would be useful. @practicalswift the 'normal circumstances' is not relevant here, this is only about peaks (with a duration of usually a few ms). In normal situations there never is a queue; the queue grows only when bitcoind pushes the messages faster then your network can handle (note: 'network', not client software - the receiving zmq-socket has its own buffer).

  14. jmcorgan commented at 6:52 PM on June 8, 2018: contributor

    I agree on having a config option to set this. The reason for leaving the default in the original implementation was to ensure that there were no performance impact or other side effects on bitcoind operation from malicious or poorly executing clients.

  15. jmcorgan commented at 9:44 PM on June 8, 2018: contributor

    To elaborate--ZMQ's high watermark buffered message limit prevents the ZMQ sending process from blocking when downstream resources (network bandwidth) are insufficient to handle the message volume. (The remote end can detect this when sequence numbers get skipped.)

    This was originally chosen by design to be left at the default in order to be conservative--enabling ZMQ notifications and connecting a client should not have any potential adverse impact on bitcoind operation.

    I think allowing this behavior to be tuned via a command line or config file option, to set the numerical value of the high water mark, including turning it off, would be very useful for people relying on this interface. Unfortunately, it is a per-socket behavior and not per-notification, but that's the best that can be done.

  16. sangaman commented at 9:32 PM on June 19, 2018: contributor

    Making this configurable seems like the way to go. And if it spikes around 3000 messages per new block, raising the default to slightly higher than that (maybe 5000?) might be good as well.

  17. reel commented at 8:43 PM on June 26, 2018: none

    I had 3 subscribers on commodity HW and they were missing packets (they were LND nodes) and desycing every 2-3 blocks. Since I applied this PR, I can easily run 7-8 clients and not missing anything. Make it configurable or disable it.

    Great PR!

  18. jmcorgan commented at 8:41 PM on June 27, 2018: contributor

    @FeedTheWeb Are you planning to update this to make this configurable?

  19. Alex-CodeLab commented at 8:22 AM on June 28, 2018: none

    @jmcorgan , we still need to agree on setting a different default value . (10k is reasonable, imo. But I would like others to test and see if this has any positive effect )

  20. reel commented at 12:38 PM on June 28, 2018: none

    Is there any way we can poll to see the actual usage? As I understand, this is affected by the slowest peer?

  21. reel commented at 12:51 PM on June 28, 2018: none

    The ZMQ api is well documented. (http://api.zeromq.org/2-1:zmq-setsockopt)

    ZMQ_HWM and ZMQ_SWAP could be configurable (advanced mode).

    HWM default could stay at 1000 or be increased to 10000 SWAP could still default to 0 but it could be a good safety net for delicate operations

    Let me know if you want me to give it a try or collaborate on this.

  22. Alex-CodeLab commented at 3:38 PM on June 28, 2018: none

    @reel no, that is version 2.1 we use 4.* and it does not use SWAP anymore. http://api.zeromq.org/4-1:zmq-setsockopt

  23. reel commented at 3:58 PM on June 28, 2018: none

    Correct, I need a coffee fix.

    Defaults were set to 1000 7 years ago...

    https://github.com/zeromq/libzmq/commit/d31792e652cc9fd3bc84e2abd89f232d273b7ede

  24. Merge pull request #1 from bitcoin/master
    merge upstr.
    43dd7dabf1
  25. configurable ZMQ HWM 187688553e
  26. in src/init.cpp:457 in 187688553e
     453 | @@ -454,6 +454,7 @@ void SetupServerArgs()
     454 |      gArgs.AddArg("-zmqpubhashtx=<address>", "Enable publish hash transaction in <address>", false, OptionsCategory::ZMQ);
     455 |      gArgs.AddArg("-zmqpubrawblock=<address>", "Enable publish raw block in <address>", false, OptionsCategory::ZMQ);
     456 |      gArgs.AddArg("-zmqpubrawtx=<address>", "Enable publish raw transaction in <address>", false, OptionsCategory::ZMQ);
     457 | +    gArgs.AddArg("-zmqhwm=<n>", "Set HWM limit <n> (default: 10000)", false, OptionsCategory::ZMQ);
    


    jmcorgan commented at 10:33 PM on July 3, 2018:

    This will need a hidden_args entry.

  27. in src/zmq/zmqpublishnotifier.cpp:80 in 187688553e
      74 | @@ -75,9 +75,13 @@ bool CZMQAbstractPublishNotifier::Initialize(void *pcontext)
      75 |              zmqError("Failed to create socket");
      76 |              return false;
      77 |          }
      78 | -        
      79 | -        int HWMlimit = 0;
      80 | -        zmq_setsockopt(psocket, ZMQ_SNDHWM, &HWMlimit, sizeof(HWMlimit));
      81 | +
      82 | +        int hwm = gArgs.GetArg("-zmqhwm", 10000);
      83 | +        zmq_setsockopt(psocket, ZMQ_SNDHWM, &hwm, sizeof(hwm));
    


    jmcorgan commented at 10:55 PM on July 3, 2018:

    The error return should be checked here in case an invalid value is set by the user config.


    Alex-CodeLab commented at 11:02 PM on July 3, 2018:

    if any invalid value is given, it defaults to1000.

  28. jmcorgan commented at 10:57 PM on July 3, 2018: contributor

    In addition to the individual comments above, there also needs to be some changes related to adding a new config option (documentation, mostly.) I suggest you do a 'git grep zmqpub' from the top-level directory (above src) to see where this is necessary.

  29. Alex-CodeLab commented at 11:07 PM on July 3, 2018: none

    pls improve the code if you have any suggestions. (I'm not a C++ coder, so my knowledge kinda ends here anyway)

  30. jmcorgan commented at 4:04 PM on July 4, 2018: contributor

    This would have needed a clean rebase for merge anyway, so if you don't mind I'll just implement this from scratch in a new PR.

  31. Alex-CodeLab renamed this:
    remove ZMQ message limit
    configurable ZMQ message limit
    on Jul 18, 2018
  32. Alex-CodeLab commented at 8:02 PM on August 22, 2018: none

    @jmcorgan , it has a been a few months, and I have not seen your PR. As I told you also in private, I welcome any improvement. However, your comment: "ensure that there were no performance impact or other side effects on bitcoind operation from malicious or poorly executing clients" tells me that you don' t really understand how ZMQ works, since (when implemented correctly) these are completely separated processes. Anyone with a basic knowledge of such messaging systems knows that the main goal of these 'separation of concerns' (messaging systems) is to prevent having an impact on client/3rdparty processes (in this case, bitcoind). The performance of the client simply does not affect the server (bitcoind). (again, not talking about network performance here, as pointed out earlier).

    From here it seems you wanted to claim ownership over this part of the code. Which fine with me - really.- (I don't even use my own name. take all the credit if you want...). But now it seems any progress on this part is halted by you, for unknown reasons.

  33. promag commented at 1:12 AM on August 23, 2018: member

    But now it seems any progress on this part is halted by you, for unknown reasons. @FeedTheWeb if you need this then feel free to implement it, I'd review it too.

  34. jmcorgan commented at 12:29 AM on August 24, 2018: contributor

    @FeedTheWeb Your understanding of how to motivate others to do free work to your liking seems lacking.

    You made a pull request, and when I reviewed and suggested simple changes that would be needed, you immediately gave up.

    I offered to do it, and when it wasn't done in the time frame you wanted, you then made odd claims about my level of understanding of something completely unrelated to either the feature needed or the feedback I gave you on your PR.

    You're also saying that I want to "claim ownership" of this section of code. I've done nothing of the sort, of course.

    The code to allow the high water mark to be configurable is easy, and the choosing the default value is something that can be discussed here for consensus by interested parties. Again, if you want to contribute, feel free to continue what you started earlier, and be prepared to address feedback until those with merge rights deem it acceptable to introduce into the tree and maintain it thereafter.

    Otherwise, your attitude is baffling and not very effective at bringing about the changes you want to see through the volunteer effort of other people.

  35. Alex-CodeLab commented at 8:47 PM on August 24, 2018: none

    thanks for the reply, @jmcorgan First of all, it is a volunteer project for me as well. Tbh, I'm not even that interested in this change, but the LND folks have been complaining about it for over a year, and it seems important, so I investigated the issue, made a case and provided evidence. (Why? because I worked with ZMQ before, so I knew about this issue for some years already. And I like to fix stuff.)

    My task is to identify problems (not a popular thing to do, but someone has to do it). What I've done is identifying a issue that was a big question-mark for some for years and provided evidence. I've mentioned the issue a few times on different platforms, but it seems these kind of things do not communicate well in a chat. As a result, I wrote it down in a ticket, with proof, and some code.

    I created some code only to make a case - I don't claim any proficiency in C++ so I want others to improve it. It should be low-hanging fruit for everyone else.

    Over the last months I've patiently waited and tried to contact you a few times (as have others) but never got a response. (that is just rude) I've created the code to make it configurable, as suggested by some - even though we all know that is pointless without figuring out FIRST what a functional HWM value should be anyway; it doesn't even solve the issue, and no one who suggested it made a case for why it would solve anything...

    I've asked you (or anyone else) to suggest a better default HWM value; never got a reply. The reason why seems quite obvious: no one actually tested what the right value is on their system. The current '1000' is a arbitrary value, which doesn't make any sense when you think about it.

    (as a scientist) I ask you to defend why '1000' is the correct value

  36. mruddy commented at 12:48 AM on August 25, 2018: contributor

    @FeedTheWeb chillax bro, we're all in this together. i built on your work and, i believe, covered all of the feedback so far on my branch in one commit at https://github.com/mruddy/bitcoin/tree/zmqhwm. @jmcorgan does my commit cover your feedback? my changes create a configurable hwm option for each notification type. i left the default at 1000 since it's configurable and people that don't configure it get the current 1000 messages for all notification types.

    if my commit is good, you can use it here, or i can create a separate PR. it doesn't matter to me. EDIT: here's a diff link https://github.com/bitcoin/bitcoin/compare/master...mruddy:zmqhwm moved to https://github.com/bitcoin/bitcoin/pull/14060

  37. reel commented at 1:32 AM on August 25, 2018: none

    Enough with the drama. @mruddy can you please make a new PR referencing this one so we can comment/review?

    Good work everyone!

    One a side note, work done on the LND side to limit the amount of connections and broadcast internally really helped. This PR will really close this issue and prevent maintaining Bitcoin forks for many of us. Thanks again!

  38. mruddy commented at 7:19 AM on August 25, 2018: contributor
  39. DrahtBot closed this on Aug 25, 2018

  40. DrahtBot reopened this on Aug 25, 2018

  41. DrahtBot commented at 6:52 PM on September 7, 2018: member

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #14060 (ZMQ: add options to configure outbound message high water mark, aka SNDHWM by mruddy)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  42. DrahtBot closed this on Sep 7, 2018

  43. DrahtBot commented at 9:00 PM on September 7, 2018: member

    <!--5d09a71f8925f3f132321140b44b946d-->The last travis run for this pull request was 69 days ago and is thus outdated. To trigger a fresh travis build, this pull request should be closed and re-opened.

  44. DrahtBot reopened this on Sep 7, 2018

  45. in src/zmq/zmqpublishnotifier.cpp:83 in 187688553e
      75 | @@ -76,6 +76,13 @@ bool CZMQAbstractPublishNotifier::Initialize(void *pcontext)
      76 |              return false;
      77 |          }
      78 |  
      79 | +        int hwm = gArgs.GetArg("-zmqhwm", 10000);
      80 | +        zmq_setsockopt(psocket, ZMQ_SNDHWM, &hwm, sizeof(hwm));
      81 | +        int sndhwm;
      82 | +        size_t sndhwm_size = sizeof(sndhwm);
      83 | +        zmq_getsockopt (psocket, ZMQ_SNDHWM, &sndhwm, &sndhwm_size);
    


    practicalswift commented at 8:20 AM on September 23, 2018:
    2018-09-22 22:19:29 cpplint(pr=13315): src/zmq/zmqpublishnotifier.cpp:83:  Extra space before ( in function call  [whitespace/parens] [4]
    
  46. in src/zmq/zmqpublishnotifier.cpp:79 in 187688553e
      75 | @@ -76,6 +76,13 @@ bool CZMQAbstractPublishNotifier::Initialize(void *pcontext)
      76 |              return false;
      77 |          }
      78 |  
      79 | +        int hwm = gArgs.GetArg("-zmqhwm", 10000);
    


    practicalswift commented at 7:52 AM on October 5, 2018:

    Make this explicit conversion explicit :-)

  47. laanwj referenced this in commit dac2caa371 on Nov 5, 2018
  48. laanwj commented at 1:02 PM on November 5, 2018: member

    Closed by #14060, I guess.

  49. laanwj closed this on Nov 5, 2018

  50. mruddy commented at 1:07 PM on November 5, 2018: contributor

    yes, thanks!

  51. deadalnix referenced this in commit b160ed70b7 on Jun 9, 2020
  52. dzutto referenced this in commit 2017e0ec69 on Aug 23, 2021
  53. dzutto referenced this in commit 896295c7a7 on Aug 23, 2021
  54. dzutto referenced this in commit e225795144 on Aug 23, 2021
  55. dzutto referenced this in commit a7414e35b0 on Sep 3, 2021
  56. dzutto referenced this in commit 1d503a7b5e on Sep 8, 2021
  57. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 15:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me