doc: Fix and clarify description of ZMQ message format #31862

pull jirijakes wants to merge 1 commit into bitcoin:master from jirijakes:zmq-doc changing 1 file +50 −21
  1. jirijakes commented at 7:32 am on February 14, 2025: none

    This change stresses that all ZMQ messages share the same structure and that they differ only in the format of the bodies. Previously this was not clear.

    Further it removes the notion of endianness of 32-byte hashes, as it was misleading, and replaces it with the term ‘reversed byte order’ (as opposed to natural or normal byte order produced by hashing functions).

    Additionally, it states that ZMQ 32-byte hashes are in the same format as in RPC. Previously it incorrectly stated that the two were in different formats.

    Rendered.

    Fixes #31856.

  2. DrahtBot commented at 7:32 am on February 14, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31862.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK l0rinc

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. DrahtBot added the label Docs on Feb 14, 2025
  4. l0rinc commented at 4:53 pm on February 14, 2025: contributor
    Concept ACK
  5. jirijakes marked this as a draft on Feb 20, 2025
  6. jirijakes commented at 3:21 am on February 20, 2025: none
    Switched to draft because I found another thing to fix in this PR. Will get to it within a day.
  7. jirijakes force-pushed on Feb 21, 2025
  8. jirijakes renamed this:
    doc: Fix description of byte order of hashes in ZMQ documentation
    doc: Fix and clarify description of ZMQ message format
    on Feb 21, 2025
  9. jirijakes marked this as ready for review on Feb 21, 2025
  10. jirijakes commented at 1:57 am on February 21, 2025: none

    Ready for review again.

    When trying to use ZMQ, I realized that the description of sequence was not clear about the format of the message. Therefore I expanded this PR to also clarify that.

    This PR now:

    • clarifies that all messages share the same structure (three parts)
    • adds that sequence numbers are distinct for each topic
    • replaces endianness of 32-byte hashes with their byte order
    • puts note about byte order before specification of body formats
    • removes information that feels redundant from the descriptions of topics

    Note that the source diff is not very useful; the rendered view is clearer.
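
As an illustration of the three-part structure described in this comment, here is a minimal Python sketch that splits an already received multipart message into its topic, body, and message sequence number. It assumes the frames arrive as a list of `bytes` objects (for example from a ZMQ library's multipart receive call) and that the last part is a 4-byte little-endian unsigned integer, as stated in the PR. The names `ZmqNotification` and `parse_notification` are hypothetical and are not part of Bitcoin Core.

```python
import struct
from typing import List, NamedTuple


class ZmqNotification(NamedTuple):
    topic: str             # e.g. "hashtx", "rawblock", "sequence"
    body: bytes            # topic-specific payload
    message_sequence: int  # per-topic message counter


def parse_notification(frames: List[bytes]) -> ZmqNotification:
    """Split a three-part notification into topic, body and sequence number."""
    if len(frames) != 3:
        raise ValueError(f"expected 3 parts, got {len(frames)}")
    topic, body, seq = frames
    if len(seq) != 4:
        raise ValueError("sequence number part must be 4 bytes")
    # "<I" means little-endian unsigned 32-bit integer.
    (message_sequence,) = struct.unpack("<I", seq)
    return ZmqNotification(topic.decode("ascii"), body, message_sequence)


# Hand-built frames standing in for a received message (not real data).
example = parse_notification([b"hashtx", bytes(32), (7).to_bytes(4, "little")])
print(example.topic, example.message_sequence)
```

Keeping the parsing separate from the socket code makes it possible to check the framing logic without a running node.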

  11. jirijakes force-pushed on Feb 21, 2025
  12. jirijakes force-pushed on Feb 28, 2025
  13. in doc/zmq.md:97 in 882375f2eb outdated
    91@@ -92,35 +92,45 @@ corresponds to the notification type. For instance, for the
    92 notification `-zmqpubhashtx` the topic is `hashtx` (no null
    93 terminator). These options can also be provided in bitcoin.conf.
    94 
    95-The topics are:
    96+All ZMQ messages are multipart messages that share the same structure with three parts:
    97 
    98-`sequence`: the body is structured as the following based on the type of message:
    99+    | topic | body | <uint32 sequence number in Little Endian> |
    


    ryanofsky commented at 0:01 am on March 27, 2025:

    In commit “doc: Fix and clarify description of ZMQ message format” (882375f2eb3942be644591ae45fbe57458ed40ee)

    This is confusing because it says the messages have 3 parts, but the text above says there are two parts, topic and body, and seems to treat the sequence number as part of the body. Looking at the code, I think your version is more accurate, but if you want to keep this description I think you should update the paragraph above to be consistent.


    jirijakes commented at 4:12 am on March 30, 2025:
    I overlooked that, fixed.
  14. in doc/zmq.md:101 in 882375f2eb outdated
    103-    <32-byte hash>R<8-byte LE uint> : Transactionhash removed from mempool for non-block inclusion reason
    104-    <32-byte hash>A<8-byte LE uint> : Transactionhash added mempool
    105+where the last part is a sequence number (representing the message count to detect lost messages), distinct for each topic.
    106 
    107-Where the 8-byte uints correspond to the mempool sequence number.
    108+**_NOTE:_**  All 32-byte hashes are in _reversed byte order_ (i. e. with bytes produced by hashing function reversed), the same format as the RPC interface and block explorers use to display transaction and block hashes.
    


    ryanofsky commented at 0:09 am on March 27, 2025:
    Maybe say “block and transaction hashes” instead of “32-byte hashes” to provide more context, because this is the first mention of “32-byte hashes” in the document and it is not clear what it refers to.

    jirijakes commented at 4:12 am on March 30, 2025:
    Used this suggestion.
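
To make the term “reversed byte order” from this thread concrete, the following Python sketch hashes a placeholder payload with double SHA-256 (the hash Bitcoin uses for block and transaction identifiers) and reverses the resulting bytes. The payload is made up and is not a real transaction, and the helper names are hypothetical. The point is only that the 32 bytes carried in a ZMQ body, according to the PR, are the hashing function's output reversed, so their hex encoding matches what the RPC interface displays.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Return SHA-256(SHA-256(data)) in natural byte order."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def to_reversed_hex(digest: bytes) -> str:
    """Reverse the digest bytes and hex-encode them (display order)."""
    return digest[::-1].hex()


# Placeholder input, not a real serialized transaction.
payload = b"placeholder payload, not a real transaction"

natural = double_sha256(payload)         # bytes as produced by the hash function
reversed_hex = to_reversed_hex(natural)  # order used for display

# A ZMQ body would carry the already reversed bytes, so hex-encoding it
# directly gives the displayed form without any further reversal.
zmq_body = natural[::-1]
print(zmq_body.hex() == reversed_hex)
```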
  15. in doc/zmq.md:110 in 882375f2eb outdated
    115 
    116-`hashtx`: Notifies about all transactions, both when they are added to mempool or when a new block arrives. This means a transaction could be published multiple times. First, when it enters the mempool and then again in each block that includes it. The messages are ZMQ multipart messages with three parts. The first part is the topic (`hashtx`), the second part is the 32-byte transaction hash, and the last part is a sequence number (representing the message count to detect lost messages).
    117+    | sequence | <32-byte block hash, reversed>C                       | <uint32 sequence number in Little Endian> |
    118+    | sequence | <32-byte block hash, reversed>D                       | <uint32 sequence number in Little Endian> |
    119+    | sequence | <32-byte transaction hash, reversed>R<8-byte LE uint> | <uint32 sequence number in Little Endian> |
    120+    | sequence | <32-byte transaction hash, reversed>A<8-byte LE uint> | <uint32 sequence number in Little Endian> |
    


    ryanofsky commented at 0:44 am on March 27, 2025:

    In commit “doc: Fix and clarify description of ZMQ message format” (882375f2eb3942be644591ae45fbe57458ed40ee)

    I had a hard time figuring out how to read this section, because the list of topics is broken up by explanations and indented diagrams so it wasn’t clear what the text was referring to and where descriptions ended and began. Was also confused about why the first line only mentioned topics and now bodies but for some reason left off sequence numbers. And then even confused by extreme overloading of “sequence” to refer to zmq sequence numbers, mempool sequence numbers, and the literal sequence string making up the first part of these messages.

    Would recommend reorganizing to just show message struct up front, and save explanation for after. I also think it is important to use a consistent way of describing little endian numbers, and it would be good to describe sequence messages last since they are most confusing and complicated topic and overload the sequence number term that was just explained above. Would suggest:


    The ZMQ messages that are sent look like the following, with topic strings, message bodies, and message sequence numbers in three parts:

        | rawtx     | <serialized transaction>                              | <4-byte LE uint sequence number> |
        | hashtx    | <32-byte transaction hash, reversed>                  | <4-byte LE uint sequence number> |
        | rawblock  | <serialized block>                                    | <4-byte LE uint sequence number> |
        | hashblock | <32-byte block hash, reversed>                        | <4-byte LE uint sequence number> |
        | sequence  | <32-byte block hash, reversed>C                       | <4-byte LE uint sequence number> |
        | sequence  | <32-byte block hash, reversed>D                       | <4-byte LE uint sequence number> |
        | sequence  | <32-byte transaction hash, reversed>R<8-byte LE uint> | <4-byte LE uint sequence number> |
        | sequence  | <32-byte transaction hash, reversed>A<8-byte LE uint> | <4-byte LE uint sequence number> |
    

    Then the details about specific fields could be given below.


    jirijakes commented at 4:13 am on March 30, 2025:
    This was a very useful suggestion. Thanks!
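
The body layouts of the `sequence` topic in the table suggested in this thread can be decoded along the following lines. This is a sketch under the assumptions stated in the thread: a 32-byte hash in reversed byte order, a one-character label (`C`, `D`, `R` or `A`), and, for `R` and `A` only, an 8-byte little-endian mempool sequence number. The names `SequenceEvent` and `parse_sequence_body` are hypothetical.

```python
import struct
from typing import NamedTuple, Optional


class SequenceEvent(NamedTuple):
    hash_hex: str                    # hex of the 32 bytes as received
    label: str                       # "C", "D", "R" or "A"
    mempool_sequence: Optional[int]  # present only for "R" and "A"


def parse_sequence_body(body: bytes) -> SequenceEvent:
    """Decode the body (second part) of a `sequence` message."""
    if len(body) < 33:
        raise ValueError("body too short for hash and label")
    # The hash is already in reversed byte order, so no reversal is needed.
    hash_hex = body[:32].hex()
    label = chr(body[32])
    rest = body[33:]
    if label in ("C", "D"):
        # Block events carry no mempool sequence number.
        if rest:
            raise ValueError("unexpected trailing bytes for block event")
        return SequenceEvent(hash_hex, label, None)
    if label in ("R", "A"):
        if len(rest) != 8:
            raise ValueError("mempool sequence number must be 8 bytes")
        # "<Q" means little-endian unsigned 64-bit integer.
        (mempool_sequence,) = struct.unpack("<Q", rest)
        return SequenceEvent(hash_hex, label, mempool_sequence)
    raise ValueError(f"unknown label {label!r}")


# Hand-built body: dummy hash, label "A", mempool sequence number 42.
event = parse_sequence_body(bytes(32) + b"A" + (42).to_bytes(8, "little"))
print(event.label, event.mempool_sequence)
```

Note that the mempool sequence number decoded here is distinct from the 4-byte message sequence number in the third part of every message.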
  16. ryanofsky approved
  17. ryanofsky commented at 1:12 am on March 27, 2025: contributor
    Code review 882375f2eb3942be644591ae45fbe57458ed40ee. I think the changes here describing endianness look good, and I like the new descriptions pointing out the common parts of messages. But I also found the formatting and structure of the new section hard to parse (the old section was also bad but it was a little simpler). I couldn’t even figure out what it was saying until I looked at the zmq_send_multipart call. So I would recommend making some more changes to try to clarify, and I left some suggestions below.
  18. doc: Fix and clarify description of ZMQ message format
    This change stresses that all ZMQ messages share the same structure
    and that they differ only in the format of the bodies. Previously this
    was not clear.
    
    Further it removes the notion of endianness of 32-byte hashes,
    as it was misleading, and replaces it with the term 'reversed byte
    order' (as opposed to natural or normal byte order produced by hashing
    functions).
    
    Additionally, it states that ZMQ 32-byte hashes are in the same format
    as in RPC. Previously it incorrectly stated that the two were in
    different formats.
    7a93544cdc
  19. jirijakes force-pushed on Mar 30, 2025
  20. jirijakes commented at 4:24 am on March 30, 2025: none

    Addressed @ryanofsky’s comments (thanks!) and used his suggestions.

    • format of messages is now first outlined in a single table
    • explanation of each topic follows the table in dedicated subsections
    • format of numbers is consistent
    • sequence numbers are referred to as either message sequence number or mempool sequence number to stress the difference
    • the last subsection of “Usage” section is now called “Implementing ZMQ client” to separate it from previous sections; additionally, the command included in this section was turned into a code block (previously was unformatted)

    Rendered view.
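
Since the message sequence number is described in this PR as a per-topic counter meant for detecting lost messages, a client could track it roughly as in the sketch below. The class name `GapDetector` is hypothetical. The wrap-around at 2**32 is an assumption that follows from the counter being a 4-byte unsigned integer, and a publisher restart (which may restart the counters) is not handled.

```python
from typing import Dict


class GapDetector:
    """Track the last message sequence number seen for each topic."""

    def __init__(self) -> None:
        self._last: Dict[str, int] = {}

    def observe(self, topic: str, message_sequence: int) -> int:
        """Record a sequence number and return how many messages were missed."""
        last = self._last.get(topic)
        self._last[topic] = message_sequence
        if last is None:
            # First message on this topic: nothing to compare against.
            return 0
        # Counters are per topic and 32 bits wide, so compare modulo 2**32.
        # A repeated number would show up here as a very large gap.
        return (message_sequence - last - 1) % 2**32


detector = GapDetector()
detector.observe("hashtx", 10)
detector.observe("hashblock", 3)          # separate counter, no gap
missed = detector.observe("hashtx", 13)   # 11 and 12 were never seen
print(missed)
```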


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-03-31 09:12 UTC
