doc: Fix and clarify description of ZMQ message format

jirijakes commented at 7:32 am on February 14, 2025: contributor

This change stresses that all ZMQ messages share the same structure and that they differ only in the format of the bodies. Previously this was not clear.

Further it removes the notion of endianness of 32-byte hashes, as it was misleading, and replaces it with the term ‘reversed byte order’ (as opposed to natural or normal byte order produced by hashing functions).

Additionally, it states that ZMQ 32-byte hashes are in the same format as in RPC. Previously it incorrectly stated that the two were in different formats.

Rendered.

Fixes #31856.

DrahtBot commented at 7:32 am on February 14, 2025: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31862.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	ryanofsky, w0xlt, achow101
Concept ACK	l0rinc

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#31375 (multiprocess: Add bitcoin wrapper executable by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot added the label Docs on Feb 14, 2025

l0rinc commented at 4:53 pm on February 14, 2025: contributor

Concept ACK

jirijakes marked this as a draft on Feb 20, 2025

jirijakes commented at 3:21 am on February 20, 2025: contributor

Switched to draft because I found another thing to fix in this PR. Will get to it within a day.

jirijakes force-pushed on Feb 21, 2025

jirijakes renamed this:
~~doc: Fix description of byte order of hashes in ZMQ documentation~~
doc: Fix and clarify description of ZMQ message format
on Feb 21, 2025

jirijakes marked this as ready for review on Feb 21, 2025

jirijakes commented at 1:57 am on February 21, 2025: contributor

Ready for review again.

When trying to use ZMQ, I realized that the description of sequence was not clear about the format of the message. Therefore I expanded this PR to also clarify that.

This PR now:

- clarifies that all messages share the same structure (three parts)
- adds that sequence numbers are distinct for each topic
- replaces endianness of 32-byte hashes with their byte order
- puts the note about byte order before the specification of body formats
- removes information that feels redundant from the descriptions of topics

Note that source code diff is not too useful, rendered one is clearer.
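
For readers implementing a client, the point about per-topic sequence numbers can be illustrated with a small sketch. It assumes the client has already decoded the 4-byte sequence number of each message into an integer; the class name `GapDetector` and its method are hypothetical and are not part of Bitcoin Core or of any ZMQ library.

```python
from typing import Dict


class GapDetector:
    """Tracks the last message sequence number seen for each topic (hypothetical helper)."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, int] = {}

    def observe(self, topic: str, sequence: int) -> int:
        """Record a message and return how many messages were missed on this topic."""
        previous = self._last_seen.get(topic)
        self._last_seen[topic] = sequence
        if previous is None:
            # First message on this topic: nothing to compare against.
            return 0
        # The counter is a uint32, so the difference is taken modulo 2**32.
        return (sequence - previous - 1) % 2**32


detector = GapDetector()
detector.observe("hashtx", 0)
detector.observe("rawtx", 0)             # separate counter, no gap reported
missed = detector.observe("hashtx", 3)   # sequence numbers 1 and 2 were never seen
print(missed)
```

Because counters are kept per topic, a single shared counter across topics would report false gaps. A counter that goes backwards (for example after a node restart) shows up here as a very large gap, which a real client would probably want to treat separately.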

jirijakes force-pushed on Feb 21, 2025

jirijakes force-pushed on Feb 28, 2025

in doc/zmq.md:97 in 882375f2eb outdated

@@ -92,35 +92,45 @@ corresponds to the notification type. For instance, for the
 notification `-zmqpubhashtx` the topic is `hashtx` (no null
 terminator). These options can also be provided in bitcoin.conf.

-The topics are:
+All ZMQ messages are multipart messages that share the same structure with three parts:

-`sequence`: the body is structured as the following based on the type of message:
+    | topic | body | <uint32 sequence number in Little Endian> |

ryanofsky commented at 0:01 am on March 27, 2025:

In commit “doc: Fix and clarify description of ZMQ message format” (882375f2eb3942be644591ae45fbe57458ed40ee)

This is confusing because it says the messages have 3 parts, but the text above says there are two parts, topic and body, and seems to treat the sequence number as part of the body. Looking at the code I think your version is more accurate, but if you want to keep this description I think you should update the paragraph above to be consistent.

jirijakes commented at 4:12 am on March 30, 2025:

I overlooked that, fixed.
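
The three-part structure discussed in this thread can be sketched as a small parser. This is a minimal illustration, assuming the frames of one multipart message have already been received as a list of byte strings (for example from a ZMQ SUB socket); the names `ZmqNotification` and `parse_notification` are hypothetical.

```python
import struct
from typing import NamedTuple, Sequence


class ZmqNotification(NamedTuple):
    topic: str      # e.g. "hashtx", "rawblock", "sequence"
    body: bytes     # format depends on the topic
    sequence: int   # per-topic message sequence number


def parse_notification(frames: Sequence[bytes]) -> ZmqNotification:
    """Split one multipart message into topic, body and message sequence number."""
    if len(frames) != 3:
        raise ValueError(f"expected 3 parts, got {len(frames)}")
    topic, body, raw_sequence = frames
    if len(raw_sequence) != 4:
        raise ValueError("message sequence number must be 4 bytes")
    # "<I" is a little-endian unsigned 32-bit integer.
    (sequence,) = struct.unpack("<I", raw_sequence)
    return ZmqNotification(topic.decode("ascii"), body, sequence)


example = parse_notification([b"hashtx", bytes(32), b"\x05\x00\x00\x00"])
print(example.topic, example.sequence)
```

The sequence number is read from its own frame rather than from the end of the body, which matches the three-part description in the new text.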

in doc/zmq.md:101 in 882375f2eb outdated

-    <32-byte hash>R<8-byte LE uint> : Transactionhash removed from mempool for non-block inclusion reason
-    <32-byte hash>A<8-byte LE uint> : Transactionhash added mempool
+where the last part is a sequence number (representing the message count to detect lost messages), distinct for each topic.

-Where the 8-byte uints correspond to the mempool sequence number.
+**_NOTE:_**  All 32-byte hashes are in _reversed byte order_ (i. e. with bytes produced by hashing function reversed), the same format as the RPC interface and block explorers use to display transaction and block hashes.

ryanofsky commented at 0:09 am on March 27, 2025:

Maybe say “block and transaction hashes” instead of “32-byte hashes” to provide more context, because this is the first mention of “32-byte hashes” in the document and it is not clear what it refers to.

jirijakes commented at 4:12 am on March 30, 2025:

Used this suggestion.
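
To make the byte-order note concrete, here is a short sketch of the relationship between the digest a hash function produces and the reversed form the note describes. It assumes the double SHA-256 that Bitcoin uses for block and transaction hashes; the function names are hypothetical and the input bytes are placeholder data, not a real transaction.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Return the digest in natural byte order, as the hash function emits it."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def to_display_hex(natural_digest: bytes) -> str:
    """Reverse the digest bytes to get the hex form shown by RPC and block explorers."""
    return natural_digest[::-1].hex()


def zmq_hash_to_hex(body: bytes) -> str:
    """A hash from a ZMQ body is already reversed, so no further reversal is needed."""
    if len(body) != 32:
        raise ValueError("expected a 32-byte hash")
    return body.hex()


digest = double_sha256(b"placeholder bytes, not a real transaction")
zmq_style_body = digest[::-1]  # what a hashtx/hashblock body would carry for this digest
print(zmq_hash_to_hex(zmq_style_body) == to_display_hex(digest))
```

Under the behaviour described in the note, a client that wants the RPC-style hex string can call `.hex()` on the body directly; reversing again would produce the natural-order digest instead.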

in doc/zmq.md:110 in 882375f2eb outdated


-`hashtx`: Notifies about all transactions, both when they are added to mempool or when a new block arrives. This means a transaction could be published multiple times. First, when it enters the mempool and then again in each block that includes it. The messages are ZMQ multipart messages with three parts. The first part is the topic (`hashtx`), the second part is the 32-byte transaction hash, and the last part is a sequence number (representing the message count to detect lost messages).
+    | sequence | <32-byte block hash, reversed>C                       | <uint32 sequence number in Little Endian> |
+    | sequence | <32-byte block hash, reversed>D                       | <uint32 sequence number in Little Endian> |
+    | sequence | <32-byte transaction hash, reversed>R<8-byte LE uint> | <uint32 sequence number in Little Endian> |
+    | sequence | <32-byte transaction hash, reversed>A<8-byte LE uint> | <uint32 sequence number in Little Endian> |

ryanofsky commented at 0:44 am on March 27, 2025:

In commit “doc: Fix and clarify description of ZMQ message format” (882375f2eb3942be644591ae45fbe57458ed40ee)

I had a hard time figuring out how to read this section, because the list of topics is broken up by explanations and indented diagrams so it wasn’t clear what the text was referring to and where descriptions ended and began. Was also confused about why the first line only mentioned topics and now bodies but for some reason left off sequence numbers. And then even confused by extreme overloading of “sequence” to refer to zmq sequence numbers, mempool sequence numbers, and the literal sequence string making up the first part of these messages.

Would recommend reorganizing to just show the message structure up front, and save explanation for after. I also think it is important to use a consistent way of describing little endian numbers, and it would be good to describe sequence messages last since they are the most confusing and complicated topic and overload the sequence number term that was just explained above. Would suggest:

The ZMQ messages that are sent look like the following, with topic strings, message bodies, and message sequence numbers in three parts:

    | rawtx     | <serialized transaction>                              | <4-byte LE uint sequence number> |
    | hashtx    | <32-byte transaction hash, reversed>                  | <4-byte LE uint sequence number> |
    | rawblock  | <serialized block>                                    | <4-byte LE uint sequence number> |
    | hashblock | <32-byte block hash, reversed>                        | <4-byte LE uint sequence number> |
    | sequence  | <32-byte block hash, reversed>C                       | <4-byte LE uint sequence number> |
    | sequence  | <32-byte block hash, reversed>D                       | <4-byte LE uint sequence number> |
    | sequence  | <32-byte transaction hash, reversed>R<8-byte LE uint> | <4-byte LE uint sequence number> |
    | sequence  | <32-byte transaction hash, reversed>A<8-byte LE uint> | <4-byte LE uint sequence number> |

Then the details about specific fields could be given below.

jirijakes commented at 4:13 am on March 30, 2025:

This was a very useful suggestion. Thanks!
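
The four `sequence` rows in the suggested table can be decoded with a sketch like the following. It assumes the body layout exactly as shown in the table: a 32-byte hash, a one-byte label, and for `R` and `A` an 8-byte little-endian mempool sequence number. The names `SequenceEvent` and `parse_sequence_body` are hypothetical.

```python
import struct
from typing import NamedTuple, Optional


class SequenceEvent(NamedTuple):
    hash_hex: str                     # hash as carried in the body, hex-encoded
    label: str                        # "C", "D", "R" or "A"
    mempool_sequence: Optional[int]   # only present for "R" and "A"


def parse_sequence_body(body: bytes) -> SequenceEvent:
    """Decode the body (second part) of a message on the `sequence` topic."""
    if len(body) < 33:
        raise ValueError("body too short for a hash and a label")
    hash_hex = body[:32].hex()
    label = chr(body[32])
    if label in ("C", "D"):
        # Block events carry only the hash and the label.
        if len(body) != 33:
            raise ValueError("unexpected trailing bytes in block event")
        return SequenceEvent(hash_hex, label, None)
    if label in ("R", "A"):
        # Mempool events append an 8-byte little-endian mempool sequence number.
        if len(body) != 41:
            raise ValueError("mempool event must be 41 bytes")
        (mempool_sequence,) = struct.unpack("<Q", body[33:41])
        return SequenceEvent(hash_hex, label, mempool_sequence)
    raise ValueError(f"unknown label {label!r}")


added = parse_sequence_body(bytes(range(32)) + b"A" + struct.pack("<Q", 7))
print(added.label, added.mempool_sequence)
```

The mempool sequence number decoded here is distinct from the message sequence number in the third part of the message, which is the overloading the review comment points out.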

ryanofsky approved

ryanofsky commented at 1:12 am on March 27, 2025: contributor

Code review 882375f2eb3942be644591ae45fbe57458ed40ee. I think the changes here describing endianness look good, and I like the new descriptions pointing out the common parts of messages. But I also found the formatting and structure of the new section hard to parse (the old section was also bad but it was a little simpler). I couldn’t even figure out what it was saying until I looked at the zmq_send_multipart call. So I would recommend making some more changes to try to clarify, and I left some suggestions below.

doc: Fix and clarify description of ZMQ message format

This change stresses that all ZMQ messages share the same structure
and that they differ only in the format of the bodies. Previously this
was not clear.

Further it removes the notion of endianness of 32-byte hashes,
as it was misleading, and replaces it with the term 'reversed byte
order' (as opposed to natural or normal byte order produced by hashing
functions).

Additionally, it states that ZMQ 32-byte hashes are in the same format
as in RPC. Previously it incorrectly stated that the two were in
different formats.

7a93544cdc

jirijakes force-pushed on Mar 30, 2025

jirijakes commented at 4:24 am on March 30, 2025: contributor

Reflected @ryanofsky’s comments (thanks!) and used his suggestions.

- format of messages is now first outlined in a single table
- explanation of each topic follows the table in dedicated subsections
- format of numbers is consistent
- sequence numbers are referred to as either message sequence number or mempool sequence number to stress the difference
- the last subsection of the “Usage” section is now called “Implementing ZMQ client” to separate it from previous sections; additionally, the command included in this section was turned into a code block (previously it was unformatted)

Rendered view.

in doc/zmq.md:90 in 7a93544cdc

@@ -87,40 +87,69 @@ For instance:
                -zmqpubrawtx=ipc:///tmp/bitcoind.tx.raw \
                -zmqpubhashtxhwm=10000

-Each PUB notification has a topic and body, where the header
-corresponds to the notification type. For instance, for the
-notification `-zmqpubhashtx` the topic is `hashtx` (no null
-terminator). These options can also be provided in bitcoin.conf.
+Notification types correspond to message topics (details in next section). For instance,

ryanofsky commented at 8:34 pm on March 31, 2025:

In commit “doc: Fix and clarify description of ZMQ message format” (7a93544cdcc2874a18a8b6d528a75e84ac007880)

Note: this no longer mentions that topic strings do not include null terminators, but this is probably a good thing. There shouldn’t be a reason to think that they would include null terminators.

jirijakes commented at 9:16 am on April 1, 2025:

Yes, I also believe so. The parts of ZMQ message have known lengths and there is nothing in this section suggesting C-style strings.

ryanofsky approved

ryanofsky commented at 8:42 pm on March 31, 2025: contributor

Code review ACK 7a93544cdcc2874a18a8b6d528a75e84ac007880. Nice changes. Documentation seems less repetitive and easier to understand now

DrahtBot requested review from l0rinc on Mar 31, 2025

w0xlt commented at 10:17 pm on April 2, 2025: contributor

Code review ACK https://github.com/bitcoin/bitcoin/pull/31862/commits/7a93544cdcc2874a18a8b6d528a75e84ac007880

achow101 commented at 11:17 pm on April 16, 2025: member

ACK 7a93544cdcc2874a18a8b6d528a75e84ac007880

achow101 merged this on Apr 16, 2025

achow101 closed this on Apr 16, 2025

doc: Fix and clarify description of ZMQ message format #31862
