doc/zmq: Note about endianness does not match reality #31856

jirijakes opened this issue on February 13, 2025
  1. jirijakes commented at 12:15 pm on February 13, 2025: none

    Is there an existing issue for this?

    • I have searched the existing issues

    Current behaviour

    PR #23471 added a note to ZMQ’s documentation page saying that:

    […] 32-byte hashes are in Little Endian and not in the Big Endian format that the RPC interface and block explorers use to display transaction and block hashes.

    Also:

    | hashtx | <32-byte transaction hash in Little Endian> | <uint32 sequence number in Little Endian>
    | hashblock | <32-byte block hash in Little Endian> | <uint32 sequence number in Little Endian>

    However, unless I am missing something, transaction and block hashes in both ZMQ and RPC appear in the same, reversed byte order (big endian).

    If this is confirmed, I would like to prepare a PR for the documentation.

    Expected behaviour

    Documentation would not contain references to hashes being in Little Endian.

    Steps to reproduce

    Start regtest with ZMQ hashblock:

    bitcoind -regtest -datadir=zmqtest -server --daemon -zmqpubhashblock=tcp://0.0.0.0:43441
    bitcoin-cli -regtest -datadir=zmqtest createwallet ""
    

    Run independent ZMQ client in Python:

    import zmq
    import binascii

    context = zmq.Context()
    socket = context.socket(zmq.SUB)

    # Connect to the publisher started above and subscribe to block hashes.
    socket.connect("tcp://localhost:43441")
    socket.setsockopt_string(zmq.SUBSCRIBE, "hashblock")

    # Each notification is a three-frame message: topic, body, sequence number.
    topic = socket.recv()
    data = socket.recv()
    seq = socket.recv()

    # Print the raw bytes as hex, without reordering them.
    print(f"Topic: {topic}")
    print(f"Data:  {binascii.hexlify(data)}")
    print(f"Seq:   {binascii.hexlify(seq)}")
    

    Generate block:

    bitcoin-cli -regtest -datadir=zmqtest -generate 1
    
    {
      "address": "bcrt1qpjy0a3ply66pwpj6mp8sn36aslwr3gvdq727v6",
      "blocks": [
        "4f36a8e4e1ff6ebcc13f6bec62841e44a8b4c281f1f1346f148987735fb72e0e"
      ]
    }
    

    Output of ZMQ client:

    Topic: b'hashblock'
    Data:  b'4f36a8e4e1ff6ebcc13f6bec62841e44a8b4c281f1f1346f148987735fb72e0e'
    Seq:   b'00000000'
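    The three frames printed above can also be decoded into readable fields. The following is only a sketch: `parse_hash_notification` is a hypothetical helper (not part of pyzmq or Bitcoin Core), and it assumes the frame layout quoted from the documentation, that is, a topic, a 32-byte hash, and a uint32 sequence number in little-endian byte order. The hash bytes are hex-encoded exactly as received.

```python
import struct


def parse_hash_notification(frames):
    """Decode a three-frame hash notification (hypothetical helper)."""
    topic, body, seq = frames
    if len(body) != 32:
        raise ValueError(f"expected a 32-byte hash, got {len(body)} bytes")
    # The sequence number is a uint32 in little-endian byte order.
    (sequence,) = struct.unpack("<I", seq)
    # Hex-encode the hash bytes as received, without reversing them.
    return topic.decode("ascii"), body.hex(), sequence


# Frames as printed by the client in this report.
frames = [
    b"hashblock",
    bytes.fromhex("4f36a8e4e1ff6ebcc13f6bec62841e44a8b4c281f1f1346f148987735fb72e0e"),
    bytes.fromhex("00000000"),
]
print(parse_hash_notification(frames))
```

    With pyzmq, the list returned by `socket.recv_multipart()` could be passed to this helper instead of calling `socket.recv()` three times.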
    

    Calculate hash of the block header to crosscheck:

    header=$(bitcoin-cli -regtest -datadir=zmqtest getblockheader 4f36a8e4e1ff6ebcc13f6bec62841e44a8b4c281f1f1346f148987735fb72e0e false)
    echo $header | xxd -r -p | sha256sum | xxd -r -p | sha256sum
    
    0e2eb75f738789146f34f1f181c2b4a8441e8462ec6b3fc1bc6effe1e4a8364f  -
    

    0e2eb75f738789146f34f1f181c2b4a8441e8462ec6b3fc1bc6effe1e4a8364f ← hash in natural order (output of SHA256; little endian)
    4f36a8e4e1ff6ebcc13f6bec62841e44a8b4c281f1f1346f148987735fb72e0e ← hash in reversed order (ZMQ, RPC; big endian)
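    The same cross-check can be sketched in Python. The helper names `double_sha256` and `to_display_hex` are made up for illustration. The serialized header is not reproduced in this report, so the example starts from the double-SHA256 output shown above and only demonstrates the byte reversal.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Return SHA256(SHA256(data)) as raw bytes, in the hash function's own order."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def to_display_hex(digest: bytes) -> str:
    """Reverse the digest bytes and hex-encode them."""
    return digest[::-1].hex()


# Output of the shell pipeline above (natural order).
natural = bytes.fromhex(
    "0e2eb75f738789146f34f1f181c2b4a8441e8462ec6b3fc1bc6effe1e4a8364f"
)
print(to_display_hex(natural))
# prints 4f36a8e4e1ff6ebcc13f6bec62841e44a8b4c281f1f1346f148987735fb72e0e
```

    Given the hex string returned by `getblockheader <hash> false` in a variable such as `header_hex` (an assumed name), `to_display_hex(double_sha256(bytes.fromhex(header_hex)))` should reproduce the hash reported by RPC and ZMQ.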


    I assume hashes are printed by GetHex(). The bytes are printed in reversed order:

    https://github.com/bitcoin/bitcoin/blob/55cf39e4c54da6639a8f1f7c813c2909454cada1/src/uint256.cpp#L10-L18

    To ZMQ, they are also sent in reversed order:

    https://github.com/bitcoin/bitcoin/blob/55cf39e4c54da6639a8f1f7c813c2909454cada1/src/zmq/zmqpublishnotifier.cpp#L220-L229

    Relevant log output

    No response

    How did you obtain Bitcoin Core

    Package manager

    What version of Bitcoin Core are you using?

    28.0.0

    Operating system and version

    Arch Linux, 6.12

    Machine specifications

    No response

  2. maflcko added the label Docs on Feb 13, 2025
  3. maflcko added the label RPC/REST/ZMQ on Feb 13, 2025
  4. l0rinc commented at 1:09 pm on February 13, 2025: contributor
    For reference, @hodlinator fixed a few of these in #30526 and @ryanofsky a few more in #31596.
  5. ryanofsky commented at 3:44 pm on February 13, 2025: contributor

    Agree with bug report and earlier comment. Relevant ZMQ code is Lines 220 to 240 in zmqpublishnotifier.cpp and the ZMQ documentation cited is pretty wrong or at least misleading:

    […] 32-byte hashes are in Little Endian and not in the Big Endian format that the RPC interface and block explorers use to display transaction and block hashes. […]

    | hashtx | <32-byte transaction hash in Little Endian> | <uint32 sequence number in Little Endian>
    | hashblock | <32-byte block hash in Little Endian> | <uint32 sequence number in Little Endian>

    The most straightforward way to think about hashtx and hashblock values is as byte arrays, not as numbers. And both the ZMQ and RPC interfaces return the same reversed byte arrays (with the bytes produced by the original hash functions in reverse order). So the current ZMQ documentation is wrong to describe the ZMQ and RPC interfaces as being inconsistent.

    Since the RPC interface returns the byte arrays as strings in hex format, you can interpret these strings as hexadecimal numbers, and these are the same numbers you would get if you interpreted the output of the hash functions as little-endian numbers. So the current ZMQ documentation is mostly wrong to describe the RPC interface as using big endian format, though if you squint maybe you could say it is right because the RPC interface is sending the numbers in a big endian byte representation that it previously happened to decode from a little-endian byte representation.

    The ZMQ documentation is also maybe somewhat right to describe its own output as little-endian, because, similar to the RPC, if you just display the ZMQ output as hex, you will see numbers that match a little-endian interpretation of hash function outputs. But it is also misleading because the actual bytes produced by ZMQ are definitely a big-endian, not a little-endian, representation of those numbers.

    Best way to fix this would be to avoid using terms little endian and big endian at all and just say ZMQ outputs bytes of tx and block hashes, with bytes reversed, which is the same way they are shown in the RPC interface.
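    The number-versus-bytes distinction above can be checked with a small Python sketch. It uses only the two hashes from the bug report; the variable names are illustrative and the ZMQ payload is taken to be the bytes printed by the reporter's client.

```python
# Raw double-SHA256 output from the bug report (natural order).
natural = bytes.fromhex(
    "0e2eb75f738789146f34f1f181c2b4a8441e8462ec6b3fc1bc6effe1e4a8364f"
)
# Hash string as returned by RPC.
rpc_hex = "4f36a8e4e1ff6ebcc13f6bec62841e44a8b4c281f1f1346f148987735fb72e0e"
# Bytes received over ZMQ, according to the reporter's client output.
zmq_bytes = bytes.fromhex(rpc_hex)

# As byte arrays: ZMQ and RPC agree, and both are the hash output reversed.
assert zmq_bytes == natural[::-1]

# As numbers: reading the RPC string as hexadecimal gives the same value as
# reading the hash output as a little-endian integer ...
as_number = int(rpc_hex, 16)
assert int.from_bytes(natural, "little") == as_number
# ... while the ZMQ bytes are the big-endian encoding of that same value.
assert int.from_bytes(zmq_bytes, "big") == as_number

print(hex(as_number))
```

    All three assertions hold, which is why either endianness label can be argued for, and why describing the values simply as reversed byte arrays is less ambiguous.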

  6. jirijakes commented at 7:36 am on February 14, 2025: none

    current ZMQ documentation is mostly wrong to describe RPC interface as using big endian format

    I don’t know where the notion of Big/Little Endian hashes comes from, but I have mostly seen the RPC style (reversed byte order) described as Big Endian; for example, learnmebitcoin uses it. Anyway, it is apparently very misleading.

    Best way to fix this would be to avoid using terms little endian and big endian at all

    Yes, I took this approach in #31862.

    Thanks!

  7. laanwj commented at 9:08 am on February 14, 2025: member

    The most straightforward way to think about hashtx and hashblock values is as byte arrays, not as numbers.

    Yes, we should never use the word ’endian’ in the context of hashes in our documentation. They’re just byte blobs. Which can be reversed or in original order.

    It’s mistaken terminology that Satoshi introduced back in the day and unfortunately still sticks around.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-02-22 06:12 UTC
