test: Treat msg_version.relay as unsigned, Remove `struct` packing in messages.py #29067

pull maflcko wants to merge 4 commits into bitcoin:master from maflcko:2312-test-less-struct- changing 1 files +107 −111
  1. maflcko commented at 7:14 PM on December 12, 2023: member

    struct has many issues in messages.py:

    • For unpacking, it requires to specify the length a second time, even when it is already clear from the f.read(num_bytes) context.
    • For unpacking, it is designed to support a long format string and returning a tuple of many values. However, except for 3 instances in messages.py, usually only a single value is unpacked and all those cases require an [0] access.
    • For packing and unpacking of a single value, the format string consists of characters that may be confusing and may need to be looked up in the documentation, as opposed to using easy to understand self-documenting code.

    I presume the above issues lead to accidentally treat msg_version.relay as a "signed bool", which is fine, but confusing.

    Fix all issues by using the built-in int helpers to_bytes and from_bytes via a scripted diff.

    Review notes:

    • struct.unpack throws an error if the number of bytes passed is incorrect. int.from_bytes doesn't know about "missing" bytes and treats an empty byte array as int(0). "Extraneous" bytes should never happen, because all read calls are limited in this file. If it is important to keep this error behavior, a helper int_from_stream(stream, num_bytes, bytes, byteorder, *, **kwargs) can be added, which checks the number of bytes read from the stream.
    • For struct.pack and int.to_bytes the error behavior is the same, although the error messages are not identical.
  2. DrahtBot commented at 7:14 PM on December 12, 2023: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK stickies-v, theStack

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29325 (consensus: Store transaction nVersion as uint32_t by achow101)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot renamed this:
    test: Treat msg_version.relay as unsigned, Remove `struct` packing in messages.py
    test: Treat msg_version.relay as unsigned, Remove `struct` packing in messages.py
    on Dec 12, 2023
  4. DrahtBot added the label Tests on Dec 12, 2023
  5. maflcko force-pushed on Dec 12, 2023
  6. maflcko force-pushed on Dec 12, 2023
  7. maflcko force-pushed on Dec 12, 2023
  8. maflcko force-pushed on Dec 12, 2023
  9. theStack commented at 1:33 AM on January 7, 2024: contributor

    Concept ACK

    I agree with the outlined drawbacks of struct.{un,}pack (especially the "format string consists of characters that may be confusing and may need to be looked up in the documentation" part) and that int.{from,to}_bytes should be used instead.

    Small side-note unrelated to this PR: For (de)serializing single bytes I usually prefer bytes([val]) over val.to_bytes(1, "little") (and bs[0] over int.from_bytes(bs, "little")), as it feels a bit weird having to specify an endianness for a scenario where it doesn't matter. On the other hand, always using those methods is more consistent I guess.

  10. theStack approved
  11. theStack commented at 1:52 AM on January 8, 2024: contributor

    Code-review ACK fa3b4dbf45b2b7ff1bdac6f68cf23275102cc775

  12. maflcko commented at 10:18 AM on January 8, 2024: member

    over int.from_bytes(bs, "little")

    In python3.11, one could also write int.from_bytes(bs) for single-byte conversions.

  13. DrahtBot added the label CI failed on Jan 16, 2024
  14. maflcko requested review from stickies-v on Jan 26, 2024
  15. in test/functional/test_framework/messages.py:406 in fa3b4dbf45 outdated
     403 |          self.vHave = deser_uint256_vector(f)
     404 |  
     405 |      def serialize(self):
     406 |          r = b""
     407 | -        r += struct.pack("<i", 0)  # Bitcoin Core ignores version field. Set it to 0.
     408 | +        r += (0).to_bytes(4, "little")  # Bitcoin Core ignores version field. Set it to 0.
    


    stickies-v commented at 9:41 PM on January 26, 2024:

    Aren't these supposed to be signed=True? Given that we're serializing a 0 I suppose it doesn't really matter there, but would still keep that consistent with the actual type.


    maflcko commented at 11:16 AM on January 29, 2024:

    I suppose it doesn't really matter there, but would still keep that consistent with the actual type.

    Thanks, done.

    Also, rebased to fix the CI failure.

  16. stickies-v commented at 10:06 PM on January 26, 2024: contributor

    Concept ACK

  17. test: Treat msg_version.relay as unsigned
    The C++ code treats bool as uint8_t, so the python tests should as well.
    
    This also allows to simplify the code, because converting an empty byte
    array to int gives int(0).
    
    >>> int.from_bytes(b'')
    0
    fa3886b7c6
  18. test: Use int from_bytes and to_bytes over struct packing
    This is done in prepration for the scripted diff, which can not deal
    with the 0 literal int.
    fafc0d68ee
  19. scripted-diff: test: Use int from_bytes and to_bytes over struct packing
    -BEGIN VERIFY SCRIPT-
     sed -i --regexp-extended 's!struct.unpack\("(|<|>)B", (.*)\)\[0\]!int.from_bytes(\2, "little")!g'               ./test/functional/test_framework/messages.py
     sed -i --regexp-extended 's!struct.unpack\("<(H|I|Q)", (.*)\)\[0\]!int.from_bytes(\2, "little")!g'              ./test/functional/test_framework/messages.py
     sed -i --regexp-extended 's!struct.unpack\("<(h|i|q)", (.*)\)\[0\]!int.from_bytes(\2, "little", signed=True)!g' ./test/functional/test_framework/messages.py
     sed -i --regexp-extended 's!struct.unpack\(">(H|I|Q)", (.*)\)\[0\]!int.from_bytes(\2, "big")!g'                 ./test/functional/test_framework/messages.py
    
     sed -i --regexp-extended 's!struct.pack\("<?B", (.*)\)!\1.to_bytes(1, "little")!g'             ./test/functional/test_framework/messages.py
     sed -i --regexp-extended 's!struct.pack\("<I", (.*)\)!\1.to_bytes(4, "little")!g'              ./test/functional/test_framework/messages.py
     sed -i --regexp-extended 's!struct.pack\("<i", (.*)\)!\1.to_bytes(4, "little", signed=True)!g' ./test/functional/test_framework/messages.py
     sed -i --regexp-extended 's!struct.pack\("<Q", (.*)\)!\1.to_bytes(8, "little")!g'              ./test/functional/test_framework/messages.py
     sed -i --regexp-extended 's!struct.pack\("<q", (.*)\)!\1.to_bytes(8, "little", signed=True)!g' ./test/functional/test_framework/messages.py
     sed -i --regexp-extended 's!struct.pack\(">H", (.*)\)!\1.to_bytes(2, "big")!g'                 ./test/functional/test_framework/messages.py
    -END VERIFY SCRIPT-
    fa3fa86dda
  20. test: Remove struct import from messages.py 55556a64a8
  21. maflcko force-pushed on Jan 29, 2024
  22. DrahtBot removed the label CI failed on Jan 29, 2024
  23. stickies-v approved
  24. stickies-v commented at 5:01 PM on January 29, 2024: contributor

    ACK 55556a64a8e4e6238f990cf66295c3b9594c2c3d

    I agree with the rationale in the PR description. The code with to_bytes and from_bytes is easier to understand without having to consult documentation. In that philosophy, I would also be okay with signed=False being explicit in these calls, since the default value (False) is not immediately obvious imo, and quite important. However, it's not something that we enforce, so maybe it's not a good idea. It's also a very quick thing to look up.

  25. DrahtBot requested review from theStack on Jan 29, 2024
  26. theStack approved
  27. theStack commented at 11:17 PM on January 29, 2024: contributor

    re-ACK 55556a64a8e4e6238f990cf66295c3b9594c2c3d

    Checked that since my previous ACK, a rebase + stickies' suggestion of passing signed=True for (de)serializing the CBlockLocator version was applied.

  28. glozow merged this on Jan 30, 2024
  29. glozow closed this on Jan 30, 2024

  30. maflcko deleted the branch on Jan 30, 2024
  31. maflcko commented at 12:42 PM on January 30, 2024: member

    The following command can be used to find single-byte conversion that can have one (default) argument dropped in Python 3.11. (https://github.com/bitcoin/bitcoin/pull/29067#issuecomment-1880721233)

    git grep 'bytes(1, '
    
  32. bitcoin locked this on Jan 29, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-24 09:14 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me