[contrib] message-capture-parser: fix out of bounds error for empty vectors #25367

pull theStack wants to merge 1 commits into bitcoin:master from theStack:202206-contrib-fix_message_capture_out_of_bounds changing 1 files +2 −1
  1. theStack commented at 12:21 AM on June 14, 2022: member

    The script message-capture-parser.py currently throws an "out of bounds" error if a message containing an empty integer vector element is tried to converted to JSON (e.g. by the BIP157 message cfcheckpt with empty FilterHeaders vector):

    Traceback (most recent call last):
      File "/home/honey/bitcoin/./contrib/message-capture/message-capture-parser.py", line 217, in <module>
        main()
      File "/home/honey/bitcoin/./contrib/message-capture/message-capture-parser.py", line 202, in main
        process_file(str(capture), messages, "recv" in capture.stem, progress_bar)
      File "/home/honey/bitcoin/./contrib/message-capture/message-capture-parser.py", line 162, in process_file
        msg_dict["body"] = to_jsonable(msg)
      File "/home/honey/bitcoin/./contrib/message-capture/message-capture-parser.py", line 85, in to_jsonable
        elif slot in HASH_INT_VECTORS and isinstance(val[0], int):
    IndexError: list index out of range
    

    Fix this by using the all(...) predicate rather to access the first element val[0] (which in the error case doesn't exist).

  2. theStack commented at 12:23 AM on June 14, 2022: member

    Can be reproduced with the following input file:

    $ xxd -p msgs_sent.dat
    0b74e2d1e1de050076657273696f6e000000000072000000801101004904
    000000000000812d7e620000000000000000000000000000000000000000
    000000000000000000004904000000000000000000000000000000000000
    000000000000508c4c90436c3d431c2f5361746f7368693a32332e39392e
    3028746573746e6f646530292f0000000001f293e2d1e1de050077747869
    6472656c61790000000000008796e2d1e1de050073656e64616464727632
    0000000000007397e2d1e1de050076657261636b00000000000000000000
    c2a1e2d1e1de050073656e64686561646572730000000000e1a3e2d1e1de
    050073656e64636d706374000000090000000002000000000000005fa4e2
    d1e1de050073656e64636d70637400000009000000000100000000000000
    17a7e2d1e1de050070696e670000000000000000080000006d37706c0c0f
    cf9aedaae2d1e1de05006765746865616465727300004500000080110100
    0106226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf1
    88910f000000000000000000000000000000000000000000000000000000
    0000000000e1b5e2d1e1de050066656566696c7465720000000800000035
    f08b00000000000d12e3d1e1de0500706f6e670000000000000000080000
    0001000000000000003c1de4d1e1de05006366636865636b707400000022
    0000000006226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2
    b73cf188910f00bc20e4d1e1de0500706f6e670000000000000000080000
    000200000000000000
    

    (Save the hex-lines into a file and convert it into the dump via $ xxd -p -r hexdump.txt > msgs_sent.dat).

  3. DrahtBot added the label Scripts and tools on Jun 14, 2022
  4. message-capture-parser: fix out of bounds error for empty vectors 42bbbba7c8
  5. in contrib/message-capture/message-capture-parser.py:82 in 3e2831d5ee outdated
      78 | @@ -79,7 +79,7 @@ def to_jsonable(obj: Any) -> Any:
      79 |              val = getattr(obj, slot, None)
      80 |              if slot in HASH_INTS and isinstance(val, int):
      81 |                  ret[slot] = ser_uint256(val).hex()
      82 | -            elif slot in HASH_INT_VECTORS and isinstance(val[0], int):
      83 | +            elif slot in HASH_INT_VECTORS and all(isinstance(a, int) for a in val):
    


    laanwj commented at 5:13 AM on June 14, 2022:

    What do we want to happen in the case where val[0] is an integer and (some of) the other elements are not? Should this be an explicit error, or is it fine to implicitly fall through to to_jsonable?


    theStack commented at 12:23 PM on June 14, 2022:

    Good question. On master, this scenario would lead to a (confusing) error right now, since calling ser_uint256(...) with a non-integer type fails:

    diff --git a/contrib/message-capture/message-capture-parser.py b/contrib/message-capture/message-capture-parser.py
    index 9988478f1b..a6bac539de 100755
    --- a/contrib/message-capture/message-capture-parser.py
    +++ b/contrib/message-capture/message-capture-parser.py
    @@ -168,6 +168,12 @@ def process_file(path: str, messages: List[Any], recv: bool, progress_bar: Optio
    
    
     def main():
    +    from test_framework.messages import msg_cfcheckpt
    +    invalid_msg = msg_cfcheckpt()
    +    invalid_msg.headers = [23, 42, "foobar"]
    +    print(to_jsonable(invalid_msg))
    +    return 1
    +
         parser = argparse.ArgumentParser(
             description=__doc__,
             epilog="EXAMPLE \n\t{0} -o out.json <data-dir>/message_capture/**/*.dat".format(sys.argv[0]),
    
    $ ./contrib/message-capture/message-capture-parser.py
    Traceback (most recent call last):
      File "/home/honey/bitcoin/./contrib/message-capture/message-capture-parser.py", line 220, in <module>
        main()
      File "/home/honey/bitcoin/./contrib/message-capture/message-capture-parser.py", line 174, in main
        print(to_jsonable(invalid_msg))
      File "/home/honey/bitcoin/./contrib/message-capture/message-capture-parser.py", line 83, in to_jsonable
        ret[slot] = [ser_uint256(a).hex() for a in val]
      File "/home/honey/bitcoin/./contrib/message-capture/message-capture-parser.py", line 83, in <listcomp>
        ret[slot] = [ser_uint256(a).hex() for a in val]
      File "/home/honey/bitcoin/./contrib/message-capture/../../test/functional/test_framework/messages.py", line 117, in ser_uint256
        rs += struct.pack("<I", u & 0xFFFFFFFF)
    TypeError: unsupported operand type(s) for &: 'bytes' and 'int'
    

    On the PR it falls through to to_jsonable:

    $ ./contrib/message-capture/message-capture-parser.py
    {'filter_type': None, 'stop_hash': None, 'headers': [23, 42, 'foobar']}
    

    Options that I can think of right now:

    1. keep the PR as it is (being more tolerant if some internal data types are not matching, but potentially also covering bugs)
    2. change the all predicate to any to closely mimic the behaviour on master
    3. explicitly assert that all array elements have to be integers for the HASH_INT_VECTORS types (seems right, but could have unintended consequences that I am not aware of)

    laanwj commented at 12:24 PM on June 14, 2022:

    Thanks for checking! In that case, I like adding an explicit assertion best. It makes it immediately clear what the problem is.


    theStack commented at 12:35 PM on June 14, 2022:

    Okay, done. 👌 Didn't check in detail, but I think this assert can't be triggered by a malformed message-capture input, it would rather only point out that something is wrong internally (e.g. our deserialization routines), which is good.

  6. theStack force-pushed on Jun 14, 2022
  7. laanwj commented at 12:38 PM on June 14, 2022: member

    Code review ACK 42bbbba7c83d1e2baad18b4c6f05bad1358eb117

  8. MarcoFalke renamed this:
    message-capture-parser: fix out of bounds error for empty vectors
    [contrib] message-capture-parser: fix out of bounds error for empty vectors
    on Jun 14, 2022
  9. MarcoFalke merged this on Jun 14, 2022
  10. MarcoFalke closed this on Jun 14, 2022

  11. theStack deleted the branch on Jun 14, 2022
  12. sidhujag referenced this in commit 6fac38db8c on Jun 15, 2022
  13. theStack referenced this in commit 644772b9ef on Sep 4, 2022
  14. MarcoFalke referenced this in commit 3c5fb9691b on Sep 9, 2022
  15. sidhujag referenced this in commit e01a908af4 on Sep 11, 2022
  16. DrahtBot locked this on Jun 14, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-14 21:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me