Per-Peer Message Capture #19509

pull troygiorshev wants to merge 6 commits into bitcoin:master from troygiorshev:message-dump changing 9 files +369 −16
  1. troygiorshev commented at 2:10 am on July 14, 2020: contributor

    This PR introduces per-peer message capture into Bitcoin Core. 📓

    Purpose

    The purpose and scope of this feature is intentionally limited. It answers a question anyone new to Bitcoin’s P2P protocol has had: “Can I see what messages my node is sending and receiving?”.

    Functionality

    When a new debug-only command line argument capturemessages is set, any message that the node receives or sends is captured. The capture occurs in the MessageHandler thread. When receiving a message, it is captured as soon as the MessageHandler thread takes the message off of the vProcessMsg queue. When sending, the message is captured just before the message is pushed onto the vSendMsg queue.

    The message capture is as minimal as possible to reduce the performance impact on the node. Messages are captured to a new message_capture folder in the datadir. Each node has their own subfolder named with their IP address and port. Inside, received and sent messages are captured into two binary files, msgs_recv.dat and msgs_sent.dat, like so:

    0message_capture/203.0.113.7:56072/msgs_recv.dat
    1message_capture/203.0.113.7:56072/msgs_sent.dat
    

    Because the messages are raw binary dumps, included in this PR is a Python parsing tool to convert the binary files into human-readable JSON. This script has been placed on its own and out of the way in the new contrib/message-capture folder. Its usage is simple and easily discovered by the autogenerated -h option.

    Future Maintenance

    I sympathize greatly with anyone who says “the best code is no code”.

    The future maintenance of this feature will be minimal. The logic to deserialize the payload of the p2p messages exists in our testing framework. As long as our testing framework works, so will this tool.

    Additionally, I hope that the simplicity of this tool will mean that it gets used frequently, so that problems will be discovered and solved when they are small.

    FAQ

    “Why not just use Wireshark”

    Yes, Wireshark has the ability to filter and decode Bitcoin messages. However, the purpose of the message capture added in this PR is to assist with debugging, primarily for new developers looking to improve their knowledge of the Bitcoin Protocol. This drives the design in a different direction than Wireshark, in two different ways. First, this tool must be convenient and simple to use. Using an external tool, like Wireshark, requires setup and interpretation of the results. To a new user who doesn’t necessarily know what to expect, this is unnecessary difficulty. This tool, on the other hand, “just works”. Turn on the command line flag, run your node, run the script, read the JSON. Second, because this tool is being used for debugging, we want it to be as close to the true behavior of the node as possible. A lot can happen in the SocketHandler thread that would be missed by Wireshark.

    Additionally, if we are to use Wireshark, we are at the mercy of whoever it maintaining the protocol in Wireshark, both as to it being accurate and recent. As can be seen by the many previous attempts to include Bitcoin in Wireshark (google “bitcoin dissector”) this is easier said than done.

    Lastly, I truly believe that this tool will be used significantly more by being included in the codebase. It’s just that much more discoverable.

  2. fanquake added the label P2P on Jul 14, 2020
  3. fanquake added the label Utils/log/libs on Jul 14, 2020
  4. troygiorshev renamed this:
    Message dump
    Per-Peer Message Logging
    on Jul 14, 2020
  5. troygiorshev marked this as a draft on Jul 14, 2020
  6. in contrib/peer-logging/peer-logging-parser.py:67 in dc393b30e9 outdated
    62+
    63+def main():
    64+    """Main"""
    65+    parser = argparse.ArgumentParser(
    66+        description=__doc__,
    67+        epilog="EXAMPLE \n\tpython contrib/peer-logging/peer-logging-parser.py -o out.json <data-dir>/peer_logging/**/*.dat",
    


    MarcoFalke commented at 7:19 am on July 14, 2020:

    I don’t think it is needed in python to duplicate how the script was called in the help.

    0        epilog="EXAMPLE \n\t{sys.argv[0]} -o out.json <data-dir>/peer_logging/**/*.dat",
    

    In case you wanted to do this because the import ( sys.path.append('test/functional')) is relative to the pwd, you could make it relative to this file, which would be more stable.

    Also, a test wouldn’t hurt.


    troygiorshev commented at 6:33 pm on July 22, 2020:

    done

    (That sort of format string is something I’m looking forward to in 3.6 😄)


    troygiorshev commented at 6:33 pm on July 22, 2020:
    A test is in the works
  7. in src/net_processing.cpp:3640 in 38c6f10d5c outdated
    3635@@ -3636,6 +3636,10 @@ bool PeerLogicValidation::ProcessMessages(CNode* pfrom, std::atomic<bool>& inter
    3636     }
    3637     CNetMessage& msg(msgs.front());
    3638 
    3639+    std::vector<unsigned char> tmp(msg.m_recv.data(), msg.m_recv.data() + msg.m_recv.size());
    3640+    if (gArgs.IsArgSet("-dumpmessages"))
    


    naumenkogs commented at 7:38 am on July 14, 2020:
    “dumpmessages”?

    troygiorshev commented at 11:28 am on July 14, 2020:
    ah thanks fixed!
  8. naumenkogs commented at 7:48 am on July 14, 2020: member

    Concept ACK the high-level idea. I think this feature may be quite useful while debugging, and it requires very few lines of code.

    I have two questions though:

    1. How to better integrate it with the existing codebase? Should the format be similar to what we have now: received: addr (30003 bytes) peer=0? Or maybe it doesn’t matter.
    2. Is there any threat of this feature being used for DoS? Perhaps on a system where file create is expensive, an attacker could disconnect/reconnect (from a new net address?) to make us create a new file? Maybe the file should be created after 1 minute of connection lifetime? (Although the messages from the very beginning should not be lost).
  9. DrahtBot commented at 7:58 am on July 14, 2020: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #20404 (Remove names from translatable strings by hebasto)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  10. in src/net.cpp:2876 in dc393b30e9 outdated
    2871+void LogMessage(CNode* node, std::string& command, std::vector<unsigned char>& data, bool is_incoming)
    2872+{
    2873+    int64_t time = GetTimeMicros();
    2874+
    2875+    fs::path base_path = GetDataDir() / "peer_logging" / node->addr.ToString();
    2876+    boost::filesystem::create_directories(base_path);
    


    jnewbery commented at 8:25 am on July 14, 2020:
    use fs::create_directories (which is currently just an alias for boost::filesystem`, but means that the std filesystem can be substituted in when we upgrade our c++ version.

    jnewbery commented at 8:26 am on July 14, 2020:
    Does this mean we try to create the directory every time we send/receive a message? Is that a problem?

    Saibato commented at 7:18 pm on July 14, 2020:
    Nit. If if log messages where enabled by-logmessages=<LogMessageDir>not empty we would have a separate LogMessagesDir , instead of logging in DataDir by default. And DataDir should be always <> LogDir.imho. see #19419

    troygiorshev commented at 1:11 pm on July 17, 2020:
    This is ok. As per the boost docs: “Creation failure because [base_path] resolves to an existing directory shall not be treated as an error.”

    troygiorshev commented at 2:06 pm on July 17, 2020:
    I can’t imagine this is a performance hit, but if anyone is worried I guess this could be moved to the CNode’s constructor. I would want to first be really sure that a node’s address won’t ever change.

    troygiorshev commented at 0:51 am on July 20, 2020:

    Nit. If if log messages where enabled by-logmessages=<LogMessageDir>not empty we would have a separate LogMessagesDir , instead of logging in DataDir by default. And DataDir should be always <> LogDir.imho. see #19419

    I’ll consider this, having the log directory configurable would be nice.

  11. in src/net_processing.cpp:3641 in dc393b30e9 outdated
    3635@@ -3636,6 +3636,10 @@ bool PeerLogicValidation::ProcessMessages(CNode* pfrom, std::atomic<bool>& inter
    3636     }
    3637     CNetMessage& msg(msgs.front());
    3638 
    3639+    std::vector<unsigned char> tmp(msg.m_recv.data(), msg.m_recv.data() + msg.m_recv.size());
    3640+    if (gArgs.IsArgSet("-dumpmessages"))
    3641+        LogMessage(pfrom, msg.m_command, tmp, true);
    


    jnewbery commented at 8:27 am on July 14, 2020:
    Can this logic be moved to ReceiveMsgBytes so that all the message dumping is contained within the net layer?

    troygiorshev commented at 4:01 pm on July 14, 2020:
    Yes! It’s unfortunate that we have some MessageHandler code in net.cpp. Right now all of the logging is done is done in MessageHandler, I think we should keep it that way. It would be nice to move PushMessage to net_processing some day.

    jnewbery commented at 5:11 pm on July 14, 2020:
    Yes, you’re right. Even though PushMessage is in net.cpp, it’s always executed on the MessageHandler thread.
  12. jnewbery commented at 8:28 am on July 14, 2020: member

    Concept ACK. I’m looking forward to playing around with this.

    My first thought is that this would be much more useful if the user was able to specify an individual peer to log messages for, either through a command line argument -logmessages=<ip_addr> or through an RPC to enable and disable logging. Perhaps that can be done in a follow-up PR?

  13. troygiorshev force-pushed on Jul 14, 2020
  14. in src/net.cpp:2871 in cf5966a76f outdated
    2866@@ -2865,3 +2867,25 @@ uint64_t CConnman::CalculateKeyedNetGroup(const CAddress& ad) const
    2867 
    2868     return GetDeterministicRandomizer(RANDOMIZER_ID_NETGROUP).Write(vchNetGroup.data(), vchNetGroup.size()).Finalize();
    2869 }
    2870+
    2871+void LogMessage(CNode* node, std::string& command, std::vector<unsigned char>& data, bool is_incoming)
    


    practicalswift commented at 4:28 pm on July 14, 2020:
    0void LogMessage(const CNode& node, const std::string& command, const std::vector<unsigned char>& data, bool is_incoming)
    

    Rationale: F.15: Prefer simple and conventional ways of passing information (C++ Core Guidlines) :)


    troygiorshev commented at 2:36 am on July 20, 2020:
    thanks :)
  15. in src/net.cpp:2890 in cf5966a76f outdated
    2885+    }
    2886+    uint32_t size = data.size();
    2887+    f.write((char *) &size, sizeof(size));
    2888+    for (auto i : data){
    2889+        f << i;
    2890+    }
    


    practicalswift commented at 5:18 pm on July 14, 2020:
    0    f.write((const char*)data.data(), data.size());
    

    troygiorshev commented at 2:37 am on July 20, 2020:
    done
  16. in src/net.h:1007 in cf5966a76f outdated
    1003@@ -1004,4 +1004,6 @@ inline std::chrono::microseconds PoissonNextSend(std::chrono::microseconds now,
    1004     return std::chrono::microseconds{PoissonNextSend(now.count(), average_interval.count())};
    1005 }
    1006 
    1007+void LogMessage(CNode* node, std::string& command, std::vector<unsigned char>& data, bool is_incoming);
    


    practicalswift commented at 5:19 pm on July 14, 2020:
    0void LogMessage(const CNode& node, const std::string& command, const std::vector<unsigned char>& data, bool is_incoming);
    
  17. in src/net_processing.cpp:3639 in cf5966a76f outdated
    3635@@ -3636,6 +3636,10 @@ bool PeerLogicValidation::ProcessMessages(CNode* pfrom, std::atomic<bool>& inter
    3636     }
    3637     CNetMessage& msg(msgs.front());
    3638 
    3639+    std::vector<unsigned char> tmp(msg.m_recv.data(), msg.m_recv.data() + msg.m_recv.size());
    


    practicalswift commented at 5:19 pm on July 14, 2020:

    The scope of tmp can be limited to the gArgs.IsArgSet("-logmessages") case below.

    (Or perhaps skip introducing a new variable - the name tmp really doesn’t add any information to the reader.)


    troygiorshev commented at 2:37 am on July 20, 2020:
    good point, done
  18. in src/net.cpp:2879 in cf5966a76f outdated
    2874+
    2875+    fs::path base_path = GetDataDir() / "peer_logging" / node->addr.ToString();
    2876+    boost::filesystem::create_directories(base_path);
    2877+
    2878+    fs::path path = base_path / (is_incoming ? "msgs_recv.dat" : "msgs_sent.dat");
    2879+    std::ofstream f(path.c_str(), std::ios::binary | std::ios::out | std::ios::app);
    


    practicalswift commented at 5:33 pm on July 14, 2020:

    Could use CAutoFile instead (which provides proper serialization automatically)?

    No need for c_str() here :)


    troygiorshev commented at 2:37 am on July 20, 2020:
    Thanks for suggesting this, CAutoFile is much nicer
  19. in src/net.cpp:2870 in cf5966a76f outdated
    2866@@ -2865,3 +2867,25 @@ uint64_t CConnman::CalculateKeyedNetGroup(const CAddress& ad) const
    2867 
    2868     return GetDeterministicRandomizer(RANDOMIZER_ID_NETGROUP).Write(vchNetGroup.data(), vchNetGroup.size()).Finalize();
    2869 }
    2870+
    2871+void LogMessage(CNode* node, std::string& command, std::vector<unsigned char>& data, bool is_incoming)
    2872+{
    2873+    int64_t time = GetTimeMicros();
    


    practicalswift commented at 5:42 pm on July 14, 2020:

    Nit: Could be introduced when needed?

    Rationale: ES.21: Don’t introduce a variable (or constant) before you need to use it (C++ Core Guidelines) :)


    troygiorshev commented at 12:44 pm on July 22, 2020:

    I’d prefer to keep this as it is. I understand (and agree with) ES.21 completely, but I’m trying to treat time as an argument, not a local, which is why it’s right at the top. I would love to pass it in instead but practically can’t (see here).

    If anyone is strongly against this I’ll move it.

  20. practicalswift commented at 5:54 pm on July 14, 2020: contributor

    Concept ACK: Very nice feature! Looking forward to using it. Thanks for adding it.

    Feature request: Would it be possible to add a flag to the Python script that writes the recorded messages to one file each in the binary format the ProcessMessage(…) fuzzers expect (src/test/fuzz/process_message.cpp and src/test/fuzz/process_messages.cpp)?

    Ideally with a file naming convention matching that of what libFuzzer and many other fuzzers generate: the file name being the SHA-1 of the file content.

    That way this feature could be used to record live network messages that could be used directly as fuzzing inputs. A great way to generate a fuzzing seed corpus. That would be super neat! :)

  21. troygiorshev force-pushed on Jul 20, 2020
  22. troygiorshev commented at 2:39 am on July 20, 2020: contributor

    git range-diff master cf5966a 9898fa1

    Made suggested changes and fixed the Travis problem.

  23. in src/net.cpp:2816 in 9898fa1062 outdated
    2812@@ -2813,6 +2813,8 @@ void CConnman::PushMessage(CNode* pnode, CSerializedNetMsg&& msg)
    2813 
    2814         if (pnode->nSendSize > nSendBufferMaxSize)
    2815             pnode->fPauseSend = true;
    2816+        if (gArgs.IsArgSet("-logmessages"))
    


    MarcoFalke commented at 5:47 am on July 20, 2020:

    nit:

    0        if (gArgs.IsArgSet("-logmessages")) {
    

    troygiorshev commented at 12:47 pm on July 22, 2020:

    Ah I misread the style guide!

    Now that I’m adding braces to this, I’ll add them to the rest of PushMessage.

  24. in src/net.cpp:2817 in 9898fa1062 outdated
    2812@@ -2813,6 +2813,8 @@ void CConnman::PushMessage(CNode* pnode, CSerializedNetMsg&& msg)
    2813 
    2814         if (pnode->nSendSize > nSendBufferMaxSize)
    2815             pnode->fPauseSend = true;
    2816+        if (gArgs.IsArgSet("-logmessages"))
    2817+            LogMessage(*pnode, msg.m_type, msg.data, false);
    


    MarcoFalke commented at 5:47 am on July 20, 2020:

    nit:

    0            LogMessage(*pnode, msg.m_type, msg.data, /* incoming */ false);
    

    troygiorshev commented at 6:33 pm on July 22, 2020:
    done
  25. in src/net.cpp:2871 in 9898fa1062 outdated
    2866@@ -2865,3 +2867,23 @@ uint64_t CConnman::CalculateKeyedNetGroup(const CAddress& ad) const
    2867 
    2868     return GetDeterministicRandomizer(RANDOMIZER_ID_NETGROUP).Write(vchNetGroup.data(), vchNetGroup.size()).Finalize();
    2869 }
    2870+
    2871+void LogMessage(const CNode& node, const std::string& command, const std::vector<unsigned char>& data, bool is_incoming)
    


    MarcoFalke commented at 5:53 am on July 20, 2020:

    nit: Only valid message types are logged, not junk data

    0void LogMessage(const CNode& node, const std::string& msg_type, const std::vector<unsigned char>& data, bool is_incoming)
    

    Also, instead of copying the message data into a new vector, why not use a cheap std::span?


    troygiorshev commented at 6:33 pm on July 22, 2020:
    Good point! done
  26. in src/net.cpp:2870 in 9898fa1062 outdated
    2866@@ -2865,3 +2867,23 @@ uint64_t CConnman::CalculateKeyedNetGroup(const CAddress& ad) const
    2867 
    2868     return GetDeterministicRandomizer(RANDOMIZER_ID_NETGROUP).Write(vchNetGroup.data(), vchNetGroup.size()).Finalize();
    2869 }
    2870+
    2871+void LogMessage(const CNode& node, const std::string& command, const std::vector<unsigned char>& data, bool is_incoming)
    2872+{
    2873+    int64_t time = GetTimeMicros();
    


    MarcoFalke commented at 5:54 am on July 20, 2020:

    Q: Is it important that this time is equal to the receive time of the message?

    If yes, you could pass in the time from the caller. (msg.m_time)


    troygiorshev commented at 12:39 pm on July 22, 2020:
    The exact time doesn’t matter, this is mainly used for sorting the messages later. All that’s important is that messages that are sent later have a later time. To avoid races I only call LogMessage from the MessageHandler thread. When sending a message in PushMessage, we don’t have anything like msg.m_time until the message is into SocketHandler.
  27. MarcoFalke commented at 5:55 am on July 20, 2020: member
    Looks surprisingly simple, so Concept ACK
  28. troygiorshev commented at 12:48 pm on July 21, 2020: contributor

    @naumenkogs thanks for the concept ack!

    How to better integrate it with the existing codebase? Should the format be similar to what we have now: received: addr (30003 bytes) peer=0? Or maybe it doesn’t matter.

    IMO that format is good for a debug log, where things need to be compact on a single line. For this I much prefer JSON.

    Is there any threat of this feature being used for DoS? …

    At the moment, with this locked behind a debug flag, I’m not worried about DoS vectors. For both debugging and learning someone should be at the keyboard and the node shouldn’t be working with anything real. Additionally, I imagine that if writing a very small amount to disk in this way (every time a message is sent or received) is a DoS vector then there are many more serious ones available.

    That said, this is worth consideration, especially if we think this will one day be used for wider purposes.

  29. troygiorshev force-pushed on Jul 22, 2020
  30. troygiorshev force-pushed on Jul 22, 2020
  31. troygiorshev commented at 7:03 pm on July 22, 2020: contributor

    git range-diff master 9898fa1 cbb154e

    • Trivial Rebase
    • Added a commit at the beginning cleaning up PushMessage’s and ProcessMessages’s if statements
    • LogMessage now uses Span
    • command -> msg_type
    • python import is now relative to file
    • other small comments and fixes
  32. in src/net.cpp:2885 in cbb154e5d5 outdated
    2810@@ -2811,18 +2811,17 @@ void CConnman::PushMessage(CNode* pnode, CSerializedNetMsg&& msg)
    2811         pnode->mapSendBytesPerMsgCmd[msg.m_type] += nTotalSize;
    2812         pnode->nSendSize += nTotalSize;
    2813 
    2814-        if (pnode->nSendSize > nSendBufferMaxSize)
    2815-            pnode->fPauseSend = true;
    2816+        if (pnode->nSendSize > nSendBufferMaxSize) pnode->fPauseSend = true;
    


    practicalswift commented at 8:46 pm on July 22, 2020:

    Please avoid change unrelated lines (unless there are good reasons doing so). It makes it harder to reason about the actual changes if one includes unrelated whitespace changes in the same commit :)

    Applies also below and throughout this PR.


    troygiorshev commented at 2:05 am on July 23, 2020:

    I made these changes in response to this comment. As far as I can tell they’re all in the first commit 84a79d61cef5c9386597869adccda4aa5992cc96 (which only contains the unrelated changes), and they don’t show up at all in git range-diff master 9898fa1 cbb154e. i.e. on my machine I haven’t mixed any real changes with whitespace changes.

    Are you seeing something different? Is there a better way I could have done this?


    jnewbery commented at 10:14 am on July 23, 2020:
    This seems reasonable. Although the general advice is to not fix style on lines that you’re not otherwise touching, if it’s all contained within a single cleanup commit, then I think that’s also fine.

    narula commented at 11:38 pm on August 18, 2020:
    Might you add something to this affect to the commit title so people know that? I didn’t get that from reading “Clean PushMessage and ProcessMessages”. Maybe “Whitespace only changes in PushMessage and ProcessMessages”
  33. in src/net_processing.cpp:4031 in cbb154e5d5 outdated
    3647@@ -3648,14 +3648,12 @@ bool PeerLogicValidation::ProcessMessages(CNode* pfrom, std::atomic<bool>& inter
    3648     if (!pfrom->orphan_work_set.empty()) return true;
    3649 
    3650     // Don't bother if send buffer is too full to respond anyway
    3651-    if (pfrom->fPauseSend)
    3652-        return false;
    3653+    if (pfrom->fPauseSend) return false;
    


    practicalswift commented at 8:50 pm on July 22, 2020:
    Same here (and below): don’t change unrelated lines :)
  34. practicalswift changes_requested
  35. in src/net.cpp:2880 in cbb154e5d5 outdated
    2875+    fs::path path = base_path / (is_incoming ? "msgs_recv.dat" : "msgs_sent.dat");
    2876+    CAutoFile f(fsbridge::fopen(path, "ab"), SER_DISK, CLIENT_VERSION);
    2877+
    2878+    f.write((const char*)&time, sizeof(time));
    2879+    f.write(msg_type.data(), msg_type.length());
    2880+    for (int i = msg_type.length(); i < CMessageHeader::COMMAND_SIZE; ++i) {
    


    MarcoFalke commented at 6:23 pm on July 23, 2020:
    0    for (auto i = msg_type.length(); i < CMessageHeader::COMMAND_SIZE; ++i) {
    

    to fix the compile failure:

    0net.cpp:2880:39: error: comparison of integers of different signs: 'int' and 'const size_t' (aka 'const unsigned long') [-Werror,-Wsign-compare]
    1
    2    for (int i = msg_type.length(); i < CMessageHeader::COMMAND_SIZE; ++i) {
    3
    4                                    ~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    5
    61 error generated.
    

    troygiorshev commented at 6:45 pm on July 23, 2020:
    I’m not sure why this didn’t come up before, and still doesn’t come up on my machine :/ Regardless, fixed.

    MarcoFalke commented at 7:26 pm on July 23, 2020:
    I don’t know either: #19123
  36. troygiorshev force-pushed on Jul 23, 2020
  37. troygiorshev commented at 6:47 pm on July 23, 2020: contributor

    git range-diff master cbb154e 863c6b5

    • Added Test
    • Fixed Travis warning
    • Made Time Mockable
  38. troygiorshev marked this as ready for review on Jul 23, 2020
  39. theStack commented at 7:45 pm on July 26, 2020: contributor

    Huge Concept ACK! Very nice idea with simple implementation, I’m also pretty sure this will be very helpful for debugging purposes. Will definitely code-review that within the next days.

    Just one detail I find quite odd: why create two files instead of just one and not rather include the direction as part of the serialization for each message? Is it only about saving disk space (1 byte per message) or is there another reason I’m not aware of? I’ve never seen a logging/sniffing tool that divides up into multiple files before and tend to think it would only complicate re-assembling the whole send/receive order later.

  40. troygiorshev commented at 12:46 pm on July 27, 2020: contributor

    @theStack Thanks for the concept ACK!

    why create two files instead of just one

    Good question; I considered a few ways of doing this. Just to be clear, I create two files per peer. Partially, yes, this saves 1 byte per message. I could have also just logged everything into one file, which meant I would have had to save the peer address alongside each message as well. Ultimately, I chose the current way with the intention of simplifying the usage of the parser. As it stands, the parser is a simple “just give me what you want me to parse” sort of program. This allows the user to take advantage of globbing when selecting which files to parse (e.g. **/*.dat or **/msgs_recv.dat). Additionally the parser, being written in python, is actually noticeably slow. Forcing it to process all of the messages and then filter would be unnecessarily inconvenient.

  41. theStack commented at 12:53 pm on July 28, 2020: contributor

    why create two files instead of just one

    Good question; I considered a few ways of doing this. Just to be clear, I create two files per peer. Partially, yes, this saves 1 byte per message. I could have also just logged everything into one file, which meant I would have had to save the peer address alongside each message as well. Ultimately, I chose the current way with the intention of simplifying the usage of the parser. As it stands, the parser is a simple “just give me what you want me to parse” sort of program. This allows the user to take advantage of globbing when selecting which files to parse (e.g. **/*.dat or **/msgs_recv.dat). Additionally the parser, being written in python, is actually noticeably slow. Forcing it to process all of the messages and then filter would be unnecessarily inconvenient.

    Generally I’d say it makes sense to divide up into files/folders by criteria that I would likely use as frequent search key for accessing the logging data. Would I often want to access logs for a certain peer? Absolutely. Would I want to see only incoming or outcoming messages for a certain peer? Very unlikely, as the resulting data is probably not very valueable without the whole request/response stream in both directions. On the other hand, your arguments for simplification, efficiency (if only looking for incoming or outcoming messages is really such a frequent operation) and flexibility are also very strong, hence this solution is also okay with me.

    I reviewed the code in more detail and noticed that there may be a serialization endianness problem, see the comments soon following below.

  42. in src/net.cpp:2875 in 0c31d58f7d outdated
    2870+    fs::create_directories(base_path);
    2871+
    2872+    fs::path path = base_path / (is_incoming ? "msgs_recv.dat" : "msgs_sent.dat");
    2873+    CAutoFile f(fsbridge::fopen(path, "ab"), SER_DISK, CLIENT_VERSION);
    2874+
    2875+    f.write((const char*)&time, sizeof(time));
    


    theStack commented at 12:55 pm on July 28, 2020:
    This way of serialization doesn’t seem to be endian-agnostic, i.e. it depends on the architecture whether this is written out in big or little endian format. In the parsing script you assume both time and data size are stored in little endian, hence this needs to be enforced here. I think ser_writedata32 (see serialization.h) can be used for this. It’s a pity though that the CI tests don’t fail, I think this is because we don’t have any testing instance where the architecture uses big endian byte order :/

    troygiorshev commented at 1:17 pm on July 30, 2020:
    Thanks for catching this! When fixing it I considered if it was better to instead change the test to follow the system endianness (with python’s sys.byteorder). I think fixing a particular endianness for the .dat files, as temporary as they are, is better. Someone might generate them on a BE system and then parse them on a LE system one day…
  43. in src/net.cpp:2881 in 0c31d58f7d outdated
    2876+    f.write(msg_type.data(), msg_type.length());
    2877+    for (auto i = msg_type.length(); i < CMessageHeader::COMMAND_SIZE; ++i) {
    2878+        f << '\0';
    2879+    }
    2880+    uint32_t size = data.size();
    2881+    f.write((const char*)&size, sizeof(size));
    


    theStack commented at 12:55 pm on July 28, 2020:
    Same as above, this should be guaranteed to be written out in little endian byte order.

    troygiorshev commented at 1:15 pm on July 30, 2020:
    done
  44. MarcoFalke commented at 3:53 pm on July 28, 2020: member
    re-run ci
  45. MarcoFalke closed this on Jul 28, 2020

  46. MarcoFalke reopened this on Jul 28, 2020

  47. in test/functional/p2p_peer_logging.py:48 in 863c6b5aeb outdated
    43+            assert(time >= 1231006505000000)   # genesis block timestamp
    44+            command = tmp_header.read(COMMAND_SIZE).split(b'\x00', 1)[0]  # type: bytes
    45+            assert(len(command) > 0)
    46+            length = int.from_bytes(tmp_header.read(LENGTH_SIZE), "little") # type: int
    47+            data = f_in.read(length)
    48+            assert(len(data) == length)
    


    MarcoFalke commented at 4:45 pm on July 28, 2020:

    would be good to use assert_equal for debugability

    https://travis-ci.org/github/bitcoin/bitcoin/jobs/712616300#L3056


    troygiorshev commented at 1:17 pm on July 30, 2020:
    done
  48. luke-jr commented at 11:50 pm on July 29, 2020: member

    I don’t find your “why not Wireshark?” very convincing… If anything, it made me less sure this should get merged.

    Wireshark is a fairly simple and standard tool. Glancing at the code, this PR is basically just a less-user-friendly reinvention thereof.

    If there are concerns with maintenance of the Wireshark dissector, as I understand it, we could fork it and maintain it out of Wireshark’s codebase.

  49. troygiorshev force-pushed on Jul 30, 2020
  50. troygiorshev commented at 1:19 pm on July 30, 2020: contributor

    git range-diff master 863c6b5 f5fffb7

    • Enforced little-endian writing to match parser
    • Test uses assert_equal when possible
  51. DrahtBot added the label Needs rebase on Jul 30, 2020
  52. troygiorshev force-pushed on Jul 31, 2020
  53. troygiorshev commented at 0:46 am on July 31, 2020: contributor
    rebased
  54. DrahtBot removed the label Needs rebase on Jul 31, 2020
  55. in contrib/peer-logging/peer-logging-parser.py:1 in 6095431033 outdated
    0@@ -0,0 +1,92 @@
    1+#!/usr/bin/env python3
    


    jnewbery commented at 5:01 pm on July 31, 2020:
    chmod this to 755 so it’s executable.

    troygiorshev commented at 8:26 pm on August 4, 2020:
    done
  56. in contrib/peer-logging/peer-logging-parser.py:50 in 6095431033 outdated
    45+        while True:
    46+            tmp_header_raw = f_in.read(TIME_SIZE + LENGTH_SIZE + COMMAND_SIZE)
    47+            if not tmp_header_raw:
    48+                break
    49+            tmp_header = BytesIO(tmp_header_raw)
    50+            time = int.from_bytes(tmp_header.read(TIME_SIZE), "little") # type: int
    


    jnewbery commented at 5:02 pm on July 31, 2020:
    nit: pep8 says at least 2 spaces before an inline comment :grimacing:

    troygiorshev commented at 8:11 pm on August 4, 2020:

    oof thanks, TIL

    (I’m surprised nothing caught this)


    jonatack commented at 10:22 am on August 19, 2020:
    at HEAD 4fc2e3906, there are still a number of misformatted inline comments and a few other nits that you would see by running the new python files through pycodestyle and black (with black, I tend to ignore the single-to-double quote changes and some of the newlines for brackets, but otherwise it’s useful)
  57. in contrib/peer-logging/peer-logging-parser.py:60 in 6095431033 outdated
    55+            msg = MESSAGEMAP[command]()
    56+            msg.deserialize(f_in)
    57+            msg_dict = to_jsonable(msg)
    58+            msg_dict["time"] = time
    59+            msg_dict["length"] = length
    60+            msg_dict["recv"] = recv
    


    jnewbery commented at 5:14 pm on July 31, 2020:
    "recv": true and "recv": false don’t seem very friendly here. Perhaps "direction": "received" and "direction": "sent" is better?

    troygiorshev commented at 8:21 pm on August 4, 2020:
    I’ll meet you in the middle and do "direction": "recv" and "direction": "sent" :)
  58. in contrib/peer-logging/peer-logging-parser.py:57 in 6095431033 outdated
    52+            length = int.from_bytes(tmp_header.read(LENGTH_SIZE), "little") # type: int
    53+            if command not in MESSAGEMAP:
    54+                continue    # For now just skip unrecognized messages
    55+            msg = MESSAGEMAP[command]()
    56+            msg.deserialize(f_in)
    57+            msg_dict = to_jsonable(msg)
    


    jnewbery commented at 5:16 pm on July 31, 2020:
    You should deserialize the body of the message to a sub-object. That means if any message types contain fields called ‘msgtype’, ’time’, ’length’ or ‘recv’, they won’t conflict with the metadata.

    troygiorshev commented at 8:26 pm on August 4, 2020:
    done, and improved the names and order
  59. jnewbery commented at 5:21 pm on July 31, 2020: member

    I’ve been playing around with this and it’s great. Just a few small comments so far.

    The parser is printing hashes as large floats. Is it possible to print the hex representation:

     0  {
     1    "msgtype": "getheaders",
     2    "hashstop": 0,
     3    "locator": {
     4      "nVersion": 70015,
     5      "vHave": [
     6        6.450434035588393e+53,
     7        7.713140798064138e+53,
     8        2.6493031207713062e+53,
     9        1.0843487835635608e+54,
    10        6.8089734462318345e+53,
    11        1.4434627912104054e+54,
    12        3.0470281418447093e+53,
    13        4.1428701739036675e+53,
    14        ...
    
  60. troygiorshev force-pushed on Aug 4, 2020
  61. troygiorshev force-pushed on Aug 5, 2020
  62. troygiorshev commented at 2:21 pm on August 5, 2020: contributor

    git range-diff master 6095431 0547b23

    • Values coming from uint256 variables print as hex correctly
    • Restructured the output, renamed some fields
    • Some whitespace fixes
    • Small change to msg_generic (unused) in messages.py
  63. in test/functional/p2p_peer_logging.py:37 in 0547b23754 outdated
    32+    - Message Type
    33+        - We can add new message types
    34+
    35+    We're ignoring these because they're simply too brittle to test here.
    36+    """
    37+    with open(dat_file, 'rb') as f_in:
    


    MarcoFalke commented at 2:31 pm on August 5, 2020:
    Instead of reimplementing the parser here, couldn’t this just call contrib/peer-logging/peer-logging-parser.py {dat_file}?

    theStack commented at 9:29 am on August 6, 2020:
    +1 on code deduplication. The included timestamps check (all timestamps are past genesis block) can be done on the result rather than directly in the parser.

    jnewbery commented at 9:58 am on August 6, 2020:
    I think it’s much better not to assume that the contrib directory is available. Functional tests can be run when the build directory is out of tree and the source directory is not available. It makes sense to me to have a limited parser in the test case rather than importing from somewhere else.

    theStack commented at 10:30 am on August 6, 2020:
    I agree that the dependency on the contrib directory is a drawback, but then on the other hand we already have a test importing from contrib: feature_loadblock.py (uses contrib/linearize scripts). Is there really a need to run functional tests without the other sources available? They are all part of the same repository.

    jnewbery commented at 10:40 am on August 6, 2020:

    Is there really a need to run functional tests without the other sources available?

    Yes, it’s an explicit usage pattern to build out of tree and be able to run tests separately from the source directory.


    theStack commented at 11:51 am on August 6, 2020:
    Hm, so the usage pattern in this case is “copy the test/functional folder somewhere” and executing from there, decoupled from the repository, should still work? Still not convinced why anyone should ever want/do that (maybe I’m missing something there).

    jnewbery commented at 12:32 pm on August 6, 2020:

    Search for “out of tree builds” in this repo. You’ll see lots of issues/PRs for running tests out of tree.

    I agree that the dependency on the contrib directory is a drawback, but then on the other hand we already have a test importing from contrib: feature_loadblock.py (uses contrib/linearize scripts).

    Yes, and I think that’s a terrible pattern that we shouldn’t repeat. Importing code from contrib essentially makes those files part of the test framework.


    theStack commented at 11:43 am on August 7, 2020:
    To my understanding the primary point of out of tree builds is separation between the build artifacts (i.e. everything spit out by make, like binaries, manpages etc.) and the source folder. I guess our disagreement stems from the fact that in your view, contrib is part of the source folder, while in my view it isn’t – it only contains scripts and no compilation units, hence no ending up in an (out of tree) build folder. That could change in the future though, so I can see your point.

    MarcoFalke commented at 11:47 am on August 7, 2020:
    The functional test scripts itself live in the source directory. The build directory has a symlink to it. As long as the script in contrib is also symlinked to the build dir, this should not affect out-of-tree builds at all. And all ci scripts use out-of-tree builds, so if this lead to issues, it would be noticed immediately.

    troygiorshev commented at 8:41 pm on August 7, 2020:
    I’ve elaborated on why I did this below. If anyone is still against this choice, please let me know!
  64. in src/net_processing.cpp:3808 in fac22eae7c outdated
    3803@@ -3804,6 +3804,10 @@ bool PeerLogicValidation::ProcessMessages(CNode* pfrom, std::atomic<bool>& inter
    3804     }
    3805     CNetMessage& msg(msgs.front());
    3806 
    3807+    if (gArgs.IsArgSet("-logmessages")) {
    3808+        LogMessage(*pfrom, msg.m_command, Span<const unsigned char>{(unsigned char*)msg.m_recv.data(), (size_t)msg.m_recv.size()}, /* incoming */ true);
    


    theStack commented at 8:52 am on August 6, 2020:

    I think the cast to size_t is not needed here? msg.m_recv is an instance of CDataStream, its size() method returns CDataStream::size_type which is a typedef to a std::vector::size_type, which again is size_t.

    0        LogMessage(*pfrom, msg.m_command, Span<const unsigned char>{(unsigned char*)msg.m_recv.data(), msg.m_recv.size()}, /* incoming */ true);
    

    troygiorshev commented at 12:58 pm on August 6, 2020:
    I imagine I did this because I had to, but you’re right I see no reason for this cast (and clang++ agrees). Thanks!
  65. in contrib/peer-logging/peer-logging-parser.py:22 in 78959de4b4 outdated
    17+from test_framework.messages import ser_uint256     # noqa: E402
    18+from test_framework.mininode import MESSAGEMAP      # noqa: E402
    19+
    20+TIME_SIZE = 8
    21+LENGTH_SIZE = 4
    22+COMMAND_SIZE = 12
    


    theStack commented at 9:05 am on August 6, 2020:
    The “command” terminology should not be used anymore for network messages, we just call it “message type” now (see e.g. PRs #18533, #18610, #18937). Replacement suggestions: . s/COMMAND_SIZE/MSGTYPE_SIZE and s/command/msg_type (in the parsing code below).

    troygiorshev commented at 1:02 pm on August 6, 2020:

    Thanks for catching this! This “command -> message type” switch is a monumental effort that I’m happy to be a part of.

    I’m going to choose msgtype not msg_type for the same reasons you did in #18610 :)

  66. in contrib/peer-logging/peer-logging-parser.py:80 in 78959de4b4 outdated
    75+            tmp_header = BytesIO(tmp_header_raw)
    76+            time = int.from_bytes(tmp_header.read(TIME_SIZE), "little")      # type: int
    77+            command = tmp_header.read(COMMAND_SIZE).split(b'\x00', 1)[0]     # type: bytes
    78+            length = int.from_bytes(tmp_header.read(LENGTH_SIZE), "little")  # type: int
    79+            if command not in MESSAGEMAP:
    80+                continue    # For now just skip unrecognized messages
    


    theStack commented at 9:33 am on August 6, 2020:
    For skipping unrecognized messages, I think a dummy read on f_in (length bytes) is missing here to advance the file pointer to the next message.

    troygiorshev commented at 1:05 pm on August 6, 2020:
    It is, thanks!
  67. in contrib/peer-logging/peer-logging-parser.py:82 in 78959de4b4 outdated
    77+            command = tmp_header.read(COMMAND_SIZE).split(b'\x00', 1)[0]     # type: bytes
    78+            length = int.from_bytes(tmp_header.read(LENGTH_SIZE), "little")  # type: int
    79+            if command not in MESSAGEMAP:
    80+                continue    # For now just skip unrecognized messages
    81+            msg = MESSAGEMAP[command]()
    82+            msg.deserialize(f_in)
    


    theStack commented at 10:05 am on August 6, 2020:

    Could check here after the deserialization if the f_in was advanced by the expected length (what is currently done in the functional test), e.g. something like:

    0            msg_start_pos = f_in.tell()
    1            ...
    2            msg.deserialize(f_in)
    3            ...
    4            msg_length = f_in.tell() - msg_start_pos
    5            assert_equal(length, msg_length)
    

    troygiorshev commented at 1:23 pm on August 6, 2020:
    I like this a lot, thanks
  68. troygiorshev force-pushed on Aug 6, 2020
  69. troygiorshev commented at 1:58 pm on August 6, 2020: contributor

    git range-diff master 0547b23 4205c56

    • peer logging -> message logging
    • command -> msgtype
    • Removed unneeded size_t cast
    • Parser now properly handles unrecognized messages
    • Parser now verified payload length
  70. troygiorshev commented at 2:00 pm on August 6, 2020: contributor

    Regarding mini_parser in the functional test: discussion here. The popular question is “Why reimplement the parser here, as opposed to just calling contrib/message-logging/message-logging-parser.py?”

    The main reason, which I agree with, is brought up by @jnewbery.

    I think it’s much better not to assume that the contrib directory is available. Functional tests can be run when the build directory is out of tree and the source directory is not available. It makes sense to me to have a limited parser in the test case rather than importing from somewhere else.

    Additionally, I think that this separate mini_parser makes it clear, at a glance, what exactly is being tested. As noted in the module docstring of p2p_message_logging.py, the message logging parser isn’t tested in this test. This test is for LogMessage only. I would hate to run into a Hubble Space Telescope-like problem where LogMessage is wrong but the message logging parser is wrong in the same way, so the errors go by unnoticed.

  71. in src/net_processing.cpp:3808 in 6f0e88752d outdated
    3803@@ -3804,6 +3804,10 @@ bool PeerLogicValidation::ProcessMessages(CNode* pfrom, std::atomic<bool>& inter
    3804     }
    3805     CNetMessage& msg(msgs.front());
    3806 
    3807+    if (gArgs.IsArgSet("-logmessages")) {
    3808+        LogMessage(*pfrom, msg.m_command, Span<const unsigned char>{(unsigned char*)msg.m_recv.data(), msg.m_recv.size()}, /* incoming */ true);
    


    theStack commented at 4:15 pm on August 6, 2020:
    Sorry that I didn’t noticed on the earlier review today, but: if you rebase on master, this will unlock you the nice MakeUCharSpan (see commit e63dcc3a6752e7d406e7a650c2d6c2e95cd39aab, introduced with PR #19326 that was merged 3 days ago), not needing to cast anything anymore. :tada:

    troygiorshev commented at 8:40 pm on August 7, 2020:
    Thanks, I’ll do this! Another upcoming improvement is #19660, but I’ll try and leave that for a follow-up as not to increase the scope too far here.
  72. troygiorshev force-pushed on Aug 7, 2020
  73. troygiorshev commented at 9:38 pm on August 7, 2020: contributor

    git range-diff master 4205c56 555e48a

    • Rebased (trivial)
    • Now uses MakeUCharSpan as suggested here
    • Now accounts for the increase in file descriptor usage (by one)
      • This will be improved in #18911
  74. theStack commented at 9:00 pm on August 11, 2020: contributor

    I tested the logging parser on my system (running Python 3.5.2) and got the following error:

     0$ contrib/message-logging/message-logging-parser.py /tmp/bitcoin_func_test_9sw28m5f/node0/regtest/message_logging/127.0.0.1\:33370/*.dat
     1/tmp/bitcoin_func_test_9sw28m5f/node0/regtest/message_logging/127.0.0.1:33370/msgs_recv.dat
     2<class 'pathlib.PosixPath'>
     3Traceback (most recent call last):
     4  File "contrib/message-logging/message-logging-parser.py", line 129, in <module>
     5    main()
     6  File "contrib/message-logging/message-logging-parser.py", line 117, in main
     7    process_file(log, messages, "recv" in log.stem)
     8  File "contrib/message-logging/message-logging-parser.py", line 73, in process_file
     9    with open(path, 'rb') as f_in:
    10TypeError: invalid file: PosixPath('/tmp/bitcoin_func_test_9sw28m5f/node0/regtest/message_logging/127.0.0.1:33370/msgs_recv.dat')
    

    Turns out that open() is capable to take os.PathLike objects only from Python 3.6 onwards, see https://stackoverflow.com/a/42694113 Since we are still on Python 3.5 (though there is already a PR for changing to 3.6, see #19504), this should be fixed by converting the path to a string by opening str(path) instead.

  75. troygiorshev force-pushed on Aug 13, 2020
  76. troygiorshev commented at 11:48 pm on August 13, 2020: contributor

    git range-diff master 555e48a 4fc2e39 @theStack thanks for catching that! I had been running this all under the wrong environment the whole time. (I’ll plug conda here, never worry about your python environment again: just make sure to pick the right one)

    In fixing this I ran into a current python bug! https://discuss.python.org/t/pathlib-absolute-vs-resolve/2573

    TL;DR: if you want an absolute path to a file that may not exist, in Python 3.5, always use Path.cwd() / Path(arg). This is due to the “strictness” of resolve() discussed here.

  77. jb55 commented at 8:10 pm on August 17, 2020: contributor

    @luke-jr says:

    I don’t find your “why not Wireshark?” very convincing… If anything, it made me less sure this should get merged. Wireshark is a fairly simple and standard tool. Glancing at the code, this PR is basically just a less-user-friendly reinvention thereof. If there are concerns with maintenance of the Wireshark dissector, as I understand it, we could fork it and maintain it out of Wireshark’s codebase.

    I’m inclined to agree. Also with a more general system like tracepoints we could do low overhead p2p(& other subsystem) tracing/filtering/scripting without any custom code

  78. in test/functional/p2p_message_logging.py:60 in 4fc2e39063 outdated
    55+        self.extra_args = [["-logmessages"]]
    56+        self.setup_clean_chain = True
    57+
    58+    def run_test(self):
    59+        logdir = os.path.join(self.nodes[0].datadir, "regtest/message_logging")
    60+        # Connect an disconnect a node so that the handshake occurs
    


    adamjonas commented at 7:41 pm on August 18, 2020:
    0        # Connect and disconnect a node so that the handshake occurs
    

    troygiorshev commented at 2:33 pm on August 20, 2020:
    Thanks. Fixed
  79. in contrib/message-logging/message-logging-parser.py:32 in 0d10af73df outdated
    27+# As such, they are itemized here
    28+# (These can be easily found by looking for calls to deser_uint256, deser_uint256_vector, and uint256_from_str in messages.py)
    29+HASH_INTS = [
    30+    "blockhash",
    31+    "block_hash",
    32+    "hash",     # A few conflicts here
    


    amitiuttarwar commented at 11:18 pm on August 18, 2020:
    conflicts with what?

    troygiorshev commented at 12:59 pm on August 20, 2020:

    This part of the parser recognizes a uint256 by the name of the member. It doesn’t take into account the name of the class. In some classes (in messages.py), self.hash refers to one of these “int-encoded” uint256s, and in others it does not.

    e.g. COutPoint, self.hash is an “int-encoded” uint256 CTransaction, self.hash is a str

    This is here to justify the isinstance(val, int) checks in to_jsonable but I think it’s causing more confusion than it’s helping. I’ll comment this somewhere else.


    amitiuttarwar commented at 7:11 pm on August 20, 2020:
    gotcha, sounds good
  80. in test/functional/p2p_message_logging.py:62 in 4fc2e39063 outdated
    57+
    58+    def run_test(self):
    59+        logdir = os.path.join(self.nodes[0].datadir, "regtest/message_logging")
    60+        # Connect an disconnect a node so that the handshake occurs
    61+        self.nodes[0].add_p2p_connection(P2PDataStore())
    62+        self.nodes[0].disconnect_p2ps()
    


    amitiuttarwar commented at 11:31 pm on August 18, 2020:

    I’m not following why this disconnect_p2ps() is needed here. The comment above says “so that the handshake occurs”, but this doesn’t clarify anything for me. do you mean the version handshake? why do we need to disconnect for the version handshake to occur?

    my understanding of this test is that it parses through the sent & received data files to check the structure of each recorded message is valid. I added some logging to see what message types are recorded & from what I can tell its the same whether or not we disconnect. am I missing something?


    glozow commented at 3:55 pm on August 19, 2020:

    I have the same question - by default, add_p2p_connection waits for verack.

    My initial guess was that disconnecting the peer closed the files or unlocked a mutex, but I don’t think this is the case 🤔


    troygiorshev commented at 1:11 pm on August 20, 2020:

    You’re both right, it isn’t needed.

    I thought it was just good form to disconnect the peers at the end of the test. Makes it easier for someone to expand the test later.

    I’ll fix the comment

  81. amitiuttarwar commented at 11:40 pm on August 18, 2020: contributor

    approach ACK. this is super cool! thank you for building it!

    I’ve read over the commits, but don’t feel familiar enough with many of the components & relevant concerns (yet) to leave a proper review ACK. left a couple questions though.

    I tried this out locally & it works great. I think it would be helpful to add some light documentation within the repo with the basics of how-to-use that you’ve mentioned on this PR. By basics I mean things like run node with -logmessages, the file structure of the message_logging dir & passing through contrib/message-logging/message-logging-parser.py to interpret.

    The one trick I found very useful is using message_logging/[ip address]/*.dat as the arg for the message parser, since it interpolates the sent & received messages from the peer. I gleaned this from your comment earlier because I agree with @theStack about a common desired use case as accessing all logs for a certain peer. Your reasoning behind having sent vs received as separate files makes perfect sense, and I’m happy that the message parser supports the use case nicely. My request here is something simple to make this interface more apparent.

    thanks again for this PR! I’m excited to query my node more and level up my jq skills.

  82. in src/net.cpp:2951 in ac953a3773 outdated
    2886+    auto time = GetTime<std::chrono::microseconds>();
    2887+
    2888+    fs::path base_path = GetDataDir() / "message_logging" / node.addr.ToString();
    2889+    fs::create_directories(base_path);
    2890+
    2891+    fs::path path = base_path / (is_incoming ? "msgs_recv.dat" : "msgs_sent.dat");
    


    narula commented at 11:51 pm on August 18, 2020:
    I don’t think “.dat” is the right extension. These are logs, not data, and should end in “.log”.

    practicalswift commented at 10:41 am on August 19, 2020:
    I think .dat is fine since this is a non-text binary file. Personally I expect files with the .log suffix to be enjoyable in less (more or less) :)

    narula commented at 1:06 pm on August 19, 2020:
    Yes, please disregard my comment, I misunderstood!
  83. narula commented at 0:00 am on August 19, 2020: contributor

    It pains me a bit that this circumvents the existing logging infrastructure. Why is it not the case this is all just logged to debug.log (like everything else) and the parser works on that, filtering peer messages?

    As @jnewbery points out below, I totally missed that this was binary logging! I should have looked more carefully.

  84. jnewbery commented at 8:20 am on August 19, 2020: member

    It pains me a bit that this circumvents the existing logging infrastructure. Why is it not the case this is all just logged to debug.log (like everything else) and the parser works on that, filtering peer messages?

    This PR dumps binary serialized messages (not text) to file, which is then post-processed by other software. The msgs_*.dat files can be enormous - hundreds of megabytes if blocks are being sent over that connection. Converting those messages to printable text and dropping them into the debug.log file would make them larger still, overwhelm the standard debug logging (eg a block message converted to hex would be millions of characters), and probably be a performance bottleneck (since all messages logged to debug.log would need to be deserialized and passed through tinyformat.h).

    I think this also answers your other questions:

    I don’t think “.dat” is the right extension. These are logs, not data, and should end in “.log”.

    .log to me suggests printable text log files. These are binary files that need post-processing before they’re readable.

  85. practicalswift commented at 10:11 am on August 19, 2020: contributor

    Regarding the “why not use Wireshark?” discussion:

    I don’t know about which use cases others are thinking about, but having this in master would help me a lot at least:

    I will use this feature in ways not suitable for Wireshark: both for long-term permanent capture of messages for after-the-fact trouble-shooting/debugging, and also for capturing messages that can be fed as seeds into the net message fuzzers.

    I’ve played with this PR a bit and it covers my needs: I certainly hope it will land in master :)

  86. in src/net.cpp:2835 in 4fc2e39063 outdated
    2829@@ -2830,18 +2830,17 @@ void CConnman::PushMessage(CNode* pnode, CSerializedNetMsg&& msg)
    2830         pnode->mapSendBytesPerMsgCmd[msg.m_type] += nTotalSize;
    2831         pnode->nSendSize += nTotalSize;
    2832 
    2833-        if (pnode->nSendSize > nSendBufferMaxSize)
    2834-            pnode->fPauseSend = true;
    2835+        if (pnode->nSendSize > nSendBufferMaxSize) pnode->fPauseSend = true;
    2836+        if (gArgs.IsArgSet("-logmessages")) {
    2837+            LogMessage(*pnode, msg.m_type, MakeSpan(msg.data), /* incoming */ false);
    


    practicalswift commented at 10:37 am on August 19, 2020:
    Could this be moved to the start of the function right after the LogPrint(BCLog::NET, …) call to a.) avoid being run while we’re holding pnode->cs_vSend (which we don’t need for logging), and b.) make sure all sending %s (%d bytes) peer=%d``BCLog::NET debug log entries are guaranteed to have a corresponding entry in the message capture file (in the absence of fatal logging errors)?

    troygiorshev commented at 1:41 pm on August 20, 2020:
    Yup, it can, and I agree with you that it should be. The locking in PushMessage is a bit of a mess, which I fix up a little in #19673.
  87. in src/net_processing.cpp:3842 in 4fc2e39063 outdated
    3838@@ -3841,6 +3839,10 @@ bool PeerLogicValidation::ProcessMessages(CNode* pfrom, std::atomic<bool>& inter
    3839     }
    3840     CNetMessage& msg(msgs.front());
    3841 
    3842+    if (gArgs.IsArgSet("-logmessages")) {
    


    jonatack commented at 10:38 am on August 19, 2020:

    here and in net.cpp::2834, this may be somewhat more robust:

    0
    1-    if (gArgs.IsArgSet("-logmessages")) {
    2+    if (gArgs.GetBoolArg("-logmessages", false)) {
    

    see https://github.com/bitcoin/bitcoin/pull/19089/files#diff-321303fddcf725df060981d626a05df9R619 and tests at https://github.com/bitcoin/bitcoin/pull/19089/files#diff-ff77e31823822bc143e3113aaf85d4b9R64 that iirc would fail with IsArgSet()


    troygiorshev commented at 1:34 pm on August 20, 2020:
    thanks, fixed
  88. in src/net.cpp:2887 in 4fc2e39063 outdated
    2882@@ -2884,3 +2883,23 @@ uint64_t CConnman::CalculateKeyedNetGroup(const CAddress& ad) const
    2883 
    2884     return GetDeterministicRandomizer(RANDOMIZER_ID_NETGROUP).Write(vchNetGroup.data(), vchNetGroup.size()).Finalize();
    2885 }
    2886+
    2887+void LogMessage(const CNode& node, const std::string& msg_type, const Span<const unsigned char>& data, bool is_incoming)
    


    practicalswift commented at 10:40 am on August 19, 2020:
    To make it easier to reason about this function (and its locking requirements) from looking at the function signature, what about replacing const CNode& node with const CAddress& address since that is what is actually used (node.addr)?

    troygiorshev commented at 1:46 pm on August 20, 2020:
    I like this, thanks. One day LogMessage may use other attributes of a CNode, but it can be easily changed back when the time comes. Done.
  89. in src/init.cpp:535 in 4fc2e39063 outdated
    519@@ -520,6 +520,7 @@ void SetupServerArgs(NodeContext& node)
    520         ArgsManager::ALLOW_ANY, OptionsCategory::DEBUG_TEST);
    521     argsman.AddArg("-debugexclude=<category>", strprintf("Exclude debugging information for a category. Can be used in conjunction with -debug=1 to output debug logs for all categories except one or more specified categories."), ArgsManager::ALLOW_ANY, OptionsCategory::DEBUG_TEST);
    522     argsman.AddArg("-logips", strprintf("Include IP addresses in debug output (default: %u)", DEFAULT_LOGIPS), ArgsManager::ALLOW_ANY, OptionsCategory::DEBUG_TEST);
    523+    argsman.AddArg("-logmessages", "Log all p2p messages to file", ArgsManager::ALLOW_BOOL | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    


    jonatack commented at 11:42 am on August 19, 2020:

    I think the help should specify the message_logging directory location and also mention the location of and need to use the contrib/message-logging/message-logging-parser.py script.

    Perhaps also add a markdown documentation file somewhere relevant that makes sense.

    All of this, along with usage info, might also usefully be in the header docstring of the parser script and make sure it shows up nicely when git grepping.


    jonatack commented at 6:03 pm on August 19, 2020:
    The docs could also helpfully mention that passing both send and recv files to the parser will interleave them in order.

    fjahr commented at 12:40 pm on September 1, 2020:
    nit: Add info on the default setting.
  90. in contrib/message-logging/message-logging-parser.py:173 in 4fc2e39063 outdated
    119+    jsonrep = json.dumps(messages)
    120+    if output:
    121+        with open(str(output), 'w+', encoding="utf8") as f_out:
    122+            f_out.write(jsonrep)
    123+    else:
    124+        print(jsonrep)
    


    jonatack commented at 11:44 am on August 19, 2020:

    There may be a better way to do it, but I found the stdout to be much more readable with:

    0 import os
    1+import pprint
    2 import sys
    3...
    4-        print(jsonrep)
    5+        pprint.pprint(jsonrep)
    

    There seems to be a formatting issue due to unescaped double quotes, however.


    FelixWeis commented at 5:52 pm on August 19, 2020:
    or just piping trough jq, gives you colors too

    troygiorshev commented at 3:16 pm on October 7, 2020:
    I think I’ll leave formatting to the user! I’ll take the jq suggestion and add that as a note in the module docstring.
  91. michaelfolkson commented at 2:41 pm on August 19, 2020: contributor

    I think this is a clear Concept ACK for new contributors.

    I was unsure on the Concept ACK for longer term contributors. The fact that this is useful for some longer term contributors (e.g. @theStack, @practicalswift) convinces me it is just about a Concept ACK given the low cost (future maintenance, additional complexity).

    If there are concerns with maintenance of the Wireshark dissector, as I understand it, we could fork it and maintain it out of Wireshark’s codebase.

    This sounds like it would be more work and in the absence of someone saying they are keen to do this I think it is an overall Concept ACK for this PR from me.

    (Excellent initial motivation section from @troygiorshev and fair challenge from @luke-jr)

  92. sipa commented at 5:18 pm on August 19, 2020: member
    I wonder why the name “message dumping” was changed to “message logging”. I find dumping a much more accurate term, as this isn’t just producing a log (which sounds more like metadata recording), but an exact dump of all messages.
  93. in src/net.cpp:2891 in 4fc2e39063 outdated
    2882@@ -2884,3 +2883,23 @@ uint64_t CConnman::CalculateKeyedNetGroup(const CAddress& ad) const
    2883 
    2884     return GetDeterministicRandomizer(RANDOMIZER_ID_NETGROUP).Write(vchNetGroup.data(), vchNetGroup.size()).Finalize();
    2885 }
    2886+
    2887+void LogMessage(const CNode& node, const std::string& msg_type, const Span<const unsigned char>& data, bool is_incoming)
    2888+{
    2889+    auto time = GetTime<std::chrono::microseconds>();
    2890+
    2891+    fs::path base_path = GetDataDir() / "message_logging" / node.addr.ToString();
    


    FelixWeis commented at 5:53 pm on August 19, 2020:
    might be an issue on windows. : is not a valid file name char.

    troygiorshev commented at 4:39 pm on August 21, 2020:
    Thanks, fixed to use underscores instead
  94. in src/net.h:90 in 4fc2e39063 outdated
    86@@ -86,6 +87,8 @@ static const uint64_t MAX_UPLOAD_TIMEFRAME = 60 * 60 * 24;
    87 static const bool DEFAULT_BLOCKSONLY = false;
    88 /** -peertimeout default */
    89 static const int64_t DEFAULT_PEER_CONNECT_TIMEOUT = 60;
    90+/** Number of file descriptors required for message logging **/
    91+static const int NUM_FDS_MESSAGE_LOGGING = 1;
    


    ariard commented at 11:47 pm on August 19, 2020:
    Each open file should have its fd so nMaxConnections * 2 ?

    troygiorshev commented at 2:27 pm on August 20, 2020:
    We open and close the file every time, so I think it is correct the way it is. I intentionally didn’t want to double or triple our fd usage.

    ariard commented at 8:19 pm on August 20, 2020:
    Right, performance not being an issue, we shouldn’t bother opening/closing every time.
  95. in src/net_processing.cpp:3843 in 4fc2e39063 outdated
    3838@@ -3841,6 +3839,10 @@ bool PeerLogicValidation::ProcessMessages(CNode* pfrom, std::atomic<bool>& inter
    3839     }
    3840     CNetMessage& msg(msgs.front());
    3841 
    3842+    if (gArgs.IsArgSet("-logmessages")) {
    3843+        LogMessage(*pfrom, msg.m_command, MakeUCharSpan(msg.m_recv), /* incoming */ true);
    


    ariard commented at 11:51 pm on August 19, 2020:
    Have you tried to fit this call in CConnman::SocketHandler like when we iter to increment nSizeAdded ? I think it would be cleaner to avoid leaking more network-level processing in a p2p protocol processing function.

    troygiorshev commented at 4:16 am on October 7, 2020:
    For simplicity, and to avoid any race conditions, I’ve made sure to keep all calls to LogMessage in MessageHandler.
  96. ariard commented at 0:05 am on August 20, 2020: member

    Approach ACK.

    Given the minimal new code it sounds like a win to relieve developers to install yet another debugging tool. Debugging tool which may not be accurate given the steady rate of new p2p messages (block filters, erlay, wtxid, …) or running cross-platform. If now one used it would be easy to remove it.

  97. in contrib/message-logging/message-logging-parser.py:110 in 0d10af73df outdated
    77+            time = int.from_bytes(tmp_header.read(TIME_SIZE), "little")      # type: int
    78+            msgtype = tmp_header.read(MSGTYPE_SIZE).split(b'\x00', 1)[0]     # type: bytes
    79+            length = int.from_bytes(tmp_header.read(LENGTH_SIZE), "little")  # type: int
    80+            if msgtype not in MESSAGEMAP:
    81+                # For now just skip unrecognized messages
    82+                f_in.read(length)
    


    theStack commented at 11:19 am on August 20, 2020:

    For only advancing the file pointer, the seek method would probably be more appropriate (see e.g. https://code-maven.com/python-seek):

    0                f_in.seek(length, os.SEEK_CUR)
    
  98. troygiorshev force-pushed on Aug 21, 2020
  99. troygiorshev commented at 4:48 pm on August 21, 2020: contributor

    git range-diff master 4fc2e39 ff3a39c

    Thanks for the review everyone!

    Improvement

    • Added a progress bar to the parser

    Fixes

    • Now uses GetBoolArg as opposed to IsArgSet
    • IP addresses now have colons replaced with underscores
    • One LogMessage call site is moved, as per this comment
    • LogMessage now takes a CAddress& as opposed to a CNode&
    • Clarified some comments

    Next up, a rename and some light documentation!

  100. in src/net_processing.cpp:3878 in 801154b360 outdated
    3838@@ -3839,6 +3839,10 @@ bool PeerLogicValidation::ProcessMessages(CNode* pfrom, std::atomic<bool>& inter
    3839     }
    3840     CNetMessage& msg(msgs.front());
    3841 
    3842+    if (gArgs.GetBoolArg("-logmessages", false)) {
    


    fjahr commented at 12:08 pm on September 1, 2020:
    nit: Put the default in a shared const
  101. in src/net.cpp:2938 in b3a0e528d1 outdated
    2839+        if (nMessageSize) pnode->vSendMsg.push_back(std::move(msg.data));
    2840 
    2841         // If write queue empty, attempt "optimistic write"
    2842-        if (optimisticSend == true)
    2843-            nBytesSent = SocketSendData(pnode);
    2844+        if (optimisticSend == true) nBytesSent = SocketSendData(pnode);
    


    fjahr commented at 12:20 pm on September 1, 2020:

    nit

    0        if (optimisticSend) nBytesSent = SocketSendData(pnode);
    

    or:

    0        if (node->vSendMsg.empty()) nBytesSent = SocketSendData(pnode);
    

    and remove the bool completely. The comment already clarifies what is happening.


    troygiorshev commented at 4:59 am on October 7, 2020:
    I’m more than happy to do this!
  102. fjahr commented at 12:44 pm on September 1, 2020: contributor

    Concept ACK

    Will do testing and deeper review when CI is green. Mostly ignored the python code for now. Travis failed with running out of disc space but it would surprise me if that was related, so I restarted it. I don’t think I can restart Cirrus CI.

  103. laanwj commented at 12:36 pm on September 6, 2020: member
    Concept ACK, I think the dumping changes to the C++ here could entirely be replaced with a use of #19866, no strong opinion on whether we should do both or not, but I think the python analysis tooling here is useful in any case.
  104. DrahtBot added the label Needs rebase on Sep 21, 2020
  105. troygiorshev force-pushed on Sep 23, 2020
  106. troygiorshev commented at 2:11 am on September 23, 2020: contributor

    git range-diff master ff3a39c ba68f02

    trivial rebase

  107. DrahtBot removed the label Needs rebase on Sep 23, 2020
  108. jnewbery commented at 9:51 am on September 23, 2020: member
    @troygiorshev what’s the status here? You’ve rebased, but there are unaddressed review comments from a few contributors. CI is also failing.
  109. in contrib/message-logging/message-logging-parser.py:19 in ba68f02310 outdated
    14+from typing import Any, List, Optional
    15+
    16+sys.path.append(os.path.join(os.path.dirname(__file__), '../../test/functional'))
    17+
    18+from test_framework.messages import ser_uint256     # noqa: E402
    19+from test_framework.mininode import MESSAGEMAP      # noqa: E402
    


    michaelfolkson commented at 3:22 pm on September 27, 2020:
    All test_framework.mininode need to be replaced by test_framework.p2p as of https://github.com/bitcoin/bitcoin/pull/19760

    troygiorshev commented at 5:03 am on October 7, 2020:
    Thanks!
  110. DrahtBot added the label Needs rebase on Sep 29, 2020
  111. troygiorshev force-pushed on Oct 7, 2020
  112. troygiorshev commented at 4:09 am on October 7, 2020: contributor

    git range-diff master ba68f02 5d5aeca

    Rebased

  113. DrahtBot removed the label Needs rebase on Oct 7, 2020
  114. troygiorshev force-pushed on Oct 7, 2020
  115. troygiorshev renamed this:
    Per-Peer Message Logging
    Per-Peer Message Capture
    on Oct 7, 2020
  116. troygiorshev force-pushed on Oct 7, 2020
  117. troygiorshev commented at 4:55 am on October 7, 2020: contributor

    git range-diff master 5d5aeca 18bcf97

    • Renamed “Logging” to “Capture” throughout!
  118. troygiorshev force-pushed on Oct 7, 2020
  119. troygiorshev commented at 5:08 am on October 7, 2020: contributor

    git range-diff master 18bcf97 d2d3d12

    • Now uses seek instead of read
    • Fixed mininode to p2p problem
    • Removed unneeded optimistic_send boolean from PushMessage

    Still TODO: Documentation!

  120. practicalswift commented at 9:30 am on October 7, 2020: contributor

    Tested ACK d2d3d12d94477d0d377c82738832dfca2ac9e531

    Looking forward to having this in master. This functionality is super useful for 1.) learning purposes, 2.) troubleshooting purposes, and 3.) fuzzing purposes (easy generation of seed corpora from actual message data).

    I encourage others to review. To try this out:

    0$ gh checkout pr 19509 && make distclean && ./autogen.sh && ./configure --with-incompatible-bdb && make
    1$ src/bitcoind -capturemessages
    2# now look in ~/.bitcoin/message_capture/
    

    The availability of easy-to-use built-in tools such as this and bitcoin-cli -netinfo which allow for easy debugging/inspection of the P2P layer of Bitcoin Core will make the road to P2P understanding so much easier/quicker for future generations of Bitcoin Core contributions. Neat feature with great long-term benefits @troygiorshev! Thanks! :)

  121. jnewbery commented at 9:57 am on October 9, 2020: member
    I’m planning to rereview this once the CI failures are fixed and the documentation is done.
  122. troygiorshev force-pushed on Oct 14, 2020
  123. troygiorshev force-pushed on Oct 14, 2020
  124. troygiorshev commented at 6:17 am on October 14, 2020: contributor

    git range-diff master d2d3d12 d0ffb34

    • Fixed remaining mininode to p2p issue
    • Added documentation!
  125. troygiorshev commented at 6:17 am on October 14, 2020: contributor

    git range-diff master d0ffb34 9a9314a

    • Rebased to master to fix Travis

    All pending TODOs for this PR are complete. Thanks to everyone who has reviewed so far, should be an easy reACK from many of you!

  126. promag commented at 7:07 am on October 14, 2020: member
    Concept ACK.
  127. in src/net.cpp:2943 in c2f62aefab outdated
    2945+        if (nMessageSize) pnode->vSendMsg.push_back(std::move(msg.data));
    2946 
    2947         // If write queue empty, attempt "optimistic write"
    2948-        if (optimisticSend == true)
    2949-            nBytesSent = SocketSendData(pnode);
    2950+        if (pnode->vSendMsg.empty()) nBytesSent = SocketSendData(pnode);
    


    MarcoFalke commented at 7:51 am on October 14, 2020:
    Any reason for this behavior change?

    MarcoFalke commented at 7:52 am on October 14, 2020:
    Probably causing the ci failures

    promag commented at 1:34 pm on October 14, 2020:

    c2f62aefabc2b00ce0ed28901d80d636b600feef

    Exactly, msg.data is pushed to vSendMsg right above.


    troygiorshev commented at 10:24 pm on October 14, 2020:
    I’ll give this a look

    troygiorshev commented at 2:10 am on October 16, 2020:
    Yep this was silly of me, thanks for finding the bug!
  128. practicalswift commented at 8:21 am on October 14, 2020: contributor

    Removed my ACK (temporarily) in light of @MarcoFalke’s comment about the unrelated change in src/net.cpp.

    Hopefully that will be resolved soon and I’d be glad to re-review :)

  129. in src/net.cpp:2952 in 2dd6370738 outdated
    2996+
    2997+    fs::path base_path = GetDataDir() / "message_capture" / clean_addr;
    2998+    fs::create_directories(base_path);
    2999+
    3000+    fs::path path = base_path / (is_incoming ? "msgs_recv.dat" : "msgs_sent.dat");
    3001+    CAutoFile f(fsbridge::fopen(path, "ab"), SER_DISK, CLIENT_VERSION);
    


    promag commented at 1:40 pm on October 14, 2020:

    2dd63707380ae5cdf79a2a8d98e0133f881f9381

    I know this is a debugging feature, but file open/close for each message is kind of bad right, also because close flushes?


    troygiorshev commented at 2:44 am on October 16, 2020:
    I’m not sure that I see the disadvantage. The alternative required using a greater number of file descriptors, which I thought was worse.

    practicalswift commented at 1:14 pm on October 16, 2020:

    If measurements show that this ever becomes a problem in practice it can be tackled in a follow-up.

    TBH I’m much more worried about the possible file descriptor DoS vector that we would risk open up if the file descriptor were kept open.

    Security first! :)

  130. promag commented at 1:44 pm on October 14, 2020: member
    We could (also) expose these via ZMQ, just saying.
  131. troygiorshev force-pushed on Oct 16, 2020
  132. troygiorshev commented at 2:41 am on October 16, 2020: contributor

    git range-diff master 9a9314a 2418ec6

    • Reverted the change to optimisticSend

    Travis should™ pass now. I can’t get the fuzz tests to build, but I get the same linker error on master as I do here, so I assume it’s unrelated.

  133. troygiorshev commented at 9:46 am on October 16, 2020: contributor
    Cirrus failing to fetch sqlite-autoconf-3320100.tar.gz, 404. Unrelated.
  134. practicalswift commented at 1:48 pm on October 16, 2020: contributor
    ACK 2418ec658ccd2e8e033bced0f5b7c183946940ac
  135. theStack commented at 7:59 pm on November 15, 2020: contributor

    The changes themselves LGTM. However, while testing the feature on mainnet, the parsing tool contrib/message-capture/message-capture-parser.py encountered a problem decoding the following version message:

    0$ xxd -g1 msgs_recv.dat
    100000000: 75 da 89 ca 28 b4 05 00 76 65 72 73 69 6f 6e 00  u...(...version.
    200000010: 00 00 00 00 6b 00 00 00 60 ea 00 00 00 00 00 00  ....k...`.......
    300000020: 00 00 00 00 95 68 b1 5f 00 00 00 00 01 00 00 00  .....h._........
    400000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff  ................
    500000040: d9 4f b2 14 20 8d 01 00 00 00 00 00 00 00 00 00  .O.. ...........
    600000050: 00 00 00 00 00 00 00 00 ff ff 00 00 00 00 00 00  ................
    700000060: 25 08 55 ca 19 a0 39 05 15 2f 62 69 74 63 6f 69  %.U...9../bitcoi
    800000070: 6e 2d 73 65 65 64 65 72 3a 30 2e 30 31 2f 30 57  n-seeder:0.01/0W
    900000080: 05 00 00                                         ...
    

    This output can be converted back to the binary file via xxd -r hexdump.txt. If anyone has a better idea how to share binary data for reviewing here (probably base64 or alike?), I’d be glad to hear it.

     0$ contrib/message-capture/message-capture-parser.py ~/.bitcoin/message_capture/78.244.132.5_52410/*
     1Processing /home/honeybadger/.bitcoin/message_capture/78.244.132.5_52410/msgs_recv.dat
     2Traceback (most recent call last):
     3  File "contrib/message-capture/message-capture-parser.py", line 178, in <module>
     4    main()
     5  File "contrib/message-capture/message-capture-parser.py", line 160, in main
     6    process_file(str(capture), messages, "recv" in capture.stem, progress_bar)
     7  File "contrib/message-capture/message-capture-parser.py", line 117, in process_file
     8    assert_equal(length, payload_length)
     9  File "contrib/message-capture/../../test/functional/test_framework/util.py", line 50, in assert_equal                                                                                                   
    10    raise AssertionError("not(%s)" % " == ".join(str(arg) for arg in (thing1, thing2) + args))
    11AssertionError: not(107 == 106)
    

    It seems like the deserialization routine for msg_version is consuming one byte less than expected. Guess that has to do with the optional fRelay field at the end. See also BIP37 and BIP60 – the latter is a draft, but it seems to summarize perfectly the problem we have here.

  136. in contrib/message-capture/message-capture-docs.md:18 in 2418ec658c outdated
    13+  * Inside each peer's folder there are two `.dat` files: one is for received messages (`msgs_recv.dat`) and the other is for sent messages (`msgs_sent.dat`).
    14+* Run `contrib/message-capture/message-capture-parser.py` with the proper arguments.
    15+  * See the `-h` option for help.
    16+  * To see all messages, both sent and received, for all peers use:
    17+    ```
    18+    python contrib/message-capture/message-capture-parser.py -o out.json \
    


    theStack commented at 8:10 pm on November 15, 2020:
    nit: I’d just drop the python command part and call the script directly (also because on some systems, python symlinks to python2, not python3! See also https://www.python.org/dev/peps/pep-0394/), thanks to the she-bang on top of the file the right python interpreter will be called.

    troygiorshev commented at 10:12 pm on November 17, 2020:
    Thank you again for the link. I’m so used to having python environment problems that I’ve always tried to be as explicit as possible when running python scripts. Since switching to conda I haven’t had any problems, and I’m sure other people have things figured out as well (Either they only have one python install or they use an environment manager too). I’ll make this change.
  137. in contrib/message-capture/message-capture-docs.md:25 in 2418ec658c outdated
    20+    ```
    21+  * Note:  The messages in the given `.dat` files will be interleaved in chronological order.  So, giving both received and sent `.dat` files (as above with `*.dat`) will result in all messages being interleaved in chronological order.
    22+  * If an output file is not provided (i.e. the `-o` option is not used), then the output prints to `stdout`.
    23+* View the resulting output.
    24+  * The output file is `JSON` formatted.
    25+  * Suggestion: use `jq` to view the output, with `cat out.json | jq`
    


    theStack commented at 8:13 pm on November 15, 2020:
    nit: Could simply use jq out.json here (see also the “The Useless Use of Cat Award” http://porkmail.org/era/unix/award.html#cat)

    troygiorshev commented at 9:58 pm on November 17, 2020:

    Thanks for the link :)

    cat foo.txt | less is so common I once saw the suggestion to set a shell alias: alias L='| less' so that you can run cat foo.txt L!

  138. troygiorshev force-pushed on Nov 18, 2020
  139. troygiorshev commented at 0:42 am on November 18, 2020: contributor

    git range-diff master 2418ec6 4690881

    • Fixed version fRelay bug
      • The parser now more closely matches how parsing works in the test framework and in bitcoind
    • Documentation now follows PEP 0394
    • Removed cat overuse in documentation

    First of all, many thanks @theStack for both finding the bug and identifying exactly what causes it. Your guess was spot on and the links to the BIPs were very useful! It ended up identifying a small bug with how I implemented my message parser.

    However, I’m not sure what to think about this message. Looking at it closely, it has the following:

    • Version = 60000
    • fRelay = 0 (1 byte)
    • Declared payload size = 107
    • Actual payload size = 107

    The fRelay field is only supposed to exist as of version 70001. So this message is malformed. But, in the protocol’s implementation in bitcoind, we don’t check this.

    This is a discrepancy between our testing framework and bitcoind. If fRelay was 1 for a message with version <70001, bitcoind would take it as a 1, whereas our testing framework would take it was a 0. I don’t think this will ever happen in the real world, but it’s worth noting. I’ve opened a PR to discuss this as #20411.

    For now I take this as fixed.

  140. troygiorshev commented at 1:03 am on November 18, 2020: contributor
    Also, I thought sharing the binary file as a hexdump worked great! Thanks for adding the tip to use xxd’s -r flag. I personally use HxD in Windows to view and edit binary files. It’s a GUI so everything’s messy - someone who uses xxd with vim might have a good reason to have binary files shared one way or another.
  141. in contrib/message-capture/message-capture-parser.py:20 in 4690881fc5 outdated
    15+
    16+sys.path.append(os.path.join(os.path.dirname(__file__), '../../test/functional'))
    17+
    18+from test_framework.messages import ser_uint256     # noqa: E402
    19+from test_framework.p2p import MESSAGEMAP           # noqa: E402
    20+from test_framework.util import assert_equal        # noqa: E402
    


    MarcoFalke commented at 6:24 am on November 18, 2020:
    contrib/message-capture/message-capture-parser.py:20:1: F401 ’test_framework.util.assert_equal’ imported but unused

    troygiorshev commented at 3:36 pm on January 20, 2021:
    Fixed
  142. MarcoFalke commented at 6:29 am on November 18, 2020: member
    another example: In commit fa74d3d720f I removed unused deserialize code, because supported versions of Bitcoin Core would never serialize such a version message. (We only use the test framework to test supported versions of Core). So I am wondering if the message capture may ever write a message that can’t be parsed by our test code. And if there is a way to protect against this.
  143. practicalswift commented at 10:02 am on November 18, 2020: contributor

    Is there any chance this PR could get a release milestone?

    FWIW I love this feature and would love to see it in master soon :) Thanks @troygiorshev!

  144. DrahtBot added the label Needs rebase on Nov 19, 2020
  145. troygiorshev commented at 7:01 am on November 25, 2020: contributor

    So I am wondering if the message capture may ever write a message that can’t be parsed by our test code. And if there is a way to protect against this.

    Ideally, in my mind, our test code should function as closely as possible to Core. So, if a message can be parsed by Core it should be able to be parsed by our test code. You’re right that this will get tricky for undefined behavior or broken messages like this.

    For now I’ll add a MIN_PEER_PROTO_VERSION check as suggested in 14025.

    Edit: We still have to have a way to deal with messages that fail to be deserialized by the test code. Doesn’t appear to be a problem, see below.

  146. MarcoFalke commented at 7:06 am on November 25, 2020: member
    the message capture code will write the message before parsing. So except for the message header (message type) nothing is parsed, it could be any raw byte string. There is no way the test framework could make sense of that if not by accident.
  147. troygiorshev commented at 7:20 am on November 25, 2020: contributor

    Oh, yeah you’re completely right, thanks.

    Luckily (due to #19107) only messages with a valid header are pushed onto the vRecv queue, given to ProcessMessages, and parsed by the message capture parser. So we can trust that the header is workable. This will allow us to simply skip messages files that contain messages that fail deserilization, and we should be able to continue parsing all of the other ones.

    I’ll think of a nice way to signal this to the user.

    Edit: I think we’ll have to skip the entire file that contains a bad message, as a bad stated size will result in the next message being deserialized badly almost all of the time. I’ll experiment and see how best to fix this.

  148. Clean PushMessage and ProcessMessages
    This brings PushMessage and ProcessMessages further in line with the
    style guide by fixing their if statements.
    
    LogMessage is later called, inside an if statement, inside both of these
    methods.
    dbf779d5de
  149. laanwj referenced this in commit 43f3ada27b on Jan 18, 2021
  150. troygiorshev force-pushed on Jan 19, 2021
  151. troygiorshev commented at 8:49 pm on January 19, 2021: contributor

    git range-diff master 4690881 abbb8ac

    • Parser now no longer crashes when it reaches an unrecognized or broken message. Instead it simply skips that message and the rest of the file the message came from.
    • Removed an unused import from the previous force push
  152. troygiorshev force-pushed on Jan 19, 2021
  153. troygiorshev commented at 9:03 pm on January 19, 2021: contributor

    git range-diff master abbb8ac d2e0f73

    • Rebased
  154. in src/init.cpp:524 in d2e0f731c5 outdated
    520@@ -521,6 +521,7 @@ void SetupServerArgs(NodeContext& node)
    521     argsman.AddArg("-limitdescendantcount=<n>", strprintf("Do not accept transactions if any ancestor would have <n> or more in-mempool descendants (default: %u)", DEFAULT_DESCENDANT_LIMIT), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    522     argsman.AddArg("-limitdescendantsize=<n>", strprintf("Do not accept transactions if any ancestor would have more than <n> kilobytes of in-mempool descendants (default: %u).", DEFAULT_DESCENDANT_SIZE_LIMIT), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    523     argsman.AddArg("-addrmantest", "Allows to test address relay on localhost", ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    524+    argsman.AddArg("-capturemessages", "Capture all p2p messages to file", ArgsManager::ALLOW_BOOL | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    


    practicalswift commented at 9:26 pm on January 19, 2021:

    Nit: “P2P” is more commonly used.

    0$ git grep -E ' P2P ' | wc -l
    1230
    2$ git grep -E ' p2p ' | wc -l
    363
    

    troygiorshev commented at 9:44 pm on January 19, 2021:
    Sounds good, and Wikipedia agrees. Fixed
  155. in contrib/message-capture/message-capture-docs.md:9 in d2e0f731c5 outdated
    0@@ -0,0 +1,25 @@
    1+# Per-Peer Message Capture
    2+
    3+## Purpose
    4+
    5+This feature allows for message capture on a per-peer basis.  It answers the simple question: "Can I see what messages my node is sending and receiving?"
    6+
    7+## Usage and Functionality
    8+
    9+* Run `bitcoind` with the `-capturemessage` option.
    


    practicalswift commented at 9:34 pm on January 19, 2021:
    This should be -capturemessages (with ending “s”) :)

    troygiorshev commented at 9:44 pm on January 19, 2021:
    Thank you!
  156. practicalswift commented at 9:41 pm on January 19, 2021: contributor
    Tested ACK d2e0f731c5c7a2e1c060beb1ed5b16e7632e5f9d modulo nits
  157. troygiorshev force-pushed on Jan 19, 2021
  158. troygiorshev commented at 9:46 pm on January 19, 2021: contributor

    git range-diff master d2e0f73 af8bcd1

    • Fixed two typos in documentation
  159. DrahtBot removed the label Needs rebase on Jan 19, 2021
  160. practicalswift commented at 2:43 am on January 20, 2021: contributor
    Tested ACK af8bcd1a6ecdfa15d7fb0f98256d47dec9b40f61
  161. in src/net.cpp:2868 in af8bcd1a6e outdated
    2863@@ -2864,6 +2864,9 @@ void CConnman::PushMessage(CNode* pnode, CSerializedNetMsg&& msg)
    2864 {
    2865     size_t nMessageSize = msg.data.size();
    2866     LogPrint(BCLog::NET, "sending %s (%d bytes) peer=%d\n",  SanitizeString(msg.m_type), nMessageSize, pnode->GetId());
    2867+    if (gArgs.GetBoolArg("-capturemessages", false)) {
    2868+        CaptureMessage(pnode->addr, msg.m_type, MakeSpan(msg.data), /* incoming */ false);
    


    jnewbery commented at 11:10 am on January 20, 2021:

    There’s no need to explicitly convert to a span from a vector since #18468 was merged.

    0        CaptureMessage(pnode->addr, msg.m_type, msg.data, /* incoming */ false);
    

    No need to change this unless you touch the branch for another reason.

  162. in contrib/message-capture/message-capture-parser.py:173 in af8bcd1a6e outdated
    168+        progress_bar = ProgressBar(total_size)
    169+        for capture in capturepaths:
    170+            process_file(str(capture), messages, "recv" in capture.stem, progress_bar)
    171+    else:
    172+        for capture in capturepaths:
    173+            process_file(str(capture), messages, "recv" in capture.stem, None)
    


    jnewbery commented at 11:19 am on January 20, 2021:

    No need to duplicate the process_file call:

    0    else:
    1        progress_bar = None
    2
    3    for capture in capturepaths:
    4        process_file(str(capture), messages, "recv" in capture.stem, progress_bar)
    
  163. in contrib/message-capture/message-capture-parser.py:157 in af8bcd1a6e outdated
    152+        help="binary message capture files to parse.")
    153+    parser.add_argument(
    154+        "-o", "--output",
    155+        help="output file.  If unset print to stdout")
    156+    parser.add_argument(
    157+        "--no-progress-bar",
    


    jnewbery commented at 11:20 am on January 20, 2021:
    Can you add -n as a short argument here?
  164. in contrib/message-capture/message-capture-parser.py:95 in af8bcd1a6e outdated
    90+        return obj.hex()
    91+    else:
    92+        return obj
    93+
    94+
    95+def process_file(path: str, messages: List[Any], recv: bool, progress_bar: Optional[ProgressBar]) -> None:
    


    jnewbery commented at 11:22 am on January 20, 2021:

    This function is a bit of a wall of code. I’d suggest a few blank lines to split up different areas of functionality and make it a bit more readable.

    Something like:

     0        while True:
     1            if progress_bar:
     2                # Update progress bar
     3                diff = f_in.tell() - bytes_read - 1
     4                progress_bar.update(diff)
     5                bytes_read = f_in.tell() - 1
     6
     7            # Read header
     8            tmp_header_raw = f_in.read(TIME_SIZE + LENGTH_SIZE + MSGTYPE_SIZE)
     9            if not tmp_header_raw:
    10                break
    11            tmp_header = BytesIO(tmp_header_raw)
    12            time = int.from_bytes(tmp_header.read(TIME_SIZE), "little")      # type: int
    13            msgtype = tmp_header.read(MSGTYPE_SIZE).split(b'\x00', 1)[0]     # type: bytes
    14            length = int.from_bytes(tmp_header.read(LENGTH_SIZE), "little")  # type: int
    15
    16            if msgtype not in MESSAGEMAP:
    17                # For now just skip unrecognized messages
    18                f_in.seek(length, os.SEEK_CUR)
    19                print(f"Skipping unrecognized message type {msgtype}", file=sys.stderr)
    20                continue
    21
    22            # Read body
    23            msg_ser = BytesIO(f_in.read(length))
    24            msg = MESSAGEMAP[msgtype]()
    25            try:
    26                msg.deserialize(msg_ser)
    27            except KeyboardInterrupt:
    28                raise
    29            except Exception:
    30                # Message was bad, skip the rest of the file
    31                print(f"Found broken message, skipping remainder of file: {path}", file=sys.stderr)
    32                break
    33
    34            # Construct msg_dict and insert into messages
    35            msg_dict = {}
    36            msg_dict["msgtype"] = getattr(msg, "msgtype", None).decode()
    37            msg_dict["direction"] = "recv" if recv else "sent"
    38            msg_dict["time"] = time
    39            msg_dict["size"] = length   # "size" is less readable here, but more readable in the output
    40            if length:
    41                msg_dict["body"] = to_jsonable(msg)
    42            messages.append(msg_dict)
    43
    44        if progress_bar:
    45            # Update the progress bar to the end of the current file
    46            # in case we exited the loop early
    47            f_in.seek(0, os.SEEK_END)   # Go to end of file
    48            diff = f_in.tell() - bytes_read - 1
    49            progress_bar.update(diff)
    
  165. in contrib/message-capture/message-capture-parser.py:66 in af8bcd1a6e outdated
    61+        num_blocks = int(max_blocks*progress)
    62+        print('\r[ {}{} ] {:3.0f}%'.format(
    63+                '#'*num_blocks,
    64+                ' '*(max_blocks - num_blocks),
    65+                progress*100),
    66+            end ='')
    


    jnewbery commented at 11:27 am on January 20, 2021:

    A few pep8 style suggestions:

    • surround arithmetic operators with spaces
    • no spaces around named argument parameters
    • align parameters with opening parens
    0        num_blocks = int(max_blocks * progress)
    1        print('\r[ {}{} ] {:3.0f}%'.format('#' * num_blocks,
    2                                           ' ' * (max_blocks - num_blocks),
    3                                           progress * 100),
    4              end='')
    

    practicalswift commented at 3:56 pm on January 20, 2021:
    Since this is new code: consider running black all new .py files added in this PR :)

    troygiorshev commented at 10:16 pm on January 20, 2021:
    Thanks, this is much more readable now. I’ve broken it up slightly differently than you.
  166. in test/functional/p2p_message_capture.py:44 in af8bcd1a6e outdated
    39+            tmp_header_raw = f_in.read(TIME_SIZE + LENGTH_SIZE + MSGTYPE_SIZE)
    40+            if not tmp_header_raw:
    41+                break
    42+            tmp_header = BytesIO(tmp_header_raw)
    43+            time = int.from_bytes(tmp_header.read(TIME_SIZE), "little")      # type: int
    44+            assert(time >= 1231006505000000)   # genesis block timestamp
    


    jnewbery commented at 11:40 am on January 20, 2021:
    I’m not sure if we want this. You could imagine running this test inside some environment where the system time is not set correctly. That’d potentially cause this test to fail even though there’s no problem with message capture.

    troygiorshev commented at 10:19 pm on January 20, 2021:
    I’m fine to remove this, especially given your other suggestions that add other pieces to this test.
  167. in test/functional/p2p_message_capture.py:49 in af8bcd1a6e outdated
    41+                break
    42+            tmp_header = BytesIO(tmp_header_raw)
    43+            time = int.from_bytes(tmp_header.read(TIME_SIZE), "little")      # type: int
    44+            assert(time >= 1231006505000000)   # genesis block timestamp
    45+            msgtype = tmp_header.read(MSGTYPE_SIZE).split(b'\x00', 1)[0]     # type: bytes
    46+            assert(len(msgtype) > 0)
    


    jnewbery commented at 11:41 am on January 20, 2021:
    Perhaps also assert that there are no non-null characters after the first non-null character? i.e. the message type is a string of printable characters followed by just null characters.
  168. in test/functional/p2p_message_capture.py:37 in af8bcd1a6e outdated
    32+    - Message Type
    33+        - We can add new message types
    34+
    35+    We're ignoring these because they're simply too brittle to test here.
    36+    """
    37+    with open(dat_file, 'rb') as f_in:
    


    jnewbery commented at 11:49 am on January 20, 2021:
    We should also test that these files are non-empty. We may not know the exact sequence of messages, but there should be some messages during initial handshake, and we should capture them.
  169. in src/net.h:1246 in af8bcd1a6e outdated
    1241@@ -1239,6 +1242,8 @@ inline std::chrono::microseconds PoissonNextSend(std::chrono::microseconds now,
    1242     return std::chrono::microseconds{PoissonNextSend(now.count(), average_interval.count())};
    1243 }
    1244 
    1245+void CaptureMessage(const CAddress& addr, const std::string& msg_type, const Span<const unsigned char>& data, bool is_incoming);
    


    jnewbery commented at 12:03 pm on January 20, 2021:

    We don’t have guidance on line length in the style guide, but anything over 100 columns is a bit long for my taste. Also consider adding a doxygen comment:

    0/** Dump timestamp and binary message to file */
    1void CaptureMessage(const CAddress& addr, const std::string& msg_type,
    2                    const Span<const unsigned char>& data, bool is_incoming);
    

    troygiorshev commented at 7:05 pm on January 21, 2021:

    Added the doxygen comment, but I think I’ll leave the long line as it is. Looking through the rest of net.h, I see many declarations longer than 100 characters, and I don’t see any that have been broken up the way you’re suggesting.

    I agree with you that it’s not great, but I’m going to appeal to consistency in this case.


    jnewbery commented at 10:06 am on January 22, 2021:
    That’s fair!
  170. in src/net.cpp:2938 in af8bcd1a6e outdated
    2931@@ -2933,3 +2932,27 @@ uint64_t CConnman::CalculateKeyedNetGroup(const CAddress& ad) const
    2932 
    2933     return GetDeterministicRandomizer(RANDOMIZER_ID_NETGROUP).Write(vchNetGroup.data(), vchNetGroup.size()).Finalize();
    2934 }
    2935+
    2936+void CaptureMessage(const CAddress& addr, const std::string& msg_type, const Span<const unsigned char>& data, bool is_incoming)
    2937+{
    2938+    auto time = GetTime<std::chrono::microseconds>();
    


    jnewbery commented at 12:04 pm on January 20, 2021:
    maybe s/time/now/
  171. jnewbery commented at 12:07 pm on January 20, 2021: member

    utACK af8bcd1a6ecdfa15d7fb0f98256d47dec9b40f61

    This looks great, Troy. I think it can be merged, but I’ve given a few style suggestions inline. Feel free to take or leave.

  172. sidhujag referenced this in commit d35db2baaa on Jan 20, 2021
  173. troygiorshev force-pushed on Jan 21, 2021
  174. troygiorshev force-pushed on Jan 21, 2021
  175. troygiorshev commented at 7:39 pm on January 21, 2021: contributor

    git range-diff master af8bcd1 a17b250

    • Implemented various style improvement suggested by jnewbery above
  176. in contrib/message-capture/message-capture-docs.md:25 in a17b2509e0 outdated
    20+    ```
    21+  * Note:  The messages in the given `.dat` files will be interleaved in chronological order.  So, giving both received and sent `.dat` files (as above with `*.dat`) will result in all messages being interleaved in chronological order.
    22+  * If an output file is not provided (i.e. the `-o` option is not used), then the output prints to `stdout`.
    23+* View the resulting output.
    24+  * The output file is `JSON` formatted.
    25+  * Suggestion: use `jq` to view the output, with `jq out.json`
    


    theStack commented at 9:27 pm on January 21, 2021:

    nit: sorry to be annoying with this command again, but I just found out that it’s mandatory to add a filter as second parameter (e.g. the dot . for all), otherwise the command fails.

     0$ jq out.json
     1jq: error: out/0 is not defined at <top-level>, line 1:
     2out.json
     3jq: 1 compile error
     4
     5$ jq . out.json
     6[
     7  {
     8    "msgtype": "version",
     9    "direction": "sent",
    10..........
    

    troygiorshev commented at 8:45 pm on January 22, 2021:

    Ah you caught me, I still do cat out.json | jq :)

    Thanks for checking this, I’ll fix it in the docs.

  177. in contrib/message-capture/message-capture-parser.py:106 in d3353b507f outdated
    101+            if progress_bar:
    102+                # Update progress bar
    103+                diff = f_in.tell() - bytes_read - 1
    104+                progress_bar.update(diff)
    105+                bytes_read = f_in.tell() - 1
    106+            
    


    theStack commented at 9:29 pm on January 21, 2021:
    yocto-nit: git shows some trailing whitespace in this line
  178. theStack approved
  179. theStack commented at 9:51 pm on January 21, 2021: contributor

    Tested ACK a17b2509e0d7e7e05fd5600d5387608bb706ac0e Reviewed the code, captured messages on mainnet for a few minutes and parsed the output with the included parser, everything LGTM, except minor things (see below). Looking forward to see this in master (hopefully) soon! :rocket:

    A follow-up that takes care of some PEP8 code style issues (running pylint or black helps here) may makes sense though :) Didn’t check Python type annotations, maybe another reviewer experienced with that could run via mypy. (Potential issues could also solved in a follow-up though.)

  180. in src/init.cpp:524 in e2b6430dcb outdated
    520@@ -521,6 +521,7 @@ void SetupServerArgs(NodeContext& node)
    521     argsman.AddArg("-limitdescendantcount=<n>", strprintf("Do not accept transactions if any ancestor would have <n> or more in-mempool descendants (default: %u)", DEFAULT_DESCENDANT_LIMIT), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    522     argsman.AddArg("-limitdescendantsize=<n>", strprintf("Do not accept transactions if any ancestor would have more than <n> kilobytes of in-mempool descendants (default: %u).", DEFAULT_DESCENDANT_SIZE_LIMIT), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    523     argsman.AddArg("-addrmantest", "Allows to test address relay on localhost", ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    524+    argsman.AddArg("-capturemessages", "Capture all P2P messages to file", ArgsManager::ALLOW_BOOL | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    


    sipa commented at 11:50 pm on January 21, 2021:
    Ultranit: “to file” is slightly confusing as it’s not capturing all traffic into just a single file.

    sipa commented at 0:03 am on January 22, 2021:
    Why make this DEBUG_ONLY? It’s certainly not a feature for all users, but it’s also not just a development debugging tool.

    jnewbery commented at 10:24 am on January 22, 2021:
    Probably my suggestion as well. Making it DEBUG_ONLY allows this to be an experimental feature that we can alter/remove in future releases without disrupting users’ workflow. We can always remove DEBUG_ONLY later if we’re happy to make it more visible.

    troygiorshev commented at 9:15 pm on January 22, 2021:
    s/file/disk/

    troygiorshev commented at 9:20 pm on January 22, 2021:
    I think I’ll keep DEBUG_ONLY unless having it is harmful for some reason. I agree that it’s very easy to remove later, and very difficult to add back.
  181. in src/net_processing.cpp:4045 in e2b6430dcb outdated
    4041@@ -4042,6 +4042,10 @@ bool PeerManagerImpl::ProcessMessages(CNode* pfrom, std::atomic<bool>& interrupt
    4042     }
    4043     CNetMessage& msg(msgs.front());
    4044 
    4045+    if (gArgs.GetBoolArg("-capturemessages", false)) {
    


    sipa commented at 11:52 pm on January 21, 2021:

    It’s a bit strange that this happens in net_processing while the capture of outgoing messages is in net.

    Any reason why this can’t be in net as well? I’d expect CNode::ReceiveMsgBytes to work well.


    MarcoFalke commented at 10:06 am on January 22, 2021:
    The difference would be that disconnected nodes may still have their messages captured and that the capture file will record a time closer to the receive time, as opposed to the process time of the message? No opinion on what to do, just asking about the difference.

    jnewbery commented at 10:22 am on January 22, 2021:
    There was some discussion of that here: #19509 (review).

    sipa commented at 6:45 pm on January 22, 2021:

    Hmm, I don’t think this functionality really belongs in the net_processing layer/thread in the first place. It’s net that handles the connection and parsing/serializing of messages from/to a stream of bytes. So if that’s the reason for having this weird split across modules, I’m not sure it’s desirable.

    Anyway, no big deal - it just strikes me as ugly, but if it works, it works.


    jnewbery commented at 9:50 pm on January 22, 2021:
    I see this message dumping as a net_processing layer thing. It should be agnostic to the connection layer (eg if we eventually use BIP 151 or similar scheme, then the messages delivered to and sent from the net_processing layer shouldn’t change). If you need the actual bytes sent on the wire, then tcpdump or similar might be more appropriate.

    troygiorshev commented at 9:52 pm on January 22, 2021:

    As it stands, message capture always happens in the MessageHandler thread. Unfortunately, functions in this thread span both net and net_processing.

    I’ve responded more fully below, #19509 (comment)

  182. in contrib/message-capture/message-capture-parser.py:201 in d3353b507f outdated
    180+        progress_bar = None
    181+
    182+    for capture in capturepaths:
    183+        process_file(str(capture), messages, "recv" in capture.stem, progress_bar)
    184+
    185+    messages.sort(key=lambda msg: msg['time'])
    


    sipa commented at 11:56 pm on January 21, 2021:

    A suggestion (possibly for a future PR):

    It’s unnecessary to have this script load the entire dump of all files into memory and then sort them, as we know that every individual file is internally sorted.

    So an approach is possible where you have a loop in which you read one message for each file, and then loop {pick the one with the smallest timestamp; print that one; read one message from that file; repeat}. This would be faster, lower memory (especially when analyzing lots of dumps), and support streaming.


    jnewbery commented at 10:26 am on January 22, 2021:

    Yes, definitely a good suggestion. I agree this can wait for a future PR.

    See the use of heapq.merge() in test/functional/combine_logs.py for one way to do this (although I never enhanced that to support streaming since there wasn’t an immediate need).


    troygiorshev commented at 10:19 pm on January 22, 2021:
    I agree on all accounts: this is a naive solution but I’ll leave an improvement for a future PR.
  183. sipa commented at 0:18 am on January 22, 2021: member

    Concept ACK. I really like the approach of having separate files per peer/connection. Slightly less sure about separating send/receive, but seems ok.

    I wonder if we shouldn’t go further actually. Having one file (or two) = one session would be pretty nice for certain analyses. That could be accomplished by adding e.g. (our own node startup time, peer id) or (connection start time) in the filenames (or directory names).

    I’m slightly concerned about the overhead of opening/closing files for every message sent/received. Has anyone benchmarked this (or have ideas for a useful benchmark, even)? I understand the rationale of not capturing occupying a large part of the file descriptor space for this, but it’d be good to know it doesn’t really hurt.

    Apologies if these questions have been answered already.

  184. in contrib/message-capture/message-capture-parser.py:132 in a17b2509e0 outdated
    127+                msg.deserialize(msg_ser)
    128+            except KeyboardInterrupt:
    129+                raise
    130+            except Exception:
    131+                # Message was bad, skip the rest of the file
    132+                print(f"Found broken message, skipping remainder of file: {path}", file=sys.stderr)
    


    sipa commented at 0:23 am on January 22, 2021:
    Not sure this is what you want to do when an unparsable message is encountered. As each record has an overall length field, there should be no problem with skipping it.

    MarcoFalke commented at 10:21 am on January 22, 2021:
    Tend to agree. The test framework can only deserialize a subset of messages than what Bitcoin Core can deserialize. This is expected to happen in practice, so should fail gracefully in some way. Probably like an unknown message type.

    troygiorshev commented at 10:38 pm on January 22, 2021:
    You’re right. Because of how the capture is done in net.cpp, the length field is guaranteed to be correct. (We’ll never have a misalignment) I’ll change this to just skip the single message, and I’ll insert an informative placeholder into the JSON.

    sipa commented at 10:39 pm on January 22, 2021:
    You could still include the command name (if printable), and a hexdump of the payload.
  185. in src/net_processing.cpp:4046 in e2b6430dcb outdated
    4041@@ -4042,6 +4042,10 @@ bool PeerManagerImpl::ProcessMessages(CNode* pfrom, std::atomic<bool>& interrupt
    4042     }
    4043     CNetMessage& msg(msgs.front());
    4044 
    4045+    if (gArgs.GetBoolArg("-capturemessages", false)) {
    4046+        CaptureMessage(pfrom->addr, msg.m_command, MakeUCharSpan(msg.m_recv), /* incoming */ true);
    


    MarcoFalke commented at 9:53 am on January 22, 2021:
    Note to myself: Remove MakeUCharSpan after #20464
  186. in contrib/message-capture/message-capture-parser.py:48 in a17b2509e0 outdated
    43+HASH_INT_VECTORS = [
    44+    "hashes",
    45+    "headers",
    46+    "vHave",
    47+    "vHash",
    48+]
    


    MarcoFalke commented at 10:09 am on January 22, 2021:
    completely unrelated, but this seems fragile. Would be good to remove those ints and only use uint256`s

    troygiorshev commented at 7:27 pm on January 23, 2021:
    I agree that this is fragile, but I’m not sure exactly what you mean about removing them. We don’t really have a uint256 object in python. Adding one would unfortunately mean rewriting a LOT of our testing framework…
  187. jnewbery commented at 10:22 am on January 22, 2021: member

    Slightly less sure about separating send/receive, but seems ok.

    This was my suggestion. Maybe a hangover from my telecoms days when it was common to dump different streams for send/recv. The benefit here is that we save one bit (really one byte since there’s nowhere to stuff it in the other metadata fields) per message because we don’t need a flag to indicate send/recv. As well as that, there are probably at least some tools/analyses that are only interested in traffic in one direction. Splitting the files means that those tools would not have to read/filter/seek over a bunch of messages that they’re not interested in.

    I wonder if we shouldn’t go further actually. Having one file (or two) = one session would be pretty nice for certain analyses. That could be accomplished by adding e.g. (our own node startup time, peer id) or (connection start time) in the filenames (or directory names).

    ACK! I think a directory structure of <IP_port>/<peerid>/[send|recv].dev would work very nicely.

    I’m slightly concerned about the overhead of opening/closing files for every message sent/received. Has anyone benchmarked this (or have ideas for a useful benchmark, even)? I understand the rationale of not occupying a large part of the file descriptor space for this, but it’d be good to know it doesn’t really hurt.

    This is something that @troygiorshev and I discussed at length, but I can’t find that discussion in this PR, so it was probably before it was opened. Troy, do you have any notes from then? I know you looked into both alternatives.

  188. in contrib/message-capture/message-capture-parser.py:137 in a17b2509e0 outdated
    132+                print(f"Found broken message, skipping remainder of file: {path}", file=sys.stderr)
    133+                break
    134+
    135+            # Convert message to dictionary and add it to the messages dict
    136+            msg_dict = {}
    137+            msg_dict["msgtype"] = getattr(msg, "msgtype", None).decode()
    


    MarcoFalke commented at 10:30 am on January 22, 2021:

    nit

    0            msg_dict["msgtype"] = msg.msgtype.decode()
    

    or

    0            msg_dict["msgtype"] = msg_type.decode()
    
  189. in contrib/message-capture/message-capture-parser.py:154 in a17b2509e0 outdated
    149+            diff = f_in.tell() - bytes_read - 1
    150+            progress_bar.update(diff)
    151+
    152+
    153+def main():
    154+    """Main"""
    


    MarcoFalke commented at 10:33 am on January 22, 2021:

    nit

  190. in test/functional/p2p_message_capture.py:50 in a17b2509e0 outdated
    45+            _ = int.from_bytes(tmp_header.read(TIME_SIZE), "little")      # type: int
    46+            raw_msgtype = tmp_header.read(MSGTYPE_SIZE)
    47+            msgtype = raw_msgtype.split(b'\x00', 1)[0]     # type: bytes
    48+            remainder =  raw_msgtype.split(b'\x00', 1)[1]
    49+            assert(len(msgtype) > 0)
    50+            assert(msgtype.decode().isprintable())
    


    MarcoFalke commented at 10:40 am on January 22, 2021:
    Just check that the message type is in MESSAGEMAP? I think we wouldn’t ever want to add a message type that is not in MESSAGEMAP.

    troygiorshev commented at 9:02 pm on January 23, 2021:
    Good idea, thanks
  191. in test/functional/p2p_message_capture.py:51 in a17b2509e0 outdated
    46+            raw_msgtype = tmp_header.read(MSGTYPE_SIZE)
    47+            msgtype = raw_msgtype.split(b'\x00', 1)[0]     # type: bytes
    48+            remainder =  raw_msgtype.split(b'\x00', 1)[1]
    49+            assert(len(msgtype) > 0)
    50+            assert(msgtype.decode().isprintable())
    51+            assert(len(remainder) == 0 or not remainder.decode().isprintable())
    


    MarcoFalke commented at 10:43 am on January 22, 2021:
    That seems fragile. What if the payload happens to be printable?

    troygiorshev commented at 7:32 pm on January 23, 2021:

    Ah, remainder is actually just the remainder of the 12 msgtype bytes after the first null. In most cases it’s just a few null bytes. I added this in response to a suggestion to check that after the first null byte in the msgtype, all further bytes are null.

    Effectively these few lines should do the same job as CMessageHeader::IsCommandValid.


    MarcoFalke commented at 12:12 pm on February 2, 2021:
    should probably just check that all are zero
  192. MarcoFalke approved
  193. MarcoFalke commented at 10:46 am on January 22, 2021: member

    left some nits nothing blocking

    review ACK a17b2509e0 🕵

    Signature:

     0-----BEGIN PGP SIGNED MESSAGE-----
     1Hash: SHA512
     2
     3review ACK a17b2509e0 🕵
     4-----BEGIN PGP SIGNATURE-----
     5
     6iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
     7pUguwgv9FOD3U5WBoyc0YEgJF3o3jYu8yF+c5Pv2XB6gF2p6s1eLQptvp50IGsSO
     8Tsg05XUEh9pnNixYKjtFzGonkGSZpSe2ZLBBDtgj4zuW5cphvuZlLaZjQSYmY472
     9AOjYh34hHzSE6SLaXkQyXLOVgRR9RPlaIsWZT5FG1i25MPLWxFm70asST2gq19yT
    10TmXXM8nq/Jm0MGmULL4PQ5UGphQgXnyYkuyE7+eEfIKy8T+QiWEumHEwfH0l/h9m
    11wJFehDM0iXy/SfDWejKnLmHpMAavgUErQoLIyKmRjAbYtSF9yXRQsQXrl53UqlBn
    12HqqSrlVEsm5R6F6r6/XFeHtaXZbkRx7MAAo3oM3g6YWrGshP+JPQQqwA5SeOwWg7
    13M2qlo+uyjUiULWgCJ+wJzgfi4bdoT2rgedXp8oG4hdoSmJGYcHtZKmBQ0rQxWhzh
    14bC/5V9IUDuZrxKf2I0+/eHtaBFcZtYfueJeMsSLk1MG+9+osiR3GGtfoe8RJVSFd
    15vTI+2+Uy
    16=aY6T
    17-----END PGP SIGNATURE-----
    

    Timestamp of file with hash 7871da5ba1b4e6ad2ab0e950c9a698549aff909de9029ac589bdbf635a33cb83 -

  194. MarcoFalke commented at 10:53 am on January 22, 2021: member

    ACK! I think a directory structure of <IP_port>//[send|recv].dev would work very nicely.

    I think that could be confusing, because the same IP_port can have different peerid during the same connection (reconnect), but also can have the same peerid across restarts. So this make it harder to get the msgs of just one session.

  195. MarcoFalke commented at 10:58 am on January 22, 2021: member
    There has been some feedback, but at the very least you need to fix the linter for this to be merged.
  196. jnewbery commented at 11:02 am on January 22, 2021: member

    I think that could be confusing, because the same IP_port can have different peerid during the same connection (reconnect), but also can have the same peerid across restarts. So this make it harder to get the msgs of just one session.

    Ah, good point. I’d missed sipa’s point about using a (our own node startup time, peer id) pair to handle restarts.

  197. practicalswift commented at 4:00 pm on January 22, 2021: contributor

    @sipa

    Concept ACK. I really like the approach of having separate files per peer/connection. Slightly less sure about separating send/receive, but seems ok.

    Agreed: I also have a slight preference towards combining send and recv in the same file for these type of captures.

    For “request-response” analysis it is really nice to be able to have the messages in correct order without having to correlate timestamps from two different files to produce an “ordered” file covering the entire flow.

    Personally I think this ease-of-use benefit in the exceeds the cost (one extra byte to indicate direction, and having to skip one direction in the case of one direction analysis).

    I wonder if we shouldn’t go further actually. Having one file (or two) = one session would be pretty nice for certain analyses. That could be accomplished by adding e.g. (our own node startup time, peer id) or (connection start time) in the filenames (or directory names).

    Strong Concept ACK on this idea as well.

    In the use cases I can think of the relevant unit of analysis is more likely to be “one specific session (with one specific IP)” than “all session with one specific IP”.

    And the latter use cases can easily be covered by the user by simply concatenating multiple files of the “one specific session” type :)

  198. sipa commented at 7:09 pm on January 22, 2021: member

    @jnewbery

    The benefit here is that we save one bit (really one byte since there’s nowhere to stuff it in the other metadata fields) per message because we don’t need a flag to indicate send/recv.

    The direction could be stored in the top bit of the size field. The max protocol message size is 4000000 bytes, so even 3 bytes would be enough if we really wanted to. A variable-length encoding of command name (1 length byte + that many bytes for the command name itself e.g.) would also save several bytes on average per message.

    I don’t think dump size is that much of a priority that it warrants lots of extra effort, but I also don’t think the size argument on itself is a good reason for splitting the files based on direction.

    As well as that, there are probably at least some tools/analyses that are only interested in traffic in one direction. Splitting the files means that those tools would not have to read/filter/seek over a bunch of messages that they’re not interested in.

    Maybe, but I expect those to also be interested in just one or a few commands, and they’ll still need to read/filter/seek over the ones they don’t care about. On the other hand, anything analyzing query/response like things will need to look in both files too.

    Especially if we’d go in a direction of separating separate sessions to the same ip/port, I think there is an elegance to having exactly 1 file = 1 session.

    Another idea: would it make sense to store the version information of the dumping node (CLIENT_NAME, CLIENT_VERSION, perhaps also connection type)? Either in a header at the beginning of the file, or in the filename. That would make it easy to do analyses to compare behavior of different client versions.

    What about a directory structure of <CLIENT_NAME>_<CLIENT_VERSION>/<ip>_<port>/<timestamp>_<peerid>.dat?

  199. jnewbery commented at 7:28 pm on January 22, 2021: member

    Especially if we’d go in a direction of separating separate sessions to the same ip/port, I think there is an elegance to having exactly 1 file = 1 session.

    I agree that this seems very nice. I don’t have any objection to changing the output to be this way.

    What about a directory structure of <CLIENT_NAME>_<CLIENT_VERSION>/<ip>_<port>/<timestamp>_<peerid>.dat?

    I’m less convinced of this. For one, those details can easily be fished out of the received version message in the file. I also like the directory structure of one directory per peer in .bitcoin/message_capture (and one file per session in each of those directories). But again, if other people prefer that I won’t strongly oppose it.

  200. sipa commented at 7:33 pm on January 22, 2021: member

    I’m less convinced of this.

    Yes, just a suggestion to hear what people think.

    For one, those details can easily be fished out of the received version message in the file.

    Note that I’m talking about the version of the dumping node, not of the peer we’re connecting to (that may also be interesting of course, but is a bit harder as it means you only know the filename after receiving VERSION).

    I also like the directory structure of one directory per peer in .bitcoin/message_capture

    If the intent is really “one directory per peer”, the port perhaps shouldn’t be included (or at least not for incoming connections), as the outgoing TCP port is newly chosen for every connection.

  201. troygiorshev commented at 9:50 pm on January 22, 2021: contributor

    why not capture messages after CNode::ReceiveMsgBytes?

    This has been brought up a couple times now, and it’s a great question, so it’s worth a full response (that won’t be buried in resolved conversations).

    My priority in this PR is that the capture of messages always occurs in order from an application layer perspective. Because of this, I have to capture messages in the MessageHandler thread.

    If I were to instead capture incoming messages in the SocketHandler thread, say, right after ReceiveMsgBytes, the following could possibly occur:

    • receive and capture message X
    • receive and capture message Y
    • send and capture message A

    whereas on the processing side (since X and Y are in in the vRecvMsg queue before being processed), the node actually did the following:

    • process message X
    • respond with message A
    • process message Y

    So the order would be broken. Remember, this PR isn’t just a fun replacement for wireshark. The point is that someone improving/debugging/learning bitcoin’s p2p protocol can see what’s actually going on from the node’s perspective, without having to worry about how the timing of the queues is complicating things.

  202. sipa commented at 10:16 pm on January 22, 2021: member
    @troygiorshev That’s a great point. I was thinking that the difference between net-level and net_processing-level ordering would only be different interleaving across peers, but you’re right to point out it even obscures the processing within one peer. I agree it’s valuable to dump in processing order.
  203. in src/net.cpp:2942 in a17b2509e0 outdated
    2931@@ -2933,3 +2932,27 @@ uint64_t CConnman::CalculateKeyedNetGroup(const CAddress& ad) const
    2932 
    2933     return GetDeterministicRandomizer(RANDOMIZER_ID_NETGROUP).Write(vchNetGroup.data(), vchNetGroup.size()).Finalize();
    2934 }
    2935+
    2936+void CaptureMessage(const CAddress& addr, const std::string& msg_type, const Span<const unsigned char>& data, bool is_incoming)
    2937+{
    2938+    auto now = GetTime<std::chrono::microseconds>();
    


    MarcoFalke commented at 6:17 am on January 23, 2021:

    nit: Maybe add a small comment here that this function captures the message at process time, not socket receive/send time. Nothing fancy, just a oneline comment to sum up #19509 (comment)

    Not only to document the expected behavior, but also to avoid people to “optimize” away this call to GetTime with the rationale that it doesn’t matter and the message receive time can be used. See also #19509 (review)

  204. troygiorshev commented at 7:50 pm on January 23, 2021: contributor

    is really nice to be able to have the messages in correct order without having to correlate timestamps from two different files to produce an “ordered” file covering the entire flow.

    This is already done for you by the parser, just give it both files and the messages will be interleaved automatically in the output file. If I’m understanding you correctly, the following should do what you’re looking for:

    ./message-capture-parser.py <folder>/*.dat -o out.json

    That said…

    I think there is an elegance to having exactly 1 file = 1 session.

    I agree completely, I’ll figure this out!

  205. Add CaptureMessage
    This commit adds the CaptureMessage function.  This will later be called
    when any message is sent or received.  The capture directory is fixed,
    in a new folder "message_capture" in the datadir.  Peers will then have
    their own subfolders, named with their IP address and port, replacing
    colons with underscores to keep compatibility with Windows.  Inside,
    received and sent messages will be captured into two binary files,
    msgs_recv.dat and msgs_sent.dat.
    
    e.g.
    message_capture/203.0.113.7_56072/msgs_recv.dat
    message_capture/203.0.113.7_56072/msgs_sent.dat
    
    The format has been designed as to result in a minimal performance
    impact.  A parsing script is added in a later commit.
    f2a77ff97b
  206. Call CaptureMessage at appropriate locations
    These calls are toggled by a debug-only "capturemessages" flag.  Default
    disabled.
    4d1a582549
  207. Add capture parser
    This commit adds contrib/message-capture/message-capture-parser.py, a python
    script to be used alongside -capturemessages to parse the captured
    messages.
    
    It is complete with arguments and will parse any file given, sorting the
    messages in the files when creating the output.  If an output file is
    specified with -o or --output, it will dump the messages in json format
    to that file, otherwise it will print to stdout.
    
    The small change to the unused msg_generic is to bring it in line with
    the other message classes, purely to avoid a bug in the future.
    e4f378a505
  208. Add Message Capture Test
    Add a functional test for CaptureMessage.  This connects and then
    disconnects a peer so that the handshake can be used to check if capture
    is being done correctly.
    
    Included in a docstring in the test is the following:
    
    From the data file we'll only check the structure.
    
    We won't care about things like:
    - Deserializing the payload of the message
        - This is managed by the deserialize methods in
          test_framework.messages
    - The order of the messages
        - There's no reason why we can't, say, change the order of the
          messages in the handshake
    - Message Type
        - We can add new message types
    
    We're ignoring these because they're simply too brittle to test here.
    381f77be85
  209. Add documentation to contrib folder
    This commit adds brief documentation for this feature.  Included in the
    justification is the purpose of this feature as well as usage and
    functionality tips.
    bff7c66e67
  210. troygiorshev force-pushed on Jan 23, 2021
  211. troygiorshev commented at 9:17 pm on January 23, 2021: contributor

    git range-diff master a17b250 bff7c66

    • Message capture now more gracefully handles unrecognized and corrupted messages
    • Corrected typo in docs
    • Added comment to clarify how timestamps are being used here
    • A couple small improvements to the test

    Thanks so much for the review everyone!

    Up soon: 1 file = 1 session

  212. MarcoFalke commented at 11:14 am on January 24, 2021: member

    re-ACK bff7c66e67aa2f18ef70139338643656a54444fe only some minor changes: 👚

    • doc fixups
    • append messages in the python parser even when the msg type is unknown or something can not be decoded

    Signature:

     0-----BEGIN PGP SIGNED MESSAGE-----
     1Hash: SHA512
     2
     3re-ACK bff7c66e67aa2f18ef70139338643656a54444fe only some minor changes: 👚
     4
     5* doc fixups
     6* append messages in the python parser even when the msg type is unknown or something can not be decoded
     7-----BEGIN PGP SIGNATURE-----
     8
     9iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
    10pUhNwgwAsIB9TNhIdWgBLKYoNkJpnwXmAPbL/OAR3t+csB3wjowlJQMz9UijiMPN
    11H/EzFMGiWMQoDgfxBMewFNXtBetGBzdQHHO9kGk7S0uulqj4S6Tg1WAWrfSOPhbI
    12cSDuE8XSsPXq1qEpVS5NiVjsB5IJiVyER7LqSEnMp/iemdpgCIWfvxiJzaJNXbkj
    13ijCLQ3+2D7TX/e+N8iyHyIBnC/NbTGp8fzim7AoRf6O42NFWy7CVmM2aSu5qpW5B
    147nxTuWff66xBAnGwCEOLgfbm3G96ZgBqXGrBXCfL5GNIVTqNTDblX08cfoyMLqdv
    15InGH+Qdz1ObG7tqY3TRq88U8Jgl+qm1N4Yi63lg2LISAL7r/qwAeQcyk/JiEsN8g
    168cCVCLkTd3BPKsHb4O+ItsfLCg4xkqSPogLML+FjNj1ppDKxo/rPPrdV7Ss04lbo
    175f4UbCSI9WMLMS+v2ZYSASW/4CXVXKF/bgAjzxv3VYK90+ITKnIoQcBjOZv5myGc
    18Zkcqkx7K
    19=FpGK
    20-----END PGP SIGNATURE-----
    

    Timestamp of file with hash 1aec4b471c3da64520ce75cb6750abae42d383f6093d1a0c1987e5b405b08997 -

  213. michaelfolkson commented at 2:01 pm on January 24, 2021: contributor
    Still need to do a lot of playing around with this. Interested if you have given any thought @troygiorshev on what questions this tool can easily answer (without additional digging) and what is possible for future improvements/future PRs. Don’t want to derail review of this PR though so set up a StackExchange question.
  214. theStack approved
  215. theStack commented at 6:08 pm on January 27, 2021: contributor

    re-ACK bff7c66e67aa2f18ef70139338643656a54444fe Verified that the error handling code that was introduced since my last ACK works, by corrupting a message-type in the middle of a random capture file (with a hex editor) and diffing the json outputs of the original and corrupted captures:

    s/sendcmpct/sendcmpcx/

    053,57c53,55
    1<     "msgtype": "sendcmpct",
    2<     "body": {
    3<       "announce": false,
    4<       "version": 2
    5<     }
    6---
    7>     "msgtype": "sendcmpcx",
    8>     "body": "000200000000000000",
    9>     "error": "Unrecognized message type."
    

    s/sendcmpct/<random_unprintable_gibberish>

    0<     "msgtype": "sendcmpct",
    1<     "body": {
    2<       "announce": false,
    3<       "version": 2
    4<     }
    5---
    6>     "msgtype": "UNREADABLE",
    7>     "body": "000200000000000000",
    8>     "error": "Unrecognized message type."
    
  216. MarcoFalke commented at 10:18 am on February 2, 2021: member
    The nit here can be addressed, once you have to rebase: #19509 (review)
  217. jnewbery commented at 11:35 am on February 2, 2021: member

    Up soon: 1 file = 1 session @troygiorshev Are you still planning to do this?

  218. jnewbery commented at 12:01 pm on February 2, 2021: member

    utACK bff7c66e67aa2f18ef70139338643656a54444fe

    This isn’t part of the public interface, so the file layout can be changed in a follow-up.

  219. MarcoFalke merged this on Feb 2, 2021
  220. MarcoFalke closed this on Feb 2, 2021

  221. sidhujag referenced this in commit 7b0f33697b on Feb 3, 2021
  222. in test/functional/p2p_message_capture.py:45 in bff7c66e67
    40+        while True:
    41+            tmp_header_raw = f_in.read(TIME_SIZE + LENGTH_SIZE + MSGTYPE_SIZE)
    42+            if not tmp_header_raw:
    43+                break
    44+            tmp_header = BytesIO(tmp_header_raw)
    45+            int.from_bytes(tmp_header.read(TIME_SIZE), "little")      # type: int
    


    flack commented at 4:26 pm on February 6, 2021:

    Not sure if this is the right place to ask, but when I’m running vulture against this file, it complains:

    0bitcoin/test/functional/p2p_message_capture.py:45: invalid syntax at "int.from_bytes(tmp_header.read(TIME_SIZE), "little")      # type: int"
    

    Is that a bug in vulture or a problem in the code? If I assign the return value of from_bytes to a variable, it passes


    brunoerg commented at 1:19 am on February 7, 2021:
    I think it is a problem in the code. The return value of from_bytes is not being used.

    jnewbery commented at 7:25 pm on February 7, 2021:

    I guess the fix is:

    0-            int.from_bytes(tmp_header.read(TIME_SIZE), "little")      # type: int
    1+            tmp_header.read(TIME_SIZE)  # skip over the timestamp field
    

    MarcoFalke commented at 7:17 am on February 8, 2021:
  223. leonardojobim commented at 7:21 am on February 10, 2021: none

    I ran the message-capture-parser.py and received the following error:

     0Error:
     1Traceback (most recent call last):
     2  File "./message-capture-parser.py", line 214, in <module>
     3    main()
     4  File "./message-capture-parser.py", line 199, in main
     5    process_file(str(capture), messages, "recv" in capture.stem, progress_bar)
     6  File "./message-capture-parser.py", line 159, in process_file
     7    msg_dict["body"] = to_jsonable(msg)
     8  File "./message-capture-parser.py", line 82, in to_jsonable
     9    elif slot in HASH_INT_VECTORS and isinstance(val[0], int):
    10IndexError: list index out of range
    

    So I changed the line 82 elif slot in HASH_INT_VECTORS and isinstance(val[0], int): to elif isinstance(val, list) and len(val) > 0 and slot in HASH_INT_VECTORS and isinstance(val[0], int):

    The script worked after this change.

    The reason of the error is the val variable had the value [] and when accessing val[0] the script crashed. The error happened with following values for the msgtype and for the type of the obj variable (in to_jsonable() function):

    0msgtype: 'tx'       - obj type: <class 'test_framework.messages.CTxWitness'> 
    1msgtype: 'headers'  - obj type: <class 'test_framework.messages.msg_headers'>,
    2msgtype: 'blocktxn' - obj type: <class 'test_framework.messages.CTxWitness'> 
    3msgtype: 'block'    - obj type: <class 'test_framework.messages.CTxWitness'>
    

    I tested it on Ubuntu 20.04 on VMWare. The command used was ./message-capture-parser.py -o out2.json ~/.bitcoin/message_capture/**/*.dat

  224. MarcoFalke referenced this in commit 69f7f50aa5 on Feb 17, 2021
  225. jnewbery commented at 10:42 am on February 18, 2021: member

    Nice find @leonardojobim! You could also test for the list’s non-emptiness by checking its booleanness, which is maybe more pythonic:

    0elif isinstance(val, list) and val and slot in HASH_INT_VECTORS and isinstance(val[0], int):
    

    Do you mind opening a PR for this?

  226. Fabcien referenced this in commit cfcf6bb92f on Nov 18, 2022
  227. bitcoin locked this on Jul 11, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-01 13:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me