Don’t zero-after-free DataStream: Faster IBD on some configurations #30987

pull davidgumberg wants to merge 3 commits into bitcoin:master from davidgumberg:zero_after_free_allocator_change changing 4 files +17 −62
  1. davidgumberg commented at 10:40 pm on September 26, 2024: contributor

    This PR modifies DataStream’s byte-vector vch to use the default allocator std::allocator rather than the zero_after_free_allocator which degrades performance greatly. The zero_after_free_allocator is identical to the default std::allocator except that it zeroes memory using memory_cleanse() before deallocating.

    This PR also drops the zero_after_free_allocator, since this was only used by DataStream and SerializeData.

    In my testing (n=2) on a Raspberry Pi 5 with 4GB of memory, syncing from a fast connection to a stable dedicated node, my branch takes ~74% of the time taken by master1 to sync to height 815,000; average wall clock time was 35h 58m 40s on this branch and 48h 17m 15s on master. (See the benchmarking appendix)

    I expect most of the performance improvement to come from the use of DataStream for all CDBWrapper keys and values, and for all P2P messages. I suspect there are other use cases where performance is improved, but I have only tested IBD.

    Any objects that contains secrets should not be allocated using zero_after_free_allocator since they are liable to get mapped to swap space and written to disk if the user is running low on memory, and I intuit this is a likelier path than scanning unzero’d memory for an attacker to find cryptographic secrets. Secrets should be allocated using secure_allocator which cleanses on deallocation and mlock()s the memory reserved for secrets to prevent it from being mapped to swap space.

    Are any secrets stored in DataStream that will lose security?

    I have reviewed every appearance of DataStream and SerializeData as of 39219fe and have made notes in the appendix below with notes that provide context for each instance where either is used.

    The only use case that I wasn’t certain of is PSBT’s, I believe these are never secrets, but I am not certain if there are use cases where PSBT’s are worthy of being treated as secrets, and being vigilant about not writing them to disk is wise.

    As I understand, most of the use of DataStream in the wallet code is for the reading and writing of “crypted” key and value data, and they get decrypted somewhere else in a ScriptPubKeyMan far away from any DataStream container, but I could also be wrong about this, or have misunderstood its use elsewhere in the wallet.

    Zero-after-free as a buffer overflow mitigation

    The zero_after_free allocator was added as a buffer overflow mitigation, the idea being that DataStream’s store a lot of unsecured data that we don’t control like the UTXO set and all P2P messages, and an attacker could fill memory in a predictable way to escalate a buffer overflow into an RCE. (See Historical Background appendix).

    I agree completely with practicing security in depth, but I don’t think this mitigation is worth the performance hit because:

    1. Aren’t there still an abundance of other opportunities for an attacker to fill memory that never gets deallocated?
    2. Doesn’t ASLR mostly mitigate this issue and don’t most devices have some form of ASLR?

    I’m not a security expert and I had a hard time finding any writing anywhere that discusses this particular mitigation strategy of zeroing memory, so I hope someone with more knowledge of memory vulnerabilities can assist.


    Other notes

    • I opted to leave SerializeData as std::vector<std::byte> instead of deleting it and refactoring in the spots where it’s used in the wallet to keep the PR small, if others think it would be better to delete it I would be happy to do it.
    • I have a feeling that it’s not just that we’re memsetting everything to 0 in memory_cleanse that is causing the performance issue, but the trick we do to prevent compilers from optimizing out the memset call is also preventing other optimizations on the DataStream’s, but I have yet to test this.
    • I also make a small change to a unit test where boost mysteriously fails to find a left shift-operator for SerializeData once it loses its custom allocator.

    1. Master at the time of my testing was: 6d546336e800 ↩︎

  2. test: avoid BOOST_CHECK_EQUAL for complex types
    Avoid using BOOST_CHECK_EQUAL_COLLECTIONS for two
    std::vector<std::pair<SerializeData, SerializeData>> since boost expects
    printing methods that are very brittle. See:
    
    * https://www.boost.org/doc/libs/1_86_0/libs/test/doc/html/boost_test/test_output/test_tools_support_for_logging/testing_tool_output_disable.html
    * https://stackoverflow.com/questions/10976130/boost-check-equal-with-pairint-int-and-custom-operator
    * https://stackoverflow.com/questions/3999644/how-to-compare-vectors-with-boost-test
    * https://github.com/boostorg/type_traits/issues/196
    0449a22bc0
  3. DrahtBot commented at 10:40 pm on September 26, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31038 (test: Fix copy-paste in wallet/test/db_tests ostream operator by hodlinator)
    • #29307 (util: explicitly close all AutoFiles that have been written by vasild)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. DrahtBot added the label CI failed on Sep 26, 2024
  5. DrahtBot commented at 11:56 pm on September 26, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/30733746913

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  6. util: Stop zero-before-freeing all serialized data
    `SerializeData` is used by `DataStream` which is used throughout the
    codebase for non-secret data. Originally introduced as a mitigation for
    buffer-overflows. The potential mitigation is not worth the performance
    cost. This slows down IBD by as much as ~25%.
    0d9be3f630
  7. refactor: Drop unused `zero_after_free_allocator` 5cf2fefd33
  8. davidgumberg force-pushed on Sep 27, 2024
  9. davidgumberg commented at 3:00 pm on September 27, 2024: contributor

    Appendix

    Benchmarks

    Command being timed:

    0./src/bitcoind -daemon=0 -connect=amd-ryzen-7900x-node:8333 -stopatheight=815000 -port=8444 -rpcport=8445 -dbcache=2048 -prune=550 -debug=bench -debug=blockstorage -debug=coindb -debug=mempool -debug=prune"
    

    I applied my branch on 6d546336e800, which is “master” in the data below.

    Average master time (hh:mm:ss): 48:17:15 (173835s) Average branch time (hh:mm:ss): 35:58:40 (129520s)

    ~25% reduction in IBD time on a raspberry Pi 5 with a DB cache of 2GB.

    Master run 1

    Wall clock time (hh:mm:ss): 49:38:31 (178711s)

     0Bitcoin Core version v27.99.0-6d546336e800 (release build)
     1- Connect block: 158290.53s (620.94ms/blk)
     2    - Sanity checks: 10.89s (0.01ms/blk)
     3    - Fork checks: 151.82s (0.02ms/blk)
     4    - Verify 7077 txins: 135057.68s (165.71ms/blk)
     5      - Connect 1760 transactions: 134786.36s (165.38ms/blk)
     6    - Write undo data: 2681.34s (7.38ms/blk)
     7    - Index writing: 52.76s (0.03ms/blk)
     8  - Connect total: 138100.75s (611.27ms/blk)
     9  - Flush: 3933.29s (8.97ms/blk)
    10  - Writing chainstate: 15814.36s (0.14ms/blk)
    11  - Connect postprocess: 273.39s (0.52ms/blk)
    

    Master run 2

    Wall clock time (hh:mm:ss): 46:55:58 (168958s)

     0Bitcoin Core version v27.99.0-6d546336e800 (release build)
     1- Connect block: 145449.95s (940.78ms/blk)
     2    - Sanity checks: 10.69s (0.01ms/blk)
     3    - Fork checks: 155.81s (0.02ms/blk)
     4    - Verify 7077 txins: 115935.55s (142.25ms/blk)
     5      - Connect 1760 transactions: 115481.15s (141.69ms/blk)
     6    - Write undo data: 2561.36s (9.05ms/blk)
     7    - Index writing: 73.63s (0.04ms/blk)
     8  - Connect total: 118877.56s (929.93ms/blk)
     9  - Flush: 3864.34s (10.11ms/blk)
    10  - Writing chainstate: 22294.82s (0.14ms/blk)
    11  - Connect postprocess: 267.68s (0.56ms/blk)
    

    Branch run 1

    Wall clock time (hh:mm:ss): 34:28:56 (124136s)

     0Bitcoin Core version v27.99.0-a0dddf8b4092 (release build)
     1- Connect block: 107134.59s (1017.01ms/blk)
     2    - Sanity checks: 11.01s (0.01ms/blk)
     3    - Fork checks: 150.93s (0.03ms/blk)
     4    - Verify 7077 txins: 87446.53s (107.30ms/blk)
     5      - Connect 1760 transactions: 87329.99s (107.15ms/blk)
     6    - Write undo data: 2495.47s (7.36ms/blk)
     7    - Index writing: 37.95s (0.04ms/blk)
     8  - Connect total: 90318.60s (1006.42ms/blk)
     9  - Flush: 3917.28s (9.92ms/blk)
    10  - Writing chainstate: 12560.43s (0.15ms/blk)
    11  - Connect postprocess: 259.89s (0.47ms/blk)
    

    Branch run 2

    Wall clock time (hh:mm:ss): 37:28:24 (134904s)

     0Bitcoin Core version v27.99.0-a0dddf8b4092 (release build)
     1- Connect block: 117991.55s (144.77ms/blk)
     2  - Connect total: 101298.20s (124.29ms/blk)
     3    - Sanity checks: 11.17s (0.01ms/blk)
     4    - Fork checks: 151.24s (0.19ms/blk)
     5    - Verify 7077 txins: 98446.38s (120.79ms/blk)
     6      - Connect 1760 transactions: 98339.79s (120.66ms/blk)
     7    - Write undo data: 2484.75s (3.05ms/blk)
     8    - Index writing: 36.62s (0.04ms/blk)
     9  - Flush: 3892.28s (4.78ms/blk)
    10  - Writing chainstate: 12446.33s (15.27ms/blk)
    11  - Connect postprocess: 259.11s (0.32ms/blk)
    

    Historical background

    At some point prior to the oldest git commit for the repo, an allocator secure_allocator was added that zeroes out memory on deallocation with memset(), and was used as the allocator for the vector vch in CDataStream (now DataStream).

    In July 2011, PR #352 adding support for encrypted wallets modified secure_allocator to also mlock() data on allocation to prevent the wallet passphrase or other secrets from being paged to swap space (written to disk).

    In January 2012, findings were shared (https://bitcointalk.org/index.php?topic=56491.0) that #352 modifying CDataStream’s allocator slowed down IBD substantially1, since CDataStream was used in many places that did not need the guarantees of mlock(), and since every call to mlock() results in a flush of the TLB (a cache that maps virtual memory to physical memory).

    PR #740 was opened to fix this, initially2 by removing the custom allocator secure_allocator from CDataStream’s vector_type:

    0 class CDataStream
    1 {
    2 protected:
    3-    typedef std::vector<char, secure_allocator<char> > vector_type;
    4+    typedef std::vector<char> vector_type;
    5     vector_type vch;
    

    A reviewer of #740 suggested that dropping mlock() was a good idea, but that the original behavior of zeroing-after-freeing (should it be zeroing-before-freeing?) CDataStream should be restored as a mitigation for buffer overflows:

    I love the performance improvement, but I still don’t like the elimination of zero-after-free. Security in depth is important.

    Here’s the danger:

    Attacker finds a remotely-exploitable buffer overrun somewhere in the networking code that crashes the process. They turn the crash into a full remote exploit by sending carefully constructed packets before the crash packet, to initialize used-but-then-freed memory to a known state.

    Unlikely? Sure.

    Is it ugly to define a zero_after_free_allocator for CDataStream? Sure. (simplest implementation: copy secure_allocator, remove the mlock/munlock calls).

    But given that CDataStream is the primary interface between bitcoin and the network, I think being extra paranoid here is a very good idea.

    Another reviewer benchmarked CDataStream with an allocator that zeroed memory using memset without mlocking it and found that performance was almost identical to the default allocator, while both were substantially faster than the mlocking variant of CDataStream. (https://web.archive.org/web/20130622160044/https://people.xiph.org/~greg/bitcoin-sync.png).

    Based on the benchmark, and the potential security benefit, the zero_after_free allocator was created and used as CDataStream’s allocator.

    In November 2012, PR #1992 was opened to address the fact that in many cases memset() calls are optimized away by compilers as part of a family of compiler optimizations called dead store elimination by replacing the memset call with openssl’s OPENSSL_cleanse which is meant to solve this problem by doing things that spook compilers into not wanting to optimize the memset. Given that all of the data being zero’ed out in the deallocator is also having it’s only pointer destroyed, these memset calls were candidates for being optimized.

    I suspect that the reason no performance regression was found in the benchmarking of #740 which introduced the zero_after_free allocator is that the memset calls were being optimized out.

    I am not the first to suggest that this is a performance issue:

    https://bitcoin-irc.chaincode.com/bitcoin-core-dev/2015-11-06#1446837840-1446854100;

    https://bitcoin-irc.chaincode.com/bitcoin-core-dev/2016-11-23#1479883620-1479882900;

    Or to write a patch changing it:

    https://github.com/bitcoin/bitcoin/commit/671c724716abdd69b9d253a01f8fec67a37ab7d7

    All uses of DataStream and SerializeData

    ( ⚠️ when opening: very long)

    I performed this review on commit 39219fe145e5e6e6f079b591e3f4b5fea8e71804

    I look, briefly, at every single use of DataStream outside of test code, to see whether or not it contains secret information that should be zeroed out, or should be mlocked to prevent paging to swap.

    I’ve taken liberties to editorialize some of the codeblocks below for legibility, and all comments that have [] are my own.

    DataStream

    In src/addrdb.cpp+src/addrdb.h:

    0/** Only used by tests. */
    1void ReadFromStream(AddrMan& addr, DataStream& ssPeers);
    

    Only used by tests.


    In src/addrman.cpp Addrman::Serialize(DataStream&) & Unserialize(DataStream&), are explicitly instantiated, these are used in SerializeFileDB and DeserializeDB which are used to serialize (DumpPeerAddresses) addrman to disk, and to deserialize addrman from disk (LoadAddrman).

    The most valuable secret seems to be addrman’s nKey used to determine the address buckets randomly.


    In src/blockencodings.cpp:

    0void CBlockHeaderAndShortTxIDs::FillShortTxIDSelector() const {
    1    DataStream stream{};
    2    stream << header << nonce;
    3    CSHA256 hasher;
    4    hasher.Write((unsigned char*)&(*stream.begin()), stream.end() - stream.begin());
    5    uint256 shorttxidhash;
    6    hasher.Finalize(shorttxidhash.begin());
    7    shorttxidk0 = shorttxidhash.GetUint64(0);
    8    shorttxidk1 = shorttxidhash.GetUint64(1);
    9}
    

    Here we are just using the DataStream to be able to Serialize the block header and nonce into a string of bytes that get hashed to make short id k0 and k1 for BIP 152.

    This gets invoked when we construct a CBlockHeaderandShortTxIDs for an INV of type MSG_CMPCT_BLOCK in PeerManagerImpl::SendMessage().


    In src/common/blooms.cpp:

    DataStream is used to deserialize outpoints into our bloom filter, these are not secrets in any way:

    0void CBloomFilter::insert(const COutPoint& outpoint)
    1{
    2    DataStream stream{};
    3    stream << outpoint;
    4    insert(MakeUCharSpan(stream));
    5}
    

    In src/core_read.cpp:

    DataStream is used in DecodeTx for serialization/deserialization of the transaction data, used afaict only in RPC’s for deserializing user arguments into CMutableTransaction’s.

    It’s used in DecodeHexBlockHeader()which deserializes a block header argument into a CBlockHeader for the submitheader rpc.

    Similar for DecodeHexBlk() used by the getblocktemplate and submitblock rpc’s.


    In src/core_write.cpp:

    0void CBloomFilter::insert(const COutPoint& outpoint)
    1{
    2    DataStream stream{};
    3    stream << outpoint;
    4    insert(MakeUCharSpan(stream));
    5}
    

    EncodeHexTx is only used in RPC’s, and transaction data does not contain secrets.


    In dbwrapper.h and dbwrapper.cpp it is used exclusively to serialize and deserialize coinsdb keys and values, none of which is secret.


    In src/external_signer:

    0bool ExternalSigner::SignTransaction(PartiallySignedTransaction& psbtx, std::string& error)
    1{
    2    // Serialize the PSBT
    3    DataStream ssTx{};
    4    ssTx << psbtx;
    

    I don’t think this is a secret, but I don’t know enough about PSBT’s to be sure.


    There is some scaffolding for being able to transmit serializable stuff over the IPC wire in src/capnp/common-types.h, I assume this depends on how it’s used, nothing essentially secret.


    In src/kernel/coinstats.cpp:

    0void ApplyCoinHash(MuHash3072& muhash, const COutPoint& outpoint, const Coin& coin)
    1{
    2    DataStream ss{};
    3    TxOutSer(ss, outpoint, coin);
    4    muhash.Insert(MakeUCharSpan(ss));
    5}
    

    Here it’s used for serializing oupoints and coins for creating the AssumeUTXO assumed utxo set hash, nothing secret.


    In src/net.cpp:

    In ConvertSeeds() serialized seeds get converted into usable address objects, we initialize a DataStream with the input seeds that we are going to try connecting to during node bootstrapping.

     0//! Convert the serialized seeds into usable address objects.
     1static std::vector<CAddress> ConvertSeeds(const std::vector<uint8_t> &vSeedsIn)
     2{
     3    // It'll only connect to one or two seed nodes because once it connects,
     4    // it'll get a pile of addresses with newer timestamps.
     5    // Seed nodes are given a random 'last seen time' of between one and two
     6    // weeks ago.
     7    const auto one_week{7 * 24h};
     8    std::vector<CAddress> vSeedsOut;
     9    FastRandomContext rng;
    10    ParamsStream s{DataStream{vSeedsIn}, CAddress::V2_NETWORK};
    11    while (!s.eof()) {
    12        CService endpoint;
    13        s >> endpoint;
    14        CAddress addr{endpoint, SeedsServiceFlags()};
    15        addr.nTime = rng.rand_uniform_delay(Now<NodeSeconds>() - one_week, -one_week);
    16        LogDebug(BCLog::NET, "Added hardcoded seed: %s\n", addr.ToStringAddrPort());
    17        vSeedsOut.push_back(addr);
    18    }
    19    return vSeedsOut;
    20}
    

    It is also used for creating an empty CNetMessage which has a DataStream member in CNetMessage V2Transport::GetReceivedMessage():

     0//! Convert the serialized seeds into usable address objects.
     1static std::vector<CAddress> ConvertSeeds(const std::vector<uint8_t> &vSeedsIn)
     2{
     3    // It'll only connect to one or two seed nodes because once it connects,
     4    // it'll get a pile of addresses with newer timestamps.
     5    // Seed nodes are given a random 'last seen time' of between one and two
     6    // weeks ago.
     7    const auto one_week{7 * 24h};
     8    std::vector<CAddress> vSeedsOut;
     9    FastRandomContext rng;
    10    ParamsStream s{DataStream{vSeedsIn}, CAddress::V2_NETWORK};
    11    while (!s.eof()) {
    12        CService endpoint;
    13        s >> endpoint;
    14        CAddress addr{endpoint, SeedsServiceFlags()};
    15        addr.nTime = rng.rand_uniform_delay(Now<NodeSeconds>() - one_week, -one_week);
    16        LogDebug(BCLog::NET, "Added hardcoded seed: %s\n", addr.ToStringAddrPort());
    17        vSeedsOut.push_back(addr);
    18    }
    19    return vSeedsOut;
    20}
    

    In net.h

    CNetMessage the universal p2p message container used a DataStream to store received message data.

     0/** Transport protocol agnostic message container.
     1 * Ideally it should only contain receive time, payload,
     2 * type and size.
     3 */
     4class CNetMessage
     5{
     6public:
     7    DataStream m_recv;                   //!< received message data
     8    std::chrono::microseconds m_time{0}; //!< time of message receipt
     9    uint32_t m_message_size{0};          //!< size of the payload
    10    uint32_t m_raw_message_size{0};      //!< used wire size of the message (including header/checksum)
    11    std::string m_type;
    12
    13    explicit CNetMessage(DataStream&& recv_in) : m_recv(std::move(recv_in)) {}
    14    // Only one CNetMessage object will exist for the same message on either
    15    // the receive or processing queue. For performance reasons we therefore
    16    // delete the copy constructor and assignment operator to avoid the
    17    // possibility of copying CNetMessage objects.
    18    CNetMessage(CNetMessage&&) = default;
    19    CNetMessage(const CNetMessage&) = delete;
    20    CNetMessage& operator=(CNetMessage&&) = default;
    21    CNetMessage& operator=(const CNetMessage&) = delete;
    22};
    

    It’s also used for the lower level handling of messages, including partially received header buffers and received socket data in V1Transport as in v2 transport above in net.cpp.

     0/** Transport protocol agnostic message container.
     1 * Ideally it should only contain receive time, payload,
     2 * type and size.
     3 */
     4class CNetMessage
     5{
     6public:
     7    DataStream m_recv;                   //!< received message data
     8    std::chrono::microseconds m_time{0}; //!< time of message receipt
     9    uint32_t m_message_size{0};          //!< size of the payload
    10    uint32_t m_raw_message_size{0};      //!< used wire size of the message (including header/checksum)
    11    std::string m_type;
    12
    13    explicit CNetMessage(DataStream&& recv_in) : m_recv(std::move(recv_in)) {}
    14    // Only one CNetMessage object will exist for the same message on either
    15    // the receive or processing queue. For performance reasons we therefore
    16    // delete the copy constructor and assignment operator to avoid the
    17    // possibility of copying CNetMessage objects.
    18    CNetMessage(CNetMessage&&) = default;
    19    CNetMessage(const CNetMessage&) = delete;
    20    CNetMessage& operator=(CNetMessage&&) = default;
    21    CNetMessage& operator=(const CNetMessage&) = delete;
    22};
    

    In src/net_processing.cpp it used for representing the received data when processing messages in the great PeerManagerImpl::ProcessMessage():

    0void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type, DataStream& vRecv,
    1                                     const std::chrono::microseconds time_received,
    2                                     const std::atomic<bool>& interruptMsgProc)
    3{
    

    And for Processing BIP 157 cfilters:

     0/**
     1 * Handle a cfilters request.
     2 *
     3 * May disconnect from the peer in the case of a bad request.
     4 *
     5 * [@param](/bitcoin-bitcoin/contributor/param/)[in]   node            The node that we received the request from
     6 * [@param](/bitcoin-bitcoin/contributor/param/)[in]   peer            The peer that we received the request from
     7 * [@param](/bitcoin-bitcoin/contributor/param/)[in]   vRecv           The raw message received
     8 */
     9void PeerManagerImpl::ProcessGetCFilters(CNode& node, Peer& peer, DataStream& vRecv)
    10{
    11    uint8_t filter_type_ser;
    12    uint32_t start_height;
    13    uint256 stop_hash;
    14
    15    vRecv >> filter_type_ser >> start_height >> stop_hash;
    16
    17    const BlockFilterType filter_type = static_cast<BlockFilterType>(filter_type_ser);
    18
    19    const CBlockIndex* stop_index;
    20    BlockFilterIndex* filter_index;
    21    if (!PrepareBlockFilterRequest(node, peer, filter_type, start_height, stop_hash,
    22                                   MAX_GETCFILTERS_SIZE, stop_index, filter_index)) {
    23        return;
    24    }
    25
    26    std::vector<BlockFilter> filters;
    27    if (!filter_index->LookupFilterRange(start_height, stop_index, filters)) {
    28        LogDebug(BCLog::NET, "Failed to find block filter in index: filter_type=%s, start_height=%d, stop_hash=%s\n",
    29                     BlockFilterTypeName(filter_type), start_height, stop_hash.ToString());
    30        return;
    31    }
    32
    33    for (const auto& filter : filters) {
    34        MakeAndPushMessage(node, NetMsgType::CFILTER, filter);
    35    }
    

    and bip 157 cfheaders:

     0/**
     1 * Handle a cfheaders request.
     2 *
     3 * May disconnect from the peer in the case of a bad request.
     4 *
     5 * [@param](/bitcoin-bitcoin/contributor/param/)[in]   node            The node that we received the request from
     6 * [@param](/bitcoin-bitcoin/contributor/param/)[in]   peer            The peer that we received the request from
     7 * [@param](/bitcoin-bitcoin/contributor/param/)[in]   vRecv           The raw message received
     8 */
     9 void PeerManagerImpl::ProcessGetCFHeaders(CNode& node, Peer& peer, DataStream& vRecv)
    10{
    11    uint8_t filter_type_ser;
    12    uint32_t start_height;
    13    uint256 stop_hash;
    14
    15    vRecv >> filter_type_ser >> start_height >> stop_hash;
    16
    17    const BlockFilterType filter_type = static_cast<BlockFilterType>(filter_type_ser);
    18
    19    const CBlockIndex* stop_index;
    20    BlockFilterIndex* filter_index;
    21    if (!PrepareBlockFilterRequest(node, peer, filter_type, start_height, stop_hash,
    22                                   MAX_GETCFHEADERS_SIZE, stop_index, filter_index)) {
    23        return;
    24    }
    25
    26    uint256 prev_header;
    27    if (start_height > 0) {
    28        const CBlockIndex* const prev_block =
    29            stop_index->GetAncestor(static_cast<int>(start_height - 1));
    30        if (!filter_index->LookupFilterHeader(prev_block, prev_header)) {
    31            LogDebug(BCLog::NET, "Failed to find block filter header in index: filter_type=%s, block_hash=%s\n",
    32                         BlockFilterTypeName(filter_type), prev_block->GetBlockHash().ToString());
    33            return;
    34        }
    35    }
    36
    37    std::vector<uint256> filter_hashes;
    38    if (!filter_index->LookupFilterHashRange(start_height, stop_index, filter_hashes)) {
    39        LogDebug(BCLog::NET, "Failed to find block filter hashes in index: filter_type=%s, start_height=%d, stop_hash=%s\n",
    40                     BlockFilterTypeName(filter_type), start_height, stop_hash.ToString());
    41        return;
    42    }
    43
    44    MakeAndPushMessage(node, NetMsgType::CFHEADERS,
    45              filter_type_ser,
    46              stop_index->GetBlockHash(),
    47              prev_header,
    48              filter_hashes);
    49}
    

    In src/psbt.cpp:

     0bool DecodeRawPSBT(PartiallySignedTransaction& psbt, Span<const std::byte> tx_data, std::string& error)
     1{
     2    DataStream ss_data{tx_data};
     3    try {
     4        ss_data >> psbt;
     5        if (!ss_data.empty()) {
     6            error = "extra data after PSBT";
     7            return false;
     8        }
     9    } catch (const std::exception& e) {
    10        error = e.what();
    11        return false;
    12    }
    13    return true;
    14}
    

    It is used for deserializing hex data into a PartiallySignedTransaction object.


    In src/qt/psbtoperationsdialog.cpp:

    Bitcoin Qt interface for Copying psbt to clipboard:

    0void PSBTOperationsDialog::copyToClipboard() {
    1    DataStream ssTx{};
    2    ssTx << m_transaction_data;
    3    GUIUtil::setClipboard(EncodeBase64(ssTx.str()).c_str());
    4    showStatus(tr("PSBT copied to clipboard."), StatusLevel::INFO);
    5}
    

    Saving PSBT to disk:

     0void PSBTOperationsDialog::saveTransaction() {
     1    DataStream ssTx{};
     2    ssTx << m_transaction_data;
     3
     4    QString selected_filter;
     5    QString filename_suggestion = "";
     6    bool first = true;
     7    for (const CTxOut& out : m_transaction_data.tx->vout) {
     8        if (!first) {
     9            filename_suggestion.append("-");
    10        }
    11        CTxDestination address;
    12        ExtractDestination(out.scriptPubKey, address);
    13        QString amount = BitcoinUnits::format(m_client_model->getOptionsModel()->getDisplayUnit(), out.nValue);
    14        QString address_str = QString::fromStdString(EncodeDestination(address));
    15        filename_suggestion.append(address_str + "-" + amount);
    16        first = false;
    17    }
    18    filename_suggestion.append(".psbt");
    19    QString filename = GUIUtil::getSaveFileName(this,
    20        tr("Save Transaction Data"), filename_suggestion,
    21        //: Expanded name of the binary PSBT file format. See: BIP 174.
    22        tr("Partially Signed Transaction (Binary)") + QLatin1String(" (*.psbt)"), &selected_filter);
    23    if (filename.isEmpty()) {
    24        return;
    25    }
    26    std::ofstream out{filename.toLocal8Bit().data(), std::ofstream::out | std::ofstream::binary};
    27    out << ssTx.str();
    28    out.close();
    29    showStatus(tr("PSBT saved to disk."), StatusLevel::INFO);
    30}
    

    In src/qt/recentrequestsstablemodel.cpp:

     0// called when adding a request from the GUI
     1void RecentRequestsTableModel::addNewRequest(const SendCoinsRecipient &recipient)
     2{
     3    RecentRequestEntry newEntry;
     4    newEntry.id = ++nReceiveRequestsMaxId;
     5    newEntry.date = QDateTime::currentDateTime();
     6    newEntry.recipient = recipient;
     7
     8    DataStream ss{};
     9    ss << newEntry;
    10
    11    if (!walletModel->wallet().setAddressReceiveRequest(DecodeDestination(recipient.address.toStdString()), ToString(newEntry.id), ss.str()))
    12        return;
    13
    14    addNewRequest(newEntry);
    15}
    

    I am not very familiar with the GUI but as far as I can tell the RecentRequestsTable stores and displays receive addresses / payment requests that you’ve generated. Here the SendCoinsRecipient of payment request consists of an address, a label, an amount, and a memo/message. We serialize the recipient and other data about the request, an ID, and a date/time for the request, and then pass the string into a function which will store it in the RecentRequestsTable.


    In src/qt/sendcoinsdialog.cpp:

     0void SendCoinsDialog::presentPSBT(PartiallySignedTransaction& psbtx)
     1{
     2    // Serialize the PSBT
     3    DataStream ssTx{};
     4    ssTx << psbtx;
     5    GUIUtil::setClipboard(EncodeBase64(ssTx.str()).c_str());
     6    QMessageBox msgBox(this);
     7    //: Caption of "PSBT has been copied" messagebox
     8    msgBox.setText(tr("Unsigned Transaction", "PSBT copied"));
     9    msgBox.setInformativeText(tr("The PSBT has been copied to the clipboard. You can also save it."));
    10    msgBox.setStandardButtons(QMessageBox::Save | QMessageBox::Discard);
    11    msgBox.setDefaultButton(QMessageBox::Discard);
    12    msgBox.setObjectName("psbt_copied_message");
    13    switch (msgBox.exec()) {
    14    case QMessageBox::Save: {
    15        QString selectedFilter;
    16        QString fileNameSuggestion = "";
    17        bool first = true;
    18        for (const SendCoinsRecipient &rcp : m_current_transaction->getRecipients()) {
    19            if (!first) {
    20                fileNameSuggestion.append(" - ");
    21            }
    22            QString labelOrAddress = rcp.label.isEmpty() ? rcp.address : rcp.label;
    23            QString amount = BitcoinUnits::formatWithUnit(model->getOptionsModel()->getDisplayUnit(), rcp.amount);
    24            fileNameSuggestion.append(labelOrAddress + "-" + amount);
    25            first = false;
    26        }
    27        fileNameSuggestion.append(".psbt");
    28        QString filename = GUIUtil::getSaveFileName(this,
    29            tr("Save Transaction Data"), fileNameSuggestion,
    30            //: Expanded name of the binary PSBT file format. See: BIP 174.
    31            tr("Partially Signed Transaction (Binary)") + QLatin1String(" (*.psbt)"), &selectedFilter);
    32        if (filename.isEmpty()) {
    33            return;
    34        }
    35        std::ofstream out{filename.toLocal8Bit().data(), std::ofstream::out | std::ofstream::binary};
    36        out << ssTx.str();
    37        out.close();
    38        //: Popup message when a PSBT has been saved to a file
    39        Q_EMIT message(tr("PSBT saved"), tr("PSBT saved to disk"), CClientUIInterface::MSG_INFORMATION);
    40        break;
    41    }
    42    case QMessageBox::Discard:
    43        break;
    44    default:
    45        assert(false);
    46    } // msgBox.exec()
    47}
    

    Here it’s used to serialize the PSBT in order to display it to the user during the process of sending in the GUI.


    In src/qt/walletmodel.cpp:

    DataStream’s are used to serialize PSBT’s when fee bumping a stuck transaction in:

    0bool WalletModel::bumpFee(uint256 hash, uint256& new_hash)
    

    and to serialize the sent transaction in WalletModel::sendCoins():

     0void WalletModel::sendCoins(WalletModelTransaction& transaction)
     1{
     2    QByteArray transaction_array; /* store serialized transaction */
     3
     4    {
     5        std::vector<std::pair<std::string, std::string>> vOrderForm;
     6        for (const SendCoinsRecipient &rcp : transaction.getRecipients())
     7        {
     8            if (!rcp.message.isEmpty()) // Message from normal bitcoin:URI (bitcoin:123...?message=example)
     9                vOrderForm.emplace_back("Message", rcp.message.toStdString());
    10        }
    11
    12        auto& newTx = transaction.getWtx();
    13        wallet().commitTransaction(newTx, /*value_map=*/{}, std::move(vOrderForm));
    14
    15        DataStream ssTx;
    16        ssTx << TX_WITH_WITNESS(*newTx);
    17        transaction_array.append((const char*)ssTx.data(), ssTx.size());
    18    }
    19
    20    // Add addresses / update labels that we've sent to the address book,
    21    // and emit coinsSent signal for each recipient
    22    for (const SendCoinsRecipient &rcp : transaction.getRecipients())
    23    {
    24        // [...]
    25        Q_EMIT coinsSent(this, rcp, transaction_array);
    26    }
    27
    28    checkBalanceChanged(m_wallet->getBalances()); // update balance immediately, otherwise there could be a short noticeable delay until pollBalanceChanged hits
    29}
    

    In src/rest.cpp:

    DataStream is used by Bitcoin Core’s REST interface to serialize responses to requests for headers in rest_headers(), blocks in rest_block(), blockfilterheaders in rest_filter_header() blockfilters in rest_block_filter(), tx’s in rest_tx() utxo’s in rest_getutxos() and blockhashes in rest_blockhash_by_height().


    In src/rpc/blockchain.cpp:

    DataStream is used to serialize the block header in the getblockheader rpc command:

    0    if (!fVerbose)
    1    {
    2        DataStream ssBlock{};
    3        ssBlock << pblockindex->GetBlockHeader();
    4        std::string strHex = HexStr(ssBlock);
    5        return strHex;
    6    }
    

    and to deserialize the block data into a CBlock in the getblock rpc command:

    0    const std::vector<uint8_t> block_data{GetRawBlockChecked(chainman.m_blockman, *pblockindex)};
    1
    2    DataStream block_stream{block_data};
    3    CBlock block{};
    4    block_stream >> TX_WITH_WITNESS(block);
    5
    6    return blockToJSON(chainman.m_blockman, block, *tip, *pblockindex, tx_verbosity);
    

    In src/rpc/mining.cpp:

    DataStream is used by the generateblock rpc for serializing the output hex of a generated block when generateblock is called with submit=false:

    0    UniValue obj(UniValue::VOBJ);
    1    obj.pushKV("hash", block_out->GetHash().GetHex());
    2    if (!process_new_block) {
    3        DataStream block_ser;
    4        block_ser << TX_WITH_WITNESS(*block_out);
    5        obj.pushKV("hex", HexStr(block_ser));
    6    }
    

    In src/rpc/rawtransaction.cpp:

    DataStream is used to serialize the resulting PSBT’s that get passed to EncodeBase64() and returned in combinepsbt:

    0static RPCHelpMan combinepsbt()
    1    // [ ..preparing merged_psbt.. ]
    2
    3    DataStream ssTx{};
    4    ssTx << merged_psbt;
    5    return EncodeBase64(ssTx);
    

    and finalizepsbt() which also might serialize the final transaction hex using a DataStream of TX_WITH_WITNESS(tx) passed to HexStr():

     0static RPCHelpMan finalizepsbt()
     1{
     2    // Unserialize the transactions
     3    PartiallySignedTransaction psbtx;
     4    std::string error;
     5    if (!DecodeBase64PSBT(psbtx, request.params[0].get_str(), error)) {
     6        throw JSONRPCError(RPC_DESERIALIZATION_ERROR, strprintf("TX decode failed %s", error));
     7    }
     8
     9    bool extract = request.params[1].isNull() || (!request.params[1].isNull() && request.params[1].get_bool());
    10
    11    CMutableTransaction mtx;
    12    bool complete = FinalizeAndExtractPSBT(psbtx, mtx);
    13
    14    UniValue result(UniValue::VOBJ);
    15    DataStream ssTx{};
    16    std::string result_str;
    17
    18    if (complete && extract) {
    19        ssTx << TX_WITH_WITNESS(mtx);
    20        result_str = HexStr(ssTx);
    21        result.pushKV("hex", result_str);
    22    } else {
    23        ssTx << psbtx;
    24        result_str = EncodeBase64(ssTx.str());
    25        result.pushKV("psbt", result_str);
    26    }
    27    result.pushKV("complete", complete);
    28
    29    return result;
    30}
    

    and in createpsbt:

     0static RPCHelpMan createpsbt()
     1{
     2
     3    std::optional<bool> rbf;
     4    if (!request.params[3].isNull()) {
     5        rbf = request.params[3].get_bool();
     6    }
     7    CMutableTransaction rawTx = ConstructTransaction(request.params[0], request.params[1], request.params[2], rbf);
     8
     9    // Make a blank psbt
    10    PartiallySignedTransaction psbtx;
    11    psbtx.tx = rawTx;
    12    for (unsigned int i = 0; i < rawTx.vin.size(); ++i) {
    13        psbtx.inputs.emplace_back();
    14    }
    15    for (unsigned int i = 0; i < rawTx.vout.size(); ++i) {
    16        psbtx.outputs.emplace_back();
    17    }
    18
    19    // Serialize the PSBT
    20    DataStream ssTx{};
    21    ssTx << psbtx;
    22
    23    return EncodeBase64(ssTx);
    24}
    

    and in utxoupdatepsbt():

     0static RPCHelpMan utxoupdatepsbt()
     1{
     2    // Parse descriptors, if any.
     3    FlatSigningProvider provider;
     4    if (!request.params[1].isNull()) {
     5        auto descs = request.params[1].get_array();
     6        for (size_t i = 0; i < descs.size(); ++i) {
     7            EvalDescriptorStringOrObject(descs[i], provider);
     8        }
     9    }
    10
    11    // We don't actually need private keys further on; hide them as a precaution.
    12    const PartiallySignedTransaction& psbtx = ProcessPSBT(
    13        request.params[0].get_str(),
    14        request.context,
    15        HidingSigningProvider(&provider, /*hide_secret=*/true, /*hide_origin=*/false),
    16        /*sighash_type=*/SIGHASH_ALL,
    17        /*finalize=*/false);
    18
    19    DataStream ssTx{};
    20    ssTx << psbtx;
    21    return EncodeBase64(ssTx);
    22}
    

    and joinpsbts:

    0static RPCHelpMan joinpsbts()
    1    // [ ... prepare PartiallySignedTransaction shuffled psbt ... ]
    2    DataStream ssTx{};
    3    ssTx << shuffled_psbt;
    4    return EncodeBase64(ssTx);
    5}
    

    and in descriptorprocesspsbt, which like finalizepsbt above might also use DataStream for serializing a final transaction hex that gets passed to HexStr and return if the psbt is complete:

     0RPCHelpMan descriptorprocesspsbt()
     1    // [ ...prepare PartiallySignedTransaction &psbtx... ]
     2    DataStream ssTx{};
     3    ssTx << psbtx;
     4
     5    UniValue result(UniValue::VOBJ);
     6
     7    result.pushKV("psbt", EncodeBase64(ssTx));
     8    result.pushKV("complete", complete);
     9    if (complete) {
    10        CMutableTransaction mtx;
    11        PartiallySignedTransaction psbtx_copy = psbtx;
    12        CHECK_NONFATAL(FinalizeAndExtractPSBT(psbtx_copy, mtx));
    13        DataStream ssTx_final;
    14        ssTx_final << TX_WITH_WITNESS(mtx);
    15        result.pushKV("hex", HexStr(ssTx_final));
    16    }
    17    return result;
    18}
    

    In src/rpc/txoutproof:

    It is used for serializing the merkle inclusion proof in gettxoutproof():

    0static RPCHelpMan gettxoutproof()
    1{
    2    // [...]
    3
    4    DataStream ssMB{};
    5    CMerkleBlock mb(block, setTxids);
    6    ssMB << mb;
    7    std::string strHex = HexStr(ssMB);
    8    return strHex;
    9}
    

    and for deserializing the inclusion proof in verifytxoutproof:

    0static RPCHelpMan verifytxoutproof()
    1{
    2    DataStream ssMB{ParseHexV(request.params[0], "proof")};
    3    CMerkleBlock merkleBlock;
    4    ssMB >> merkleBlock;
    5
    6    // [ ... Validate merkleBlock ... ] 
    7}
    

    Wallet

    If wallet is unencrypted on disk, I feel there is no reason for us to be delicate about how it is handled in memory.

    How wallet disk encryption happens

    My understanding of the way that wallet encryption on disk works is that keys and values are written and read by the wallet in crypted form, and they are decrypted/encrypted in memory by ScriptPubKeyMan, for example:

     0// [ Getting the private key for `CKeyID` address and storing the result 
     1//   in `CKey& keyOut` ]
     2bool LegacyDataSPKM::GetKey(const CKeyID &address, CKey& keyOut) const
     3{
     4    LOCK(cs_KeyStore);
     5    if (!m_storage.HasEncryptionKeys()) {
     6        return FillableSigningProvider::GetKey(address, keyOut);
     7    }
     8
     9    // [ a map of crypted keys is created on legacy wallet load in
    10    //   `LoadLegacyWalletRecords()` ]
    11    CryptedKeyMap::const_iterator mi = mapCryptedKeys.find(address);
    12    if (mi != mapCryptedKeys.end())
    13    {
    14        const CPubKey &vchPubKey = (*mi).second.first;
    15        const std::vector<unsigned char> &vchCryptedSecret = (*mi).second.second;
    16        // [ Use the encryption key to decrypt the crypted key from the map. ]
    17        return m_storage.WithEncryptionKey([&](const CKeyingMaterial& encryption_key) {
    18            return DecryptKey(encryption_key, vchCryptedSecret, vchPubKey, keyOut);
    19        });
    20    }
    21    return false;
    22}
    

    Because of this, we should not be vigilant about securing memory that contains crypted data from the disk.


    In src/wallet/bdb.cpp:

    BerkeleyDatabase::Rewrite() uses DataStream to serialize the keys and values from the existing db when rewriting the database.

    BerkeleyDatabase::Rewrite() is used when encrypting a wallet for the first time, since, according to comments “BDB might keep bits of the unencrypted private key in slack space in the database file.” or when we detect a wallet that was encrypted by version <0.5.0 and >0.4.0 of bitcoin, presumably because of some horrible bug in those versions. (PR #635

    But at this point, the wallet has already been encrypted, and we won’t be loading anything from slack space when rewriting the db, so no problems.

    BerkeleyCursor::Next() is used when cursoring through the BDB, and stores the retrieved Key and Value in DataStream’s, if the wallet is encrypted these will be crypted, if not, the keys are on disk in plaintext anyways.

    BerkeleyBatch::ReadKey() retrieves the value for a given key in the database:

     0bool BerkeleyBatch::ReadKey(DataStream&& key, DataStream& value)
     1{
     2    if (!pdb)
     3        return false;
     4
     5    SafeDbt datKey(key.data(), key.size());
     6
     7    SafeDbt datValue;
     8    int ret = pdb->get(activeTxn, datKey, datValue, 0);
     9    if (ret == 0 && datValue.get_data() != nullptr) {
    10        value.clear();
    11        value.write(SpanFromDbt(datValue));
    12        return true;
    13    }
    14    return false;
    15}
    

    This is not a concern because like above, this data is either in plaintext on disk, or it is being retrieved in crypted form and will be decrypted elsewhere by SPKM.

    Similar arguments to the above apply for BerkeleyBatch::WriteKey(), BerkeleyBatch::EraseKey(), and BerkeleyBatch::HasKey()


    In src/wallet/db.h:

    The same argument as above applies for keys and values used here in DatabaseBatch functions Read, Write, Erase, Exists:

     0/** RAII class that provides access to a WalletDatabase */
     1class DatabaseBatch
     2{
     3private:
     4    virtual bool ReadKey(DataStream&& key, DataStream& value) = 0;
     5    virtual bool WriteKey(DataStream&& key, DataStream&& value, bool overwrite = true) = 0;
     6    virtual bool EraseKey(DataStream&& key) = 0;
     7    virtual bool HasKey(DataStream&& key) = 0;
     8
     9public:
    10    template <typename K, typename T>
    11    bool Read(const K& key, T& value)
    12    {
    13        DataStream ssKey{};
    14        ssKey.reserve(1000);
    15        ssKey << key;
    16
    17        DataStream ssValue{};
    18        if (!ReadKey(std::move(ssKey), ssValue)) return false;
    19        try {
    20            ssValue >> value;
    21            return true;
    22        } catch (const std::exception&) {
    23            return false;
    24        }
    25    }
    26
    27    template <typename K, typename T>
    28    bool Write(const K& key, const T& value, bool fOverwrite = true)
    29    {
    30        DataStream ssKey{};
    31        ssKey.reserve(1000);
    32        ssKey << key;
    33
    34        DataStream ssValue{};
    35        ssValue.reserve(10000);
    36        ssValue << value;
    37
    38        return WriteKey(std::move(ssKey), std::move(ssValue), fOverwrite);
    39    }
    40
    41    template <typename K>
    42    bool Erase(const K& key)
    43    {
    44        DataStream ssKey{};
    45        ssKey.reserve(1000);
    46        ssKey << key;
    47
    48        return EraseKey(std::move(ssKey));
    49    }
    50
    51    template <typename K>
    52    bool Exists(const K& key)
    53    {
    54        DataStream ssKey{};
    55        ssKey.reserve(1000);
    56        ssKey << key;
    57
    58        return HasKey(std::move(ssKey));
    59    }
    60};
    

    In dump.cpp:

    DumpWallet() invoked by doing bitcoin-wallet dump prints all keys and values in a wallet, but does not decrypt them:

     0// [ I've editorialized this codeblock to focus on the part I'm interested in ]
     1bool DumpWallet(const ArgsManager& args, WalletDatabase& db, bilingual_str& error)
     2{
     3    // [.. handle dump file stuff ..]
     4    std::unique_ptr<DatabaseBatch> batch = db.MakeBatch();
     5    std::unique_ptr<DatabaseCursor> cursor = batch->GetNewCursor();
     6
     7    // Read the records
     8    while (true) {
     9        DataStream ss_key{};
    10        DataStream ss_value{};
    11        DatabaseCursor::Status status = cursor->Next(ss_key, ss_value);
    12        if (status == DatabaseCursor::Status::DONE) {
    13            ret = true;
    14            break;
    15        } else if (status == DatabaseCursor::Status::FAIL) {
    16            error = _("Error reading next record from wallet database");
    17            ret = false;
    18            break;
    19        }
    20        std::string key_str = HexStr(ss_key);
    21        std::string value_str = HexStr(ss_value);
    22        line = strprintf("%s,%s\n", key_str, value_str);
    23        dump_file.write(line.data(), line.size());
    24        hasher << Span{line};
    25    }
    26
    27    cursor.reset();
    28    batch.reset();
    29
    30    // [.. handle dump file stuff ..]
    31
    32    return ret;
    33}
    

    In src/wallet/migrate.cpp & src/wallet/migrate.h:

    BerkeleyRO* exist so that we can read keys and values from a legacy bdb wallet when migrating so that we can drop the bdb wallet entirely in the future, the same as in db.h applies here, all the ekys and values read in BerkeleyROBatch::ReadKey(), HasKey and BerkeleyROCursor::Next() are crypted as in their non-RO counterparts found above.


    In src/wallet/rpc/backup.cpp:

    DataStream is used to serialize the transaction inclusion proof argument to the importprunedfunds() rpc which lets pruned nodes import funds without rescanning if they have inclusion proofs similar to above in src/rpc/txoutproof.cpp.

     0RPCHelpMan importprunedfunds()
     1{
     2    std::shared_ptr<CWallet> const pwallet = GetWalletForJSONRPCRequest(request);
     3    if (!pwallet) return UniValue::VNULL;
     4
     5    CMutableTransaction tx;
     6    if (!DecodeHexTx(tx, request.params[0].get_str())) {
     7        throw JSONRPCError(RPC_DESERIALIZATION_ERROR, "TX decode failed. Make sure the tx has at least one input.");
     8    }
     9    uint256 hashTx = tx.GetHash();
    10
    11    DataStream ssMB{ParseHexV(request.params[1], "proof")};
    12    CMerkleBlock merkleBlock;
    13    ssMB >> merkleBlock;
    14
    15    // [.. validate merkle block ..]
    16
    17    // [.. add transactions to wallet.. ]
    18}
    

    In src/wallet/rpc/txoutproof.cpp:

    In static Univalue FinishTransaction used by the rpc’s send() and sendall(), DataStream is used to serialize the completed psbt and print it if either was called with psbt=true.

    In bumpfee_helper when invoked as the psbtbumpfee rpc, a DataStream is used to serialize the unsigned psbt of the new transaction that gets returned.

    In walletprocesspsbt() `DataStream is used to serialize the PSBT, and if the transaction is complete to serialize the final transaction:

     0RPCHelpMan walletprocesspsbt()
     1{
     2    // [...prepare psbtx...]
     3
     4    UniValue result(UniValue::VOBJ);
     5    DataStream ssTx{};
     6    ssTx << psbtx;
     7    result.pushKV("psbt", EncodeBase64(ssTx.str()));
     8    result.pushKV("complete", complete);
     9    if (complete) {
    10        CMutableTransaction mtx;
    11        // Returns true if complete, which we already think it is.
    12        CHECK_NONFATAL(FinalizeAndExtractPSBT(psbtx, mtx));
    13        DataStream ssTx_final;
    14        ssTx_final << TX_WITH_WITNESS(mtx);
    15        result.pushKV("hex", HexStr(ssTx_final));
    16    }
    17
    18    return result;
    19}
    

    in the walletcreatefundedpsbt rpc, it contains the serialized psbt


    In src/wallet/salvage.cpp:

    DataStream is used during RecoverDatabaseFile() when trying to recover key and value data from a db, nothing gets decrypted here:

     0    for (KeyValPair& row : salvagedData)
     1    {
     2        /* Filter for only private key type KV pairs to be added to the salvaged wallet */
     3        DataStream ssKey{row.first};
     4        DataStream ssValue(row.second);
     5        std::string strType, strErr;
     6
     7        // We only care about KEY, MASTER_KEY, CRYPTED_KEY, and HDCHAIN types
     8        ssKey >> strType;
     9        bool fReadOK = false;
    10        // [ The below just load the crypted form of the key, no decryption. ]
    11        if (strType == DBKeys::KEY) {
    12            fReadOK = LoadKey(&dummyWallet, ssKey, ssValue, strErr);
    13        } else if (strType == DBKeys::CRYPTED_KEY) {
    14            fReadOK = LoadCryptedKey(&dummyWallet, ssKey, ssValue, strErr);
    15        } else if (strType == DBKeys::MASTER_KEY) {
    16            fReadOK = LoadEncryptionKey(&dummyWallet, ssKey, ssValue, strErr);
    17        } else if (strType == DBKeys::HDCHAIN) {
    18            fReadOK = LoadHDChain(&dummyWallet, ssValue, strErr);
    19        } else {
    20            continue;
    21        }
    

    In src/wallet/sqlite.cpp & src/wallet/sqlite.h:

    SQLiteBatch::ReadKey, WriteKey, etc. and SQLiteCursor::next mirror berkeley and berkeley RO batches above, again: all reading crypted data from disk, data gets decrypted somewhere else, once it’s far away from it’s humble DataStream beginnings.


    In src/wallet/wallet.cpp:

    Used in MigrateToSQLite() when iterating through BDB with the bdb cursor:

     0bool CWallet::MigrateToSQLite(bilingual_str& error)
     1{
     2    while (true) {
     3        DataStream ss_key{};
     4        DataStream ss_value{};
     5        status = cursor->Next(ss_key, ss_value);
     6        if (status != DatabaseCursor::Status::MORE) {
     7            break;
     8        }
     9        SerializeData key(ss_key.begin(), ss_key.end());
    10        SerializeData value(ss_value.begin(), ss_value.end());
    11        records.emplace_back(key, value);
    12    }
    13    cursor.reset();
    14    batch.reset();
    15
    16    // [....insert the records in to the new sqlite db...] 
    17}
    

    In src/wallet/walletdb.cpp:

    Most of the arguments above about encrypted data on disk hold true here…

    0bool WalletBatch::IsEncrypted()
    1{
    2    DataStream prefix;
    3    prefix << DBKeys::MASTER_KEY;
    4    if (auto cursor = m_batch->GetNewPrefixCursor(prefix)) {
    5        DataStream k, v;
    6        if (cursor->Next(k, v) == DatabaseCursor::Status::MORE) return true;
    7    }
    8    return false;
    9}
    

    master encryption keys are stored in the db (in crypted form!), this is just serializing the master key prefix and then searching for such an entry, no secrets in the prefix!

    LoadKey and LoadCryptedKey don’t do any decryption of the keys. LoadKey just grabs all the keys that have the unencrypted key prefix as-is, and loadcryptedkey loads keys with the crypted key prefix as-is. The story is almost identical with LoadHDChain and LoadEncryptionKey and the same with the rest of the LoadRecords(), LoadLegacyWalletRecoreds(), and LoadDescriptorWalletRecords() circus.

    I definitely got tired and slacked a little while reviewing walletdb.cpp but I’m pretty confident about this.


    In src/zmq/zmpqpublishnotifier.cpp:

    0bool CZMQPublishRawTransactionNotifier::NotifyTransaction(const CTransaction &transaction)
    1{
    2    uint256 hash = transaction.GetHash();
    3    LogDebug(BCLog::ZMQ, "Publish rawtx %s to %s\n", hash.GetHex(), this->address);
    4    DataStream ss;
    5    ss << TX_WITH_WITNESS(transaction);
    6    return SendZmqMessage(MSG_RAWTX, &(*ss.begin()), ss.size());
    7}`
    

    Used to serialize the raw transaction that we are sending a ZeroMQ notification about.

    Not done yet, SerializeData

    Let’s also look at every instance of SerializeData being used, since this is a vector of bytes, with the zero_after_free_allocator:


    In src/wallet/migrate.cpp:

    Used in the BerkeleyROBatch::* family of ReadKey(), HasKey() to represent the vector portion of the same DataStream’s I used and described above that have just crypted key data, or unencrypted data if the wallet itself is unencrypted, e.g.:

     0
     1bool BerkeleyROBatch::ReadKey(DataStream&& key, DataStream& value)
     2{
     3    SerializeData key_data{key.begin(), key.end()};
     4    const auto it{m_database.m_records.find(key_data)};
     5    if (it == m_database.m_records.end()) {
     6        return false;
     7    }
     8    auto val = it->second;
     9    value.clear();
    10    value.write(Span(val));
    11    return true;
    12}
    

    In src/wallet/wallet.cpp:

    Used in MigrateToSQLite() as discussed above to store the DataStream data described above:

     0while (true) {
     1    DataStream ss_key{};
     2    DataStream ss_value{};
     3    status = cursor->Next(ss_key, ss_value);
     4    if (status != DatabaseCursor::Status::MORE) {
     5        break;
     6    }
     7    SerializeData key(ss_key.begin(), ss_key.end());
     8    SerializeData value(ss_value.begin(), ss_value.end());
     9    records.emplace_back(key, value);
    10}
    

    1. Maybe as much as 50x: #740 (comment) ↩︎

    2. I am assuming this from the discussion, github seems to not have dead commits for old pr’s ↩︎

  10. davidgumberg commented at 6:41 pm on September 27, 2024: contributor
    The CI failure on ARM is related, and I am able to reproduce locally. It is from a -Warray-bounds warning that I believe is spurious, I’m trying to create a minimal reproduction.
  11. Sjors commented at 11:55 am on October 1, 2024: member

    I tested this PR on an AMD Ryzen 7950x machine with Ubuntu 24.04, with one local network peer and a gigabit internet connection.

    0bitcoind -dbcache=30000 -stopatheight=863000 -blocksdir=/magnetic/.bitcoin -addnode=local-network
    

    Before: 5 hours 10 minutes After: 4 hours and 55 minutes

    Time includes 20 minutes to flush chainstate to disk during shutdown.

    That’s nowhere near a 25% difference and probably not statistically significant. Some configurations might do better than others from this change. It’s certainly not worse.

    The node was additionally patched to drop the -par limit and use 31 threads for signature validation (in both runs).

  12. davidgumberg renamed this:
    Don't zero-after-free `DataStream`: ~25% faster IBD
    Don't zero-after-free `DataStream`: ~25% faster IBD on some configurations
    on Oct 1, 2024
  13. theuni commented at 3:54 pm on October 1, 2024: member

    @davidgumberg Thanks for the very detailed description!

    I suspect the speedup here is going to be very dependent on the architecture/environment, but it’s not clear to me exactly what variables would matter most.

    Would you be interested in working up a specific zero-after-free benchmark that illustrates the difference? If we knew exactly where the speedup was coming from, we could make a more informed decision about what/where to optimize.

  14. davidgumberg commented at 9:49 pm on October 1, 2024: contributor

    I tested this PR on an AMD Ryzen 7950x machine with Ubuntu 24.04, with one local network peer and a gigabit internet connection.

    0bitcoind -dbcache=30000 -stopatheight=863000 -blocksdir=/magnetic/.bitcoin -addnode=local-network
    

    Before: 5 hours 10 minutes After: 4 hours and 55 minutes

    Time includes 20 minutes to flush chainstate to disk during shutdown. @Sjors Thanks for testing and showing that the benefit (if any) here is setup-dependent. I suspect that the biggest improvements will be seen on memory bandwidth constrained systems and when flushing/syncing coinsdb to disk, which you have skipped most of by running such a high dbcache and unpruned, so I think that your setup is worst case scenario for improvements seen from this change but this could be my own wishful thinking!

    I suspect the speedup here is going to be very dependent on the architecture/environment, but it’s not clear to me exactly what variables would matter most.

    Would you be interested in working up a specific zero-after-free benchmark that illustrates the difference? If we knew exactly where the speedup was coming from, we could make a more informed decision about what/where to optimize. @theuni Good point, especially given Sjors result, I will draft up a zero-after-free benchmark to isolate the causes / relevant factors of any performance benefit that might be here. I think a CDBWrapper::BatchWrite() benchmark could also be helpful for this and other PR’s like #30039 that claim to improve IBD write performance.

  15. davidgumberg renamed this:
    Don't zero-after-free `DataStream`: ~25% faster IBD on some configurations
    Don't zero-after-free `DataStream`: Faster IBD on some configurations
    on Oct 3, 2024
  16. DrahtBot added the label Needs rebase on Oct 8, 2024
  17. DrahtBot commented at 3:24 pm on October 8, 2024: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-08 16:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me