Add DataStream without ser-type and ser-version and use it where possible #25296

pull maflcko wants to merge 3 commits into bitcoin:master from maflcko:2206-datastream-🎑 changing 49 files +196 −181
  1. maflcko commented at 5:37 PM on June 7, 2022: member

    This was done in the context of #25284 , but I think it also makes sense standalone.

    The basic idea is that serialization type should not be initialized when it is not needed. Same for the serialization version.

    So do this here for DataStream. CDataStream remains in places where it is not yet possible.

  2. maflcko force-pushed on Jun 7, 2022
  3. laanwj added the label Refactoring on Jun 7, 2022
  4. maflcko force-pushed on Jun 7, 2022
  5. DrahtBot commented at 1:04 AM on June 8, 2022: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK stickies-v, aureleoules
    Concept ACK Empact

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #26715 (Introduce MockableDatabase for wallet unit tests by achow101)
    • #26705 (clang-tidy: Check headers and fixes them by hebasto)
    • #25297 (wallet: group independent db writes on single batched db transaction by furszy)
    • #24914 (wallet: Load database records in a particular order by achow101)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  6. DrahtBot added the label Needs rebase on Jun 24, 2022
  7. maflcko force-pushed on Jun 24, 2022
  8. DrahtBot removed the label Needs rebase on Jun 24, 2022
  9. maflcko force-pushed on Jul 22, 2022
  10. fanquake commented at 8:09 AM on July 22, 2022: member

    @sipa @laanwj @Empact you might be interested in reiviewing this given 25285 & 25331.

  11. maflcko force-pushed on Aug 1, 2022
  12. maflcko force-pushed on Aug 10, 2022
  13. maflcko force-pushed on Aug 10, 2022
  14. maflcko force-pushed on Aug 10, 2022
  15. DrahtBot added the label Needs rebase on Aug 17, 2022
  16. maflcko force-pushed on Aug 19, 2022
  17. DrahtBot removed the label Needs rebase on Aug 19, 2022
  18. maflcko force-pushed on Sep 2, 2022
  19. maflcko force-pushed on Sep 16, 2022
  20. in src/streams.h:208 in fa5a98f39a outdated
     215 |  
     216 |      template <typename... Args>
     217 | -    CDataStream(int nTypeIn, int nVersionIn, Args&&... args)
     218 | -        : nType{nTypeIn},
     219 | -          nVersion{nVersionIn}
     220 | +    static DataStream FromMany(Args&&... args)
    


    ryanofsky commented at 6:06 PM on October 6, 2022:

    In commit "streams: Add DataStream without ser-type and ser-version" (fa5a98f39a0b236be0ce468d71169a560f2dc489)

    What's going on with this FromMany constructor? It seem unused. Would be good to simplify the commit and drop it


    maflcko commented at 9:39 AM on November 23, 2022:

    Dropped


    maflcko commented at 12:23 PM on January 30, 2023:

    Also removed the constructor in https://github.com/bitcoin/bitcoin/pull/26992

  21. Empact commented at 7:38 PM on October 12, 2022: member

    Concept ACK, also curious about FromMany, seems extraneous.

  22. maflcko force-pushed on Nov 23, 2022
  23. maflcko commented at 9:39 AM on November 23, 2022: member

    Thanks, removed unused method for now (will be re-added later when it is needed).

  24. aureleoules approved
  25. aureleoules commented at 7:08 AM on November 24, 2022: member

    ACK fa04cd9b966d60a8777463499347c302ee75318b

  26. maflcko force-pushed on Nov 30, 2022
  27. maflcko force-pushed on Dec 15, 2022
  28. DrahtBot requested review from aureleoules on Dec 15, 2022
  29. maflcko force-pushed on Jan 3, 2023
  30. aureleoules approved
  31. aureleoules commented at 3:37 PM on January 4, 2023: member

    reACK fab2cec458b7a0773a115da41b03f6930110acd7

  32. fanquake requested review from ryanofsky on Jan 17, 2023
  33. fanquake requested review from stickies-v on Jan 18, 2023
  34. DrahtBot added the label Needs rebase on Jan 23, 2023
  35. streams: Add DataStream without ser-type and ser-version
    The moved parts can be reviewed with "--color-moved=dimmed-zebra".
    The one-char changes can be reviewed with "--word-diff-regex=.".
    fa9becfe1c
  36. maflcko force-pushed on Jan 24, 2023
  37. DrahtBot removed the label Needs rebase on Jan 24, 2023
  38. maflcko force-pushed on Jan 24, 2023
  39. stickies-v commented at 4:04 PM on January 24, 2023: contributor

    Concept ACK

    CDataStream operates on bytes, but is agnostic to which kind of data is buffered. As such, the constructor takes a serialization type and version, to allow the consumer of the buffer to properly (de)serialize the data (e.g. as used in CAddress). It is not used by CDataStream directly, so it's more like metadata. This data seems unnecessary for a lot of the callsites, so this PR allows us to construct a DataStream without serialization info, while keeping a CDataStream with serialization info. All callsites that do not use serialization info should now be switched over to DataStream.

    Out of curiosity: is there intent to phase out CDataStream entirely (through further refactoring)?

  40. maflcko commented at 4:10 PM on January 24, 2023: member

    Yes, see OP. Though, the changes here are stand-alone.

  41. stickies-v commented at 1:03 AM on January 25, 2023: contributor

    Approach ACK fa0e6640bac8c6426af7c5744125c85c0f74b9e5

    • The new DataStream class (and CDataStream subclass) looks quite straightforward when looking at the commit with --color-moved=dimmed_zebra.
    • I didn't manually verify the changed instances of CDataStream->DataStream (yet), as there are quite a few. I think it's reasonable enough to assume that if it compiles, it means the GetType() and GetVersion() functions are never called on that instance and it can safely be a DataStream. The main exception I see to that assumption is if the affected code is in a conditionally compiled block, such as e.g. in sighash_tests.cpp, in which case it may only fail with certain configurations.
      • Any other issues with that approach? I'm curious how other reviewers assessed this.
    • I think I've found some additional instances where we could be using a DataStream, as per the below diff:

    <details> <summary>git diff on fa0e6640bac8c6426af7c5744125c85c0f74b9e5</summary>

    diff --git a/src/qt/psbtoperationsdialog.cpp b/src/qt/psbtoperationsdialog.cpp
    index 17a65be0f..c8373d3d9 100644
    --- a/src/qt/psbtoperationsdialog.cpp
    +++ b/src/qt/psbtoperationsdialog.cpp
    @@ -129,14 +129,14 @@ void PSBTOperationsDialog::broadcastTransaction()
     }
     
     void PSBTOperationsDialog::copyToClipboard() {
    -    CDataStream ssTx(SER_NETWORK, PROTOCOL_VERSION);
    +    DataStream ssTx{};
         ssTx << m_transaction_data;
         GUIUtil::setClipboard(EncodeBase64(ssTx.str()).c_str());
         showStatus(tr("PSBT copied to clipboard."), StatusLevel::INFO);
     }
     
     void PSBTOperationsDialog::saveTransaction() {
    -    CDataStream ssTx(SER_NETWORK, PROTOCOL_VERSION);
    +    DataStream ssTx{};
         ssTx << m_transaction_data;
     
         QString selected_filter;
    diff --git a/src/qt/recentrequeststablemodel.cpp b/src/qt/recentrequeststablemodel.cpp
    index 85ade624c..bf55399d5 100644
    --- a/src/qt/recentrequeststablemodel.cpp
    +++ b/src/qt/recentrequeststablemodel.cpp
    @@ -175,7 +175,7 @@ void RecentRequestsTableModel::addNewRequest(const SendCoinsRecipient &recipient
         newEntry.date = QDateTime::currentDateTime();
         newEntry.recipient = recipient;
     
    -    CDataStream ss(SER_DISK, CLIENT_VERSION);
    +    DataStream ss{};
         ss << newEntry;
     
         if (!walletModel->wallet().setAddressReceiveRequest(DecodeDestination(recipient.address.toStdString()), ToString(newEntry.id), ss.str()))
    @@ -188,7 +188,7 @@ void RecentRequestsTableModel::addNewRequest(const SendCoinsRecipient &recipient
     void RecentRequestsTableModel::addNewRequest(const std::string &recipient)
     {
         std::vector<uint8_t> data(recipient.begin(), recipient.end());
    -    CDataStream ss(data, SER_DISK, CLIENT_VERSION);
    +    DataStream ss{data};
     
         RecentRequestEntry entry;
         ss >> entry;
    diff --git a/src/qt/sendcoinsdialog.cpp b/src/qt/sendcoinsdialog.cpp
    index 1604cad50..f9c3b0189 100644
    --- a/src/qt/sendcoinsdialog.cpp
    +++ b/src/qt/sendcoinsdialog.cpp
    @@ -399,7 +399,7 @@ bool SendCoinsDialog::PrepareSendText(QString& question_string, QString& informa
     void SendCoinsDialog::presentPSBT(PartiallySignedTransaction& psbtx)
     {
         // Serialize the PSBT
    -    CDataStream ssTx(SER_NETWORK, PROTOCOL_VERSION);
    +    DataStream ssTx{};
         ssTx << psbtx;
         GUIUtil::setClipboard(EncodeBase64(ssTx.str()).c_str());
         QMessageBox msgBox;
    diff --git a/src/qt/walletmodel.cpp b/src/qt/walletmodel.cpp
    index cb8491e27..f592db066 100644
    --- a/src/qt/walletmodel.cpp
    +++ b/src/qt/walletmodel.cpp
    @@ -261,7 +261,7 @@ void WalletModel::sendCoins(WalletModelTransaction& transaction)
             auto& newTx = transaction.getWtx();
             wallet().commitTransaction(newTx, /*value_map=*/{}, std::move(vOrderForm));
     
    -        CDataStream ssTx(SER_NETWORK, PROTOCOL_VERSION);
    +        DataStream ssTx{};
             ssTx << *newTx;
             transaction_array.append((const char*)ssTx.data(), ssTx.size());
         }
    @@ -552,7 +552,7 @@ bool WalletModel::bumpFee(uint256 hash, uint256& new_hash)
                 return false;
             }
             // Serialize the PSBT
    -        CDataStream ssTx(SER_NETWORK, PROTOCOL_VERSION);
    +        DataStream ssTx{};
             ssTx << psbtx;
             GUIUtil::setClipboard(EncodeBase64(ssTx.str()).c_str());
             Q_EMIT message(tr("PSBT copied"), "Copied to clipboard", CClientUIInterface::MSG_INFORMATION);
    diff --git a/src/wallet/test/walletdb_tests.cpp b/src/wallet/test/walletdb_tests.cpp
    index 21842fe78..6f75635bf 100644
    --- a/src/wallet/test/walletdb_tests.cpp
    +++ b/src/wallet/test/walletdb_tests.cpp
    @@ -15,14 +15,14 @@ BOOST_FIXTURE_TEST_SUITE(walletdb_tests, BasicTestingSetup)
     BOOST_AUTO_TEST_CASE(walletdb_readkeyvalue)
     {
         /**
    -     * When ReadKeyValue() reads from either a "key" or "wkey" it first reads the CDataStream steam into a
    +     * When ReadKeyValue() reads from either a "key" or "wkey" it first reads the DataStream steam into a
          * CPrivKey or CWalletKey respectively and then reads a hash of the pubkey and privkey into a uint256.
          * Wallets from 0.8 or before do not store the pubkey/privkey hash, trying to read the hash from old
          * wallets throws an exception, for backwards compatibility this read is wrapped in a try block to
          * silently fail. The test here makes sure the type of exception thrown from CDataStream::read()
          * matches the type we expect, otherwise we need to update the "key"/"wkey" exception type caught.
          */
    -    CDataStream ssValue(SER_DISK, CLIENT_VERSION);
    +    DataStream ssValue{};
         uint256 dummy;
         BOOST_CHECK_THROW(ssValue >> dummy, std::ios_base::failure);
     }
    diff --git a/src/wallet/wallet.cpp b/src/wallet/wallet.cpp
    index c3756f89a..e47a69618 100644
    --- a/src/wallet/wallet.cpp
    +++ b/src/wallet/wallet.cpp
    @@ -3821,8 +3821,8 @@ bool CWallet::MigrateToSQLite(bilingual_str& error)
         bool began = batch->TxnBegin();
         assert(began); // This is a critical error, the new db could not be written to. The original db exists as a backup, but we should not continue execution.
         for (const auto& [key, value] : records) {
    -        CDataStream ss_key(key, SER_DISK, CLIENT_VERSION);
    -        CDataStream ss_value(value, SER_DISK, CLIENT_VERSION);
    +        DataStream ss_key{key};
    +        DataStream ss_value{value};
             if (!batch->Write(ss_key, ss_value)) {
                 batch->TxnAbort();
                 m_database->Close();
    

    </details>

  42. aureleoules approved
  43. aureleoules commented at 11:48 AM on January 25, 2023: member

    reACK fa0e6640bac8c6426af7c5744125c85c0f74b9e5

  44. maflcko commented at 2:59 PM on January 25, 2023: member

    I think I've found some additional instances where we could be using a DataStream, as per the below diff:

    Does it compile?

  45. maflcko force-pushed on Jan 25, 2023
  46. maflcko commented at 3:11 PM on January 25, 2023: member

    Force pushed to take the ones that compile, thanks

  47. stickies-v approved
  48. stickies-v commented at 7:35 PM on January 25, 2023: contributor

    ACK faceeec06

    I didn't verify that in fa4bc9089 all the changed CDataStream->DataStream types are correct because that would be a very laborous task. I did verify that if they are incorrect, they should fail at compile-time, and not at run-time. I think that's a reasonable enough review approach.


    Does it compile?

    Sorry, I compiled everything but I forgot to reset my ./configure so some incorrectly slipped through, as you figured out.

    Tweaked my script a bit, found some more instances - diff below. The one in walletdb_tests.cpp was included in my previous diff too but not yet updated, I'm not sure if that failed for you or just missed it?

    Going to stop looking for more instances now, can always add them later on.

    <details> <summary>git diff</summary>

    diff --git a/src/rpc/blockchain.cpp b/src/rpc/blockchain.cpp
    index 1b2543c77..9e0bc312b 100644
    --- a/src/rpc/blockchain.cpp
    +++ b/src/rpc/blockchain.cpp
    @@ -565,7 +565,7 @@ static RPCHelpMan getblockheader()
     
         if (!fVerbose)
         {
    -        CDataStream ssBlock(SER_NETWORK, PROTOCOL_VERSION);
    +        DataStream ssBlock{};
             ssBlock << pblockindex->GetBlockHeader();
             std::string strHex = HexStr(ssBlock);
             return strHex;
    diff --git a/src/test/blockencodings_tests.cpp b/src/test/blockencodings_tests.cpp
    index e1dafc6ba..e23b7228e 100644
    --- a/src/test/blockencodings_tests.cpp
    +++ b/src/test/blockencodings_tests.cpp
    @@ -310,7 +310,7 @@ BOOST_AUTO_TEST_CASE(TransactionsRequestSerializationTest) {
         req1.indexes[2] = 3;
         req1.indexes[3] = 4;
     
    -    CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
    +    DataStream stream{};
         stream << req1;
     
         BlockTransactionsRequest req2;
    @@ -330,7 +330,7 @@ BOOST_AUTO_TEST_CASE(TransactionsRequestDeserializationMaxTest) {
         req0.blockhash = InsecureRand256();
         req0.indexes.resize(1);
         req0.indexes[0] = 0xffff;
    -    CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
    +    DataStream stream{};
         stream << req0;
     
         BlockTransactionsRequest req1;
    @@ -350,7 +350,7 @@ BOOST_AUTO_TEST_CASE(TransactionsRequestDeserializationOverflowTest) {
         req0.indexes[0] = 0x7000;
         req0.indexes[1] = 0x10000 - 0x7000 - 2;
         req0.indexes[2] = 0;
    -    CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
    +    DataStream stream{};
         stream << req0.blockhash;
         WriteCompactSize(stream, req0.indexes.size());
         WriteCompactSize(stream, req0.indexes[0]);
    diff --git a/src/wallet/test/walletdb_tests.cpp b/src/wallet/test/walletdb_tests.cpp
    index 21842fe78..7282aced6 100644
    --- a/src/wallet/test/walletdb_tests.cpp
    +++ b/src/wallet/test/walletdb_tests.cpp
    @@ -15,14 +15,14 @@ BOOST_FIXTURE_TEST_SUITE(walletdb_tests, BasicTestingSetup)
     BOOST_AUTO_TEST_CASE(walletdb_readkeyvalue)
     {
         /**
    -     * When ReadKeyValue() reads from either a "key" or "wkey" it first reads the CDataStream steam into a
    +     * When ReadKeyValue() reads from either a "key" or "wkey" it first reads the DataStream steam into a
          * CPrivKey or CWalletKey respectively and then reads a hash of the pubkey and privkey into a uint256.
          * Wallets from 0.8 or before do not store the pubkey/privkey hash, trying to read the hash from old
          * wallets throws an exception, for backwards compatibility this read is wrapped in a try block to
    -     * silently fail. The test here makes sure the type of exception thrown from CDataStream::read()
    +     * silently fail. The test here makes sure the type of exception thrown from DataStream::read()
          * matches the type we expect, otherwise we need to update the "key"/"wkey" exception type caught.
          */
    -    CDataStream ssValue(SER_DISK, CLIENT_VERSION);
    +    DataStream ssValue{};
         uint256 dummy;
         BOOST_CHECK_THROW(ssValue >> dummy, std::ios_base::failure);
     }
    diff --git a/src/wallet/walletdb.cpp b/src/wallet/walletdb.cpp
    index 5f3be8a6b..2cd35ae40 100644
    --- a/src/wallet/walletdb.cpp
    +++ b/src/wallet/walletdb.cpp
    @@ -994,7 +994,7 @@ DBErrors WalletBatch::FindWalletTxHashes(std::vector<uint256>& tx_hashes)
             {
                 // Read next record
                 DataStream ssKey{};
    -            CDataStream ssValue(SER_DISK, CLIENT_VERSION);
    +            DataStream ssValue{};
                 DatabaseCursor::Status status = cursor->Next(ssKey, ssValue);
                 if (status == DatabaseCursor::Status::DONE) {
                     break;
    

    </details>

    FYI, this is my (rather messy) script I used to find more candidates. It's a pretty dumb approach and only replaces one location at a time, so any case where e.g. both .cpp and .h file need to be updated simultaneously, this will not find.

    <details> <summary>python script to find `DataStream` candidates</summary>

    """
    Run this script from bitcoin root dir, observe datastream.log for results (summarized at the end)
    """
    
    import subprocess
    from pathlib import Path
    import re
    from typing import Optional
    from pprint import pformat
    
    BASE_PATH = Path().absolute()
    LOG_FILE = Path(BASE_PATH / "datastream.log")
    
    def make() -> int:
        """return 0 if make completed without errors"""
        status = subprocess.run("make -j 9", shell=True).returncode
        return status
        
    def update_line(line: str) -> Optional[str]:
        """return line with suggested DataStream replacement, or None if no CDataStream was found"""
        match = re.search(r'CDataStream\s*(\w+)?\s*[\(\{](.*)[\}\)]', line)
        if match:
            name = match.group(1) or ""
            params = [p.strip() for p in (match.group(2) or "").split(",")]
            if len(params) in (1, 3):
                # 1 param: passing a span; 3 param: passing a span and 2 serialization parameters
                # -> we just want to capture the span
                data = params[0]
            else:
                # 2 param: passing serialization -> initialize DataStream without arguments
                # >3 param -> ignore
                data = ""
            new_line = line.replace(match.group(0), f"DataStream {name}{{{data}}}")
            return new_line
        else:
            return None
            
    def clear_log():
        with open(LOG_FILE, 'w') as f:
            f.close()
            
    def log(msg: str):
        print(msg)
        with open(LOG_FILE, 'a') as f:
            f.write(msg + '\n')
    
    def main():
        success = {}
        failed = {}
        
        subprocess.run("git checkout .", shell=True)  # start from a clean slate in case changes from previous run persisted
        clear_log()
        
        for ext in ("cpp", "h"):
            for item in Path(BASE_PATH / "src").rglob(f"*.{ext}"):
                if item.is_dir():
                    continue
                replaced = False
                log(f"scanning file {item}")
                with open(item, "r") as read_file:
                    data = read_file.readlines()
                    for i, original_line in enumerate(data):
                        location = f"{item}:{i}"
                        new_line = update_line(original_line)
                        if new_line:
                            data[i] = new_line
                            log(f"{location}: replaced {original_line}->{new_line}")
                            with open(item, "w") as write_file:
                                write_file.writelines(data)
                            if make() == 0:
                                success[location] = new_line
                                log(f"{location}: success | {new_line}")
                            else:
                                failed[location] = new_line
                                log(f"{location}: failed")
                            with open(item, "w") as write_file:
                                data[i] = original_line
                                write_file.writelines(data)    
        log("FINISHED")
        log("========")    
        log("success: \n" + pformat(success))
        
    if __name__ == '__main__':
        main()```
    
  49. Use DataStream where possible fa29e73cda
  50. maflcko commented at 9:44 AM on January 26, 2023: member

    The one in walletdb_tests.cpp was included in my previous diff too but not yet updated, I'm not sure if that failed for you or just missed it?

    It doesn't matter, but personally I think it would be best if the unit test used the exact same type as the "real" ReadKeyValue function uses. Happy to change, though.

  51. Remove unused CDataStream::SetType
    The last use was removed in the previous commit.
    fa035fe2d6
  52. maflcko force-pushed on Jan 26, 2023
  53. maflcko commented at 9:46 AM on January 26, 2023: member

    Force pushed to take more, thanks!

  54. stickies-v approved
  55. stickies-v commented at 10:41 AM on January 26, 2023: contributor

    re-ACK fa035fe

    I think it would be best if the unit test used the exact same type as the "real" ReadKeyValue function uses

    Makes sense, and no need to rush with the migration.

  56. aureleoules approved
  57. aureleoules commented at 11:25 AM on January 26, 2023: member
  58. fanquake referenced this in commit 79e007d1d6 on Jan 26, 2023
  59. fanquake commented at 11:32 AM on January 26, 2023: member

    This has been merged.

  60. fanquake closed this on Jan 26, 2023

  61. maflcko deleted the branch on Jan 26, 2023
  62. sidhujag referenced this in commit 3f0d2ba29f on Jan 26, 2023
  63. bitcoin locked this on Jan 30, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-22 06:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me