Skip tx-rehashing on historic blocks #13098

pull MarcoFalke wants to merge 1 commits into bitcoin:master from MarcoFalke:Mf1804-readPureBlock changing 23 files +207 −117
  1. MarcoFalke commented at 8:37 pm on April 26, 2018: member

    When serving historic blocks, there is no need to calculate the hash of each transaction, so just skip it.

    This is mostly refactoring with the following changes:

    • Introduce a CPureTransaction, a stripped down version of CTransaction (no hash or GetHash() members). Then derive class CTransaction from CPureTransaction
    • Deduce a CPureBlock which references CPureTransactions in its vtx.

    Deserialization now only takes 14% of the wall clock time it took previously on my machine. You can check locally with:

    0./bench_bitcoin --filter=DeserializeBlockTest && ./bench_bitcoin --filter=DeserializePureBlockTest
    

    Sending a single historic block now takes slightly less (90% of the wall clock time it took previously) on a local network. I presume the speedup is more pronounced when sending more than one block.

  2. MarcoFalke added the label P2P on Apr 26, 2018
  3. MarcoFalke commented at 9:38 pm on April 26, 2018: member
    The build failure is due to an eight year old bug in clang: https://bugs.llvm.org/show_bug.cgi?id=8263
  4. sdaftuar commented at 1:04 pm on April 27, 2018: member

    I like the goal here but I find the changes a bit overwhelming for achieving this. It seems like we could achieve this more easily in other ways, like (1) just building the hash on demand the first time it’s requested, or (2) adding a deserialization flag to CTransaction that instructs it to skip the hash if we’re not planning to use it?

    [I’m not sure if (1) is a great solution, since there might be some memory-cache-benefits to building the hash at the time of deserialization, and maybe no one likes the design of having a mutable hash inside of CTransaction, but it would be a super-small diff…]

  5. MarcoFalke commented at 2:24 pm on April 27, 2018: member

    I didn’t particularly like option (1) because it makes the caching run-time depended (it is always compiled in, but only used when it happens to be requested), compared to having it compile time enforced by typing CPureTransaction instead of CTransaction.

    Note that my approach is a variation of option (2) by passing through the type of the transaction to ReadBlockFromDisk instead of passing through an additional flag that is converted to a deserialization option and then passed on to the constructor of the transaction.

    So I think having explicit types is advantageous in both code clarity and cleanness, with the tradeoff of a minimally larger diff.

    Note: I will push a fixed up version of the patch by next Monday to minimize the diff and (hopefully) work around the gcc and clang compiler bugs.

  6. sdaftuar commented at 2:40 pm on April 27, 2018: member
    Fair enough, I just don’t want us to overlook simpler solutions that might be acceptable. I’m happy to let the reviewers with stronger feelings about the design of these data types chime in.
  7. MarcoFalke force-pushed on Apr 27, 2018
  8. MarcoFalke commented at 3:09 pm on April 27, 2018: member

    I just pushed a version that might compile on our buggy clang and that is according to the design that @sipa requested in #13011 (comment)

    Then at the very least introduce alternate versions of CTransaction/CBlock that don’t precompute anything, for use in reindexing and rescanning and serving blocks from disk.

  9. MarcoFalke force-pushed on Apr 27, 2018
  10. MarcoFalke force-pushed on Apr 27, 2018
  11. sipa commented at 8:57 pm on April 27, 2018: member

    @MarcoFalke Thanks for working on this, I’ll review soon. @sdaftuar My preference is having separate types, for these reasons:

    • Having lazily-computed hashes in transactions involves synchronization structures (atomics at least to avoid multiple threads simultaneously getting a hash for the first time).
    • Smaller in memory in places where it’s not needed.
    • We have the option of later moving the with-hash types into consensus code and not exposing it.
  12. gmaxwell commented at 6:15 pm on May 1, 2018: contributor
    This seems like very useful work, but for serving out historic blocks it would perhaps be better if we could (largely) avoid deserializing and re-serializing them entirely as doing so is tremendously expensive (involving a billionity mallocs). Better is better however.
  13. laanwj commented at 11:09 am on May 2, 2018: member

    Having lazily-computed hashes in transactions involves synchronization structures (atomics at least to avoid multiple threads simultaneously getting a hash for the first time).

    Agree. The main thing I don’t like about lazy evaluation is that it’s much harder to reason about performance then. I think I prefer the current explicit solution, even though much more verbose.

    This seems like very useful work, but for serving out historic blocks it would perhaps be better if we could (largely) avoid deserializing and re-serializing them entirely as doing so is tremendously expensive (involving a billionity mallocs). Better is better however.

    Yes - I was also about to say this. How difficult would it be to serve the data directly from disk, without serialization/deserialization roundtrip?

    Edit: to do this efficiently would require a change to the on-disk format to add the size, or adding the on-disk block size to the block index. OH this is actually already there, situated before the nPos:

    0    // Write index header
    1    unsigned int nSize = GetSerializeSize(fileout, block);
    2    fileout << messageStart << nSize;
    
  14. laanwj commented at 12:17 pm on May 2, 2018: member

    but for serving out historic blocks it would perhaps be better if we could (largely) avoid deserializing and re-serializing them entirely

    My dumb try at this: #13151

  15. gmaxwell commented at 6:08 pm on May 2, 2018: contributor
    Even with 13151 something like this might very useful for making rescan faster… rescan is slow enough to make it largely unusable now. I’d been thinking deserialization was worse than hashing in it, but the performance numbers above make me wonder.
  16. TheBlueMatt commented at 7:57 pm on May 2, 2018: member
    Certainly Concept ACK if we can get rescan up for this (even as a separate PR), just using it for serving blocks for old nodes is maybe less worth it (at least post-13151).
  17. MarcoFalke commented at 8:00 pm on May 2, 2018: member
    Will do a rebase once the other pull is merged
  18. jonasschnelli commented at 6:44 am on May 3, 2018: contributor
    Nice work! Concept ACK
  19. MarcoFalke force-pushed on May 4, 2018
  20. MarcoFalke force-pushed on May 23, 2018
  21. MarcoFalke force-pushed on May 23, 2018
  22. sipa commented at 11:16 pm on May 23, 2018: member
    Any reason not to reuse CMutableTransaction for the purposes where CPureTransaction is introduced here? The only reason why CTransaction is immutable is to not interfere with or require synchronization on its precomputed hashes. As CMutableTransaction has no cashed hashes, this is not an issue.
  23. MarcoFalke commented at 3:54 pm on May 24, 2018: member

    @sipa I’d prefer to make it “type-safe”, i.e. make it impossible/compile-time enforced that none of the expensive operations are called. Otherwise this seems too fragile.

    I think to make it compile-time enforced, this would require to change one of transaction types to be a class that takes template arguments and then add a static assert in Get*Hash. I am happy to change to that implementation.

  24. sipa commented at 4:04 pm on May 24, 2018: member

    @MarcoFalke But as long as you keep CMutableTransaction around, such a guarantee doesn’t really exist, as that class still has uncached hashing operations (and should probably remain so, see a recent PR that speeds up signing by avoiding CMutableTransaction -> CTransaction conversions).

    My preference is just to evolve CTransaction into the validation-logic-optimized-efficient transaction representation, and CMutableTransaction into the simple-generic representation. That may mean that uncached hashes are still invoked in places where performance doesn’t matter that much.

  25. sipa commented at 4:59 pm on May 24, 2018: member

    After discussing this a bit with @MarcoFalke, I see the advantage of being able to enforce through the type system that no GetHash is unexpectedly called on transactions that don’t have a cache in certain specific cases (like serving historical blocks).

    On the other hand, it’s also introducing yet another type for the same data, which seems a high price. As it has const fields, conversion to other transaction types will be expensive if it’s ever needed, but it does get the ability to take advantage of more efficient storage if they get implemented (like one malloc per transaction).

    Overall, I think my preference is to reuse CMutableTransaction here, but if we want the extra guarantees and an extra type, the current approach seems fine.

  26. DrahtBot added the label Needs rebase on Jul 18, 2018
  27. MarcoFalke force-pushed on Sep 6, 2018
  28. MarcoFalke force-pushed on Sep 6, 2018
  29. MarcoFalke force-pushed on Sep 6, 2018
  30. MarcoFalke force-pushed on Sep 6, 2018
  31. Skip tx-rehashing on historic blocks 775202dea7
  32. MarcoFalke force-pushed on Sep 6, 2018
  33. DrahtBot removed the label Needs rebase on Sep 6, 2018
  34. in src/net_processing.cpp:1245 in 775202dea7
    1287+                    if (pblock) {
    1288+                        cmpctblock = CBlockHeaderAndShortTxIDs{*pblock, fPeerWantsWitness};
    1289                     } else {
    1290-                        CBlockHeaderAndShortTxIDs cmpctblock(*pblock, fPeerWantsWitness);
    1291-                        connman->PushMessage(pfrom, msgMaker.Make(nSendFlags, NetMsgType::CMPCTBLOCK, cmpctblock));
    1292+                        if (!ReadBlockFromDisk(block_read, pindex, consensusParams)) assert(!"cannot load block from disk");
    


    ryanofsky commented at 8:49 pm on September 13, 2018:
    It seems like when !pblock && inv.type == MSG_FILTERED_BLOCK or !pblock && inv.type == MSG_CMPCT_BLOCK in previous code, the message would be dropped, but now the block gets read from disk? Is this an intentional change if so?
  35. ryanofsky commented at 9:10 pm on September 13, 2018: member
    Conditional utACK 775202dea78c2c139823029bd8124836b064943e if ProcessGetBlockData behavior is not a problem, or can be left unchanged.
  36. in src/primitives/transaction.h:292 in 775202dea7
    287@@ -283,6 +288,43 @@ class CTransaction
    288     const int32_t nVersion;
    289     const uint32_t nLockTime;
    290 
    291+    /** Convert a CMutableTransaction into a CPureTransaction. */
    292+    CPureTransaction(const CMutableTransaction& tx);
    


    practicalswift commented at 7:35 am on September 19, 2018:
    02018-09-18 21:44:48 cppcheck(pr=13098): [src/primitives/transaction.h:292]: (style) Class 'CPureTransaction' has a constructor with 1 argument that is not explicit.
    
  37. in src/primitives/transaction.h:293 in 775202dea7
    287@@ -283,6 +288,43 @@ class CTransaction
    288     const int32_t nVersion;
    289     const uint32_t nLockTime;
    290 
    291+    /** Convert a CMutableTransaction into a CPureTransaction. */
    292+    CPureTransaction(const CMutableTransaction& tx);
    293+    CPureTransaction(CMutableTransaction&& tx);
    


    practicalswift commented at 7:35 am on September 19, 2018:
    02018-09-18 21:44:48 cppcheck(pr=13098): [src/primitives/transaction.h:293]: (style) Class 'CPureTransaction' has a constructor with 1 argument that is not explicit.
    
  38. in src/bench/checkblock.cpp:31 in 775202dea7
    26+    stream.write(&a, 1); // Prevent compaction
    27+
    28+    while (state.KeepRunning()) {
    29+        CPureBlock block;
    30+        stream >> block;
    31+        assert(stream.Rewind(sizeof(block_bench::block413567)));
    


    practicalswift commented at 7:36 am on September 19, 2018:
    02018-09-18 21:44:48 cppcheck(pr=13098): [src/bench/checkblock.cpp:31]: (warning) Assert statement calls a function which may have desired side effects: 'Rewind'.
    
  39. DrahtBot commented at 10:38 pm on September 20, 2018: member
    • #14337 ([refactoring] Create common interface for CMutableTransaction and CTransaction with static polymorphism by lucash-dev)
    • #14194 (Annotate unused parameters with [[maybe_unused]] by practicalswift)
    • #13864 (validation: Document where we are intentionally ignoring bool return values from validation related functions by practicalswift)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  40. MarcoFalke closed this on Oct 1, 2018

  41. MarcoFalke deleted the branch on Oct 1, 2018
  42. ryanofsky commented at 3:19 pm on October 19, 2018: member
    Curious why this was closed.
  43. MarcoFalke added the label Up for grabs on Oct 20, 2018
  44. MarcoFalke removed the label Up for grabs on Nov 15, 2019
  45. MarcoFalke commented at 4:00 am on November 20, 2019: member
    Rebased (with the p2p changes removed) here: #17529
  46. MarcoFalke locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me