Ultraprune: use a pruned-txout-set database for block validation #1677

pull sipa wants to merge 27 commits into bitcoin:master from sipa:ultraprune changing 170 files +29299 −1392
  1. sipa commented at 6:26 pm on August 16, 2012: member

    This is a rewrite of the block storage and validation engine.

    Instead of blkindex.dat (a database with block tree data, and all transactions and their spendings in the active chain), it uses chain.dat (only block tree data) and coins.dat (pruned txout set). These two databases together are significantly smaller than blkindex.dat (<200 MiB), and only coins.dat is actively needed during block validation, speeding it up significantly (15 minutes for importing 185000 blocks from a local disk file).

    Blocks are still stored in blk????.dat files, in the same file format, but smaller files (up to 128 MiB). To prevent excessive fragmentation, they are allocated in chunks of 16 MiB, and some statistics are kept about them. To assist with reorganisation, undo files are created (rev????.dat), which contain the data necessary to undo block connections.

    Block pruning itself is not yet implemented, but this makes it trivial to do so; all that is required is deleting old block and undo files when certain thresholds are reached. Also note that this block pruning mechanism is different from the transaction pruning mechanism described by Satoshi. This one does not prevent a node from acting as a full node.

    All commits result in a functional code tree, with succeeding unit tests. The first few add some extra classes, without changing actual semantics. “One file per block” and “Multiple blocks per file” form a refactor of the block storage mechanism, with related database changes. “Do not store hashNext on disk” only introduces a forward-incompatible change that simplifies the database layout. “Ultraprune” itself contains the switch from txindex.dat to coins.dat as validation data, and contains the majority of the changes. What follows are optimizations and some other improvements that do not affect compatibility.

    There are a few TODOs left (see comment below), but I’d like to give the code some exposure already.

  2. sipa commented at 6:29 pm on August 16, 2012: member

    (EDITED)

    List of implementation changes:

    • new database layout:
      • 2 leveldb’s (coins/ and blktree/ subdirs), replacing blkindex.dat
      • separate directory (blocks/) with block data (in the usual format, but smaller files) and undo data
    • database keys are of the form (char,key) instead of (string,key) for reasons of compactness
    • there is no txid-to-diskpos index anymore, only blkid-to-diskpos and txid-to-unspent-outputs
      • this makes getrawtransaction only work on unspent outputs (and slower)
        • an optional txid-to-diskpos index is planned
    • some new very specialized serializers are added (compact variable-length integer, compact amount, compact txout)
    • block index does not store hashBlockNext anymore - this is reconstructed from hashBestBlock at startup
    • at startup, automatically reorg to the best block in blktree/blocks
    • new RPCs: gettxoutsetinfo and gettxout operate on the coins database
    • no more CTxIndex in-scope: instead, a global pcoinsTip (representing the coin db) and pblocktree (representing the blktree db)
      • intended to be moved to separate modules/classes, properly encapsulated
    • blktree database contains statistics about the block file (size, which blocks in it, times, heights, undo stats, …)
    • blktree database contains flag per block that determines the degree of validation it had, to allow future headers-first mode
    • block files are pre-allocated (in batches of 16 MiB, the files grow to max 128 MiB), to reduce fragmentation
    • transaction hashes are cached, and typically never calculated more than once

    Included in the pullreq, but technically separate:

    • do -loadblock= and bootstrap.dat import in a separate thread
    • add check for strict DER encoding for signatures, and standard public keys
  3. Diapolo commented at 12:18 pm on August 21, 2012: none
    @sipa One question: our current AppendBlockFile() function takes MAX_SIZE into account and starts a new block file if the space left in the current one (relative to the maximum allowed file size) is < MAX_SIZE. So 128 MiB files would have a maximum of 96 MiB of usable data, right?
  4. sipa commented at 12:24 pm on August 21, 2012: member
    @Diapolo: not sure what you mean; I don’t use AppendBlockFile anymore.
  5. Diapolo commented at 12:27 pm on August 21, 2012: none
    @sipa I saw that and wanted to understand the change here, which condition is used to determine, if a new block-file needs to be created, where is the check in your new code for that and what’s the space limit?
  6. sipa commented at 12:29 pm on August 21, 2012: member
    The check is in FindBlockPos in main.cpp. And a new file is created if (old_used_size + new_block_size >= MAX_BLOCKFILE_SIZE).
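The condition sipa describes could be restated as follows (simplified, illustrative names; the actual check lives in FindBlockPos in main.cpp):

```cpp
#include <cassert>

// Illustrative restatement of the check in FindBlockPos: start a new
// blk????.dat file when appending the next block would push the current
// file past the 128 MiB limit.
static const unsigned int MAX_BLOCKFILE_SIZE = 128 * 1024 * 1024; // 128 MiB

bool NeedNewBlockFile(unsigned int nOldUsedSize, unsigned int nNewBlockSize) {
    return nOldUsedSize + nNewBlockSize >= MAX_BLOCKFILE_SIZE;
}
```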
  7. in src/main.cpp: in f51448802f outdated
    1892-    unsigned int nBlockPos = 0;
    1893-    if (!WriteToDisk(nFile, nBlockPos))
    1894+    CDiskBlockPos blockPos;
    1895+    {
    1896+        CChainDB chaindb;
    1897+        if (!FindBlockPos(chaindb, blockPos, nBlockSize+8, nHeight, nTime))
    


    Diapolo commented at 1:13 pm on August 21, 2012:
    Why nBlockSize+8? Is that padding?

    sipa commented at 1:27 pm on August 21, 2012:
    4 bytes magic, 4 bytes block length; that’s just the file format of blk*.dat.

    Diapolo commented at 1:34 pm on August 21, 2012:
    I’m lacking some background information here, sorry :). Is the format defined / described somewhere?

    sipa commented at 1:39 pm on August 21, 2012:
    No idea, but I wanted to retain compatibility between pre- and post-ultraprune block files, so I used the same format. That is: the files are a concatenation of {4 bytes magic, 4 bytes LE integer with the actual block size, block data itself}.
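In code, the record layout sipa describes might look like this (the magic value shown is the mainnet message-start bytes; the helper itself is illustrative, not code from the pull request):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of the blk*.dat record format: each stored block is
// {4 bytes magic, 4-byte little-endian length, raw block data},
// which is where the nBlockSize+8 passed to FindBlockPos comes from.
static const unsigned char pchMessageStart[4] = {0xf9, 0xbe, 0xb4, 0xd9};

std::vector<unsigned char> MakeBlockRecord(const std::vector<unsigned char>& block) {
    std::vector<unsigned char> rec(pchMessageStart, pchMessageStart + 4);
    uint32_t nSize = block.size();
    for (int i = 0; i < 4; i++)
        rec.push_back((nSize >> (8 * i)) & 0xff);   // length, little-endian
    rec.insert(rec.end(), block.begin(), block.end());
    return rec;
}
```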

    Diapolo commented at 1:43 pm on August 21, 2012:

    I found this one and it explains what I was missing here: https://bitcointalk.org/index.php?topic=101514.0 thanks for your further explanation, too.

    Why keep things compatible here? Perhaps it’s the right time to optimize the internals of the block files (e.g. compression or something similar)?

  8. luke-jr commented at 4:03 am on August 24, 2012: member
    Does this break the ability to downgrade at all? (I expect it just means wasted “padding” space in the blk*.dat files?)
  9. sipa commented at 0:23 am on August 27, 2012: member
    Updated. Batch block connection now keeps a permanent cache, and modifies that (instead of delaying block connection until several blocks were available, which interfered with normal network-based downloading). Also added a commit that changes the block database format, in preparation of things like parallel signature checking and initial headers-only mode.
  10. Diapolo commented at 5:47 am on August 27, 2012: none
    @sipa With block database format you mean stored blocks in blk0000x.dat?
  11. sipa commented at 10:31 am on August 27, 2012: member
    @luke-jr how do you mean breaking the ability to downgrade? The blk000*.dat files remain exactly the same format, but the other databases are incompatible. @Diapolo No, it uses coins.dat (the unspent txout set) and chain.dat (the block index), in addition to the blk*.dat (and rev*.dat) files. It’s the format of chain.dat that changed in the last commit.
  12. luke-jr commented at 4:47 pm on August 27, 2012: member
    @sipa If it interacts with downgrades in ugly ways, I’d probably not want to put it into next-test.
  13. sipa commented at 5:07 pm on August 27, 2012: member

    @luke-jr Shouldn’t be a problem - the filenames are all different, so you can (almost) run ultraprune and non-ultraprune together in the same datadir independently.

    That said, it’s likely to conflict with a lot of other stuff, so decide for yourself.

  14. mikehearn commented at 12:34 pm on August 30, 2012: contributor
    Could you provide a squashed version of the patch somewhere, for review? It’s really hard to review as is because it’s just a record of how you implemented it over time.
  15. sipa commented at 12:48 pm on August 30, 2012: member
    @mikehearn #1677.diff ?
  16. mikehearn commented at 10:32 am on August 31, 2012: contributor
    Thanks, that looks useful.
  17. sipa commented at 11:29 am on August 31, 2012: member

    @mikehearn Seems that through rebasing I lost some comments you made earlier on the commits?

    Regarding the encodings, I plan to write some text about the final format for all datastructures, but I may change a few things still.

  18. sipa commented at 11:43 pm on September 4, 2012: member
    Rebased/combined with @mikehearn’s LevelDB patch
  19. sipa commented at 2:40 pm on September 20, 2012: member

    Rebased on 0.7, and moved the more experimental block caching and parallel signature checking to a separate branch. The code in here should be stable and can be tested.

    The only things that remain to be done are automatic import of old data, and more elaborate consistency checks at startup. I think those can be done in separate pull requests though.

    This branch has its own LevelDB glue, independent of (though similar to, but simpler than) the one in Mike’s leveldb branch. As the coin and block indexes are only opened once, there was no need for a CDB-like wrapper and a global CDBEnv to cache database accessors. If LevelDB is merged first, I’ll add reverts for most of it here.

  20. mikehearn commented at 4:05 pm on September 20, 2012: contributor

    I closed the LevelDB pull req. Let’s merge it as part of this.

    Note that my LevelDB branch has code that replays the blocks with some GUI progress. It’s not great, because it actually re-writes the block files in order to track the block offsets … I didn’t do any deep refactoring to fix that, as I wanted it to be as easy/fast to merge as possible, and it’s a one-off migration anyway. But since that bridge was crossed when it became part of ultraprune, you could just re-use whatever GUI code is usable.

  21. sipa commented at 12:07 pm on September 21, 2012: member
    @TheBlueMatt any way to disable the build tester here, as it seems to be incompatible with this anyway?
  22. laanwj commented at 1:01 pm on September 21, 2012: member

    I’ve tested this a bit on the testnet. No problems found, and synchronization is super-fast.

    One small comment: in your bitcoin-qt.pro, please use $(MAKE) instead of make. This prevents an annoying warning about a job server in Qt Creator.

  23. sipa commented at 1:12 pm on September 21, 2012: member
    @laanwj: updated to use $(MAKE)
  24. TheBlueMatt commented at 8:46 pm on September 22, 2012: member
    @sipa I’d rather not; the patch is really quite simple (http://jenkins.bluematt.me/pull-tester/files/bitcoind-comparison.patch). Afaict, it’s only failing because setBlockIndexValid was added directly above hashGenesisBlock in main.cpp. Can you just move that line and see if it works?
  25. sipa commented at 11:48 am on September 25, 2012: member
    Changed the database/serialization format one more time: coins and undo data now contain the transaction version number. This may be necessary when new transaction versions are defined that have an influence on their ability to be spent. @TheBlueMatt ok, moved the setBlockIndexValid line in main.cpp.
  26. mikehearn commented at 3:00 pm on September 27, 2012: contributor
    This does not build on MacOS X because there is no fdatasync on that platform.
  27. sipa commented at 2:56 pm on September 28, 2012: member

    @TheBlueMatt I wonder why it still complains?

    EDIT: Oh, just out of date with master. Let’s wait for the next cycle.

  28. mikehearn commented at 10:19 am on September 29, 2012: contributor

    I just tried to start my client based on this branch and got:

    Loading block index…
    Opening LevelDB in /Users/hearn/Library/Application Support/Bitcoin/blktree
    Opened LevelDB successfully
    Opening LevelDB in /Users/hearn/Library/Application Support/Bitcoin/coins
    Opened LevelDB successfully
    LoadBlockIndex(): last block file = 23
    LoadBlockIndex(): last block file: CBlockFileInfo(blocks=1572, size=132444896, heights=199237..200807, time=2012-09-17..2012-09-27)
    LoadBlockIndex(): hashBestChain=00000000000000e78688 height=200806 date=09/27/2012 21:08:42
    Verifying last 2500 blocks at level 1
    block index 36135ms
    Loading wallet…
    dbenv.open LogDir=/Users/hearn/Library/Application Support/Bitcoin/database ErrorFile=/Users/hearn/Library/Application Support/Bitcoin/db.log
    nFileVersion = 70099
    wallet 1192ms
    REORGANIZE: Disconnect 1 blocks; 000000000000051dcdc2..00000000000000e78688
    REORGANIZE: Connect 2 blocks; 000000000000051dcdc2..00000000000003d0a2b1


    EXCEPTION: NSt8ios_base7failureE
    CAutoFile::read : end of file
    bitcoin in Runaway exception

  29. mikehearn commented at 11:19 am on September 29, 2012: contributor

    On investigation this failure can happen with both ultralevelprune and old bdb code, it happens when the block is not written but the db updates are. Typically if power is yanked at just the wrong time.

    As it is not a new failure mode, I guess it should not delay review/merge of this code.

  30. in bitcoin-qt.pro: in 41f98b3c50 outdated
    89@@ -90,6 +90,33 @@ contains(BITCOIN_NEED_QT_PLUGINS, 1) {
    90     QTPLUGIN += qcncodecs qjpcodecs qtwcodecs qkrcodecs qtaccessiblewidgets
    91 }
    92 
    93+contains(USE_LEVELDB, -) {
    


    Diapolo commented at 8:34 pm on October 11, 2012:
    So this still includes legacy BDB support? That means we need to keep 2 code-bases up to date. Was the intention in keeping it to be able to revert? Just wanna know :).

    sipa commented at 9:00 pm on October 11, 2012:
    Yes, though the BDB version most likely doesn’t compile anymore. This was converted from Mike’s code, which tried to keep compatibility, but that’s just an unnecessary burden.

    Diapolo commented at 5:36 am on October 12, 2012:
    Thanks, so it would be nice to remove that burden entirely from this pull and the code. If this is a one-way ticket, there is no need to keep BDB compatibility code in.

    mikehearn commented at 8:41 am on October 12, 2012:

    The original idea was to reduce the risk of merging the code, in case there were issues with LevelDB [on some specific platform] we don’t want to hold up the release or do a potentially messy revert.

    I agree it’s irritating and a burden, but it’d suck if all of ultraprune ended up getting reverted due to unanticipated issues with LevelDB. Once 0.8 has been successfully rolled out to the userbase and things are quiet, it could be deleted.


    Diapolo commented at 9:16 am on October 12, 2012:
    I’m fine with removing that later, as long as you / sipa keep track of it. That whole block of commands in the pro-file looks like Voodoo to me anyway :-D.
  31. Diapolo commented at 8:37 pm on October 11, 2012: none
    Did anyone build this directly on Windows with MinGW? I saw there was a cross-compile Windows flag in the pro file. Perhaps I should just fetch that branch and try it in the next few days.
  32. sipa commented at 8:59 pm on October 11, 2012: member
    @Diapolo yes, I’ve done Windows builds; I even had to backport the LevelDB env for Windows from C++0x to C++, as gitian only has a GCC 4.2 MinGW compiler, which doesn’t support C++0x.
  33. gavinandresen commented at 4:07 pm on October 16, 2012: contributor

    Errors compiling on my Mac:

    makefile.osx has 4 spaces instead of a tab: @echo "Building LevelDB ..."; cd leveldb-1.5.0; make; cd ..

    And:

    txdb-bdb.cpp: In member function ‘bool CCoinsDB::HaveCoins(uint256)’:
    txdb-bdb.cpp:10: error: ‘make_pair’ was not declared in this scope

  34. sipa commented at 6:51 pm on October 16, 2012: member
    @gavinandresen Why does it build the BDB version? Did you explicitly disable USE_LEVELDB, or is there a problem with the makefile that causes this?
  35. sipa commented at 2:20 pm on October 20, 2012: member

    Some additional changes: reorganized the commits a bit, fixed a bug that caused unit tests to fail, removed some dead code, and added a lot of comments (mostly in main.h).

    Also see the list of implementation changes in the first comment here.

  36. gavinandresen commented at 2:32 pm on October 20, 2012: contributor

    Design ACK; I reviewed all of the changes to main.h and about half of main.cpp, and had just a handful of tiny nits that aren’t worth picking.

    I feel comfortable pulling this as long as @sipa can commit to working through the remaining TODOs and help fix any bugs that crop up over the next month or so.

  37. sipa commented at 2:40 pm on October 20, 2012: member
    By the way: this pull request is rebased on top of ’threadimport’ (#1880) and ‘canonical’ (#1742). I suppose that means those require ACKs first. If there is a problem, I’ll remove them from under this pull.
  38. Import LevelDB 1.5, it will be used for the transaction database. 5e650d6d2d
  39. Leveldb Windows port by Edouard Alligand, adapted for MingW by me. 94a50fb339
  40. Disable libsnappy detection in LevelDB 9d503a7285
  41. Backport Win32 LevelDB env from C++0x to C++
    Since the gitian mingw compiler doesn't support C++0x yet.
    9f56678fce
  42. Makefile integration of LevelDB 3ff3a2bd60
  43. LevelDB glue
    Database-independent glue for supporting LevelDB databases.
    
    Based on code from earlier commits by Mike Hearn in his leveldb
    branch.
    43b7905e98
  44. Compact serialization for variable-length integers
    Variable-length integers: bytes are a MSB base-128 encoding of the number.
    The high bit in each byte signifies whether another digit follows. To make
    the encoding one-to-one, one is subtracted from all but the last digit.
    Thus, the byte sequence a[] with length len, where all but the last byte
    has bit 128 set, encodes the number:
    
      (a[len-1] & 0x7F) + sum(i=1..len-1, 128^i*((a[len-i-1] & 0x7F)+1))
    
    Properties:
    * Very small (0-127: 1 byte, 128-16511: 2 bytes, 16512-2113663: 3 bytes)
    * Every integer has exactly one encoding
    * Encoding does not depend on size of original integer type
    4d6144f97f
  45. Compact serialization for scripts
    Special serializers for script which detect common cases and encode
    them much more efficiently. 3 special cases are defined:
    * Pay to pubkey hash (encoded as 21 bytes)
    * Pay to script hash (encoded as 21 bytes)
    * Pay to pubkey starting with 0x02, 0x03 or 0x04 (encoded as 33 bytes)
    
    Other scripts up to 121 bytes require 1 byte + script length. Above
    that, scripts up to 16505 bytes require 2 bytes + script length.
    69fc8047a9
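The template detection behind this could look roughly like the following (illustrative sketch; it covers only the compressed-pubkey case, while the commit also encodes 0x04 uncompressed keys in 33 bytes):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the compressed size for a recognized script template, or 0 if
// the script is not special and must be stored as length prefix + bytes.
size_t CompressedScriptSize(const std::vector<unsigned char>& s) {
    // Pay-to-pubkey-hash: OP_DUP OP_HASH160 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG
    if (s.size() == 25 && s[0] == 0x76 && s[1] == 0xa9 && s[2] == 0x14 &&
        s[23] == 0x88 && s[24] == 0xac)
        return 21;  // 1 type byte + 20-byte key hash
    // Pay-to-script-hash: OP_HASH160 <20 bytes> OP_EQUAL
    if (s.size() == 23 && s[0] == 0xa9 && s[1] == 0x14 && s[22] == 0x87)
        return 21;  // 1 type byte + 20-byte script hash
    // Pay-to-pubkey, compressed key (0x02/0x03): <33-byte push> OP_CHECKSIG
    if (s.size() == 35 && s[0] == 0x21 && (s[1] == 0x02 || s[1] == 0x03) &&
        s[34] == 0xac)
        return 33;  // 1 type byte + 32-byte X coordinate
    return 0;
}
```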
  46. Compact serialization for amounts
    Special serializer/deserializer for amount values. It is optimized for
    values which have few non-zero digits in decimal representation. Most
    amounts currently in the txout set take only 1 or 2 bytes to
    represent.
    0fa593d0fb
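The idea can be sketched like this (hedged: an illustrative version of the scheme described, which may differ in detail from the committed serializer): strip trailing decimal zeros, then pack the remaining digits, so round satoshi amounts compress to very small numbers.

```cpp
#include <cassert>
#include <cstdint>

// Compress an amount by factoring out powers of ten (up to 10^9) and
// packing the exponent together with the remaining digits.
uint64_t CompressAmount(uint64_t n) {
    if (n == 0) return 0;
    int e = 0;
    while ((n % 10) == 0 && e < 9) { n /= 10; e++; }  // count trailing zeros
    if (e < 9) {
        int d = n % 10;   // last non-zero decimal digit, 1..9
        n /= 10;
        return 1 + (n * 9 + d - 1) * 10 + e;
    }
    return 1 + (n - 1) * 10 + 9;
}

uint64_t DecompressAmount(uint64_t x) {
    if (x == 0) return 0;
    x--;
    int e = x % 10;   // recover the exponent
    x /= 10;
    uint64_t n;
    if (e < 9) {
        int d = (x % 9) + 1;   // recover the last non-zero digit
        x /= 9;
        n = x * 10 + d;
    } else {
        n = x + 1;
    }
    while (e--) n *= 10;   // re-append the stripped zeros
    return n;
}
```

Combined with the variable-length integer serializer, a round value like 1 BTC (100000000 satoshi) compresses to a single-digit number and hence a single byte.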
  47. Add CCoins: pruned list of transaction outputs
    The CCoins class represents a pruned set of transaction outputs from
    a given transaction. It only retains information about its height in
    the block chain, whether it was a coinbase transaction, and its
    unspent outputs (script + amount).
    
    It has a custom serializer that has very low redundancy.
    10fd8604d7
  48. Add CTxUndo: transaction undo information
    The CTxUndo class encapsulates data necessary to undo the effects of
    a transaction on the txout set, namely the previous outputs consumed
    by it (script + amount), and potentially transaction meta-data when
    it is spent entirely.
    44ac1c0fe3
  49. One file per block
    Refactor of the block storage code, which now stores one file per block.
    This will allow easier pruning, as blocks can be removed individually.
    630fd8dcb6
  50. Preliminary undo file creation
    Create files (one per block) with undo information for the transactions
    in it.
    8adf48dc9b
  51. Multiple blocks per file
    Change the block storage layer again, this time with multiple blocks
    per file, but tracked by txindex.dat database entries. The file
    format is exactly the same as the earlier blk00001.dat, but with
    smaller files (128 MiB for now).
    
    The database entries track how many bytes each block file already
    uses, how many blocks are in it, which range of heights is present
    and which range of dates.
    5382bcf8cd
  52. Pre-allocate block and undo files in chunks
    Introduce an AllocateFileRange() function in util, which wipes or
    at least allocates a given range of a file. It can be overridden
    by more efficient OS-dependent versions if necessary.
    
    Block and undo files are now allocated in chunks of 16 and 1 MiB,
    respectively.
    bba89aa82a
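A portable fallback for such a function might look like this (illustrative sketch, not the committed implementation; OS-specific versions, e.g. using fallocate, can replace it):

```cpp
#include <cassert>
#include <cstdio>

// Overwrite/extend the byte range [offset, offset+length) of an open file
// with zeros, in fixed-size steps. Pre-allocating block files in 16 MiB
// chunks (1 MiB for undo files) this way reduces fragmentation.
void AllocateFileRange(FILE* file, unsigned int offset, unsigned int length) {
    static const char buf[65536] = {0};
    fseek(file, offset, SEEK_SET);
    while (length > 0) {
        unsigned int now = sizeof(buf);
        if (now > length) now = length;
        fwrite(buf, 1, now, file);  // zero out up to 64 KiB per iteration
        length -= now;
    }
}
```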
  53. Ultraprune
    This switches bitcoin's transaction/block verification logic to use a
    "coin database", which contains all unredeemed transaction output scripts,
    amounts and heights.
    
    The name ultraprune comes from the fact that instead of a full transaction
    index, we only (need to) keep an index with unspent outputs. For now, the
    blocks themselves are kept as usual, although they are only necessary for
    serving, rescanning and reorganizing.
    
    The basic datastructures are CCoins (representing the coins of a single
    transaction), and CCoinsView (representing a state of the coins database).
    There are several implementations for CCoinsView. A dummy, one backed by
    the coins database (coins.dat), one backed by the memory pool, and one
    that adds a cache on top of it. FetchInputs, ConnectInputs, ConnectBlock,
    DisconnectBlock, ... now operate on a generic CCoinsView.
    
    The block switching logic now builds a single cached CCoinsView with
    changes to be committed to the database before any changes are made.
    This means no uncommitted changes are ever read from the database, and
    should ease the transition to another database layer which does not
    support transactions (but does support atomic writes), like LevelDB.
    
    For the getrawtransaction() RPC call, access to a txid-to-disk index
    would be preferable. As this index is not necessary or even useful
    for any other part of the implementation, it is not provided. Instead,
    getrawtransaction() uses the coin database to find the block height,
    and then scans that block to find the requested transaction. This is
    slow, but should suffice for debug purposes.
    450cbb0944
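The view layering described in this commit message can be sketched as follows (heavily simplified, with illustrative names standing in for CCoins / CCoinsView / CCoinsViewCache, and plain strings standing in for txids):

```cpp
#include <cassert>
#include <map>
#include <string>

struct Coins {
    int nHeight;      // height of the block containing the transaction
    bool fCoinBase;   // whether it was a coinbase
    // ... unspent outputs (script + amount) would live here ...
};

// Abstract view of the coin state; backends include a dummy, the on-disk
// coins database, the memory pool, and a cache stacked on another view.
class CoinsView {
public:
    virtual bool GetCoins(const std::string& txid, Coins& coins) = 0;
    virtual bool SetCoins(const std::string& txid, const Coins& coins) = 0;
    virtual ~CoinsView() {}
};

// A trivial in-memory backend, standing in for the database-backed view.
class CoinsViewMemory : public CoinsView {
    std::map<std::string, Coins> data;
public:
    bool GetCoins(const std::string& txid, Coins& coins) {
        std::map<std::string, Coins>::iterator it = data.find(txid);
        if (it == data.end()) return false;
        coins = it->second;
        return true;
    }
    bool SetCoins(const std::string& txid, const Coins& coins) {
        data[txid] = coins;
        return true;
    }
};

// A cache view: buffers changes and commits them to the backing view in
// one batch, so uncommitted state is never read back from the database.
class CoinsViewCache : public CoinsView {
    CoinsView& base;
    std::map<std::string, Coins> cache;
public:
    CoinsViewCache(CoinsView& baseIn) : base(baseIn) {}
    bool GetCoins(const std::string& txid, Coins& coins) {
        std::map<std::string, Coins>::iterator it = cache.find(txid);
        if (it != cache.end()) { coins = it->second; return true; }
        return base.GetCoins(txid, coins);
    }
    bool SetCoins(const std::string& txid, const Coins& coins) {
        cache[txid] = coins;
        return true;
    }
    bool Flush() {  // commit all buffered changes at once
        for (std::map<std::string, Coins>::iterator it = cache.begin();
             it != cache.end(); ++it)
            base.SetCoins(it->first, it->second);
        cache.clear();
        return true;
    }
};
```

Stacking a cache on the database view is what lets block connection accumulate changes and write them in a single atomic batch, as the commit message explains.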
  54. Batch block connection during IBD
    During the initial block download (or -loadblock), delay connection
    of new blocks a bit, and perform them in a single action. This reduces
    the load on the database engine, as subsequent blocks often update an
    earlier block's transaction already.
    ae8bfd12da
  55. Transaction hash caching
    Use CBlock's vMerkleTree to cache transaction hashes, and pass them
    along as argument in more function calls. During initial block download,
    this results in every transaction's hash being computed only once.
    64dd46fd05
  56. Direct CCoins references
    To prevent excessive copying of CCoins in and out of the CCoinsView
    implementations, introduce a GetCoins() function in CCoinsViewCache
    which returns a direct reference. The block validation and connection
    logic is updated to require caching CCoinsViews, and exploits the
    GetCoins() function heavily.
    13c51f20f6
  57. Automatically reorganize at startup to best known block
    Given that the block tree database (chain.dat) and the active chain
    database (coins.dat) are entirely separate now, it becomes legal to
    swap one with another instance without affecting the other.
    
    This commit introduces a check in the startup code that detects the
    presence of a better chain in chain.dat that has not been activated
    yet, and does so efficiently (in batch, while reusing the blk???.dat
    files).
    4fea06db25
  58. Prepare database format for multi-stage block processing
    This commit adds a status field and a transaction counter to the block
    indexes.
    857c61df0b
  59. Use singleton block tree database instance d979e6e36a
  60. Flush and sync block data 44d40f26dc
  61. LevelDB block and coin databases
    Split off CBlockTreeDB and CCoinsViewDB into txdb-*.{cpp,h} files,
    implemented by either LevelDB or BDB.
    
    Based on code from earlier commits by Mike Hearn in his leveldb
    branch.
    2d8a48292b
  62. Add LevelDB MemEnv support
    Support LevelDB memory-backed environments, and use them in unit tests.
    e1bfbab802
  63. Add gettxout and gettxoutsetinfo RPCs beeb57610c
  64. Remove BDB block database support 4ca60bba5c
  65. gmaxwell commented at 9:41 pm on October 20, 2012: contributor
    ACK. This appears ready for integration.
  66. sipa referenced this in commit cf9b49fa50 on Oct 20, 2012
  67. sipa merged this on Oct 20, 2012
  68. sipa closed this on Oct 20, 2012

  69. laudney referenced this in commit f7eb23fc8b on Mar 19, 2014
  70. luke-jr referenced this in commit d3ef9b00ec on Mar 24, 2014
  71. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-06 16:12 UTC
