Split BDB database blkindex.dat into multiple databases #1303

pull jgarzik wants to merge 2 commits into bitcoin:master from jgarzik:blockindex changing 6 files +244 −92
  1. jgarzik commented at 6:12 PM on May 14, 2012: contributor
    1. block hash -> CBlockIndex becomes "blkhash.dat", DB_HASH
    2. tx hash -> CTxIndex becomes "txhash.dat", DB_HASH
    3. remainder is renamed to CMetaDB, "blkmeta.dat"

    OBSOLETES: #1282

  2. Diapolo commented at 8:12 PM on May 14, 2012: none

    blkmeta.dat contains settings, options and all that stuff, right?

  3. jgarzik commented at 8:32 PM on May 14, 2012: contributor

    blkmeta.dat mostly contains random singleton datums like hashBestChain or bnBestInvalidWork

  4. jgarzik commented at 10:16 PM on May 14, 2012: contributor

    Updated patch order, dramatically shrinking the diff.

  5. luke-jr commented at 1:34 AM on May 19, 2012: member

    Breaks Bitcoin-Qt:

    src/qt/optionsmodel.cpp: In member function ‘virtual QVariant OptionsModel::data(const QModelIndex&, int) const’:
    src/qt/optionsmodel.cpp:131:29: error: ‘fDetachDB’ was not declared in this scope
    src/qt/optionsmodel.cpp: In member function ‘virtual bool OptionsModel::setData(const QModelIndex&, const QVariant&, int)’:
    src/qt/optionsmodel.cpp:218:13: error: ‘fDetachDB’ was not declared in this scope
    
  6. luke-jr commented at 1:49 AM on May 19, 2012: member

    eca20e7c4c263cb14f1d11e74fc362daca22fd42 is the first bad commit

  7. jgarzik commented at 5:28 AM on May 19, 2012: contributor

    fixed

  8. luke-jr commented at 9:54 PM on May 19, 2012: member

    First run, I am prompted to upgrade my database. debug.log fills with lines like this:

    ProcessBlock: ORPHAN BLOCK, prev=*
    

    Immediately afterward, Bitcoin-Qt crashes:

    EXCEPTION: NSt8ios_base7failureE       
    CDataStream::read() : end of data       
    

    git bisect blames a commit between aa79af8..jgarzik/blockindex

  9. luke-jr commented at 10:09 PM on May 19, 2012: member
    #1  0x0811cbf6 in CDataStream::setstate (this=0xffffb92c, bits=4, psz=0x837c724 "CDataStream::read() : end of data")
        at src/serialize.h:908
    #2  0x0811ccad in CDataStream::read (this=0xffffb92c, pch=0xffffb8cc "", nSize=32) at src/serialize.h:936
    #3  0x08148428 in base_uint<256u>::Unserialize<CDataStream> (this=0xffffb8cc, s=..., nType=2, nVersion=69900)
        at src/uint256.h:374
    #4  0x08139edd in Unserialize<CDataStream, uint256> (is=..., a=..., nType=2, nVersion=69900) at src/serialize.h:353
    #5  0x08153307 in SerReadWrite<CDataStream, uint256> (s=..., obj=..., nType=2, nVersion=69900, ser_action=...)
        at src/serialize.h:685
    #6  0x081af4b5 in CDiskBlockIndex::Unserialize<CDataStream> (this=0xffffb850, s=..., nType=2, nVersion=69900)
        at src/main.h:1211
    #7  0x081ad31f in Unserialize<CDataStream, CDiskBlockIndex> (is=..., a=..., nType=2, nVersion=69900)
        at src/serialize.h:353
    #8  0x081abdd8 in CDataStream::operator>><CDiskBlockIndex> (this=0xffffb92c, obj=...) at src/serialize.h:1005
    #9  0x081a80e2 in CBlockIdxDB::LoadBlockIndex (this=0xffffbbec) at src/db.cpp:730
    #10 0x08110a64 in LoadBlockIndex (fAllowNew=true) at src/main.cpp:2095
    #11 0x0816d05d in AppInit2 () at src/init.cpp:437
    #12 0x08087bd1 in main (argc=1, argv=0xffffc6d4) at src/qt/bitcoin.cpp:301
    
  10. luke-jr commented at 5:25 AM on May 20, 2012: member

    Fixed in latest commits.

  11. jgarzik commented at 5:39 PM on May 20, 2012: contributor

    Commit status notes:

    Note 1) The base read/write logic works. You can run a peer node stably with this: shut it down, restart, etc.

    Note 2) There is a strange behavior where a single record appears during a read-all-records query of a hash database (LoadBlockIndex) and causes CDataStream to throw an error during deserialization. The error output, created by commit 183e670, is

     CBlockIdxDB::LoadBlockIndex() : de-ser error, ignoring record
    

    Note 3) As discovered while investigating Note 2 (previous item), upstream bitcoin should wrap all CDataStream decoding within a try{} block. There are several places where we do not do this, leaving current bitcoin vulnerable to crash if there is any de-serialization error (data corruption / truncated record on disk).

    Note 4) On this first pass, the Old Way appears to use significantly less disk space than the New Way: http://pastebin.com/MXRc3WZe

    It is likely that this can be improved by changing the hash fill factor. This is the untuned, out-of-the-box switch from DB_BTREE to DB_HASH. Disappointing initial results.

    Note 5) Using the in-place upgrade code, which calls LoadExternalBlockFile(), or using -loadblock=FILE results in the import getting stuck reproducibly at height 173928: http://pastebin.com/xKqr8rfC

    If one downloads from the network, this problem does not occur. The imported file was downloaded from http://eu1.bitcoincharts.com/blockchain/ on May 10.
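
The suggestion in Note 3 (wrap every decode in a try block and skip records that fail) can be sketched roughly as below. This is a standalone illustration, not the code in serialize.h or db.cpp: `ByteReader`, `HashRecord` and `LoadRecords` are hypothetical names, and the reader only imitates the "end of data" failure seen in the crash above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ios>
#include <string>
#include <vector>

// Hypothetical stand-in for a CDataStream-like reader: throws on a short read.
class ByteReader {
public:
    explicit ByteReader(const std::string& data) : buf_(data), pos_(0) {}

    void read(char* out, std::size_t n) {
        if (n > buf_.size() - pos_)
            throw std::ios_base::failure("ByteReader::read() : end of data");
        std::memcpy(out, buf_.data() + pos_, n);
        pos_ += n;
    }

private:
    std::string buf_;
    std::size_t pos_;
};

// Hypothetical fixed-size record: a single 32-byte hash.
struct HashRecord {
    char hash[32];
};

// Decode every raw record; a record that fails to decode is counted and
// skipped instead of letting the exception escape and abort the whole load.
std::vector<HashRecord> LoadRecords(const std::vector<std::string>& raw,
                                    std::size_t& nSkipped) {
    std::vector<HashRecord> out;
    nSkipped = 0;
    for (const std::string& bytes : raw) {
        try {
            ByteReader reader(bytes);
            HashRecord rec;
            reader.read(rec.hash, sizeof(rec.hash));
            out.push_back(rec);
        } catch (const std::ios_base::failure&) {
            ++nSkipped; // truncated or corrupt record: ignore and continue
        }
    }
    return out;
}
```

Whether to skip, log, or abort on a bad record is a policy choice; the sketch only shows that the decision is made at the call site rather than by an uncaught exception.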

  12. sipa commented at 6:30 PM on May 20, 2012: member

    For reference: the problem with the stuck chain is probably due to incorrectly isolating blockchain modifications in transactions. This would only cause problems when a block fails to connect to the main chain. When downloading from the network, one does not receive older stales, but LoadExternalBlockFile does import these. The imported file seems to contain a block with an invalid BIP16 transaction at 173929, which indeed would cause an error when connecting the block to the chain.
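
The isolation problem described above can be illustrated with a small sketch, assuming a simple staged-write model rather than the actual BDB transaction calls. `Store`, `WriteBatch`, `Block` and this `ConnectBlock` are hypothetical names for illustration only. The point is that when a block fails to connect, none of its index writes should reach the store.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Store = std::map<std::string, std::string>;

// Hypothetical write batch: changes are staged and reach the store only on Commit().
class WriteBatch {
public:
    explicit WriteBatch(Store& store) : store_(store) {}

    void Put(const std::string& key, const std::string& value) {
        staged_.emplace_back(key, value);
    }
    void Commit() {
        for (const auto& kv : staged_) store_[kv.first] = kv.second;
        staged_.clear();
    }
    void Abort() { staged_.clear(); }

private:
    Store& store_;
    std::vector<std::pair<std::string, std::string>> staged_;
};

// Hypothetical block: `valid` stands in for the result of full validation.
struct Block {
    std::string hash;
    std::vector<std::string> txids;
    bool valid;
};

// Stage all index updates for the block, then commit them only if the
// block connects; on failure the store is left exactly as it was.
bool ConnectBlock(Store& store, const Block& block) {
    WriteBatch batch(store);
    for (const std::string& txid : block.txids)
        batch.Put("tx:" + txid, block.hash);
    batch.Put("bestchain", block.hash);

    if (!block.valid) {
        batch.Abort();
        return false;
    }
    batch.Commit();
    return true;
}
```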

  13. jgarzik commented at 8:08 PM on May 20, 2012: contributor

    Issue 5 fixed. Thanks to @sipa for noticing the probable bug.

  14. jgarzik commented at 8:54 PM on May 20, 2012: contributor

    db_stat output, after loading 180,953 blocks: http://pastebin.com/LEw3PQbL

  15. jgarzik commented at 10:39 PM on May 20, 2012: contributor

    db_stat output, after loading 180,960 blocks the old way (blkindex.dat): http://pastebin.com/9DhQDCFc

  16. jgarzik commented at 6:05 AM on May 23, 2012: contributor

    Merged several commits into one big one, and then refactored a bit into three pieces:

    1. Upstream-ready refactoring.
    2. Split CTxDB into three pieces, blkhash.dat, txhash.dat and blkmeta.dat. Databases remain DB_BTREE at this point.
    3. Switch txhash.dat and blkhash.dat to DB_HASH.

    This arrangement permits easier testing of the database split itself (still btree) versus hash.

  17. jgarzik commented at 6:31 AM on May 23, 2012: contributor

    db_stat output, with 3way split using DB_BTREE: http://pastebin.com/v9nZ6SCb

  18. Split CTxDB into three databases: CBlockIdxDB, CMetaDB and CTxDB
    blkindex.dat is overloaded to store three different datasets within a
    single key/value database:
    
        1. uint256 hash -> CBlockIndex
        2. uint256 hash -> CTxIndex
        3. miscellaneous metadata associated with the block chain
    
    Split into three files, blkhash.dat, txhash.dat and blkmeta.dat, stored in
    the blockchain/ subdirectory.  blk????.dat storage is also moved into
    the new blockchain/ subdirectory.
    390274abda
  19. Switch CTxDB and CBlockIdxDB from btree to hash within BDB. 13f0a88a17
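
As a rough illustration of the two commits above, the sketch below models the old single store and the new three-way split with standard containers: `std::map` stands in for an ordered (btree-like) database and `std::unordered_map` for a hash database. The string key prefixes and the names `OverloadedDB`, `SplitDB` and `SplitByPrefix` are assumptions made for the example, not the on-disk key format used by the patch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Old layout (sketch): one ordered store, datasets told apart by a key prefix.
using OverloadedDB = std::map<std::string, std::string>;

// New layout (sketch): one store per dataset.
struct SplitDB {
    std::unordered_map<std::string, std::string> blkhash; // block hash -> block index record
    std::unordered_map<std::string, std::string> txhash;  // tx hash -> tx index record
    std::map<std::string, std::string> blkmeta;           // singletons such as hashBestChain
};

// Route each record of the old store to the dataset its prefix names.
SplitDB SplitByPrefix(const OverloadedDB& old) {
    SplitDB out;
    for (const auto& kv : old) {
        const std::string& key = kv.first;
        if (key.compare(0, 11, "blockindex:") == 0)
            out.blkhash[key.substr(11)] = kv.second;
        else if (key.compare(0, 3, "tx:") == 0)
            out.txhash[key.substr(3)] = kv.second;
        else
            out.blkmeta[key] = kv.second;
    }
    return out;
}
```

The hash containers give up key ordering in exchange for direct lookups, which is the same trade the DB_BTREE to DB_HASH switch makes.
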
  20. sipa commented at 3:57 PM on June 14, 2012: member

    I think it would make sense to keep a tree-based index for the block header database, to allow looking up a block by prefix quickly. In #1426 the idea arose to use the lower bytes as the identifier instead of the higher ones. Maybe it makes sense to do a byteswap on the block header keys, to allow a lookup on those?
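
A minimal sketch of the idea in this comment, assuming an ordered in-memory map in place of a btree database: if keys are stored byte-swapped, the low-order bytes of the hash come first and an ordered index can answer prefix queries on them with a range scan. `OrderedIndex`, `ByteSwap` and `FindByPrefix` are hypothetical names, and each character of the string stands for one key byte.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Ordered index: key bytes -> hypothetical block height.
using OrderedIndex = std::map<std::string, int>;

// Reverse the byte order of a key so that its trailing bytes sort first.
std::string ByteSwap(std::string key) {
    std::reverse(key.begin(), key.end());
    return key;
}

// Collect every value whose key starts with `prefix`. lower_bound jumps to
// the first candidate; the scan stops at the first key that no longer matches.
std::vector<int> FindByPrefix(const OrderedIndex& index, const std::string& prefix) {
    std::vector<int> hits;
    for (auto it = index.lower_bound(prefix);
         it != index.end() && it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        hits.push_back(it->second);
    }
    return hits;
}
```

A hash-based index cannot serve this query without scanning every record, which is the reason given for keeping the block header database tree-based.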

  21. jgarzik commented at 3:59 PM on June 27, 2012: contributor

    Closing. Inclusion of this split is conditional on making other major changes to the database structure, such as TD's LevelDB changes. If/when those are merged, this can be updated and reopened.

    The net effect of this pull request was a 10% space savings and CPU util savings.

  22. jgarzik closed this on Jun 27, 2012

  23. jgarzik deleted the branch on Aug 24, 2014
  24. suprnurd referenced this in commit 944420deb0 on Dec 5, 2017
  25. lateminer referenced this in commit 96fd040522 on Jan 22, 2019
  26. lateminer referenced this in commit bcb04a44a1 on May 6, 2020
  27. DrahtBot locked this on Sep 8, 2021
