- block hash -> CBlockIndex becomes "blkhash.dat", DB_HASH
- tx hash -> CTxIndex becomes "txhash.dat", DB_HASH
- remainder is renamed to CMetaDB, "blkmeta.dat"
OBSOLETES: #1282
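The three renames above amount to routing each kind of record to its own file. Below is a minimal sketch of that routing. It assumes that legacy blkindex.dat keys are tagged with string prefixes such as "blockindex" and "tx" (an assumption; check db.cpp) and treats every other key as metadata. `Store`, `RouteRecord` and `StoreFile` are hypothetical names, not code from this PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical target stores for the three-way split.
enum class Store { BlkHash, TxHash, Meta };

// Route a legacy blkindex.dat record to its new file based on the string
// prefix of its key. The prefixes "blockindex" and "tx" are ASSUMED here,
// modelled on the legacy key layout; everything else is treated as metadata.
Store RouteRecord(const std::string& keyPrefix) {
    if (keyPrefix == "blockindex") return Store::BlkHash;  // block hash -> CBlockIndex
    if (keyPrefix == "tx") return Store::TxHash;           // tx hash -> CTxIndex
    return Store::Meta;  // singletons such as hashBestChain, bnBestInvalidWork
}

// File name each store would live in after the split.
const char* StoreFile(Store s) {
    switch (s) {
        case Store::BlkHash: return "blkhash.dat";
        case Store::TxHash:  return "txhash.dat";
        case Store::Meta:    return "blkmeta.dat";
    }
    return "";
}
```

For example, a key tagged "hashBestChain" falls through to `Store::Meta` and would end up in blkmeta.dat.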
blkmeta.dat contains settings, options and all that stuff, right?
blkmeta.dat mostly contains miscellaneous singleton values such as hashBestChain or bnBestInvalidWork.
Updated patch order, dramatically shrinking the diff.
Breaks Bitcoin-Qt:
src/qt/optionsmodel.cpp: In member function ‘virtual QVariant OptionsModel::data(const QModelIndex&, int) const’:
src/qt/optionsmodel.cpp:131:29: error: ‘fDetachDB’ was not declared in this scope
src/qt/optionsmodel.cpp: In member function ‘virtual bool OptionsModel::setData(const QModelIndex&, const QVariant&, int)’:
src/qt/optionsmodel.cpp:218:13: error: ‘fDetachDB’ was not declared in this scope
eca20e7c4c263cb14f1d11e74fc362daca22fd42 is the first bad commit
fixed
On first run, I am prompted to upgrade my database, and debug.log fills with lines like this:
ProcessBlock: ORPHAN BLOCK, prev=*
Immediately afterward, Bitcoin-Qt crashes:
EXCEPTION: NSt8ios_base7failureE
CDataStream::read() : end of data
git bisect blames a commit in the range aa79af8..jgarzik/blockindex:
#1  0x0811cbf6 in CDataStream::setstate (this=0xffffb92c, bits=4, psz=0x837c724 "CDataStream::read() : end of data")
    at src/serialize.h:908
#2  0x0811ccad in CDataStream::read (this=0xffffb92c, pch=0xffffb8cc "", nSize=32) at src/serialize.h:936
#3  0x08148428 in base_uint<256u>::Unserialize<CDataStream> (this=0xffffb8cc, s=..., nType=2, nVersion=69900)
    at src/uint256.h:374
#4  0x08139edd in Unserialize<CDataStream, uint256> (is=..., a=..., nType=2, nVersion=69900) at src/serialize.h:353
#5  0x08153307 in SerReadWrite<CDataStream, uint256> (s=..., obj=..., nType=2, nVersion=69900, ser_action=...)
    at src/serialize.h:685
#6  0x081af4b5 in CDiskBlockIndex::Unserialize<CDataStream> (this=0xffffb850, s=..., nType=2, nVersion=69900)
    at src/main.h:1211
#7  0x081ad31f in Unserialize<CDataStream, CDiskBlockIndex> (is=..., a=..., nType=2, nVersion=69900)
    at src/serialize.h:353
#8  0x081abdd8 in CDataStream::operator>><CDiskBlockIndex> (this=0xffffb92c, obj=...) at src/serialize.h:1005
#9  0x081a80e2 in CBlockIdxDB::LoadBlockIndex (this=0xffffbbec) at src/db.cpp:730
#10 0x08110a64 in LoadBlockIndex (fAllowNew=true) at src/main.cpp:2095
#11 0x0816d05d in AppInit2 () at src/init.cpp:437
#12 0x08087bd1 in main (argc=1, argv=0xffffc6d4) at src/qt/bitcoin.cpp:301
Fixed in latest commits.
Commit status notes:
Note 1) The base read/write logic works: you can run a peer node stably with this, shut it down, restart it, etc.
Note 2) There is a strange behavior where a single record returned during a read-all-records query of a hash database (LoadBlockIndex) causes CDataStream to throw an error during deserialization. The error output, added by commit 183e670, is:
CBlockIdxDB::LoadBlockIndex() : de-ser error, ignoring record
Note 3) As discovered while investigating Note 2, upstream bitcoin should wrap all CDataStream decoding in a try{} block. There are several places where we do not do this, leaving current bitcoin vulnerable to a crash on any deserialization error (data corruption or a truncated record on disk).
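Here is a minimal sketch of the defensive pattern Note 3 calls for. It uses a simplified stand-in for CDataStream (`ByteStream`, hypothetical) that, like the real class, throws std::ios_base::failure on a short read. Each record is decoded inside its own try block, so a truncated record is logged and skipped instead of terminating the process. `Hash256` and `LoadAllRecords` are also hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <exception>
#include <ios>
#include <iostream>
#include <utility>
#include <vector>

// Simplified stand-in for CDataStream: reading past the end throws
// std::ios_base::failure ("end of data"), as in the crash log above.
class ByteStream {
public:
    explicit ByteStream(std::vector<unsigned char> data) : data_(std::move(data)) {}
    void read(void* out, std::size_t n) {
        if (pos_ + n > data_.size())
            throw std::ios_base::failure("ByteStream::read() : end of data");
        std::memcpy(out, data_.data() + pos_, n);
        pos_ += n;
    }
private:
    std::vector<unsigned char> data_;
    std::size_t pos_ = 0;
};

// Hypothetical fixed-size record, e.g. a 32-byte uint256 hash.
struct Hash256 { unsigned char bytes[32]; };

// Decode every raw record into 'out'. A record that fails to decode is
// logged and skipped rather than crashing the loader.
// Returns the number of records that were skipped.
std::size_t LoadAllRecords(const std::vector<std::vector<unsigned char>>& records,
                           std::vector<Hash256>& out) {
    std::size_t skipped = 0;
    for (const auto& raw : records) {
        try {
            ByteStream ss(raw);
            Hash256 h;
            ss.read(h.bytes, sizeof(h.bytes));
            out.push_back(h);
        } catch (const std::exception& e) {
            std::cerr << "LoadAllRecords() : de-ser error, ignoring record: "
                      << e.what() << '\n';
            ++skipped;
        }
    }
    return skipped;
}
```

Catching per record, rather than around the whole loop, keeps one corrupt entry from hiding all the valid ones after it.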
Note 4) On this first pass, the Old Way appears to use significantly less disk space than the New Way: http://pastebin.com/MXRc3WZe
This is the untuned, out-of-the-box switch from DB_BTREE to DB_HASH, so these disappointing initial results can likely be improved by changing the hash fill factor.
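For tuning, the Berkeley DB documentation for `DB->set_h_ffactor()` (`Db::set_h_ffactor()` in the C++ API) suggests estimating the fill factor from the page size and the average key and data sizes. The sketch below shows only that arithmetic. The 4096-byte page, the 32-byte key (a uint256 hash) and the 100-byte average record in the example are assumptions, not measurements from this PR.

```cpp
#include <cassert>
#include <cstdint>

// Rule of thumb from the Berkeley DB docs for set_h_ffactor():
//   ffactor ~= (pagesize - 32) / (average_key_size + average_data_size + 8)
// The result would be passed to set_h_ffactor() before the database is opened.
uint32_t SuggestedHashFillFactor(uint32_t pageSize, uint32_t avgKeySize,
                                 uint32_t avgDataSize) {
    assert(pageSize > 32);  // formula assumes a page larger than its header
    return (pageSize - 32) / (avgKeySize + avgDataSize + 8);
}
```

With the assumed sizes, (4096 - 32) / (32 + 100 + 8) = 4064 / 140, which truncates to 29 records per bucket.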
Note 5) Using the in-place upgrade code, which calls LoadExternalBlockFile(), or using -loadblock=FILE results in the import getting stuck reproducibly at height 173928: http://pastebin.com/xKqr8rfC
If one downloads from the network, this problem does not occur. The imported file was downloaded from http://eu1.bitcoincharts.com/blockchain/ on May 10.
For reference: the stuck chain is probably caused by blockchain modifications not being correctly isolated in database transactions. This would only cause problems when a block fails to connect to the main chain. When downloading from the network, one does not receive older stale blocks, but LoadExternalBlockFile does import them. The imported file seems to contain a block at height 173929 with an invalid BIP16 transaction, which would indeed cause an error when connecting the block to the chain.
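The isolation described above can be sketched as follows. All writes for one block are staged and published only if the block connects, so a failing block (like the one at 173929) leaves no partial state behind. This is a hypothetical in-memory illustration (`KVStore`, `ConnectBlockAtomically`). Copying the whole store per block is only for clarity; a real implementation would use the database's own transaction begin/commit/abort.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical key/value store standing in for a transactional database.
using KVStore = std::map<std::string, std::string>;

// Run 'connect' against a private copy of the store. Commit (swap in)
// the copy only if connect succeeds; otherwise discard it, leaving
// 'db' exactly as it was (abort).
bool ConnectBlockAtomically(KVStore& db,
                            const std::function<bool(KVStore&)>& connect) {
    KVStore txn = db;     // begin: writes go to a private copy
    if (!connect(txn))
        return false;     // abort: db is untouched
    db.swap(txn);         // commit: publish all writes at once
    return true;
}
```

For example, a connect step that updates hashBestChain and then fails validation leaves the old hashBestChain in place.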
db_stat output, after loading 180,953 blocks: http://pastebin.com/LEw3PQbL
db_stat output, after loading 180,960 blocks the old way (blkindex.dat): http://pastebin.com/9DhQDCFc
Merged several commits into one big one, and then refactored a bit into three pieces:
This arrangement permits easier testing of the database split itself (still btree) versus hash.
db_stat output with the three-way split using DB_BTREE: http://pastebin.com/v9nZ6SCb
blkindex.dat is overloaded to store three different datasets within a
single key/value database:
1. uint256 hash -> CBlockIndex
2. uint256 hash -> CTxIndex
3. miscellaneous metadata associated with the block chain
Split into three files, blkhash.dat, txhash.dat and blkmeta.dat, stored in
the blockchain/ subdirectory. blk????.dat storage is also moved into
the new blockchain/ subdirectory.
I think it would make sense to keep a tree-based index for the block header database, to allow quickly looking up a block by hash prefix. In #1426 the idea arose of using the lower bytes as the identifier instead of the higher ones. Maybe it makes sense to byteswap the block header keys, to allow prefix lookups on them?
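Here is a sketch of why a tree index helps with the byteswap suggestion. It assumes keys are stored as raw bytes and byteswapped so that the identifier bytes come first. In a sorted index, all keys sharing a prefix are adjacent, so one lower_bound() seek plus a short scan finds them; a hash index would have to visit every record. `std::map` stands in for a DB_BTREE file, and `ByteSwapKey`/`LookupByPrefix` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Ordered index standing in for a DB_BTREE file: raw key bytes -> height.
using BlockIndexMap = std::map<std::string, int>;

// Reverse the byte order of a raw key (each char is one byte), so the
// bytes intended as a short identifier become the leading bytes.
std::string ByteSwapKey(std::string key) {
    std::reverse(key.begin(), key.end());
    return key;
}

// Collect every entry whose key starts with 'prefix'. Keys are sorted,
// so matches are contiguous starting at lower_bound(prefix).
std::vector<int> LookupByPrefix(const BlockIndexMap& idx, const std::string& prefix) {
    std::vector<int> found;
    for (auto it = idx.lower_bound(prefix);
         it != idx.end() && it->first.compare(0, prefix.size(), prefix) == 0; ++it)
        found.push_back(it->second);
    return found;
}
```

Without the byteswap, the identifying bytes would sit at the end of each key and matching keys would be scattered throughout the sorted order.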
Closing. Inclusion of this split is conditional on making other major changes to the database structure, such as TD's LevelDB changes. If/when those are merged, this can be updated and reopened.
The net effect of this pull request was a 10% space savings, plus savings in CPU utilization.