Improvement to the Test Framework in the processing of test blocks

SergioDemianLerner commented at 7:56 pm on August 12, 2014: contributor

For several reasons the current test framework does not allow to easily incorporate new unit tests that append specially crafted blocks to the blockchain using ProcessBlock(). This was pointed out by Mike Hearn on the development list in the thread with subject “[Bitcoin-development] Question on creating test cases for block.CheckBlock()” (http://article.gmane.org/gmane.comp.bitcoin.devel/5939).

After debugging the Bitcoin core, we found that this is because of three reasons:

The miner_tests.cpp leaves the transaction pool in an invalid state and assumes the blockchain is empty on start. So other independent unit tests cannot be run before nor after (without proper cleaning) if they intend to use CreateNewBlock(), which collects transactions from the pool.
Creating test blocks requires proof-of-work. Even if the starting difficulty is low (1) the required proof-of-work is still too high to allow for the dynamic creation of test blocks. Instead, unit test have to be “pre-mined”. This makes the code more opaque and increases the effort of changing the code.
Only the first unit test can start with an empty blockchain, all subsequent tests have to start with the blockchain in the state where the previous test left it. This has the following problematic consequences: a. Each unit test makes the blockchain longer. This breaks the test sequence when a checkpoint is reached because a checkpoint requires a pre-determined hash digest. Moreover, there are exception cases for certain block heights (e.g. regarding allowing two transactions with the same hash) which could be violated. b. Certain combinations of unit tests are inherently impossible to implement in a single block-chain if not run in an specific order. Other combinations may cause unexpected consequences. For example, since version 1 blocks do not have the height field included in the coinbase field of the generation transaction, a unit test may create a coinbase tx with a future height in the height field and prevent a coinbase tx with the same hash to be used afterward when only blocks v2 are accepted (this happened to us while testing). c. The standard unit testing policy that unit tests should be not depend on each other’s output is violated. This makes debugging more difficult. d. “Pre-mining” unit tests is impossible unless all previous unit tests are known and never change. (However, we are proposing to eliminate the proof-of-work check regardless.)

We believe that testing the block acceptance rules is crucial for the safety of the application and so we wrote this patch. By restarting the blockchain before every unit test that requires testing block acceptance we have the guarantee that all tests are independent, executed in a predefined reproducable environment, and don’t unintentionally hit checkpoints or other exceptions. Nevertheless, each unit test decides whether to re-use or reset the block-chain. We haven’t perceived any significant delay while performing the destruction and creation of the block-chain during the execution of the test application. This is because file space allocation functions are fast on modern filesystems. Nevertheless, UNDOFILE_CHUNK_SIZE/BLOCKFILE_CHUNK_SIZE can be reduced during test case execution if the block-chain is reset many times.

In detail, this patch solves 1.,2.,3. from above by:

Providing a method to reset the blockchain to the starting state (testingSetupManager.SetupGenesisBlockChain())
Allowing to dynamically skip the proof-of-work testing (supressCheckBlockWork = true)
Fixing the bug in miner_tests.cpp which leaves in the memory pool invalid transactions (mempool.clear() missing).

We’ve also found the exact procedure that can be used to programmatically destroy and re-create the blockchain correctly, which was not implemented and nor documented. Some cleanup methods existed but some other were added because they were missing.

This could be of great help to re-create completely the blockchain in case a severe damage has been detected, without restarting the application.

As a bonus, 7 unit tests have been added:

ToCheckBlockUpgradeMajority (untested before)
EnforceBlockUpgradeMajority (untested before)
RejectBlockOutdatedMajority (untested before)
“bad-cb-height”
“bad-version”
“time-too-old”
“bad-txns-nonfinal”

Last, we added a way to leave the blockchain unaltered after the test suite is over to debug the unit tests themselves (testingSetupManager.keepTestEvidence = true)

Note: The BerkeleyDB environment field was converted into a heap allocated object because BerkeleyDB handles are not meant to be re-used after close, and the block-chain environment can be closed and re-opened in the unit tests. This is explained in http://docs.oracle.com/cd/E17275_01/html/api_reference/CXX/envclose.html as “After DbEnv::close() has been called, regardless of its return, the Berkeley DB environment handle may not be accessed again.”

Sergio Demian Lerner & Timo Hanke

Improvement to the Test Framework in the processing of test blocks

For several reasons the current test framework does not allow to easily incorporate new unit tests that append specially crafted blocks to the blockchain using ProcessBlock().
This was pointed out by Mike Hearn on the development list in the thread with subject "[Bitcoin-development] Question on creating test cases for block.CheckBlock()" (http://article.gmane.org/gmane.comp.bitcoin.devel/5939).

After debugging the Bitcoin core, we found that this is because of three reasons:

1. The miner_tests.cpp leaves the transaction pool in an invalid state and assumes the blockchain is empty on start.
So other independent unit tests cannot be run before nor after (without proper cleaning) if they intend to use CreateNewBlock(), which collects transactions from the pool.

2. Creating test blocks requires proof-of-work.
Even if the starting difficulty is low (1) the required proof-of-work is still too high to allow for the dynamic creation of test blocks. Instead, unit test have to be "pre-mined".
This makes the code more opaque and increases the effort of changing the code.

3. Only the first unit test can start with an empty blockchain, all subsequent tests have to start with the blockchain in the state where the previous test left it. This has the following problematic consequences:
a. Each unit test makes the blockchain longer. This breaks the test sequence when a checkpoint is reached because a checkpoint requires a pre-determined hash digest. Moreover, there are exception cases for certain block heights (e.g. regarding allowing two transactions with the same hash) which could be violated.
b. Certain combinations of unit tests are inherently impossible to implement in a single block-chain if not run in an specific order.
Other combinations may cause unexpected consequences. For example, since version 1 blocks do not have the height field included in the coinbase field of the generation transaction, a unit test may create a coinbase tx with a future height in the height field and prevent a coinbase tx with the same hash to be used afterward when only blocks v2 are accepted (this happened to us while testing).
c. The standard unit testing policy that unit tests should be not depend on each other's output is violated.
This makes debugging more difficult.
d. "Pre-mining" unit tests is impossible unless all previous unit tests are known and never change. (However, we are proposing to eliminate the proof-of-work check regardless.)

We believe that testing the block acceptance rules is crucial for the safety of the application and so we wrote this patch.
By restarting the blockchain before every unit test that requires testing block acceptance we have the guarantee that all tests are independent, executed in a predefined reproducable environment, and don't unintentionally hit checkpoints or other exceptions. Nevertheless, each unit test decides whether to re-use or
reset the block-chain. We haven't perceived any significant delay while performing the destruction and creation of the block-chain during the execution of the test application. This is because file space allocation functions are fast on modern filesystems. Nevertheless, UNDOFILE_CHUNK_SIZE/BLOCKFILE_CHUNK_SIZE can be reduced during test case execution if the block-chain is reset many times.

In detail, this patch solves 1.,2.,3. from above by:
- Providing a method to reset the blockchain to the starting state (testingSetupManager.SetupGenesisBlockChain())
- Allowing to dynamically skip the proof-of-work testing (supressCheckBlockWork = true)
- Fixing the bug in miner_tests.cpp which leaves in the memory pool invalid transactions (mempool.clear() missing).

We've also found the exact procedure that can be used to programmatically destroy and re-create the blockchain correctly, which was not implemented and nor documented. Some cleanup methods existed but some other were added because they were missing.

This could be of great help to re-create completely the blockchain in case a severe damage has been detected, without restarting the application.

As a bonus, 7 unit tests have been added:
- ToCheckBlockUpgradeMajority (untested before)
- EnforceBlockUpgradeMajority (untested before)
- RejectBlockOutdatedMajority (untested before)
- "bad-cb-height"
- "bad-version"
- "time-too-old"
- "bad-txns-nonfinal"

Last, we added a way to leave the blockchain unaltered after the test suite is over to debug the unit tests themselves (testingSetupManager.keepTestEvidence = true)

Note: The BerkeleyDB environment field was converted into a heap allocated object because BerkeleyDB handles are not meant to be re-used after close, and the block-chain environment can be closed and re-opened in the unit tests. This is explained in http://docs.oracle.com/cd/E17275_01/html/api_reference/CXX/envclose.html as
"After DbEnv::close() has been called, regardless of its return, the Berkeley DB environment handle may not be accessed again."

Sergio Demian Lerner & Timo Hanke

82d00c40df

Bad ident. TAB -> space conversion made 8afc3fe0a5

"delete" missing EnvShutdown() 32bb48e861

theuni commented at 8:31 pm on August 12, 2014: member

You need to include test_bitcoin.h in BITCOIN_TESTS in Makefile.test.include.

gmaxwell commented at 8:57 pm on August 12, 2014: contributor

Creating test blocks requires proof-of-work.

This is what regtest mode is for. (The pull tester currently uses it for dynamically creating blocks to test)

SergioDemianLerner commented at 9:07 pm on August 12, 2014: contributor

@gmaxwell Yes I know, but test_bitcoin is run in MainNet, so to test MainNet specific things (such as the majority rule) I need to create blocks and skip the PoW verification.

SergioDemianLerner commented at 9:08 pm on August 12, 2014: contributor

How can I update this pull request to add test_bitcoin.h into BITCOIN_TESTS in Makefile.test.include ? Do I have to create a new pull request ? In that case, all current comments will be lost…

gmaxwell commented at 9:21 pm on August 12, 2014: contributor

There should be no mainnet behavior which is not testable by regtest, if there is— it should be fixed. The sole reason for regtest existing is exactly this case: regular mainnet blocks are too hard to create on the fly in tests.

SergioDemianLerner commented at 9:28 pm on August 12, 2014: contributor

Regnet class inherits from TestNet, and not from MainNet. But that is is unimportant. One difference is nSubsidyHalvingInterval = 150;

gmaxwell commented at 9:30 pm on August 12, 2014: contributor

Yes, nSubsidyHalvingInterval = 150 is for exactly the same reason: otherwise the subsidy halving interval is not testable from the tests (well, without making 210000 blocks, of course). If some different number would be better for the tests it could be changed to that (obviously the blockchain tester would have to be updated).

SergioDemianLerner commented at 9:34 pm on August 12, 2014: contributor

Even if there is another procedure to create block test cases using another language and another framework (Java, BitcoinJ), I found really useful to be create testcases in the same unit test framework, the same language and the same source tree. It’s much easier.

In fact, in another pull request we’re working on, we’re proposing modifications and we wrote extensive test cases using the internal and more natural framework.

BTW, where is the documentation about how to append testcases run by TheBlueMatt in his BitcoinJ node?

SergioDemianLerner commented at 9:36 pm on August 12, 2014: contributor

You can test it in the MainNet with the current test framework improvement. You create 2017 blocks on-the-fly, you check. Very straightforward.

SergioDemianLerner commented at 9:39 pm on August 12, 2014: contributor

Creating an almost empty block on-the-fly currently takes a less than a few milliseconds. We could benchmark it.

SergioDemianLerner commented at 9:41 pm on August 12, 2014: contributor

Even if there were another thousands ways of testing bitcoind, improving the test framework and letting people add more test cases is a good thing.

gmaxwell commented at 9:44 pm on August 12, 2014: contributor

Internal testing of the blockchain behavior isn’t all that fantastic generally, because most mistakes will agree with themselves. The idea of an external harness is that it can be a reference point for behavior. I fully agree that more tests and more ways of testing is good. What I am not happy with a extra POW bypass dropped into the consensus critical code, especially when the justification is redundant with existing infrastructure.

The subsidy halving interval in mainnet is 210000 blocks, you’re confusing it with the retargeting interval. I suppose you could create 210000 blocks in a test but that would be needlessly slow.

SergioDemianLerner commented at 9:45 pm on August 12, 2014: contributor

@gmaxwell Oops! yes.

SergioDemianLerner commented at 9:52 pm on August 12, 2014: contributor

If an attacker is able to change the value of supressCheckBlockWork at will when there is no compiled code to do it (in bitcoind), he is also probably able to read the process memory and extract the private keys. Nevertheless I understand your concerns.

SergioDemianLerner commented at 10:06 pm on August 12, 2014: contributor

Another difference is that MainNet rejects forwarding non-standard transactions, and RegNet allows it. We can switch to RegNet for test_bitcoin. Then we can omit the POW skip, and we get a better testing framework without any possible theoretical security implication. But I suppose that there are test cases testing non-standard tx rules, so we can’t easily switch. At the end, using supressCheckBlockWork in the MainNet is the closest to the “real” thing one can test dynamically.

gmaxwell commented at 10:11 pm on August 12, 2014: contributor

Again, regtest was created for this purpose. To the extent that regtest has shortcomings we should fix regtest. The initial version of regtest was a set of patches that overlaid on top of bitcoin and changed mainnet rules. When it was merged in as chain params it inhereted testnet but I think there is no reason regtest should be testnet based, IIRC the pull tester does no outside of block transactions, so IsStandard shouldn’t be relevant… and I think it can just be switched.

laanwj added the label Tests on Aug 13, 2014

Indentation corrected (TAB -> space)

test_bitcoin.h added to BITCOIN_TESTS in Makefile.test.include

1197734421

test_bitcoin.h added to BITCOIN_TESTS in Makefile.test.include a7555d6f64

mikehearn commented at 5:00 pm on August 17, 2014: contributor

Great work Sergio, I didn’t review it yet but this has been a big need for a while now.

The pull tester does use various non-standard scripts but mostly because it’s kind of lazy. It could be fixed.

Re: adding tests to it. The current code is kind of gross and Matt talks about rewriting it quite often. I’d hope it’d get better documentation as part of that, if he does it. There is at least a documented build process now which is something. But having tests both in-tree and in the pull tester is not a bad thing.

laanwj commented at 11:13 am on August 18, 2014: member

Why are the wallet-related changes in here? I mean dbenv/wallet/walletdb. As this tests block chain behavior, I’m not sure why those have to change.

laanwj commented at 11:18 am on August 18, 2014: member

Agree with @gmaxwell to prefer using regtest mode for these tests instead of adding a supressCheckBlockWork to main. I wouldn’t want this flag to be ever toggled accidentally on mainnet, and regtest already offers this functionailty anyway.

BitcoinPullTester commented at 6:03 pm on August 18, 2014: none

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/p4688_a7555d6f6430aa2ddc83404ae9b627d7d0dd1f6f/ for binaries and test log. This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/ Contact BlueMatt on freenode if something looks broken.

SergioDemianLerner commented at 9:39 pm on August 18, 2014: contributor

@laanwj The wallet code required change in the way a BerkeleyDB handle was accessed: from direct access to the handle to indirect access though a pointer to a handle. This is because one of the objectives of this patch was to be able to destroy and re-create a block-chain programmaticaly. BerkeleyDB handles do not allow to be re-opened after closing them, so was the necessity to delete the handle object and create a new one. The were other solutions: e.g. changing the bitdb global variable from an object to a pointer to an object, which is the same, and requires more updates in other parts of the codebase.

Regarding the regtest mode, if you all agree I will update this pull request to:

Remove supressCheckBlockWork flag
Switch the test framework to regtest.

laanwj commented at 8:38 am on August 19, 2014: member

@SergioDemianLerner Completely agree about programmatically destroying and re-creating block chains, but I’m still not sure how that relates to the wallet DB environment. Does the BDB environment need to be reinitialized every time you create a new block chain? I think this is just extra overhead - especially as you’re not doing anything with the wallet in your tests.

Maybe we could separate out the wallet testing so all that happens only once.

SergioDemianLerner commented at 1:14 pm on August 19, 2014: contributor

@laanwj There are several reasons why the wallet need to be re-created in test_bitcoin when the block-chain is re-created. First, all wallet-transactions will be invalid, since those transactions will be based on coinbases that do not exist anymore. Second, when the block-chain is destroyed in test_bitcoin, all temporary files are removed (the block-chain is stored in a temporary directory whose name depends on the current date-time). When a new block-chain is created, a new temporary directory is also created. Then, the previous BDB wallet files are deleted. The new wallet is stored on a new directory. That’s the main reason.

Nevertheless, we could destroy the block-chain without destroying the wallet, and re-create the block-chain in the same temporary directory. But I don’t know how the Bitcoin core will handle a wallet completely unsynchronized with the block-chain. The code should probably clear the wallet anyway. Does it have a way to clear all the keys? I must check.

The reasons given are only true for testing (test_bitcoin). For production (bitcoind) is probably wise to have methods to destruct/create the block-chain without altering the wallet and vice-versa. But still, to re-create the wallet (because of a corruption error, for example), one needs to destroy and create an new BDB handle, so the same code changes will be needed to repair a wallet programmaticaly and keep the node working.

I will be sending another pull-request with just the new testcases and the change to the regtest without the block-chain re-creation code, while we discuss this.

SergioDemianLerner referenced this in commit 45cccb62a3 on Aug 19, 2014

laanwj commented at 10:58 am on October 22, 2014: member

Needs rebase.

laanwj commented at 11:01 am on January 30, 2015: member

I’m going to replace this by out-of-tree RPC tests. I’ve written weird block generation code in Python, so we can just test these things externally without having unit tests with side effects or a dependent order (or having to change a lot of nontest code).

laanwj closed this on Mar 18, 2015

DrahtBot locked this on Sep 8, 2021

Improvement to the Test Framework in the processing of test blocks #4688