Motivated by the discussion in #4545, I’ve been building a Python testing framework that lets us write tests exercising the p2p code in bitcoind, so that we can do end-to-end testing and compare behavior across bitcoind versions. This code builds on the existing RPC testing functionality, so tests can use both RPC calls and p2p messages.
I’ve split this pull into three commits to make it easier to review.
- The first commit adds a file called mininode.py, which I grabbed from jgarzik’s mini-node branch of the pynode repository, and modified. This file does a few things:
  - Defines a class, NodeConn, which manages connectivity to a bitcoin node.
  - Defines a callback prototype, NodeConnCB, which is used for message delivery. The idea is that you can write a class that inherits from NodeConnCB and pass it in to a NodeConn object, and then be notified when events of interest arrive.
  - Defines all the data structures from bitcoin core that pass over the network, e.g. CBlock, CTransaction, etc.
  - Defines all the serialization/deserialization code.
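As a rough illustration of the kind of serialization code involved, here is a minimal sketch of Bitcoin's CompactSize (variable-length integer) encoding, which prefixes vectors and strings on the wire. The function names `encode_compact_size` and `decode_compact_size` are hypothetical and are not claimed to match the helpers in mininode.py.

```python
import struct


def encode_compact_size(n):
    """Encode a non-negative integer in Bitcoin's CompactSize format.

    Hypothetical helper name, for illustration only.
    """
    if n < 0xfd:
        return struct.pack("<B", n)             # single byte
    if n <= 0xffff:
        return b"\xfd" + struct.pack("<H", n)   # marker + uint16, little-endian
    if n <= 0xffffffff:
        return b"\xfe" + struct.pack("<I", n)   # marker + uint32, little-endian
    return b"\xff" + struct.pack("<Q", n)       # marker + uint64, little-endian


def decode_compact_size(data, offset=0):
    """Decode a CompactSize integer; return (value, offset_after_value)."""
    (first,) = struct.unpack_from("<B", data, offset)
    offset += 1
    if first < 0xfd:
        return first, offset
    # The marker byte selects the width of the integer that follows.
    fmt, size = {0xfd: ("<H", 2), 0xfe: ("<I", 4), 0xff: ("<Q", 8)}[first]
    (value,) = struct.unpack_from(fmt, data, offset)
    return value, offset + size
```

Round-tripping a value through both functions should return the original integer together with the number of bytes consumed.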
It’s possible to write (crude) tests using mininode.py alone, if you want. mininode also supports testing outside of regtest (it can communicate on testnet and mainnet as well), which in theory allows a broader category of testing than we can do currently. All the tests I’ve been working on so far have focused on regtest, however.
Also, in the first commit, I provide one example test (maxblocksinflight.py) that shows how you can use mininode. This is largely a proof-of-concept example, but it does test something useful: 0.10 used to fail this test until #5507 was merged.
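To make the callback idea concrete, here is a minimal, self-contained sketch of the pattern: a base class routes each incoming message to an `on_<command>` method if the subclass defines one. Every name here (`Message`, `CallbackBase`, `FakeConn`, `PingResponder`) is a hypothetical stand-in so the example runs without a node; none of them is claimed to match the actual NodeConn/NodeConnCB interface in mininode.py.

```python
class Message(object):
    """Hypothetical stand-in for a parsed p2p message."""

    def __init__(self, command, payload=None):
        self.command = command
        self.payload = payload


class CallbackBase(object):
    """Hypothetical base class: dispatches a message to on_<command>, if defined."""

    def deliver(self, conn, message):
        handler = getattr(self, "on_" + message.command, None)
        if handler is not None:
            handler(conn, message)


class FakeConn(object):
    """Hypothetical connection: records sent messages, forwards received ones."""

    def __init__(self, callback):
        self.callback = callback
        self.sent = []

    def send(self, message):
        self.sent.append(message)

    def receive(self, message):
        # A real connection would parse bytes off a socket before this step.
        self.callback.deliver(self, message)


class PingResponder(CallbackBase):
    """Test-side callback: answers pings and counts inv announcements."""

    def __init__(self):
        self.seen_invs = 0

    def on_ping(self, conn, message):
        conn.send(Message("pong", message.payload))

    def on_inv(self, conn, message):
        self.seen_invs += 1


cb = PingResponder()
conn = FakeConn(cb)
conn.receive(Message("ping", 1234))
conn.receive(Message("inv"))
conn.receive(Message("addr"))  # no on_addr handler, so this is ignored
```

A test written this way only needs to override the handlers for the messages it cares about.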
- The second commit adds support for a comparison-tool style testing framework, similar to the comparison-tool framework that we use from bitcoinj in the pull-tester. The code in this commit is designed to add structure to test writing, so that tests are both easier to read and write (compared to the free-form structure of maxblocksinflight.py). I include in this commit one example test written using the framework, which tests the processing of two different types of invalid blocks.
EDIT: I forgot to mention that I use py-leveldb in blockstore.py, a file I introduced in this commit that provides disk-backed storage for blocks. I think I had to manually install that package on my machine, so I wanted to flag this dependency in case others think this would be a problem.
- The third commit adds some script processing routines (script.py and its dependency, bignum.py – the latter could probably be removed with a little work) which I have copied and modified from python-bitcoinlib. With these additional tools, I was able to write a test, script_test.py, which reads all the script tests from the unit test data directory (script_valid.json and script_invalid.json) and inserts them in blocks delivered to two nodes. For each test case, it checks that the nodes agree on whether the block containing the test transaction passes consensus checks. This test is very slow (perhaps 40 minutes to run), so it is largely a proof of concept that demonstrates the kinds of tests we ought to be able to write. (Unfortunately, this test makes use of RPC calls that have only existed since 0.10, so it is not possible to run it in its current form against 0.9 or older versions.)
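The core comparison idea can be sketched in a few lines: feed the same test case to every node and flag any case where the accept/reject verdicts differ. This is only an illustration under simplifying assumptions. The nodes below are plain Python callables standing in for real bitcoind instances, and `compare_verdicts`, `node_a`, and `node_b` are hypothetical names that do not come from the code in this pull.

```python
def compare_verdicts(nodes, test_cases):
    """Return (name, verdicts) for each test case on which the nodes disagree.

    nodes: list of callables mapping a payload to True (accepted) or
           False (rejected). In a real test these would wrap bitcoind nodes.
    test_cases: iterable of (name, payload) pairs.
    """
    disagreements = []
    for name, payload in test_cases:
        verdicts = [node(payload) for node in nodes]
        if len(set(verdicts)) > 1:
            disagreements.append((name, verdicts))
    return disagreements


# Toy stand-ins: both "nodes" accept even numbers, but node_b also rejects 4.
def node_a(payload):
    return payload % 2 == 0


def node_b(payload):
    return payload % 2 == 0 and payload != 4


cases = [("case-1", 2), ("case-2", 3), ("case-3", 4)]
result = compare_verdicts([node_a, node_b], cases)
```

Here only `case-3` ends up in `result`, since that is the one payload on which the two stand-in nodes return different verdicts.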
If this framework looks okay, then I think a future project would be to re-implement the pull-tester’s comparison test in this framework.
I realize this is a lot of code, so if there’s anything more I can do to aid in review, please let me know.