Python P2P testing framework

sdaftuar commented at 6:19 pm on April 7, 2015: member

Motivated by the discussion in #4545, I’ve been working on building a python testing framework that will allow us to write tests that exercise the p2p code in bitcoind, so that we can perform end-to-end testing and perform comparisons of behavior across bitcoind versions. This code builds on the existing RPC testing functionality, so that tests using both RPC calls and p2p messages can be written.

I’ve split this pull into 3 commits to try to make this easier to review.

The first commit adds a file called mininode.py, which I grabbed from jgarzik’s mini-node branch of the pynode repository, and modified. This file does a few things:

Defines a class, NodeConn, which manages connectivity to a bitcoin node.
Defines a callback prototype, NodeConnCB, which is used for message delivery. The idea is that you can write a class that inherits from NodeConnCB and pass it in to a NodeConn object, and then be notified when events of interest arrive.
Defines all the data structures from bitcoin core that pass over the network, eg CBlock, CTransaction, etc.
Defines all the serialization/deserialization code.

It’s possible to write (crude) tests using mininode.py alone, if you want. Also, mininode supports testing outside of regtest (it can communicate on testnet and mainnet as well), so that theoretically allows for a broader category of testing than we currently can do. All the tests I’ve been working on have been focused on regtest, however.

Also, in the first commit, I provide one example test (maxblocksinflight.py) that shows how you can use mininode. This is largely a proof-of-concept example, but it does test something useful: 0.10 used to fail this test until #5507 was merged.

The second commit adds support for a comparison-tool style testing framework, similar to the comparison-tool framework that we use from bitcoinj in the pull-tester. The code in this commit is designed to add structure to test writing, so that tests are both easier to read and write (compared to the free-form structure of maxblocksinflight.py). I include in this commit one example test written using the framework, which tests the processing of two different types of invalid blocks.

EDIT: I forgot to mention that I use py-leveldb in blockstore.py, a file I introduced in this commit that provides disk-backed storage for blocks. I think I had to manually install that package on my machine, so I wanted to flag this dependency in case others think this would be a problem.

The third commit adds some script processing routines (script.py and its dependency, bignum.py – the latter could probably be removed with a little work) which I have copied and modified from python-bitcoinlib. With these additional tools, I was able to write a test, script_test.py, which reads all the script tests from the unit test data directory (script_valid.json and script_invalid.json) and inserts them in blocks delivered to two nodes, and for each test case compares whether the nodes agree on whether the block containing the test transaction passes consensus checks. This test is very slow (perhaps 40 minutes to run), so this is largely a proof of concept that demonstrates the kinds of tests we ought to be able to write. (Unfortunately, this test makes use of RPC calls that only have existed since 0.10, so it is not possible to run this test in its current form to compare 0.9-or-older versions).

If this framework looks okay, then I think a future project would be to re-implement the pull-tester’s comparison test in this framework.

I realize this is a lot of code, so if there’s anything more I can do to aid in review please let me know.

jonasschnelli commented at 6:29 pm on April 7, 2015: contributor

Nice! conceptual ACK. Have plans to test this soon.

laanwj commented at 6:24 am on April 8, 2015: member

Nice work!

I forgot to mention that I use py-leveldb in blockstore.py

I’d rather not add any external dependencies for the tests. Do we need any kind of persistent block storage in the tests? If we really do, I’d rather hack something together with python.

laanwj added the label Tests on Apr 8, 2015

sdaftuar commented at 2:38 pm on April 9, 2015: member

@laanwj I’d prefer to leave the disk-backed storage there, even though it’s probably not needed for the existing tests I’ve included in this pull, because it will be needed to write longer tests like the comparison test which currently runs in the pull-tester.

I pushed up a commit that replaces leveldb with dbm, which I believe is a standard python package; does that seem like an acceptable option?

laanwj commented at 3:42 pm on April 9, 2015: member

Sure, using a python standard package is fine, my gripe was with using an external dependency. Using disk-backed storage is no problem as long as subsequent tests don’t interfere with each other.

sdaftuar force-pushed on Apr 11, 2015

laanwj commented at 9:33 am on April 16, 2015: member

Maybe add the new tests (whenever they don’t take too long) to the pull tester e.g. qa/pull-tester/rpc-tests.sh. For example invalidblockrequest.py seems to finish quickly.

The bipdersig-p2p.py test seems to hang indefinitely here at:

0Initializing test directory /tmp/testBxTNIB
1MiniNode: Connecting to Bitcoin Node IP # 127.0.0.1:11916
2Test 1: PASS [98]

sdaftuar force-pushed on Apr 17, 2015

sdaftuar commented at 5:39 pm on April 17, 2015: member

@laanwj The bipdersig-p2p.py test was failing because this needed rebase, due to the setgenerate ()/generate() regtest rpc change (now fixed).

I’ve also shortened maxblocksinflight.py, and added both it and invalidblockrequest.py to the pull tester’s rpc tests.

EDIT: I see that this errored out in travis; will investigate.

laanwj commented at 8:05 am on April 21, 2015: member

Ah yes, the generate switcharoo. I think we should add an explicit error when setgenerate() is used on regtest, so that bugs like this are tripped up immediately instead of through vague timeouts.

The travis error seems to occur on 64-bit Linux - no detailed information unfortunately

0Running testscript maxblocksinflight.py...
1
2Initializing test directory /tmp/testIyXJqc
3
4No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.

laanwj commented at 8:24 am on April 28, 2015: member

I’d really like to merge this. Do you have trouble solving the travis error? If so, maybe @theuni can help.

sdaftuar force-pushed on Apr 28, 2015

sdaftuar commented at 4:35 pm on April 28, 2015: member

This needed to be rebased anyway, so I did that and now travis succeeds. I’m not sure what to make of that, since I didn’t actually fix anything, and it seems odd to think that this was just a spurious travis failure that happened to only affect a new test I’ve introduced in this pull.

I’ll try rebasing again, since I wanted to fold the 4th and 5th commits in anyway, and when I push we’ll get one more data point about how travis does with the new tests.

Python p2p testing framework

mininode.py provides a framework for connecting to a bitcoin node over the p2p
network. NodeConn is the main object that manages connectivity to a node and
provides callbacks; the interface for those callbacks is defined by NodeConnCB.
Defined also are all data structures from bitcoin core that pass on the network
(CBlock, CTransaction, etc), along with de-/serialization functions.

maxblocksinflight.py is an example test using this framework that tests whether
a node is limiting the maximum number of in-flight block requests.

This also adds support to util.py for specifying the binary to use when
starting nodes (for tests that compare the behavior of different bitcoind
versions), and adds maxblocksinflight.py to the pull tester.

6c1d1ba6fc

Add comparison tool test runner, built on mininode

comptool.py creates a tool for running a test suite on top of the mininode p2p
framework.  It supports two types of tests: those for which we expect certain
behavior (acceptance or rejection of a block or transaction) and those for
which we are just comparing that the behavior of 2 or more nodes is the same.

blockstore.py defines BlockStore and TxStore, which provide db-backed maps
between block/tx hashes and the corresponding block or tx.

blocktools.py defines utility functions for creating and manipulating blocks
and transactions.

invalidblockrequest.py is an example test in the comptool framework, which
tests the behavior of a single node when sent two different types of invalid
blocks (a block with a duplicated transaction and a block with a bad coinbase
value).

b93974c3f3

sdaftuar force-pushed on Apr 28, 2015

Add script manipulation tools for use in mininode testing framework

script.py is modified from the code in python-bitcoinlib, and provides tools
for manipulating and creating CScript objects.

bignum.py is a dependency for script.py

script_test.py is an example test that uses the script tools for running a test
that compares the behavior of two nodes, in a comptool- style test, for each of
the test cases in the bitcoin unit test script files, script_valid.json and
script_invalid.json.  (This test is very slow to run, but is a proof of concept
for how we can write tests to compare consensus-critical behavior between
different versions of bitcoind.)

bipdersig-p2p.py is another example test in the comptool framework, which tests
deployment of BIP DERSIG for a single node.  It uses the script.py tools for
manipulating signatures to be non-DER compliant.

d76412b068

sdaftuar force-pushed on Apr 28, 2015

sdaftuar commented at 7:56 pm on April 28, 2015: member

Well, on the first attempt, travis failed again with the 10-minute no-data-received timeout, but this time it was not involving the new p2p tests (it died while the java comparison tool was running). I amended the last commit to force a re-run, and it ran successfully, so now I am a little more willing to believe that these may just be random travis failures that I’m encountering, and not a problem in the code introduced here.

theuni commented at 8:00 pm on April 28, 2015: member

I came too late.. what was the original error? Edit: nevermind, posted same time.

Add some travis debugging for python scripts

Adds printing to the console before/after calls to bitcoin-cli -rpcwait,
if the PYTHON_DEBUG environment variable is initialized.

29bff0e684

Fix default binary in p2p tests to use environment variable 2703412a39

sdaftuar commented at 2:57 pm on April 29, 2015: member

After discussing with @theuni on IRC, I added a commit that adds some optional debugging (which I enabled for travis) to calls to bitcoin-cli -rpcwait (inside util.py), and then another addressing the bug he caught with the default binary names.

laanwj commented at 4:03 pm on April 29, 2015: member

ACK

laanwj merged this on Apr 30, 2015

laanwj closed this on Apr 30, 2015

laanwj referenced this in commit da38dc696c on Apr 30, 2015

jonasschnelli commented at 1:06 pm on April 30, 2015: contributor

1st: this is impressive work. Nice! I think the quick merge honors this job.

Nevertheless here are some first post-ACK reports (more detailed report will follow):

Running maxblocksinflight.py does fail on my local machine:

 0Running testscript maxblocksinflight.py...
 1Initializing test directory /var/folders/hp/kb9p9q8x4k3_z_ccy588hxrc0000gn/T/testVtWBqX
 2MiniNode: Connecting to Bitcoin Node IP # 127.0.0.1:11650
 3Round 0: success (total requests: 8)
 4Round 1: success (total requests: 16)
 5Round 2: success (total requests: 16)
 6Round 3: success (total requests: 16)
 7Stopping nodes
 8Exception in thread Thread-1:
 9Traceback (most recent call last):
10  File "/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
11    self.run()
12  File "/Users/jonasschnelli/Documents/bitcoin/_bitcoin/qa/rpc-tests/mininode.py", line 1237, in run
13    asyncore.loop(0.1, True)
14  File "/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/asyncore.py", line 216, in loop
15    poll_fun(timeout, map)
16  File "/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/asyncore.py", line 145, in poll
17    r, w, e = select.select(r, w, e, timeout)
18error: (9, 'Bad file descriptor')

same for invalidblockrequest.py

 0Running testscript invalidblockrequest.py...
 1Initializing test directory /var/folders/hp/kb9p9q8x4k3_z_ccy588hxrc0000gn/T/testqJOevS
 2MiniNode: Connecting to Bitcoin Node IP # 127.0.0.1:11506
 3Test 1: PASS [1]
 4Test 2: PASS [101]
 5Assertion failed: Test failed at test 3
 6  File "/Users/jonasschnelli/Documents/bitcoin/_bitcoin/qa/rpc-tests/test_framework.py", line 119, in main
 7    self.run_test()
 8  File "/Users/jonasschnelli/Documents/bitcoin/_bitcoin/qa/rpc-tests/invalidblockrequest.py", line 39, in run_test
 9    test.run()
10  File "/Users/jonasschnelli/Documents/bitcoin/_bitcoin/qa/rpc-tests/comptool.py", line 284, in run
11    raise AssertionError("Test failed at test %d" % test_number)
12Stopping nodes

in qa/rpc-tests/invalidblockrequest.py: in 2703412a39

26+class InvalidBlockRequestTest(ComparisonTestFramework):
27+
28+    ''' Can either run this test as 1 node with expected answers, or two and compare them. 
29+        Change the "outcome" variable from each TestInstance object to only do the comparison. '''
30+    def __init__(self):
31+        self.num_nodes = 1

jonasschnelli commented at 1:31 pm on April 30, 2015:

Is it somehow possible to set this to 2 if --refbinary is set?

sdaftuar commented at 1:38 pm on April 30, 2015: member

@jonasschnelli Thanks, and thanks for running the tests too. Looks to me like the first error message is happening in how the test cleans up, that probably just needs to be made more robust; I’ll take a look. The second error message is more concerning though, as the substantive part of the test is failing.

Could you run invalidblockrequest.py --tracerpc (which will produce a lot of debug output, basically every p2p message sent or received, along with the rpc calls) and upload it somewhere for me to analyze? That should help me debug what’s going on here.

in qa/rpc-tests/mininode.py: in 2703412a39

1232+        self.send_message(self.messagemap['ping']())
1233+
1234+
1235+class NetworkThread(Thread):
1236+    def run(self):
1237+        asyncore.loop(0.1, True)

jonasschnelli commented at 2:25 pm on April 30, 2015:

After discussion this on IRC it turned out, that OSes without polling support (OSX) have problems with timeouts below 1 because they always use select.select(). Should be changed to asyncore.loop(1, True)

jonasschnelli commented at 2:32 pm on April 30, 2015: contributor

Post merge ACK

sdaftuar commented at 3:19 pm on April 30, 2015: member

@jonasschnelli FYI - I did some more digging, and I think I’ve identified a race condition in the invalidblockrequest.py test (which you triggered the first time you ran, and got the assertion failure at test 3).

I’ll PR a cleanup of these issues shortly.

DrahtBot locked this on Sep 8, 2021

Python P2P testing framework #5981