Brainstorm: Improving Test Vector Formats #22957

JeremyRubin commented at 6:00 PM on September 12, 2021: contributor

I've been doing a lot of work recently with the data/*.json test vectors and have noticed that the format of the tests (both the JSON files themselves and how they are processed) is a bit brittle and clumsy. It might be worthwhile to convert the format from arrays to objects with key/value mappings, although that would be a mild inconvenience to downstream consumers who depend on the specific format.

Extending the test vectors ends up being a bit of a chore, especially when the number of optional arguments grows and you need to insert a default value. It's also somewhat bizarre that transactions are defined as hex strings; it might make more sense to list the transactions as native JSON (if a bit more verbose), since hex strings make the format harder to read. JSON-encoded transactions would make it easier to see what a specific test is doing and would permit nicer auto-formatting of test vectors. I also noticed that some test areas are missing, such as lockpoint declarations for used TXIDs, which could be useful for testing nSequence/nLockTime in transactions.
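
To make the array-to-object idea concrete, here is a rough Python sketch. It assumes a positional entry of the form [[prevout, ...], tx hex, flags string], where each prevout is [txid, index, scriptPubKey] with an optional trailing amount. That layout, the helper name `vector_to_object`, and every key name are illustrative assumptions, not a proposed schema.

```python
import json

def vector_to_object(entry):
    """Convert one positional test vector into a keyed object.

    Assumed positional layout (illustrative only):
      [[[txid, index, script_pubkey, (optional) amount], ...], tx_hex, flags]
    """
    inputs, tx_hex, flags = entry
    prevouts = []
    for item in inputs:
        prevout = {
            "txid": item[0],
            "vout": item[1],
            "script_pubkey": item[2],
        }
        # An optional trailing element becomes a named key instead of
        # forcing every entry to carry a positional default.
        if len(item) > 3:
            prevout["amount"] = item[3]
        prevouts.append(prevout)
    return {
        "prevouts": prevouts,
        "tx_hex": tx_hex,
        "flags": flags.split(",") if flags else [],
    }

# Placeholder data, not a real transaction.
array_entry = [[["00" * 32, 0, "1"]], "0100", "P2SH,CLEANSTACK"]
object_entry = vector_to_object(array_entry)
print(json.dumps(object_entry, indent=2))
```

With keyed entries, a new optional field only has to appear in the vectors that actually use it, and existing entries stay untouched.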

I don't think it's particularly high priority, but perhaps someone has an interest in working on this!

maflcko added the label Brainstorming on May 12, 2022

maflcko added the label Tests on May 12, 2022

david-bakin commented at 3:38 AM on July 18, 2022: contributor

I find it confusing actually - esp. as there's no documentation except the test code on what the format of any particular file is - and some files use more than one format! I'm not saying an official schema is necessary - but a table - in the test source code - would be nice.

Plus, and this has bothered me - each test vector has exactly the data needed for the particular test it is used in, at the particular time that test was written, even though more data was available when the test vectors were generated. That additional data could be used later to enhance the tests that use them, or to provide new tests with additional validation on the same test vectors. Which would be nice. What I'm suggesting here is that as long as you're going to the trouble of generating useful, correct test vectors, drop all relevant information into the JSON for that test vector so that subsequent tests - ones you haven't envisioned - can be written more easily. (Oh, and another use for the additional data: developers investigating bitcoin techniques and algorithms (i.e., learning them, getting a better understanding, thinking about or trying enhancements) will have a useful data set to work with in ways the test author, who is generating the test vectors, hasn't envisioned for the tests being written.)
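
As a rough illustration of how richer vectors could coexist with existing tests, here is a Python sketch in which each test names the fields it needs and ignores the rest. It assumes the vectors are keyed objects; the helper `select_fields` and all field names are hypothetical.

```python
def select_fields(vector, required, optional=()):
    """Return only the fields a given test cares about.

    Extra keys in the vector are ignored, so adding data to a vector
    does not break tests that were written before it existed.
    """
    missing = [key for key in required if key not in vector]
    if missing:
        raise KeyError(f"test vector is missing required fields: {missing}")
    selected = {key: vector[key] for key in required}
    selected.update({key: vector[key] for key in optional if key in vector})
    return selected

# Placeholder vector carrying more data than any single test needs.
rich_vector = {
    "tx_hex": "0100",
    "flags": ["P2SH"],
    "txid": "ab" * 32,
    "weight": 400,
}

# An existing test reads only what it always read.
old_view = select_fields(rich_vector, ["tx_hex", "flags"])
# A later test can check extra data on the same vector.
new_view = select_fields(rich_vector, ["tx_hex"], optional=["weight", "fee"])
print(old_view)
print(new_view)
```

The extra data costs existing tests nothing, while later tests, or someone just exploring the data set, can pick up whatever the generator recorded.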

JeremyRubin closed this on Dec 16, 2022

bitcoin locked this on Dec 16, 2023