Add pre-allocated vector type and use it for CScript

sipa commented at 1:29 am on October 30, 2015: member

This is likely a controversial change, and will need very careful review and testing.

It adds a new basic data type (prevector<N, T>) which is a fully API compatible drop-in replacement for std::vector<T> (except it does not support custom allocators, does not have at(), and misses a few comparison operators). It will allocate up to N elements inside the parent container directly, only switching over to heap-allocated storage if more is needed. The data structures for the N elements and the pointer/size metadata for the heap-allocated storage are shared, making it very efficient for small N.

CScript is switched to use this new type, reducing the memory consumption of mempool and chainstate.

A benchmark (reindex with script validation disabled):

 02015-10-30 01:10:49   - Load block from disk: 0.00ms [0.07s]
 12015-10-30 01:10:49       - Connect 595 transactions: 2.76ms (0.005ms/tx, 0.003ms/txin) [82.40s]
 22015-10-30 01:10:49     - Verify 1096 txins: 2.80ms (0.003ms/txin) [88.73s]
 32015-10-30 01:10:49     - Index writing: 1.29ms [61.31s]
 42015-10-30 01:10:49     - Callbacks: 0.04ms [6.78s]
 52015-10-30 01:10:49   - Connect total: 4.78ms [182.16s]
 62015-10-30 01:10:49   - Flush: 0.77ms [38.90s]
 72015-10-30 01:10:49   - Writing chainstate: 0.03ms [5.57s]
 82015-10-30 01:10:49 UpdateTip: new best=000000000000046b1fdf00440a86e34886b2b0bd56472b22edb2805c92bbe4ca  height=223133  log2_work=69.460921  tx=13407260  date=2013-02-25 23:16:10 progress=0
 9.064773  cache=666.5MiB(1658277tx)
102015-10-30 01:10:49   - Connect postprocess: 1.46ms [46.22s]
112015-10-30 01:10:49 - Connect block: 7.03ms [272.92s]
12
132015-10-30 01:22:03       - Connect 595 transactions: 2.86ms (0.005ms/tx, 0.003ms/txin) [69.53s]
142015-10-30 01:22:03     - Verify 1096 txins: 2.90ms (0.003ms/txin) [75.80s]
152015-10-30 01:22:03     - Index writing: 1.26ms [56.37s]
162015-10-30 01:22:03     - Callbacks: 0.04ms [6.77s]
172015-10-30 01:22:03   - Connect total: 4.87ms [164.33s]
182015-10-30 01:22:03   - Flush: 1.14ms [24.95s]
192015-10-30 01:22:03   - Writing chainstate: 0.03ms [5.12s]
202015-10-30 01:22:03 UpdateTip: new best=000000000000046b1fdf00440a86e34886b2b0bd56472b22edb2805c92bbe4ca  height=223133  log2_work=69.460921  tx=13407260  date=2013-02-25 23:16:10 progress=0
21.064773  cache=508.6MiB(1658277tx)
222015-10-30 01:22:03   - Connect postprocess: 1.36ms [43.18s]
232015-10-30 01:22:03 - Connect block: 7.41ms [237.66s]

So for this reindex up to 223133, the chainstate needs 23% less memory, and is 13% faster.

There are likely several other places where this datatype can be used.

sipa force-pushed on Oct 30, 2015

dcousens commented at 2:56 am on October 30, 2015: contributor

tentative concept ACK, once-over utACK (not in depth)

As discussed on IRC, the trade offs [currently] are API compatibility (how invasive the change is) and the complexity of the change to achieve the desired memory/performance characteristics.

jonasschnelli commented at 8:03 am on October 30, 2015: contributor

Concept ACK. 23% less memory and 13% faster seems after a reason for doing a such change. Hows the performance boost biased through disabled verification, what performance benefit would be expected “in normal operation”?

btcdrak commented at 8:17 am on October 30, 2015: contributor

@sipa Not sure why you’d consider this controversial. Seems like a big win to me.

gmaxwell commented at 9:21 am on October 30, 2015: contributor

@jonasschnelli well, disabled should (eventually) be representative of initial sync impact… runtime impact is a bit harder to measure, I expect, because it will depend greatly on the signature cache. When the hitrate is high I expect it to be closer to these figures, and when it’s low… the signature validation would dwarf this improvement. @btcdrak because it’s hard to review, if it weren’t such a big improvement in speed/memory it wouldn’t be worth considering.

jgarzik commented at 12:59 pm on October 30, 2015: contributor

This is a standard technique for vectors. I don’t find it controversial. concept ACK

btcdrak commented at 1:43 pm on October 30, 2015: contributor

@gmaxwell Understood, but I was meaning, if there is clearly such a benefit, it makes it uncontroversial to me.

Anyway, Concept ACK, will try to review this weekend.

sipa commented at 4:07 pm on October 30, 2015: member

@jgarzik Controversial to go change such a widely used consensus data structure.

laanwj added the label Validation on Oct 30, 2015

laanwj commented at 3:11 am on November 2, 2015: member

I like the speedup. I do think it introduces a lot of somewhat hard to review code, and by sake of being part of CScript it becomes part of the consensus code.

At some point we should separate how scripts are allocated/stored, which is not consensus critical, from the code used to evaluate them, which is consensus critical, but could act on any slice of memory.

sipa commented at 3:21 am on November 2, 2015: member

@laanwj Yes, absolutely! That would also help building types that use arena storage for scripts (like one single allocation arena for a CCoins, a CTransaction, or even a whole CBlock).

I would of course like to see this change in before that time, but feel free to demand better comments and/or tests if not considered sufficient.

theuni commented at 5:51 am on November 4, 2015: member

@sipa Not sure if you were aware and/or based your work on this, but the “dynarray” was proposed for c++14, but didn’t make it.

http://en.cppreference.com/w/cpp/container/dynarray

I only mention because using a reference implementation of that may ease the minds of reviewers.

sipa commented at 6:03 am on November 4, 2015: member

@cfields: dynarray is very different

It still allocates everything on the heap
Its size is fixed, so it doesn’t behave like a vector, but like an array

theuni commented at 6:46 am on November 4, 2015: member

@sipa ah sorry, I see. That’s what i get for skimming.

gmaxwell commented at 11:34 pm on November 6, 2015: contributor

This is a huge speedup, and tests clean in valgrind. For reindex w/ libsecp256k1 this takes the time down from 3 hours 16 minutes to 2 hours 7 minutes. (With signature tests disabled completely this gets it down to 1h 17m.). Size of the UTXO set in memory is reduced from 5486MB to 4012MB.

I intend to test this further, but I support this, and it looks generally okay to me. I’m surprised we got this much improvement just for cscript, I was expecting we’d have to make the entire cached transaction a single allocation to see this kind of benefit.

dcousens commented at 1:19 am on November 7, 2015: contributor

I was expecting we’d have to make the entire cached transaction a single allocation to see this kind of benefit.

Why isn’t that the case OOI? I feel like CScript could just be 2 pointers, begin and end?

edit: Is CScript really mutated that often?

gmaxwell commented at 1:30 am on November 7, 2015: contributor

Inside the coins cache the whole entry gets mutated, e.g. to delete outputs when they’re spent… but never in a way which needs to increase their size. The cscripts themselves in these entries are never mutated at all.

The mutation that decreases the entry size could be addressed by flagging their deletion, and a separate operation which compacts the coins cache (e.g. takes txouts which many deleted entries and packs then reallocs them to a smaller size)– e.g. it could be run on flush (which scans for modified entries) as well as periodically when flusing is infrequent (e.g. db cache set to infinity), perhaps triggered by an overhead counter (which would be cheap to maintain). This would eliminate quite a few more malloc operations and further reduce fragmentation; but I suspect it would be a much bigger change than just changing the type of cscripts.

dcousens commented at 1:44 am on November 7, 2015: contributor

This would eliminate quite a few more malloc operations and further reduce fragmentation; but I suspect it would be a much bigger change than just changing the type of cscripts.

Indeed, as discussed with @sipa originally, the biggest benefit about this change is how isolated it is, it just hot swaps an existing interface, while netting a huge performance increase.

sipa force-pushed on Nov 7, 2015

sipa commented at 2:39 am on November 7, 2015: member

Added some comments.

sipa commented at 0:14 am on November 8, 2015: member

@dcousens CScript is hardly ever mutated. But CCoins which contains a vector of CTxouts, which contain a CScript is mutated often. The proposal is to make CCoins allocate its entire memory (including several CScripts) at once.

Prevector type 114b5812f6

sipa force-pushed on Nov 13, 2015

sipa renamed this:
~~[WIP] Add pre-allocated vector type and use it for CScript~~
Add pre-allocated vector type and use it for CScript
on Nov 13, 2015

sipa commented at 5:24 pm on November 13, 2015: member

Added a few more iterator tests (which caught (compile-time) errors), and removed the [WIP] marker.

sipa commented at 1:16 pm on November 28, 2015: member

Do we want this for 0.12?

gmaxwell added this to the milestone 0.12.0 on Nov 28, 2015

gmaxwell commented at 2:09 pm on November 28, 2015: contributor

I think so, it’s a large performance improvement and memory usage reduction.

dcousens commented at 3:49 am on November 30, 2015: contributor

ACK for 0.12

laanwj commented at 11:30 am on November 30, 2015: member

I’d prefer to merge a large, reasonably risky change like this after 0.12 branch, but if you are confident enough about this, I’m ok with it.

morcos commented at 10:32 pm on November 30, 2015: member

i’ve done a light code review and have used this pull extensively. i’m in favor of merging and will continue to do a more extensive code review.

gmaxwell commented at 11:00 pm on November 30, 2015: contributor

I have tested this extensively– including operation in valgrind, perhaps more than I’ve tested the current tip without it. I have read the patch and think it looks fine but I feel anything doing this exceeds my knowledge of the subtle behaviors of C++ so I do not consider my code review to have much value on this pull.

I think it’s okay to merge, and considering our interactions if it causes trouble during the 0.12 pre-release cycle its also not hard to back out: absent it; we’re likely to ignore some performance problems in 0.12 chalking it up to “well 6914 is a big fix”. It will also see a lot more testing if its merged for 0.12 (as I will have multiple people working full time on testing the 0.12 and the next elements update that will be 0.12 based).

dcousens commented at 0:35 am on December 1, 2015: contributor

I’ll pull this on to my own nodes for extended testing, and will look over it once more.

laanwj merged this on Dec 1, 2015

laanwj closed this on Dec 1, 2015

laanwj referenced this in commit 327291af02 on Dec 1, 2015

zkbot referenced this in commit 3b68ab255f on Apr 17, 2018

zkbot referenced this in commit d408e23ab7 on Apr 18, 2018

zkbot referenced this in commit 0753a0e8a9 on Apr 19, 2018

Fuzzbawls referenced this in commit 8dfc4806f7 on May 19, 2020

MarcoFalke locked this on Sep 8, 2021

Add pre-allocated vector type and use it for CScript #6914