This change is part of [IBD] - Tracking PR for speeding up Initial Block Download
Summary
The current prevector size of 28 bytes (chosen to fill the sizeof(CScript) aligned size) was introduced in 2015 (https://github.com/bitcoin/bitcoin/pull/6914), before SegWit and Taproot.
However, the increasingly common P2WSH and P2TR scripts are both 34 bytes, and are forced to use heap (re)allocation rather than efficient inline storage.
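As a rough illustration of the sizes involved, the sketch below tabulates the byte lengths of the standard scriptPubKey templates and checks which of them fit in a given inline capacity. The constant names and the `FitsInline` helper are hypothetical and exist only for this example; they are not part of the codebase.

```cpp
#include <cstddef>

// Byte lengths of standard scriptPubKey templates (opcodes + push byte + payload).
// All names below are illustrative only.
constexpr std::size_t P2WPKH_SIZE = 1 + 1 + 20;             // OP_0 <20-byte hash>                                   = 22
constexpr std::size_t P2SH_SIZE   = 1 + 1 + 20 + 1;         // OP_HASH160 <20-byte hash> OP_EQUAL                    = 23
constexpr std::size_t P2PKH_SIZE  = 1 + 1 + 1 + 20 + 1 + 1; // OP_DUP OP_HASH160 <20> OP_EQUALVERIFY OP_CHECKSIG     = 25
constexpr std::size_t P2WSH_SIZE  = 1 + 1 + 32;             // OP_0 <32-byte hash>                                   = 34
constexpr std::size_t P2TR_SIZE   = 1 + 1 + 32;             // OP_1 <32-byte key>                                    = 34
constexpr std::size_t P2PK_COMPRESSED_SIZE = 1 + 33 + 1;    // <33-byte pubkey> OP_CHECKSIG                          = 35

// A script can be stored inline only if it is no larger than the inline capacity.
constexpr bool FitsInline(std::size_t script_size, std::size_t inline_capacity)
{
    return script_size <= inline_capacity;
}

// With 28 bytes of inline storage the older script types fit, but P2WSH and P2TR do not.
static_assert(FitsInline(P2PKH_SIZE, 28), "P2PKH fits in 28 bytes");
static_assert(!FitsInline(P2WSH_SIZE, 28), "P2WSH needs a heap allocation at 28 bytes");
static_assert(!FitsInline(P2TR_SIZE, 28), "P2TR needs a heap allocation at 28 bytes");

// With 36 bytes all of the templates listed above fit inline.
static_assert(FitsInline(P2WSH_SIZE, 36) && FitsInline(P2TR_SIZE, 36), "34-byte scripts fit in 36 bytes");
static_assert(FitsInline(P2PK_COMPRESSED_SIZE, 36), "compressed P2PK fits in 36 bytes");
```

The checks run at compile time, so the snippet documents the size argument without adding any runtime cost.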
The core trade-off of this change is to eliminate heap allocations for common 34-36 byte scripts at the cost of increasing the base memory footprint of all CScript objects by 8 bytes (while still respecting peak memory usage defined by -dbcache).
Context
Increasing the prevector size allows these scripts to be stored inline, avoiding heap allocations, reducing potential memory fragmentation, and improving performance during cache flushes. Massif analysis confirms a lower stable memory usage after flushing, suggesting the elimination of heap allocations outweighs the larger base size for common workloads.
Due to memory alignment, increasing the prevector size to 36 bytes doesn’t change the overall sizeof(CScript) compared to an increase to 34 bytes, allowing us to include P2PK scripts as well at no additional memory cost.
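The alignment argument can be reproduced with a small stand-alone sketch. `InlineVectorSketch` below is a hypothetical, simplified layout loosely modeled on a small-buffer vector (a packed union of inline bytes and a heap pointer plus capacity, followed by a 32-bit size field); it is an assumption made for illustration, not the actual prevector definition. It also relies on `#pragma pack`, which GCC, Clang and MSVC support but which is not standard C++.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical simplified small-buffer layout, for illustration only.
template <std::size_t N>
struct InlineVectorSketch {
#pragma pack(push, 1)
    union Storage {
        unsigned char direct[N];     // inline bytes for small contents
        struct {
            char* ptr;               // heap pointer for larger contents
            std::uint32_t capacity;  // allocated capacity on the heap
        } indirect;
    };
#pragma pack(pop)
    alignas(char*) Storage storage;  // pointer-aligned so `ptr` can be used directly
    std::uint32_t size;              // element count, placed after the storage
};

// On common platforms the size field lands on a 4-byte boundary and the whole
// object is padded to pointer alignment, so 34 and 36 inline bytes cost the same.
static_assert(sizeof(InlineVectorSketch<34>) == sizeof(InlineVectorSketch<36>),
              "34 and 36 bytes of inline storage occupy the same total size");
static_assert(sizeof(InlineVectorSketch<36>) > sizeof(InlineVectorSketch<28>),
              "growing the inline storage past 28 bytes grows the object");
```

On a typical 64-bit target this sketch gives 32 bytes for 28 inline bytes and 40 bytes for either 34 or 36, which matches the 8-byte increase described above.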
For dbcache=440, Massif before, with a heap threshold of 28:
[Massif graph of heap usage over time: peak 744.1 MB, time axis from 0 to 1.805 h]
and after, with a heap threshold of 36:
[Massif graph of heap usage over time: peak 744.2 MB, time axis from 0 to 1.618 h]
For dbcache=4500, Massif before, with a heap threshold of 28:
[Massif graph of heap usage over time: peak 4.565 GB, time axis from 0 to 1.500 h]
and after, with a heap threshold of 36:
[Massif graph of heap usage over time: peak 4.640 GB, time axis from 0 to 1.360 h]
Benchmarks and Memory
Performance benchmarks for AssumeUTXO load and flush show:
- Small dbcache (450MB): ~1-3% performance improvement (despite more frequent flushes)
- Large dbcache (4500MB): ~6-8% performance improvement due to fewer heap allocations (and basically the same number of flushes)
- Very large dbcache (4500MB): ~5-6% performance improvement due to fewer heap allocations (and memory limit not being reached, so there’s no memory penalty)
Full IBD and -reindex-chainstate also show an overall ~3-4% speedup (both for smaller and larger dbcache values).
We haven’t investigated using different prevector sizes based on script type, though this could be explored in the future if needed.
Historical explanation for the speedup (by Anthony Towns)
I think the tradeoff is something like:
- spends of p2pk, p2sh, p2pkh coins – these cost 8 more bytes
- spends of p2wpkh – these cost 16 more bytes (sPK and scriptSig didn’t need an allocation)
- spends of p2wsh and p2tr – these cost ~48 fewer bytes (save 64 byte allocation on 64bit system, lose 8 bytes for both scriptSig and sPK)
- spends of nested p2wsh – presumably save ~96 bytes, since the scriptSig would save an allocation, but I’m bundling it in the previous section
Based on mainnet.observer stats for 2025-05-08, p2wpkh is about 55% of txs, p2tr is about 28%, p2pkh about 13%, p2wsh about 4% and the rest is noise, maybe? Those numbers net out to a saving of ~5.5 bytes per input. If p2wpkh rose from 55% to 80% and p2tr dropped to 20%, that would net to wasting ~3.2 bytes per input.