Update nChainTx to 64bit type

ajtowns commented at 7:42 am on January 17, 2024: contributor

https://github.com/bitcoin/bitcoin/blob/8106b268cde8e97a7c330afdda39b6bb55e5574a/src/chain.h#L178-L186

Hey, it’s 2024! nChainTx seems to currently be 953368191 (height 826100), which is hex 38d3 3e7f, so we’re only 22% of the way to an overflow, and would still need another 200k blocks (~ four years) full of 60 byte transactions to get there.

ajtowns commented at 7:51 am on January 17, 2024: contributor

Background: #13875 and https://github.com/bitcoin/bitcoin/pull/1677

willcl-ark added the label Validation on Jan 24, 2024

portlandhodl commented at 4:24 pm on January 27, 2024: contributor

Would this be a hard fork because it would be loosening the conditions for validation?

mzumsande commented at 4:29 pm on January 29, 2024: contributor

Would this be a hard fork because it would be loosening the conditions for validation?

I don’t think so, since I don’t think the value of nChainTx is used in validation (or do you see a spot where it is?). It’s used for logging (progress estimation), and whether it is 0 is used in HaveNumChainTxs. The latter could become problematic in case of an overflow, since if the unsigned int would overflow exactly to zero (which is a bit unlikely given that there are a few thousand txns in each block but ceratainly possible) the node would crash in https://github.com/bitcoin/bitcoin/blob/478ac185be49feffd1231366c07e06068683f941/src/validation.cpp#L3017.

portlandhodl commented at 4:44 pm on January 29, 2024: contributor

^^^ The above was the line that I was referencing. Thank you.

Sjors commented at 11:26 am on February 6, 2024: member

There was also the idea of dropping this number entirely: #13875 (comment)

Though see also #13875 (comment)

maflcko commented at 11:44 am on February 6, 2024: member

Though see also #13875 (comment)

This is nTx, not nChainTx.

maflcko commented at 12:09 pm on February 6, 2024: member

For reference, I removed nChainTx from consensus in #29349. The next steps would be to add a member to CChain to account for the sum of nTx, and update it when SetTip is called. Then, update the RPC, wallet and estimate functions to derive the nChainTx they need from the nChainTx at the tip, walking the relevant portion of the chain.

This is possible, but I am not sure if it is worth it, because:

It is more code overall.
The RPC, wallet, and logging code is slower when it has to walk the chain.

The benefit mainly seems to be to reduce the memory footprint? On 64-bit platforms, sizeof(CBlockIndex) is 144. Adding 4 bytes or removing 4 bytes is less than 10% of cost or savings. Ref: https://godbolt.org/z/rKEPYTh4j

Removing nTx is even harder, so less worth it. Especially, given that it could be reduced to 16 bits in the memory layout, if needed.

This leads to an alternative solution to not increase the memory footprint: Move the 32+32 bits used for nTx and nChainTx into a single 64-bit blob, where the 16 bit are for nTx and the remaining for nChainTx. But, again, the overall heap memory usage shouldn’t change much, either way.

ryanofsky commented at 8:08 pm on February 6, 2024: contributor

This leads to an alternative solution to not increase the memory footprint: Move the 32+32 bits used for nTx and nChainTx into a single 64-bit blob, where the 16 bit are for nTx and the remaining for nChainTx. But, again, the overall heap memory usage shouldn’t change much, either way.

This does seem like an nice and easy solution. Would be good to incorporate in #29331

As a tangent, I was thinking along the same lines last week before implementing #29370 and came up with worse solution, taking advantage of the fact that when nChainTx is set, you can derive nTx by subtracting nChainTx - pprev->nChainTx. In case it’s interesting:

 0class CBlockIndex
 1{
 2public:
 3    CBlockIndex* pprev{nullptr};
 4
 5    // Indicate whether num_tx_value contains total number of transactions in
 6    // this block, or total number of transactions in the chain up and including
 7    // this block.
 8    enum NumTxType { UNSET = 0, BLOCK_TX = 1, CHAIN_TX = 2 };
 9    NumTxType num_tx_type : 4 {UNSET};
10    uint64_t num_tx_value : 60;
11
12    uint64_t NumBlockTx()
13    {
14        if (num_tx_type == BLOCK_TX) return num_tx_value;
15        if (num_tx_type == CHAIN_TX) {
16            assert(pprev && pprev->num_tx_type == CHAIN_TX);
17            return num_tx_value - pprev->num_tx_value;
18        }
19        return 0;
20    }
21
22    uint64_t NumChainTx()
23    {
24        return num_tx_type == CHAIN_TX ? num_tx_value : 0;
25    }
26};

maflcko commented at 12:32 pm on February 7, 2024: member

This does seem like an nice and easy solution. Would be good to incorporate in #29331

Not sure if it so easy. On 32-bit platforms, nTx currently starts at byte 60, so making it 16-bit will still eat 32-bit with padding. So forcing a 64-bit blob of both values is needed, but that requires changing all code that uses either field. Not sure if worth it for a +-5% memory difference?

ryanofsky commented at 3:59 pm on February 8, 2024: contributor

Not sure if it so easy. On 32-bit platforms, nTx currently starts at byte 60, so making it 16-bit will still eat 32-bit with padding.

I didn’t test on a 32-bit platform, but aren’t bitfields a good solution to this? The following seems to work for me, increasing nChainTx range without increasing CBlockIndex size on x86_64 (144 bytes):

 0--- a/src/chain.h
 1+++ b/src/chain.h
 2@@ -178,7 +178,7 @@ public:
 3     //! Note: this value is faked during UTXO snapshot load to ensure that
 4     //! LoadBlockIndex() will load index entries for blocks that we lack data for.
 5     //! [@sa](/bitcoin-bitcoin/contributor/sa/) ActivateSnapshot
 6-    unsigned int nTx{0};
 7+    uint16_t nTx : 16 {0};
 8 
 9     //! (memory only) Number of transactions in the chain up to and including this block.
10     //! This value will be non-zero only if and only if transactions for this block and all its parents are available.
11@@ -188,7 +188,7 @@ public:
12     //! have the underlying block data available during snapshot load.
13     //! [@sa](/bitcoin-bitcoin/contributor/sa/) AssumeutxoData
14     //! [@sa](/bitcoin-bitcoin/contributor/sa/) ActivateSnapshot
15-    unsigned int nChainTx{0};
16+    uint64_t nChainTx : 48 {0};
17 
18     //! Verification status of this block. See enum BlockStatus
19     //!
20@@ -412,7 +412,12 @@ public:
21 
22         READWRITE(VARINT_MODE(obj.nHeight, VarIntMode::NONNEGATIVE_SIGNED));
23         READWRITE(VARINT(obj.nStatus));
24-        READWRITE(VARINT(obj.nTx));
25+
26+        unsigned int num_tx;
27+        SER_WRITE(obj, num_tx = obj.nTx);
28+        READWRITE(VARINT(num_tx));
29+        SER_READ(obj, obj.nTx = num_tx);
30+
31         if (obj.nStatus & (BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO)) READWRITE(VARINT_MODE(obj.nFile, VarIntMode::NONNEGATIVE_SIGNED));
32         if (obj.nStatus & BLOCK_HAVE_DATA) READWRITE(VARINT(obj.nDataPos));
33         if (obj.nStatus & BLOCK_HAVE_UNDO) READWRITE(VARINT(obj.nUndoPos));
34--- a/src/validation.cpp
35+++ b/src/validation.cpp
36@@ -5119,7 +5119,7 @@ double GuessVerificationProgress(const ChainTxData& data, const CBlockIndex *pin
37 
38     double fTxTotal;
39 
40-    if (pindex->nChainTx <= data.nTxCount) {
41+    if (static_cast<int64_t>(pindex->nChainTx) <= data.nTxCount) {
42         fTxTotal = data.nTxCount + (nNow - data.nTime) * data.dTxRate;
43     } else {
44         fTxTotal = pindex->nChainTx + (nNow - pindex->GetBlockTime()) * data.dTxRate;

maflcko commented at 4:42 pm on February 8, 2024: member

bitfields

Jup, that didn’t work for me. Or maybe I did something wrong. See https://godbolt.org/z/z6EoWKbjY

0   60:0-15 |   int16_t nTx
1   64:0-47 |   int64_t nChainTx

(nTx eats byte 60-64, even though it is only a two byte bitfield)

ryanofsky commented at 5:13 pm on February 8, 2024: contributor

Jup, that didn’t work for me. Or maybe I did something wrong. See https://godbolt.org/z/z6EoWKbjY

It’s not that surprising that the compiler adds padding so the struct member with uint64_t type is inside a field aligned to 8 bytes. But if you move the two bitfield members to begin on an 8 byte boundary (like between the nDataPos and nUndoPos fields, or to the beginning of the struct, it does get packed without extra padding). So I think the diff above is an easy way to avoid consuming extra memory on 64 bit platforms, even though we could go further with more changes to get the same optimization on 32 bit.

ajtowns commented at 5:33 pm on February 8, 2024: contributor

bitfields

Jup, that didn’t work for me. Or maybe I did something wrong. See https://godbolt.org/z/z6EoWKbjY
0   60:0-15 |   int16_t nTx
1   64:0-47 |   int64_t nChainTx
(nTx eats byte 60-64, even though it is only a two byte bitfield)

I think the problem here is you have an odd number of 32-bit pointers prior to nTx, so it’s misaligned to be combined with nChainTx. So you end up with nTx [4B-6B padding] nChainTx nStatus [4B padding]. Moving nStatus prior to nTx gives an even number of 32-bit objects prior to nTx, so drops the size back down.

ryanofsky commented at 8:01 pm on March 8, 2024: contributor

When updating this should also update the type of the AssumeutxoData::nChainTx field:

https://github.com/bitcoin/bitcoin/blob/1cd2e29870e4ad3b7c57b1d567df0e6df56572b0/src/kernel/chainparams.h#L57

glozow closed this on Aug 5, 2024

Fabcien referenced this in commit 1a19a4d960 on Aug 5, 2024

bitcoin locked this on Aug 5, 2025

Update nChainTx to 64bit type #29258