Evict coinbase spends from memory pool during re-org

gavinandresen commented at 7:47 pm on November 25, 2014: contributor

This is an alternative version of #5267

It works by having DisconnectTip() lookup the block at height current-COINBASE_MATURITY, and then removing any transactions spending that coinbase (recursively) from the memory pool.

Since re-orgs happen by repeated calls to DisconnectTip(), this will keep the memory pool always consistent, even in the middle of a re-org.

Builds on #5316

Includes a new regression test.

gmaxwell commented at 0:34 am on November 26, 2014: contributor

@TheBlueMatt ping

sipa commented at 0:53 am on November 26, 2014: member

I worry about the increased block access, and the resulting 100 extra blocks that can’t be pruned.

Isn’t it easier to just allow immature spends in the mempool? Typically in a reorganization, after the disconnect, a reconnect follows that will make most of the removed transaction in the mempool valid again anyway.

TheBlueMatt commented at 9:26 am on November 26, 2014: member

Yes, there are several ways to fix this, and this one was discussed a bit. However, I’d highly prefer to hit coinsdb for this as inputs to mempool txn are likely to be cached anyway, which is preferable to hitting disk for a second block. (this also has potentially significant implications for pruning, as you now need to keep 100 extra blocks for any given reorg depth).

Also, could you include “Make CTxMemPool::check more thourough by using CheckInputs 3215e5a”? This should make the code much more robust in the future, though currently this branch fails those checks.

TheBlueMatt commented at 9:27 am on November 26, 2014: member

@sipa Yes, if it weren’t for your invalidateblock/reconsiderblock RPC additions then we could just only run check() after reorgs instead of during and all would be OK.

gavinandresen commented at 10:57 am on November 26, 2014: contributor

@sipa : a re-org happens on the main network on average once or twice a day. Extra block access is really NOT an issue.

RE: not being able to prune to less than 100+plausible-re-org-depth: I’m OK with 120 being the minimum number of blocks a pruning node needs to keep. If we REALLY want to support pruning down lower, then storing the hash of the coinbase transactions in the block indices would be sufficient.

RE: easier to allow immature in mempool: I really don’t like the idea of relaxing the invariant of “every transaction in the mempool is ready to be mined.” Certainly this code is simpler than that alternative. I also think we’ll have to optimize CreateBlock fairly soon, which will probably mean a mempool rewrite to maintain a priority queue of transactions– so creating a block is just “pop off queue until block full.” @TheBlueMatt : including more thorough CheckInputs is a good idea, but I don’t think I’ll have time to do that until Monday (out for Thanksgiving).

RE: run check() after reorgs are complete comment: theoretically, you can reorg onto a more-work chain that has a lower height. That will never happen on the main network, but I hate to say never and I think we should be as robust as possible at all the edge cases we can think up.

sipa commented at 11:04 am on November 26, 2014: member

I’m worried about the extra block propagation latency of reorgs. Not about killing people’s disks.

sipa commented at 11:05 am on November 26, 2014: member

And the requirement in the pruning code is already limited to 1 or 2 days, if I recall correctly. Whatever we are or were comfortable with, this adds another 100 nonprunable blocks.

gavinandresen commented at 11:07 am on November 26, 2014: contributor

Idea for if we decide to care about resurrecting coinbase spends that are invalidated during a re-org but become valid when it is finished:

Move the coinbase spends to the orphan transaction list instead of dropping them. And add code when extending the tip to look for any “now mature” coinbase spends in mapOrphans, and move them to the mempool.

(but I still don’t think we need to care about this case)

sipa commented at 4:05 pm on November 26, 2014: member

Well, I definitely prefer having a fix than not having one, and pruning isn’t exactly a blocker right now.

Given that this also adds a test to detect the incorrect behaviour, I think we should merge it, and work on better solutions later.

sipa commented at 4:07 pm on November 26, 2014: member

Also, can you rebase on top of the new #5316?

TheBlueMatt commented at 1:25 am on November 27, 2014: member

re: minimum block count for pruning: you really need to have enough blocks to make human-timescale-fixes possible (ie I’d say a week of blocks, at least, ie 1008 blocks), and while an extra 100 blocks isnt a huge deal to me either, going from 1000 to 1100 is not a trivial difference, though I’m not sure who it’d effect practically.

Anyway, I’ll try to find some time this week to figure out why my pull fails your tests and your pull fails my tests :)

laanwj added the label UTXO Db and Indexes on Nov 28, 2014

gavinandresen commented at 4:40 pm on November 28, 2014: contributor

RE: requiring 100 MORE blocks when pruned:

I must be missing something, I don’t understand why this would be number pruned PLUS 100 more blocks, rather than just “at least 100 most-recent blocks plus plausible re-org depth must be maintained”).

Can @sipa or @TheBlueMatt explain why you’re thinking this means pruning needs 100 more blocks than it would without this?

sipa commented at 4:45 pm on November 28, 2014: member

Say you want to be sure you can reorg 111 blocks back, so down to height-111. This code here will fail if it can’t read block B-100 if B is being reorged, so to reorg down to height-111, you need to have block height-211 available. Thus, you can only prune up to block height-211 rather than height-111.

gavinandresen commented at 5:19 pm on November 28, 2014: contributor

@sipa: ok, misunderstanding was that anybody would think that a 100+ block re-org was plausible, or would want to plan for a multi-hundred-block re-org. We’re saying the same thing, I just think it is silly to think that if a 100+ blockchain re-org happens having your mempool messed up will be an issue anybody cares about.

TheBlueMatt commented at 5:28 pm on November 28, 2014: member

A 100+ block reorg isnt the issue, but /any/ reorg will require an extra 100 blocks to complete. As mentioned above in this thread, with pruning, its important to maintain enough reorg-able blocks to allow for human-scale issues (ie when there’s an issue, we need to be able to get miners to upgrade and ensure people will eventually reorg to the best chain, ala the bdb locks fork)…Even ignoring that, hitting disk for this instead of iterating over the mempool and only hitting memory (in reasonable caches) is a huge difference, at least for those with spinning disks.

TheBlueMatt commented at 6:46 pm on November 28, 2014: member

There appears to be some misunderstanding here…I bring up the issue of hitting disk only as a way to say that I dont think #5267 is going to perform any worse than this in most cases, which it appears to at first glance, not because I’m necessarily concerned about relay times across nodes or anything like that (we have better solutions to that problem already). My point is more that I see #5267 as more obviously robust (instead of removing just the transactions we know we need to with a scalpel, we use a hammer and scan through the whole mempool to find the ones we need to remove) and performance isnt really a reason to chose this over that.

gavinandresen force-pushed on Dec 2, 2014

gavinandresen commented at 7:23 pm on December 2, 2014: contributor

Fixed off-by-one error found by @TheBlueMatt, and now includes his “Make CTxMemPool::check more thorough” commit.

And rebased.

gavinandresen force-pushed on Dec 2, 2014

Make CTxMemPool::remove more effecient by avoiding recursion 02b2c2af2f

Make CTxMemPool::check more thourough by using CheckInputs f62dcf66ad

Remove coinbase-dependant transactions during reorg.

This still leaves transactions in mempool that are potentially
invalid if the maturity period has been reorged out of, but at
least they're not missing inputs entirely.

c3dabdbc42

gavinandresen force-pushed on Dec 3, 2014

sipa commented at 4:09 pm on December 3, 2014: member

utACK. I believe the code and solution here are correct, assuming the block 100 back can be correctly read.

I would still prefer a solution that does not rely on that assumption, but that can be done later.

in src/main.cpp: in 7268566aa3 outdated

1879+    int immatureHeight = pindexDelete->nHeight-COINBASE_MATURITY+1;
1880+    if (immatureHeight > 0)
1881+    {
1882+        const CBlockIndex* immatureIndex = pindexDelete->GetAncestor(immatureHeight);
1883+        CBlock immatureBlock;
1884+        if (ReadBlockFromDisk(immatureBlock, immatureIndex))

laanwj commented at 8:46 am on December 4, 2014:

Isn’t it a fatal error if this ReadBlockFromDisk fails?

gavinandresen commented at 2:42 pm on December 4, 2014:

Very good question, the answer isn’t obvious to me.

return state.Abort() if block-100 couldn’t be read would be trivial, but it seems a shame to get stuck on a bad fork if you can’t read block-100 because your disk went bad. Although I suppose if your disk is going bad you want to know sooner rather than later.

A safe alternative would be to mempool.clear() if this read fails.

laanwj commented at 2:53 pm on December 4, 2014:

I think stopping the node (with a state.Abort) is the acceptable when expected information cannot be read from disk. It means something is wrong on a lower level than we can handle. On the other hand you’re right that the mempool is not ciritical, and clearing it would be enough to get past this with consistent state.

TheBlueMatt commented at 9:21 am on December 4, 2014: member

@sipa: HUH? Why should it be done later? Do you have any specific objections to #5267?

in src/txmempool.cpp: in 7268566aa3 outdated

449+        uint256 hash = txToRemove.front();
450+        txToRemove.pop_front();
451+        if (!mapTx.count(hash))
452+            continue;
453+        const CTransaction& tx = mapTx[hash].GetTx();
454+        removed.push_front(tx);

TheBlueMatt commented at 11:14 am on December 4, 2014:

To mimic the old behavior, you actually need to push_back here. Though, again, I dont think this code is even necessary.

sipa commented at 11:23 am on December 4, 2014: member

@TheBlueMatt I have one objection against it: it isn’t passing Travis.

Fix mempool handling of coinbase spends

The logic for removing/resurrecting transactions from the memory
pool was broken for spends of coinbase transactions; they
were not being removed from the mempool when they should.

Fixed by having DisconnectTip() remove any transactions from the memory
pool that are descendants of the was-mature-but-is-now-immature coinbase
transaction.

A future pull request could keep track of all those removed transactions
and try to resurrect them after re-organizing onto the new chain, but
I'm not sure it is worth writing that code for a case that never happens,
and, if it does happen, should correct itself (wallets should re-broadcast
transactions that don't end up in the chain).

cdc176bfc9

gavinandresen force-pushed on Dec 4, 2014

gavinandresen commented at 2:58 pm on December 4, 2014: contributor

ReadFromDisk(block-100) failure will now state.abort().

And picked the push_front versus back nit pointed out by @TheBlueMatt .

laanwj commented at 3:32 pm on December 4, 2014: member

Tested ACK (for 0.10) commithash cdc176bfc94dd7ff2ff5ad398d03b2b69f71abdb (signature)

laanwj added this to the milestone 0.10.0 on Dec 5, 2014

laanwj commented at 2:27 pm on December 11, 2014: member

Merged #5267 instead, to have one fix in. If it turns out that this solution was better we can change it again.

laanwj closed this on Dec 11, 2014

MarcoFalke locked this on Sep 8, 2021

Evict coinbase spends from memory pool during re-org #5368