This PR contains two refactors to the UpdateForDescendants method that could each stand alone, but I think are easiest to review together. This is follow-up work from #17925 and is related to the work laid out in #17268.
The first change refactors UpdateForDescendants to use Epochs instead of sets to de-duplicate traversal, and stores the cache entries in a vector instead of a set. This is a straightforward win. The algorithm is a bit clever so as not to require two separate vectors (one for txiters that still need expanding, one for txiters that have already been processed), but I think it's relatively easy to reason about; see the sketch below.
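To illustrate the single-vector trick, here is a minimal, self-contained sketch. The types and names (Node, Traverser, Descendants) are hypothetical stand-ins for txiters, the epoch machinery, and the traversal in UpdateForDescendants, not the actual mempool code:

```cpp
#include <cstdint>
#include <vector>

// Each node records the epoch at which it was last visited; bumping the
// traverser's epoch invalidates every previous mark in O(1), replacing the
// std::set previously used for de-duplication.
struct Node {
    uint64_t m_epoch{0};
    std::vector<Node*> children;
};

struct Traverser {
    uint64_t m_epoch{0};

    // Returns true the first time a node is seen in the current epoch.
    bool Visit(Node& node)
    {
        if (node.m_epoch == m_epoch) return false;
        node.m_epoch = m_epoch;
        return true;
    }

    // Collect all descendants of root. One vector serves as both work
    // queue and result: entries before index i are already expanded,
    // entries at or after i are still pending.
    std::vector<Node*> Descendants(Node& root)
    {
        ++m_epoch; // start a fresh epoch; all old visit marks are now stale
        Visit(root);
        std::vector<Node*> stage;
        for (Node* child : root.children) {
            if (Visit(*child)) stage.push_back(child);
        }
        for (size_t i = 0; i < stage.size(); ++i) {
            for (Node* grandchild : stage[i]->children) {
                if (Visit(*grandchild)) stage.push_back(grandchild);
            }
        }
        return stage;
    }
};
```

Because a node's epoch mark doubles as its "already scheduled" flag, nothing is ever pushed twice, so the index i cleanly splits the vector into processed and pending halves.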
The second change removes the cache altogether, as it is not needed.
The cache was an optimization to limit re-traversal in UpdateForDescendants. But because the descendant limit already bounds the total number of descendants, it doesn't actually help that much. There are some common cases where having a cacheMap might be faster, but an adversary can trivially construct a worst case that causes equal behavior with or without the cacheMap. For example, a transaction with N/2 children, each of which has N/2 outputs that get spent by the same N/2 grandchildren, forces the parent's traversal to examine on the order of N^2 child-to-grandchild links (N/2 * N/2 = N^2/4) whether or not a cacheMap is present. Post Epochs, UpdateForDescendants is sufficiently cheap that the cacheMap is just extra memory overhead.
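For concreteness, here is a toy construction of that worst case, reusing the hypothetical Node type from the sketch above (the function name and shape are illustrative only):

```cpp
#include <vector>

// Build 1 parent, n/2 children, and n/2 grandchildren, where every
// grandchild spends one output of every child. The parent then has ~n
// descendants (within the descendant limit), but n/2 * n/2 = n^2/4
// child-to-grandchild links that any full traversal from the parent must
// examine, cached or not.
Node MakeWorstCase(int n, std::vector<Node>& storage)
{
    storage.assign(n, Node{}); // [0, n/2) children, [n/2, n) grandchildren
    Node parent;
    for (int c = 0; c < n / 2; ++c) {
        for (int g = n / 2; g < n; ++g) {
            storage[c].children.push_back(&storage[g]);
        }
        parent.children.push_back(&storage[c]);
    }
    return parent;
}
```

Running Descendants(parent) on this graph performs the epoch check on all n^2/4 links even though each grandchild is only pushed once, which roughly matches the cached algorithm's cost of merging n/2 cached descendant lists of size n/2 each.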
This extra memory overhead isn't free, and my expectation is that getting rid of it should be faster in general. In the average case we don't expect very long descendant chains, so we'd be allocating cache memory for little benefit.
It could be the case that the first commit causes the larger speed-up, because the upgraded cache vectors are much more efficient to build and iterate over than the std::sets they replaced. By then removing the cache entirely, we go back to traversing the std::sets of in-mempool children directly (before the first commit, the cache entries were std::sets, which are slow to iterate). However, we also eliminate the cache entry lookups, which are likewise slow, and get rid of a bunch of allocation. My guess is that it "comes out in the wash", and that the algorithm after both patches is superior both to the previous algorithm and to the algorithm after only the first patch. The descendant limit is what makes me comfortable with this: the vecEntries stage can be no larger than the descendant limit (25), so the traversal is bounded by roughly 25*24 = 600 child visits.
As future work, the stage vector could become a cached epoch member (since we know we hold the lock), so that this function does not need to allocate on each call. That seemed like a minor improvement, though, and invites more discussion on whether pre-allocating these scratchpads is a good idea.
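In terms of the toy Traverser above, that future change might look like the following (again hypothetical; in the real code the scratchpad would presumably be a mutable member of CTxMemPool guarded by the mempool lock):

```cpp
struct CachingTraverser {
    uint64_t m_epoch{0};
    std::vector<Node*> m_stage; // persistent scratchpad, reused across calls

    bool Visit(Node& node)
    {
        if (node.m_epoch == m_epoch) return false;
        node.m_epoch = m_epoch;
        return true;
    }

    // Same traversal as before, but clear() keeps the vector's capacity,
    // so steady-state calls perform no heap allocation at all.
    const std::vector<Node*>& Descendants(Node& root)
    {
        ++m_epoch;
        m_stage.clear();
        Visit(root);
        for (Node* child : root.children) {
            if (Visit(*child)) m_stage.push_back(child);
        }
        for (size_t i = 0; i < m_stage.size(); ++i) {
            for (Node* grandchild : m_stage[i]->children) {
                if (Visit(*grandchild)) m_stage.push_back(grandchild);
            }
        }
        return m_stage;
    }
};
```

The caveat, and part of why this invites discussion, is that the returned reference is only valid until the next call, which is safe only while the lock serializes callers.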