I agree this is pretty intricate. I’ll try to explain in the comments, to see if you have any suggestions for making it clearer.
The `SetType modified;` variable contains a conservative overestimate of which clusters in the real graph may differ between main and staging. Whenever a transaction is added, removed, or has a dependency added in the simulated staging graph, it is definitely considered modified. But for example, say transactions A->B exist in main & staging, so they’re not in `modified`. Now a transaction C is added to staging, and a dependency C->B is added. Main is A->B, staging is {A,C}->B. We mark A as modified too, because despite not directly undergoing a change, the cluster it is in is still changed.
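To make the A/B/C example concrete, here is a minimal sketch of expanding a set of directly changed transactions to everything in their clusters. All names here (`SimGraph`, `ExpandToClusters`, `MAX_TX`, and this `SetType` alias) are hypothetical stand-ins, not the types used in the actual test, and dependencies are treated as undirected since only connectivity matters for this point.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration only.
constexpr std::size_t MAX_TX = 8;
using SetType = std::bitset<MAX_TX>;

struct SimGraph {
    SetType present;            // transactions that exist in this graph
    SetType neighbors[MAX_TX];  // undirected adjacency (parents and children)

    void AddTx(std::size_t tx) { assert(tx < MAX_TX); present.set(tx); }

    void AddDep(std::size_t a, std::size_t b) {
        assert(present.test(a) && present.test(b));
        neighbors[a].set(b);
        neighbors[b].set(a);
    }

    // All transactions connected to tx, including tx itself.
    SetType Cluster(std::size_t tx) const {
        SetType done, todo;
        todo.set(tx);
        while (todo.any()) {
            std::size_t cur = 0;
            while (!todo.test(cur)) ++cur;  // pick the lowest pending transaction
            todo.reset(cur);
            done.set(cur);
            todo |= neighbors[cur] & present & ~done;
        }
        return done;
    }
};

// Expand directly changed transactions to the full clusters they belong to.
SetType ExpandToClusters(const SimGraph& graph, const SetType& modified) {
    SetType out;
    for (std::size_t i = 0; i < MAX_TX; ++i) {
        if (modified.test(i) && graph.present.test(i)) out |= graph.Cluster(i);
    }
    return out;
}
```

With A=0, B=1, C=2: after adding C and a dependency between C and B in staging, only B and C are directly changed, but the expansion also returns A because it shares their cluster.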
Now as for why this may be an overestimate. Say D->E exists in main, as well as a separate F. Staging is created, a dependency E->F is added, and E is removed. The result, in the simulation, is that D & F remain, both modified. However, in the real `TxGraphImpl`, the dependency addition and the removal of E are done lazily; at application time, the removal happens first, and the dependency addition is then a no-op. In the real graph, F is never copied to staging, and thus will not be included in the diagrams, since it is actually the same cluster in both main & staging.
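A rough sketch of that ordering effect follows. It is not the real `TxGraphImpl` logic; `LazyStaging`, its members, and the cluster ids are all made up for illustration. It only models the one property described above: queued removals are applied before queued dependency additions, and a dependency involving a removed transaction does nothing.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical model: operations are queued and only applied later.
struct LazyStaging {
    std::map<char, int> cluster_of;                  // main graph: tx -> cluster id
    std::vector<char> to_remove;                     // queued removals
    std::vector<std::pair<char, char>> deps_to_add;  // queued (parent, child) pairs

    void RemoveTransaction(char tx) { to_remove.push_back(tx); }
    void AddDependency(char parent, char child) { deps_to_add.emplace_back(parent, child); }

    // Apply queued operations; returns the main cluster ids that had to be
    // copied into staging because they actually changed.
    std::set<int> Apply() {
        std::set<int> copied;
        std::set<char> removed;
        // Removals are processed first.
        for (char tx : to_remove) {
            copied.insert(cluster_of.at(tx));
            removed.insert(tx);
        }
        // Dependencies touching a removed transaction are no-ops.
        for (const auto& dep : deps_to_add) {
            if (removed.count(dep.first) || removed.count(dep.second)) continue;
            copied.insert(cluster_of.at(dep.first));
            copied.insert(cluster_of.at(dep.second));
        }
        to_remove.clear();
        deps_to_add.clear();
        return copied;
    }
};
```

With D and E in cluster 0 and F in cluster 1, queueing the E->F dependency and then the removal of E yields only cluster 0 from `Apply()`. The simulation, which applies operations eagerly in call order, would have flagged F as well.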