Instead of a bucket per thread, have a batch per thread. However, instead of batch verifying after each vChecks batch, batch verify after the queue of checks is empty for the block. We need CCheckQueue::Complete to set a flag that no more checks will be added, and wake all threads again. The threads will all verify their batches once the global queue is empty. We would need to reset the flag after Complete.
Yes, one batch per thread is the approach I mentioned above, and it is also the one I prefer, in case that wasn’t clear. My thinking was that the fastest approach overall would be to do that plus efficient merging of batches, where merging a batch would ideally have the same overhead as adding a single signature to a batch (whether that is possible still needs to be verified). It would need to be benchmarked across different scenarios, of course: blocks with a varying share of schnorr sigs and different numbers of available threads. I am sure there will be some scenarios where “let each thread verify their own batch” is faster and others where “merge thread batches and verify once” is faster, and in the end we would need to decide based on those results. A definite upside of merging the thread batches would be simpler code, since we could skip the extra round of waking up the threads with the flag set.
However, in the meantime I have asked Siv how hard it would be to add merging of batches to the secp API, and it appears to be harder than I thought because it would require merging scratch spaces, which is currently not possible. So, while it may be theoretically possible, it’s quite far off in terms of engineering effort. Siv will still look into it and check whether it’s feasible at all, but until then I will disregard this approach.
So, long story short, I have now implemented it with a batch per thread, using a flag set in Complete as you suggested.
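For anyone following along, here is a minimal sketch of the control flow as I understand it from the discussion above. It is a toy model and not the actual CCheckQueue code: `ToyCheckQueue`, `ToyBatch`, `ToyCheck`, and all member names are hypothetical, a boolean stands in for real signature verification, and it assumes at least one worker thread does all the work.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a signature check; `valid` replaces real crypto.
struct ToyCheck { bool valid; };

// Hypothetical per-thread batch: collects checks and verifies them in one go.
struct ToyBatch {
    std::vector<ToyCheck> items;
    void Add(const ToyCheck& c) { items.push_back(c); }
    bool Verify() {
        bool ok = true;
        for (const auto& c : items) ok = ok && c.valid;
        items.clear();
        return ok;
    }
};

class ToyCheckQueue {
    std::mutex m_mutex;
    std::condition_variable m_worker_cv, m_master_cv;
    std::vector<ToyCheck> m_queue;
    bool m_no_more_checks{false};  // set in Complete(), reset before it returns
    bool m_shutdown{false};
    bool m_all_ok{true};
    std::size_t m_pending_flush{0};  // workers that still have to verify their batch
    unsigned m_round{0};             // lets each worker flush exactly once per block
    std::vector<std::thread> m_workers;

    void Loop() {
        ToyBatch batch;  // one batch per thread
        unsigned flushed_round = 0;
        std::unique_lock<std::mutex> lock(m_mutex);
        while (true) {
            m_worker_cv.wait(lock, [&] {
                return m_shutdown || !m_queue.empty() ||
                       (m_no_more_checks && flushed_round != m_round);
            });
            if (m_shutdown) return;
            if (!m_queue.empty()) {
                // Normal case: move one check into this thread's batch, no verification yet.
                ToyCheck c = m_queue.back();
                m_queue.pop_back();
                lock.unlock();
                batch.Add(c);
                lock.lock();
                continue;
            }
            // Queue is empty and the flag is set: verify this thread's batch once.
            flushed_round = m_round;
            lock.unlock();
            const bool ok = batch.Verify();
            lock.lock();
            m_all_ok = m_all_ok && ok;
            if (--m_pending_flush == 0) m_master_cv.notify_one();
        }
    }

public:
    explicit ToyCheckQueue(std::size_t n_workers) {
        assert(n_workers > 0);  // simplification: the caller never verifies itself
        for (std::size_t i = 0; i < n_workers; ++i) m_workers.emplace_back([this] { Loop(); });
    }
    ~ToyCheckQueue() {
        { std::lock_guard<std::mutex> lock(m_mutex); m_shutdown = true; }
        m_worker_cv.notify_all();
        for (auto& t : m_workers) t.join();
    }
    void Add(std::vector<ToyCheck> checks) {
        std::lock_guard<std::mutex> lock(m_mutex);
        assert(!m_no_more_checks);  // nothing may be added once Complete() has started
        for (auto& c : checks) m_queue.push_back(std::move(c));
        m_worker_cv.notify_all();
    }
    bool Complete() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_no_more_checks = true;  // signal: no more checks for this block
        ++m_round;
        m_pending_flush = m_workers.size();
        m_worker_cv.notify_all();  // wake all threads again
        m_master_cv.wait(lock, [&] { return m_pending_flush == 0; });
        m_no_more_checks = false;  // reset the flag for the next block
        const bool ok = m_all_ok;
        m_all_ok = true;
        return ok;
    }
};
```

The round counter is just one way to make sure every worker verifies its batch exactly once per call to Complete; the real implementation may track this differently. Calling `Add` several times and then `Complete` once corresponds to one block, and the same queue can be reused for the next block because the flag is reset before Complete returns.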