I don’t think the binary sizes would be a problem after we merge something like #30882 and the extra folder for the corpus is also not a huge cost (they compress really nicely).
Are you sure? IIUC #30882 creates a separate binary for each fuzz target, so depending on the compile/link options (static, without LTO, …), each fuzz target may or may not be heavy. For example, OSS-Fuzz already went down once due to storage limitations (https://github.com/google/oss-fuzz/pull/12232). I am not saying that OSS-Fuzz should be considered a blocker, given that there are now several in-house backups, but I wanted to mention it for context.
I’d prefer to follow best practices over having to weigh triviality against some external overhead of running these tests. “Put separate tests into separate harnesses” is easier to follow than “aggregate trivial functions in one harness” because it is entirely subjective. What is trivial? How many unrelated trivial functions are allowed in one harness? I think this creates confusion for devs that aren’t experts on fuzzing.
I think I agree that it would be ideal to have an easy/trivial guideline to follow. However, I still think there is some threshold below which a separate fuzz target (or maybe even a fuzz target at all) may not be the best choice. There are many pure util functions that (in theory) could go into a separate fuzz target. For example, StringForFeeEstimateHorizon (https://github.com/bitcoin/bitcoin/blob/e43ce250c6fb65cc0c8903296c8ab228539d2204/src/test/fuzz/kitchen_sink.cpp#L45) should (in theory) be moved to a separate target. However, it is such a trivial function (a single switch-case) that linking the binary and starting it takes longer than reaching full coverage from an empty initial seed corpus. Maybe the function is too trivial to be put into a fuzz target at all, but I think there are plenty of other pure util functions, such that there could easily be 500-1500 fuzz targets with similar properties. Of course there is nothing inherently wrong with having so many fuzz targets, but I could imagine that some tooling (introspector, coverage reports per fuzz target) or the humans consuming the tools' output could be overwhelmed by the number of trivial fuzz targets and miss the actually important ones.
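To illustrate what such a trivial standalone target might look like, here is a minimal sketch using the plain libFuzzer entry point. The enum `Horizon` and the function `HorizonToString` are hypothetical stand-ins for a single switch-case utility; they are not the real Bitcoin Core types or the existing harness code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in enum; not the real Bitcoin Core fee-estimate type.
enum class Horizon : uint8_t { SHORT = 0, MEDIUM = 1, LONG = 2 };

// Hypothetical pure utility consisting of a single switch-case.
std::string HorizonToString(Horizon h)
{
    switch (h) {
    case Horizon::SHORT: return "short";
    case Horizon::MEDIUM: return "medium";
    case Horizon::LONG: return "long";
    }
    return "unknown";
}

// libFuzzer entry point: the whole target is a handful of lines.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size)
{
    if (size == 0) return 0;
    // Map the first input byte onto one of the three valid enum values.
    const Horizon h = static_cast<Horizon>(data[0] % 3);
    const std::string s = HorizonToString(h);
    // Every valid enum value must map to a named, non-empty string.
    assert(!s.empty());
    assert(s != "unknown");
    return 0;
}
```

Three one-byte inputs are enough to cover every branch that valid enum values can reach, so with something like `clang++ -fsanitize=fuzzer,address` the build and startup would likely dominate the time spent on this target.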