This repository is very large (~16GB at the moment), and I think there are several things we could do to reduce that:
- Prune the git history: `.git` is currently at 4GB. (We don’t really need the history, or we could archive it to a separate repo.)
- Compress the corpora (~6GB gzipped).
- Avoid large inputs, or keep them in a separate repo.
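As a rough sketch of the compression idea, here is one way to pack and unpack corpora using only Python's standard `tarfile` module. It assumes, hypothetically, that each fuzz target's corpus lives in its own subdirectory under a single root; the function names `compress_corpora` and `extract_corpus` are illustrative, not existing tooling in this repo.

```python
import tarfile
from pathlib import Path


def compress_corpora(corpora_root: Path, out_dir: Path) -> list[Path]:
    """Pack each subdirectory of corpora_root into <name>.tar.gz inside out_dir.

    Hypothetical layout: one subdirectory per fuzz target.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for corpus in sorted(p for p in corpora_root.iterdir() if p.is_dir()):
        archive = out_dir / f"{corpus.name}.tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            # arcname keeps entries relative, so the archive unpacks into <name>/
            tar.add(corpus, arcname=corpus.name)
        archives.append(archive)
    return archives


def extract_corpus(archive: Path, dest: Path) -> None:
    """Unpack one corpus archive into dest (e.g. at the start of a CI job)."""
    # Use the safer "data" extraction filter where this Python version supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest, **kwargs)
```

Keeping one archive per target means a CI job would only need to download and unpack the corpora it actually fuzzes, rather than the whole set.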
The biggest downside of the current size is that our CI jobs (oss-fuzz as well) pull this repo, which adds a lot of overhead.
Maybe we could set up an automated mirror repo that contains the compressed corpora and no git history?