Delete old fuzz inputs from history #232

pull maflcko wants to merge 0 commits into bitcoin-core:main from maflcko:2507-filter changing 0 files +0 −0
  1. maflcko commented at 3:23 pm on July 23, 2025: contributor

    Fixes #228

    Can be reproduced via:

    git filter-repo --invert-paths --path fuzz_seed_corpus  # clear the path with the old name

    git mv fuzz_corpora fuzz_corpora_backup  # backup of the new name
    git commit -m 'backup'
    git filter-repo --invert-paths --path fuzz_corpora
    git mv fuzz_corpora_backup fuzz_corpora
    git commit -m 'restore'
    

    I’ve also rebased on the very first commit, as it does not need to be rewritten (this will also simplify review later on) and the exact commit id can be kept:

    git rebase 52db8e0f4a2c75b0f977c808f81c0cf6b264e077
    

    This can be reviewed by re-doing the filter, and then comparing the resulting commit history:

    git range-diff 52db8e0f4a2c75b0f977c808f81c0cf6b264e077 HEAD fd7e08cd37a175b31a100f71f8a9f3fb369b4837
    

    Or simply by comparing against current main (ignoring the history) and observing an empty diff:

    $ git diff e6e82b895a44365a2faa9ff96f6d39dafe2da43e fd7e08cd37a175b31a100f71f8a9f3fb369b4837 | wc -l
    0
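The empty-diff check above works because two commits can have entirely different histories while pointing at the same tree. A minimal sketch of that idea follows, using a throwaway repository; the file names, branch name, and the `commit` helper are made up for illustration and are not taken from qa-assets.

```shell
# Sketch: two commits with different histories but identical content.
# All names below are hypothetical; nothing here touches a real repository.
set -eu

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q .

# Helper so commits work without a global git identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.com commit -q "$@"
}

# "Long" history: add a stale input, then replace it with the current one.
mkdir fuzz_corpora
echo old > fuzz_corpora/stale_input
git add . && commit -m 'Add inputs'
echo new > fuzz_corpora/current_input
git rm -q fuzz_corpora/stale_input
git add . && commit -m 'Replace inputs'
long_tip=$(git rev-parse HEAD)

# "Short" history: an unrelated root commit holding only the current files.
# --orphan keeps the index, so the staged content is committed as-is.
git checkout -q --orphan rewritten
commit -m 'Add current inputs'
short_tip=$(git rev-parse HEAD)

# Check 1: the diff between the two tips is empty.
diff_lines=$(git diff "$long_tip" "$short_tip" | wc -l | tr -d ' ')

# Check 2: both tips point at the same tree object.
long_tree=$(git rev-parse "$long_tip^{tree}")
short_tree=$(git rev-parse "$short_tip^{tree}")

echo "diff lines: $diff_lines"
echo "long tree:  $long_tree"
echo "short tree: $short_tree"
```

Comparing the `^{tree}` ids is a cheaper equivalent of the `git diff ... | wc -l` check, since identical tree ids imply identical content.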
    
  2. maflcko commented at 3:24 pm on July 23, 2025: contributor
    (Obviously this should not be merged, but rather force pushed to the main branch, after review)
  3. maflcko commented at 3:34 pm on July 23, 2025: contributor

    This should nuke 5GB of unused stuff from the .git history, bringing the .git of a fresh full clone down to ~600MB:

    $ du -sh ./.git
    632M	./.git
    
  4. murchandamus commented at 3:38 pm on July 23, 2025: contributor
    I’m surprised to see a few “Add inputs” commits in that history still. Should they not all be squashed to one to get a cut-through?
  5. maflcko commented at 3:54 pm on July 23, 2025: contributor
    I can remove them as well, but I don’t think it is going to provide a significant difference. I’ll take a look tomorrow.
  6. murchandamus commented at 4:33 pm on July 23, 2025: contributor
    Ah right, if they only add inputs that are still in the current set it would not make a big difference. I just thought from the description of what you are doing that all commits that touch the content of the fuzz_corpora dir would be squashed, but that would not actually necessarily follow.
  7. maflcko force-pushed on Jul 23, 2025
  8. maflcko commented at 6:23 am on July 24, 2025: contributor

    Thanks, done. It actually went down another 50%:

    $ du -sh ./.git
    309M	./.git
    
  9. dergoegge commented at 11:49 am on July 24, 2025: member
    I tried git clone --depth 1 --branch 2507-filter git@github.com:maflcko/bitcoin-core-qa-assets.git filtered-qa-assets but the size of the clone is still >4GB? I expected this to reflect the new size we are aiming for after a force push.
  10. maflcko commented at 12:02 pm on July 24, 2025: contributor

    I tried git clone --depth 1 --branch 2507-filter git@github.com:maflcko/bitcoin-core-qa-assets.git filtered-qa-assets but the size of the clone is still >4GB? I expected this to reflect the new size we are aiming for after a force push.

    For me it is 300M:

    root@4445d0550fb8:/# git clone --depth 1 --branch 2507-filter https://github.com/maflcko/bitcoin-core-qa-assets.git filtered-qa-assets
    Cloning into 'filtered-qa-assets'...
    remote: Enumerating objects: 184319, done.
    remote: Counting objects: 100% (184319/184319), done.
    remote: Compressing objects: 100% (149742/149742), done.
    Receiving objects:  95% (175104/184319), 265.99 MiB | 4.26 MiB/s
    remote: Total 184319 (delta 8622), reused 173457 (delta 7023), pack-reused 0 (from 0)
    Receiving objects: 100% (184319/184319), 279.44 MiB | 4.20 MiB/s, done.
    Resolving deltas: 100% (8622/8622), done.
    Updating files: 100% (185786/185786), done.

    root@4445d0550fb8:/# du -sh filtered-qa-assets/.git
    309M	filtered-qa-assets/.git
    

    Also note that this doesn’t affect a clone that omits the history (--depth=1): clones with depth=1 will be exactly the same size before and after this. The checked-out files are not affected either, because they are identical to the ones in current main and use the same amount of storage space.

    This will only affect a fresh, full clone. The goal is to drop years old fuzz inputs from the history that are irrelevant today.
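To illustrate why only a full clone shrinks, here is a small sketch that builds a throwaway repository whose history contains a since-deleted 1 MiB file, then compares the `.git` size of a full clone against a `--depth 1` clone. The repository layout, file names, and the `commit` helper are assumptions for the demo, not anything from qa-assets.

```shell
# Sketch: deleted files still cost space in a full clone, not in a shallow one.
# All names below are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=main init -q origin

# Helper so commits work without a global git identity.
commit() {
    git -C origin -c user.name=demo -c user.email=demo@example.com commit -q "$@"
}

# History: add 1 MiB of incompressible data, then delete it again.
head -c 1048576 /dev/urandom > origin/old_input
echo keep > origin/current_input
git -C origin add .
commit -m 'Add inputs'
git -C origin rm -q old_input
commit -m 'Delete old input'

# A file:// URL makes git use the normal transport, so --depth is honoured
# even though the source repository is local.
git clone -q file://"$work"/origin full
git clone -q --depth 1 file://"$work"/origin shallow

full_kb=$(du -sk full/.git | cut -f1)
shallow_kb=$(du -sk shallow/.git | cut -f1)
echo "full clone .git:    ${full_kb} KiB"
echo "shallow clone .git: ${shallow_kb} KiB"
```

The full clone still carries the deleted blob in its pack, while the shallow clone only fetches objects reachable from the tip commit. Both check out the same files.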

  11. dergoegge commented at 12:06 pm on July 24, 2025: member

    Thanks, I was looking at the whole directory not just .git🤦

    lgtm!

  12. maflcko commented at 12:20 pm on July 24, 2025: contributor
    @murchandamus I guess I’ll wait for your review and then merge this?
  13. murchandamus commented at 5:48 pm on July 24, 2025: contributor

    I have verified that the diff between this branch and main is empty. I was curious which commit would be creating the fuzz_corpora content, but it looks like their history simply begins with them being moved back and forth. It might be cleaner to squash those two commits by resetting to the commit before them and adding fuzz_corpora as if it were new. That said, I don’t feel strongly about it being necessary.

    LGTM.
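The optional squash suggested above could look roughly like the following. This is only a sketch on a throwaway repository that mimics a 'backup' commit followed by a 'restore' commit; the file names, the final commit message, and the `commit` helper are hypothetical, and this is not necessarily what was done before the force push.

```shell
# Sketch: collapse a "backup" + "restore" pair into one commit that adds
# the directory as if it were new. All names below are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q .

# Helper so commits work without a global git identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.com commit -q "$@"
}

# Root commit that stays untouched.
echo readme > README.md
git add . && commit -m 'Initial commit'

# Mimic the post-filter history: the directory first appears under a
# backup name and is then moved to its final name.
mkdir fuzz_corpora_backup
echo input > fuzz_corpora_backup/seed
git add . && commit -m 'backup'
git mv fuzz_corpora_backup fuzz_corpora
commit -m 'restore'
tree_before=$(git rev-parse 'HEAD^{tree}')

# Squash: move the branch back two commits while keeping index and working
# tree, then record the directory in a single new commit.
git reset -q --soft HEAD~2
commit -m 'Add fuzz_corpora'

tree_after=$(git rev-parse 'HEAD^{tree}')
commit_count=$(git rev-list --count HEAD)
echo "commits: $commit_count"
echo "tree before: $tree_before"
echo "tree after:  $tree_after"
```

Because `git reset --soft` leaves the index alone, the resulting tree is unchanged and the empty-diff review check against main would still apply.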

  14. maflcko merged this on Jul 24, 2025
  15. maflcko closed this on Jul 24, 2025

  16. maflcko force-pushed the base branch on Jul 24, 2025
  17. maflcko deleted the branch on Jul 24, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/qa-assets. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-08-02 07:25 UTC
