txospenderindex: use zero-byte entry values #35634

pull l0rinc wants to merge 2 commits into bitcoin:master from l0rinc:l0rinc/txospenderindex-empty-values changing 2 files +15 −3
  1. l0rinc commented at 4:57 PM on July 1, 2026: contributor

    Problem: TxoSpenderIndex values are never read: FindSpender() only uses the key, which already encodes the spender's disk position, but the values are written as single-byte \0(instead of empty values).

    Fix: Write zero-byte markers for new txospenderindex entries instead. Existing indexes stay readable because both formats use the same keys and the value is ignored. Rebuilding is only needed to shrink old entries.

    Related work: This complements #35568, which saves another ~4.5 GB by dropping bloom filters. A #35531-style rewrite could let existing indexes reclaim both savings without a full rebuild.

    Reproducer: Automated tests weren't added, but the manual reproducer below builds both commits, checks old/new format compatibility through gettxspendingprevout, and prints rebuilt index sizes. On my mainnet datadir this PR shrank txospenderindex by ~3.4 GB (from 89G to 86G).

    <details><summary>Script + sample output</summary>

    BEFORE="89b4000ae06c72c5a14ee05ad70d3aece3f1b382" AFTER="1be799ef858c33158a302857b084e169b5bf8c1f" DATA_DIR="/mnt/my_storage/BitcoinData" LOG="${DATA_DIR}/debug.log" OUT='[{"txid":"0437cd7f8525ceed2324359c2d0ba26006d92d856a9c20fa0241106ee5a597c9","vout":0}]' SPEND="f4184fc596403b9d638783cf57adfe4c75c605f6356fbc91338530e9831e9e16"; \
    for b in before after; do [ -x ./build-$b/bin/bitcoin-cli ] && ./build-$b/bin/bitcoin-cli -datadir="${DATA_DIR}" stop >/dev/null 2>&1 || true; done; sleep 10; \
    git reset --hard >/dev/null 2>&1 && git clean -fxd >/dev/null 2>&1 && (git fetch origin "$BEFORE" "$AFTER" >/dev/null 2>&1 || true) && \
    for c in before:$BEFORE after:$AFTER; do git checkout ${c#*:} >/dev/null 2>&1 && cmake -B build-${c%:*} -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 && cmake --build build-${c%:*} -j --target bitcoind bitcoin-cli >/dev/null 2>&1; done && \
    for b in before after; do ./build-$b/bin/bitcoin-cli -datadir="${DATA_DIR}" stop >/dev/null 2>&1 || true; done; sleep 10; \
    start_node() { [ "$2" = wipe ] && rm -rf "${DATA_DIR}/indexes/txospenderindex" "${LOG}"; : > "${LOG}"; ./build-"$1"/bin/bitcoind -datadir="${DATA_DIR}" -txospenderindex=1 -connect=0 -printtoconsole=0 & pid=$!; while ! grep -Fq 'txospenderindex is enabled at height' "${LOG}" 2>/dev/null; do kill -0 "$pid" 2>/dev/null || { tail -100 "${LOG}"; return 1; }; sleep 5; done; }; \
    check_spend() { result="$(./build-"$1"/bin/bitcoin-cli -datadir="${DATA_DIR}" gettxspendingprevout "$OUT" '{"mempool_only":false}')"; echo "$2: $result"; echo "$result" | grep -q "$SPEND"; }; \
    stop_node() { ./build-"$1"/bin/bitcoin-cli -datadir="${DATA_DIR}" stop >/dev/null; wait "$pid"; }; \
    echo "prevout: 0437cd7f8525ceed2324359c2d0ba26006d92d856a9c20fa0241106ee5a597c9:0 -> ${SPEND}" && \
    start_node before wipe && check_spend before "old binary reads old-format index" && du -sh "${DATA_DIR}/indexes/txospenderindex"; stop_node before && \
    start_node after && check_spend after "new binary reads old-format index"; stop_node after && \
    start_node after wipe && check_spend after "new binary reads new-format index" && du -sh "${DATA_DIR}/indexes/txospenderindex"; stop_node after && \
    start_node before && check_spend before "old binary reads new-format index"; stop_node before
    

    Expected result, trimmed to the relevant compatibility and size lines:

    prevout: 0437cd7f8525ceed2324359c2d0ba26006d92d856a9c20fa0241106ee5a597c9:0 -> f4184fc596403b9d638783cf57adfe4c75c605f6356fbc91338530e9831e9e16
    old binary reads old-format index: [ ... "spendingtxid": "f4184fc596403b9d638783cf57adfe4c75c605f6356fbc91338530e9831e9e16" ... ]
    89G     /mnt/my_storage/BitcoinData/indexes/txospenderindex
    new binary reads old-format index: [ ... "spendingtxid": "f4184fc596403b9d638783cf57adfe4c75c605f6356fbc91338530e9831e9e16" ... ]
    new binary reads new-format index: [ ... "spendingtxid": "f4184fc596403b9d638783cf57adfe4c75c605f6356fbc91338530e9831e9e16" ... ]
    86G     /mnt/my_storage/BitcoinData/indexes/txospenderindex
    old binary reads new-format index: [ ... "spendingtxid": "f4184fc596403b9d638783cf57adfe4c75c605f6356fbc91338530e9831e9e16" ... ]
    

    </details>

  2. DrahtBot commented at 4:57 PM on July 1, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/35634.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK sedited, ekzyis, andrewtoth, davidgumberg

    If your review is incorrectly listed, please copy-paste <code>&lt;!--meta-tag:bot-skip--&gt;</code> into the comment that the bot should ignore.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  3. in src/index/txospenderindex.cpp:80 in a8f57da34e outdated
      79 | -        // key is hash(spent outpoint) | disk pos, value is empty
      80 | -        batch.Write(key, "");
      81 | +        txospenderindex::DBKey key(txospenderindex::CreateKey(m_siphash_key, outpoint, pos));
      82 | +        // The key encodes the spent outpoint hash and disk position. The value is only a marker.
      83 | +        // Older entries may contain serialized empty strings; FindSpender() reads only keys.
      84 | +        batch.Write(key, std::span<const std::byte>{});
    


    andrewtoth commented at 5:12 PM on July 1, 2026:

    It seems to me this PR could just be this one line change to have the effect we want. What's the motivation behind >100 lines of changes?


    l0rinc commented at 5:26 PM on July 1, 2026:

    Mostly the tests to prove backwards compatibility


    sedited commented at 7:26 PM on July 1, 2026:

    I think I'd also prefer just changing this one line. The test is nice to demonstrate that this works correctly, but I don't see it really helping us to either prevent future regressions, or making the code easier to evolve in the future.


    l0rinc commented at 8:39 PM on July 1, 2026:

    Ok, thanks for the feedback, removed the tests, kept the automated reproducer description.


    davidgumberg commented at 12:08 AM on July 3, 2026:

    I'm not sure I agree here, after this PR is merged, there is no enforcement (outside of the code comment) that future code doesn't assume that the value is 0 bytes long, or that all values are the same or w/e, maybe I'm being too paranoid, but it would help to elaborate why a regression is implausible here.


    andrewtoth commented at 1:33 PM on July 3, 2026:

    future code doesn't assume that the value is 0 bytes long, or that all values are the same

    I don't see why future code would need to know what length the value is or whether any of them are the same. The value is ignored in this index, only the key is used.


    davidgumberg commented at 7:46 PM on July 3, 2026:

    Yeah I can't think of something in particular that I think is likely, but I'm not sure why that should be the bar, isn't it the case that when evaluated before-the-fact, most future regressions seem pretty unlikely since they require someone doing something no one expects now, and misunderstanding something everyone understands clearly right now since we are looking at it?

    Contrived example, future code starts embedding data in txospenderindex value field sometimes, for some unexpected reason, and makes the assumption that you can do comparisons between the fetched values.

    Anyways, just wanted to give that concern a second chance, don't want to block here on this, I definitely defer to to yall's judgement on this question since you've been looking at this area a lot more than I have, and I'm convinced that a regression here is pretty unlikely.


    l0rinc commented at 7:53 PM on July 3, 2026:

    The objection isn't unreasonable; it's why I added the test in the first place. But given that the test is demonstrably passing (though it's too cumbersome for such a simple change), it served its purpose, and the remaining comment should be enough:

    Older entries may contain serialized empty strings; FindSpender() reads only keys.

  4. l0rinc force-pushed on Jul 1, 2026
  5. DrahtBot added the label CI failed on Jul 1, 2026
  6. DrahtBot commented at 9:39 PM on July 1, 2026: contributor

    <!--85328a0da195eb286784d51f73fa0af9-->

    🚧 At least one of the CI tasks failed. <sub>Task iwyu: https://github.com/bitcoin/bitcoin/actions/runs/28546146458/job/84631833696</sub> <sub>LLM reason (✨ experimental): CI failed because the IWYU (include-what-you-use) check detected missing/incorrect header includes and intentionally exited with “Failure generated from IWYU”.</sub>

    <details><summary>Hints</summary>

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

    </details>

  7. index: shrink txospenderindex value markers
    `TxoSpenderIndex` uses LevelDB values only as presence markers. `FindSpender` iterates over keys and loads the spending transaction from the disk position stored in each key.
    Write new entries with a zero-byte span instead of a serialized empty string, so rebuilt indexes avoid one extra byte per spender entry.
    Existing indexes remain readable because the lookup path does not deserialize the marker value.
    ce06878288
  8. doc: add txospenderindex release note
    Document that newly indexed `txospenderindex` entries use less disk space and that existing indexes remain readable.
    Users only need to rebuild the index if they want previously indexed entries rewritten with the smaller marker.
    113402286e
  9. l0rinc force-pushed on Jul 1, 2026
  10. DrahtBot removed the label CI failed on Jul 1, 2026
  11. sedited approved
  12. sedited commented at 7:47 AM on July 2, 2026: contributor

    ACK 113402286ea305a2f0c0c0712452d424113899f2

  13. sedited requested review from fjahr on Jul 2, 2026
  14. ekzyis commented at 9:31 AM on July 2, 2026: contributor

    utACK 113402286ea305a2f0c0c0712452d424113899f2

    Problem: TxoSpenderIndex values are never read: FindSpender() only uses the key, which already encodes the spender's disk position, but the values are written as single-byte \0 values.

    Nit: As mentioned in the release notes, this is about saving one byte, but the problem description sounds like there's a bug.

  15. andrewtoth approved
  16. andrewtoth commented at 2:01 PM on July 2, 2026: contributor

    ACK 113402286ea305a2f0c0c0712452d424113899f2


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-07-04 02:51 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me