doc: clarify test placement guidance #35260

pull l0rinc wants to merge 1 commits into bitcoin:master from l0rinc:l0rinc/document-test-placement changing 2 files +31 −1
  1. l0rinc commented at 10:32 AM on May 11, 2026: contributor

    Problem: doc/developer-notes.md does not explain how test coverage should be placed across a commit stack, especially when a change modifies previously uncovered behavior or when a refactor depends on an uncovered invariant. This has led to review questions about current-behavior tests, regression tests, and behavior-preserving refactors, for example in #35251 and #31212.

    Fix: Add a General Testing section to doc/developer-notes.md with rule-of-thumb guidance for automated tests, manual testing recipes, and behavior-preserving changes. Add a Commit Structure for Tests subsection explaining when tests should be updated with behavior changes, when refactors should first cover preserved invariants (documenting the problem before jumping to the solution), and when characterization tests can make non-trivial uncovered changes easier to review. Replace the short CONTRIBUTING.md rule with a link to the detailed guidance.

  2. DrahtBot added the label Docs on May 11, 2026
  3. DrahtBot commented at 10:32 AM on May 11, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/35260.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK w0xlt
    Stale ACK optout21, rkrux, ryanofsky, sedited, hodlinator

    If your review is incorrectly listed, please copy-paste <code>&lt;!--meta-tag:bot-skip--&gt;</code> into the comment that the bot should ignore.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    No conflicts as of last run.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  4. maflcko commented at 10:37 AM on May 11, 2026: member

    Seems fine, but for single commit prs, testing can also be done by not compiling the cpp code changes (or reverting them) and then running the test as-is and observing a failure.

  5. in doc/developer-notes.md:301 in 737e5a488a
     297 | @@ -298,6 +298,25 @@ Linux: `sudo apt install doxygen graphviz`
     298 |  
     299 |  MacOS: `brew install doxygen graphviz`
     300 |  
     301 | +## Tests and Commit Structure
    


    optout21 commented at 10:52 AM on May 11, 2026:

    I have doubts that allocating a full top-level ("##") section to this issue is the best. Maybe the section could be called "General Testing", with the first paragraph, and the second paragraph demoted by one level to "### Commit Structure for Tests".


    l0rinc commented at 11:17 AM on May 11, 2026:

    Done, thanks

  6. in doc/developer-notes.md:308 in 737e5a488a outdated
     303 | +Tests should be added or updated for new features, bug fixes, and other behavior
     304 | +changes. A behavior-changing commit should include the relevant test coverage.
     305 | +Test changes can be omitted for behavior-preserving work with no externally
     306 | +observable behavior to test, such as moving code, renaming, or mechanical
     307 | +refactors.
     308 | +
    


    optout21 commented at 10:55 AM on May 11, 2026:

    The general points on testing could also include that tests should check results or post-conditions (a test without proper checks increases code coverage, but has limited value). But this is already outside of the scope of the original intent.


    l0rinc commented at 11:17 AM on May 11, 2026:

    Partially done

  7. optout21 commented at 11:01 AM on May 11, 2026: contributor

    ACK 3df19b4e0aa90206197fcebbb27950133f46b91c

    Prev: reACK b2cb955d518e03321e06fd0516c48a2b8e762b6b ACK 737e5a488aa0e9ad1245e087c87aa3c465327c15 I consider the proposed approach good software craftsmanship. The text is well explained, and the 'why' question is supported by references.

    There is a relevant quote in the mentioned blog post that I feel worthwhile to reproduce here:

    When a system goes into production, in a way, it becomes its own specification. We need to know when we are changing existing behavior regardless of whether we think it’s right or not.

  8. l0rinc commented at 11:17 AM on May 11, 2026: contributor

    testing can also be done by not compiling the cpp code changes

    I agree, especially for a small single-commit fix where reverting the code change is straightforward.

    The proposed split is meant to make the evidence more granular. With a combined fix+test commit, reviewers can check that the bundle fails without the fix and passes with it, but the test itself does not stand alone as a record of prior behavior. With a separate current-behavior test commit, that commit documents what the code did before, and the following fix commit shows the minimal expectation change.

    I agree that this matters most once the fix is larger than a one-line change. For refactors or multi-file changes, there often is no clean "just revert the cpp" recipe, and mixing the fix with several new assertions can make it harder to see which assertion changed and which ones were already true.

    That is the characterization-testing angle: the existing system "becomes its own specification", so the split helps show exactly when behavior is being changed.

    Adjusted the PR description, covered this angle, added @optout21's suggestions, thanks.

  9. l0rinc force-pushed on May 11, 2026
  10. DrahtBot added the label CI failed on May 11, 2026
  11. in doc/developer-notes.md:323 in b2cb955d51 outdated
     318 | +style keeps the evidence more granular: the test-only commit records prior
     319 | +behavior, and the behavior-changing commit shows the minimal expectation change;
     320 | +see also [Michael Feathers' original
     321 | +write-up](https://michaelfeathers.silvrback.com/characterization-testing).
     322 | +When prior behavior is already covered, update those tests in the same commit as
     323 | +the behavior change.
    


    maflcko commented at 11:36 AM on May 11, 2026:

    Reminds me of:

    CONTRIBUTING.md:125:This means tests must be updated in the same commit that changes the behavior.
    

    Maybe that line can be replaced by a link to this section?

    Not sure what should go in CONTRIBUTING.md vs this file here.


    l0rinc commented at 11:43 AM on May 11, 2026:

    Good point, replaced the CONTRIBUTING.md duplication with a link to the new developer-notes section.


    hodlinator commented at 12:41 PM on May 11, 2026:

    Would expect these paragraphs in CONTRIBUTING.md, but I guess developer-notes.md already has advice on scripted diffs etc so this works too.

  12. l0rinc force-pushed on May 11, 2026
  13. rkrux approved
  14. rkrux commented at 12:38 PM on May 11, 2026: contributor

    lgtm ACK 3df19b4

    I haven't gone through the links in detail but I understand the general idea.

  15. DrahtBot requested review from optout21 on May 11, 2026
  16. DrahtBot removed the label CI failed on May 11, 2026
  17. hodlinator commented at 12:43 PM on May 11, 2026: contributor

    Concept ACK

  18. in doc/developer-notes.md:303 in 3df19b4e0a
     297 | @@ -298,6 +298,30 @@ Linux: `sudo apt install doxygen graphviz`
     298 |  
     299 |  MacOS: `brew install doxygen graphviz`
     300 |  
     301 | +## General Testing
     302 | +
     303 | +Tests should be added or updated for new features, bug fixes, and other behavior
    


    maflcko commented at 3:55 PM on May 11, 2026:

    Also, adding a test is not always the best choice. Sometimes it is better to manually document how to test something, instead of adding a brittle and confusing test for eternity.


    l0rinc commented at 2:30 PM on May 30, 2026:

    Covered in the latest push, added you as coauthor, thanks.

  19. ryanofsky approved
  20. ryanofsky commented at 6:26 PM on May 14, 2026: contributor

    Code review ACK 3df19b4e0aa90206197fcebbb27950133f46b91c

    My vote for the concept would be +0.5. This seems like good advice to me, but I think it should only be added if there is broad agreement. The developer notes already contain some advice that is debatable, was merged without much discussion, and is hard to remove once added. I think the bar for adding any new advice should be high (higher than it has been historically).

    I also think the guideline could be shorter without changing the meaning, and not relying as heavily on external sources. For example, I would just write:

    When feasible, changes that modify externally observable behavior should be accompanied by functional or unit tests. Ideally, new tests should be added in separate commits before changing behavior, as characterization tests, and then updated atomically in later commits that add features or fix bugs. This makes it clear exactly how behavior is changing. But this is not always practical, and it is also OK to add new tests in the same commits that change behavior.

    This could get the basic point across without being as granular. But your current version also seems good and readers may appreciate being told specifically what to do in different cases.

  21. DrahtBot requested review from hodlinator on May 14, 2026
  22. maflcko commented at 6:51 PM on May 14, 2026: member

    The developer notes already contain some advice that is debatable, was merged without much discussion, and is hard to remove once added.

    I think it would be fine to remove stuff, especially if it is stale, confusing, redundant, or controversial. See also the latest removal for reference: https://github.com/bitcoin/bitcoin/pull/32572

  23. in doc/developer-notes.md:307 in 3df19b4e0a
     302 | +
     303 | +Tests should be added or updated for new features, bug fixes, and other behavior
     304 | +changes. A behavior-changing commit should include the relevant test coverage.
     305 | +Test changes can be omitted for behavior-preserving work with no externally
     306 | +observable behavior to test, such as moving code, renaming, or mechanical
     307 | +refactors.
    


    sedited commented at 7:00 PM on May 14, 2026:

    I'm not sure about adding this. We already seem to be doing a decent job at it, and as maflcko pointed out, sometimes the tests being added for an obvious one-line change can be brittle, or just don't warrant the additional complexity. Some tests do also require non-trivial maintenance. Maybe this can be formulated in a bit more aspirational tone? I think you make a good job of that in the other paragraph, with "rule of thumb", "consider", etc.


    l0rinc commented at 2:31 PM on May 30, 2026:

    I have rephrased this 10 times again, let me know what you think.

  24. l0rinc force-pushed on May 14, 2026
  25. l0rinc commented at 8:23 PM on May 14, 2026: contributor

    Thanks for the comments, rewrote it 20 times, I think the new version is indeed easier to understand and contains the important parts - please let me know if it addressed your concerns.

  26. maflcko commented at 7:26 AM on May 15, 2026: member

    The developer notes already contain some advice that is debatable, was merged without much discussion, and is hard to remove once added.

    I think it would be fine to remove stuff, especially if it is stale, confusing, redundant, or controversial. See also the latest removal for reference: #32572

    I don't want to hijack this pull for removals, but I couldn't find any debatable sections, so if you want something removed, please leave a comment in #35296, and I may take it or ignore it :sweat_smile:

  27. l0rinc commented at 10:16 AM on May 15, 2026: contributor

    I don't want to hijack this pull for removals

    What's your opinion about this PR?

  28. maflcko commented at 11:29 AM on May 15, 2026: member

    What's your opinion about this PR?

    lgtm.

    nit: I think the notes encompass three large sections: (1) style/whitespace (2) debug tricks and tools, and (3) guidelines. So I think this belongs in section (3), but just a nit.

  29. ryanofsky approved
  30. ryanofsky commented at 6:19 PM on May 18, 2026: contributor

    Code review ACK 658927c877352e9486e8fa39b8af7545bce7dd22. Concept +0.5 as explained in previous review (seems like a positive change, but I'm wary of guidelines being added that are hard to remove later).

  31. DrahtBot requested review from rkrux on May 18, 2026
  32. in doc/developer-notes.md:313 in 658927c877
     308 | +a manual testing recipe in the commit message or PR description is an
     309 | +acceptable alternative.
     310 | +
     311 | +### Commit Structure for Tests
     312 | +
     313 | +Tests typically go in the commit whose behavior they prove. For a single
    


    w0xlt commented at 1:02 AM on May 21, 2026:

    nit: the previous sentence can be kept.

    Tests typically go in the commit whose behavior they prove. When 
    existing tests already cover the behavior being changed, update 
    those tests in the same commit as the behavior change. For a single
    

    l0rinc commented at 11:57 AM on May 30, 2026:

    Mostly done, let me know what you think.

  33. w0xlt commented at 1:02 AM on May 21, 2026: contributor

    ACK 658927c877352e9486e8fa39b8af7545bce7dd22

  34. sedited approved
  35. sedited commented at 8:00 AM on May 22, 2026: contributor

    ACK 658927c877352e9486e8fa39b8af7545bce7dd22

  36. hodlinator approved
  37. hodlinator commented at 3:28 PM on May 25, 2026: contributor

    ACK 658927c877352e9486e8fa39b8af7545bce7dd22

  38. achow101 commented at 12:38 AM on May 26, 2026: member

    Concept -0

    When introducing a new test, I strongly prefer the pattern of a commit that fixes the problem, followed by a commit that enforces the correct behavior in a test. This greatly eases review of such changes as the test is (usually) trivial to cherry-pick to master to verify that the test fails. Since we requires all commits to pass CI, the suggestion in these docs requires first a commit that adds a test for the wrong behavior, then a second commit that has the bug fix and a test change. This makes extracting the test to apply to master a manual task that is much more annoying to do.

    In general, I also don't like the idea of adding tests for things that we know are the incorrect behavior. This has been a problem in several places in the wallet and other areas of the codebase where a test exists, the behavior is changed to be more correct, and those tests fail. But because those tests lack any indication that they test for actual behavior and not intended behavior, countless hours are wasted on debugging and archaeology to determine that those tests are in fact supposed to now fail and could be deleted.

  39. in doc/developer-notes.md:317 in 658927c877
     312 | +
     313 | +Tests typically go in the commit whose behavior they prove. For a single
     314 | +behavior-changing commit, this placement can be verified by running the new
     315 | +test against the old implementation and observing it fail. If the current
     316 | +behavior is not already covered by tests, consider first adding a
     317 | +behavior-preserving test commit that documents it.
    


    naiyoma commented at 12:04 PM on May 26, 2026:

    I’m not opposed to this paragraph, but IMO it would be sufficient to just fix the behavior and add a test for the correct behavior in the same change, rather than having one commit first asserting the incorrect behavior.


    l0rinc commented at 2:35 PM on May 30, 2026:

    it would be sufficient to just fix the behavior and add a test for the correct behavior in the same change, rather than having one commit first asserting the incorrect behavior.

    a commit that fixes the problem, followed by a commit that enforces the correct behavior in a test

    I agree this is often fine for simple fixes, and the text now says a simple feature or bug fix can add its test in the same commit. The case I want to document is the non-trivial one, where the existing behavior is uncovered and the fix and test are intertwined. Adding the final test after the fix, or relying on reviewers to cherry-pick/revert code to make it fail, does not clearly record the smallest invariant being changed. It just documents the post-fix behavior without clarifying exactly what behavior was modified. It also makes review harder when reverting the fix is not clean or when the change is mixed with refactoring.

    The split is meant to avoid that: first add a characterization test that records the current result, potentially marking wrong or temporary expectations with a TODO; then the behavior-changing commit removes the TODO and updates the assertion. That makes the diff show exactly which expectation changed, while still keeping each commit passing.

    The reviewer or CI can just run both commits' tests to prove the exact effect of the behavior change, with no need for partial reverts unless that's simpler, which isn't always the case.

  40. DrahtBot added the label Needs rebase on May 26, 2026
  41. doc: clarify test placement guidance
    Add a General Testing section to the developer notes describing when behavior changes should usually be covered by automated tests, when manual testing can be acceptable, and where test coverage can fit in a commit stack.
    
    Explain that existing tests should be updated with the behavior-changing commit, while simple uncovered behavior can often be tested in the same commit as the feature or bug fix.
    
    For non-trivial refactors, describe the expected split where the preserved behavior is covered first, then the refactor commit proves it did not change that behavior by not needing expectation updates.
    
    For non-trivial changes to uncovered existing behavior, describe the split where a behavior-preserving characterization test first records the current behavior, then the behavior-changing commit updates that assertion to the intended behavior. If the recorded behavior is known to be wrong or temporary, mark it with a TODO comment so it is not mistaken for intended behavior later.
    
    Replace the existing CONTRIBUTING.md sentence with a link to the new developer-notes section, so the detailed guidance lives in one place.
    
    Co-authored-by: MarcoFalke <*~=`'#}+{/-|&$^_@721217.xyz>
    Co-authored-by: optout <13562139+optout21@users.noreply.github.com>
    Co-authored-by: ryanofsky <ryan@ofsky.org>
    Co-authored-by: sedited <seb.kung@gmail.com>
    Co-authored-by: w0xlt <94266259+w0xlt@users.noreply.github.com>
    Co-authored-by: Ava Chow <github@achow101.com>
    914175ba25
  42. l0rinc force-pushed on May 30, 2026
  43. l0rinc commented at 2:49 PM on May 30, 2026: contributor

    I strongly prefer the pattern of a commit that fixes the problem, followed by a commit that enforces the correct behavior in a test.

    Please see my response in #35260 (review)

    This greatly eases review of such changes as the test is (usually) trivial to cherry-pick to master to verify that the test fails.

    What I'm suggesting should simplify this more.

    the suggestion in these docs requires first a commit that adds a test for the wrong behavior

    Yes, this is a feature: we document the invariant, followed by a focused narrow diff for the exact change the system is expected to experience. Otherwise it may be difficult to review the change, since we can only see the new behavior, not the diff.

    This makes extracting the test to apply to master a manual task that is much more annoying to do.

    Squashing the two commits is trivial; separating them is not. This suggestion favors reviewers and gives authors a bit more work.

    don't like the idea of adding tests for things that we know are the incorrect behavior

    countless hours are wasted on debugging and archaeology to determine that those tests are in fact supposed to now fail and could be deleted.

    Good point, I have extended the description to mention adding TODOs. Though in a sense what I'm suggesting is the opposite of what you describe above: document the current invalid behavior as we're fixing it, to make it obvious that the behavior change was intended.

  44. DrahtBot removed the label Needs rebase on May 30, 2026
  45. w0xlt commented at 8:16 PM on May 30, 2026: contributor

    ACK 914175ba25ddfa534a17430a2ec2c29623eed7dc

  46. DrahtBot requested review from sedited on May 30, 2026
  47. DrahtBot requested review from hodlinator on May 30, 2026
  48. DrahtBot requested review from ryanofsky on May 30, 2026

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-06-02 01:50 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me