tracing: explicitly cast block_connected duration to nanoseconds #29877

pull 0xB10C wants to merge 1 commits into bitcoin:master from 0xB10C:2024-04-tracing-cast-duration-to-µs changing 4 files +17 −10
  1. 0xB10C commented at 1:11 pm on April 15, 2024: contributor

    When the validation:block_connected tracepoint was introduced in 8f37f5c2a562c38c83fc40234ade9c301fc4e685, the connect block duration was passed in microseconds µs. By starting to use steady clock in fabf1cdb206e368a9433abf99a5ea2762a5ed2c0 this changed to nanoseconds ns. As the test only checked if the duration value is > 0 as a plausibility check, this went unnoticed. This was detected this when setting up monitoring for block validation time as part of the Great Consensus Cleanup Revival discussion.

    This change casts the duration explicitly to nanoseconds, updates the documentation, and adds a check for an upper bound to the tracepoint interface tests. The upper bound is quite lax as mining the block takes much longer than connecting the empty test block. It’s however able to detect a duration passed in an incorrect unit (1000x off).

    A previous version of this PR casted the duration to microseconds µs - however, as the last three major releases have had the duration as nanoseconds (and this went unnoticed), we assume that this is the API now and changeing it back to microseconds would break the API again. See also #29877 (comment)

  2. DrahtBot commented at 1:11 pm on April 15, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK maflcko
    Stale ACK laanwj

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. maflcko commented at 1:24 pm on April 15, 2024: member

    How is is possible that this even compiled? I presume tracing just takes any type and value and passes it along as raw bytes, so std::chrono being type-safe does not help here?

    Edit: If yes, one could consider if adding a static_assert to ban std::chrono (and possibly other types) from tracing?

  4. laanwj added the label Utils/log/libs on Apr 15, 2024
  5. laanwj commented at 5:14 pm on April 15, 2024: member
    Code review ACK ca4eaa5777888292bbdef3ae8d45af6a407df5df
  6. 0xB10C commented at 11:09 am on April 17, 2024: contributor

    How is is possible that this even compiled? I presume tracing just takes any type and value and passes it along as raw bytes, so std::chrono being type-safe does not help here?

    Edit: If yes, one could consider if adding a static_assert to ban std::chrono (and possibly other types) from tracing?

    In this case, casting with e.g. int64_t{time_5 - time_start} would have failed to compile too. Maybe being explicit with the type we expect here, as we do with e.g. the utxocache tracepoints, is worth considering.

  7. maflcko commented at 11:18 am on April 17, 2024: member
    lgtm ACK ca4eaa5777888292bbdef3ae8d45af6a407df5df
  8. maflcko commented at 11:19 am on April 17, 2024: member

    In this case, casting with e.g. int64_t{time_5 - time_start} would have failed to compile too. Maybe being explicit with the type we expect here, as we do with e.g. the utxocache tracepoints, is worth considering.

    Another benefit would be that the code is self-documenting. If a comment for each argument is appended in-line, one could even generate the doc/tracing file from the source code, with the guarantee that it is accurate.

    But that can be done in a follow-up.

  9. DrahtBot added the label CI failed on Apr 18, 2024
  10. luke-jr commented at 2:38 am on April 21, 2024: member

    So 23.x and 24.x used microseconds, while 25.x, 26.x, and 27.x currently use nanoseconds?

    Seems like changing this back will disrupt more than it will fix…? Any reason not to just stick to nanoseconds and harden it against future accidents?

  11. bitcoin deleted a comment on Apr 21, 2024
  12. DrahtBot removed the label CI failed on Apr 23, 2024
  13. 0xB10C commented at 9:08 am on April 29, 2024: contributor

    Any reason not to just stick to nanoseconds and harden it against future accidents?

    I think updating the documentation and using Ticks<std::chrono::nanoseconds> would work just as well, yes..

  14. 0xB10C force-pushed on May 5, 2024
  15. 0xB10C renamed this:
    tracing: cast block_connected duration to µs
    tracing: explicitly cast block_connected duration to nanoseconds
    on May 5, 2024
  16. 0xB10C force-pushed on May 5, 2024
  17. 0xB10C commented at 11:11 am on May 5, 2024: contributor

    Any reason not to just stick to nanoseconds and harden it against future accidents?

    I think updating the documentation and using Ticks<std::chrono::nanoseconds> would work just as well, yes..

    Done. For me, not breaking the (already broken) API again is easier. Especially when running current and future releases in parallel. Sorry to invalidate your ACKs @maflcko @laanwj.

  18. 0xB10C force-pushed on Aug 8, 2024
  19. maflcko commented at 9:12 am on August 8, 2024: member
    lgtm ACK 34c18de369b7bd71334f849395723f49dffae560
  20. DrahtBot requested review from laanwj on Aug 8, 2024
  21. DrahtBot added the label Needs rebase on Sep 3, 2024
  22. tracing: cast block_connected duration to nanoseconds
    When the tracepoint was introduced in 8f37f5c2a562c38c83fc40234ade9c301fc4e685,
    the connect_block duration was passed in microseconds `µs`.
    By starting to use steady clock in fabf1cdb206e368a9433abf99a5ea2762a5ed2c0
    this changed to nanoseconds `ns`. As the test only checked if the
    duration value is `> 0` as a plausibility check, this went unnoticed.
    
    I detected this when setting up monitoring for block validation time
    as part of the Great Consensus Cleanup Revival discussion.
    
    This change casts the duration explicitly to nanoseconds (as it has been
    nanoseconds for the last three releases; switching back now would 'break'
    the broken API again; there don't seem to be many users affected), updates
    the documentation and adds a check for an upper bound to the tracepoint
    interface tests. The upper bound is quite lax as mining the block takes
    much longer than connecting the empty test block. It's however able to
    detect incorrect duration units passed.
    cd0edf26c0
  23. 0xB10C force-pushed on Sep 3, 2024
  24. maflcko commented at 12:38 pm on September 3, 2024: member
    re-lgtm ACK cd0edf26c07c8c615f3ae3ac040c4774dcc8e650
  25. DrahtBot removed the label Needs rebase on Sep 3, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-15 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me