Enable consistency checks by default with --enable-debug #24709

issue maflcko openend this issue on March 29, 2022
  1. maflcko commented at 3:47 pm on March 29, 2022: member
    Now that most consistency checks have been made a runtime option instead of configure option (see commit 803ef70fd9f65ef800567ff9456fac525bc3e3c2), it could make sense to enable them on --enable-debug. Otherwise they are only run when explicitly passed the runtime setting.
  2. maflcko added the label Feature on Mar 29, 2022
  3. maflcko added the label Brainstorming on Mar 29, 2022
  4. maflcko added the label Tests on Mar 29, 2022
  5. bitcoin deleted a comment on Mar 30, 2022
  6. jonatack commented at 11:40 am on April 4, 2022: contributor
    Proposed for DEBUG_LOCKCONTENTION in #24757. This seems non-troublesome, as it is behind a logging config or runtime option.
  7. fanquake referenced this in commit a39002e0c6 on May 20, 2022
  8. fanquake referenced this in commit bd57b4e0c0 on May 25, 2022
  9. sidhujag referenced this in commit 84a4c6614a on May 28, 2022
  10. willcl-ark referenced this in commit 8c6ba54ace on Mar 21, 2023
  11. willcl-ark commented at 11:52 am on March 22, 2023: member

    @MarcoFalke, I’ve implemented this in a branch but don’t feel I understand the rationale for the change fully to open a PR yet. Is the idea that (mainly devs) running after building with --enable-debug would have this running by default to potentially catch consistency issues e.g. on mainnet?

    (Otherwise it seems to me that we could just enable it via the flag in the two Ci tests which build in debug mode, native QT5 and i686 multiprocess)

  12. maflcko commented at 12:05 pm on March 22, 2023: member

    Is the idea that …

    Yes.

    Otherwise it seems odd to have an --enable-debug setting that doesn’t enable debug checks.

    Maybe the --enable-debug setting can be made more clear that this enables stricter checks, which could terminate the program or throw additional errors, that otherwise wouldn’t appear. See also #26338 (comment)

  13. ajtowns commented at 7:21 am on May 19, 2023: contributor

    Not sure this is a good idea? If the consistency check is runtime configurable, why not just configure it at runtime – anyone who’s going to actively debug any problems found isn’t going to find it too hard to add a line in bitcoin.conf? If the problem is just that there might be lots of different options and someone might want to get all of them, wouldn’t adding an allchecks=1 option be better? (Or -checks=addrman -checks=mempool -checks=all similar to the -debug=* option)

    The drawback of setting these options on by default is that they might make a node unusable under load (https://github.com/bitcoin/bitcoin/pull/27300#issuecomment-1479890611 for addrman checks or #27586 for others). Due to cs_main and the mempool, a lot of what we do is effectively single-threaded, so using 100% of a single core can easily be a pretty severe bottleneck.

  14. maflcko commented at 7:36 am on May 19, 2023: member

    I think it is pretty clear that we’ll have to add at least one more configure flag for (expensive) debug checks. See --enable-slow-debug (https://github.com/bitcoin/bitcoin/issues/27586#issuecomment-1553973702) and --debug-mode (https://github.com/bitcoin/bitcoin/pull/27300#issuecomment-1479890611). So I fail to see the downside of putting more (runtime configurable) consistency checks into that new flag.

    Regardless, it is already questionable whether --enable-debug is suitable for production loads, given that it may crash your node at any time, because additional (cheap) assertions are enabled. So I’d even go as far and extract a third configure option to only enable the “I want to be able to attach gdb” part. --enable-debug-symbols (https://github.com/bitcoin/bitcoin/pull/27300#issuecomment-1479775571)?

  15. willcl-ark commented at 8:28 am on May 19, 2023: member

    Before I actually push any more changes, I wonder if the following would work and feel “expected” to developers:

    1. --enable-debug: behaviour change to disable lockorder/lockcontention consistency checks + behaviour change to disable boost_multi_index_safe_mode
    2. --enable-debug-checks: implies 1. plus enables consistency checks (addrman, lockorder, lockcontention)
    3. --enable-debug-all?: implies 2. plus enables boost_multi_index_safe_mode

    The way I had been approaching this was to use a new DEBUG_CONFIGURATION macro in configure to avoid too much code repetition in configure (not updated with boost_multi_index_safe_mode code yet):

     0define([DEBUG_CONFIGURATION],
     1[
     2  dnl If debugging is enabled, and the user hasn't overridden CXXFLAGS, clear
     3  dnl them, to prevent autoconfs "-g -O2" being added. Otherwise we'd end up
     4  dnl with "-O0 -g3 -g -O2".
     5  if test "$CXXFLAGS_overridden" = "no"; then
     6    CXXFLAGS=""
     7  fi
     8
     9  dnl Disable all optimizations
    10  AX_CHECK_COMPILE_FLAG([-O0], [DEBUG_CXXFLAGS="$DEBUG_CXXFLAGS -O0"], [], [$CXXFLAG_WERROR])
    11
    12  dnl Prefer -g3, fall back to -g if that is unavailable.
    13  AX_CHECK_COMPILE_FLAG(
    14    [-g3],
    15    [DEBUG_CXXFLAGS="$DEBUG_CXXFLAGS -g3"],
    16    [AX_CHECK_COMPILE_FLAG([-g], [DEBUG_CXXFLAGS="$DEBUG_CXXFLAGS -g"], [], [$CXXFLAG_WERROR])],
    17    [$CXXFLAG_WERROR])
    18
    19  AX_CHECK_PREPROC_FLAG([-DDEBUG], [DEBUG_CPPFLAGS="$DEBUG_CPPFLAGS -DDEBUG"], [], [$CXXFLAG_WERROR])
    20  $1
    21  AX_CHECK_PREPROC_FLAG([-DABORT_ON_FAILED_ASSUME], [DEBUG_CPPFLAGS="$DEBUG_CPPFLAGS -DABORT_ON_FAILED_ASSUME"], [], [$CXXFLAG_WERROR])
    22  AX_CHECK_PREPROC_FLAG([-DRPC_DOC_CHECK], [DEBUG_CPPFLAGS="$DEBUG_CPPFLAGS -DRPC_DOC_CHECK"], [], [$CXXFLAG_WERROR])
    23  AX_CHECK_COMPILE_FLAG([-ftrapv], [DEBUG_CXXFLAGS="$DEBUG_CXXFLAGS -ftrapv"], [], [$CXXFLAG_WERROR])
    24])
    25
    26if test "$enable_debug" = "yes"; then
    27  DEBUG_CONFIGURATION([AX_CHECK_PREPROC_FLAG([-DRPC_DOC_CHECK], [DEBUG_CPPFLAGS="$DEBUG_CPPFLAGS -DRPC_DOC_CHECK"], [], [$CXXFLAG_WERROR])])
    28fi
    29
    30if test "$enable_debug_checks" = "yes"; then
    31  DEBUG_CONFIGURATION([
    32    AX_CHECK_PREPROC_FLAG([-DDEBUG_ADDRMAN], [DEBUG_CPPFLAGS="$DEBUG_CPPFLAGS -DDEBUG_ADDRMAN"], [], [$CXXFLAG_WERROR])
    33    AX_CHECK_PREPROC_FLAG([-DDEBUG_LOCKORDER], [DEBUG_CPPFLAGS="$DEBUG_CPPFLAGS -DDEBUG_LOCKORDER"], [], [$CXXFLAG_WERROR])
    34    AX_CHECK_PREPROC_FLAG([-DDEBUG_LOCKCONTENTION], [DEBUG_CPPFLAGS="$DEBUG_CPPFLAGS -DDEBUG_LOCKCONTENTION"], [], [$CXXFLAG_WERROR])
    35  ])
    36fi
    37
    38dnl Add enable_debug_all here
    

    Before I make any more changes though I’d like to make sure that I’m getting the right balance of preconfigured-bundles-of-options vs. just set it at runtime. To me, the above 3 preconfigured option bundles make sense in the absence of totally changing (or renaming) --enable-debug.

    Thoughts?

  16. ajtowns commented at 7:44 am on May 22, 2023: contributor

    Linux distros do detached debug symbols so that you can grab them after the fact and just use them with your regular production binaries. cf http://debug.mirrors.debian.org/debian-debug/pool/main/b/bitcoin/ Doing that for our guix builds might be worthwhile, in which case perhaps we could always build with debug symbols?

    Thinking about it more, I think for me the levels are more something like:

    • production – should only abort on unrecoverable errors, should be as fast as possible
    • debug – should run any checks that have negligible impact on performance and crash on errors so they’re easy to detect. good for unit/functional/fuzz tests and also running against mainnet (to catch bugs triggered by live behaviour but not our test suite) as long as you can deal with unnecessary crashes
    • thorough – additional tests that more thoroughly check for bugs but cause serious performance impacts

    But I think “causes serious performance impacts” means they should be selected individually, not just turned on by default – so you might run with valgrind, or with sanitizers, or with addrman checks, or with boost multiindex checks, but you should be picking which of those you’re comfortable enabling, because some aren’t suitable for some workloads (like running on mainnet) and some are incompatible with others?

    And then at that point, some thorough checks can be enabled at runtime (addrman checks, valgrind), and some can only be enabled at compile time (boost multiindex, sanitizers, Assume checks).

    With that perspective, --enable-slow-debug maybe doesn’t make sense; instead it should just be --enable-boostmultiindexsafemode and that should be added to some of the CI tasks?

    Getting back to the original point: if you’re always enabling each “thorough” check explicitly, there’s (IMHO) not much advantage to doing it via a compile time flag that just changes a default vs explicitly setting it in bitcoin.conf; and the code’s simpler if we only provide one way to override the default.

  17. fanquake commented at 9:23 am on May 22, 2023: member

    Doing that for our guix builds might be worthwhile

    Note that we already do this as a part of our Guix builds. See release-process.md:

    The *-debug* files generated by the guix build contain debug symbols for troubleshooting by developers. It is assumed that anyone that is interested in debugging can run guix to generate the files for themselves. To avoid end-user confusion about which file to pick, as well as save storage space do not upload these to the bitcoincore.org server, nor put them in the torrent.

  18. maflcko commented at 8:22 am on June 27, 2023: member
    Closing per #24709 (comment)
  19. maflcko closed this on Jun 27, 2023

  20. jonatack commented at 4:11 pm on June 27, 2023: contributor

    The drawback of setting these options on by default is that they might make a node unusable under load (#27300 (comment) for addrman checks or #27586 for others). Due to cs_main and the mempool, a lot of what we do is effectively single-threaded, so using 100% of a single core can easily be a pretty severe bottleneck.

    I can confirm that my node sees block download timeouts and has trouble keeping up with the mainnet chain tip when it is run with checkmempool=1, for instance.

  21. bitcoin locked this on Jun 26, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me