Assertion failed: (data.size() > node.nSendOffset), function SocketSendData, file net.cpp, line 837 #27963

issue FelixWeis openend this issue on June 25, 2023
  1. FelixWeis commented at 7:11 am on June 25, 2023: contributor

    Is there an existing issue for this?

    • I have searched the existing issues

    Current behaviour

    crashed

    Expected behaviour

    not crash

    Steps to reproduce

    compiled master at d1ae96755a0f9d7e12c3f6741c030d8ea6d0416f, did mainnet IBD, around 40 hours after IBD crashed with the asseration fail.

    Relevant log output

    unclear if relevant but these are the final lines

    02023-06-24T06:15:52Z Saw new header hash=00000000000000000000f72e1a37859593a848a2f9b88eb9e5c3f0d8c971d89a height=795677
    12023-06-24T06:15:52Z [net] Saw new cmpctblock header hash=00000000000000000000f72e1a37859593a848a2f9b88eb9e5c3f0d8c971d89a peer=1
    22023-06-24T06:15:52Z UpdateTip: new best=00000000000000000000f72e1a37859593a848a2f9b88eb9e5c3f0d8c971d89a height=795677 version=0x20800000 log2_work=94.260280 tx=855881332 date='2023-06-24T06:15:33Z' progress=1.000000 cache=6.1MiB(47317txo)
    32023-06-24T06:20:25Z New outbound peer connected: version: 70016, blocks=795677, peer=557 (outbound-full-relay)
    42023-06-24T06:20:26Z New outbound peer connected: version: 70016, blocks=795677, peer=558 (outbound-full-relay)
    5Assertion failed: (data.size() > node.nSendOffset), function SocketSendData, file net.cpp, line 837.
    

    How did you obtain Bitcoin Core

    Compiled from source

    What version of Bitcoin Core are you using?

    master@d1ae96755a0f9d7e12c3f6741c030d8ea6d0416f

    Operating system and version

    macOS 13.4 (22F66)

    Machine specifications

    M2 Max

  2. FelixWeis commented at 7:14 am on June 25, 2023: contributor
     0Build Options:
     1  with external callbacks = no
     2  with benchmarks         = no
     3  with tests              = yes
     4  with ctime tests        = no
     5  with coverage           = no
     6  with examples           = no
     7  module ecdh             = no
     8  module recovery         = yes
     9  module extrakeys        = yes
    10  module schnorrsig       = yes
    11
    12  asm                     = no
    13  ecmult window size      = 15
    14  ecmult gen prec. bits   = 4
    15
    16  valgrind                = no
    17  CC                      = gcc
    18  CPPFLAGS                =
    19  SECP_CFLAGS             = -O2  -std=c89 -pedantic -Wno-long-long -Wnested-externs -Wshadow -Wstrict-prototypes -Wundef -Wno-overlength-strings -Wall -Wno-unused-function -Wextra -Wcast-align -Wconditional-uninitialized -Wreserved-identifier -fvisibility=hidden
    20  CFLAGS                  = -g -O2
    21  LDFLAGS                 =
    22
    23Options used to compile and link:
    24  external signer = yes
    25  multiprocess    = no
    26  with experimental syscall sandbox support = no
    27  with libs       = yes
    28  with wallet     = yes
    29    with sqlite   = yes
    30    with bdb      = no
    31  with gui / qt   = no
    32  with zmq        = yes
    33  with test       = yes
    34  with fuzz binary = yes
    35  with bench      = yes
    36  with upnp       = no
    37  with natpmp     = no
    38  use asm         = yes
    39  USDT tracing    = no
    40  sanitizers      =
    41  debug enabled   = no
    42  gprof enabled   = no
    43  werror          = no
    44  LTO             = no
    45
    46  target os       = darwin22.5.0
    47  build os        = darwin22.5.0
    48
    49  CC              = gcc
    50  CFLAGS          = -pthread -g -O2
    51  CPPFLAGS        =   -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3  -DHAVE_BUILD_INFO -Xclang -internal-isystem/usr/local/include -DMAC_OSX -DOBJC_OLD_DISPATCH_PROTOTYPES=0 -DPROVIDE_FUZZ_MAIN_FUNCTION
    52  CXX             = g++ -std=c++17
    53  CXXFLAGS        =    -Wstack-protector -fstack-protector-all  -Wall -Wextra -Wgnu -Wformat -Wformat-security -Wvla -Wshadow-field -Wthread-safety -Wloop-analysis -Wredundant-decls -Wunused-member-function -Wdate-time -Wconditional-uninitialized -Woverloaded-virtual -Wsuggest-override -Wunreachable-code-loop-increment -Wimplicit-fallthrough -Wdocumentation  -Wno-unused-parameter -Wno-self-assign    -g -O2
    54  LDFLAGS         =  -lpthread  -Wl,-bind_at_load   -Wl,-headerpad_max_install_names -Wl,-dead_strip -Wl,-dead_strip_dylibs
    55  AR              = /usr/bin/ar
    56  ARFLAGS         = cr
    
  3. maflcko added the label Bug on Jun 25, 2023
  4. maflcko added the label macOS on Jun 25, 2023
  5. maflcko added the label P2P on Jun 25, 2023
  6. maflcko commented at 8:39 am on June 25, 2023: member

    around 40 hours after IBD crashed with the asseration fail.

    I guess this makes it hard to reproduce with gdb attached or with more logging enabled?

    Glancing at the code, I don’t see how this could happen unless macOS send returned a value larger than the len passed in?

    https://github.com/bitcoin/bitcoin/blame/d1ae96755a0f9d7e12c3f6741c030d8ea6d0416f/src/net.cpp#L837

    The last change in the function was https://github.com/bitcoin/bitcoin/commit/691eaf8873fe2f189153ca637506a0291504c97a but that shouldn’t be the (direct) cause of the assert failure.

  7. Sjors commented at 2:41 pm on June 26, 2023: member

    If I read correctly it happened after IBD.

    I guess -debug=net would useful here, if you can ever reproduce it. It might give us a hint which message we were about to react to.

    As an aside, since you configured the wallet, be careful loading descriptor wallets on master until #27915 is resolved.

  8. maflcko commented at 2:53 pm on June 26, 2023: member

    I guess -debug=net would useful here, if you can ever reproduce it. It might give us a hint which message we were about to react to.

    If you do this, make sure to add additional logging (maybe nBytes, data.size(), and node.nSendOffset after the Send()?) in the function that asserted, because there is none.

  9. FelixWeis commented at 10:08 pm on June 26, 2023: contributor
     02023-06-26T19:44:34.554349Z [net] received: tx (313 bytes) peer=44
     12023-06-26T19:44:34.554644Z [mempool] AcceptToMemoryPool: peer=44: accepted 783c947208643cb757b916a9693fcbbac89ae4059ec748ff8c78735739c831bc (poolsz 21998 txn, 70199 kB)
     22023-06-26T19:44:34.554678Z [net] received: tx (313 bytes) peer=44
     32023-06-26T19:44:34.554966Z [mempool] AcceptToMemoryPool: peer=44: accepted ccec8d2d77fcfca2386280efaff17fa5b3011d8bae3b91a2fdd2345a1ee9e5b8 (poolsz 21999 txn, 70200 kB)
     42023-06-26T19:44:34.554993Z [net] received: tx (266 bytes) peer=44
     52023-06-26T19:44:34.555309Z [mempool] AcceptToMemoryPool: peer=44: accepted 62e9ed54e4a48ceb8958c1bc37144a72b61043c1bcdeb0002fa7ea63bc0169fb (poolsz 22000 txn, 70201 kB)
     62023-06-26T19:44:34.555345Z [net] received: tx (266 bytes) peer=44
     72023-06-26T19:44:34.555626Z [mempool] AcceptToMemoryPool: peer=44: accepted 8ef118ace1c5019d6e77f9ad216612abe66745f40b8c5f9630f14bf97f90c3f9 (poolsz 22001 txn, 70202 kB)
     82023-06-26T19:44:34.660816Z [net] sending inv (1261 bytes) peer=49
     92023-06-26T19:44:34.767010Z [net] sending inv (1261 bytes) peer=61
    102023-06-26T19:44:35.286961Z [net] sending inv (1261 bytes) peer=59
    112023-06-26T19:44:35.599720Z [net] sending inv (1261 bytes) peer=43
    122023-06-26T19:44:35.601098Z [net] sending inv (1261 bytes) peer=82
    132023-06-26T19:44:35.807384Z [net] sending inv (541 bytes) peer=61
    142023-06-26T19:44:36.464645Z [net] sending inv (541 bytes) peer=49
    15Assertion failed: (data.size() > node.nSendOffset), function SocketSendData, file net.cpp, line 837.
    16zsh: abort      ./src/bitcoind -txindex -rest -debug=mempool -debug=net -logtimemicros
    

    crashed a few times now.

  10. maflcko commented at 4:29 am on June 27, 2023: member

    Can you try with my suggested diff?

     0diff --git a/src/net.cpp b/src/net.cpp
     1index a96ffcfbe9..6c9f928f0d 100644
     2--- a/src/net.cpp
     3+++ b/src/net.cpp
     4@@ -854,6 +854,10 @@ size_t CConnman::SocketSendData(CNode& node) const
     5             node.nSendBytes += nBytes;
     6             node.nSendOffset += nBytes;
     7             nSentSize += nBytes;
     8+            if (node.nSendOffset > data.size()) {
     9+            LogPrint(BCLog::NET, "socket sent %s bytes (total: %s, offset: %s) for peer=%d.\n", nBytes, data.size(), node.nSendOffset, node.GetId());
    10+            assert(false);
    11+            }
    12             if (node.nSendOffset == data.size()) {
    13                 node.nSendOffset = 0;
    14                 node.nSendSize -= data.size();
    
  11. Sjors commented at 7:40 am on June 27, 2023: member

    @FelixWeis do you have inbound peers? After you’ve tried with the extra logging, maybe also try with -listen=0.

    I’m running master @ 679f825ba3aae45af4476f357a67f8e6ec9483e2 now with this patch on an Intel macOS 13.4.1. Will update here if something happens. I haven’t noticed such crashes with v25.0 which I’ve run for multiple days.

  12. FelixWeis commented at 9:09 am on June 27, 2023: contributor
    no inboud peers, node ran smoothly overnight. crashed 2 times yesterday within a few hours.
  13. FelixWeis commented at 1:55 pm on June 27, 2023: contributor

    with @MarcoFalke patch

    02023-06-27T11:10:56.851484Z [mempool] AcceptToMemoryPool: peer=8: accepted 57303c3ef5097e684419aa326d2a0048005f1571c1b26ffa315b9b8119143907 (poolsz 40693 txn, 132600 kB)
    12023-06-27T11:10:57.059933Z [net] sending inv (37 bytes) peer=2
    22023-06-27T11:10:57.060104Z [net] socket sent 28 bytes (total: 24, offset: 28) for peer=2.
    3Assertion failed: (false), function SocketSendData, file net.cpp, line 859.
    
    02023-06-27T15:47:31.186625Z [net] sending ping (8 bytes) peer=0
    12023-06-27T15:47:31.227146Z [net] socket sent 17 bytes (total: 24, offset: 34) for peer=0.
    2Assertion failed: (false), function SocketSendData, file net.cpp, line 859.
    
  14. maflcko commented at 2:04 pm on June 27, 2023: member

    So that seems to confirm my guess that this is a macOS bug? (The total data is 24 bytes, but macOS send() returned 28)

    My suggestion would be to call Tim Cook or switch to Linux.

  15. maflcko added the label Upstream on Jun 27, 2023
  16. maflcko removed the label Bug on Jun 27, 2023
  17. Sjors commented at 9:50 am on June 29, 2023: member
    Maybe even more specifically an M2 bug? I’ve been using macOS for years on two Intel machines, leaving nodes running for weeks or months at a time, and never encountered this. Similarly there’s enough devs running M1 machines that we probably would have caught this.
  18. FelixWeis commented at 8:16 am on June 30, 2023: contributor
    reverted to compiling 6ee3881551f2cd411c4e4d8b0ccedf0f0416d8c2 (tag: v25.0). same cpu, os, compiler version. runs smooth since 3 days
  19. maflcko commented at 8:21 am on June 30, 2023: member

    runs smooth since 3 days

    This implies it is not an upstream bug, or the issue fixed itself in the meantime. This may take some time, but you could try bisecting?

  20. Sjors commented at 8:29 am on July 2, 2023: member
    A bisect would be ideal. Though you could start with recent pull requests that touch net(_processing).cpp and were not backported to the v25.0 release. E.g. #27626 and #27625.
  21. fanquake commented at 8:50 am on July 6, 2023: member
    @FelixWeis were you able to perform a bisect here?
  22. maflcko commented at 6:51 am on July 17, 2023: member
    Closing for now, but please do let us know if this happens again on any commit, including the one you reported already above.
  23. maflcko closed this on Jul 17, 2023

  24. mononaut commented at 1:35 pm on December 20, 2023: none

    I believe I just saw the same issue reproduce on two separate instances running on the same machine:

    02023-12-20T10:28:46Z UpdateTip: new best=00000000000000000001036517697dfb912067d7f66f311d815a568b2c9dc166 height=822072 version=0x2d338000 log2_work=94.612828 tx=939105193 date='2023-12-20T10:28:26Z' progress=1.000000 cache=119.3MiB(745842txo)
    12023-12-20T10:29:02Z New outbound peer connected: version: 70016, blocks=822072, peer=722 (block-relay-only)
    22023-12-20T10:31:50Z New outbound peer connected: version: 70016, blocks=822072, peer=723 (block-relay-only)
    32023-12-20T10:37:08Z New outbound peer connected: version: 70016, blocks=822072, peer=724 (block-relay-only)
    42023-12-20T10:48:03Z New outbound peer connected: version: 70016, blocks=822072, peer=726 (outbound-full-relay)
    52023-12-20T10:48:09Z New outbound peer connected: version: 70016, blocks=822072, peer=727 (outbound-full-relay)
    62023-12-20T10:48:10Z New outbound peer connected: version: 70016, blocks=801108, peer=728 (outbound-full-relay)
    7Assertion failed: (data.size() > node.nSendOffset), function SocketSendData, file net.cpp, line 841.
    8abort      bitcoind 
    
    02023-12-20T10:28:46Z Saw new header hash=00000000000000000001036517697dfb912067d7f66f311d815a568b2c9dc166 height=822072
    12023-12-20T10:28:46Z [net] Saw new cmpctblock header hash=00000000000000000001036517697dfb912067d7f66f311d815a568b2c9dc166 peer=1488
    22023-12-20T10:28:47Z UpdateTip: new best=00000000000000000001036517697dfb912067d7f66f311d815a568b2c9dc166 height=822072 version=0x2d338000 log2_work=94.612828 tx=939105193 date='2023-12-20T10:28:26Z' progress=1.000000 cache=57.7MiB(53222txo)
    32023-12-20T10:42:02Z Flushed fee estimates to fee_estimates.dat.
    42023-12-20T10:48:33Z New outbound peer connected: version: 70016, blocks=822073, peer=2096 (outbound-full-relay)
    52023-12-20T10:48:34Z Saw new header hash=0000000000000000000384c96e559cdb827b0bb1d1341baa38914bc74d4bd958 height=822073
    62023-12-20T10:48:35Z UpdateTip: new best=0000000000000000000384c96e559cdb827b0bb1d1341baa38914bc74d4bd958 height=822073 version=0x21380004 log2_work=94.612842 tx=939109968 date='2023-12-20T10:47:38Z' progress=1.000000 cache=52.5MiB(16490txo)
    7Assertion failed: (data.size() > node.nSendOffset), function SocketSendData, file net.cpp, line 841.
    8abort      bitcoind
    

    How did you obtain Bitcoin Core

    Compiled from source

    What version of Bitcoin Core are you using?

    tag v25.1 @ 7da4ae1f78ab4f6c8b19c8ca89bd6b2a6c4836ea

    Operating system and version

    macOS 13.4.1 (22F77)

    Machine specifications

    Apple M1 Max

  25. Sjors commented at 2:07 pm on December 20, 2023: member
    @mononaut do you know if these nodes were connected via p2p encryption (BIP324)? (nvm, that’s not part of v25.1)
  26. mzumsande commented at 4:13 pm on December 20, 2023: contributor

    So it can happen both on M1 and M2. Also, it’s probably not due to something recently introduced in bitcoind, at least I can’t see any changes between v25.0 and v25.1 that look like they could be related - so the bisecting suggested earlier probably won’t lead to much either.

    The fact that it happened on two unrelated instances almost at the same time is also weird. Maybe the operating system can get some kind of hiccup and then suddenly the send syscall returns incorrect values? @mononaut for how long have you been running these instances before the crash, without any problems? Was anything unusual (possibly unrelated to bitcoind) happening on the machine around the time it crashed?

  27. mononaut commented at 5:46 pm on December 20, 2023: none

    I’ve been running the two instances for several months in more or less the same conditions without any problems, and running them on that specific v25.1 commit for at least a few weeks.

    It’s possible there was more network activity happening on the machine than usual, but I couldn’t say for sure.

  28. maflcko commented at 9:34 am on December 22, 2023: member
    It may be good to check the upstream source code inside macOS for recently introduced issues.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-11 03:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me