Deadlock in net code #8297

issue jonasschnelli opened this issue on June 30, 2016
  1. jonasschnelli commented at 7:55 PM on June 30, 2016: contributor

    I'm running into this deadlock all the time:

    2016-06-29 15:41:50 POTENTIAL DEADLOCK DETECTED
    2016-06-29 15:41:50 Previous lock order was:
    2016-06-29 15:41:50  pnode->cs_vRecvMsg  net.cpp:1828 (TRY)
    2016-06-29 15:41:50  cs_main  main.cpp:4798
    2016-06-29 15:41:50  (1) pfrom->cs_filter  main.cpp:4856
    2016-06-29 15:41:50  (2) cs_vSend  net.cpp:2537
    2016-06-29 15:41:50 Current lock order is:
    2016-06-29 15:41:50  (2) pnode->cs_vSend  net.cpp:1847 (TRY)
    2016-06-29 15:41:50  cs_main  main.cpp:6373 (TRY)
    2016-06-29 15:41:50  pto->cs_inventory  main.cpp:6593
    2016-06-29 15:41:50  (1) pto->cs_filter  main.cpp:6616
    

    I'm running 63fbdbc94d76e586b2fb0754a9a69acfc4b622d3.

    It's the machine with a 1GB/s SSD, 32GB of RAM, and a Xeon E3-1275 v5 CPU @ 3.60GHz.

  2. MarcoFalke added the label P2P on Jun 30, 2016
  3. throckmortonsign commented at 1:57 AM on July 1, 2016: none
    2016-07-01 01:36:18 POTENTIAL DEADLOCK DETECTED
    2016-07-01 01:36:18 Previous lock order was:
    2016-07-01 01:36:18  (1) pnode->cs_vSend  net.cpp:1847 (TRY)
    2016-07-01 01:36:18  (2) cs_main  main.cpp:6372 (TRY)
    2016-07-01 01:36:18  (2) cs_main  main.cpp:3501
    2016-07-01 01:36:18 Current lock order is:
    2016-07-01 01:36:18  pnode->cs_vRecvMsg  net.cpp:1828 (TRY)
    2016-07-01 01:36:18  (2) cs_main  main.cpp:5437
    2016-07-01 01:36:18  (1) cs_vSend  net.cpp:2537
    

    I'm running 6a87eb0

  4. pstratem commented at 3:14 AM on July 1, 2016: contributor

    Can you verify that this is actually deadlocking?

    I believe this is a false positive.

  5. throckmortonsign commented at 3:52 AM on July 1, 2016: none

    In my case it seemed like a true deadlock. I only built with -DDEBUG_LOCKORDER after my node seemed to not sync newer blocks after a reindex on testnet.

  6. laanwj commented at 7:40 AM on July 1, 2016: member

    Neither of them looks like a false positive to me, they concern the same locks in different orders at the same time.

  7. MarcoFalke added this to the milestone 0.13.0 on Jul 8, 2016
  8. jonasschnelli commented at 12:14 PM on July 9, 2016: contributor

    My node has now been running for a couple of days in -disable-debug mode and there were no signs of deadlocks... maybe this is a false positive?

  9. throckmortonsign commented at 1:51 PM on July 9, 2016: none

    I haven't had any for a few days now (with debug). The whole reason I turned on debug in the first place was a suspected deadlock and it occurred under a similar set of circumstances. Seemed to occur just after reindexing and/or disconnecting peers. It doesn't seem to happen to me during normal operation and I haven't tried to replicate again.

  10. theuni commented at 9:57 PM on July 11, 2016: member

    I see these in my log too. While they're scary, after tracing through for a while I don't think they're dangerous. (I do think we should fix them up to avoid the warnings, though.)

    cs_vSend is held for the duration of SendMessages(), which, after b5599147533103efea896a1fc4ff51f2d3ad5808, sometimes locks cs_filter. Additionally, cs_filter is sometimes held while calling PushMessage, which locks cs_vSend.

    However, all of that happens on the message handling thread. The net thread locks cs_vSend, but never cs_filter.

    All of that could be worked around by making sure not to hold cs_filter while sending.
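The workaround above can be sketched as follows: snapshot the filter state under its lock, release it, and only then send. This is a minimal illustration with hypothetical stand-in types and names, not the actual Bitcoin Core code:

```cpp
#include <mutex>
#include <vector>

// Illustrative stand-ins only; the real objects live in net.h/main.cpp.
struct BloomFilter { std::vector<unsigned char> data; };

std::mutex cs_filter;   // guards the peer's filter
std::mutex cs_vSend;    // guards the peer's send buffer
BloomFilter peerFilter;

void PushMessage(const BloomFilter& f) {
    std::lock_guard<std::mutex> lock(cs_vSend);
    // ... serialize f into the send buffer ...
    (void)f;
}

// Problematic: holds cs_filter across the send, establishing the
// cs_filter -> cs_vSend ordering that is reversed elsewhere.
void SendFilterBad() {
    std::lock_guard<std::mutex> lock(cs_filter);
    PushMessage(peerFilter);            // locks cs_vSend while cs_filter is held
}

// Workaround: copy under cs_filter, release it, then send.
void SendFilterGood() {
    BloomFilter snapshot;
    {
        std::lock_guard<std::mutex> lock(cs_filter);
        snapshot = peerFilter;          // copy while protected
    }                                   // cs_filter released here
    PushMessage(snapshot);              // only cs_vSend is held now
}
```

The cost is one copy of the filter, in exchange for never holding cs_filter and cs_vSend simultaneously.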

    As for the other potential deadlock, it's a similar story. When processing incoming messages, cs_main is sometimes held while calling PushMessage (which locks cs_vSend). When processing outgoing messages though, cs_vSend is always held, followed sometimes by locking cs_main in SendMessages. But the net thread that contends for cs_vSend never locks cs_main.

    The one exception is the optimistic send on new outgoing connections, which does lock cs_main and cs_vSend on the net thread, but not simultaneously.

    tl;dr: Both lock order reversals are happening on the message handling thread, and other threads don't attempt to lock both mutexes. I don't think either one presents an obvious deadlock risk. We should still strive to fix by minimizing overlapping locks though.
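To make the tl;dr concrete: a single thread that takes two locks in opposite orders at different times cannot deadlock against itself, because it fully releases one pair before taking the other. A toy sketch (names are illustrative, not the real call paths):

```cpp
#include <cassert>
#include <mutex>

std::mutex cs_main;   // stand-in for the real cs_main
std::mutex cs_vSend;  // stand-in for a peer's cs_vSend
int iterations = 0;

// The message handling thread uses both orderings, but sequentially,
// so this always runs to completion. The lock order checker still
// records the reversal, which is why the log warns.
void MessageHandlerIteration() {
    {   // ProcessMessages-style path: cs_main, then cs_vSend (via PushMessage)
        std::lock_guard<std::mutex> a(cs_main);
        std::lock_guard<std::mutex> b(cs_vSend);
        ++iterations;
    }   // both released before the next pair is taken
    {   // SendMessages-style path: cs_vSend for the duration, then cs_main
        std::lock_guard<std::mutex> a(cs_vSend);
        std::lock_guard<std::mutex> b(cs_main);
        ++iterations;
    }
}
```

The danger would only materialize if a second thread contended for the same pair in the opposite order at the same time, which, per the analysis above, the net thread does not do.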

  11. pstratem commented at 1:01 AM on July 12, 2016: contributor

    Possibly the deadlock detection code should include the calling thread in its tuple?

  12. laanwj commented at 10:38 AM on July 14, 2016: member

    Thanks for investigating @theuni. @pstratem Well, the deadlock detection code keeps track of all orders in which locks are taken and remembers them for later. Hacking in an exception for locks taken by the same thread (e.g. remembering the thread) would result in quite muddled logic, and this code isn't the prettiest as it is.

  13. sipa commented at 10:42 AM on July 14, 2016: member

    @pstratem For reporting purposes, or for actual detection?

    I agree with @laanwj that we should not add exceptions for locks taken by only one thread. Either we treat that as a given and remove the lock entirely, or we treat it as a temporary situation in the current codebase, in which case the deadlock code should report it.
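For reference, the kind of checker being discussed can be reduced to a few lines: record every ordered pair "B acquired while A held", and flag a pair whose reverse was seen earlier. This is a simplified single-threaded sketch in the spirit of -DDEBUG_LOCKORDER, not the actual implementation (which keys on lock locations and, per the discussion above, deliberately does not track the acquiring thread):

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Minimal lock-order checker sketch; names and structure are illustrative.
struct LockOrderChecker {
    // Every ordering ever observed: (A, B) means "B acquired while A held".
    std::set<std::pair<std::string, std::string>> seen;
    std::vector<std::string> held;  // locks currently held, in order
    std::vector<std::pair<std::string, std::string>> reversals;

    void Acquire(const std::string& name) {
        for (const auto& prev : held) {
            // If (name, prev) was seen before, (prev, name) is a reversal.
            if (seen.count({name, prev})) reversals.push_back({prev, name});
            seen.insert({prev, name});
        }
        held.push_back(name);
    }
    void Release() { held.pop_back(); }
};
```

Feeding it the two orderings from the logs above (cs_main before cs_vSend, then cs_vSend before cs_main) produces exactly one reported reversal, regardless of which thread performed each acquisition.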

  14. laanwj removed this from the milestone 0.13.0 on Jul 14, 2016
  15. promag commented at 1:05 AM on February 24, 2018: member

    Is this still observed?

  16. MarcoFalke closed this on Feb 24, 2018

  17. DingGeng-GitHub commented at 9:04 AM on February 26, 2019: none

    Error:

    2019-02-26T08:18:21Z POTENTIAL DEADLOCK DETECTED
    2019-02-26T08:18:21Z Previous lock order was:
    2019-02-26T08:18:21Z  cs_main  net_processing.cpp:2462
    2019-02-26T08:18:21Z  (1) pool.cs  validation.cpp:398
    2019-02-26T08:18:21Z  cs_main  wallet/wallet.cpp:1279
    2019-02-26T08:18:21Z  (2) cs_wallet  wallet/wallet.cpp:1279
    2019-02-26T08:18:21Z Current lock order is:
    2019-02-26T08:18:21Z  cs_main  wallet/wallet.cpp:1800
    2019-02-26T08:18:21Z  (2) cs_wallet  wallet/wallet.cpp:1800
    2019-02-26T08:18:21Z  (1) g_mempool.cs  wallet/wallet.cpp:1821
    

    Environment: Ubuntu 16.04, Bitcoin ABC RPC client version v0.19.0.0-e619210

    The error above is reported every time I start it up. How can I fix it? QQ: 707170913, reward offered for a solution.

  18. DrahtBot locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-24 12:15 UTC
