qt: lock cs_main, m_cached_tip_mutex in that order #19132

pull vasild wants to merge 1 commits into bitcoin:master from vasild:lock_order_m_cached_tip_mutex changing 1 files +15 −1
  1. vasild commented at 12:28 PM on June 1, 2020: member

    Always lock the mutexes cs_main and m_cached_tip_mutex in the same order: cs_main, m_cached_tip_mutex. Otherwise we may end up in a deadlock.

    ClientModel::m_cached_tip_blocks is protected by ClientModel::m_cached_tip_mutex. There are two access paths that lock the two mutexes in opposite order:

    validation.cpp:2868 CChainState::ActivateBestChain(): lock cs_main
    validation.cpp:2916 CChainState::ActivateBestChain(): call uiInterface.NotifyBlockTip()
    ui_interface.cpp:52 CClientUIInterface::NotifyBlockTip(): go deep in boost
    ...
    qt/clientmodel.cpp:255 BlockTipChanged(): lock m_cached_tip_mutex
    

    and

    qt/clientmodel.cpp:119 ClientModel::getBestBlockHash(): lock m_cached_tip_mutex
    qt/clientmodel.cpp:121 ClientModel::getBestBlockHash(): call m_node.getBestBlockHash()
    interfaces/node.cpp:200 NodeImpl::getBestBlockHash(): lock cs_main
    

    From debug.log:

    POTENTIAL DEADLOCK DETECTED
    Previous lock order was:
     m_cs_chainstate validation.cpp:2851
     (1) cs_main validation.cpp:2868
     ::mempool.cs validation.cpp:2868
     (2) clientmodel->m_cached_tip_mutex qt/clientmodel.cpp:255
    Current lock order is:
     (2) m_cached_tip_mutex qt/clientmodel.cpp:119
     (1) ::cs_main interfaces/node.cpp:200
    

    The possible deadlock was introduced in #17993

  2. fanquake added the label GUI on Jun 1, 2020
  3. fanquake requested review from ryanofsky on Jun 1, 2020
  4. fanquake requested review from jonasschnelli on Jun 1, 2020
  5. fanquake requested review from hebasto on Jun 1, 2020
  6. vasild commented at 12:58 PM on June 1, 2020: member

    Notice: this is just one possible fix. Other possible fixes are:

    1. Reverse the other access path, so we always have the order as m_cached_tip_mutex, cs_main. I think this would require bigger changes.

    2. Ditch m_cached_tip_mutex and protect ClientModel::m_cached_tip_blocks by cs_main. Adding more stuff under the protection of cs_main would in general reduce concurrency. Maybe not in this particular case if we are always holding cs_main anyway when we try to access ClientModel::m_cached_tip_blocks. Do we?

    cc @furszy

  7. in src/qt/clientmodel.cpp:127 in 367c8a6df7 outdated
     122 | +
     123 | +    if (!tip.IsNull()) {
     124 | +        return tip;
     125 | +    }
     126 | +
     127 | +    tip = m_node.getBestBlockHash(); // Will lock `cs_main` (and release it).
    


    MarcoFalke commented at 1:11 PM on June 1, 2020:
        tip = m_node.getBestBlockHash(); // Will lock `cs_main` (and release it). This is needed to be done early to not violate the implicit lock order cs_main -> m_cached_tip_mutex
    

    I think the comment about the lock order can be combined with this comment?


    vasild commented at 3:16 PM on June 1, 2020:

    Combined the two comments and reworded them a little.

  8. qt: lock cs_main, m_cached_tip_mutex in that order
    Always lock the mutexes `cs_main` and `m_cached_tip_mutex` in
    the same order: `cs_main`, `m_cached_tip_mutex`. Otherwise we may end up
    in a deadlock.
    
    `ClientModel::m_cached_tip_blocks` is protected by
    `ClientModel::m_cached_tip_mutex`. There are two access paths that
    lock the two mutexes in opposite order:
    
    ```
    validation.cpp:2868 CChainState::ActivateBestChain(): lock cs_main
    validation.cpp:2916 CChainState::ActivateBestChain(): call uiInterface.NotifyBlockTip()
    ui_interface.cpp:52 CClientUIInterface::NotifyBlockTip(): go deep in boost
    ...
    qt/clientmodel.cpp:255 BlockTipChanged(): lock m_cached_tip_mutex
    ```
    
    and
    
    ```
    qt/clientmodel.cpp:119 ClientModel::getBestBlockHash(): lock m_cached_tip_mutex
    qt/clientmodel.cpp:121 ClientModel::getBestBlockHash(): call m_node.getBestBlockHash()
    interfaces/node.cpp:200 NodeImpl::getBestBlockHash(): lock cs_main
    ```
    
    From `debug.log`:
    
    ```
    POTENTIAL DEADLOCK DETECTED
    Previous lock order was:
     m_cs_chainstate validation.cpp:2851
     (1) cs_main validation.cpp:2868
     ::mempool.cs validation.cpp:2868
     (2) clientmodel->m_cached_tip_mutex qt/clientmodel.cpp:255
    Current lock order is:
     (2) m_cached_tip_mutex qt/clientmodel.cpp:119
     (1) ::cs_main interfaces/node.cpp:200
    ```
    
    The possible deadlock was introduced in
    https://github.com/bitcoin/bitcoin/pull/17993
    f46b678acf
  9. in src/qt/clientmodel.cpp:137 in 367c8a6df7 outdated
     132 | +    // meantime. Thus, check again.
     133 |      if (m_cached_tip_blocks.IsNull()) {
     134 | -        m_cached_tip_blocks = m_node.getBestBlockHash();
     135 | +        m_cached_tip_blocks = tip;
     136 |      }
     137 |      return m_cached_tip_blocks;
    


    MarcoFalke commented at 1:17 PM on June 1, 2020:

    If the requirement to check IsNull twice is dropped, then I think this whole function can be written shorter in just 4 lines of code:

         uint256 tip{WITH_LOCK(m_cached_tip_mutex, return m_cached_tip_blocks)};
         if (!tip.IsNull()) return tip;
    
         tip = m_node.getBestBlockHash(); // Will lock `cs_main` (and release it). This is needed to be done early to not violate the implicit lock order cs_main -> m_cached_tip_mutex
    
         return WITH_LOCK(m_cached_tip_mutex, return m_cached_tip_blocks = tip);
    

    vasild commented at 3:27 PM on June 1, 2020:

    The above snippet can end up setting/overwriting m_cached_tip_blocks if it is not null. I.e. if it is null during the if, later some other thread sets it to some non null value (while we are calling m_node.getBestBlockHash()) and we end up overwriting what the other thread set.

    Maybe that is ok in this case or maybe it cannot happen due to some constraints outside of this function. This code is new to me so I played safe :)

    Btw if (A) return B; has two disadvantages compared to putting the return on a separate line:

    1. it is not possible to set a breakpoint on return B; only. For example if the condition is evaluated 1000 times, but A is true only a few times and one wants to break when return B; is executed.
    2. it will skew line-based code coverage.
  10. vasild force-pushed on Jun 1, 2020
  11. furszy commented at 5:20 PM on June 1, 2020: member

    Hmm, good catch.

    Would actually think it on the other way around. The problem shouldn't be on the slot side (the slot can be inside the same process or be another process -- GUI division -- or whatever). Better to review the reason behind cs_main being locked on the signal trigger side on the first place.

    Why not doing the same as what is being done on the NotifyHeaderTip function. Trigger the signal without cs_main locked.

  12. MarcoFalke commented at 5:37 PM on June 1, 2020: member

    See commit d86edd3d3093be4c00d2c6a6fde6dfa77e2f4855

  13. jonasschnelli commented at 7:22 AM on June 5, 2020: contributor

    Tested ACK f46b678acff0b2e75e26aa50b14d935b3d251a2a

  14. jonasschnelli removed review request from jonasschnelli on Jun 5, 2020
  15. jonasschnelli merged this on Jun 5, 2020
  16. jonasschnelli closed this on Jun 5, 2020

  17. vasild deleted the branch on Jun 5, 2020
  18. sidhujag referenced this in commit e70bd9a484 on Jun 5, 2020
  19. Fabcien referenced this in commit 9ad2859f2e on Feb 23, 2021
  20. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-21 18:14 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me