Make mininode_lock non-reentrant #19178

pull jnewbery wants to merge 4 commits into bitcoin:master from jnewbery:2020-05-mininode-lock-reentrancy changing 2 files +5 −8
  1. jnewbery commented at 3:03 pm on June 5, 2020: member
    There’s no need for mininode_lock to be reentrant. Use a simpler non-recursive lock.
  2. [tests] Only acquire lock once in p2p_compactblocks.py edae6075aa
  3. [tests] Don't acquire mininode_lock twice in wait_for_broadcast() 9d80762fa0
  4. [tests] Don't call super twice in P2PTxInvStore.on_inv() c67c1f2c03
  5. [tests] Make mininode_lock non-reentrant
    There's no need for mininode_lock to be reentrant.
    Use a simpler non-recursive lock.
    62068381a3
  6. DrahtBot added the label Tests on Jun 5, 2020
  7. in test/functional/test_framework/mininode.py:495 in 62068381a3
    491@@ -492,7 +492,7 @@ def test_function():
    492 # P2PConnection acquires this lock whenever delivering a message to a P2PInterface.
    493 # This lock should be acquired in the thread running the test logic to synchronize
    494 # access to any data shared with the P2PInterface or P2PConnection.
    495-mininode_lock = threading.RLock()
    496+mininode_lock = threading.Lock()
    


    glozow commented at 6:48 pm on June 11, 2020:
    Seems like Lock will try to acquire indefinitely by default (I’m looking at these docs). I tried out some re-acquiring behavior with this code and it hangs forever as expected… Just in case someone like me does some dumb deadlocky stuff, would it be a good idea to make mininode_lock use something likelock.acquire(timeout=60) and/or if lock.locked(): raise_assertion_error?

    glozow commented at 7:31 pm on June 11, 2020:
    Actually I have a question because I’m not 100% sure how the test_runner works - does it allot a certain amount of time for each test? It looks like it would run infinitely? How would it affect Travis if 1 test is in deadlock?

    MarcoFalke commented at 8:27 pm on June 11, 2020:

    Travis is running our ci system, which calls the test_runner helper:

    0  DOCKER_EXEC LD_LIBRARY_PATH=$DEPENDS_DIR/$HOST/lib ${TEST_RUNNER_ENV} test/functional/test_runner.py --ci $MAKEJOBS --tmpdirprefix "${BASE_SCRATCH_DIR}/test_runner/" --ansi --combinedlogslen=4000 ${TEST_RUNNER_EXTRA} --quiet --failfast
    

    If a test takes a long time, it will show up in the log as Remaining jobs: [foo_bar.py].

    If a test never finished, the virtual machine that the ci system runs on will eventually reboot and mark the run as failed.


    jnewbery commented at 10:04 pm on June 11, 2020:

    would it be a good idea to make mininode_lock use something likelock.acquire(timeout=60) and/or if lock.locked(): raise_assertion_error

    We use the with <lock>: syntax everywhere, which doesn’t allow a blocking or timeout argument. I don’t think there’s a way to set a default time on a lock.

    I think if you did accidentally introduce a deadlock, you’d figure it out pretty easily. If your script was hanging, I think the best way would be to implement some way to attach pdb to the running process (eg https://stackoverflow.com/a/25329467/933705 or https://stackoverflow.com/a/147114/933705) and then inspect the stack trace.

    (in fact, I think this would be generally useful, so perhaps someone should implement it :slightly_smiling_face: )


    glozow commented at 11:57 pm on June 11, 2020:
    Ok this makes sense, thanks! I was worried I could accidentally stall Travis hahaha. Having some debugging help sounds useful, let me go look for a someone…
  8. in test/functional/test_framework/mininode.py:661 in 62068381a3
    657@@ -658,8 +658,6 @@ def on_inv(self, message):
    658                 # save txid
    659                 self.tx_invs_received[i.hash] += 1
    660 
    661-        super().on_inv(message)
    


    glozow commented at 6:50 pm on June 11, 2020:
    Good catch, I think this was from Amiti + me both adding it in separate PRs.
  9. glozow commented at 7:34 pm on June 11, 2020: member
    Concept ACK, I just learned a lot about locks haha. I have a comment about timeouts because opening up the potential for deadlock scares me 🥶
  10. jnewbery commented at 10:06 pm on June 11, 2020: member
    I wonder if we should actually change the Lock to be a Conditional Variable: https://docs.python.org/3/library/threading.html#condition-objects. A lot of what we do with the lock is polling for changes on objects, so having a notify() function that wakes the blocked thread could be much nicer.
  11. MarcoFalke commented at 10:31 pm on June 11, 2020: member
    Concept ACK. Seems like a good first step for future cleanups. (e.g. making the lock a private member and then replacing the polling loops with cvs)
  12. in test/functional/test_framework/mininode.py:662 in 62068381a3
    657@@ -658,8 +658,6 @@ def on_inv(self, message):
    658                 # save txid
    659                 self.tx_invs_received[i.hash] += 1
    660 
    661-        super().on_inv(message)
    662-
    663     def get_invs(self):
    664         with mininode_lock:
    


    gillichu commented at 6:01 pm on June 15, 2020:
    I’m a bit confused on why get_invs needs to grab the lock, but on_inv doesn’t? I understand that in the case of wait_for_broadcast, self.wait_until already acquires the lock, but I just wanted to confirm that on_inv already expects to be holding the lock on call.

    jnewbery commented at 6:46 pm on June 15, 2020:
    Yes. All of the on_<msgtype>() already hold the lock, since they’re called by the on_message() method (specifically, the getattr(self, 'on_' + msgtype)(message) line)
  13. gillichu commented at 6:07 pm on June 15, 2020: none
    Concept ACK, I learned more about how locks are built into the Bitcoin testing framework too. I’m not familiar with whether polling is a resource consuming activity/busy waiting in the current implementation, but in general use cases I know CV’s are a good replacement.
  14. MarcoFalke commented at 11:37 pm on June 15, 2020: member
    Shower thought: The lock could even be removed completely without a cv replacement if all code that previously required the lock was executed in the network thread. Not sure if this makes sense at all, so consider it a “shower thought”.
  15. jonatack commented at 9:29 am on June 16, 2020: member
    ACK 62068381a3b9c0
  16. MarcoFalke commented at 9:45 am on June 16, 2020: member

    ACK 62068381a3b9c065d81300be79abba7aecfdb41b 😃

    Signature:

     0-----BEGIN PGP SIGNED MESSAGE-----
     1Hash: SHA512
     2
     3ACK 62068381a3b9c065d81300be79abba7aecfdb41b 😃
     4-----BEGIN PGP SIGNATURE-----
     5
     6iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
     7pUg1zgv/Yt3dwdeZi7IRInaHKprMrQbGGBB/MvgXliI7NbdZZmVGTcONWCPMXWDA
     8NRPTCD2oP4scNp2uRwFYr3WGXo8XrHQImByajBJWrgMxwiPkzHGYN0VHh9WUJiQt
     94XQyQhtr6nNg3T9dPvTyM0aUblItMkh17dcnQoynH7FGSh7njbhD9m+Nr0yJFMaX
    10pMeiIyMTbOIao2VNX1htUF5XN0twFjHNvFKHlHTyJxSyK/85vd2UPYo1eCzgBQwa
    11Pq9Ax5/MUtHnD5lucuthH4IHC7HfGxoJ7FpQGTZWPTDH7LA9HPn5z1D+JUXf7b3t
    12zOCvVNm1DSvBG47EcPEHDIBLvhT4AhNJOd+APWN4NTV7y2AiGZwufipvWbPuBjpS
    13J2YTw57Y7CIbos3O8fGDuA0bNAR1mYNwBrOvyqXtxQaS6KV0WKecCtRbFrxoAxu5
    14565+kQoGp4sjp3KAEalyPVu+td9orvKCejNRI5MloMcG55EKQ+3zLtQgGPoVBVx/
    155vm8xtfy
    16=YJKb
    17-----END PGP SIGNATURE-----
    

    Timestamp of file with hash 236b7f082f677ae84f62c6b2ec461c0d23883d147cfecbaf72127c0e22631b33 -

  17. MarcoFalke merged this on Jun 16, 2020
  18. MarcoFalke closed this on Jun 16, 2020

  19. jnewbery deleted the branch on Jun 16, 2020
  20. sidhujag referenced this in commit 9be866b47d on Jul 7, 2020
  21. deadalnix referenced this in commit 2b1e2420f9 on Jan 22, 2021
  22. Fabcien referenced this in commit e7039654cd on Apr 23, 2021
  23. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-23 21:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me