http: Release work queue after event base finish #19033

pull promag wants to merge 1 commits into bitcoin:master from promag:2020-05-fix-race-interrupt-http-server changing 1 files +6 −4
  1. promag commented at 9:05 pm on May 20, 2020: member

    This fixes a race between http_request_cb and StopHTTPServer where the work queue is used after release.

    Fixes #18856.

  2. promag commented at 9:06 pm on May 20, 2020: member
    @MarcoFalke can’t find the issue where this was discussed 😞
  3. MarcoFalke commented at 9:30 pm on May 20, 2020: member
  4. promag commented at 9:32 pm on May 20, 2020: member
    Thanks! I was filtering for your issues…
  5. MarcoFalke commented at 9:32 pm on May 20, 2020: member

    Concept ACK, assuming that it does what we do in the python code. Haven’t looked at the code here.

    0self.network_event_loop.call_soon_threadsafe(self.network_event_loop.stop)
    
  6. MarcoFalke added the label RPC/REST/ZMQ on May 20, 2020
  7. MarcoFalke added the label Needs backport (0.20) on May 20, 2020
  8. MarcoFalke added this to the milestone 0.20.1 on May 20, 2020
  9. promag force-pushed on May 21, 2020
  10. promag commented at 8:58 am on May 21, 2020: member
    @hebasto included 38e2be4 from #17457. If you run QT_QPA_PLATFORM=cocoa src/qt/test/test_bitcoin-qt without this commit you will see the warning.
  11. tarboss commented at 3:34 pm on May 21, 2020: none

    setting env variable: QT_QPA_PLATFORM=windows ->same warning QWARN : AppTests::appTests() QObject::connect: Cannot queue arguments of type ‘size_t’ (Make sure ‘size_t’ is registered using qRegisterMetaType().)

    also on windows 10 & msvc the first test exits silent becoz fs::remove_all(m_path_root); in BasicTestingSetup::~BasicTestingSetup() says cannot remove dir,its not empty, but u can catch with try statement, if u want do more small fixes 👍

  12. promag commented at 3:37 pm on May 21, 2020: member

    setting env variable: QT_QPA_PLATFORM=windows ->same warning QWARN : AppTests::appTests() QObject::connect: Cannot queue arguments of type ‘size_t’ (Make sure ‘size_t’ is registered using qRegisterMetaType().) @tarboss 38e2be4be00914e8be7ae9c7215985c9b481b0ae fixes it right?

  13. tarboss commented at 3:40 pm on May 21, 2020: none

    at least that warning

    ********* Start testing of AppTests ********* Config: Using QtTest library 5.9.8, Qt 5.9.8 (x86_64-little_endian-llp64 static debug build; by MSVC 2017) PASS : AppTests::initTestCase() QINFO : AppTests::appTests() Backing up GUI settings to “C:\Users\tester\AppData\Local\Temp\test_common_Bitcoin Core\b516affdfbc77f1640d7e3e97f95db107556a19c4535813c07f5a21862418b89\regtest\guisettings.ini.bak” QDEBUG : AppTests::appTests() requestInitialize : Requesting initialize QDEBUG : AppTests::appTests() initialize : Running initialization in thread QDEBUG : AppTests::appTests() initializeResult : Initialization result: true QINFO : AppTests::appTests() Platform customization: “windows” QDEBUG : AppTests::appTests() TransactionTablePriv::refreshWallet QWARN : AppTests::appTests() QWindowsWindow::setGeometry: Unable to set geometry 751x430+598+336 on QWidgetWindow/‘RPCConsoleWindow’. Resulting geometry: 751x431+598+336 (frame: 8, 31, 8, 8, custom margin: 0, 0, 0, 0, minimum size: 751x365, maximum size: 16777215x16777215). QDEBUG : AppTests::appTests() requestShutdown : Requesting shutdown QDEBUG : AppTests::appTests() shutdown : Running Shutdown in thread QDEBUG : AppTests::appTests() NotifyUnload QDEBUG : AppTests::appTests() shutdown : Shutdown finished

    E:\win10dev\bc\bitcoin\build_msvc\x64\Debug\test_bitcoin-qt.exe (process 11012) exited with code 3. To automatically close the console when debugging stops, enable Tools->Options->Debugging->Automatically close the console when debugging stops. Press any key to close this window . . .

  14. in src/httpserver.cpp:452 in c8c1f2453b outdated
    449+        if (eventBase) {
    450+            event_base_once(eventBase, -1, EV_TIMEOUT, [](evutil_socket_t, short, void*) {
    451+                 workQueue->Interrupt();
    452+            }, nullptr, nullptr);
    453+        } else {
    454+            workQueue->Interrupt();
    


    laanwj commented at 5:06 pm on May 21, 2020:
    workQueue is not created if eventBase couldn’t be created, so it seems this else {} clause is redundant?

    promag commented at 5:22 pm on May 21, 2020:
    @laanwj changed approach, please take a new look.
  15. promag force-pushed on May 21, 2020
  16. promag renamed this:
    http: Fix workQueue race on InterruptHTTPServer
    http: Release work queue after event base finish
    on May 21, 2020
  17. promag commented at 5:25 pm on May 21, 2020: member
    This might result in a connection reset by peer or something like that because one request job can be enqueued but not handled (no response sent).
  18. laanwj commented at 12:54 pm on May 26, 2020: member

    I’m confused as to why the Qt change is in here. Please add an explanation to the opening post, or open a separate PR for it.

    ACK 1d39b7cf00f9d4a64d93caa9d47a177d759c2d6a

    ACK fec477712f9b37dc16b61a54e037073974633f67

  19. promag commented at 1:09 pm on May 26, 2020: member
    @laanwj see #19033 (comment). I can drop the commit, again.
  20. MarcoFalke commented at 1:18 pm on May 26, 2020: member
    I prefer a new pull as well. The gui test thing doesn’t need backport
  21. promag force-pushed on May 26, 2020
  22. promag commented at 1:22 pm on May 26, 2020: member
    Done, and rebased.
  23. laanwj added this to the "Blockers" column in a project

  24. hebasto approved
  25. hebasto commented at 4:13 pm on June 4, 2020: member
    ACK fec477712f9b37dc16b61a54e037073974633f67, I have not tested the code, but I have reviewed it and it looks OK, I agree it can be merged.
  26. MarcoFalke commented at 7:22 pm on June 6, 2020: member

    Testing with diff:

     0diff --git a/src/httpserver.cpp b/src/httpserver.cpp
     1index 5e78fd1d71..fe3a3a8aab 100644
     2--- a/src/httpserver.cpp
     3+++ b/src/httpserver.cpp
     4@@ -264,6 +264,7 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
     5     // Dispatch to worker thread
     6     if (i != iend) {
     7         std::unique_ptr<HTTPWorkItem> item(new HTTPWorkItem(std::move(hreq), path, i->handler));
     8+        UninterruptibleSleep(std::chrono::milliseconds{55});
     9         assert(workQueue);
    10         if (workQueue->Enqueue(item.get()))
    11             item.release(); /* if true, queue took ownership */
    

    Before:

     0$ ./test/functional/feature_uacomment.py --valgrind 
     12020-06-06T19:12:04.028000Z TestFramework (INFO): Initializing test directory /tmp/bitcoin_func_test_rjp_1t0x
     22020-06-06T19:12:14.514000Z TestFramework (INFO): test multiple -uacomment
     32020-06-06T19:12:25.066000Z TestFramework (INFO): test -uacomment max length
     42020-06-06T19:13:04.021000Z TestFramework (INFO): test -uacomment unsafe characters
     52020-06-06T19:14:39.777000Z TestFramework (ERROR): Assertion failed
     6Traceback (most recent call last):
     7  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_node.py", line 484, in assert_start_raises_init_error
     8    self.wait_for_rpc_connection()
     9  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_node.py", line 219, in wait_for_rpc_connection
    10    raise FailedToStartError(self._node_msg(
    11test_framework.test_node.FailedToStartError: [node 0] bitcoind exited with status -6 during initialization
    12
    13During handling of the above exception, another exception occurred:
    14
    15Traceback (most recent call last):
    16  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_framework.py", line 119, in main
    17    self.run_test()
    18  File "./test/functional/feature_uacomment.py", line 36, in run_test
    19    self.nodes[0].assert_start_raises_init_error(["-uacomment=" + unsafe_char], expected, match=ErrorMatch.FULL_REGEX)
    20  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_node.py", line 501, in assert_start_raises_init_error
    21    self._raise_assertion_error(
    22  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_node.py", line 161, in _raise_assertion_error
    23    raise AssertionError(self._node_msg(msg))
    24AssertionError: [node 0] Expected message "Error: User Agent comment \(\)\) contains unsafe characters." does not fully match stderr:
    25"Error: User Agent comment ()) contains unsafe characters.
    26bitcoind: httpserver.cpp:268: void http_request_cb(struct evhttp_request *, void *): Assertion `workQueue' failed."
    

    After:

      0$ while ./test/functional/feature_uacomment.py --valgrind ; do true ; done 
      12020-06-06T19:17:01.430000Z TestFramework (INFO): Initializing test directory /tmp/bitcoin_func_test_oy_7aci5
      22020-06-06T19:17:12.355000Z TestFramework (INFO): test multiple -uacomment
      32020-06-06T19:17:23.166000Z TestFramework (INFO): test -uacomment max length
      42020-06-06T19:18:02.112000Z TestFramework (INFO): test -uacomment unsafe characters
      52020-06-06T19:19:56.602000Z TestFramework (ERROR): Assertion failed
      6Traceback (most recent call last):
      7  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_node.py", line 484, in assert_start_raises_init_error
      8    self.wait_for_rpc_connection()
      9  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_node.py", line 219, in wait_for_rpc_connection
     10    raise FailedToStartError(self._node_msg(
     11test_framework.test_node.FailedToStartError: [node 0] bitcoind exited with status 1 during initialization
     12
     13During handling of the above exception, another exception occurred:
     14
     15Traceback (most recent call last):
     16  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_framework.py", line 119, in main
     17    self.run_test()
     18  File "./test/functional/feature_uacomment.py", line 36, in run_test
     19    self.nodes[0].assert_start_raises_init_error(["-uacomment=" + unsafe_char], expected, match=ErrorMatch.FULL_REGEX)
     20  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_node.py", line 501, in assert_start_raises_init_error
     21    self._raise_assertion_error(
     22  File "/home/marco/workspace/btc_bitcoin_core/test/functional/test_framework/test_node.py", line 161, in _raise_assertion_error
     23    raise AssertionError(self._node_msg(msg))
     24AssertionError: [node 0] Expected message "Error: User Agent comment \(🏃\) contains unsafe characters." does not fully match stderr:
     25"Error: User Agent comment (🏃) contains unsafe characters.
     26==3931735== Invalid read of size 8
     27==3931735==    at 0x4ACA354: evhttp_request_get_output_buffer (in /usr/lib64/libevent-2.1.so.6.0.2)
     28==3931735==    by 0x3E67AC: HTTPRequest::WriteReply(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (httpserver.cpp:581)
     29==3931735==    by 0x3E62C5: HTTPRequest::~HTTPRequest() (httpserver.cpp:526)
     30==3931735==    by 0x3E9417: operator() (unique_ptr.h:84)
     31==3931735==    by 0x3E9417: ~unique_ptr (unique_ptr.h:360)
     32==3931735==    by 0x3E9417: ~HTTPWorkItem (httpserver.cpp:47)
     33==3931735==    by 0x3E9417: HTTPWorkItem::~HTTPWorkItem() (httpserver.cpp:47)
     34==3931735==    by 0x3EA42D: operator() (unique_ptr.h:84)
     35==3931735==    by 0x3EA42D: ~unique_ptr (unique_ptr.h:360)
     36==3931735==    by 0x3EA42D: _Destroy<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > > (stl_construct.h:140)
     37==3931735==    by 0x3EA42D: __destroy<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > *> (stl_construct.h:152)
     38==3931735==    by 0x3EA42D: _Destroy<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > *> (stl_construct.h:184)
     39==3931735==    by 0x3EA42D: _Destroy<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > *, std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > > (alloc_traits.h:738)
     40==3931735==    by 0x3EA42D: std::deque<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> >, std::allocator<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > > >::_M_destroy_data_aux(std::_Deque_iterator<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> >, std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> >&, std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> >*>, std::_Deque_iterator<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> >, std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> >&, std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> >*>) (deque.tcc:872)
     41==3931735==    by 0x3EA1F9: _M_destroy_data (stl_deque.h:2046)
     42==3931735==    by 0x3EA1F9: std::deque<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> >, std::allocator<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > > >::~deque() (stl_deque.h:1004)
     43==3931735==    by 0x3E5D7D: ~WorkQueue (httpserver.cpp:89)
     44==3931735==    by 0x3E5D7D: StopHTTPServer() (httpserver.cpp:480)
     45==3931735==    by 0x14329B: Shutdown(NodeContext&) (init.cpp:190)
     46==3931735==    by 0x13C627: AppInit (bitcoind.cpp:164)
     47==3931735==    by 0x13C627: main (bitcoind.cpp:180)
     48==3931735==  Address 0x1b92c238 is 152 bytes inside a block of size 216 free'd
     49==3931735==    at 0x483B9F5: free (vg_replace_malloc.c:538)
     50==3931735==    by 0x4ACA8BC: evhttp_connection_free (in /usr/lib64/libevent-2.1.so.6.0.2)
     51==3931735==    by 0x4ACACD4: evhttp_free (in /usr/lib64/libevent-2.1.so.6.0.2)
     52==3931735==    by 0x3E5D41: StopHTTPServer() (httpserver.cpp:472)
     53==3931735==    by 0x14329B: Shutdown(NodeContext&) (init.cpp:190)
     54==3931735==    by 0x13C627: AppInit (bitcoind.cpp:164)
     55==3931735==    by 0x13C627: main (bitcoind.cpp:180)
     56==3931735==  Block was alloc'd at
     57==3931735==    at 0x483CAE9: calloc (vg_replace_malloc.c:760)
     58==3931735==    by 0x4ACA66C: evhttp_request_new (in /usr/lib64/libevent-2.1.so.6.0.2)
     59==3931735==    by 0x4ACA7AC: ??? (in /usr/lib64/libevent-2.1.so.6.0.2)
     60==3931735==    by 0x4ACB074: ??? (in /usr/lib64/libevent-2.1.so.6.0.2)
     61==3931735==    by 0x4AB8850: ??? (in /usr/lib64/libevent-2.1.so.6.0.2)
     62==3931735==    by 0x4AB04A9: ??? (in /usr/lib64/libevent-2.1.so.6.0.2)
     63==3931735==    by 0x4AB0BA6: event_base_loop (in /usr/lib64/libevent-2.1.so.6.0.2)
     64==3931735==    by 0x3E59A6: ThreadHTTP(event_base*) (httpserver.cpp:292)
     65==3931735==    by 0x3E9D17: __invoke_impl<bool, bool (*)(event_base *), event_base *> (invoke.h:60)
     66==3931735==    by 0x3E9D17: __invoke<bool (*)(event_base *), event_base *> (invoke.h:95)
     67==3931735==    by 0x3E9D17: _M_invoke<0, 1> (thread:264)
     68==3931735==    by 0x3E9D17: operator() (thread:271)
     69==3931735==    by 0x3E9D17: std::thread::_State_impl<std::thread::_Invoker<std::tuple<bool (*)(event_base*), event_base*> > >::_M_run() (thread:215)
     70==3931735==    by 0x4C6FAC3: ??? (in /usr/lib64/libstdc++.so.6.0.28)
     71==3931735==    by 0x48C0431: start_thread (in /usr/lib64/libpthread-2.31.so)
     72==3931735==    by 0x4FEB9D2: clone (in /usr/lib64/libc-2.31.so)
     73==3931735== 
     74{
     75   <insert_a_suppression_name_here>
     76   Memcheck:Addr8
     77   fun:evhttp_request_get_output_buffer
     78   fun:_ZN11HTTPRequest10WriteReplyEiRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
     79   fun:_ZN11HTTPRequestD1Ev
     80   fun:operator()
     81   fun:~unique_ptr
     82   fun:~HTTPWorkItem
     83   fun:_ZN12HTTPWorkItemD0Ev
     84   fun:operator()
     85   fun:~unique_ptr
     86   fun:_Destroy<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > >
     87   fun:__destroy<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > *>
     88   fun:_Destroy<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > *>
     89   fun:_Destroy<std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > *, std::unique_ptr<HTTPClosure, std::default_delete<HTTPClosure> > >
     90   fun:_ZNSt5dequeISt10unique_ptrI11HTTPClosureSt14default_deleteIS1_EESaIS4_EE19_M_destroy_data_auxESt15_Deque_iteratorIS4_RS4_PS4_ESA_
     91   fun:_M_destroy_data
     92   fun:_ZNSt5dequeISt10unique_ptrI11HTTPClosureSt14default_deleteIS1_EESaIS4_EED2Ev
     93   fun:~WorkQueue
     94   fun:_Z14StopHTTPServerv
     95   fun:_Z8ShutdownR11NodeContext
     96   fun:AppInit
     97   fun:main
     98}
     99==3931735== 
    100==3931735== Exit program on first error (--exit-on-first-error=yes)"
    
  27. MarcoFalke added the label Needs backport (0.19) on Jun 8, 2020
  28. promag commented at 9:47 pm on June 11, 2020: member
    Thanks for these instructions @MarcoFalke.
  29. MarcoFalke added the label Waiting for author on Jun 17, 2020
  30. in src/httpserver.cpp:478 in fec477712f outdated
    474@@ -477,6 +475,10 @@ void StopHTTPServer()
    475         event_base_free(eventBase);
    476         eventBase = nullptr;
    477     }
    478+    if (workQueue) {
    


    laanwj commented at 3:16 pm on June 18, 2020:
    I don’t think this is the correct fix. By the time eventHTTP / eventBase is freed it needs to be sure that there are no workers any more, and that there’s nothing in the work queue. After all, work in progress would refer to them.

    promag commented at 10:59 pm on June 22, 2020:

    In latest commit:

    • prevent adding stuff to queue after interrupt
    • after interrupt queue is drained

    So at this point queue is empty and concurrent http_request_cb fail to enqueue.

  31. promag force-pushed on Jun 23, 2020
  32. promag commented at 8:59 am on June 23, 2020: member
  33. promag force-pushed on Jun 26, 2020
  34. promag commented at 10:11 pm on June 28, 2020: member
    Please remove tag “waiting for author”.
  35. fanquake removed the label Waiting for author on Jun 28, 2020
  36. promag force-pushed on Jul 5, 2020
  37. MarcoFalke commented at 7:07 pm on July 9, 2020: member
    Concept ACK. Fixing bugs is good, but I can’t review this. Happy to test some time
  38. promag commented at 7:10 pm on July 9, 2020: member
    Ops, bad rebase! Should have db867b59d9ce29bcc2c872986085d99af4d1bcfc rebased. Will fix..
  39. http: Release work queue after event base finish
    This fixes a race between http_request_cb and StopHTTPServer where
    the work queue is used after release.
    4e353cb618
  40. promag force-pushed on Jul 9, 2020
  41. promag commented at 1:00 am on July 12, 2020: member
    @laanwj ping.
  42. fjahr commented at 10:31 pm on July 15, 2020: member

    Code review ACK 4e353cb618745cdb5d98e58e7dcd400ded01299a

    Couldn’t reproduce it locally per Marco’s instructions right now (some local Valgrind issue so I couldn’t run the tests under it) but this looks like a correct fix for the race to me.

  43. fanquake removed this from the milestone 0.20.1 on Jul 30, 2020
  44. fanquake added this to the milestone 0.20.2 on Jul 30, 2020
  45. promag commented at 10:52 am on September 7, 2020: member
    Ping.
  46. laanwj removed this from the "Blockers" column in a project

  47. MarcoFalke removed this from the milestone 0.20.2 on Nov 16, 2020
  48. MarcoFalke added this to the milestone 22.0 on Nov 16, 2020
  49. in src/httpserver.cpp:94 in 4e353cb618
    90@@ -91,7 +91,7 @@ class WorkQueue
    91     bool Enqueue(WorkItem* item)
    92     {
    93         LOCK(cs);
    94-        if (queue.size() >= maxDepth) {
    95+        if (!running || queue.size() >= maxDepth) {
    


    LarryRuane commented at 7:34 pm on December 14, 2020:

    This is a necessary change, but because of this it, the following can appear in the debug.log:

    02020-12-14T19:03:41Z WARNING: request rejected because http work queue depth exceeded, it can be increased with the -rpcworkqueue= setting
    

    which is harmless, but a bit confusing and definitely misleading. I suggest adding this guard around that LogPrintf:

    0        if (workQueue->Enqueue(item.get()))
    1            item.release(); /* if true, queue took ownership */
    2        else {
    3            if (!ShutdownRequested()) {
    4                LogPrintf("WARNING: request rejected because http work queue depth exceeded, it can be increased with the -rpcworkqueue= setting\n");
    5            }
    6            item->req->WriteReply(HTTP_INTERNAL_SERVER_ERROR, "Work queue depth exceeded");
    

    (I did manually test that change.) Or you could change Enqueue() to return more than just a bool to indicate the reason the request was rejected, but it’s probably not worth it. Or, you could check ShutdownRequested() before calling Enqueue() and there would be no need to check running in Enqueue. But I think your change is best because the WorkQueue object knows if it’s running or not, makes sense to take advantage of that knowledge.


    promag commented at 11:18 pm on December 14, 2020:
    Well in that case the RPC error is also misleading. I’d like to avoid more calls to ShutdownRequested. Maybe just reword the error to something along “work queue full or shutdown in progress”.
  50. in src/httpserver.cpp:110 in 4e353cb618
    106@@ -107,7 +107,7 @@ class WorkQueue
    107                 WAIT_LOCK(cs, lock);
    108                 while (running && queue.empty())
    109                     cond.wait(lock);
    110-                if (!running)
    111+                if (!running && queue.empty())
    


    LarryRuane commented at 8:32 pm on December 14, 2020:

    Good change! This ensures that the queue is empty before exiting the thread, so there are no leftover items in the work queue when the destructor runs; without this, I got the following to appear in debug.log during testing:

    02020-12-14T19:11:40Z ~HTTPRequest: Unhandled request
    

    promag commented at 11:21 pm on December 14, 2020:
    Right, I would have to test to make sure but I think without this the client wouldn’t get a response. I also think that very last response would now have connection: close header, see HTTPRequest::WriteReply - another case I’d like to remove the call to ShutdownRequested.

    LarryRuane commented at 1:05 am on December 15, 2020:

    Yes, I tested this, and without the changes to Enqueue and Run, bitcoin-cli prints the following:

    0error: Could not connect to the server 127.0.0.1:8332 (error code 1 - "EOF reached")
    1
    2Make sure the bitcoind server is running and that you are connecting to the correct RPC port.
    

    (I think this is the client not getting a response.)

    With your changes to Enqueue and Run:

    0error: couldn't parse reply from server
    

    This is a slight improvement, I guess. This error seems familiar… If I cherry-pick the 2-line change from #18335 then we get:

    0error: Server response: Work queue depth exceeded
    

    It’s the same wrong error message that I mentioned for the LogPrintf :laughing:. I guess that PR should change to say something like work queue depth exceeded or client or shutdown in progress, as you suggested. (Nothing for this needed by your PR here.)

  51. in src/httpserver.cpp:481 in 4e353cb618
    474@@ -477,6 +475,10 @@ void StopHTTPServer()
    475         event_base_free(eventBase);
    476         eventBase = nullptr;
    477     }
    478+    if (workQueue) {
    479+        delete workQueue;
    480+        workQueue = nullptr;
    481+    }
    


    LarryRuane commented at 8:44 pm on December 14, 2020:

    Agree that this is the correct fix, because this now occurs after the call to g_thread_http.join() (just above, line 468). This join ensures that no new items can be enqueued.

    Suggestion, you may want to keep all of the workQueue shutdown code together and move it to just after the call to g_thread_http.join(), that is, after line 469 (with the current line numbering, in other words just before if (eventHTTP)), instead of having it split between two places (beginning at line 453 above, and here). I don’t think there’s any reason those worker threads can’t live a little longer (and there’s no harm that I can think of). This would follow the pattern of tearing down the higher-level stuff before the lower-level stuff (not doing that is the source of this bug in the first place!).


    promag commented at 11:32 pm on December 14, 2020:
    It can’t be 😄 this is the goal of this PR - destroy the work queue after event base finish. After g_thread_http.join() there can be stuff pending on the event base, and if you delete the workQueue before the event base finishes then all pending workItems will be destroyed (because they are unique_ptr) and then it’s only problems from here on.

    LarryRuane commented at 1:03 am on December 15, 2020:

    Ah, okay, I didn’t understand about the event base; so maybe we could do all the shutdown (deconstruct) of the workQueue down here, where you now have the delete. Would this work?

     0void StopHTTPServer()
     1{
     2    LogPrint(BCLog::HTTP, "Stopping HTTP server\n");
     3    // Unlisten sockets, these are what make the event loop running, which means
     4    // that after this and all connections are closed the event loop will quit.
     5    for (evhttp_bound_socket *socket : boundSockets) {
     6        evhttp_del_accept_socket(eventHTTP, socket);
     7    }
     8    boundSockets.clear();
     9    if (eventBase) {
    10        LogPrint(BCLog::HTTP, "Waiting for HTTP event thread to exit\n");
    11        if (g_thread_http.joinable()) g_thread_http.join();
    12    }
    13    if (eventHTTP) {
    14        evhttp_free(eventHTTP);
    15        eventHTTP = nullptr;
    16    }
    17    if (eventBase) {
    18        event_base_free(eventBase);
    19        eventBase = nullptr;
    20    }
    21    if (workQueue) {
    22        LogPrint(BCLog::HTTP, "Waiting for HTTP worker threads to exit\n");
    23        for (auto& thread: g_thread_http_workers) {
    24            thread.join();
    25        }
    26        g_thread_http_workers.clear();
    27        delete workQueue;
    28        workQueue = nullptr;
    29    }
    30    LogPrint(BCLog::HTTP, "Stopped HTTP server\n");
    31}
    

    (That seems slightly simpler and clearer to me, but the way you have it now is good.) If you make changes, I don’t mind taking another look (and testing) and quickly giving another ack.

  52. LarryRuane commented at 9:01 pm on December 14, 2020: contributor

    I’m able to reproduce the problem consistently by adding the artificial delay suggested by @MarcoFalke and then starting a constant stream of RPC requests (of any kind):

    0$ while src/bitcoin-cli listbanned ; do :; done
    

    and then stopping bitcoind with either control-c or the stop RPC.

    0bitcoind: httpserver.cpp:271: void http_request_cb(evhttp_request*, void*): Assertion `workQueue' failed.
    1Aborted (core dumped)
    

    I verified that with this PR, this doesn’t happen. Nice PR! My suggestions are optional, although guarding the LogPrintf is probably worth doing.

    ACK 4e353cb618745cdb5d98e58e7dcd400ded01299a

  53. promag commented at 11:35 pm on December 14, 2020: member
    Thanks for reviewing @LarryRuane! I’ll hold on your suggestion to keep your and @fjahr ACK. I also think the error and log messages can be improved in a follow to avoid dragging this. In any case I hope my above comments are clear to you, otherwise let me know. @hebasto mind to test again? It would be great to push this early on the release cycle.
  54. MarcoFalke removed the label Needs backport (0.19) on Dec 16, 2020
  55. MarcoFalke removed the label Needs backport (0.20) on Dec 16, 2020
  56. hebasto approved
  57. hebasto commented at 7:03 pm on May 8, 2021: member

    ACK 4e353cb618745cdb5d98e58e7dcd400ded01299a, tested (rebased on top of master 9313c4e6aa4b707c06a86b33d5d2753cd8383340) on Linux Mint 20.1 (x86_64) using MarcoFalke’s patch, including different -rpcthreads/-rpcworkqueue cases. The bug is fixed. The code is correct.

    Agree, that the message “error: Server response: Work queue depth exceeded” should be reworded as it covers two different cases now.

    Edit (laanwj): removed the @ in the ACK line

  58. MarcoFalke referenced this in commit d2f6d2976f on May 11, 2021
  59. sidhujag referenced this in commit 22f5f857e1 on May 11, 2021
  60. achow101 commented at 6:58 pm on June 14, 2021: member

    ACK 4e353cb618745cdb5d98e58e7dcd400ded01299a

    Reviewed the code and it makes sense. However I was unable to reproduce the race condition so I cannot test this.

  61. LarryRuane commented at 0:09 am on June 15, 2021: contributor
    (re-)ACK 4e353cb618745cdb5d98e58e7dcd400ded01299a Tested again rebased on master d75a1df6178aabf6f19271786e48bc5d57eddb54, Ubuntu 20.04, was able to reproduce the bug using the one-line patch Marco suggested, and the PR still fixes it.
  62. LarryRuane approved
  63. laanwj merged this on Jun 21, 2021
  64. laanwj closed this on Jun 21, 2021

  65. sidhujag referenced this in commit 65b2508920 on Jun 24, 2021
  66. PastaPastaPasta referenced this in commit d7116b63db on Jun 27, 2021
  67. PastaPastaPasta referenced this in commit d98ea7d83e on Jun 27, 2021
  68. PastaPastaPasta referenced this in commit 71c7936b3d on Jun 28, 2021
  69. PastaPastaPasta referenced this in commit c9f495cf28 on Jun 28, 2021
  70. PastaPastaPasta referenced this in commit a90486e84b on Jun 29, 2021
  71. PastaPastaPasta referenced this in commit 35de8157e0 on Jun 29, 2021
  72. PastaPastaPasta referenced this in commit 7e416d4b54 on Jul 1, 2021
  73. PastaPastaPasta referenced this in commit c5b5037146 on Jul 1, 2021
  74. PastaPastaPasta referenced this in commit b310c38362 on Jul 1, 2021
  75. PastaPastaPasta referenced this in commit b0f61bbd01 on Jul 1, 2021
  76. PastaPastaPasta referenced this in commit 38b3d18083 on Jul 15, 2021
  77. PastaPastaPasta referenced this in commit d3f9a66580 on Jul 15, 2021
  78. gwillen referenced this in commit c0a9e5c872 on Jun 1, 2022
  79. DrahtBot locked this on Aug 18, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me