bitcoin core crashes when too many rpc calls are made #11368

issue KanoczTomas opened this issue on September 19, 2017
  1. KanoczTomas commented at 10:19 am on September 19, 2017: none

    Describe the issue

    bitcoin-core crashes after too many RPC calls are made.

    Can you reliably reproduce the issue?

    If so, please list the steps to reproduce below:

    1. Listen for the zeromq ‘rawtx’ event.
    2. Make an RPC request (getrawtransaction) for every transaction’s inputs.
    3. Wait a while.
    4. When a new block arrives, bitcoin-core crashes with a database I/O failure.

    Increasing the rpcworkqueue in the config does not solve the issue, only delays the crash. When bitcoin-core’s nofiles limit was raised to 8192 (with prlimit) it seemed stable, but maybe crashing is just a function of time.

    Expected behaviour

    I would expect to get a queue-full error (which I do), but that core would not crash; my requests would just time out.

    Actual behaviour

    core crashes with a database I/O error when a new block arrives. When checking fd’s for bitcoin, the count is normally around 300; after a while it spikes to more than 1000 and bitcoind crashes. If the RPC gets queried while bitcoind is starting and not yet in sync, or while a new block arrives, it crashes almost immediately.

    What version of bitcoin-core are you using?

    happened on 0.14.2, 0.15.0rc3, 0.15.0, 0.15.0.1 - did not test others

    Machine specs:

    Tried on 2 computers, a laptop and an armhf-based Odroid XU4, both running Linux. Happened on both.

  2. meshcollider commented at 10:22 am on September 19, 2017: contributor
    Duplicate of #11322?
  3. KanoczTomas commented at 10:23 am on September 19, 2017: none
    Not sure, these are valid queries … not invalid ones. RPC queue gets filled up. Getting responses back …
  4. KanoczTomas commented at 10:34 am on September 19, 2017: none
    And I forgot: tested the REST interface as well, managed to kill it with the same procedure … calling GET /rest/tx/TX-HASH.json instead of getrawtransaction … maybe it helps …
  5. fanquake added the label RPC/REST/ZMQ on Sep 20, 2017
  6. KanoczTomas commented at 6:12 am on September 27, 2017: none

    Hello,

    Here is a small script to reproduce it in Node.js.

      'use strict';
      var request = require('request-promise');
      var zmq = require('zeromq');
      var sock = zmq.socket('sub');
      sock.connect('tcp://127.0.0.1:28332');
      sock.subscribe('hashtx');

      sock.on('message', function(topic, message) {
          var uri = 'http://127.0.0.1:8332/rest/tx/' + message.toString('hex') + '.json';
          request(uri)
          .then(function (res){
              console.log(res);
          })
          .error(function (err){
              console.error(err);
              process.exit(1);
          });
      });

    bitcoind config:

      prune=550
      walletbroadcast=0
      daemon=1
      server=1
      rest=1
      zmqpubhashtx=tcp://127.0.0.1:28332
      rpcworkqueue=100

    It takes some time to kill bitcoind, and the time varies; I could not figure out what influences the variance.

    bitcoind crashes and all I see in the logs is this:

      <lot of similar output>
      2017-09-26 09:55:42 WARNING: request rejected because http work queue depth exceeded, it can be increased with the -rpcworkqueue= setting
      2017-09-26 09:55:44 Error reading from database: Database I/O error
  7. dcousens commented at 9:06 am on November 1, 2017: contributor
    Your reproduction is using the /rest/tx/..., not RPC?
  8. KanoczTomas commented at 12:14 pm on November 2, 2017: none

    Hi,

    calling the RPC directly behaves the same: same error, same behavior as in the example log output above. When talking to @gmaxwell on #bitcoin I recall he mentioned they use the same internals in the background.

    Good catch though - I was testing whether the REST API behaves the same and copied that here, not really thinking it should have been RPC there to make the point - my bad!

    Working with a queue so as not to overwhelm bitcoind helps - it does not kill it - but I think it should not be so easy to crash! The crash happens only if RPC/REST calls are not policed/shaped …
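
    A minimal illustration of such client-side shaping (my sketch, not code from this thread; MAX_INFLIGHT is an assumed value, to be tuned below -rpcworkqueue):

      'use strict';
      // Sketch: allow at most MAX_INFLIGHT concurrent requests to bitcoind
      // and queue the rest, so the RPC work queue is never outrun.
      var MAX_INFLIGHT = 8; // assumption: keep well below -rpcworkqueue
      var inflight = 0;
      var pending = [];

      function schedule(task) { // task: a function returning a promise
          return new Promise(function (resolve, reject) {
              pending.push({ task: task, resolve: resolve, reject: reject });
              drain();
          });
      }

      function drain() {
          while (inflight < MAX_INFLIGHT && pending.length > 0) {
              var job = pending.shift();
              inflight++;
              Promise.resolve()
                  .then(job.task)                  // run the request
                  .then(job.resolve, job.reject)   // hand the result back
                  .then(function () { inflight--; drain(); }); // free the slot
          }
      }

      // usage with the reproduction script above:
      // schedule(function () { return request(uri); }).then(console.log, console.error);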

  9. TheBlueMatt commented at 7:29 pm on November 2, 2017: contributor
    If possible, can you reproduce the issue with -debug=leveldb or after calling 'logging ["leveldb"]'?
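
    As a hedged aside (not from this thread): with the promisified node-bitcoin-rpc client used in the next comment's script, enabling that category could look like the following; the logging RPC takes an array of categories to include:

      // assumes bitcoinRPC has been init()ed and promisified with bluebird,
      // as in the test.js script below
      bitcoinRPC.callAsync('logging', [['leveldb']])
      .then(function (res) { console.log(res); });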
  10. KanoczTomas commented at 11:34 pm on November 2, 2017: none

    Now I used RPC to show it behaves the same. The log is full of repeating messages, so I am pasting only the relevant ones; I can provide the full log, but it is 70k+ lines …

    Code to reproduce: test.js

      'use strict';
      var zmq = require('zeromq');
      var sock = zmq.socket('sub');
      sock.connect('tcp://127.0.0.1:28332');
      sock.subscribe('hashtx');
      var bitcoinRPC = require('node-bitcoin-rpc');
      var Promise = require('bluebird');
      Promise.promisifyAll(bitcoinRPC);

      bitcoinRPC.init('localhost', 8332, 'secret_rpc_username', 'even_more_secret_rpc_password');

      sock.on('message', function(topic, message) {
          console.log('received message ' + message.toString('hex'));
          for (var i = 0; i < 10; i++) { // making sure we make enough calls to stress bitcoind
              bitcoinRPC.callAsync('getrawtransaction', [message.toString('hex')])
              .then(function (res){
                  console.log(res);
              })
              .error(function (err){
                  console.error(err);
              });
          }
      });

    Used forever (available on npm) to restart the js script, as the RPC lib has an annoying bug and throws when bitcoin replies with “work queue full” - too lazy to work around it. Invoking the script:

      forever test.js

    debug.log output:

      2017-11-02 23:07:45 ThreadRPCServer method=getrawtransaction
      2017-11-02 23:07:45 ThreadRPCServer method=getrawtransaction
      2017-11-02 23:07:45 Generated table #958831@3: 39102 keys, 2174699 bytes
      2017-11-02 23:07:46 Generated table #958832@3: 6726 keys, 382075 bytes
      2017-11-02 23:07:46 Compacted 1@3 + 11@4 files => 22123265 bytes
      2017-11-02 23:07:46 compacted to: files[ 0 0 1 369 1119 0 0 ]
      2017-11-02 23:07:46 Delete type=2 #956892
      2017-11-02 23:07:46 Delete type=2 #956893
      2017-11-02 23:07:46 Delete type=2 #956894
      2017-11-02 23:07:46 Delete type=2 #956895
      2017-11-02 23:07:46 Delete type=2 #956896
      2017-11-02 23:07:46 Delete type=2 #957071
      2017-11-02 23:07:46 Delete type=2 #957072
      2017-11-02 23:07:46 Delete type=2 #957069
      2017-11-02 23:07:46 Delete type=2 #956891
      2017-11-02 23:07:46 Delete type=2 #957070
      2017-11-02 23:07:46 Delete type=2 #957387
      2017-11-02 23:07:46 Delete type=2 #957068
      2017-11-02 23:07:46 ThreadRPCServer method=getrawtransaction
      2017-11-02 23:07:46 ThreadRPCServer method=getrawtransaction
      --- same message repeated ---
      2017-11-02 23:07:49 WARNING: request rejected because http work queue depth exceeded, it can be increased with the -rpcworkqueue= setting
      2017-11-02 23:07:49 WARNING: request rejected because http work queue depth exceeded, it can be increased with the -rpcworkqueue= setting
      2017-11-02 23:07:49 ThreadRPCServer method=getrawtransaction
      --- same message repeated ---
      --- this part is where bitcoin-qt crashed ---
      2017-11-02 23:07:54 ThreadRPCServer method=getrawtransaction
      2017-11-02 23:07:54 ThreadRPCServer method=getrawtransaction
      2017-11-02 23:07:54 ThreadRPCServer method=getrawtransaction
      2017-11-02 23:07:54 ThreadRPCServer method=getrawtransaction
      2017-11-02 23:07:54 ThreadRPCServer method=getrawtransaction
      2017-11-02 23:07:54 ThreadRPCServer method=getrawtransaction
      2017-11-02 23:07:54 ThreadRPCServer method=getrawtransaction
      2017-11-02 23:07:54 ThreadRPCServer method=getrawtransaction
      2017-11-02 23:07:54 ThreadRPCServer method=getrawtransaction
      2017-11-02 23:07:54 ThreadRPCServer method=getrawtransaction
      2017-11-02 23:07:54 ThreadRPCServer method=getrawtransaction
      2017-11-02 23:07:54 ThreadRPCServer method=getrawtransaction
      2017-11-02 23:07:54 libevent: Error from accept() call: Too many open files
      2017-11-02 23:07:54 libevent: Error from accept() call: Too many open files
      2017-11-02 23:07:54 libevent: Error from accept() call: Too many open files
      2017-11-02 23:07:54 libevent: Error from accept() call: Too many open files
      2017-11-02 23:07:54 libevent: Error from accept() call: Too many open files
      2017-11-02 23:07:54 libevent: Error from accept() call: Too many open files
      2017-11-02 23:07:54 libevent: Error from accept() call: Too many open files
      2017-11-02 23:07:54 libevent: Error from accept() call: Too many open files
      2017-11-02 23:07:54 libevent: Error from accept() call: Too many open files
      2017-11-02 23:07:54 libevent: Error from accept() call: Too many open files
      --- same message repeated (around 1k lines) ---
      2017-11-02 23:07:54 LevelDB read failure: IO error: /media/sentinel/B2783BC6783B87DD/Users/Sentinel/AppData/Roaming/Bitcoin/chainstate/957866.ldb: Too many open files
      2017-11-02 23:07:54 libevent: Error from accept() call: Too many open files
      2017-11-02 23:07:54 libevent: Error from accept() call: Too many open files
      --- the above repeated 70 000 times ---

    If you need more debug flags turned on, please let me know!

  11. theuni commented at 2:16 am on November 3, 2017: member

    Could you please give this a shot: https://github.com/theuni/bitcoin/commit/c22ab2282bc68ab71ed052e9e6fe7e956b490e11 ? It’s a wild stab in the dark.

    It’s built on latest master, but you can just build from my “fix-rpc-error” branch.

  12. KanoczTomas commented at 4:58 pm on November 4, 2017: none

    Hello,

    I have built bitcoin-qt from your fix-rpc-error branch.

    I am testing now on a laptop with an SSD disk, and it takes roughly 14 minutes to see the error. The same error, same behavior. The previous test was done on a laptop with spinning disks; it took 5-8 minutes to crash bitcoin-qt. I also ran (after a restart, to make sure all was clear) v0.15.0.1 and it took 14 minutes to bring down as well.

    I created a 4G tmpfs directory (in RAM) and copied the data dir contents there (started bitcoin-qt with -datadir=). As the node is pruned it fit fine. Then I ran the test script again: I could not crash your rpc-fix version (gave up after running it for 1 hour), while v0.15.0.1 crashed after 33 minutes.

    I will have access to the spinning-disk laptop later; I will run it there as well and update the comment.

  13. TheBlueMatt commented at 5:54 pm on November 4, 2017: contributor
    @theuni I suppose a better fix is to drop incoming RPC sockets on the floor before we read any data from them if we’re way over some upper water mark? @KanoczTomas as a workaround, you can change the open files limit using ulimit, which may solve your issue.
  14. KanoczTomas commented at 9:53 pm on November 4, 2017: none

    @TheBlueMatt I tried that at first; raising it to e.g. 8k helped, but only changed the frequency of the crash. With 8k it crashed every few days, with the same error. So it seems that whatever high number I give it, it always fills up eventually: since I fetch the inputs of every new mempool TX received through zmq, my app stresses bitcoind a lot. The problem is that sometimes it corrupts the database and I have to reindex, which takes days on my node (odroid).

    When I rate limit the RPC queries on the client side, all is fine.

  15. TheBlueMatt commented at 10:24 pm on November 4, 2017: contributor
    Yea, sorry, I realized after posting that that won’t help - you’re outrunning the RPC server - there’s nothing we can do to really fix it, though it’s true bitcoind shouldn’t crash: it should instead just cleanly not respond to the requests that come in when there are no queue slots left, instead of trying to return errors.
  16. theuni commented at 7:50 pm on November 5, 2017: member

    @TheBlueMatt To clarify, the current code uses a deferred response to try to return the error, which I think may be creating a feedback loop: connections may be accepted faster than we can close them due to a full queue. The patch makes rejection immediate, which reverses that, and ensures that no more than 1 superfluous connection is ever accepted.

    We could just drop the connection on the floor, but I think that the above fix addresses the issue. I’m not convinced that the reply overhead is relevant once that is fixed. @laanwj Mind having a look?

  17. TheBlueMatt commented at 8:00 pm on November 5, 2017: contributor

    Yea, agreed, if we can return an error that’d be nice, but it’s not gonna help the user all that much, and if the user is pegging the interface faster than we can respond to, just dropping it on the floor works just as well.

  18. laanwj commented at 8:41 pm on November 5, 2017: member

    @laanwj Mind having a look?

    Can we make WriteReplyImmediate private or protected so that it is not part of the public interface? It should never be called from one of the worker threads. If not, this restriction should at least be documented very well.

    Otherwise: Looks good to me. I hope this works. I prefer returning an error to just dropping the connection; it is incredibly frustrating for things to fail without an error message. Though arguably we do at least (unconditionally) log an error to the debug log.

  19. theuni commented at 9:25 pm on November 5, 2017: member
    @laanwj Sure. WriteReplyImmediate was just a quick hack to test, but looking now, I don’t really see another straightforward way without doing a bunch of refactors first. So I guess I’ll just make it private and document it as you suggested. @KanoczTomas Can you confirm that this has indeed fixed your issue on all tested configs?
  20. TheBlueMatt commented at 9:46 pm on November 5, 2017: contributor

    @theuni he already confirmed that the branch did not fix his issue. #11368 (comment)

  21. theuni commented at 10:11 pm on November 5, 2017: member
    @TheBlueMatt Thanks, I misread the first paragraph there. So this only fixes on tmpfs.
  22. KanoczTomas commented at 5:05 pm on November 14, 2017: none
    Hi, yes @theuni is correct. It “fixed” it only on tmpfs … the quotes are there because I ran it only for an hour, so I am not sure it is an absolute win.
  23. TheBlueMatt commented at 5:57 pm on November 14, 2017: contributor
    Ultimately we’re going to need to ensure that a newly accept()ed socket is always close()d by the time we accept() again; otherwise we’re always going to be screwed if a client makes enough threads just hammering the RPC.
  24. vii commented at 5:28 am on November 29, 2017: none
    Please try #11785 - it should fix the issue!
  25. vii referenced this in commit 13885abe11 on Dec 1, 2017
  26. KanoczTomas commented at 2:28 pm on December 5, 2017: none

    @vii I have tested your fix, it looks fine. Was running it for: real 121m26.863s without any issues!

    Here is the debug.log during the test. What worries me is the progress being more than 1.0. But that started yesterday, before I built your commit, so it is not related - just not sure if it is normal behavior :)

    The log during tests: (no errors, hurrah!)

      2017-12-05 11:57:34 receive version message: /Satoshi:0.15.0.1/: version 70015, blocks=497720, us=138.68.64.155:8333, peer=5
      2017-12-05 11:57:51 receive version message: /Satoshi:0.14.2/: version 70015, blocks=497720, us=217.12.53.60:38324, peer=6
      2017-12-05 11:58:10 receive version message: /Satoshi:0.15.0.1/: version 70015, blocks=497720, us=104.199.165.17:8333, peer=7
      2017-12-05 11:58:11 receive version message: /Satoshi:0.13.1/: version 70014, blocks=497720, us=217.12.53.60:37284, peer=8
      2017-12-05 11:58:29 receive version message: /Satoshi:0.15.1(NO2X)/: version 70015, blocks=497720, us=217.12.53.60:52946, peer=9
      2017-12-05 11:59:53 UpdateTip: new best=0000000000000000005289baf4152085e7aaf5158c1c0d090ef58a25c91c2265 height=497721 version=0x20000000 log2_work=87.591937 tx=278346929 date='2017-12-05 12:14:57' progress=1.000010 cache=11.1MiB(82777txo)
      2017-12-05 12:01:37 connect() to [2a02:8108:11c0:7270:34ed:d2ef:3a5d:dca7]:8333 failed: Network is unreachable (101)
      2017-12-05 12:02:52 connect() to [2001:0:9d38:90d7:1071:d529:e048:2bc0]:8333 failed: Network is unreachable (101)
      2017-12-05 12:05:02 Imported mempool transactions from disk: 21487 succeeded, 130 failed, 0 expired, 55 already there
      2017-12-05 12:06:47 UpdateTip: new best=000000000000000000c6ebb23fddf31a6129f368deaf30d19a9570234f2a4e5f height=497722 version=0x20000000 log2_work=87.591972 tx=278349117 date='2017-12-05 12:22:34' progress=1.000011 cache=31.7MiB(242873txo)
      2017-12-05 12:08:36 connect() to [2a01:4f8:140:931a::2]:8333 failed: Network is unreachable (101)
      2017-12-05 12:09:49 connect() to [2a01:4f8:173:294f::2]:8333 failed: Network is unreachable (101)
      2017-12-05 12:12:50 connect() to [2001:0:9d38:78cf:2069:299:a441:36ed]:8333 failed: Network is unreachable (101)
      2017-12-05 12:13:13 UpdateTip: new best=000000000000000000480820a49b2c969619a9119191f96de8018938d0345261 height=497723 version=0x20000000 log2_work=87.592008 tx=278351647 date='2017-12-05 12:28:55' progress=1.000010 cache=32.8MiB(251542txo)
      2017-12-05 12:15:15 connect() to [2620:9b::1967:12c9]:8333 failed: Network is unreachable (101)
      2017-12-05 12:16:25 UpdateTip: new best=000000000000000000bc26425859226751047f56d53d94281809a4783424b028 height=497724 version=0x20000000 log2_work=87.592044 tx=278354316 date='2017-12-05 12:32:14' progress=1.000011 cache=35.5MiB(257787txo)
      2017-12-05 12:23:49 connect() to [2001:0:4137:9e76:1014:4e4e:7ef0:bf01]:8333 failed: Network is unreachable (101)
      2017-12-05 12:24:32 connect() to [2605:a601:cd1:d900:1012:a6ac:cfaf:f62e]:8333 failed: Network is unreachable (101)
      2017-12-05 12:26:31 UpdateTip: new best=0000000000000000009b50d139898b9303c5113d89b50698cdc3706ae48fd5cb height=497725 version=0x20000000 log2_work=87.59208 tx=278356212 date='2017-12-05 12:43:13' progress=1.000011 cache=36.8MiB(268143txo)
      2017-12-05 12:37:52 Pre-allocating up to position 0x900000 in rev01088.dat
      2017-12-05 12:37:52 UpdateTip: new best=0000000000000000005c5fbfea5340e000151517a2140a7199f1cb9a7ef2d53c height=497726 version=0x20000000 log2_work=87.592116 tx=278358734 date='2017-12-05 12:54:50' progress=1.000011 cache=38.1MiB(278668txo)
      2017-12-05 12:38:03 UpdateTip: new best=00000000000000000096852e0f389f0cc490e6984005c88798703dbdb911702e height=497727 version=0x20000000 log2_work=87.592151 tx=278361351 date='2017-12-05 12:54:56' progress=1.000011 cache=38.6MiB(282932txo)
      2017-12-05 12:40:10 UpdateTip: new best=0000000000000000004656f5fe04884b8ab29ccbe0213952a8f0df1e58199dbb height=497728 version=0x20000000 log2_work=87.592187 tx=278363885 date='2017-12-05 12:57:22' progress=1.000011 cache=39.1MiB(287299txo)
      2017-12-05 12:44:13 connect() to 77.178.64.183:8333 failed after select(): No route to host (113)
      2017-12-05 12:44:13 receive version message: /Satoshi:0.15.1/: version 70015, blocks=497728, us=217.12.53.60:33654, peer=11
      2017-12-05 12:45:53 receive version message: /Satoshi:0.14.0/: version 70015, blocks=497728, us=217.12.53.60:44008, peer=12
      2017-12-05 12:47:40 connect() to 80.91.175.194:8333 failed after select(): Connection refused (111)
      2017-12-05 12:50:42 UpdateTip: new best=00000000000000000000954afb20e8f663294194833bf19a435295aa3e7e46ae height=497729 version=0x20000000 log2_work=87.592223 tx=278365578 date='2017-12-05 13:08:08' progress=1.000012 cache=40.2MiB(295693txo)
      2017-12-05 12:50:49 connect() to [2001:984:4574:1:544c:75b0:750e:2794]:8333 failed: Network is unreachable (101)
      2017-12-05 12:56:42 connect() to 5.147.57.218:8333 failed after select(): No route to host (113)
      2017-12-05 12:58:30 connect() to [2001:0:5ef5:79fb:3c40:2ad8:971a:f334]:8333 failed: Network is unreachable (101)
      2017-12-05 12:58:31 connect() to [2001:0:9d38:6abd:4ae:841:a6e3:37a5]:8333 failed: Network is unreachable (101)
      2017-12-05 12:58:49 Pre-allocating up to position 0x5000000 in blk01088.dat
      2017-12-05 12:58:49 UpdateTip: new best=0000000000000000002cbc5bf3afa5402035f71d6b6f96baedf675bce1faca9b height=497730 version=0x20000000 log2_work=87.592259 tx=278367543 date='2017-12-05 13:16:16' progress=1.000012 cache=41.4MiB(305757txo)
      2017-12-05 12:59:21 UpdateTip: new best=000000000000000000a7cdb717117dfeac04d261f702d4f7b14cd44d5e76cee8 height=497731 version=0x20000000 log2_work=87.592294 tx=278368396 date='2017-12-05 13:16:44' progress=1.000012 cache=41.6MiB(307701txo)
      2017-12-05 12:59:59 UpdateTip: new best=0000000000000000009b7fe7cd542196ec9694b74f222de67c2b4a8e14b8d1d4 height=497732 version=0x20000000 log2_work=87.59233 tx=278371111 date='2017-12-05 13:17:36' progress=1.000012 cache=42.1MiB(311735txo)
      2017-12-05 13:01:46 connect() to [2001:0:4137:9e76:1c4c:5ed:b6f1:2d6]:8333 failed: Network is unreachable (101)
      2017-12-05 13:02:54 Pre-allocating up to position 0xa00000 in rev01088.dat
      2017-12-05 13:02:55 UpdateTip: new best=0000000000000000009a0a6f9e3c59ebed0c1d847cf597d9b1b5fb8365380c30 height=497733 version=0x20000000 log2_work=87.592366 tx=278374157 date='2017-12-05 13:20:42' progress=1.000012 cache=42.6MiB(315853txo)
      2017-12-05 13:10:06 connect() to [2001:0:9d38:6abd:307b:dac:a6ec:8cc4]:8333 failed: Network is unreachable (101)
      2017-12-05 13:11:02 UpdateTip: new best=00000000000000000076eaa0f154c37f5e60bf1c18b86775d0fbbc32d0169c2c height=497734 version=0x20000000 log2_work=87.592402 tx=278376431 date='2017-12-05 13:28:24' progress=1.000012 cache=43.5MiB(323295txo)
      2017-12-05 13:11:36 connect() to 104.238.169.108:8333 failed after select(): Connection refused (111)
      2017-12-05 13:24:08 UpdateTip: new best=0000000000000000009fbd3b262b06c09654dca9abe7f7c4206ced4390af9fb2 height=497735 version=0x20000000 log2_work=87.592438 tx=278378513 date='2017-12-05 13:42:11' progress=1.000012 cache=44.8MiB(333595txo)
      2017-12-05 13:25:54 UpdateTip: new best=0000000000000000009610abe5298400fa918110e4429eab304b801c6cff5832 height=497736 version=0x20000000 log2_work=87.592473 tx=278381069 date='2017-12-05 13:44:13' progress=1.000012 cache=45.4MiB(338628txo)
      2017-12-05 13:31:54 UpdateTip: new best=00000000000000000028a03ddbebcfe19e933fdde654af13531956853bd02dff height=497737 version=0x20000000 log2_work=87.592509 tx=278383136 date='2017-12-05 13:50:12' progress=1.000012 cache=46.1MiB(344589txo)
      2017-12-05 13:33:16 connect() to [2a00:f940:2:1:2::184b]:8333 failed: Network is unreachable (101)
      2017-12-05 13:40:04 UpdateTip: new best=0000000000000000005ca38745749278f9ca3b78c86dccb2c6556126640d1686 height=497738 version=0x20000000 log2_work=87.592545 tx=278385594 date='2017-12-05 13:59:02' progress=1.000013 cache=47.0MiB(351693txo)
      2017-12-05 13:42:59 connect() to [2601:589:3:d670:113:1aa5:dacc:4b]:8333 failed: Network is unreachable (101)
      2017-12-05 13:46:32 connect() to 37.186.122.122:8333 failed after select(): Connection refused (111)
      2017-12-05 13:48:16 receive version message: /Satoshi:0.15.0.1(NO2X)/: version 70015, blocks=497738, us=[::]:0, peer=13
      2017-12-05 13:52:06 connect() to [2001:0:9d38:6ab8:24ed:fc5:c26b:c68]:8333 failed: Network is unreachable (101)
      2017-12-05 13:53:01 UpdateTip: new best=0000000000000000006f32c61e1d974d795fa83ee26d0a9e69723cc528bcd684 height=497739 version=0x20000000 log2_work=87.592581 tx=278387819 date='2017-12-05 14:12:16' progress=1.000013 cache=48.2MiB(361336txo)
      2017-12-05 13:54:55 connect() to [2001:0:9d38:6abd:20c6:30f1:68c9:574e]:8333 failed: Network is unreachable (101)
      2017-12-05 13:55:52 connect() to 185.202.213.33:8333 failed after select(): Network is unreachable (101)
      2017-12-05 14:01:32 UpdateTip: new best=000000000000000000800a32240c4f982aa7d8fa48649cd286664a67ac49ea22 height=497740 version=0x20000000 log2_work=87.592616 tx=278390250 date='2017-12-05 14:21:20' progress=1.000013 cache=49.1MiB(369122txo)
      2017-12-05 14:01:49 receive version message: /Satoshi:0.15.0/: version 70015, blocks=497740, us=217.12.53.60:49072, peer=14
  27. vii referenced this in commit 6e1a7e1859 on Dec 16, 2017
  28. vii referenced this in commit 1e1f699a00 on Dec 16, 2017
  29. vii referenced this in commit 40ea13636b on Dec 17, 2017
  30. azat referenced this in commit 727bcea130 on Dec 18, 2017
  31. maflcko commented at 11:11 pm on February 22, 2018: member
    A user reports datadir corruption due to this: #12483 (comment)
  32. frndxyz commented at 8:29 am on October 4, 2019: none

    Can someone point me to the patches for this fix? There are many proposed patches in the git issues; I applied them but it still doesn’t work.

    I tried running bitcore-node (bitpay insight). It starts OK, and after some time it crashes the bitcoin daemon without an error.

    I guess the issue is related to a request flood, but the patches don’t work for me. Or am I missing something?

    This is the log:

      2019-10-04T07:50:28Z UpdateTip: new best=b5b683030f0d2321b9308f04ab1fba0c1e89a5b00730fad9a79ab26243554dd1 height=10120 version=0x00000001 algo=1 (scrypt) log2_work=54.594743 tx=102375 date='2014-04-13T19:53:26Z' progress=0.014012 cache=16.5MiB(120983txo)
      2019-10-04T07:50:28Z UpdateTip: new best=3c1a87ac3a5de2a09094787758270f6dfbb9072f7f3a4e4b34169b7a76d1b5aa height=10121 version=0x00000001 algo=1 (scrypt) log2_work=54.594785 tx=102376 date='2014-04-13T19:53:48Z' progress=0.014012 cache=16.5MiB(120984txo)
      2019-10-04T07:50:28Z UpdateTip: new best=b18a91520515621dfaed75e91761b138686c650320619f058ecee9bbee57297d height=10122 version=0x00000001 algo=1 (scrypt) log2_work=54.594827 tx=102380 date='2014-04-13T19:56:12Z' progress=0.014013 cache=16.5MiB(120984txo)
      2019-10-04T07:50:28Z UpdateTip: new best=ebdf5e793563b73fe29331df88ede8abe03cfd26bf4bfc820b4efc3c6e1af09d height=10123 version=0x00000001 algo=1 (scrypt) log2_work=54.594869 tx=102381 date='2014-04-13T19:56:21Z' progress=0.014013 cache=16.5MiB(120985txo)
      2019-10-04T07:50:28Z UpdateTip: new best=3eed3a8224eed3289894b91167e7897f6e4757344a1896751511d85540b78f24 height=10124 version=0x00000001 algo=1 (scrypt) log2_work=54.594911 tx=102384 date='2014-04-13T19:57:25Z' progress=0.014013 cache=16.5MiB(120988txo)
      [2019-10-04T07:50:28.147Z] error: Failed to start services
      [2019-10-04T07:50:28.148Z] error: RPCError: Bitcoin JSON-RPC: Request Error: connect ECONNREFUSED 127.0.0.1:12341
          at Bitcoin._wrapRPCError (/home/ltex/insight/node_modules/bitcore-node/lib/services/bitcoind.js:449:13)
          at /home/ltex/insight/node_modules/bitcore-node/lib/services/bitcoind.js:782:28
          at ClientRequest.<anonymous> (/home/ltex/insight/node_modules/bitcoind-rpc/lib/index.js:116:7)
          at emitOne (events.js:115:13)
          at ClientRequest.emit (events.js:210:7)
          at Socket.socketErrorListener (_http_client.js:399:9)
          at emitOne (events.js:115:13)
          at Socket.emit (events.js:210:7)
          at emitErrorNT (internal/streams/destroy.js:62:8)
          at _combinedTickCallback (internal/process/next_tick.js:102:11)
      [2019-10-04T07:50:28.149Z] info: Beginning shutdown
      [2019-10-04T07:50:28.150Z] info: Stopping web (not started)
      [2019-10-04T07:50:28.151Z] info: Stopping bitcoind
      2019-10-04T07:50:28Z tor: Thread interrupt
      2019-10-04T07:50:28Z Shutdown: In progress…
      2019-10-04T07:50:28Z dandelion thread exit
      2019-10-04T07:50:28Z torcontrol thread exit
      2019-10-04T07:50:28Z addcon thread exit
      2019-10-04T07:50:28Z net thread exit
      2019-10-04T07:50:28Z msghand thread exit
      2019-10-04T07:50:32Z opencon thread exit
      2019-10-04T07:50:32Z scheduler thread interrupt
      2019-10-04T07:50:32Z Dumped mempool: 5e-06s to copy, 0.004893s to dump
      2019-10-04T07:50:32Z [default wallet] Releasing wallet
      2019-10-04T07:50:32Z Shutdown: done
      [2019-10-04T07:50:32.890Z] info: Halted

    After syncing some blocks it just crashes the daemon.

  33. lordfeo commented at 8:35 am on December 26, 2019: none
    @frndxyz I have the same problem. Did you find any solution?
  34. jdmadriz commented at 0:18 am on February 21, 2020: none
    Same problem here - my bitcoin core gets unresponsive because of too many RPC calls. Any solution?
  35. fjahr commented at 2:20 pm on May 22, 2020: contributor

    I have been investigating this issue and am trying to find a stable solution for it. First of all, I can reproduce the issue very quickly and reliably if I send a large number of requests to the REST server while the node is in IBD, since in that case it is constantly writing blocks, and failing to write a block is what ultimately makes the node crash. I have tried the patches that were previously suggested as fixes (https://github.com/bitcoin/bitcoin/pull/11785 and #12274), but those don’t solve the problem since they cannot prevent connections from being opened, nor close those connections fast enough. So compared to master I did not see any reliable improvement.

    I am happy to investigate other ideas but I currently don’t see a real fix other than waiting until libevent 2.2 is released which should include either https://github.com/libevent/libevent/pull/578 and/or https://github.com/libevent/libevent/pull/592 which can then be used to fix the issue. Unfortunately there does not seem to be a release schedule and I have not received an answer after I reached out on their IRC.

    #12274 does take one of the libevent improvements into account for when the new version is released, but I would rather wait until there is at least a release date or an alpha version available, also because the other improvement (592) may be a better solution than using 578.

    Additionally, I have opened #19050 to add a hint about the problem to the docs until we have the fix. My impression is that these issues are mostly caused by people spamming their own REST interface with their own scripts. If users know about the issue they should be able to work around it pretty easily and not be surprised by it.

  36. promag commented at 11:10 pm on May 22, 2020: member

    I am happy to investigate other ideas

    How about evhttp_del_accept_socket when there are N active connections and then evhttp_bind_socket_with_handle when X are free?

  37. fjahr commented at 8:07 pm on May 31, 2020: contributor

    How about evhttp_del_accept_socket when there are N active connections and then evhttp_bind_socket_with_handle when X are free?

    I just gave it a try, but if requests come in fast enough it seems there is nothing we can do (in http_request_cb) to stop the node from crashing when not enough file descriptors are available. Connections would have to be stopped before they can even open.

  38. promag commented at 10:15 am on June 4, 2020: member
    So far couldn’t reproduce with -txindex and calling back /rest/tx after zmq message while on IBD. FWIW I’m running on lldb.
  39. fjahr commented at 10:42 am on June 4, 2020: contributor

    So far couldn’t reproduce with -txindex and calling back /rest/tx after zmq message while on IBD. FWIW I’m running on lldb.

    I am using a node console with the request module on testnet. I haven’t tried with lldb yet, so I am not sure if that could make a difference.

      > const request = require('request');
      > var uri = 'http://127.0.0.1:18332/rest/block/000000000401f8016b9a5f54bc84499011ea484c88c82be555f7b240b8e2a4bf.json'
      > for (var i = 0; i < 800; i++) { request(uri) }
  40. maflcko referenced this in commit 8452f922d2 on Dec 17, 2020
  41. sidhujag referenced this in commit 9402a9a8c3 on Dec 17, 2020
  42. EthanHeilman commented at 4:59 am on December 28, 2020: contributor

    I ran into this issue today. If I create wallets faster than one every 10 seconds, after about 80 wallets I get a hard crash.

      2020-12-28T04:35:52Z DumpAnchors: Flush 2 outbound block-relay-only peer addresses to anchors.dat completed (0.08s)
      2020-12-28T04:35:52Z scheduler thread exit
      2020-12-28T04:35:52Z Writing 0 unbroadcast transactions to disk.
      2020-12-28T04:35:52Z Dumped mempool: 1.3e-05s to copy, 0.086836s to dump
      2020-12-28T04:35:52Z FlushStateToDisk: write coins cache to disk (638190 coins, 88337kB) started
      2020-12-28T04:35:54Z FlushStateToDisk: write coins cache to disk (638190 coins, 88337kB) completed (1.12s)
      libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: BerkeleyDatabase: Error 24, can't open database wallet.dat
      EXP: Terminating...
      Terminated

    Sometimes it seems to leave the wallet in an incomplete state where it writes one of the files in the wallet directory but not the other files. For instance it created this wallet directory without a wallet.dat or a db.log:

      e0@lira btcd % ls -l /tmp/btcd/wallet_lnttrkgima_84
      total 0
      drwx------  2 e0  wheel  64 Dec 27 23:58 database
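
    For context, a hedged sketch of such a wallet-creation loop (my illustration, not @EthanHeilman’s actual harness; host, port, credentials, and counts are assumptions), reusing the node-bitcoin-rpc client from the earlier scripts in this thread:

      // Hypothetical repro: create wallets much faster than one per 10 seconds.
      var bitcoinRPC = require('node-bitcoin-rpc');
      bitcoinRPC.init('localhost', 8332, 'rpc_user', 'rpc_password'); // assumed credentials

      var i = 0;
      (function next() {
          bitcoinRPC.call('createwallet', ['stress_wallet_' + i], function (err, res) {
              if (err) return console.error(err);
              console.log('created wallet', i, JSON.stringify(res));
              if (++i < 100) setTimeout(next, 1000); // ~1s per call, well under 10s
          });
      })();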
  43. bitcoin deleted a comment on Jun 22, 2021
  44. PastaPastaPasta referenced this in commit f1a25d1923 on Jun 27, 2021
  45. PastaPastaPasta referenced this in commit f021cebd95 on Jun 28, 2021
  46. PastaPastaPasta referenced this in commit c79214739d on Jun 29, 2021
  47. PastaPastaPasta referenced this in commit d4787c11e2 on Jul 1, 2021
  48. PastaPastaPasta referenced this in commit aad0f0d5b9 on Jul 1, 2021
  49. PastaPastaPasta referenced this in commit 3c0b82f6b4 on Jul 15, 2021
  50. PastaPastaPasta referenced this in commit 0e1bfa62d4 on Jul 15, 2021
  51. PastaPastaPasta referenced this in commit d40eedbf01 on Jul 16, 2021
  52. gabriel-bjg referenced this in commit 8a1d9f8c45 on Jul 16, 2021
  53. pinheadmz assigned willcl-ark on Apr 27, 2023
  54. pinheadmz assigned stickies-v on Apr 27, 2023
  55. willcl-ark commented at 10:55 am on May 2, 2023: contributor

    My impression is that mostly these issues are caused by people spamming their own rest interface with their own scripts. If users know about the issue they should be able to accommodate it pretty easily and not be surprised by the issue.

    Agreed.

    I am happy to investigate other ideas but I currently don’t see a real fix other than waiting until libevent 2.2 is released which should include either libevent/libevent#578 and/or libevent/libevent#592 which can then be used to fix the issue. Unfortunately there does not seem to be a release schedule and I have not received an answer after I reached out on their IRC.

    https://github.com/libevent/libevent/pull/592 has now been merged into libevent master and should provide by far the cleanest solution to our issue here. The last release was cut 5th July 2020, so perhaps a new one is due soon, although they do not seem to follow any particular schedule that I can see. I have likewise reached out to them (but on their matrix channel).

    I also set up a notification for new releases with a reminder to fix this issue when it lands.

  56. fjahr commented at 12:08 pm on May 2, 2023: contributor

    libevent/libevent#592 has now been merged into libevent master and should provide by far the cleanest solution to our issue here. The last release was cut 5th July 2020, so perhaps a new one is due soon, although they do not seem to follow any particular schedule that I can see. I have likewise reached out to them (but on their matrix channel).

    I also set up a notification for new releases with a reminder to fix this issue when it lands.

    Oh sweet summer child ;) I wouldn’t get my hopes up that there is a release soon; I have been asking about 2.2 for 3 years now (https://github.com/libevent/libevent/issues/1024), as have others (https://github.com/libevent/libevent/issues/1365), so it could be soon but I would put money on it not being this year.

    If you still want to bug them, this probably is the right place: https://github.com/libevent/libevent/issues/1094 (though I don’t know how active they are on matrix).

    Setting yourself a reminder so you don’t have to check in with them is probably the right strategy; I am also watching https://github.com/libevent/libevent/issues/1094, which should serve the same purpose.

  57. willcl-ark commented at 10:54 am on May 23, 2023: contributor
    2.2.1-alpha is now out 👀
  58. fjahr commented at 11:28 pm on May 23, 2023: contributor

    2.2.1-alpha is now out 👀

    I have opened a draft PR #27731 that uses the proposed solution with the max-connection feature that was added there. But currently I still have a hard time reproducing the issue in the same way I was able to back in 2020, described above: #11368 (comment). I have a much faster machine now, but there may also be other reasons.

    I also tried @EthanHeilman’s problem and that does allow me to crash the node. I hadn’t tried that one back in 2020, but looking at it now it doesn’t look like a problem that libevent can fix. From the logs it rather seems like SQLite isn’t closing files fast enough, or is opening too many at the same time, and then the node crashes when it tries to open another file that is unrelated to wallet stuff. I will open a separate issue for this.

  59. fjahr commented at 11:40 pm on May 23, 2023: contributor
    See #27732 for the wallet issue; I hope it makes sense to keep these things separate.
  60. willcl-ark commented at 7:19 pm on May 24, 2023: contributor

    I have opened a draft PR #27731 that uses the proposed solution with the max-connection feature that was added there. But currently I still have a hard time reproducing the issue in the same way I was able to back in 2020, described above: #11368 (comment). I have a much faster machine now, but there may also be other reasons.

    I also do not seem to be able to reproduce this on master with a similar python script.

      import zmq
      from bitcoinrpc.authproxy import AuthServiceProxy, JSONRPCException
      import concurrent.futures
      import os
      import random

      context = zmq.Context()
      socket = context.socket(zmq.SUB)
      socket.connect("tcp://127.0.0.1:28333")
      socket.setsockopt_string(zmq.SUBSCRIBE, "hashtx")

      rpc_user = os.getenv('RPC_USER')
      rpc_password = os.getenv('RPC_PASSWORD')
      rpc_port = 8332
      rpc_ip = 'localhost'

      rpc_connections = [AuthServiceProxy(f"http://{rpc_user}:{rpc_password}@{rpc_ip}:{rpc_port}") for _ in range(10)]

      def fetch_transaction(tx_hex, connection_index):
          try:
              return rpc_connections[connection_index].getrawtransaction(tx_hex)
          except JSONRPCException as e:
              return f"JSONRPCException: {e}"

      while True:
          try:
              topic, message, extra = socket.recv_multipart()
              print(f"{topic.decode('utf8'):}, {message.hex():}, {extra.hex():}")

              with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
                  futures = {executor.submit(fetch_transaction, message.hex(), i) for i in range(10)}
                  for future in concurrent.futures.as_completed(futures):
                      try:
                          print(future.result())
                      except Exception as e:
                          print(f"Futures Error: {e}")
          except zmq.ZMQError as e:
              print(f"ZMQ Error: {e}")

    Bumping the number of rpc calls per zmq notification to 100 didn’t appear to do much either. @KanoczTomas are you able to reproduce this on v24.1 or master?

  61. willcl-ark commented at 2:27 pm on April 10, 2024: contributor

    The problem is not easily reproducible.

    Please open a new issue (or leave a comment in here if you want this re-opened) if you experience the problem again.

  62. willcl-ark closed this on Apr 10, 2024

