[WIP] rest: Stream entire utxo set #7759

pull laanwj wants to merge 2 commits into bitcoin:master from laanwj:2016_03_utxo_streaming changing 3 files +204 −2
  1. laanwj commented at 8:55 pm on March 28, 2016: member

    This is by no means ready, but anyhow this builds on #7756 and

    • Adds a streaming API to the HTTP server. This allows streaming data to the client chunk by chunk, which is useful when the entire data is not available at once, or when it is huge and wouldn’t fit (efficiently) in memory.
    • Allows downloading the entire UTXO set through /rest/utxoset. This is a raw dump of all outputs, the state normally hashed by gettxoutsetinfo. The dump is performed in the background by making use of leveldb snapshotting, so cs_main does not need to stay locked.
      • This can be useful for analysis purposes if you don’t want to mess with Bitcoin Core’s database.
      • Filename (via content-disposition) is utxoset-<height>-<bestblockhash>.dat. Custom X-Best-Block and X-Block-Height headers are also added.

    It matches:

    $ src/bitcoin-cli -datadir=/store/tmp/testbtc gettxoutsetinfo
    {
    ...
      "hash_serialized": "5017f82bbb82a8199ae0fbaa9e5881a0c82575db89e6edd5b39414b35299363b",
    ...
    }
    $ wget --content-disposition http://127.0.0.1:8332/rest/utxoset
    2016-03-28 22:58:32 (44.3 MB/s) - ‘utxoset-404681-0000000000000000034854f5a3ab27cfbc220a42c75061dd13d2067cda71191d.dat’ saved [1291578967]
    $ ~/bin/dsha256 utxoset-404681-0000000000000000034854f5a3ab27cfbc220a42c75061dd13d2067cda71191d.dat
    5017f82bbb82a8199ae0fbaa9e5881a0c82575db89e6edd5b39414b35299363b utxoset
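
The saved filename follows the utxoset-<height>-<bestblockhash>.dat pattern, so an analysis tool can recover the height and best block hash from it. A minimal sketch of such a parser, assuming the hash is always 64 hex characters; `UtxoDumpName` and `ParseUtxoDumpName` are hypothetical names for illustration and are not part of the PR:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical holder for the two fields encoded in the dump filename.
struct UtxoDumpName {
    int height;
    std::string block_hash;
};

// Parses "utxoset-<height>-<bestblockhash>.dat". Returns false, leaving
// `out` untouched, if the name does not follow that pattern.
bool ParseUtxoDumpName(const std::string& name, UtxoDumpName& out)
{
    const std::string prefix = "utxoset-";
    const std::string suffix = ".dat";
    if (name.size() <= prefix.size() + suffix.size()) return false;
    if (name.compare(0, prefix.size(), prefix) != 0) return false;
    if (name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0) return false;

    const std::string body =
        name.substr(prefix.size(), name.size() - prefix.size() - suffix.size());
    const std::size_t dash = body.find('-');
    // At most 9 digits, so std::stoi below cannot overflow an int.
    if (dash == std::string::npos || dash == 0 || dash > 9) return false;

    const std::string height = body.substr(0, dash);
    const std::string hash = body.substr(dash + 1);
    if (hash.size() != 64) return false; // assumption: 32-byte hash in hex
    for (char c : height) {
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
    }
    for (char c : hash) {
        if (!std::isxdigit(static_cast<unsigned char>(c))) return false;
    }

    out.height = std::stoi(height);
    out.block_hash = hash;
    return true;
}
```

The same two values are also sent in the X-Block-Height and X-Best-Block headers, so a client that sees the HTTP response does not need to rely on the filename.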
    

    TODO

    • Rebase after #7756 merged
    • Sensibly name and split up commits
    • Clean up and split up code
    • Actually handle errors (you can crash a worker thread right now by disconnecting while downloading)
    • The timeout actually runs while downloading, causing it to break off after downloading. I don’t understand why this is. You can work around it with -rpcservertimeout=6000 or such.
    • UTXO set dump doesn’t contain keys (?) I’m not sure this format is actually useful this way (see #7758) (fixed in #7848)
    • Other formats, potentially ^^

    Note that the HTTP streaming API could in principle also be used for other large data (say, wallet backups), or even for websocket-like event notification.
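
The chunk-by-chunk idea described above can be sketched independently of libevent. This is a simplified, synchronous illustration only: the PR itself relies on libevent's asynchronous chunk callback. `ChunkSource`, `ChunkSink` and `StreamInChunks` are hypothetical names. The point is that only one chunk is held in memory at a time, and that the producer stops as soon as the sink reports the client has gone away.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical producer: fills `out` with at most `max_bytes` and returns
// true while more data remains after this chunk.
using ChunkSource = std::function<bool(std::string& out, std::size_t max_bytes)>;

// Hypothetical consumer: returns false if the chunk could not be delivered
// (for example because the client disconnected).
using ChunkSink = std::function<bool(const std::string& chunk)>;

// Pumps data from source to sink one chunk at a time and returns the number
// of bytes the sink accepted. Stops early when the sink reports failure.
std::size_t StreamInChunks(const ChunkSource& source, const ChunkSink& sink,
                           std::size_t chunk_size)
{
    assert(chunk_size > 0);
    std::size_t total = 0;
    std::string chunk;
    for (;;) {
        chunk.clear();
        const bool more = source(chunk, chunk_size);
        if (!chunk.empty()) {
            if (!sink(chunk)) break; // client gone: stop producing
            total += chunk.size();
        }
        if (!more) break;
    }
    return total;
}
```

In a real server the sink would hand each chunk to the HTTP layer and wait for its "sent" notification before asking the source for more, which is the role evhttp_send_reply_chunk_with_cb plays in this PR.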

  2. laanwj added the label RPC on Mar 28, 2016
  3. laanwj added the label REST on Mar 28, 2016
  4. paveljanik commented at 6:46 am on March 29, 2016: contributor
    Looks like evhttp_send_reply_chunk_with_cb is new in libevent 2.1 which is in alpha as of now.
  5. laanwj commented at 7:08 am on March 29, 2016: member

    Looks like evhttp_send_reply_chunk_with_cb is new in libevent 2.1 which is in alpha as of now.

    Weird. What good is a chunk function if you have no clue whether the data was sent? I’ll take a look.

  6. laanwj removed the label REST on Mar 29, 2016
  7. laanwj force-pushed on Apr 28, 2016
  8. sipa commented at 3:05 pm on June 2, 2016: member
    It seems the last stable release of libevent (2.0.22) was 2.5 years ago, though the master branch is still being updated. Do we want to wait for libevent 2.1 for this, or find another way?
  9. laanwj added this to the milestone Future on Jun 9, 2016
  10. laanwj commented at 8:20 am on June 9, 2016: member

    It seems the last stable release of libevent (2.0.22) was 2.5 years ago, though the master branch is still being updated. Do we want to wait for libevent 2.1 for this, or find another way?

    Yes, libevent version management is like that, unfortunately.

    I’d tend to include a newer libevent in depends, then disable the functionality when building with older libevent.

    This will be too late for 0.13. Let’s hope there will be a stable libevent release again before 0.14 which includes this. Not holding my breath though.

  11. rest: Stream entire utxo set
    This builds on #7756 and
    
    - Adds a streaming API to the HTTP server. This allows streaming data to
      the client chunk by chunk, which is useful when not the entire data is
      available at once or it is huge and wouldn't fit (efficiently) in
      memory.
    
    - Allows downloading the entire UTXO set through `/rest/utxoset`. This
      is a raw dump of all outputs, the state normally hashed by
      `gettxoutsetinfo`. The dump is performed in the background by making
      use of leveldb snapshotting, so without keeping cs_main locked.
    
        - This can be useful for analysis purposes if you don't want to mess
          with bitcoin core's database
    
        - Filename (via content-disposition) is
          `utxoset-<height>-<bestblockhash>.dat`. Also a custom
          `X-Best-Block` and `X-Block-Height` header is added.
    2498324ff9
  12. laanwj force-pushed on Sep 28, 2016
  13. laanwj commented at 3:25 pm on September 28, 2016: member
    Rebased, updated for boost-removal from httpserver.h
  14. Add hack to prevent this from failing compile on older libevent
    This needs a better interface so that HTTPServer's users (such as rest)
    can query capabilities.
    5cd59bbe03
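
One possible shape for such a capability query, as a sketch only: `HTTPCapability`, `HTTPServerInfo` and `RestEndpoints` are hypothetical names and do not exist in the PR or in Bitcoin Core. The idea is that the REST code asks the HTTP server what it supports and only offers /rest/utxoset when chunked streaming is available.

```cpp
#include <string>
#include <vector>

// Hypothetical set of optional features an HTTP server build may provide.
enum class HTTPCapability { CHUNKED_STREAMING };

// Hypothetical capability holder. A real build would derive the flag from
// the libevent version it was compiled against.
class HTTPServerInfo
{
public:
    explicit HTTPServerInfo(bool streaming) : m_streaming(streaming) {}

    bool Has(HTTPCapability cap) const
    {
        switch (cap) {
        case HTTPCapability::CHUNKED_STREAMING:
            return m_streaming;
        }
        return false;
    }

private:
    bool m_streaming;
};

// Lists the REST paths a client of the HTTP server would register.
// The streaming endpoint is only offered when the server can support it.
std::vector<std::string> RestEndpoints(const HTTPServerInfo& info)
{
    std::vector<std::string> paths = {"/rest/tx/", "/rest/block/"};
    if (info.Has(HTTPCapability::CHUNKED_STREAMING)) {
        paths.push_back("/rest/utxoset");
    }
    return paths;
}
```

Compared with the preprocessor hack in this commit, a query like this keeps the libevent version check inside the HTTP server code, so callers such as rest do not need to know which libevent release introduced which function.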
  15. in src/httpserver.cpp: in 2498324ff9 outdated
    +void HTTPRequest::StreamingData::SendChunk(struct evhttp_request* req)
    +{
    +    // LogPrintf("set_http_chunk_cb\n");
    +    {
    +        std::unique_lock<std::mutex> lock(cs);
    +        evhttp_send_reply_chunk_with_cb(req, databuf, &http_chunk_cb, this);
    


    jonasschnelli commented at 3:29 pm on September 28, 2016:
    I guess this requires libevent 2.1 (depends package is still on 2.0.x IIRC)

    jonasschnelli commented at 3:31 pm on September 28, 2016:
    Sorry.. that was already discussed.

    laanwj commented at 3:41 pm on September 28, 2016:
    Yes, I need to find exactly what version this was introduced in and guard the streaming stuff with #if LIBEVENT_VERSION_NUMBER >= 0x0201XXXX. I think it can only be supported for newer libevent.

    laanwj commented at 3:54 pm on September 28, 2016:
    For reference, the commit that introduced evhttp_send_reply_chunk_with_cb (https://github.com/libevent/libevent/commit/8d8decf114aebf10188cfdf52a8479cd24d1e3e5), which first appears in version 0x02010401, is from 2009, and that’s still the beta branch. I’m getting a bit concerned about libevent’s release process.
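
A minimal sketch of the version guard discussed in this thread, using the 0x02010401 threshold quoted above. `SupportsChunkCallback` and `HAVE_EVHTTP_CHUNK_CB` are hypothetical names; LIBEVENT_VERSION_NUMBER is libevent's own macro and is only defined after <event2/event.h> has been included. The sketch assumes version numbers compare correctly as plain integers, which holds for libevent's one-byte-per-component layout.

```cpp
#include <cstdint>

// First libevent version number reported in this thread as providing
// evhttp_send_reply_chunk_with_cb.
constexpr std::uint32_t MIN_STREAMING_LIBEVENT = 0x02010401;

// Hypothetical helper: true if a numeric libevent version (same layout as
// LIBEVENT_VERSION_NUMBER) is new enough for the chunk callback.
constexpr bool SupportsChunkCallback(std::uint32_t version)
{
    return version >= MIN_STREAMING_LIBEVENT;
}

// 0x02001600 is 2.0.22 under libevent's one-byte-per-component layout.
static_assert(!SupportsChunkCallback(0x02001600), "2.0.x lacks the callback");
static_assert(SupportsChunkCallback(0x02010401), "threshold version has it");

// Compile-time form of the same check. If <event2/event.h> has not been
// included, the macro is undefined and streaming is treated as unavailable.
#if defined(LIBEVENT_VERSION_NUMBER) && LIBEVENT_VERSION_NUMBER >= 0x02010401
#define HAVE_EVHTTP_CHUNK_CB 1
#else
#define HAVE_EVHTTP_CHUNK_CB 0
#endif
```

Code that calls evhttp_send_reply_chunk_with_cb would then sit inside `#if HAVE_EVHTTP_CHUNK_CB` blocks, so builds against older libevent still compile with the streaming endpoint disabled.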

    jgarzik commented at 1:05 am on September 29, 2016:

    RE libevent release process – several projects are feeling the limits of libevent http support, and moving to https://github.com/ellzey/libevhtp

    I had to do that in one project, in order to support streaming chunked http downloads.

    libevent’s http was really written for simple app servers with small replies.


    laanwj commented at 7:29 am on September 29, 2016:
    I know of that project, but I’d prefer to avoid adding another dependency. It can be considered if there is really no other way out, but it seems to me that chunked downloads can be done with that function.
  16. laanwj force-pushed on Sep 29, 2016
  17. laanwj commented at 7:14 pm on October 18, 2016: member
    Closing this. I think it was a nice experiment but I don’t expect to get around to it again in the near future. If anyone needs this functionality feel free to pick it up.
  18. laanwj closed this on Oct 18, 2016

  19. sipa commented at 7:17 pm on October 18, 2016: member
    :(
  20. laanwj commented at 1:44 pm on December 6, 2017: member

    For anyone thinking about picking this up:

    The good news here is that libevent 2.1 is out of alpha, and is stable as of 2.1.7 at this moment.

    The bad news is that it might be impossible to stream reliably using libevent’s http server. At least it’s mentioned as one of the motivations for the libevhtp replacement:

    As far as I know, streaming data back to a client is hard, if not impossible without messing with underlying bufferevents.

    So I’m not sure “The timeout actually runs while downloading, causing it to break off after downloading.” mentioned in the OP is solvable. It might be with some hack.

  21. NathanFrench commented at 11:19 am on December 7, 2017: none

    I will be happy to assist with any migration issues to libevhtp if this project decides to go this route.

    Cheers!

  22. DrahtBot locked this on Sep 8, 2021
  23. adamjonas added the label Up for grabs on Aug 2, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-22 15:12 UTC
