New file-partition.md doc describing how to partition files to ensure fast initial blockchain synchronization.. #10922

pull jimhashhq wants to merge 11 commits into bitcoin:master from jimhashhq:master changing 1 files +53 −0
  1. jimhashhq commented at 11:51 pm on July 24, 2017: none
    After native build from source on Mac OS, my initial attempts to synchronize the blockchain were very very slow. Upon finding Issue Sync Taking Too Long, I found discussion by all and comments by @sipa in particular to be very useful, and reorganized $datadir folders on my local macOS build/install and summarized steps taken in file-partition.md doc. These comments might find their audience more appropriately elsewhere, please feel free to suggest, thank you very much. -jimhash Note: This looks to be logged as issue: https://github.com/bitcoin/bitcoin/issues/10736
  2. Create file-partition.md
    Describe partitioning of datadir files between the high-frequency/low-capacity "index" files and the low-frequency/high-capacity "blocks" files.  These steps are probably obvious to more adept bitcoind admins, but as a newbie myself, I didn't see them written anywhere else.
    d13c4531e5
  3. Update file-partition.md
    Formatting
    d8c266fa95
  4. Update file-partition.md
    Typos.
    50882357a2
  5. Update file-partition.md 279cbcfb86
  6. Update file-partition.md 13f433ca01
  7. sipa commented at 0:53 am on July 25, 2017: member

    I think you missed something. There are three directories that matter:

    • $DATADIR/blocks (raw blocks)
    • $DATADIR/blocks/index (LevelDB database with information about raw blocks)
    • $DATADIR/chainstate (LevelDB database that holds the UTXO set)

    The first is high-bandwidth, low IO. The second has hardly any activity at all. The third is where all activity happens, and is critical for performance.

    You should not separate the blocks from the blocks/index directory, as things may get ugly if they’re out of sync. You can however put the chainstate on a faster/smaller device.
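
The layout suggested above can be sketched roughly as follows. This is a minimal sketch, not a tested procedure for a live node: it assumes a POSIX shell, that bitcoind is stopped, and that bitcoind follows a symlink left at the original chainstate location. Temporary directories stand in for the real datadir and the faster device; `WORK`, `DATADIR` and `FAST` are placeholder names, not paths from this thread.

```shell
#!/bin/sh
set -eu

# Stand-ins for the real locations (hypothetical; substitute your own paths).
WORK=$(mktemp -d)
DATADIR="$WORK/datadir"      # would live on the large, slower disk
FAST="$WORK/fast-disk"       # would live on the small, faster disk
mkdir -p "$DATADIR/blocks/index" "$DATADIR/chainstate" "$FAST"
echo "dummy" > "$DATADIR/chainstate/CURRENT"   # placeholder file for the demo

# With bitcoind stopped: move only the chainstate to the faster device...
mv "$DATADIR/chainstate" "$FAST/chainstate"
# ...and leave a symlink at the original location.
ln -s "$FAST/chainstate" "$DATADIR/chainstate"

# blocks/ and blocks/index/ stay together and are not touched.
ls -l "$DATADIR"
```

Only the chainstate moves in this sketch; blocks/ and blocks/index/ remain on the same device, which is the point of the advice above.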

  8. jimhashhq commented at 1:14 am on July 25, 2017: none

    Hi @sipa, thank you very much for the prompt follow up. Per your feedback, I have updated the doc to explicitly call out that chainstate folder must also stay on a fast (internal) disk. The “file-partition.md” notes I made should describe:

    • Installing $datadir to local internal disk (so “chainstate” will never need to move).
    • Start/stop bitcoind (though maybe this isn’t necessary).
    • Move the “index” files up one folder level.
    • Move the “blocks” folder off the internal disk.
    • Go to the new external “blocks” folder and soft-link back to the new “index” location.
    • Go to the original internal drive “data” location and link the now external “blocks” folder.

    Again, the “chainstate” folder remains on the internal disk and never needs to move.

    Again these “file-partition.md” notes are not the direct route I took, but (if correct) might save the next person considerable delay in initial blockchain synchronization.

    To your point, even though these low-capacity/high-I/O-frequency LevelDB files and high-capacity/low-I/O-frequency blockchain files are physically on separate disks, they are kept in sync via the soft links created.

    Maybe ultimately a more natural configuration of these files would be to have the “index” folder up a level to start with. That’s what makes moving these folders confusing in my opinion.

    Thank you again very much for the prompt feedback, very flattering, thanks.

  9. fanquake added the label Docs and Output on Jul 25, 2017
  10. Update file-partition.md
    Per @sipa, add mention of the fact that, like the "index" folder, the "chainstate" LevelDB folder must also remain on a fast (internal) drive if reasonable synchronization time is to be expected.
    1bce187993
  11. Update file-partition.md d4ab8343c3
  12. Update file-partition.md 3c0377db48
  13. jimhashhq commented at 3:57 pm on July 25, 2017: none
    “file-partition.md” has been updated to highlight the need to keep the “chainstate” folder on a fast (internal) drive in addition to the “index” files, as per @sipa. Thanks @fanquake for adding the label. This issue also seems related to “installation” and/or “configuration”; I did not see either of these as bitcoin project label categories and was thinking they might prove useful as well. Thank you both for your guidance.
  14. in doc/file-partition.md:26 in 3c0377db48 outdated
    21+        ls -l /Users/coinadm/local/bitcoin/data/blocks/blk00000.dat
    22+        -rw-------  1 coinadm  staff  134191893 Jul 27 11:31 /Users/coinadm/local/data/blocks/blk00000.dat
    23+
    24+2) Stop bitcoind, so that we can rearrange some datadir folders:
    25+
    26+       kill -QUIT `cat /Volumes/WD-Passport-Mac/bitcoin/data/bitcoind.pid`
    


    achow101 commented at 8:16 pm on July 31, 2017:
    Instead of killing the process, you should use bitcoin-cli stop.
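
A clean stop along those lines might look like the sketch below. `bitcoin-cli stop` returns as soon as shutdown has been requested, so the sketch polls for the pid file to disappear before any folders are rearranged. It assumes that bitcoind removes `bitcoind.pid` on a clean shutdown; `wait_for_pidfile_gone` is a hypothetical helper and `DATADIR` is a placeholder, neither comes from the PR.

```shell
#!/bin/sh
set -eu

# Wait until a pid file disappears, polling once per second.
# Usage: wait_for_pidfile_gone PIDFILE [TIMEOUT_SECONDS]
wait_for_pidfile_gone() {
    pidfile=$1
    remaining=${2:-60}
    while [ -e "$pidfile" ]; do
        if [ "$remaining" -le 0 ]; then
            echo "timed out waiting for $pidfile to be removed" >&2
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
}

# Intended use (not executed here; DATADIR is a placeholder):
#   bitcoin-cli -datadir="$DATADIR" stop
#   wait_for_pidfile_gone "$DATADIR/bitcoind.pid" 120
```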
  15. in doc/file-partition.md:52 in 3c0377db48 outdated
    47+
    48+6) Replace the original index folder location with a soft link:
    49+      
    50+          ln -s /Volumes/WD-Passport-Mac/bitcoin/blocks /Users/coinadm/local/bitcoin/data/blocks
    51+
    52+ 注意 - Nota - Note - ध्यान दें - ﻢﻠﺣﻮﻇﺓ - метка 
    


    achow101 commented at 8:17 pm on July 31, 2017:
    What’s with the multiple languages here?

    jimhashhq commented at 8:02 pm on August 2, 2017:
    Again, good question, thank you. To me, the “key finding” here (if any, really) is simply that the “index” folder by default is nested within the “blocks” folder, unlike the “chainstate” folder, which is a sibling folder of “blocks”. This slightly complicates moving the “blocks” folder off of the internal disk to an external disk; my apologies for repeating the obvious. Database administrators out there (me included) might argue that this nested folder configuration is less desirable, as it complicates physical separation of high-capacity/low-frequency block files from the lower-capacity/higher-frequency index (and chainstate) LevelDB files. When I open the hood of my car there are warning labels on the radiator cap, etc., and these labels are in multiple languages to point out appropriate cautions to naive vehicle operators who may have never looked under the hood or checked a radiator. I’m taking inspiration from this and wanted to (hopefully) say “Note” in the half dozen or so most common languages by usage. Also, I want to be especially friendly in this day and age; I feel like we could all use it. In closing, I hope it doesn’t sound like I am trying to “make a mountain out of a molehill” here, it’s not that at all; I just wanted to share my personal experiences in hopes of further facilitating ease of use of the system. Overall I find the system to be very easy to work with and well thought out.
  16. Update file-partition.md
    Correct the command for step #2 in this outline; per @achow101, thanks very much.
    281b1b25cc
  17. sipa commented at 8:06 pm on August 2, 2017: member
    Your document is still suggesting to split the blocks/ directory from the blocks/index/ directory. Please don’t do that; it’s dangerous (they need to be in sync), and unnecessary (the blocks/index/ directory hardly sees any I/O). You should just suggest to move the chainstate/ to a faster drive compared to blocks/.
  18. in doc/file-partition.md:9 in 281b1b25cc outdated
    0@@ -0,0 +1,53 @@
    1+# File Partition:
    2+Since the blockchain is around ~140GB, storage of large files on an external drive is convenient.  If this is not done properly, the external drive will also contain high i/o-frequency LevelDB index files, protracting time for initial blockchain synchronization. This document describes how partition datadir files between the high-frequency/low-capacity "index" files and the low-frequency/high-capacity "blocks" files. Examples are given for macOS, but Linux / Windows should be similar. These instructions result in the following physical folder rearrangement:
    3+
    4+| Original Location       | New Location | Capacity Needs | i/o Frequency  |
    5+| ----------------------- | ------------ | -------------- | -------------- |
    6+| ${datadir}/blocks/index | ${datadir}   | low            | high           |
    7+| ${datadir}/blocks       | ${EXTERAL}   | high           | low            |
    8+| ${datadir}/chainstate   | n/a          | low            | high           |
    9+
    


    jimhashhq commented at 10:45 pm on August 2, 2017:

    At issue still is the value in the last column of the 1st row above (the index file folder “i/o Frequency”: is it high or low?). My experience suggests that the index folder files are indeed high frequency, which is really the impetus for this doc; if they were not high frequency I would not have looked into this re-configuration detail. I only identified this possible re-configuration pitfall (which I had mistakenly fallen into) via the bitcoind file usage reported by the following command: lsof -p `cat ${datadir}/bitcoind.pid` | grep ldb$ This motivated me to move the subordinate/child “index” folder back onto the internal drive and set up the soft links described here. I have not yet followed through and verified the i/o frequency demands of these index LevelDB files.
    I admit the experience related here is qualitative and currently lacks supporting i/o reporting, but is hopefully nonetheless correct.
    I first moved the entire blocks folder (including the index subfolder) from $datadir to the external disk for internal capacity reasons (basically to save space because I’m cheap), and it synched very very slowly. Then I looked at open files using “lsof” as described above, and saw LevelDB index folder files open on the external USB 3.0 drive. Moving them as described here sped things up to the point that performance seemed to match that with the default configuration of everything (including the blocks folder) on the internal drive. Basically I was just doing a simplistic du -k . & ls -lrt blk*.dat every so often and watching how fast the blk*.dat files were growing in both cases. So qualitative, and not quantitative (I apologize). The responsible thing for me to do at this point is to gather quantitative evidence to support this position.
    I’ll maybe look for a blockchain indexing or synchronization test that I can run twice; once with the index files on the external drive, and again with the index files off-of the external drive, while trying to capture i/o frequency statistics as well as wall clock time? Note that what I described above with the du -k . and long listings while watching the wall clock is really what I did above.
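
One way to double-check where the files really live, independent of a running bitcoind, is to resolve each datadir folder to its physical path once the soft links are in place. The sketch below is only an illustration under stated assumptions: temporary directories stand in for the internal and external disks, and `physical_dir`, `WORK`, `DATADIR` and `EXTERNAL` are hypothetical names. Here the whole blocks folder (index included) is linked out, as in the first, slow configuration described above.

```shell
#!/bin/sh
set -eu

# Print the physical (symlink-resolved) path of a directory.
physical_dir() {
    (cd "$1" && pwd -P)
}

# Stand-ins for the internal datadir and the external volume.
WORK=$(mktemp -d)
DATADIR="$WORK/datadir"
EXTERNAL="$WORK/external"
mkdir -p "$DATADIR/chainstate" "$EXTERNAL/blocks/index"
ln -s "$EXTERNAL/blocks" "$DATADIR/blocks"

# Show where each folder actually resolves to.
for d in blocks blocks/index chainstate; do
    printf '%-14s -> %s\n' "$d" "$(physical_dir "$DATADIR/$d")"
done
```

On a real machine, passing each resolved path to `df` would then show which mounted volume holds it.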

    A possible misperception here is that this doc was meant to be some sort of performance advice. It is not; rather, it’s really just meant to be a note to help other developers/analysts who (like me) work on very low end commodity hardware yet want to perform initial synchronization quickly, and without the large-capacity internal disk space requirements. That said, performance-minded users would likely benefit from moving any high i/o frequency files to the fastest storage available, similar to how traditional file-based databases are tuned.

    Neither are these notes meant as instructions to backup the blockchain for portability between bitcoin development instances either; as @sipa points out, the “blocks” folder ’needs to be kept in sync with the “index” folder’, rendering the blocks folder useless by itself for backup/portability purposes.

    Missing from these notes is the (reasonable?) expectation that bitcoind not be started until the external disk is mounted, and likewise that the external disk not be dismounted/ejected until bitcoind is shut down. On this point, I have not yet tried to see what happens if, after running in the configuration suggested here, the operator/developer accidentally tries to start bitcoin without the external storage plugged in; I would hope that the index folder files are not corrupted if the blocks folder is not accessible.
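
That expectation could be enforced with a small guard run before bitcoind is started. The following is a rough sketch, assuming the blocks folder is a soft link onto the external volume, so that the link dangles when the volume is not mounted. `blocks_available`, `WORK`, `DATADIR` and `EXTERNAL` are hypothetical names, and temporary directories stand in for the real disks.

```shell
#!/bin/sh
set -eu

# Succeed only if DATADIR/blocks resolves to an existing directory.
# A symlink whose target is missing fails the -d test.
blocks_available() {
    [ -d "$1/blocks" ]
}

# Stand-ins for the internal datadir and the external volume.
WORK=$(mktemp -d)
DATADIR="$WORK/datadir"
EXTERNAL="$WORK/external"
mkdir -p "$DATADIR" "$EXTERNAL/blocks"
ln -s "$EXTERNAL/blocks" "$DATADIR/blocks"

if blocks_available "$DATADIR"; then
    echo "blocks reachable; would start: bitcoind -datadir=$DATADIR"
else
    echo "blocks not reachable; refusing to start" >&2
fi
```

The guard only covers startup; it does nothing about the disk being ejected while bitcoind is running.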

    Also, I’m still trying to understand the usage patterns of GitHub, like maybe this would have been better if this were reported as an “issue”, I was tempted to do that, but don’t personally see any issues. This is really more of a “pitfalls to avoid” type of document I was hoping might further adoption.

    I do feel like I am onto something (albeit very minor) here. I do very much appreciate all of the feedback and consideration – thank you very much.

  19. in doc/file-partition.md:42 in 281b1b25cc outdated
    37+Finally, set up soft links to restore the original folder structure:
    38+   
    39+| Folder Name              | Link Name           |
    40+| ------------------------ | ------------------- |
    41+| ${EXTERNAL}/blocks/index | ${datadir}/../index |
    42+| ${datadir}/blocks        | ${EXTERAL}/blocks   |
    


    flack commented at 10:32 am on August 5, 2017:
    typo: EXTERAL
  20. laanwj commented at 12:47 pm on October 4, 2017: member

    (the blocks/index/ directory hardly sees any I/O).

    Except with -txindex I guess :(

  21. laanwj commented at 3:39 pm on November 30, 2017: member
    @jimhashhq I think this is pretty good, certainly as a start. Can you please address the comments and squash?
  22. laanwj commented at 3:46 pm on November 30, 2017: member

    Please don’t do that; it’s dangerous (they need to be in sync)

    Also: sometimes there’s the problem with the leveldb not supporting the filesystem that the blocks/ directory points to. I think this happens with some network filesystems. If blocks/index is on the same partition that will never work :/ (see e.g. #10787)

  23. laanwj renamed this:
    New file-partiion.md doc describing how to partition files to ensure fast initial blockchain synchronization..
    New file-partition.md doc describing how to partition files to ensure fast initial blockchain synchronization..
    on Nov 30, 2017
  24. Update file-partition.md daf4b53554
  25. jimhashhq commented at 1:13 am on December 1, 2017: none
    My apologies for not getting back sooner, I was ill but am feeling better. I think my intent here was just to share/communicate my experiences with “symlink(ing) out the block-files (the large part)” as mentioned in issue #10787 referenced above, thanks. I was hoping this experience might prove useful to others on low end commodity hardware who wish to store the blocks on an external USB. Thanks!
  26. jimhashhq closed this on Dec 1, 2017

  27. Update file-partition.md
    Correct typo per @arowser, thank you.
    7be8f7c788
  28. in doc/file-partition.md:48 in daf4b53554 outdated
    43+
    44+5) Replace the original index folder location with a soft link:
    45+      
    46+          ln -s /Users/coinadm/local/bitcoin/index /Volumes/WD-Passport-Mac/bitcoin/blocks/index 
    47+
    48+6) Replace the original index folder location with a soft link:
    


    arowser commented at 4:07 am on December 4, 2017:
    should be “block folder”?

    jimhashhq commented at 4:33 pm on December 4, 2017:
    Corrected, thanks!
  29. laanwj commented at 7:34 am on December 13, 2017: member

    I was hoping this experience might prove useful to others on low end commodity hardware who wish to store the blocks on an external USB. Thanks!

    Why close?

  30. jimhashhq reopened this on Dec 13, 2017

  31. laanwj closed this on Mar 5, 2018

  32. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 22:12 UTC
