Allow to optional specify the directory for the blocks storage #12653

pull jonasschnelli wants to merge 3 commits into bitcoin:master from jonasschnelli:2018/03/blocksdir changing 9 files +85 −11
  1. jonasschnelli commented at 6:07 am on March 9, 2018: contributor

    Since the actual block files taking up more and more space, it may be desirable to have them stored in a different location then the data directory (use case: SSD for chainstate, etc., HD for blocks).

    This PR adds a -blocksdir option that allows one to keep the blockfiles and the blockindex external from the data directory (instead of creating symlinks).

    I fist had an option to keep the blockindex within the datadir, but seems to make no sense since accessing the index will (always) lead to access (r/w) the block files.

  2. jonasschnelli added the label Block storage on Mar 9, 2018
  3. jonasschnelli force-pushed on Mar 9, 2018
  4. jonasschnelli force-pushed on Mar 9, 2018
  5. randolf approved
  6. randolf commented at 6:37 am on March 9, 2018: contributor
    As the size of the blockchain continues to increase, this will facilitate the need to store the blocks on a different hard drive, hence this feature serves a probable future need. There are likely other uses for this feature too.
  7. eklitzke commented at 4:25 pm on March 10, 2018: contributor

    The argument for keeping the block index in the data dir would be that in the vast majority of cases it means the block index will be on the same filesystem as the utxo database, and therefore we can rely on regular filesystem semantics regarding data ordering. If the block index and utxo database are on different filesystems then it’s not possible to guarantee that one is ahead of the other without fully serializing the operations with fsync.

    I will try to look through the code more and think about this more closely. I think this change is fine the way the code currently works because the flushes are basically serialized anyway (and we fsync for block storage and block index updates), but this might become annoying long term.

  8. Allow to optional specify the directory for the blocks storage 386a6b62a8
  9. jonasschnelli force-pushed on Mar 11, 2018
  10. jonasschnelli commented at 4:43 am on March 11, 2018: contributor
    Added a commit (basically a one-liner) that will make the block index database stick in datadir.
  11. eklitzke commented at 7:11 am on March 11, 2018: contributor
    utACK Once you fix the python lint failure.
  12. promag commented at 7:05 pm on March 11, 2018: member
    Concept ACK.
  13. in src/util.h:185 in 34ec8e308b outdated
    181@@ -182,6 +182,7 @@ void ReleaseDirectoryLocks();
    182 
    183 bool TryCreateDirectories(const fs::path& p);
    184 fs::path GetDefaultDataDir();
    185+const fs::path &GetBlocksDir(bool fNetSpecific = true);
    


    MarcoFalke commented at 8:13 pm on March 11, 2018:
    I am wondering why this should be net specific, considering that we have the same concept for wallets and the GetWalletDir is not net specific.

    laanwj commented at 1:42 pm on March 15, 2018:
    Isn’t the blocks directory by definition net specific?
  14. in test/functional/feature_blocksdir.py:20 in 34ec8e308b outdated
    15+        self.setup_clean_chain = True
    16+        self.num_nodes = 1
    17+
    18+    def run_test(self):
    19+        self.stop_node(0)
    20+        node0path = os.path.join(self.options.tmpdir, "node0");
    


    promag commented at 3:37 pm on March 12, 2018:
    Remove ;.
  15. in test/functional/feature_blocksdir.py:24 in 34ec8e308b outdated
    19+        self.stop_node(0)
    20+        node0path = os.path.join(self.options.tmpdir, "node0");
    21+        shutil.rmtree(node0path)
    22+        initialize_datadir(self.options.tmpdir, 0)
    23+        self.log.info("Starting with non exiting blocksdir ...")
    24+        self.assert_start_raises_init_error(0, ["-blocksdir="+self.options.tmpdir+ "/blocksdir"], "Specified blocks director")
    


    promag commented at 3:44 pm on March 12, 2018:

    Add

    0blocksdir = os.path.join(self.options.tmpdir, "blocksdir")
    

    and reuse here and below.

  16. in src/txdb.cpp:150 in 34ec8e308b outdated
    146@@ -147,7 +147,7 @@ size_t CCoinsViewDB::EstimateSize() const
    147     return db.EstimateSize(DB_COIN, (char)(DB_COIN+1));
    148 }
    149 
    150-CBlockTreeDB::CBlockTreeDB(size_t nCacheSize, bool fMemory, bool fWipe) : CDBWrapper(GetDataDir() / "blocks" / "index", nCacheSize, fMemory, fWipe) {
    151+CBlockTreeDB::CBlockTreeDB(size_t nCacheSize, bool fMemory, bool fWipe) : CDBWrapper(gArgs.IsArgSet("-blocksdir") ? GetDataDir() / "blockindex" : GetBlocksDir() / "index", nCacheSize, fMemory, fWipe) {
    


    promag commented at 3:54 pm on March 12, 2018:

    "blockindex"? Should be "blocks" / "index"?

    Maybe add GetBlockIndexDir()?


    laanwj commented at 7:31 pm on March 15, 2018:

    Yes, this seems wrong. I tried providing my existing blocks directory:

    0$ src/bitcoind -testnet -blocksdir=/store2/tmp/bitcoin/testnet3/blocks -printtoconsole
    12018-03-15T19:29:04Z Opening LevelDB in /.../.bitcoin/testnet3/blockindex
    22018-03-15T19:29:04Z Opened LevelDB successfully
    3...
    42018-03-15T19:29:05Z : Error initializing block database.
    5Please restart with -reindex or -reindex-chainstate to recover.
    6: Error initializing block database.
    7Please restart with -reindex or -reindex-chainstate to recover.
    

    This fails. Shouldn’t it use the index directory within the blocksdir?


    jonasschnelli commented at 2:35 am on March 20, 2018:
    Using the same directory looks most obvious correct, though, if you switch the blocks dir to a different path (with -blocksdir=<path>) users may run into troubles since the old blocks dir may still be in the datadir… But I agree, lets keep “block” / “Index” for now… fixed.
  17. laanwj commented at 11:27 am on March 13, 2018: member
    Concept ACK, this would avoid some symlinking gymnastics I’m currently doing. (ref; #10922)
  18. laanwj assigned laanwj on Mar 13, 2018
  19. hkjn commented at 1:35 pm on March 13, 2018: contributor

    ACK 34ec8e3, once lint issues are fixed.

    Does what it says on the tin, seems like a useful feature.

  20. eklitzke commented at 1:29 pm on March 15, 2018: contributor
    I might have to retract my earlier statement, as @sipa says here that the blocks index and block files should be on the same filesystem, contradicting my earlier statement. I’m not sure if I understand that reasoning though, as we fsync after writing the block and before committing to the block index, which should be fine across filesystems. If it’s really true that they need to be on the same filesystem that would be problematic in the future if we ever wanted to merge the block index and the chainstate index.
  21. laanwj commented at 1:43 pm on March 15, 2018: member

    I think this is almost ready. Will test. However, python-nanny wants you to fix:

    0./test/functional/feature_blocksdir.py:20:63: E703 statement ends with a semicolon
    1^---- failure generated from contrib/devtools/lint-python.sh
    
  22. donaloconnor commented at 10:56 pm on March 15, 2018: contributor
    utACK
  23. QA: Add -blocksdir test f38e4fdb06
  24. jonasschnelli force-pushed on Mar 18, 2018
  25. laanwj commented at 3:09 pm on March 19, 2018: member

    I might have to retract my earlier statement, as @sipa says here that the blocks index and block files should be on the same filesystem

    Wouldn’t this imply that we’re better off without commit https://github.com/bitcoin/bitcoin/pull/12653/commits/4855b9e8365e0037fcd9a4c6796dea04efa019ec? That would keep the block index with the blocks, which is @sipa’s preference.

  26. -blocksdir: keep blockindex leveldb database in datadir a1926362ec
  27. jonasschnelli force-pushed on Mar 20, 2018
  28. jonasschnelli commented at 2:36 am on March 20, 2018: contributor

    I might have to retract my earlier statement, as @sipa says here that the blocks index and block files should be on the same filesystem

    Wouldn’t this imply that we’re better off without commit 4855b9e? That would keep the block index with the blocks, which is @sipa’s preference. @sipa: can you elaborate your concerns of having the index outside of the blocks directory?

  29. sipa commented at 2:37 am on March 20, 2018: member
    @jonasschnelli @eklitzke I don’t think there’s an actual problem with having the blocks and the block index on separate filesystems. I was worried about them going out of sync due to management (you copy one but not the other, etc).
  30. jonasschnelli commented at 2:39 am on March 20, 2018: contributor
    Thanks @sipa. I think then we should keep the last commit (a1926362ecb3c354ae338ef7d7020daf78f980c9).
  31. laanwj commented at 7:06 pm on March 27, 2018: member
    In that case, utACK a1926362ecb3c354ae338ef7d7020daf78f980c9
  32. laanwj merged this on Mar 27, 2018
  33. laanwj closed this on Mar 27, 2018

  34. laanwj referenced this in commit 534b8fa560 on Mar 27, 2018
  35. Sjors commented at 12:54 pm on March 29, 2018: member
    Sorry I’m late to the party, but I’m seeing some strange behavior on macOS, see #12828.
  36. opacey commented at 5:07 pm on April 3, 2018: none

    I ran a little test just to see the effect of putting the entire datadir an external USB 3.0 HDD versus the internal SSD (cross ref #10736): MacBookPro2013 ExternalUSBHDD (NoNetwork) = ~1min, CPU ~35% MacBookPro2013 InternalSSD (NoNetwork) = ~2.0sec, CPU ~120% MacBookPro2017 InternalSSD (NoNetwork) = ~2.0sec, CPU ~120% MacBookPro2017 InternalSSD (NetworkOn) = ~2.5sec, CPU ~120%

    Timings are gaps between block validations. There is a ~30x slow down when using the USB 3.0 HDD.

    I also experimented with different combinations of the following settings: par=1, 2, 3, 4 dbcache=1000, 2000, 4000, 5000 maxsigcachesize=64, 1000 …they didn’t seem to have a significant effect on timing except dbcache (bigger = better).

    I’m surprised that signature verification speed increase is negligible between a 2013 Core i5 and a 2017 Core i5 [and that par doesn’t seem to effect results significantly, though that’s off topic.]

  37. sipa added the label Needs release notes on Aug 14, 2018
  38. promag commented at 9:04 am on October 2, 2018: member

    This PR adds a -blocksdir option that allows one to keep the blockfiles and the blockindex external from the data directory (instead of creating symlinks). @jonasschnelli could you update/fix the PR description?

    Edit: Done

  39. hebasto referenced this in commit 16d79b064c on Oct 2, 2018
  40. hebasto referenced this in commit 3045704502 on Oct 5, 2018
  41. laanwj referenced this in commit fe23553edd on Oct 18, 2018
  42. jfhk referenced this in commit 4d565f2ff6 on Nov 14, 2018
  43. JeremyRubin referenced this in commit 636f5453d5 on Nov 24, 2018
  44. HashUnlimited referenced this in commit a9ff66310c on Nov 26, 2018
  45. Empact commented at 6:59 pm on January 7, 2019: member
  46. laanwj referenced this in commit 195d28fecb on Jan 9, 2019
  47. laanwj referenced this in commit 64ee94356f on Jan 16, 2019
  48. fanquake removed the label Needs release note on Mar 20, 2019
  49. in src/txdb.cpp:150 in a1926362ec
    146@@ -147,7 +147,7 @@ size_t CCoinsViewDB::EstimateSize() const
    147     return db.EstimateSize(DB_COIN, (char)(DB_COIN+1));
    148 }
    149 
    150-CBlockTreeDB::CBlockTreeDB(size_t nCacheSize, bool fMemory, bool fWipe) : CDBWrapper(GetDataDir() / "blocks" / "index", nCacheSize, fMemory, fWipe) {
    151+CBlockTreeDB::CBlockTreeDB(size_t nCacheSize, bool fMemory, bool fWipe) : CDBWrapper(gArgs.IsArgSet("-blocksdir") ? GetDataDir() / "blocks" / "index" : GetBlocksDir() / "index", nCacheSize, fMemory, fWipe) {
    


    markblundeberg commented at 6:03 pm on May 9, 2019:
    What is this conditional for – why not always just GetDataDir() / "blocks" / "index" ?

    Sjors commented at 6:43 pm on May 9, 2019:
    The -blocksdir parameter lets you put blocks in a completely different directory, e.g. on a slow cheap magnetic harddrive, keeping the rest on your datadir on SSD. blocks/index also needs to be on a fast drive. Ideally tough I think datadir/blocks/index should move to datadir/index/blocks

    markblundeberg commented at 9:47 pm on May 9, 2019:
    Yes, I just mean that no matter whether the gArgs.IsArgSet("-blocksdir") ? conditional here is True / False, either way it results in using “DATADIR/blocks/index”. So the conditional seems like just extra code … though this makes me suspect I’m missing something.

    Sjors commented at 1:53 pm on May 10, 2019:

    gArgs.IsArgSet("-blocksdir") ? GetDataDir() / "blocks" / "index" : GetBlocksDir() / "index"

    If launched without -blocksdir:

    0false -> GetBlocksDir() / "index" -> ~/.bitcoin/blocks/index
    

    If launched with with -blocksdir=/mnt/magnetic/blocks:

    0true -> GetDataDir() / "blocks" / "index" ->   /mnt/magnetic/blocks/index
    

    Note that GetBlocksDir() internally also checks the -blocksdir argument, and create a directory if needed, so it won’t use DATADIR in that is set.

    https://github.com/bitcoin/bitcoin/blob/a1926362ecb3c354ae338ef7d7020daf78f980c9/src/util.cpp#L607-L634


    markblundeberg commented at 7:35 pm on May 12, 2019:
    Ah sorry, I meant to say either way it uses GetBlocksDir()/"blocks"/"index" but wrote DATADIR. Anyway though as you say, the directory check&create does make it slightly different.

    promag commented at 1:50 pm on October 6, 2019:

    If launched with with -blocksdir=/mnt/magnetic/blocks:

    0true -> GetDataDir() / "blocks" / "index" ->   /mnt/magnetic/blocks/index
    

    This is incorrect, right @Sjors?


    Sjors commented at 6:58 pm on October 6, 2019:
    I believe it’s like this: -blocksdir=/mnt/magnetic will put blocks in /mnt/magnetic/blocks. The block index will remain at ~/.bitcoin/blocks/index and ~/.bitcoin/blocks will otherwise be empty.
  50. PastaPastaPasta referenced this in commit cb037ee4fe on Sep 27, 2020
  51. PastaPastaPasta referenced this in commit d5dc2b3840 on Sep 27, 2020
  52. PastaPastaPasta referenced this in commit ba32ad8620 on Oct 14, 2020
  53. random-zebra referenced this in commit edfaec63c2 on May 1, 2021
  54. PastaPastaPasta referenced this in commit c0fa59f214 on Jun 26, 2021
  55. PastaPastaPasta referenced this in commit c1d86731ef on Jun 26, 2021
  56. PastaPastaPasta referenced this in commit a8c5821419 on Jun 27, 2021
  57. PastaPastaPasta referenced this in commit 69dd85bd3d on Jun 28, 2021
  58. PastaPastaPasta referenced this in commit b12c459007 on Jun 28, 2021
  59. Munkybooty referenced this in commit 6b87e6e5db on Jul 21, 2021
  60. Munkybooty referenced this in commit e2043dc2e9 on Jul 21, 2021
  61. Munkybooty referenced this in commit 29c0104010 on Jul 22, 2021
  62. Munkybooty referenced this in commit b70b4f01b7 on Jul 22, 2021
  63. Munkybooty referenced this in commit 8c71bc1f3c on Jul 23, 2021
  64. DrahtBot locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-23 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me