Bitcoin: (bug) loadblock/bootstrap.dat will not read file larger than 2.0GB #1951

qubez opened this issue on October 23, 2012
  1. qubez commented at 1:49 PM on October 23, 2012: none

    Issue: New bitcoin install imports only 2.0GB of blockchain bootstrap.dat (height=189205) before continuing startup.

    Platform: Windows 7 SP1 x64 Client: Bitcoin-0.7.1-Win32

    Steps to replicate: Obtain bootstrap.dat torrent (SHA256 a3f258e7af...) (block height 193000, 2.32 GB)

    Command line used: bitcoind.exe -datadir=C:\datadir -loadblock=C:\bootstrap.dat -connect=127.0.0.1 -detachdb -printtoconsole

    Wait 5000 seconds or so, only blocks up to 189205 are processed before bitcoin continues normal operation (giving expected "no RPC password" error if no bitcoin.cfg file present).

    https://bitcointalk.org/index.php?topic=117982.msg1292042#msg1292042

  2. Diapolo commented at 8:00 PM on October 23, 2012: none

    Seems logical, as before the blk000x.dat files had a hard-coded limit of < 2GiB on Windows. I'm sure @jgarzik or @sipa can clarify this. Are you using NTFS or FAT32 as filesystem?

  3. qubez commented at 2:09 AM on October 24, 2012: none

    NTFS and EXT4. Linux PPA build 0.7.0 exhibits the same behaviour.

  4. laanwj commented at 6:35 AM on October 24, 2012: member

    Don't blame the file system; modern filesystems can handle huge files (how else would you store your BluRay images :smirk:). It's probably some trivial issue with using int or unsigned int as the file pointer with fseek instead of off_t.

    Looking at CDiskBlockPos in main.h:

    ...
    unsigned int nPos
    ....
    

    Also line 2502 in main.cpp https://github.com/bitcoin/bitcoin/blob/master/src/main.cpp#L2502. And in util.h:

    void AllocateFileRange(FILE *file, unsigned int offset, unsigned int length) 
    
  5. sipa commented at 9:21 AM on October 24, 2012: member

    I suppose a CReadBuffer that wraps around CAutoFile or other reader classes, and has a method for skipping input until a fixed string is found, would be a very neat solution that doesn't require any seeking at all.

    Also, fixing the byte offsets to use off_t instead of ints in the code would certainly be an improvement, but at least in the current flow, AllocateFileRange should never be called with offset+length > 128 MiB.
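
    A minimal sketch of the seek-free idea, assuming a plain std::istream as the input source. SkipUntilMarker is a hypothetical name for illustration; it is not the CReadBuffer class suggested above and not part of Bitcoin's code. It consumes bytes until a fixed marker string has been read and counts them in a 64-bit variable, so no file offset is ever passed to a seek call.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not Bitcoin's API): read from `in` until `marker`
// has been consumed in full. Returns true if the marker was found, leaving
// the stream positioned just past it. `consumed` is incremented once per
// byte read, using a 64-bit counter rather than a 32-bit file offset.
bool SkipUntilMarker(std::istream& in, const std::string& marker, uint64_t& consumed)
{
    assert(!marker.empty());
    std::string window; // holds the last marker.size() bytes seen
    char c;
    while (in.get(c)) {
        ++consumed;
        window.push_back(c);
        if (window.size() > marker.size()) window.erase(0, 1);
        if (window == marker) return true;
    }
    return false; // end of input reached without finding the marker
}
```

    A caller would invoke this in a loop, deserializing one block after each match. A real implementation would buffer reads instead of fetching one byte at a time.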

  6. Diapolo commented at 12:40 PM on October 24, 2012: none

    Is off_t a 64 bit unsigned integer? That would allow quite big files ^^.

    We need to fix this even with ultraprune, because of the bootstrap.dat file mentioned.

  7. sipa commented at 1:06 PM on October 24, 2012: member

    off_t is whatever the system supports for offsets, but it's not entirely standardized (there's also an off64_t sometimes, with a corresponding lseek64 function, defeating the purpose of the original off_t somewhat).

    What I'm saying is that off_t would certainly be an improvement over what we have now, but it'd be even better not to need to seek at all, which is certainly possible in LoadExternalBlockFile.

  8. jgarzik commented at 1:37 PM on October 24, 2012: contributor

    As noted in IRC, the specific problem is that LoadExternalBlockFile() calls fseek(), whose file offset is limited to signed 32-bit (long) on 32-bit platforms. This impacts Windows, Linux and presumably OSX as well.

    1. LoadExternalBlockFile() should be updated to avoid seeking. Most likely fread() will continue to work even beyond the 4GB boundary, if we read linearly and accumulate the file position ourselves.

    2. Most of the code uses 32 bits for file position, which is highly disappointing. At a minimum, we should make sure that external serialized storage in databases like ultraprune record 64-bit file positions.
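
    The linear-read approach in point 1 can be sketched as follows. This is an illustration under stated assumptions, not the fix that was eventually merged: CountBytesLinearly and kMaxSigned32 are hypothetical names, and the function only counts bytes where real code would parse blocks out of each chunk.

```cpp
#include <cstdint>
#include <cstdio>

// Largest offset a signed 32-bit `long` can hold: one byte short of 2 GiB.
constexpr int64_t kMaxSigned32 = INT32_MAX; // 2147483647

// Hypothetical sketch: read a FILE* front to back in fixed-size chunks and
// accumulate the position in a uint64_t, never calling fseek() or ftell().
uint64_t CountBytesLinearly(FILE* file)
{
    unsigned char buf[4096];
    uint64_t pos = 0; // 64-bit position maintained by the caller
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), file)) > 0) {
        pos += n; // real code would process buf[0..n) here
    }
    return pos;
}
```

    Because the position lives in a 64-bit variable owned by the caller, it can pass kMaxSigned32 without wrapping, whatever the width of long on the platform.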

  9. luke-jr commented at 1:41 PM on October 24, 2012: member

    I wonder if SEEK_CUR would work?

  10. jgarzik commented at 1:44 PM on October 24, 2012: contributor

    SEEK_CUR would probably work, but why chance it? A simple linear read works fine too.

  11. qubez commented at 8:55 PM on October 24, 2012: none

    A different solution would be to have blockchain-generating scripts create transaction block files less than 2GB in size, perhaps named bootstrap.001, bootstrap.002, etc, and have Bitcoin look for and import these sequentially instead. Another file to be concerned about is blkindex.dat: it is 1.1GB and must be replaced in all clients with the ultraprune leveldb and/or reviewed for big file support before it approaches 2GB.

  12. laanwj commented at 5:57 AM on October 25, 2012: member

    off_t is 64 bit if -D_FILE_OFFSET_BITS=64 is defined. I've just verified this with mingw and linux (someone needs to verify on OSX).

    Another problem is that we use fseek, which takes a long for the offset. The width of long depends on the architecture. We could instead use fseeko, which takes an off_t (and ftello, which returns one). @qubez: blkindex.dat is not affected; it is a Berkeley DB database, which has no problems with large files.
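
    A minimal sketch of the fseeko/ftello suggestion, assuming a POSIX-style toolchain where these functions are available. SeekAbsolute is a hypothetical helper name, and the macro is defined in the source here only for illustration; the comment above passes it on the compiler command line as -D_FILE_OFFSET_BITS=64.

```cpp
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64 // must be defined before any system header
#endif
#include <cstdint>
#include <cstdio>
#include <sys/types.h>

// Compilation fails here if off_t is still narrower than 64 bits.
static_assert(sizeof(off_t) == 8, "off_t is expected to be 64-bit");

// Hypothetical helper: seek to an absolute 64-bit offset and return the
// resulting stream position, or -1 if the seek fails.
int64_t SeekAbsolute(FILE* file, int64_t offset)
{
    if (fseeko(file, static_cast<off_t>(offset), SEEK_SET) != 0) return -1;
    return static_cast<int64_t>(ftello(file));
}
```

    With a 64-bit off_t, an offset such as 3 GiB is representable, whereas the same value cannot be stored in a 32-bit long passed to fseek.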

  13. Diapolo commented at 7:19 AM on February 11, 2013: none
  14. jgarzik commented at 7:46 AM on February 11, 2013: contributor

    Yes, it is solved

  15. jgarzik closed this on Feb 11, 2013

  16. KolbyML referenced this in commit cec4b92afe on Dec 5, 2020
  17. DrahtBot locked this on Sep 8, 2021
