bitcoin/contrib/linearize/linearize-data.py. Premature end of data. #14986

issue educob opened this issue on December 17, 2018
  1. educob commented at 5:07 pm on December 17, 2018: none

    Hi.

    I have the mainnet blockchain updated (block 554227). My app processes the whole blockchain, and for that I need to read the blocks directly from disk, as asking the node for them takes months (to get them and process them).

    I run python linearize-hashes.py linearize.cfg > hashlist.txt with max_height=554000 and then I run: python linearize-data.py linearize.cfg

    It only processes block 0 because in blk00000.dat the 2nd block starts with 0000000000000000 instead of f9beb4d9xxxxxxxx. After reading all blkxxxxxx.dat files, linearize-data.py ends without finding block 1.

    This same thing happened days ago in block around 120000 so I deleted the blockchain and started all over again (wasting almost 3 days).

    Why does blk00000.dat have a 0000000000000000 after the 1st block? How can I know where block 1 is?

    Needless to say the node is capable of returning block 1 when calling client.getBlockByHash.

    Please help.

    ps: I think that if I could read the index *.ldb files it would be much easier than linearize-data.py's shots in the dark. Where can I find info on the index files?

  2. educob commented at 10:52 pm on December 17, 2018: none

    I have found that if I just ignore the 8 bytes of zeros, everything works OK.

    So the question is, why does the node, apparently randomly, write 8 bytes with zeros between blocks in blkxxxxx.dat?

  3. MarcoFalke commented at 11:08 pm on December 17, 2018: member
    Is this helper script still useful? If so, a test should probably be added to test it. Otherwise, it might be best to remove it.
  4. educob commented at 11:20 pm on December 17, 2018: none

    I changed:

        if (not inhdr or (inhdr[0] == "\0")):
            self.inF.close()
            self.inF = None
            self.inFn = self.inFn + 1
            continue
    

    to:

        if (not inhdr or (inhdr[0] == "\0")):
            if(inhdr):
                print("inhdr was 0. inFn:" + str(self.inFn) + str(len(inhdr)))
                continue
            self.inF.close()
            self.inF = None
            self.inFn = self.inFn + 1
            continue
    

    And it’s working (still processing block 480000, but it seems it will end successfully), so it was useful. But I would like to know why the node randomly inserts these 8 zero bytes between blocks. Thanks.

    update: it correctly linearized 554000 blocks after the little modification I did.
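
The workaround above can be sketched as a standalone reader. This is a minimal sketch under stated assumptions, not the code in linearize-data.py: it assumes each record in a blk file is the 4-byte network magic (f9beb4d9 on mainnet, as quoted in the report), a 4-byte little-endian length, and then that many bytes of block data, and it treats an all-zero 8-byte header as padding to skip. The names `iter_blocks`, `make_record` and `MAINNET_MAGIC` are hypothetical and exist only for this illustration.

```python
import io
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # magic quoted in the report


def iter_blocks(stream, magic=MAINNET_MAGIC):
    """Yield raw block payloads from a blk-style stream, skipping zero headers."""
    while True:
        hdr = stream.read(8)
        if len(hdr) < 8:
            return  # end of file (or a truncated header)
        if hdr == b"\x00" * 8:
            continue  # 8 bytes of zero padding: skip them, as in the workaround
        if hdr[:4] != magic:
            raise ValueError("Invalid magic: " + hdr[:4].hex())
        (length,) = struct.unpack("<I", hdr[4:])
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("Premature end of data")
        yield payload


def make_record(payload, magic=MAINNET_MAGIC):
    """Build one synthetic record (assumed layout: magic + length + payload)."""
    return magic + struct.pack("<I", len(payload)) + payload


# Two synthetic records separated by 8 zero bytes, mimicking the reported file.
demo = io.BytesIO(make_record(b"block0") + b"\x00" * 8 + make_record(b"block1"))
blocks = list(iter_blocks(demo))
```

Note that in Python 3, indexing a bytes object returns an int, so a comparison such as `inhdr[0] == "\0"` is always False there; the sketch compares bytes against bytes instead. It also only copes with padding that comes in whole 8-byte units directly before the next record.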

  5. mc-buckets commented at 11:28 pm on May 13, 2019: none

    @MarcoFalke yes, very useful. My node regularly has data corruption issues (still can’t figure out why). Using -reindex or -reindex-chainstate does not always fix the issue, instead it usually just causes more data corruption issues. I run your script to make bootstrap.dat backup files which I use to reseed my node after it runs into issues.

    I don’t know how to write tests yet but I will try to figure it out and add some tests for this script, which I think is invaluable.

    @educob I have a similar issue, but it occurs at blk00201.dat (about 57% of the way through at height 575910 on today’s date). I get this error:

    Invalid magic: 00000000

    I modified the inMagic if statement to continue on if it fails the check but equals ‘00000000’, which has worked so far. Obviously not thrilled about this solution because I don’t know why I had to implement it to begin with. Doesn’t answer your question about the node randomly writing zeros but know that you are not alone!

  6. takinbo commented at 7:32 am on September 4, 2019: contributor
    A few weeks ago, I was having this same problem while attempting to export the block data. The problem was also discussed a few years ago in #5028. In #16802, I worked on a slightly more elegant solution that searches for the next position of the valid magic bytes rather than failing. I have run the script with block data from a node with the issue and from another without the issue, and both yield the same data output.
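
The idea of searching for the next valid magic bytes can be sketched as follows. This is an illustration of the general technique only, not the patch in #16802: it assumes the same record layout as the report (4-byte magic, 4-byte little-endian length, block data) and works on an in-memory bytes object. The names `find_blocks`, `record` and `MAINNET_MAGIC` are hypothetical.

```python
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # mainnet magic quoted in the report


def find_blocks(data, magic=MAINNET_MAGIC):
    """Return block payloads in `data`, resynchronising on the magic bytes."""
    blocks = []
    pos = 0
    while True:
        # Jump to the next occurrence of the magic, ignoring whatever
        # padding (zero bytes or otherwise) sits before it.
        pos = data.find(magic, pos)
        if pos == -1 or pos + 8 > len(data):
            break
        (length,) = struct.unpack_from("<I", data, pos + 4)
        start = pos + 8
        end = start + length
        if end > len(data):
            break  # truncated record at the end of the data
        blocks.append(data[start:end])
        pos = end
    return blocks


def record(payload, magic=MAINNET_MAGIC):
    """Build one synthetic record (assumed layout: magic + length + payload)."""
    return magic + struct.pack("<I", len(payload)) + payload


# Padding of 5 bytes, to show the scan does not rely on 8-byte alignment.
sample = record(b"a") + b"\x00" * 5 + record(b"bc")
found = find_blocks(sample)
```

Unlike skipping fixed 8-byte runs of zeros, scanning handles padding of any length. The trade-off is that the magic sequence could in principle appear by chance inside non-block bytes, so a real tool would want to validate each candidate block before accepting it.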
  7. laanwj added the label Scripts and tools on Oct 2, 2019
  8. laanwj referenced this in commit df50fd194f on Oct 8, 2019
  9. sidhujag referenced this in commit 2624540adc on Oct 8, 2019
  10. adamjonas commented at 11:07 pm on April 29, 2020: member
    Believe this was resolved by #16802.
  11. MarcoFalke closed this on Apr 29, 2020

  12. PastaPastaPasta referenced this in commit 2e6335c475 on Sep 11, 2021
  13. PastaPastaPasta referenced this in commit 158ca47c3c on Sep 11, 2021
  14. PastaPastaPasta referenced this in commit c928eae616 on Sep 12, 2021
  15. PastaPastaPasta referenced this in commit debed8d819 on Sep 12, 2021
  16. PastaPastaPasta referenced this in commit 0ab338a7bd on Sep 12, 2021
  17. PastaPastaPasta referenced this in commit e5c7de4e87 on Sep 14, 2021
  18. PastaPastaPasta referenced this in commit c9f2068cde on Sep 14, 2021
  19. PastaPastaPasta referenced this in commit bba59cabc3 on Sep 15, 2021
  20. DrahtBot locked this on Feb 15, 2022

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-06 16:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me