bitcoind crash #31712

issue frbitten opened this issue on January 22, 2025
  1. frbitten commented at 8:12 pm on January 22, 2025: none

    Is there an existing issue for this?

    • I have searched the existing issues

    Current behaviour

    My Bitcoin node crashes quite frequently, and after a crash it cannot recover. The files seem to become inaccessible and bitcoind cannot start again; I can't even read the debug.log file. Everything only works again after restarting the system.

    The problem started after I updated to version 27.0.0. Before that, I was running version 25.0.0.

    Expected behaviour

    Don’t crash

    Steps to reproduce

    I haven't identified any steps to reproduce. It happens at random times, roughly every 3 or 4 days.

    Relevant log output

    Jan 21 09:38:57 freedom-server systemd[1]: Started bitcoind.service - Bitcoin daemon.
    Jan 22 18:27:11 freedom-server systemd[1]: bitcoind.service: Main process exited, code=dumped, status=6/ABRT
    Jan 22 18:27:11 freedom-server systemd[1]: bitcoind.service: Failed with result 'core-dump'.
    Jan 22 18:27:11 freedom-server systemd[1]: bitcoind.service: Consumed 2h 53min 32.054s CPU time, 648.0M memory peak, 0B memory swap peak.
    Jan 22 18:27:11 freedom-server systemd[1]: bitcoind.service: Scheduled restart job, restart counter is at 1.
    Jan 22 18:27:11 freedom-server systemd[1]: Starting bitcoind.service - Bitcoin daemon...
    Jan 22 18:27:11 freedom-server bitcoind[58682]: Error: Settings file could not be read:
    Jan 22 18:27:11 freedom-server bitcoind[58682]: - /hdd/bitcoin/settings.json. Please check permissions.
    Jan 22 18:27:11 freedom-server systemd[1]: bitcoind.service: Control process exited, code=exited, status=1/FAILURE
    Jan 22 18:27:11 freedom-server systemd[1]: bitcoind.service: Failed with result 'exit-code'.
    Jan 22 18:27:11 freedom-server systemd[1]: Failed to start bitcoind.service - Bitcoin daemon.
    

    How did you obtain Bitcoin Core

    Pre-built binaries

    What version of Bitcoin Core are you using?

    27.0.0

    Operating system and version

    Ubuntu 24.04 LTS

    Machine specifications

    Intel(R) Core(TM) i5-4200M CPU @ 2.50GHz 8GB RAM

  2. darosior commented at 8:31 pm on January 22, 2025: member
    Jan 22 18:27:11 freedom-server bitcoind[58682]: - /hdd/bitcoin/settings.json. Please check permissions.
    

    Looks like a misconfiguration of your system, not an issue with Bitcoin Core itself.

  3. frbitten commented at 8:43 pm on January 22, 2025: none
    Jan 22 18:27:11 freedom-server bitcoind[58682]: - /hdd/bitcoin/settings.json. Please check permissions.
    

    Looks like a misconfiguration of your system, not an issue with Bitcoin Core itself.

    If you look closely, this message appears after the core dump, when systemd tries to restart bitcoind. So it is an after-effect of the crash, not its cause.

  4. maflcko commented at 11:28 am on January 23, 2025: member

    It seems unlikely that Bitcoin Core would change the file permissions of all files in the datadir. It seems a bit more likely that changing the file permissions is the reason for the crash (for example, I could imagine that writing blocks won’t work).

    What is the error message when you try to open the debug log? What is the error message when you try with root?

    What filesystem are you using? external/internal hdd? Is the hardware known to fail? Is anything else on the hdd?

  5. maflcko commented at 11:43 am on January 23, 2025: member

    Bitcoin Core makes heavy use of CPU, RAM and storage IO. Hardware defects might only become visible when running Bitcoin Core. You might want to check your hardware for defects.

    • Use software such as memtest86 to check your RAM.
    • Use software such as linpack, or Prime95 to check the CPU behaviour under load.
    • Use software such as smartctl, fsck, badblocks, or CrystalDiskInfo to test your storage device.

    Source: https://bitcoin.stackexchange.com/a/12206

  6. frbitten commented at 1:57 pm on January 23, 2025: none

    It seems unlikely that Bitcoin Core would change the file permissions of all files in the datadir. It seems a bit more likely that changing the file permissions is the reason for the crash (for example, I could imagine that writing blocks won’t work).

    What is the error message when you try to open the debug log? What is the error message when you try with root?

    What filesystem are you using? external/internal hdd? Is the hardware known to fail? Is anything else on the hdd?

    The hardware has always worked well without any issues. A few days ago I installed Ubuntu 24.04 over 22.10, but only on the system disk. I updated Bitcoin Core and CLN to recent versions, and the problem started appearing.

    When trying to view the debug file (cat) an error occurs because the file is read-only. The blockchain is on a relatively new HDD (less than 2 years old), and Ubuntu is on a separate SSD.

    I will scan the hardware to see if I can find any problems with the recommended software.

  7. maflcko commented at 2:03 pm on January 23, 2025: member

    When trying to view the debug file (cat) an error occurs because the file is read-only.

    Reading a file should be possible even when the file is read-only.

    The filesystem will usually go into read-only while the system is running if there is a filesystem consistency issue …
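    One way to test that theory the next time it happens, before rebooting, is to check whether the mount holding the datadir has been switched to read-only. The sketch below uses a hypothetical helper, is_mounted_ro (not part of any tool), that reads /proc/mounts. It assumes /hdd is the mountpoint, based on the /hdd/bitcoin path in the log above, and the grep patterns in the example assume an ext4 filesystem, which has not been confirmed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the filesystem mounted at $1 has
# the "ro" mount option. Reads /proc/mounts unless a second argument names
# another file in the same format (useful for testing).
is_mounted_ro() {
  local mnt="$1" mounts="${2:-/proc/mounts}"
  awk -v m="$mnt" '
    $2 == m {
      n = split($4, opts, ",")
      # Match "ro" exactly, so "errors=remount-ro" does not count.
      for (i = 1; i <= n; i++) if (opts[i] == "ro") found = 1
    }
    END { exit !found }
  ' "$mounts"
}

# Example (assumes the datadir lives on a filesystem mounted at /hdd):
#   if is_mounted_ro /hdd; then
#     echo "/hdd is read-only; recent kernel messages:"
#     sudo dmesg | grep -iE 'remount|I/O error|EXT4-fs error'
#   fi
```

    If the check succeeds while bitcoind is down, the kernel log usually says why the filesystem was remounted.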

  8. frbitten commented at 2:06 pm on January 23, 2025: none
    Yes, even though it is read-only, cat should show the content, but it doesn't. The content only becomes visible after restarting the computer, so I suspect there is some filesystem problem. I'm not sure whether the filesystem failure causes bitcoind to crash or the crash causes the filesystem failure.
  9. frbitten commented at 9:50 am on January 27, 2025: none

    I used fsck to repair the drive, and the crashes that the system could not recover from have stopped. But bitcoind still crashes every 3 days or so; in those cases the service restarts and recovers.

    I get this error in the debug file:

    2025-01-27T00:07:33Z ERROR: ReadRawBlockFromDisk: Read from block file failed: AutoFile::read: fread failed: iostream error for FlatFilePos (nFile=3002, nPos=110158663)

    Do I have a damaged HD? Can this damage be corrected, or the damaged area isolated? I'm not a Linux expert and I don't know any tools for these fixes.
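    The FlatFilePos in that error identifies which block file failed to read. Bitcoin Core names its raw block files blkNNNNN.dat (five digits, zero-padded) inside the blocks subdirectory, so nFile=3002 should correspond to blocks/blk03002.dat. A minimal sketch, assuming the default blocksdir (the datadir, /hdd/bitcoin here); blk_file_for and check_readable are hypothetical helper names, and reading the file end to end with dd is just one rough way to see whether the kernel reports an I/O error for it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the nFile index from a FlatFilePos error to the
# block file path, assuming blocks are stored under <datadir>/blocks.
blk_file_for() {
  local datadir="$1" nfile="$2"
  printf '%s/blocks/blk%05d.dat\n' "$datadir" "$nfile"
}

# Hypothetical helper: read a file from start to end and discard the data.
# dd exits non-zero if the read fails, e.g. on an unreadable sector.
check_readable() {
  local f="$1"
  if dd if="$f" of=/dev/null bs=1M status=none; then
    echo "OK: $f"
  else
    echo "READ FAILED: $f (check 'sudo dmesg' for I/O errors)" >&2
    return 1
  fi
}

# Example for the error above:
#   check_readable "$(blk_file_for /hdd/bitcoin 3002)"
```

    This is not a surface scan: if the file was read recently, the data may come from the page cache rather than the disk, so a clean result does not prove the drive is healthy.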

  10. darosior commented at 2:48 pm on January 27, 2025: member
    I think this issue can be closed because it’s clear it’s not an issue with Bitcoin Core but the user’s system.
  11. frbitten commented at 2:55 pm on January 27, 2025: none

    I think this issue can be closed because it’s clear it’s not an issue with Bitcoin Core but the user’s system.

    I would like some help to confirm that it is a problem with the HD and how to solve it. I would like it to remain open so that someone can help with this issue.

  12. sipa commented at 3:12 pm on January 27, 2025: member
    @frbitten Bitcoin Core, as a user-level application, is simply incapable of corrupting a filesystem, as it has no access to the raw disk. Corruption is a sign of either a hardware issue (likely) or an operating system issue (unlikely).
  13. frbitten commented at 3:35 pm on January 27, 2025: none

    @sipa
    The HD is a Crucial BX500 and is 2 years old; the node will be 2 years old in March.

    Since the HD is a good brand and has been in use for 2 years, I find it strange that there would be a problem with the hardware. Is there any way to identify where the problem is and isolate if it is bad sectors, so as not to have to buy a new HD?

  14. sipa commented at 3:37 pm on January 27, 2025: member
    @frbitten It could simply be your system being far more loaded by Bitcoin Core’s synchronization process than any other tasks it is used for, revealing issues like CPU overheating, or bad memory, or a bad disk, or bad wiring, that isn’t otherwise detectable.
  15. frbitten commented at 3:44 pm on January 27, 2025: none

    The node has always worked without problems until I installed Ubuntu version 24.04 (previously it was 22.10) and updated bitcoind to version 27.

    The drive that holds the blockchain and shows the error is a secondary drive used only for this data.

    CPU and memory always stay below 20%.

  16. jimhashhq commented at 7:09 pm on January 27, 2025: none

    In addition to the great suggestions above, I am wondering how big the internal SSD data disk is and what capacity it's at?

    $ df -k . <path-to-data-dir>
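    If you would rather check this from a script (for example a periodic cron job) than read the df table by eye, here is a minimal sketch. fs_used_pct is a hypothetical helper, and the 85% threshold in the example is an arbitrary illustration, not a Bitcoin Core recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how full (in percent, without the % sign) the
# filesystem holding directory $1 is. With -P, df prints one line per
# filesystem and the fifth column is the capacity (Use%) value.
fs_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example (threshold chosen only for illustration):
#   pct=$(fs_used_pct /hdd/bitcoin)
#   [ "$pct" -ge 85 ] && echo "data disk is ${pct}% full"
```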
    

    Also, what's the reboot policy for the node? Best practice is to run bitcoin-cli stop prior to node reboot, either manually or by adding it to the system shutdown.

    If internal SSD storage is at or near capacity, it might benefit from added storage. In lieu of adding more internal storage on commodity hardware, it might be cheaper to add external storage and reference it with the -blocksdir argument at node startup. Though this would require extra vigilance to be sure external storage is quiesced/synced before shutdown.

    Another possible explanation might be AV software inadvertently conflicting with block storage. This AV conflict was alleviated by the -blocksxor node startup switch added in v28.0.

  17. jimhashhq commented at 8:11 pm on January 27, 2025: none
    Also, maybe check the administrative account's resource limits via $ ulimit -a, and raise any conservative limits (especially stack size) to their hard limits.
  18. sipa commented at 8:14 pm on January 27, 2025: member

    Also, what's the reboot policy for the node? Best practice is to run bitcoin-cli stop prior to node reboot, either manually or by adding it to the system shutdown.

    This will not do anything; shutdown is the same whether you send the stop command or not.

    Also, maybe check the administrative account's resource limits via $ ulimit -a, and raise any conservative limits (especially stack size) to their hard limits.

    While this might prevent certain problems, it is unrelated to disk corruption.

  19. jimhashhq commented at 9:18 pm on January 27, 2025: none
    Thank you, @sipa for corrections. It does sound like I was wrongly conflating device-related “data corruption” with process/application-related “data inconsistencies.” Hopefully other capacity and configuration suggestions may be of use.
  20. frbitten commented at 9:20 pm on January 27, 2025: none

    In addition to the great suggestions above, I am wondering how big the internal SSD data disk is and what capacity it's at?

    It’s a 1TB disk. It is 90% used.

    Also, what's the reboot policy for the node? Best practice is to run bitcoin-cli stop prior to node reboot, either manually or by adding it to the system shutdown.

    bitcoind runs as a service. There is no specific command indicated for reboot.

    Also, maybe check the administrative account's resource limits via $ ulimit -a, and raise any conservative limits (especially stack size) to their hard limits.

    real-time non-blocking time  (microseconds, -R) unlimited
    core file size              (blocks, -c) 0
    data seg size               (kbytes, -d) unlimited
    scheduling priority                 (-e) 0
    file size                   (blocks, -f) unlimited
    pending signals                     (-i) 31022
    max locked memory           (kbytes, -l) 1001720
    max memory size             (kbytes, -m) unlimited
    open files                          (-n) 1024
    pipe size                (512 bytes, -p) 8
    POSIX message queues         (bytes, -q) 819200
    real-time priority                  (-r) 0
    stack size                  (kbytes, -s) 8192
    cpu time                   (seconds, -t) unlimited
    max user processes                  (-u) 31022
    virtual memory              (kbytes, -v) unlimited
    file locks                          (-x) unlimited
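    One caveat about this output: ulimit -a shows the limits of the interactive shell it runs in. A service started by systemd does not inherit those; its limits come from systemd's defaults and any Limit*= settings (such as LimitNOFILE= or LimitSTACK=) in the unit file. To see what the running bitcoind actually has, here is a minimal sketch that reads /proc/<pid>/limits. proc_limit is a hypothetical helper, and the service name bitcoind is taken from the systemd log in the original report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the soft and hard limit of one resource for a
# running process, as recorded by the kernel in /proc/<pid>/limits.
proc_limit() {
  local pid="$1" name="$2"
  grep -F "$name" "/proc/$pid/limits"
}

# Examples:
#   proc_limit "$(pidof -s bitcoind)" "Max stack size"
#   proc_limit "$(pidof -s bitcoind)" "Max open files"
#   systemctl show bitcoind -p LimitNOFILE -p LimitSTACK
```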
    
  21. jimhashhq commented at 9:55 pm on January 27, 2025: none
    If you had to fsck / repair your filesystem, it may be unavoidable, you may need new storage device(s). System error logs from around the times of the crashes (/var/log/syslog) might help you shed light on this. Also, 90% capacity seems (to me) kind of high for one SSD that has to do everything: OS, swap, tmp, apps, blockchain data, etc. Also, you did not mention AV software. You might at least benefit from cheap external storage, especially if you are not running pruned. That, or switch to running pruned if you are not already.
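    To look at the system log around a specific crash, here is a minimal sketch that computes a time window around a timestamp such as the 18:27:11 core dump in the original report. crash_window is a hypothetical helper and assumes GNU date (as on Ubuntu). The journalctl example further assumes the journal is persistent: the reporter had to reboot after the crashes, and kernel messages from an earlier boot are only available if they were written to disk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "since|until" timestamps spanning N minutes
# (default 10) before and after time $1, in local time. Requires GNU date.
crash_window() {
  local t="$1" mins="${2:-10}" s
  s=$(date -d "$t" +%s) || return 1
  printf '%s|%s\n' \
    "$(date -d "@$((s - mins * 60))" '+%Y-%m-%d %H:%M:%S')" \
    "$(date -d "@$((s + mins * 60))" '+%Y-%m-%d %H:%M:%S')"
}

# Example: kernel messages around the crash in the original report.
#   IFS='|' read -r since until < <(crash_window '2025-01-22 18:27:11')
#   journalctl _TRANSPORT=kernel --since "$since" --until "$until" \
#     | grep -iE 'I/O error|ata[0-9]|remount|EXT4-fs'
```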
  22. frbitten commented at 10:20 pm on January 27, 2025: none

    @jimhashhq The 1TB SSD is only for bitcoind and lightning. The system is on another 124G SSD that is only 10% used. This system SSD contains OS, swap, tmp, apps.

    I run a full node. Is a pruned node reliable to use with CLN?

    What is “AV software”? Antivirus? I don’t understand what software you are talking about. The hardware is dedicated for the bitcoin node and CLN. I don’t use it as a desktop and it has the Ubuntu Server Lite version installed.

  23. davidgumberg commented at 10:28 pm on January 27, 2025: contributor

    If you had to fsck / repair your filesystem, it may be unavoidable, you may need new storage device(s).

    +1

    Since the HD is a good brand and has been in use for 2 years, I find it strange that there would be a problem with the hardware.

    Disks from every brand have failed at every age, and it is reasonable that some examples of a generally reliable drive will fail after 2 years. It's possible that the manufacturer of your drive will replace it under warranty if it has failed before its advertised lifetime (https://www.crucial.com/ssd/bx500).

    You might want to check the SMART data of the drive, doing something like:

    $ sudo apt install smartmontools
    $ sudo smartctl -Hc /dev/sdX && sudo smartctl -A /dev/sdX
    $ # replace X with your drive letter; if you're not sure what that is, try 'sudo fdisk -l' or 'lsblk'
    

    This article: https://www.linuxjournal.com/article/6983 has helpful information about how to interpret the output of smartctl -A /dev/sdX

  24. jimhashhq commented at 10:58 pm on January 27, 2025: none

    @frbitten, it does sound to me personally like something may be going on with the physical storage.

    I mentioned the AV software (yes, antivirus) thing only because I saw something about file permissions changing, which made me think of external process control/monitoring. I regret bringing up the AV topic now, as it is unlikely to be related and confused the issue. Also, I brought up the ulimit/stack thing because stack faults can cause a SIGABRT for any Linux process (not just Bitcoin). Again, this is unlikely.

    I personally feel like this may be at a jumping off point where you might need to go off by yourself and try:

    • Deeper physical/logical storage diagnosis.
    • Swapping out new/different storage.

    Though the latter might be the cheapest/simplest. Or maybe swapping the commodity system for another one altogether.

  25. frbitten commented at 9:51 am on January 28, 2025: none

    You might want to check the SMART data of the drive, doing something like:

    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       679
      5 Reallocate_NAND_Blk_Cnt 0x0032   074   074   010    Old_age   Always       -       13
      9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       16603
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       12
    171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
    172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
    173 Ave_Block-Erase_Count   0x0032   000   000   000    Old_age   Always       -       14561
    174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       2
    180 Unused_Reserve_NAND_Blk 0x0033   100   100   000    Pre-fail  Always       -       36
    183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
    184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       679
    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       142
    194 Temperature_Celsius     0x0022   072   055   000    Old_age   Always       -       28 (Min/Max 20/45)
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       13
    197 Current_Pending_ECC_Cnt 0x0032   100   100   000    Old_age   Always       -       13
    198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       142
    199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
    202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100
    206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
    210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       63
    246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       25188647128
    247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       787145222
    248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       136863118272
    249 Unkn_CrucialMicron_Attr 0x0032   100   100   000    Old_age   Always       -       0
    251 Unkn_CrucialMicron_Attr 0x0032   100   100   000    Old_age   Always       -       170062911
    252 Unkn_CrucialMicron_Attr 0x0032   100   100   000    Old_age   Always       -       676
    253 Unkn_CrucialMicron_Attr 0x0032   100   100   000    Old_age   Always       -       1
    

    According to SMART, the problem may be in item 202 Percent_Lifetime_Remain. But I find it strange that an SSD’s lifespan ends so quickly.
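    A few attributes in that table say the most about wear. Attribute 202 has a normalized VALUE of 000 against a THRESH of 001, which is why smartctl marks it FAILING_NOW; on Crucial drives its raw value (100 here) is commonly read as the percentage of rated lifetime already used. A minimal sketch that pulls the wear-related attributes out of smartctl -A output: smart_wear_summary is a hypothetical helper, the attribute IDs are the Crucial/Micron ones shown above, and converting Total_LBAs_Written to bytes assumes 512-byte LBAs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise wear-related SMART attributes from
# "smartctl -A" output on stdin. Column layout assumed:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_wear_summary() {
  awk '
    $1 == 5   { print "Reallocated NAND blocks:   " $NF }
    $1 == 173 { print "Average block erase count: " $NF }
    $1 == 187 { print "Reported uncorrectable:    " $NF }
    $1 == 202 { print "Attribute 202:             value " $4 ", thresh " $6 ", raw " $NF ", " $9 }
    # Assumes 512-byte LBAs when converting attribute 246 to terabytes.
    $1 == 246 { printf "Host writes:               %.1f TB\n", $NF * 512 / 1e12 }
  '
}

# Example:
#   sudo smartctl -A /dev/sdX | smart_wear_summary
```

    Fed the table above, this reports about 12.9 TB of host writes (25188647128 × 512 bytes) alongside an average block erase count of 14561.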

    What is the most suitable SSD and how to use it to ensure a long life for your equipment? Is it better to have one large SSD or 2 smaller ones?

  26. maflcko commented at 7:34 pm on February 5, 2025: member

    Usually the issue tracker is used to track technical issues related to the Bitcoin Core code base.

    General bitcoin questions and/or support requests are best directed to the Bitcoin StackExchange or the #bitcoin IRC channel on Libera Chat, or one of the Bitcoin subreddits, or any other place that you feel is well suited.

  27. maflcko closed this on Feb 5, 2025

  28. maflcko added the label Data corruption on Feb 5, 2025
  29. maflcko added the label Questions and Help on Feb 5, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-02-22 15:12 UTC
