test: Handle functional test disk-full error #29335

BrandonOdiwuor commented at 8:35 AM on January 28, 2024: contributor

Fixes: #23099

Handle disk-full more gracefully in functional tests

DrahtBot commented at 8:35 AM on January 28, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	cbergqvist, tdb3, itornaza, achow101
Concept ACK	epiccurious
Stale ACK	maflcko, kevkevinpal, davidgumberg

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#29387 (test: fix RPC coverage check by BrandonOdiwuor)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot added the label Tests on Jan 28, 2024

in test/functional/test_runner.py:34 in f6703be2ac outdated

  29 | @@ -30,6 +30,9 @@
  30 |  
  31 |  os.environ["REQUIRE_WALLET_TYPE_SET"] = "1"
  32 |  
  33 | +# Should fail testing and immediately stop if free space falls below.
  34 | +MIN_FREE_SPACE = (50 + 16) * 1024 * 1024

maflcko commented at 12:02 PM on January 28, 2024:

Where comes 50+16 from?

BrandonOdiwuor commented at 12:13 PM on January 28, 2024:

I based the values based on the discussions on #23100 (comment) and #23100 (review) and Screenshot from 2024-01-28 15-08-25

maflcko commented at 2:38 PM on January 29, 2024:

What is the total size of the directory when no test dir is cleaned up and all tests are run?

BrandonOdiwuor commented at 11:08 AM on January 30, 2024:

The total size of the generated testdir after running ./test_runner.py --nocleanup is 11GB on my system Screenshot from 2024-01-30 13-58-06

See the list of the generated folders and their sizes on the attached file below: sudo du -h /tmp/test_runner_₿_🏃_20240130_125641/ > ~/tmp-test_runner_₿_🏃_20240130_125641-test-files.txt

tmp-test_runner_₿_🏃_20240130_125641-test-files.txt

maflcko commented at 11:15 AM on January 30, 2024:

1.1G	/tmp/test_runner_₿_🏃_20240130_125641/feature_block_289

So this value seems wrong either way? How can 66 MB be enough to fit 1.1GB, or even 11GB?

BrandonOdiwuor commented at 12:59 PM on January 30, 2024:

Looks like I need at least 1.25GB (sudo mount -t tmpfs -o size=1250M none /tmp) to run the tests on my system with no errors

I have plotted a map using gnuplot from the data collected tracking the size of the tmpdir when running the tests using: watch -n 10 "sudo du -s /tmp/test_runner_₿_🏃_20240130_153709/ | awk '{print strftime(\"%H:%M:%S\"), int(\$1/1024)}' | tee -a ~/watch-tee1.log"

directory_size_plot1

directory_size_plot

in test/functional/test_runner.py:647 in f6703be2ac outdated

 641 | @@ -639,6 +642,11 @@ def run_tests(*, test_list, src_dir, build_dir, tmpdir, jobs=1, enable_coverage=
 642 |                      logging.debug("Early exiting after test failure")
 643 |                      break
 644 |  
 645 | +                testdir_disk_usage_stats = shutil.disk_usage(testdir)
 646 | +                if (testdir_disk_usage_stats.free < MIN_FREE_SPACE):
 647 | +                    logging.debug("Not space enough on device: %s" % testdir)

maflcko commented at 12:03 PM on January 28, 2024:

nit: In python the () are not needed. Also, could use f-strings?

BrandonOdiwuor commented at 2:22 PM on January 29, 2024:

fixed

in test/functional/test_runner.py:645 in f6703be2ac outdated

 641 | @@ -639,6 +642,11 @@ def run_tests(*, test_list, src_dir, build_dir, tmpdir, jobs=1, enable_coverage=
 642 |                      logging.debug("Early exiting after test failure")
 643 |                      break
 644 |  
 645 | +                testdir_disk_usage_stats = shutil.disk_usage(testdir)

maflcko commented at 10:34 AM on January 29, 2024:

I wonder if the check could be done upfront, to not even run any test, if it is unclear whether the device will even hold enough to run the tests.

BrandonOdiwuor commented at 2:23 PM on January 29, 2024:

moved the check up run_checks(...)

BrandonOdiwuor force-pushed on Jan 29, 2024

BrandonOdiwuor marked this as ready for review on Jan 29, 2024

BrandonOdiwuor requested review from maflcko on Jan 29, 2024

kevkevinpal commented at 10:23 AM on January 30, 2024: contributor

Concept ACK ebb842a

in test/functional/test_runner.py:583 in ebb842ae4f outdated

 576 | @@ -574,6 +577,11 @@ def run_tests(*, test_list, src_dir, build_dir, tmpdir, jobs=1, enable_coverage=
 577 |      if not result.wasSuccessful():
 578 |          sys.exit("Early exiting after failure in TestFramework unit tests")
 579 |  
 580 | +    # Exit the test if there is not enough space on the testing dir
 581 | +    testdir_disk_usage_stats = shutil.disk_usage(tmpdir)
 582 | +    if (testdir_disk_usage_stats.free < MIN_FREE_SPACE):
 583 | +        sys.exit(f"Tests aborted: Insufficient space available on the device. Directory: {tmpdir}")

kevkevinpal commented at 10:26 AM on January 30, 2024:

I think we can reword this to Tests aborted: Insufficient space available in directory: {tmpdir}

Since the device may have space but the tmpdir we are using might have insufficient space

BrandonOdiwuor force-pushed on Jan 30, 2024

BrandonOdiwuor requested review from kevkevinpal on Jan 30, 2024

epiccurious commented at 2:02 PM on January 31, 2024: contributor

Tested ACK.

Aborts as expected with low disk space.

Temporary test directory at /tmp/test_runner_₿_🏃_20240131_100802
Running Unit Tests for Test Framework Modules
.....................
----------------------------------------------------------------------
Ran 21 tests in 22.349s

OK
Tests aborted: Insufficient space available in directory: /tmp/test_runner_₿_🏃_20240131_100802

epiccurious commented at 2:18 AM on February 11, 2024: contributor

Right now, the test aborts for low disk space after running the Unit Tests for Test Framework Modules.

Instead, does it make sense to abort for low disk space before?

BrandonOdiwuor force-pushed on Feb 21, 2024

BrandonOdiwuor commented at 2:58 PM on February 21, 2024: contributor

@epiccurious moved aborting the tests before the Unit Tests for Test Framework Modules.

in test/functional/test_runner.py:567 in 0d30cffec0 outdated

 561 | @@ -559,6 +562,11 @@ def run_tests(*, test_list, src_dir, build_dir, tmpdir, jobs=1, enable_coverage=
 562 |      if os.path.isdir(cache_dir):
 563 |          print("%sWARNING!%s There is a cache directory here: %s. If tests fail unexpectedly, try deleting the cache directory." % (BOLD[1], BOLD[0], cache_dir))
 564 |  
 565 | +    # Exit the test if there is not enough space on the testing dir
 566 | +    testdir_disk_usage_stats = shutil.disk_usage(tmpdir)
 567 | +    if (testdir_disk_usage_stats.free < MIN_FREE_SPACE):

maflcko commented at 3:10 PM on February 21, 2024:

    if testdir_disk_usage_stats.free < MIN_FREE_SPACE:

nit: no need for () in python

BrandonOdiwuor commented at 3:16 PM on February 21, 2024:

fixed

BrandonOdiwuor force-pushed on Feb 21, 2024

BrandonOdiwuor requested review from maflcko on Feb 21, 2024

in test/functional/test_runner.py:567 in 865dbb80a6 outdated

 561 | @@ -559,6 +562,11 @@ def run_tests(*, test_list, src_dir, build_dir, tmpdir, jobs=1, enable_coverage=
 562 |      if os.path.isdir(cache_dir):
 563 |          print("%sWARNING!%s There is a cache directory here: %s. If tests fail unexpectedly, try deleting the cache directory." % (BOLD[1], BOLD[0], cache_dir))
 564 |  
 565 | +    # Exit the test if there is not enough space on the testing dir
 566 | +    testdir_disk_usage_stats = shutil.disk_usage(tmpdir)
 567 | +    if testdir_disk_usage_stats.free < MIN_FREE_SPACE:

maflcko commented at 3:31 PM on February 21, 2024:

    # Exit the test if there is not enough space on the testing dir
 
    if shutil.disk_usage(tmpdir).free < MIN_FREE_SPACE:

maflcko commented at 3:31 PM on February 21, 2024:

nit: can just inline this?

BrandonOdiwuor commented at 6:31 AM on February 22, 2024:

fixed

maflcko approved

maflcko commented at 3:31 PM on February 21, 2024: member

lgtm

BrandonOdiwuor force-pushed on Feb 22, 2024

BrandonOdiwuor requested review from maflcko on Feb 22, 2024

in test/functional/test_runner.py:665 in dbe3ed5fe5 outdated

 661 | @@ -655,7 +662,7 @@ def run_tests(*, test_list, src_dir, build_dir, tmpdir, jobs=1, enable_coverage=
 662 |  
 663 |      all_passed = all_passed and coverage_passed
 664 |  
 665 | -    # Clean up dangling processes if any. This may only happen with --failfast option.
 666 | +    # Clean up dangling processes if any. This may only happen with --failfast option or not enough space on test directory.

maflcko commented at 7:27 AM on February 22, 2024:

This change is wrong, no?

In general, please pay attention to the details yourself, before and after pushing the changes. The goal of review is not to write the changeset for you line-by-line, but to double check a finished, complete changeset for any accidental oversights.

BrandonOdiwuor commented at 9:02 AM on February 22, 2024:

fixed

BrandonOdiwuor force-pushed on Feb 22, 2024

BrandonOdiwuor requested review from maflcko on Feb 22, 2024

maflcko commented at 9:11 AM on February 22, 2024: member

review ACK 16baf8651d17ac1e0c7d7fd3526c19ad018ceb9c

DrahtBot removed review request from maflcko on Feb 22, 2024

DrahtBot removed review request from kevkevinpal on Feb 22, 2024

DrahtBot requested review from kevkevinpal on Feb 22, 2024

kevkevinpal commented at 3:56 AM on February 23, 2024: contributor

tested ACK 16baf86

This may be out of scope for this PR but I have filled up an SD card almost full and when I run the following,

I get the correct response

./test/functional/test_runner.py -t /Volumes/32GBSDCARD/

Temporary test directory at /Volumes/32GBSDCARD//test_runner_₿_🏃_20240222_215044
Tests aborted: Insufficient space available in directory: /Volumes/32GBSDCARD//test_runner_₿_🏃_20240222_215044

But when I run an individual test for example

./test/functional/mempool_accept.py --tmpdir /Volumes/32GBSDCARD/ I get the following error

Traceback (most recent call last):
  File "/Users/kevkevin/DEVDIR/bitcoin/test/functional/test_framework/test_framework.py", line 130, in main
    self.setup()
  File "/Users/kevkevin/DEVDIR/bitcoin/test/functional/test_framework/test_framework.py", line 266, in setup
    os.makedirs(self.options.tmpdir, exist_ok=False)
  File "<frozen os>", line 225, in makedirs
FileExistsError: [Errno 17] File exists: '/Volumes/32GBSDCARD'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/kevkevin/DEVDIR/bitcoin/test/functional/test_framework/test_framework.py", line 148, in main
    self.log.exception("Unexpected exception caught during testing")
    ^^^^^^^^
AttributeError: 'MempoolAcceptanceTest' object has no attribute 'log'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/kevkevin/DEVDIR/bitcoin/./test/functional/mempool_accept.py", line 376, in <module>
    MempoolAcceptanceTest().main()
  File "/Users/kevkevin/DEVDIR/bitcoin/test/functional/test_framework/test_framework.py", line 154, in main
    exit_code = self.shutdown()
                ^^^^^^^^^^^^^^^
  File "/Users/kevkevin/DEVDIR/bitcoin/test/functional/test_framework/test_framework.py", line 304, in shutdown
    if self.success == TestStatus.FAILED and self.options.pdbonfailure:
       ^^^^^^^^^^^^
AttributeError: 'MempoolAcceptanceTest' object has no attribute 'success'

The disk that I used

df -h /dev/disk8s1 30Gi 29Gi 160Mi 100% 2.5k 1.6M 0% /Volumes/32GBSDCARD

DrahtBot removed review request from kevkevinpal on Feb 23, 2024

maflcko commented at 8:29 AM on February 23, 2024: member

@kevkevinpal --tmpdir must not exist. In python this is expressed via exist_ok=False. Feel free to fixup the docstring with that. In any case, --tmpdir is different from --tmpdirprefix.

in test/functional/test_runner.py:34 in 16baf8651d outdated

  29 | @@ -30,6 +30,9 @@
  30 |  
  31 |  os.environ["REQUIRE_WALLET_TYPE_SET"] = "1"
  32 |  
  33 | +# Should fail testing and immediately stop if free space falls below.
  34 | +MIN_FREE_SPACE = 1250 * 1024 * 1024

cbergqvist commented at 1:22 PM on February 23, 2024:

2024-02-23 14-08-26_Disk_usage

Thanks for the monitoring script you provided earlier @BrandonOdiwuor! Was able to get disk usage almost at 5GB using:

$ ./test/functional/test_runner.py --cachedir=/mnt/tmp/cache --tmpdir=/mnt/tmp --extended --jobs=16

So maybe some heuristic could be applied when running many jobs in parallel? Default number of jobs is 4, so maybe something like

MIN_FREE_SPACE = max(jobs, 4) * 320 * 1024 * 1024

That way we get 1280 * 1024 * 1024 for 4 jobs or less, and 5120 * 1024 * 1024 for 16 jobs.

maflcko commented at 1:53 PM on February 23, 2024:

Not sure how much logic this deserves. There is also --nocleanup, see #29335 (review), which should be the max

in test/functional/test_runner.py:567 in 16baf8651d outdated

 561 | @@ -559,6 +562,10 @@ def run_tests(*, test_list, src_dir, build_dir, tmpdir, jobs=1, enable_coverage=
 562 |      if os.path.isdir(cache_dir):
 563 |          print("%sWARNING!%s There is a cache directory here: %s. If tests fail unexpectedly, try deleting the cache directory." % (BOLD[1], BOLD[0], cache_dir))
 564 |  
 565 | +    # Exit the test if there is not enough space on the testing dir
 566 | +    if shutil.disk_usage(tmpdir).free < MIN_FREE_SPACE:
 567 | +        sys.exit(f"Tests aborted: Insufficient space available in directory: {tmpdir}")

cbergqvist commented at 1:27 PM on February 23, 2024:

Since the MIN_FREE_SPACE value is a bit of a guess, it might make sense to either make this into a VERY LOUD non-fatal warning, or add an arg like --non-fatal-disk-space-check to make it into a warning. That way people running in a bit resource-constrained environments and/or only interested in a specific low-disk-usage test don't need to edit the MIN_FREE_SPACE to be able to run their tests.

maflcko commented at 1:54 PM on February 23, 2024:

Seems fine to turn it into a warning.

BrandonOdiwuor commented at 2:46 PM on February 26, 2024:

@maflcko I have converted this to a warning as suggested

cbergqvist changes_requested

DrahtBot added the label Needs rebase on Feb 26, 2024

BrandonOdiwuor force-pushed on Feb 26, 2024

BrandonOdiwuor requested review from maflcko on Feb 26, 2024

BrandonOdiwuor requested review from kevkevinpal on Feb 26, 2024

BrandonOdiwuor requested review from cbergqvist on Feb 26, 2024

cbergqvist approved

cbergqvist commented at 3:20 PM on February 26, 2024: contributor

ACK 951a004!

DrahtBot removed review request from kevkevinpal on Feb 26, 2024

DrahtBot requested review from kevkevinpal on Feb 26, 2024

DrahtBot removed the label Needs rebase on Feb 26, 2024

marcofleon commented at 6:51 AM on February 28, 2024: contributor

I managed to create a disk image with available space just above MIN_FREE_SPACE such that the warning is bypassed when running test_runner.py but one of the tests (feature_signet.py) fails with error:

    raise AssertionError("Unexpected stderr {} != {}".format(stderr, expected_stderr))
AssertionError: Unexpected stderr Warning: Disk space for "/Volumes/Untitled/test_runner_₿_🏃_20240227_223648/feature_signet_91/node2/signet/blocks" may not accommodate the block files. Approximately 1 GB of data will be stored in this directory. !=

All of the tests pass for me when there's about 1.35 GB of space available, so 1280*1024*1024 would probably work better.

However, we've already acknowledged that this variable is sort of just a guess, so I'm not sure how critical it is to get exactly right. I'm also not sure how different these tests' disk space usage is across systems/machines. The fact that it's shown only as a warning makes sense then.

nit: I think the warning could be louder. The check above uses bold in its warning. Maybe this one can follow suit?

DrahtBot removed review request from kevkevinpal on Feb 28, 2024

DrahtBot requested review from kevkevinpal on Feb 28, 2024

DrahtBot removed review request from kevkevinpal on Feb 28, 2024

DrahtBot requested review from kevkevinpal on Feb 28, 2024

davidgumberg commented at 12:11 AM on March 5, 2024: contributor

I think it would be good to stop the test runner once we actually run into an out of disk space error

Something like this:

 while i < test_count:
     if failfast and not all_passed:
         break
     for test_result, testdir, stdout, stderr, skip_reason in job_queue.get_next():
         test_results.append(test_result)
         i += 1
         done_str = "{}/{} - {}{}{}".format(i, test_count, BOLD[1], test_result.name, BOLD[0])
         if test_result.status == "Passed":
             logging.debug("%s passed, Duration: %s s" % (done_str, test_result.time))
         elif test_result.status == "Skipped":
             logging.debug(f"{done_str} skipped ({skip_reason})")
         else:
             all_passed = False
             print("%s failed, Duration: %s s\n" % (done_str, test_result.time))
             print(BOLD[1] + 'stdout:\n' + BOLD[0] + stdout + '\n')
             print(BOLD[1] + 'stderr:\n' + BOLD[0] + stderr + '\n')
             if combined_logs_len and os.path.isdir(testdir):
                 pass #[omitted]

+       if stderr.strip().split('\n')[-1] == "OSError: [Errno 28] No space left on device":
+           logging.debug(f"Early exiting after test failure due to disk running out of space in {tmpdir}")
+           break
+
        if failfast:
            logging.debug("Early exiting after test failure")
            break

We could also avoid the question of picking a good number for warning if we used this approach without printing any warning, while still solving the underlying issue of stopping the test runner from continuing to dump test failures and errors that are not obviously related to being out of disk space, and having an opportunity to communicate the reason for failure to the user.

I am not opposed to the warning message, but if we keep it I think it should

use a number that is closer to the upper bound of space that might be used
use language more like: there may be insufficient free space in {tmpdir} to run the bitcoin functional test suite. Running the test suite with fewer than {amt} of free space might cause tests to fail.

DrahtBot removed review request from kevkevinpal on Mar 5, 2024

DrahtBot requested review from kevkevinpal on Mar 5, 2024

DrahtBot removed review request from kevkevinpal on Mar 5, 2024

DrahtBot requested review from kevkevinpal on Mar 5, 2024

kevkevinpal commented at 3:20 PM on March 10, 2024: contributor

reACK 951a00440539ba1faba62c49c9cf0a7d8c950769

Tested with --failfast and normally and both seem to be working properly with the correct error message

DrahtBot removed review request from kevkevinpal on Mar 10, 2024

BrandonOdiwuor commented at 6:57 AM on March 11, 2024: contributor

cc @maflcko could you have another look at this

maflcko commented at 7:56 AM on March 11, 2024: member

@BrandonOdiwuor As the pull request author, the burden is on you to read comments and reply to them, if they have questions.

in test/functional/test_runner.py:34 in 951a004405 outdated

  29 | @@ -30,6 +30,9 @@
  30 |  
  31 |  os.environ["REQUIRE_WALLET_TYPE_SET"] = "1"
  32 |  
  33 | +# Minimum amount of space to run the tests.
  34 | +MIN_FREE_SPACE = 1250 * 1024 * 1024

achow101 commented at 7:48 PM on March 22, 2024:

When a high enough parallelism is used, this is not enough.

It may be better to scale this based on the number of jobs that will be run in parallel. Could put a lower bound of 1.1 GB for feature_block which is an outlier, then start increasing by 100 MB for each additional job that will be run.

Also, if --nocleanup is provided, we'd probably want to warn at the size where nothing is cleaned up.

BrandonOdiwuor commented at 8:35 AM on March 27, 2024:

@achow101 could you elaborate more on this

Also, if --nocleanup is provided, we'd probably want to warn at the size where nothing is cleaned up.

BrandonOdiwuor commented at 8:36 AM on March 27, 2024:

I have updated the PR as suggested:

When a high enough parallelism is used, this is not enough.

It may be better to scale this based on the number of jobs that will be run in parallel. Could put a lower bound of 1.1 GB for feature_block which is an outlier, then start increasing by 100 MB for each additional job that will be run.

achow101 commented at 9:45 PM on March 27, 2024:

@achow101 could you elaborate more on this

Also, if --nocleanup is provided, we'd probably want to warn at the size where nothing is cleaned up.

--nocleanup requires about 12 GB, so if that option is passed, the check should be for that amount.

BrandonOdiwuor commented at 9:24 AM on March 28, 2024:

updated the PR to include the check for minimum space required for --nocleanup

achow101 commented at 7:50 PM on March 22, 2024: member

I agree that if a test fails out of disk space, then we should stop running any further tests as well.

I think it would be useful for each test to also have a disk space check, and that could even have its own exit code that the test runner can use to detect when a test failed because of low disk space.

BrandonOdiwuor commented at 1:52 PM on March 26, 2024: contributor

@achow101 @davidgumberg would it be a good idea to do both i.e include the check with a warning at the beginning and also check after every test is run if any fails then stop running any further test?

achow101 commented at 7:07 PM on March 26, 2024: member

@achow101 @davidgumberg would it be a good idea to do both i.e include the check with a warning at the beginning and also check after every test is run if any fails then stop running any further test?

Yes, I think so.

BrandonOdiwuor force-pushed on Mar 27, 2024

BrandonOdiwuor commented at 8:34 AM on March 27, 2024: contributor

I have updated the PR to include both the warning at the start and an exit if any tests fail due to insufficient space as suggested above

BrandonOdiwuor requested review from achow101 on Mar 27, 2024

BrandonOdiwuor requested review from kevkevinpal on Mar 27, 2024

BrandonOdiwuor requested review from marcofleon on Mar 27, 2024

BrandonOdiwuor requested review from cbergqvist on Mar 27, 2024

BrandonOdiwuor force-pushed on Mar 27, 2024

BrandonOdiwuor force-pushed on Mar 28, 2024

itornaza commented at 6:23 PM on April 2, 2024: contributor

tested ACK 2f7fcd5fe7289f7ad3efd566dbbd725264654f8f

The user is notified when appropriate that there might be insufficient free space to run the tests. Moreover, if a test fails due to insufficient space, the functional test runner halts. Also, there is a relevant warning when the --nocleanup flag is used and hence the free disk requirements get bigger.

Note: All tests run on macos 14.4.1 (23E224)

Standard tests

Did a code review and checked that there is adequate space on the file system to run the tests
Configured with --with-incompatible-bdb and --enable-suppress-external-warnings
Run unit tests with make check and all tests pass
Run all functional tests with no extra flags and all tests pass and no warnings about free space
Run all functional tests with --extended and all tests pass as well

Manual tests

Used -j 13 for the number of jobs to be run in parallel since the M2 Pro Max chip supports them. Calculated the required space to run the tests with the 13 parallel jobs as described in the pr:

1.1 * 1024 + (13 - 1) * 100 MB = 2326 MB

and reated an image at /Volumes/tmp with the a little less space set to 2.2 GB, with the df -h /Volumes/tmp command returning:

Filesystem      Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/disk4s2   2.2Gi    16Mi   2.2Gi     1%       4  4.3G    0%   /Volumes/tmp

Running the functional tests with ./test/functional/test_runner.py -j13 -t /Volumes/tmp I get the warning up front and all tests are executed.

Temporary test directory at /Volumes/tmp/test_runner_₿_🏃_20240402_192153
WARNING! There may be insufficient free space in /Volumes/tmp/test_runner_₿_🏃_20240402_192153 to run the Bitcoin functional test suite. Running the test suite with fewer than 2326.0 MB of free space might cause tests to fail.

Kept reducing the available space in the /Volumes/tmp partition, and a test eventually fails at 1.8 GB of free space

Filesystem      Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/disk4s1   1.8Gi    12Mi   1.8Gi     1%      13  4.3G    0%   /Volumes/tmp

with the following error:

Early exiting after test failure due to disk running out of space in /Volumes/tmp/test_runner_₿_🏃_20240402_202655

`--nocleanup`

Testing the with the ./test/functional/test_runner.py -j13 -t /Volumes/tmp --nocleanup requirement for a minimum of 12 GB, with a partition of 11 GB and the df -h displaying:

Filesystem      Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/disk4s2    11Gi    27Mi    11Gi     1%       4  4.3G    0%   /Volumes/tmp

I get the expected insufficient free space warning and all tests are executed.

WARNING! There may be insufficient free space in /Volumes/tmp/test_runner_₿_🏃_20240402_203805 to run the functional test suite with --nocleanup. A minimum of 12 GB of free space is required.

Tested down to at 5.5 GB and eventually a test fails halting the test runner as well.

Early exiting after test failure due to disk running out of space in /Volumes/tmp/test_runner_₿_🏃_20240402_210436

in test/functional/test_runner.py:671 in 2f7fcd5fe7 outdated

 666 | @@ -646,6 +667,9 @@ def run_tests(*, test_list, src_dir, build_dir, tmpdir, jobs=1, enable_coverage=
 667 |                      logging.debug("Early exiting after test failure")
 668 |                      break
 669 |  
 670 | +                if "[Errno 28] No space left on device" in stdout:
 671 | +                    sys.exit(f"Early exiting after test failure due to disk running out of space in {tmpdir}")

tdb3 commented at 5:13 PM on April 5, 2024:

nit: The directory is left full upon exit. The user would need to clean it up before re-running tests. Having the user manually clean up the directory seems reasonable to me (the user may want to see actions leading to the failure). An enhanced error message adds value.

diff --git a/test/functional/test_runner.py b/test/functional/test_runner.py
index dd3ea370bd..f08eb63b21 100755
--- a/test/functional/test_runner.py
+++ b/test/functional/test_runner.py
@@ -668,7 +668,9 @@ def run_tests(*, test_list, src_dir, build_dir, tmpdir, jobs=1, enable_coverage=
                     break
 
                 if "[Errno 28] No space left on device" in stdout:
-                    sys.exit(f"Early exiting after test failure due to disk running out of space in {tmpdir}")
+                    sys.exit(f"Early exiting after test failure due to insuffient free space in {tmpdir}\n"
+                             f"Test execution data left in {tmpdir}.\n"
+                             f"Additional storage is needed to execute testing.")
 
     print_results(test_results, max_len_name, (int(time.time() - start_time)))

BrandonOdiwuor commented at 12:31 PM on April 11, 2024:

updated the exit error message

tdb3 commented at 5:13 PM on April 5, 2024: contributor

Nice work. Adding free storage awareness is a great addition.

ACK for 2f7fcd5fe7289f7ad3efd566dbbd725264654f8f. Left a nit inline.

Used a small ramdisk (1.2GB) with high jobs count (16) to force space exhaustion. As expected, received the initial warning Running the test suite with fewer than 2626.0 MB of free space might cause tests to fail. and also received the exiting error Early exiting after test failure due to disk running out of space...

fanquake referenced this in commit a160a6a081 on Apr 9, 2024

BrandonOdiwuor force-pushed on Apr 11, 2024

BrandonOdiwuor requested review from tdb3 on Apr 11, 2024

BrandonOdiwuor commented at 12:37 PM on April 11, 2024: contributor

Could you please take another look at this after incorporating the suggested changes @achow101 @maflcko @kevkevinpal @davidgumberg @epiccurious @tdb3 @itornaza

itornaza commented at 6:45 PM on April 11, 2024: contributor

tested re-ACK for 858fa78637041ae704005d4b6e564dd8245f4b5d

No size restrictions

Did a code review and checked that there is adequate space on the file system to run the tests
Configured with --with-incompatible-bdb and --enable-suppress-external-warnings
Ran unit tests with make check and all tests pass
Ran all functional tests with no extra flags, all tests pass and no warnings about free space are displayed
Ran all functional tests with --extended and all tests pass as well

Limited test space enforced using a ramdisk

Ran the functional tests with the following options:

./test/functional/test_runner.py -t /Volumes/ramdisk with a ramdisk of 512 MB
./test/functional/test_runner.py -t /Volumes/ramdisk --nocleanup with a ramdisk of 4 GB

The expected warnings are displayed upfront:

WARNING! There may be insufficient free space in /Volumes/ramdisk/test_runner_₿_🏃_20240411_211144 to run the Bitcoin functional test suite. Running the test suite with fewer than 1426.0 MB of free space might cause tests to fail.

and for the --nocleanup case:

WARNING! There may be insufficient free space in /Volumes/ramdisk/tmp/test_runner_₿_🏃_20240411_213828 to run the functional test suite with --nocleanup. A minimum of 12 GB of free space is required.

eventually when a test fails due to lack of disk space, the test harness exits and the new error message is displayed and is indeed much more informative!

Early exiting after test failure due to insuffient free space in /Volumes/ramdisk/tmp/test_runner_₿_🏃_20240411_212332
Test execution data left in /Volumes/ramdisk/tmp/test_runner_₿_🏃_20240411_212332.
Additional storage is needed to execute testing.

tdb3 commented at 3:28 PM on April 13, 2024: contributor

re-ACK for 858fa78637041ae704005d4b6e564dd8245f4b5d

Thanks for incorporating the suggestion.

Used a small ramdisk (1.2GB) with high jobs count (20) to force space exhaustion. As expected, received the initial warning Running the test suite with fewer than 3026.0 MB of free space might cause tests to fail. and also received the exiting error Early exiting after test failure due to disk running out of space...Test execution data left in ...Additional storage is needed to execute testing with the new text added.

davidgumberg commented at 6:05 PM on April 13, 2024: contributor

reACK https://github.com/bitcoin/bitcoin/pull/29335/commits/858fa78637041ae704005d4b6e564dd8245f4b5d

Looks great, tested on a tmpfs 1 GiB in size, and the test runner prints the intended warning, and fails immediately when a test fails due to disk full.

in test/functional/test_runner.py:673 in 858fa78637 outdated

 666 | @@ -646,6 +667,11 @@ def run_tests(*, test_list, src_dir, build_dir, tmpdir, jobs=1, enable_coverage=
 667 |                      logging.debug("Early exiting after test failure")
 668 |                      break
 669 |  
 670 | +                if "[Errno 28] No space left on device" in stdout:
 671 | +                    sys.exit(f"Early exiting after test failure due to insuffient free space in {tmpdir}\n"
 672 | ++                            f"Test execution data left in {tmpdir}.\n"
 673 | ++                            f"Additional storage is needed to execute testing.")

achow101 commented at 7:08 PM on April 16, 2024:

In 858fa78637041ae704005d4b6e564dd8245f4b5d "test: Handle functional test disk-full error"

The +s at the beginning of the lines are unecessary.

                             f"Test execution data left in {tmpdir}.\n"
                             f"Additional storage is needed to execute testing.")

BrandonOdiwuor commented at 10:07 AM on April 19, 2024:

fixed

BrandonOdiwuor force-pushed on Apr 19, 2024

BrandonOdiwuor requested review from achow101 on Apr 19, 2024

achow101 commented at 6:13 PM on April 25, 2024: member

ACK 7c0c599f1e6d412b3cb92cd67aa46644a8286cb6

DrahtBot requested review from tdb3 on Apr 25, 2024

DrahtBot added the label Needs rebase on Apr 26, 2024

test: Handle functional test disk-full error 357ad11054

BrandonOdiwuor force-pushed on Apr 29, 2024

DrahtBot removed the label Needs rebase on Apr 29, 2024

cbergqvist approved

cbergqvist commented at 10:04 PM on May 1, 2024: contributor

reACK 357ad110548d726021547d85b5b2bfcf3191d7e3. Looks good!

Tested on RAM disk of 1GB, 1430MB (doas mount -t tmpfs -o size=1430m tmpfs /mnt/tmp/) and varied the --jobs and --nocleanup, observing expected warnings + error on disk full.

tdb3 commented at 8:47 PM on May 3, 2024: contributor

re ACK for 357ad110548d726021547d85b5b2bfcf3191d7e3

Retested #29335#pullrequestreview-1999297020 (and received the same results, which warn the user of insufficient free space and early exit due to insufficient space).

itornaza commented at 7:56 PM on May 4, 2024: contributor

re-ACK 357ad110548d726021547d85b5b2bfcf3191d7e3

Ran again all the tests as in #29335 (comment) and I get the same results as expected on disk space in the cases it is being insufficient.

achow101 commented at 10:05 PM on May 8, 2024: member

reACK 357ad110548d726021547d85b5b2bfcf3191d7e3

achow101 merged this on May 8, 2024

achow101 closed this on May 8, 2024

BrandonOdiwuor deleted the branch on May 9, 2024

PastaPastaPasta referenced this in commit 2c2f215457 on Oct 25, 2024

PastaPastaPasta referenced this in commit 1a8e805aab on Oct 26, 2024

bitcoin locked this on May 9, 2025