optimization: Optimize IsSpace function for common non-whitespace characters #29602

pull paplorinc wants to merge 2 commits into bitcoin:master from paplorinc:paplorinc/is_space_optimization changing 2 files +13 −2
  1. paplorinc commented at 4:11 pm on March 8, 2024: contributor

    The IsSpace function has been optimized for the more common case where a character is not whitespace. Previously, the function checked each whitespace character individually, which was not efficient for non-whitespace inputs as it required multiple comparisons. This method is often used for parsing various inputs more flexibly, see usages. This results in an additional ~2% performance improvement for base conversions (see related hexadecimal and base58 optimizations).

    The updated IsSpace function now first checks if the character is less than or equal to ’ ’ (the space character, which has the highest ASCII value among the whitespace characters). This single condition can quickly determine that most characters (i.e. letters, numbers) are not whitespace, thus short-circuiting the evaluation for the most common case.

    Otherwise the function performs additional checks to see if it is either a space character or within the range of horizontal tab to carriage return (’\t’ to ‘\r’), which are the remaining whitespace characters in the ASCII table.

    This change assumes that most calls to IsSpace are for non-whitespace characters, as can be inferred from the usage patterns where IsSpace is often used in loops that parse strings until a non-whitespace character is encountered.

    To make sure the changes keep the previous functionality, I’ve added an exhaustive unit test for it.

  2. DrahtBot commented at 4:11 pm on March 8, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK laanwj

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. DrahtBot added the label Refactoring on Mar 8, 2024
  4. paplorinc marked this as ready for review on Mar 8, 2024
  5. paplorinc force-pushed on Mar 8, 2024
  6. paplorinc force-pushed on Mar 9, 2024
  7. paplorinc force-pushed on Mar 9, 2024
  8. maflcko commented at 12:18 pm on March 15, 2024: member
    Is this used in a hot path that is relevant, to warrant the changes?
  9. paplorinc commented at 12:29 pm on March 15, 2024: contributor

    Is this used in a hot path that is relevant, to warrant the changes?

    It’s used in base 58 and 16 conversions while iterating over each character of the input. We’re expecting most of them to not be spaces, it would make sense to optimize for that scenario and exit early.

  10. maflcko commented at 12:32 pm on March 15, 2024: member
    Yes, but base58 isn’t used in a hot path, where optimizing it would result in a noticeable difference. (When sending an address over RPC, the remaining overhead is too large to notice a speedup of a few nanoseconds)
  11. paplorinc commented at 12:38 pm on March 15, 2024: contributor
    It’s a small improvement, indeed, but it eliminates useless work, done frequently. The exhaustive tests make sure the behavior is exactly the same as before, so while it may be considered low-reward, it should be low-risk as well. Is there any area (testing, documentation, code) that you would like me to improve to increase the risk/reward ratio?
  12. laanwj commented at 7:35 am on April 9, 2024: member

    Is this really a bottleneck? A benchmark would be useful to see if this isn’t something the compiler already does.

    ACK on the test.

  13. paplorinc commented at 10:37 am on April 9, 2024: contributor

    Thanks for the review @laanwj. The gain is small, but measurable:

    make && ./src/bench/bench_bitcoin –filter=HexParse –min-time=10000

    Before:

    0|           ns/base16 |            base16/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|                0.68 |    1,461,990,359.30 |    0.0% |     10.83 | `HexParse`
    3|                0.68 |    1,461,442,745.57 |    0.1% |     10.83 | `HexParse`
    4|                0.68 |    1,461,262,276.18 |    0.1% |     10.83 | `HexParse`
    

    After:

    0|           ns/base16 |            base16/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|                0.67 |    1,495,985,893.94 |    0.0% |     10.82 | `HexParse`
    3|                0.67 |    1,494,916,055.36 |    0.1% |     10.82 | `HexParse`
    4|                0.67 |    1,494,545,662.66 |    0.0% |     10.82 | `HexParse`
    

    which results in a speedup of ~2% for 130 non-whitespace hexadecimal characters:

    0(1,495,985,893.94 / 1,461,990,359.30) = 2.3%
    
  14. paplorinc force-pushed on Apr 9, 2024
  15. paplorinc force-pushed on May 11, 2024
  16. paplorinc force-pushed on May 29, 2024
  17. paplorinc force-pushed on May 29, 2024
  18. paplorinc force-pushed on May 29, 2024
  19. achow101 referenced this in commit fcc3b653dc on Jun 13, 2024
  20. DrahtBot added the label CI failed on Jun 21, 2024
  21. Add exhaustive range test for IsSpace 783e8391d2
  22. Optimize IsSpace function for common non-whitespace characters
    The IsSpace function has been optimized for the more common case where a character is not whitespace. Previously, the function checked each whitespace character individually, which was not efficient for non-whitespace inputs as it required multiple comparisons.
    
    The updated IsSpace function now first checks if the character is less than or equal to ' ' (the space character, which has the highest ASCII value among the whitespace characters). This single condition can quickly determine that most characters (those with ASCII values greater than ' ') are not whitespace, thus short-circuiting the evaluation for the most common case.
    
    If the character is less than or equal to ' ', the function performs additional checks to see if it is either a space character or within the range of horizontal tab to carriage return ('\t' to '\r'), which are the remaining whitespace characters in the ASCII table.
    
    This change assumes that most calls to IsSpace are for non-whitespace characters, as can be inferred from the usage patterns where IsSpace is often used in loops that parse strings until a non-whitespace character is encountered.
    f11c063e89
  23. paplorinc force-pushed on Jun 21, 2024
  24. DrahtBot removed the label CI failed on Jun 21, 2024
  25. paplorinc renamed this:
    refactor: Optimize IsSpace function for common non-whitespace characters
    optimization: Optimize IsSpace function for common non-whitespace characters
    on Jun 23, 2024
  26. paplorinc closed this on Aug 18, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-26 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me