base58: use map instead of strchr() when decode #12704

pull bitkevin wants to merge 1 commits into bitcoin:master from bitkevin:b58_bitmap changing 2 files +24 −5
  1. bitkevin commented at 5:56 am on March 16, 2018: contributor

    Use array map instead of find string position.

    Test code snippet:

     0
     1#include <assert.h>
     2#include <stdint.h>
     3#include <stdio.h>
     4#include <stdlib.h>
     5
     6#include <string>
     7
     8int main(int argc, const char * argv[]) {
     9
    10  static const char* pszBase58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
    11  static const int8_t mapBase58[] = {
    12    -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
    13    -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
    14    -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
    15    -1, 0, 1, 2, 3, 4, 5, 6,  7, 8,-1,-1,-1,-1,-1,-1,
    16    -1, 9,10,11,12,13,14,15, 16,-1,17,18,19,20,21,-1,
    17    22,23,24,25,26,27,28,29, 30,31,32,-1,-1,-1,-1,-1,
    18    -1,33,34,35,36,37,38,39, 40,41,42,43,-1,44,45,46,
    19    47,48,49,50,51,52,53,54, 55,56,57,-1,-1,-1,-1,-1,
    20  };
    21
    22  const std::string b58Str(pszBase58);
    23
    24  for (size_t i = 0; i < b58Str.length(); i++) {
    25    const char *ch = strchr(pszBase58, b58Str[i]);
    26    printf("%d - %d\n", ch - pszBase58, mapBase58[(uint8_t)b58Str[i]]);
    27    assert(ch - pszBase58 == mapBase58[(uint8_t)b58Str[i]]);
    28  }
    29
    30  assert(mapBase58['1'] == 0);
    31  assert(mapBase58['z'] == 57);
    32
    33  /** All alphanumeric characters except for "0", "I", "O", and "l" */
    34  assert(mapBase58['0'] == -1);
    35  assert(mapBase58['I'] == -1);
    36  assert(mapBase58['O'] == -1);
    37  assert(mapBase58['l'] == -1);
    38
    39  return 0;
    40}
    
  2. fanquake added the label Refactoring on Mar 16, 2018
  3. in src/base58.cpp:53 in 7b4e3cb175 outdated
    40@@ -31,11 +41,9 @@ bool DecodeBase58(const char* psz, std::vector<unsigned char>& vch)
    41     // Process the characters.
    42     while (*psz && !isspace(*psz)) {
    43         // Decode base58 character
    44-        const char* ch = strchr(pszBase58, *psz);
    45-        if (ch == nullptr)
    46+        int carry = mapBase58[(uint8_t)*psz];
    


    ken2812221 commented at 6:33 am on March 16, 2018:
    Should you check the character *psz is less than 128?

    bitkevin commented at 7:40 am on March 16, 2018:
    Oops…I was thought size of mapBase58 is 256, thanks

    promag commented at 3:28 pm on March 16, 2018:
    Fixed, as per @laanwj suggestion mapBase58 has 256 elements.
  4. promag commented at 7:39 am on March 16, 2018: member
    Have you measured performance improvement?
  5. bitkevin commented at 8:09 am on March 16, 2018: contributor

    Have you measured performance improvement?

    Performance improvement is about 40%. 1,000,000 rounds, it’s about 450ms vs 250ms.

     0uint64_t getCurrentTime() {
     1  struct timeval tv;
     2  gettimeofday(&tv, NULL);
     3  return tv.tv_sec * 1000 + tv.tv_usec / 1000;
     4}
     5
     6const std::string b58Str("3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy");
     7size_t len = b58Str.length();
     8
     9uint64_t cnt;
    10
    11cnt = 0;
    12uint64_t t1 = getCurrentTime();
    13for (size_t j = 0; j < 1000000; j++) {
    14  for (size_t i = 0; i < len; i++) {
    15    const char *ch = strchr(pszBase58, b58Str[i]);
    16    cnt += ch - pszBase58;
    17  }
    18}
    19uint64_t t2 = getCurrentTime();
    20printf("%lld\n", t2 - t1);
    21
    22cnt = 0;
    23uint64_t t3 = getCurrentTime();
    24for (size_t j = 0; j < 1000000; j++) {
    25  for (size_t i = 0; i < len; i++) {
    26    cnt += mapBase58[(uint8_t)b58Str[i]];
    27  }
    28}
    29uint64_t t4 = getCurrentTime();
    30printf("%lld\n", t4 - t3);
    
  6. laanwj commented at 10:36 am on March 16, 2018: member

    Concept ACK. Seems very straightforward (haven’t checked the table yet, though).

    Performance improvement is about 40%. 1,000,000 rounds, it’s about 450ms vs 250ms.

    Nice. Though here you’re not benchmarking the entire DecodeBase58 function, but the specific part that you sped up, so that will give somewhat distored results.

    FWIW there’s a benchmark for base58 in src/bench - it’s somewhat representative of what base58 is used for in bitcoin - encoding/decoding addresses.

  7. in src/base58.cpp:44 in 29d357b32f outdated
    40@@ -31,11 +41,11 @@ bool DecodeBase58(const char* psz, std::vector<unsigned char>& vch)
    41     // Process the characters.
    42     while (*psz && !isspace(*psz)) {
    43         // Decode base58 character
    44-        const char* ch = strchr(pszBase58, *psz);
    45-        if (ch == nullptr)
    46+        if ((uint8_t)*psz >= sizeof(mapBase58)/sizeof(mapBase58[0]))
    


    laanwj commented at 10:42 am on March 16, 2018:
    Just make the mapBase58 array 256 bytes large, and you don’t need the range check.
  8. in src/base58.cpp:50 in d2a052b731 outdated
    46@@ -29,13 +47,12 @@ bool DecodeBase58(const char* psz, std::vector<unsigned char>& vch)
    47     int size = strlen(psz) * 733 /1000 + 1; // log(58) / log(256), rounded up.
    48     std::vector<unsigned char> b256(size);
    49     // Process the characters.
    50+    assert(sizeof(mapBase58)/sizeof(mapBase58[0]) == 256);  // guarantee not out of range
    


    promag commented at 2:48 pm on March 16, 2018:
    IMO just leave this as a comment or make it static_assert?

    MarcoFalke commented at 2:44 pm on March 17, 2018:

    With c++11 it should be easy to make it a static_assert.

     0--- a/src/base58.cpp
     1+++ b/src/base58.cpp
     2@@ -20,7 +20,7 @@
     3 
     4 /** All alphanumeric characters except for "0", "I", "O", and "l" */
     5 static const char* pszBase58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
     6-static const int8_t mapBase58[] = {
     7+constexpr std::array<int8_t, 256> mapBase58{
     8     -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
     9     -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
    10     -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
    11@@ -55,7 +55,7 @@ bool DecodeBase58(const char* psz, std::vector<unsigned char>& vch)
    12     int size = strlen(psz) * 733 /1000 + 1; // log(58) / log(256), rounded up.
    13     std::vector<unsigned char> b256(size);
    14     // Process the characters.
    15-    assert(sizeof(mapBase58)/sizeof(mapBase58[0]) == 256);  // guarantee not out of range
    16+    static_assert(mapBase58.size() == 256); // guarantee not out of range
    17     while (*psz && !isspace(*psz)) {
    18         // Decode base58 character
    19         int carry = mapBase58[(uint8_t)*psz];
    

    laanwj commented at 8:21 am on March 19, 2018:
    A run-time assertion is certainly overkill here.
  9. promag commented at 9:45 am on March 17, 2018: member
    Kicked travis due to timeout.
  10. donaloconnor commented at 12:30 pm on March 17, 2018: contributor

    utACK. Your test in comment 0 threw me off because it only has 128 elements.

    I think we need tests also submitted in this to test all values 0-255.

  11. MarcoFalke commented at 2:41 pm on March 17, 2018: member
    Would you mind removing the test in comment 0 and adding it to the unit test suite?
  12. sipa commented at 6:36 pm on March 17, 2018: member

    I benchmarked this on my desktop system: a full address decode goes from 1.50 us to 1.29 us (including checksum check).

    I’m not convinced this is worth it.

  13. randolf commented at 3:54 am on March 19, 2018: contributor

    @sipa One use case I can think of immediately is that Vanity Address Generators can benefit from this performance increase because they repeatedly use the Base58-encoded addresses in attempting to match the desired string(s).

    Generally I also value speed optimizations, even in normal application use, and I don’t regard the increased size of the resulting binary to be significant enough to warrant trade-off concern here. YMMV.

  14. dcousens approved
  15. dcousens commented at 4:31 am on March 19, 2018: contributor

    IMHO, simpler, utACK.

    nit: why is mapBase58 indented in the 8th element?

  16. in src/base58.cpp:15 in d2a052b731 outdated
    11@@ -12,6 +12,24 @@
    12 
    13 /** All alphanumeric characters except for "0", "I", "O", and "l" */
    14 static const char* pszBase58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
    15+static const int8_t mapBase58[] = {
    


    laanwj commented at 8:22 am on March 19, 2018:

    If you make this

    0static const int8_t mapBase58[256] = {
    

    That’s pretty much a static assertion that the size will be 256.

  17. in src/base58.cpp:50 in d8215d9657 outdated
    46@@ -47,11 +47,11 @@ bool DecodeBase58(const char* psz, std::vector<unsigned char>& vch)
    47     int size = strlen(psz) * 733 /1000 + 1; // log(58) / log(256), rounded up.
    48     std::vector<unsigned char> b256(size);
    49     // Process the characters.
    50-    assert(sizeof(mapBase58)/sizeof(mapBase58[0]) == 256);  // guarantee not out of range
    51+    static_assert(mapBase58.size() == 256, "mapBase58.size() should be 256"); // guarantee not out of range
    


    dcousens commented at 3:46 pm on March 19, 2018:
    The static_assert is pointless now? Maybe?

    bitkevin commented at 4:22 pm on March 19, 2018:
    Just in case, after all static_assert() is no harm.
  18. in src/base58.cpp:15 in d8215d9657 outdated
    11@@ -12,7 +12,7 @@
    12 
    13 /** All alphanumeric characters except for "0", "I", "O", and "l" */
    14 static const char* pszBase58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
    15-static const int8_t mapBase58[] = {
    16+static constexpr std::array<int8_t, 256> mapBase58 {
    


    donaloconnor commented at 3:56 pm on March 19, 2018:
    missing #include <array> ?

    bitkevin commented at 4:23 pm on March 19, 2018:
    sorry, I don’t familiar with c++11, so just change it back to old school style.

    dcousens commented at 0:30 am on March 20, 2018:
    IMHO, the .size() method was worth the import… but anyway.
  19. promag commented at 0:36 am on March 21, 2018: member
    utACK 5d71e4d, please squash.
  20. sipa commented at 0:58 am on March 21, 2018: member
    utACK, needs squash.
  21. eklitzke commented at 1:13 am on March 21, 2018: contributor
    This is good for a 20% speedup for me with GCC 7.3 (median goes from 8.70969e-07 to 7.00866e-07). ACK once squashed.
  22. use base58 map instead of strchr() bcab47bc1b
  23. bitkevin force-pushed on Mar 21, 2018
  24. JeremyRubin commented at 4:08 am on March 21, 2018: contributor

    I’d like to see a comparison with one or two other methods of doing a table lookup to make sure this is optimal.

    For instance

     0switch(ch) {
     1  case '1': 
     2    carry = 0;
     3    break;
     4  // ....
     5  case 'z':
     6    carry = 57;
     7    break;
     8  default:
     9    return false;
    10}
    

    Additionally you can try some outputs from gperf https://www.gnu.org/software/gperf/

  25. eklitzke commented at 4:36 am on March 21, 2018: contributor

    I don’t think you can get any faster than this approach, which is a flat lookup table that maps ints to ints (without any hashing).

    The typical use case of gperf is for something kind of different: you’d provide it to a tokenizer where you have a grammar of long human readable strings, and you want to hash all of the tokens in the grammar to small ints without collisions.

  26. sipa commented at 5:08 am on March 21, 2018: member

    Stop wasting time on discussing the performance. This does not matter. Decoding an address could take 50 us and I don’t think anyone would notice.

    If the resulting code looks better, go for it. Otherwise, don’t.

    -0

  27. JeremyRubin commented at 5:18 am on March 21, 2018: contributor
    I have some notes on why some alternatives that would be faster, but as @sipa notes, there are bigger fish to fry.
  28. laanwj commented at 8:58 am on March 22, 2018: member

    If the resulting code looks better, go for it. Otherwise, don’t.

    Yes, I do prefer the code like this, because it’s more consistent with how we handle base32 and hex for ex. So utACK bcab47b. Agree that this is a dead end in regard to performance, if you are interested in performance please review @eklitzke’s work he’s doing great things.

  29. laanwj merged this on Mar 22, 2018
  30. laanwj closed this on Mar 22, 2018

  31. laanwj referenced this in commit ad823178e8 on Mar 22, 2018
  32. bitkevin deleted the branch on Mar 25, 2018
  33. Fabcien referenced this in commit 7b0d5028a6 on Aug 30, 2019
  34. PastaPastaPasta referenced this in commit 9b47b31bd2 on Dec 16, 2020
  35. PastaPastaPasta referenced this in commit a420a27528 on Dec 18, 2020
  36. gades referenced this in commit 3945e82e20 on Jun 24, 2021
  37. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-29 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me