utils and libraries: checking for bitcoin address in translations #13374

pull kaplanmaxe wants to merge 1 commits into bitcoin:master from kaplanmaxe:check-addr-translations changing 1 files +9 −1
  1. kaplanmaxe commented at 5:47 PM on June 2, 2018: contributor

    Closes #13363

  2. in contrib/devtools/update-translations.py:24 in 4f03c2240a outdated
      20 | @@ -21,6 +21,7 @@
      21 |  import os
      22 |  import io
      23 |  import xml.etree.ElementTree as ET
      24 | +import re
    


    marcoagner commented at 6:01 PM on June 2, 2018:

    This line is not needed; re is already imported above.

  3. in contrib/devtools/update-translations.py:127 in 4f03c2240a outdated
     122 | @@ -122,6 +123,13 @@ def escape_cdata(text):
     123 |      text = text.replace('"', '"')
     124 |      return text
     125 |  
     126 | +def check_bitcoin_addr(text):
     127 | +    p = re.compile('^([1,3]|bc1)[a-zA-Z0-9]{33,39}$')
    


    MarcoFalke commented at 6:14 PM on June 2, 2018:

    This is too strict and wouldn't match an address hidden between words.


    kaplanmaxe commented at 6:33 PM on June 2, 2018:

    Agreed. I want to avoid false positives here. What do you suggest the best way forward is? I could split on any spaces in the string then perform this check, but that might be rather expensive, although I'm not sure if performance is all that important in this file? If I don't set some sort of string length in the regex pattern, there are going to be a bunch of false positives.


    MarcoFalke commented at 7:14 PM on June 2, 2018:

    Hmm, could you elaborate what you mean by false positives? Imo a word that is longer than 30 chars and includes at least one digit should never be part of a translation (regardless if it was a bitcoin address)


    kaplanmaxe commented at 10:01 PM on June 2, 2018:

    You're right here. There were some translations that started with a 1 but they all contain spaces. The new commit I just pushed should work. Thanks for the input

  4. marcoagner changes_requested
  5. in contrib/devtools/update-translations.py:126 in 4b1e55e4c9 outdated
     122 | @@ -124,11 +123,10 @@ def escape_cdata(text):
     123 |      return text
     124 |  
     125 |  def check_bitcoin_addr(text):
     126 | -    p = re.compile('^([1,3]|bc1)[a-zA-Z0-9]{33,39}$')
     127 | +    p = re.compile('^([1,3]|bc1)[a-zA-Z0-9]{33,}$')
    


    marcoagner commented at 11:25 PM on June 2, 2018:

    This still won't match an address between words (as @MarcoFalke pointed). You may want to remove the start and end of line anchors, use a group like (.*?), or something else.

  6. in contrib/devtools/update-translations.py:131 in f51b1e6708 outdated
     121 | @@ -122,6 +122,12 @@ def escape_cdata(text):
     122 |      text = text.replace('"', '"')
     123 |      return text
     124 |  
     125 | +def check_bitcoin_addr(text):
     126 | +    p = re.compile('^([13]|bc1)[a-zA-Z0-9]{33,}$')
     127 | +    if (text != None and p.match(text) != None):
     128 | +        return True
     129 | +    return False
    


    Empact commented at 12:45 AM on June 3, 2018:

    nit: could return the comparison directly

  7. in contrib/devtools/update-translations.py:161 in f51b1e6708 outdated
     155 | @@ -150,6 +156,11 @@ def postprocess_translations(reduce_diff_hacks=False):
     156 |                  numerus = message.get('numerus') == 'yes'
     157 |                  source = message.find('source').text
     158 |                  translation_node = message.find('translation')
     159 | +
     160 | +                address = check_bitcoin_addr(translation_node.text)
     161 | +                if address == True:
    


    Empact commented at 12:45 AM on June 3, 2018:

    nit: could test on the method call directly

  8. in contrib/devtools/update-translations.py:160 in f51b1e6708 outdated
     155 | @@ -150,6 +156,11 @@ def postprocess_translations(reduce_diff_hacks=False):
     156 |                  numerus = message.get('numerus') == 'yes'
     157 |                  source = message.find('source').text
     158 |                  translation_node = message.find('translation')
     159 | +
     160 | +                address = check_bitcoin_addr(translation_node.text)
    


    Empact commented at 12:46 AM on June 3, 2018:

    nit: could call this something declarative like contains_bitcoin_addr

  9. in contrib/devtools/update-translations.py:126 in f51b1e6708 outdated
     121 | @@ -122,6 +122,12 @@ def escape_cdata(text):
     122 |      text = text.replace('"', '"')
     123 |      return text
     124 |  
     125 | +def check_bitcoin_addr(text):
     126 | +    p = re.compile('^([13]|bc1)[a-zA-Z0-9]{33,}$')
    


    Empact commented at 12:49 AM on June 3, 2018:

    Probably want to make this a constant computed once on load rather than once per method call.

  10. fanquake added the label Scripts and tools on Jun 3, 2018
  11. in contrib/devtools/update-translations.py:161 in f4cbeb70e5 outdated
     154 | @@ -150,6 +155,10 @@ def postprocess_translations(reduce_diff_hacks=False):
     155 |                  numerus = message.get('numerus') == 'yes'
     156 |                  source = message.find('source').text
     157 |                  translation_node = message.find('translation')
     158 | +
     159 | +                if contains_bitcoin_addr(translation_node.text):
     160 | +                    print('Warning! Address found in translation in file %s. This will be removed' % (filepath))
     161 | +                    translation_node.clear()
    


    laanwj commented at 2:44 PM on June 4, 2018:

    Why not move this with the check_format_specifiers check below. It means you set valid to false and can re-use the !valid code.

                         if not valid: # set type to unfinished and clear string if invalid
                             translation_node.clear()
                             translation_node.set('type', 'unfinished')
    have_errors = True
    

    kaplanmaxe commented at 3:51 PM on June 4, 2018:

    Good call. I have updated this. I'm not all that familiar with Python so if there's a better way of formatting what I have now, let me know.

  12. marcoagner commented at 3:01 PM on June 4, 2018: contributor

    Shouldn't this account for html tags too? As I understand, I could upload a translation with the text Nosso endereço: 3K9Tjd<span>UWMXRG</span>feRefeW<span>o48NQjp9syDdeXh</span>, bypass an address-only regex, and have the same effect in the GUI.

  13. MarcoFalke commented at 3:14 PM on June 4, 2018: member

    Imo, html tags should never end up in translation, so should probably be rejected as well.

  14. marcoagner commented at 3:23 PM on June 4, 2018: contributor

    I agree. There are still some tr() calls here and there containing html tags, though (example). They should probably be eliminated and then rejected in update-translations.py as well (probably in another PR)..

  15. kaplanmaxe commented at 3:24 PM on June 4, 2018: contributor

    @marcoagner @MarcoFalke I think it makes sense to reject any translations that contain HTML. How do you want to go forward with this? The issue here was to prevent bitcoin addresses from getting inside a translation. What I could do is strip out any html tags then do a check to see if the translation matches a bitcoin address. However if we want to prevent any html tags from ending up in a translation, it might be best to open a separate issue (which I would be happy to work on as well) and instead of removing any html tags and then do the bitcoin addr check, immediately reject and remove any translation containing any HTML.

  16. marcoagner commented at 3:34 PM on June 4, 2018: contributor

    @kaplanmaxe In my opinion, this PR should focus on rejecting addresses only. It just made me think that, as it is now, it will be kind of by-passable, but the problem is not with your check. Imo, it is the tr() calls with html tags that should be eliminated and then have html tags rejected in another check by another PR as well.

  17. in contrib/devtools/update-translations.py:169 in df00ffd497 outdated
     165 | @@ -160,7 +166,7 @@ def postprocess_translations(reduce_diff_hacks=False):
     166 |                      if translation is None:
     167 |                          continue
     168 |                      errors = []
     169 | -                    valid = check_format_specifiers(source, translation, errors, numerus)
     170 | +                    valid = check_format_specifiers(source, translation, errors, numerus) and not contains_bitcoin_addr(translation_node.text)
    


    MarcoFalke commented at 3:57 PM on June 4, 2018:

    Should say translation instead of translation_node?


    kaplanmaxe commented at 4:00 PM on June 4, 2018:

    Whoops! Good catch. Copied and pasted from my previous commit. Updating now

  18. MarcoFalke commented at 3:58 PM on June 4, 2018: member
  19. kaplanmaxe force-pushed on Jun 4, 2018
  20. kaplanmaxe commented at 4:16 PM on June 4, 2018: contributor

    Commits have been squashed

  21. kaplanmaxe commented at 5:00 PM on June 4, 2018: contributor

    Hm, anyone know why the CI failed all of a sudden? Correct me if I'm wrong here, but it looks like it's failing in an area that has nothing to do with this PR?

    Potentially related to #11696 ?

  22. MarcoFalke commented at 5:11 PM on June 4, 2018: member

    This is http.client.RemoteDisconnected: Remote end closed connection without response #11777

  23. in contrib/devtools/update-translations.py:168 in c88aaeb945 outdated
     164 | @@ -160,7 +165,7 @@ def postprocess_translations(reduce_diff_hacks=False):
     165 |                      if translation is None:
     166 |                          continue
     167 |                      errors = []
     168 | -                    valid = check_format_specifiers(source, translation, errors, numerus)
     169 | +                    valid = check_format_specifiers(source, translation, errors, numerus) and not contains_bitcoin_addr(translation)
    


    Empact commented at 10:12 AM on June 5, 2018:

    Would like to have a report of what error occurred. Current code reports only what ends up in errors which is not set in this case, so the failure is silent.

  24. kaplanmaxe force-pushed on Jun 5, 2018
  25. kaplanmaxe force-pushed on Jun 5, 2018
  26. utils: checking for bitcoin addresses in translations
    Checking for and removing any bitcoin addresses in translations
    85f0135eae
  27. kaplanmaxe force-pushed on Jun 5, 2018
  28. kaplanmaxe commented at 12:58 AM on June 7, 2018: contributor

    Hey guys, anything else I can do on this PR?

  29. kaplanmaxe commented at 3:08 AM on June 7, 2018: contributor

    @promag just responding to your comment in the other PR (https://github.com/bitcoin/bitcoin/pull/13375#issuecomment-395268629) so we can keep discussions relevant to this PR in here without clogging up someone elses.

    maybe incorporate above suggestions in yours?

    Can you clarify which suggestions you are referring to? The only one I could see being relevant is your suggestion on how to properly remove white spaces. I'm not sure why that would be needed given my regular expression. The regular expression in the other PR wouldn't catch addresses that were contained inside of a longer string. Ex:

    Lorem ipsum <BITCOIN_ADDRESS_HERE> Lorem Ipsum Lorem Ipsum etc...

    The regular expression in my PR would catch any address regardless of the text surrounding it.

    As @MarcoFalke stated which I agree with, "a word that is longer than 30 chars and includes at least one digit should never be part of a translation (regardless if it was a bitcoin address)" (https://github.com/bitcoin/bitcoin/pull/13374#discussion_r192567791)

    Maybe I am overlooking one of your suggestions. If so, could you please clarify which one? I am also making use of errors which outputs the filename

  30. in contrib/devtools/update-translations.py:128 in 85f0135eae
     123 | @@ -122,6 +124,12 @@ def escape_cdata(text):
     124 |      text = text.replace('"', '&quot;')
     125 |      return text
     126 |  
     127 | +def contains_bitcoin_addr(text, errors):
     128 | +    if text != None and ADDRESS_REGEXP.search(text) != None:
    


    promag commented at 1:57 PM on June 7, 2018:

    Could be:

    if ADDRESS_REGEXP.search(text):
        ...
    

    kaplanmaxe commented at 6:11 PM on June 7, 2018:

    If someone else feels strongly about making the change here, lmk and I will make it. If it's not necessary, don't want to reset the ACKs based off a new commit hash :)

  31. promag commented at 1:59 PM on June 7, 2018: member

    The idea is to prevent things like

    ... bc1 qar0srrr7xfkvy5l643lydnw9re59gtzzwf5mdq ...
    

    Either way, utACK 85f0135 with or without the comment below.

  32. kaplanmaxe commented at 2:20 PM on June 7, 2018: contributor

    @promag I think you bring up an interesting point with the spaces. I think it really depends on how granular we want to get with this. If we want to prevent spaces, there are still plenty of ways "around" this regular expression. Ex:

    b̈c1qar0srrr7xfkvy5l643lydnw9re59gtzzwf5mdq

    Personally, I would argue that if the user can't copy and paste it and use it as a valid bitcoin address, we shouldn't worry about it. There are a few other ways "around" the regular expression as well, although every one of them lead to invalid bitcoin addresses.

  33. kaplanmaxe commented at 1:34 PM on June 8, 2018: contributor

    @laanwj I saw your comment in #13375 . I have tested this pretty extensively and haven't had any false positives. It will also catch addresses between words like I mentioned here (https://github.com/bitcoin/bitcoin/pull/13374#issuecomment-395278440) which I feel is really important. When you get a chance, maybe you could check this one out?

  34. laanwj commented at 2:16 PM on June 8, 2018: member

    @kaplanmaxe Thanks, I'll test this one next.

  35. laanwj commented at 2:52 PM on June 8, 2018: member

    Tested ACK 85f0135eaefe3d9f696689a7e83606c579da40a8 (with normal and bech32 address)

  36. MarcoFalke commented at 2:53 PM on June 8, 2018: member

    utACK 85f0135eaefe3d9f696689a7e83606c579da40a8

  37. laanwj merged this on Jun 8, 2018
  38. laanwj closed this on Jun 8, 2018

  39. laanwj referenced this in commit 56f69360dc on Jun 8, 2018
  40. UdjinM6 referenced this in commit 22554c7ab9 on Jun 19, 2021
  41. UdjinM6 referenced this in commit a3023c8ec6 on Jun 24, 2021
  42. UdjinM6 referenced this in commit 776694374a on Jun 26, 2021
  43. UdjinM6 referenced this in commit 2469e28de4 on Jun 26, 2021
  44. UdjinM6 referenced this in commit 349c85e1a1 on Jun 26, 2021
  45. PastaPastaPasta referenced this in commit 063d8ad940 on Jun 27, 2021
  46. PastaPastaPasta referenced this in commit b406ba5b97 on Jun 28, 2021
  47. PastaPastaPasta referenced this in commit 2dd83ba60a on Jun 28, 2021
  48. UdjinM6 referenced this in commit 1f537f32f7 on Jun 28, 2021
  49. PastaPastaPasta referenced this in commit 3a4553d99c on Jun 29, 2021
  50. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 18:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me