Closes #13363
utils and libraries: checking for bitcoin address in translations #13374
pull kaplanmaxe wants to merge 1 commits into bitcoin:master from kaplanmaxe:check-addr-translations changing 1 files +9 −1-
kaplanmaxe commented at 5:47 PM on June 2, 2018: contributor
-
in contrib/devtools/update-translations.py:24 in 4f03c2240a outdated
20 | @@ -21,6 +21,7 @@ 21 | import os 22 | import io 23 | import xml.etree.ElementTree as ET 24 | +import re
marcoagner commented at 6:01 PM on June 2, 2018:This line is not needed; re is already imported above.
in contrib/devtools/update-translations.py:127 in 4f03c2240a outdated
122 | @@ -122,6 +123,13 @@ def escape_cdata(text): 123 | text = text.replace('"', '"') 124 | return text 125 | 126 | +def check_bitcoin_addr(text): 127 | + p = re.compile('^([1,3]|bc1)[a-zA-Z0-9]{33,39}$')
MarcoFalke commented at 6:14 PM on June 2, 2018:This is too strict and wouldn't match an address hidden between words.
kaplanmaxe commented at 6:33 PM on June 2, 2018:Agreed. I want to avoid false positives here. What do you suggest the best way forward is? I could split on any spaces in the string then perform this check, but that might be rather expensive, although I'm not sure if performance is all that important in this file? If I don't set some sort of string length in the regex pattern, there are going to be a bunch of false positives.
MarcoFalke commented at 7:14 PM on June 2, 2018:Hmm, could you elaborate what you mean by false positives? Imo a word that is longer than 30 chars and includes at least one digit should never be part of a translation (regardless if it was a bitcoin address)
kaplanmaxe commented at 10:01 PM on June 2, 2018:You're right here. There were some translations that started with a 1 but they all contain spaces. The new commit I just pushed should work. Thanks for the input
marcoagner changes_requestedin contrib/devtools/update-translations.py:126 in 4b1e55e4c9 outdated
122 | @@ -124,11 +123,10 @@ def escape_cdata(text): 123 | return text 124 | 125 | def check_bitcoin_addr(text): 126 | - p = re.compile('^([1,3]|bc1)[a-zA-Z0-9]{33,39}$') 127 | + p = re.compile('^([1,3]|bc1)[a-zA-Z0-9]{33,}$')
marcoagner commented at 11:25 PM on June 2, 2018:This still won't match an address between words (as @MarcoFalke pointed). You may want to remove the start and end of line anchors, use a group like
(.*?), or something else.in contrib/devtools/update-translations.py:131 in f51b1e6708 outdated
121 | @@ -122,6 +122,12 @@ def escape_cdata(text): 122 | text = text.replace('"', '"') 123 | return text 124 | 125 | +def check_bitcoin_addr(text): 126 | + p = re.compile('^([13]|bc1)[a-zA-Z0-9]{33,}$') 127 | + if (text != None and p.match(text) != None): 128 | + return True 129 | + return False
Empact commented at 12:45 AM on June 3, 2018:nit: could return the comparison directly
in contrib/devtools/update-translations.py:161 in f51b1e6708 outdated
155 | @@ -150,6 +156,11 @@ def postprocess_translations(reduce_diff_hacks=False): 156 | numerus = message.get('numerus') == 'yes' 157 | source = message.find('source').text 158 | translation_node = message.find('translation') 159 | + 160 | + address = check_bitcoin_addr(translation_node.text) 161 | + if address == True:
Empact commented at 12:45 AM on June 3, 2018:nit: could test on the method call directly
in contrib/devtools/update-translations.py:160 in f51b1e6708 outdated
155 | @@ -150,6 +156,11 @@ def postprocess_translations(reduce_diff_hacks=False): 156 | numerus = message.get('numerus') == 'yes' 157 | source = message.find('source').text 158 | translation_node = message.find('translation') 159 | + 160 | + address = check_bitcoin_addr(translation_node.text)
Empact commented at 12:46 AM on June 3, 2018:nit: could call this something declarative like
contains_bitcoin_addrin contrib/devtools/update-translations.py:126 in f51b1e6708 outdated
121 | @@ -122,6 +122,12 @@ def escape_cdata(text): 122 | text = text.replace('"', '"') 123 | return text 124 | 125 | +def check_bitcoin_addr(text): 126 | + p = re.compile('^([13]|bc1)[a-zA-Z0-9]{33,}$')
Empact commented at 12:49 AM on June 3, 2018:Probably want to make this a constant computed once on load rather than once per method call.
fanquake added the label Scripts and tools on Jun 3, 2018in contrib/devtools/update-translations.py:161 in f4cbeb70e5 outdated
154 | @@ -150,6 +155,10 @@ def postprocess_translations(reduce_diff_hacks=False): 155 | numerus = message.get('numerus') == 'yes' 156 | source = message.find('source').text 157 | translation_node = message.find('translation') 158 | + 159 | + if contains_bitcoin_addr(translation_node.text): 160 | + print('Warning! Address found in translation in file %s. This will be removed' % (filepath)) 161 | + translation_node.clear()
laanwj commented at 2:44 PM on June 4, 2018:Why not move this with the
check_format_specifierscheck below. It means you set valid to false and can re-use the !valid code.if not valid: # set type to unfinished and clear string if invalid translation_node.clear() translation_node.set('type', 'unfinished') have_errors = True
kaplanmaxe commented at 3:51 PM on June 4, 2018:Good call. I have updated this. I'm not all that familiar with Python so if there's a better way of formatting what I have now, let me know.
marcoagner commented at 3:01 PM on June 4, 2018: contributorShouldn't this account for html tags too? As I understand, I could upload a translation with the text
Nosso endereço: 3K9Tjd<span>UWMXRG</span>feRefeW<span>o48NQjp9syDdeXh</span>, bypass an address-only regex, and have the same effect in the GUI.MarcoFalke commented at 3:14 PM on June 4, 2018: memberImo, html tags should never end up in translation, so should probably be rejected as well.
marcoagner commented at 3:23 PM on June 4, 2018: contributorI agree. There are still some tr() calls here and there containing html tags, though (example). They should probably be eliminated and then rejected in
update-translations.pyas well (probably in another PR)..kaplanmaxe commented at 3:24 PM on June 4, 2018: contributor@marcoagner @MarcoFalke I think it makes sense to reject any translations that contain HTML. How do you want to go forward with this? The issue here was to prevent bitcoin addresses from getting inside a translation. What I could do is strip out any html tags then do a check to see if the translation matches a bitcoin address. However if we want to prevent any html tags from ending up in a translation, it might be best to open a separate issue (which I would be happy to work on as well) and instead of removing any html tags and then do the bitcoin addr check, immediately reject and remove any translation containing any HTML.
marcoagner commented at 3:34 PM on June 4, 2018: contributor@kaplanmaxe In my opinion, this PR should focus on rejecting addresses only. It just made me think that, as it is now, it will be kind of by-passable, but the problem is not with your check. Imo, it is the
tr()calls with html tags that should be eliminated and then have html tags rejected in another check by another PR as well.in contrib/devtools/update-translations.py:169 in df00ffd497 outdated
165 | @@ -160,7 +166,7 @@ def postprocess_translations(reduce_diff_hacks=False): 166 | if translation is None: 167 | continue 168 | errors = [] 169 | - valid = check_format_specifiers(source, translation, errors, numerus) 170 | + valid = check_format_specifiers(source, translation, errors, numerus) and not contains_bitcoin_addr(translation_node.text)
MarcoFalke commented at 3:57 PM on June 4, 2018:Should say
translationinstead oftranslation_node?
kaplanmaxe commented at 4:00 PM on June 4, 2018:Whoops! Good catch. Copied and pasted from my previous commit. Updating now
MarcoFalke commented at 3:58 PM on June 4, 2018: memberPlease squash your commits according to https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#squashing-commits
kaplanmaxe force-pushed on Jun 4, 2018kaplanmaxe commented at 4:16 PM on June 4, 2018: contributorCommits have been squashed
kaplanmaxe commented at 5:00 PM on June 4, 2018: contributorHm, anyone know why the CI failed all of a sudden? Correct me if I'm wrong here, but it looks like it's failing in an area that has nothing to do with this PR?
Potentially related to #11696 ?
MarcoFalke commented at 5:11 PM on June 4, 2018: memberThis is
http.client.RemoteDisconnected: Remote end closed connection without response#11777in contrib/devtools/update-translations.py:168 in c88aaeb945 outdated
164 | @@ -160,7 +165,7 @@ def postprocess_translations(reduce_diff_hacks=False): 165 | if translation is None: 166 | continue 167 | errors = [] 168 | - valid = check_format_specifiers(source, translation, errors, numerus) 169 | + valid = check_format_specifiers(source, translation, errors, numerus) and not contains_bitcoin_addr(translation)
Empact commented at 10:12 AM on June 5, 2018:Would like to have a report of what error occurred. Current code reports only what ends up in
errorswhich is not set in this case, so the failure is silent.kaplanmaxe force-pushed on Jun 5, 2018kaplanmaxe force-pushed on Jun 5, 201885f0135eaeutils: checking for bitcoin addresses in translations
Checking for and removing any bitcoin addresses in translations
kaplanmaxe force-pushed on Jun 5, 2018kaplanmaxe commented at 12:58 AM on June 7, 2018: contributorHey guys, anything else I can do on this PR?
kaplanmaxe commented at 3:08 AM on June 7, 2018: contributor@promag just responding to your comment in the other PR (https://github.com/bitcoin/bitcoin/pull/13375#issuecomment-395268629) so we can keep discussions relevant to this PR in here without clogging up someone elses.
maybe incorporate above suggestions in yours?
Can you clarify which suggestions you are referring to? The only one I could see being relevant is your suggestion on how to properly remove white spaces. I'm not sure why that would be needed given my regular expression. The regular expression in the other PR wouldn't catch addresses that were contained inside of a longer string. Ex:
Lorem ipsum <BITCOIN_ADDRESS_HERE> Lorem Ipsum Lorem Ipsum etc...The regular expression in my PR would catch any address regardless of the text surrounding it.
As @MarcoFalke stated which I agree with, "a word that is longer than 30 chars and includes at least one digit should never be part of a translation (regardless if it was a bitcoin address)" (https://github.com/bitcoin/bitcoin/pull/13374#discussion_r192567791)
Maybe I am overlooking one of your suggestions. If so, could you please clarify which one? I am also making use of
errorswhich outputs the filenamein contrib/devtools/update-translations.py:128 in 85f0135eae
123 | @@ -122,6 +124,12 @@ def escape_cdata(text): 124 | text = text.replace('"', '"') 125 | return text 126 | 127 | +def contains_bitcoin_addr(text, errors): 128 | + if text != None and ADDRESS_REGEXP.search(text) != None:
promag commented at 1:57 PM on June 7, 2018:Could be:
if ADDRESS_REGEXP.search(text): ...
kaplanmaxe commented at 6:11 PM on June 7, 2018:If someone else feels strongly about making the change here, lmk and I will make it. If it's not necessary, don't want to reset the ACKs based off a new commit hash :)
promag commented at 1:59 PM on June 7, 2018: memberThe idea is to prevent things like
... bc1 qar0srrr7xfkvy5l643lydnw9re59gtzzwf5mdq ...Either way, utACK 85f0135 with or without the comment below.
kaplanmaxe commented at 2:20 PM on June 7, 2018: contributor@promag I think you bring up an interesting point with the spaces. I think it really depends on how granular we want to get with this. If we want to prevent spaces, there are still plenty of ways "around" this regular expression. Ex:
b̈c1qar0srrr7xfkvy5l643lydnw9re59gtzzwf5mdqPersonally, I would argue that if the user can't copy and paste it and use it as a valid bitcoin address, we shouldn't worry about it. There are a few other ways "around" the regular expression as well, although every one of them lead to invalid bitcoin addresses.
kaplanmaxe commented at 1:34 PM on June 8, 2018: contributor@laanwj I saw your comment in #13375 . I have tested this pretty extensively and haven't had any false positives. It will also catch addresses between words like I mentioned here (https://github.com/bitcoin/bitcoin/pull/13374#issuecomment-395278440) which I feel is really important. When you get a chance, maybe you could check this one out?
laanwj commented at 2:16 PM on June 8, 2018: member@kaplanmaxe Thanks, I'll test this one next.
laanwj commented at 2:52 PM on June 8, 2018: memberTested ACK 85f0135eaefe3d9f696689a7e83606c579da40a8 (with normal and bech32 address)
MarcoFalke commented at 2:53 PM on June 8, 2018: memberutACK 85f0135eaefe3d9f696689a7e83606c579da40a8
laanwj merged this on Jun 8, 2018laanwj closed this on Jun 8, 2018laanwj referenced this in commit 56f69360dc on Jun 8, 2018UdjinM6 referenced this in commit 22554c7ab9 on Jun 19, 2021UdjinM6 referenced this in commit a3023c8ec6 on Jun 24, 2021UdjinM6 referenced this in commit 776694374a on Jun 26, 2021UdjinM6 referenced this in commit 2469e28de4 on Jun 26, 2021UdjinM6 referenced this in commit 349c85e1a1 on Jun 26, 2021PastaPastaPasta referenced this in commit 063d8ad940 on Jun 27, 2021PastaPastaPasta referenced this in commit b406ba5b97 on Jun 28, 2021PastaPastaPasta referenced this in commit 2dd83ba60a on Jun 28, 2021UdjinM6 referenced this in commit 1f537f32f7 on Jun 28, 2021PastaPastaPasta referenced this in commit 3a4553d99c on Jun 29, 2021MarcoFalke locked this on Sep 8, 2021
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 18:15 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me