Improvement to getcoins.py:
- handled request excecptions
- added limit handler to SVG size
- restricted format
- added caption to captcha (From the url)
Improvement to getcoins.py:
hey @prayank23 please review the PR
141 | + raise SystemExit(f"No captcha found URL {args.captcha}") 142 | 143 | # Convert SVG image to PPM, and load it 144 | try: 145 | - rv = subprocess.run([args.imagemagick, '-', '-depth', '8', 'ppm:-'], input=res.content, check=True, capture_output=True) 146 | + rv = subprocess.run([args.imagemagick, '-', '-depth', '8', 'ppm:-'], input=content, check=True, capture_output=True)
FWIW, if you want to allow only svgs you can also replace the first '-' with 'svg:-'. This makes imagemagick check the format and would avoid adding two dependencies (or at least one).
yes, okay
130 | + raise SystemExit(f"Unexpected error when contacting faucet: {e}") 131 | + 132 | + # Size limitation 133 | + svg = xml.etree.ElementTree.fromstring(res.content) 134 | + if (svg.attrib.get('width') != '150' or svg.attrib.get('height') != '50'): 135 | + raise SystemExit("Captcha dimentions doesn't satisify")
This is not correct English, "Captcha size doesn't match expected 150x50" maybe
Okay, my bad!
Concept ACK. I think a dependency on etree xml is acceptable as it is commonly installed (or maybe even part of Python, i don't know for sure), but would prefer to avoid bleach.
129 | + except requests.exceptions.RequestException as e: 130 | + raise SystemExit(f"Unexpected error when contacting faucet: {e}") 131 | + 132 | + # Size limitation 133 | + svg = xml.etree.ElementTree.fromstring(res.content) 134 | + if (svg.attrib.get('width') != '150' or svg.attrib.get('height') != '50'):
Nit: no outer () needed for Python if
okay
hey @prayank23 please review the PR
Concept ACK
Welcome as a new contributor 🥂
Agree about dependency and other things suggested by @laanwj I won't be able to review or test for a few days. Took a break. Hopefully this PR gets merged in a week or I will review and test next week.
would prefer to avoid
bleach
In order to sanitize html content we need some or the other modules such as html-sanitizer, bs4, lxml.html, xml.html, etc. All these modules are not commonly installed. so I have choosen bleach because its has some good set of options. If you are familiar or heard of any other html content sanitizing module please mention, I will play with it for a while and update it. Thank you.
Just an anecdotal metric: My system has bleach available without any additional dependencies, but not html-sanitizer, bs4, or xml.html. On the other hand, lxml is already installed for KDE's Breeze icons, Inkscape, lv2 (an audio framework used by Audacity), and a few Gentoo development tools.
You could have used a new branch in your fork repo for this pull request as mentioned in https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#contributor-workflow
Still not sure about error message which was corrected based on #24180 (review). It looks better but size and dimension have different meaning.
I think a dependency on etree xml is acceptable as it is commonly installed (or maybe even part of Python, i don't know for sure), but would prefer to avoid bleach.
laanwj
Just an anecdotal metric: My system has bleach available without any additional dependencies, but not html-sanitizer, bs4, or xml.html. On the other hand, lxml is already installed for KDE's Breeze icons, Inkscape, lv2 (an audio framework used by Audacity), and a few Gentoo development tools.
luke-jr
bleach on Fedora :ModuleNotFoundError: No module named 'bleach'
Not sure if there are better alternatives or we could sanitize without using it.
print(f"Captch from URL {args.captcha}") prints this which looks good:
res.raise_for_status()
except requests.exceptions.RequestException as e:
Sorry I could not test this PR today because of below error:
./getcoins.py
Unexpected error when contacting faucet: 502 Server Error: Bad Gateway for url: https://signetfaucet.com/captcha
I have asked @kallewoof and hoping it gets fixed in next few days so that I can test everything.
@prayank23 It's back up. I shut it down for a few days as it was being swarmed with spammers.
but not html-sanitizer, bs4, or xml.html
xml.etree is available in the python standard library. Not any of the other submodules IIRC.
And I don't really like to add any dependencies beyond the standard library. I intentially used ImageMagick instead of any python specific SVG library.
I'm a bit skeptical about the need to "sanitize" the XML at all, to be honest. It feels like a bit of security theater that doesn't address a specific issue. I'm ok with removing that part and just keeping the size check + ensure SVG format.
That said, iterating over an xml tree and keeping only certain elements and attributes doesn't seem rocket science, if we really want that it should be possible with just etree manipulation.
130 | + raise SystemExit(f"Unexpected error when contacting faucet: {e}") 131 | + 132 | + # Size limitation 133 | + svg = xml.etree.ElementTree.fromstring(res.content) 134 | + if svg.attrib.get('width') != '150' or svg.attrib.get('height') != '50': 135 | + raise SystemExit("Captcha size doesn't match expected dimensions 150x50")
Tested this part of code by using different url for captcha in getcoins.py:
DEFAULT_GLOBAL_CAPTCHA = 'https://prayank23.github.io/svg-test/size.svg'
and this svg file:
<svg xmlns="http://www.w3.org/2000/svg" height="300" width="100">
<text x="0" y="15">Size test</text>
</svg>
Works as expected:
$ ./getcoins.py
Captcha size doesn't match expected dimensions 150x50
Hey @laanwj and @prayank23, I have made all the changes to the script, waiting for merge. Thanks
Hey @laanwj and @prayank23, I have made all the changes to the script, waiting for merge. Thanks
You will have to squash commits: https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#squashing-commits
If you haven't done this before and want to avoid mistakes, trying squashing a few commits in test repository or other branch could be helpful. Or you could even use GitHub desktop application for it.
Tested ACK 7c7dbe43983ea8408ece8e483320ec55083d6c09
7 | @@ -8,6 +8,7 @@ 8 | import requests 9 | import subprocess 10 | import sys 11 | +import xml.etree.ElementTree
import defusedxml.ElementTree
Sorry for the late review. I was still going through everything in the code and testing things. Found that xml.etree.ElementTree.fromstring is known to be vulnerable to XML attacks. Maybe this can be changed in a follow up PR.
If there is a specific problem with xml.etree it needs to be addressed. But without introducing a third-party dependency, please. Adding defusedxml is not acceptable here.
Details: https://deepsource.io/gh/prayank23/bitcoin/run/61a5e60f-9376-4fd4-8a65-3b76190a56af/python/BAN-B314
I tried XML attacks with some test svg files added here: https://github.com/prayank23/svg-test but found nothing interesting. Maybe its not an issue for running the script in terminal and such attacks only work for websites.
One of the error:
limit on input amplification factor (from DTD and entities) breached: line 15, column 35