test: switch from curl to urllib for HTTP requests #29970
pull iw4p wants to merge 1 commits into bitcoin:master from iw4p:feature/requests-library-usage changing 1 files +17 −14-
iw4p commented at 12:24 pm on April 26, 2024: noneSwitched from using subprocess to make HTTP requests (curl) to using the Python urllib library, improving cleanliness and maintainability.
-
DrahtBot commented at 12:24 pm on April 26, 2024: contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Code Coverage
For detailed information about the code coverage, see the test coverage report.
Reviews
See the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
-
DrahtBot added the label Refactoring on Apr 26, 2024
-
in test/get_previous_releases.py:145 in 38176a7333 outdated
151- ret = subprocess.run(cmd).returncode 152- if ret: 153- return ret 154+ response = requests.get(tarballUrl) 155+ with open(tarball, 'wb') as file: 156+ file.write(response.content)
maflcko commented at 12:30 pm on April 26, 2024:This is not a refactor. It is a behavior change when an error occurs, for example a network error, or a file error
laanwj commented at 12:40 pm on April 26, 2024:If you’re going to read the entire data into memory, you might as well do the hashing at the same time? Currently, it reads back the sametarball
it wrote and passes it to the hasher, which seems kind of a waste.
iw4p commented at 12:40 pm on April 26, 2024:You’re right about behavior change. I am still not sure about usingscript
orutil
tag.
laanwj commented at 1:38 pm on April 26, 2024:It’s important to be careful about behavior changes but i do think this is an improvement, usingrequests.head
is better practice than calling curl then looking for a string in the output.laanwj removed the label Refactoring on Apr 26, 2024laanwj added the label Tests on Apr 26, 2024laanwj commented at 1:43 pm on April 26, 2024: memberThis is a script used for the tests, so addingtest
tag.iw4p renamed this:
refactor: switch from curl to requests for HTTP requests
test: switch from curl to requests for HTTP requests
on Apr 26, 2024iw4p commented at 2:00 pm on April 26, 2024: noneThe title’s tag also changed fromRefactor
toTest
DrahtBot added the label CI failed on Apr 26, 2024iw4p force-pushed on Apr 26, 2024iw4p commented at 5:46 pm on April 26, 2024: none@laanwj I investigated and found that you modified
import requests
here here. Is there any shell script that I can providepip install requests
to pass theprevious release, depends DEBUG
?edit: It seems bitcoin/ci/test/00_setup_env_native_previous_releases.sh is responsible to provides everything that the environment’s needed. Shall I add
pip install requests
here?laanwj commented at 10:25 pm on April 26, 2024: memberYes, that’s usually how it’d be done.
But gah,
requests
is an external dependency? That’s unfortunate, i don’t think we should be adding dependencies unless absolutely necessary. Probably better to do this with python’s built in urllib or http client functionality, or keep it like this.iw4p force-pushed on Apr 27, 2024iw4p renamed this:
test: switch from curl to requests for HTTP requests
test: switch from curl to urllib for HTTP requests
on Apr 27, 2024iw4p force-pushed on Apr 27, 2024iw4p commented at 5:35 am on April 27, 2024: noneI replaced therequests
withurllib
, tested the script for downloading .tar.gz file and works fine for me.DrahtBot removed the label CI failed on Apr 27, 2024in test/get_previous_releases.py:138 in 7fe94f79c9 outdated
142- 143- curlCmds = [ 144- ['curl', '--remote-name', tarballUrl] 145- ] 146+ try: 147+ response = urlopen(Request(tarballUrl, method='HEAD'))
maflcko commented at 7:20 am on April 27, 2024:Is it required to check for 404 in a separate, additional request?
laanwj commented at 7:28 am on April 27, 2024:That’s how the current code works too - it first does a HEAD request to check that the file is there, then goes to downloading. i don’t know the original reasoning behind this, though.
iw4p commented at 7:29 am on April 27, 2024:No, I could write more pythonically and with one request, but the reason I wrote it this way was to commit to the previous code. If you agree with one request, I can change it. Should this change be in a new commit or should I force it on the previous commit?
maflcko commented at 7:30 am on May 13, 2024:Should this change be in a new commit or should I force it on the previous commit?
Squashing seems fine?
iw4p commented at 7:37 am on May 13, 2024:To rewrite the git history for having the last change on the last commit and ignore old ones, I forced it. I used squashing only for merging before that.
maflcko commented at 4:17 pm on May 25, 2024:Was this addressed?
maflcko commented at 9:16 am on June 9, 2024:I tested this myself, and the reason seems to be that absent aHEAD
request, a 404.html would be downloaded as the targz, which then fails the shasum check.
AngusP commented at 9:51 pm on June 12, 2024:With curl that makes sense, though here theurlopen
will raise aHTTPError
on a 404 for both aGET
andHEAD
request, so the earlier HEAD does seem superfluous?in test/get_previous_releases.py:147 in 7fe94f79c9 outdated
155- for cmd in curlCmds: 156- ret = subprocess.run(cmd).returncode 157- if ret: 158- return ret 159+ try: 160+ response = urlopen(tarballUrl)
AngusP commented at 10:20 am on May 10, 2024:AFAIK there’s a minor behaviour change here, where
urlopen
will follow redirects whereascurl
won’t usually0$ curl -I https://httpbin.org/absolute-redirect/3 1HTTP/2 302 2# ...
0>>> from urllib.request import urlopen 1>>> response = urlopen("https://httpbin.org/absolute-redirect/3") 2>>> response.code 3200 # Not 302 because redirects were followed
This should be fine, but worth a mention.
iw4p commented at 12:21 pm on May 12, 2024:Great point, Thank you.AngusP approvedAngusP commented at 10:22 am on May 10, 2024: contributorcrACK 7fe94f79c90b689dce89b287d1016a97218021f6test: switch from curl to urllib for HTTP requests 588cad38f7in test/get_previous_releases.py:8 in 7fe94f79c9 outdated
4@@ -5,7 +5,7 @@ 5 # file COPYING or http://www.opensource.org/licenses/mit-license.php. 6 # 7 # Download or build previous releases. 8-# Needs curl and tar to download a release, or the build dependencies when 9+# Needs urllib built-in python library and tar to download a release, or the build dependencies when
fanquake commented at 7:33 am on May 13, 2024:If this is part of the standard library, I don’t think you need to list it as a requirement.iw4p force-pushed on May 25, 2024iw4p requested review from laanwj on May 25, 2024iw4p requested review from fanquake on May 25, 2024iw4p requested review from maflcko on May 25, 2024AngusP approvedAngusP commented at 9:50 pm on May 25, 2024: contributorACK 588cad38f7813d88f022d0304aa20a0b96d184edtheStack commented at 2:47 pm on June 7, 2024: contributorOne drawback of this change is that a user doesn’t see download progress per file any more (relevant if e.g. servers are slow or the connection is bad for any other reason; for me this currently takes several minutes per file :/) and hence is left in the dark whether there is still activity, so Concept ~0.iw4p commented at 3:53 pm on June 7, 2024: noneOne drawback of this change is that a user doesn’t see download progress per file any more (relevant if e.g. servers are slow or the connection is bad for any other reason; for me this currently takes several minutes per file :/) and hence is left in the dark whether there is still activity, so Concept ~0.
I appreciate your feedback. While it’s true that switching from
curl
tourllib
removes the per-file download progress visibility, this change offers significant improvements in code cleanliness and maintainability, which are crucial for long-term project sustainability.Moreover, the optimization to reduce two requests into one significantly lowers the traffic load on the host, enhancing overall performance. Although users might not see detailed progress, the efficiency gains and cleaner codebase justify this trade-off. For those needing more detailed progress feedback, we can consider integrating optional logging mechanisms directly into the code, ensuring we don’t rely on third-party libraries like
tqdm
.maflcko commented at 9:20 am on June 9, 2024: memberMoreover, the optimization to reduce two requests into one significantly lowers the traffic load on the host,
This isn’t true? There are still two requests before and after this change, so this isn’t changed. Also a
HEAD
request shouldn’t cause any meaningful traffic, compared to a full download.in test/get_previous_releases.py:149 in 588cad38f7
157- if ret: 158- return ret 159+ try: 160+ response = urlopen(tarballUrl) 161+ with open(tarball, 'wb') as file: 162+ file.write(response.read())
maflcko commented at 9:23 am on June 9, 2024:Does this load the full response into memory, which curl does not do? (Probably fine, but if the approach is extended to debug symbols, which are 0.5GB, this may become an issue)
AngusP commented at 9:49 pm on June 12, 2024:You canresponse.read(amt=<number of bytes>)
so it could be chunked to avoid loading the whole thing in to memory. That could also be used to add back some coarse progress indicator
maflcko commented at 7:33 am on June 13, 2024:Right, but at some point I wonder if it is worth it to try to re-implementcurl
in Python. I am not sure, but is curl not bundled ingit
, which is already a dependency of this project? Of all open source software projectscurl
is currently unlikely to go unmaintained, so the motivation still seems weak.maflcko commented at 9:26 am on June 9, 2024: memberAlso tend toward NACK here, as there are several issues here from a quick glance:
- This is more lines of code, so possibly more to maintain.
- This worsens the UX, due to lack of progress.
- Review and testing this change may not be worth it, considering the limited benefit, if any?
iw4p commented at 9:50 am on June 9, 2024: noneMoreover, the optimization to reduce two requests into one significantly lowers the traffic load on the host,
This isn’t true? There are still two requests before and after this change, so this isn’t changed. Also a
HEAD
request shouldn’t cause any meaningful traffic, compared to a full download.Twice I asked if you think it’s fine, we can use one request to check the status code (if it’s 404) or download (if status code isn’t 404) and I got no answer. Right now the code send two requests and I did not change that to one request because we’re not aware of what’s the reason, but it’s clear that one request is enough. It seems I can not change the logic easily and I must continue on the old same logic, otherwise I’d like to write it in that way which is more reliable and general for handling all kind of errors, not only 404.
0try: 1 response = urlopen(tarballUrl) 2 with open(tarball, 'wb') as file: 3 file.write(response.read()) 4except Exception as e: 5 print(f"Failed to download file. Error: {str(e)}") 6 return 1
maflcko commented at 10:34 am on June 9, 2024: memberTwice I asked if you think it’s fine […] and I got no answer.
As the author of a pull request, there is an expectation that you spend time to review and to test the code you have written yourself. I try to ask questions during review to motivate pull request authors to look into interesting areas and edge-cases to explore in the change. There are currently more than 300 open pull request and I currently don’t have the time to test everything I can imagine in all of them. In this case, it is not really that hard, so I did the test myself, see #29970 (review). So with the current code in master, the two requests are needed. I haven’t tested this with your code, so I can’t provide you an answer to that.
DrahtBot added the label CI failed on Jun 18, 2024DrahtBot removed the label CI failed on Jun 20, 2024bitcoin deleted a comment on Jun 20, 2024maflcko commented at 7:30 am on June 21, 2024: memberI think this can be closed for now. It seems too controversial for a simple test change?iw4p commented at 1:33 pm on June 21, 2024: noneI think this can be closed for now. It seems too controversial for a simple test change?
I think so. I’ll try to take a look at other issues, I’d be happy if I can help.
maflcko commented at 2:03 pm on June 21, 2024: memberOk, closing for now.
You can also help with review of other open pull requests. This will teach you about the project and naturally you will find leftovers, follow-ups, or (un)related changes that you can tackle yourself.
maflcko closed this on Jun 21, 2024
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-06-29 07:13 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me