
Public archive for Delving Bitcoin

An archive of delvingbitcoin.org · view original topic →

James O'Beirne · #1 ·

It would be great to have a public archive of the posts here, so that

  1. if this site goes away, we still have all the content, and
  2. it is searchable by indexers like https://bitcoinsearch.xyz.

As far as I can tell, there are three approaches:

Scraping

Use some kind of crawler (selenium, wget, etc.) to manually create a backup by scraping the site.

API

Use the Discourse API to pull content. This could be done either periodically as a full crawl or incrementally on a continuous basis.

DB dump

Get a SQL dump of the database and run some kind of sanitization/export script on it every so often.


I’d say we should either go with API or DB, and then post the results on a git repo somewhere public (preferably a few places!).

Thoughts from the admins?

Anthony Towns · #2 ·

Scraping seems to be what upstream recommends:

There’s also the /raw/ endpoint: https://delvingbitcoin.org/raw/87 and the API, e.g. https://delvingbitcoin.org/posts.json

There’s also Discourse Public Data Dump - developers - Discourse Meta because of course exporting data for AI training is a much higher priority than just information accessibility and continuity…

James O'Beirne · #3 · · in reply to #2

Oh, that’s great! I think these two might give us everything we need?

Anthony Towns · #4 · · in reply to #3

Might need to do something extra to also get any uploaded images / attachments?

Also need to be prepared to add ?page=2 etc. if there are more than 100 comments, e.g. https://meta.discourse.org/raw/69776?page=8. I think the API is rate limited fairly heavily by default (see Available settings for global rate limits and throttling - sysadmin - Discourse Meta), so it may need some tweaking.
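
As a rough sketch of the paging described above: the helper below builds the list of /raw/ URLs for a topic, assuming 100 posts per page as mentioned in this post. The function name `raw_page_urls` and the `post_count` parameter are hypothetical, and the post count would have to come from somewhere else (for example the topic JSON). It only builds URLs and does not fetch anything, so rate limiting is left to the caller.

```python
from math import ceil


def raw_page_urls(base_url, topic_id, post_count, page_size=100):
    """Return the /raw/ URLs needed to cover every post in a topic.

    page_size=100 is taken from the discussion above; treat it as an
    assumption rather than a documented constant.
    """
    pages = max(1, ceil(post_count / page_size))
    first = f"{base_url.rstrip('/')}/raw/{topic_id}"
    # The first page is the bare URL; later pages add ?page=N.
    return [first] + [f"{first}?page={n}" for n in range(2, pages + 1)]


if __name__ == "__main__":
    for url in raw_page_urls("https://meta.discourse.org", 69776, 750):
        print(url)
```

A crawler using this would probably want to sleep between requests, since the default rate limits seem fairly strict.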

midnight · #5 ·

Having sat there and considered significantly the longterm utility of logs and access to information that Bitcoin requires to defend itself—literally transparency is one of its best and most-resulted-in-rescues defences—I would like to also recommend that you make the archival and thus witnessing something that more than one person or process can perform.

On IRC, people can create and maintain logs longer-term on a per-person basis. This creates a much more robust and participation-based consensus on what constitutes the historical record.

I have found that this is an important facet of archival effectiveness as well. There are occasionally attacker-injection problems that can threaten the safety of individuals, but I have also found that the most diligent and reliable sources of archival information are also the most reasonable and practical people, so this tends to be an addressable problem.

Thus, may I suggest that the archival process itself be made available to individuals who are interested in participating. 🙂

Further, now that I’m thinking about it, I would like to point out that for those forums like Slack where the relevant historical archive is spotty, questionable, inaccessible, or otherwise opaque, these places are the sources of significant and ongoing attacker fuel—a simple propaganda-only example would be the ongoing nonsense about the “dragon’s den.”

James O'Beirne · #6 ·

Okay! I’ve come up with a minimum viable script for doing this.

Here’s the archive repository: GitHub - jamesob/delving-bitcoin-archive: A public archive of delvingbitcoin.org.

The script for doing the actual archiving is here (and should be easily pip-installable by anyone wanting to reproduce this process for themselves): GitHub - jamesob/discourse-archive: Provides a simple archive of Discourse content

pip install discourse-archive
discourse-archive -u https://delvingbitcoin.org

The only caveat with this script is that if someone updates an old post (older than a day), the update won’t be detected. I’m not sure if there’s a good solution for this, but maybe some other part of the API could clue us in to updated posts.
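
One possible way to catch edited posts, sketched under assumptions: if each post's JSON carries an `updated_at` timestamp (I believe Discourse post JSON does, but treat that as an assumption), the archived copy can be compared against a fresh listing. The names `posts_needing_refresh` and `parse_ts` are hypothetical and not part of discourse-archive, and this still needs a fresh listing of old posts from somewhere.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse an ISO 8601 timestamp such as '2023-10-05T12:34:56.789Z'."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def posts_needing_refresh(archived, fresh):
    """Return sorted IDs of posts that are new or edited since archival.

    Both arguments map post ID -> post dict with an 'updated_at' field
    (assumed field name).
    """
    stale = []
    for post_id, post in fresh.items():
        old = archived.get(post_id)
        if old is None or parse_ts(post["updated_at"]) > parse_ts(old["updated_at"]):
            stale.append(post_id)
    return sorted(stale)


if __name__ == "__main__":
    archived = {1: {"updated_at": "2023-10-01T00:00:00.000Z"}}
    fresh = {
        1: {"updated_at": "2023-10-09T08:30:00.000Z"},
        2: {"updated_at": "2023-10-09T09:00:00.000Z"},
    }
    print(posts_needing_refresh(archived, fresh))
```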

0xB10C · #7 ·

I’ve opened [no-merge] what's the diff to a fresh run by 0xB10C · Pull Request #1 · jamesob/delving-bitcoin-archive · GitHub to check how large the diff between GitHub - jamesob/delving-bitcoin-archive: A public archive of delvingbitcoin.org and a fresh archival run would be. Some diff is expected (e.g. number of likes, JSON fields only added in newer Discourse versions, …), but I also noticed some posts and topics that were never archived.
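
To illustrate the kind of comparison meant here (this is not the code used for the PR): a minimal sketch that separates expected noise from real differences. The field names in `VOLATILE` are hypothetical stand-ins for things like like counts, and fields that exist on only one side are ignored, since newer Discourse versions add fields.

```python
VOLATILE = {"like_count", "reads"}  # hypothetical names for noisy fields


def changed_fields(old, new, volatile=VOLATILE):
    """Return sorted names of shared, non-volatile fields that differ."""
    shared = (old.keys() & new.keys()) - volatile
    return sorted(k for k in shared if old[k] != new[k])


def compare_archives(old_posts, new_posts):
    """Compare two archives given as {post_id: post_dict}.

    Returns (missing, changed): IDs present only in the fresh run, and
    a dict of post ID -> changed field names for posts in both.
    """
    missing = sorted(new_posts.keys() - old_posts.keys())
    changed = {}
    for post_id in old_posts.keys() & new_posts.keys():
        fields = changed_fields(old_posts[post_id], new_posts[post_id])
        if fields:
            changed[post_id] = fields
    return missing, changed


if __name__ == "__main__":
    old = {1: {"raw": "hello", "like_count": 1}}
    new = {1: {"raw": "hello", "like_count": 5}, 2: {"raw": "new post"}}
    print(compare_archives(old, new))
```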

0xB10C · #8 ·

Note that the discourse-archive tool has an off-by-one bug and might miss some posts: fix: off-by-one error in post fetching by 0xB10C · Pull Request #3 · jamesob/discourse-archive · GitHub

/dev/fd0 · #9 · · in reply to #8

I don’t trust these archives (no offense to @jamesob and @ajtowns). One of my posts was deleted and it was important to get the CVE ID. I still managed to get it: NVD - CVE-2025-65548 (9.1 critical).

The post was censored on “bitcoin” dev mailing list.

So, I archive all my posts and comments here. I use archive.org and archive.is for it.

Archive

0xB10C · #10 · · in reply to #9

I think your comment is completely off-topic. You are obviously welcome to archive the posts yourself and host them on your own infrastructure. I’d even recommend you do - and hopefully others do too.

An archive of this forum can only capture what’s on this website at the time of archival. It can’t capture a deleted post. Use archive.org and archive.is if you fear your post is going to be deleted (that seems to be a moderation topic, not an archival topic), yes, but that’s for individual posts, not a browsable version of topics and their comments.

0xB10C · #12 ·

I’m now running a delvingbitcoin.org mirror built from a discourse-archive archive on https://mirror.b10c.me/delvingbitcoin-org/. The mirror tool can be found here: discourse-archive/mirror.py at 93061b09f9d5fbaaeaa0261300302e069563cc22 · 0xB10C/discourse-archive · GitHub.

An archive of this topic can be found here: Public archive for Delving Bitcoin

The mirror is updated every few hours.

The tool does not yet archive images contained in posts, but I plan to extend it to do so.
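
A first step towards image archiving could look roughly like the sketch below. It is not part of mirror.py. It assumes the rendered HTML of each post is available (Discourse exposes this as a `cooked` field, as far as I know) and collects the image URLs from it, resolving relative paths against the site URL. `ImageCollector` and `image_urls` are hypothetical names, and downloading and rewriting the links is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect absolute URLs of all <img> tags in an HTML fragment."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Relative upload paths are resolved against the site URL.
                self.urls.append(urljoin(self.base_url, src))


def image_urls(post_html, base_url):
    """Return image URLs found in a post's rendered HTML, in order."""
    collector = ImageCollector(base_url)
    collector.feed(post_html)
    return collector.urls


if __name__ == "__main__":
    html = '<p>chart: <img src="/uploads/chart.png" alt="chart"></p>'
    print(image_urls(html, "https://delvingbitcoin.org"))
```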