backup
March 14, 2025 11:37 AM
together they founded the Data Rescue Project to preserve the enormous data sets that website-focussed efforts had missed. Its tracker now catalogues more than four hundred publicly accessible volunteer backups of government repositories…
By mid-February, the Data Rescue Project was recruiting from r/DataHoarder and a few related networks. Majstorovic and others began teaching the less experienced members how to back up government data with ArchiveTeam Warrior—an app whose creators have launched a data-rescue campaign—and to upload it to a secure public repository called DataLumos [newyorker/archived]
I am in awe of all the people doing what they can to preserve and share this knowledge that we have created together.
This is what we should be, who we should be, and every time I see people pulling together and doing these things that are the best of us, it gives me greater courage and resolve to pitch in where I can.
I am always so glad to get to read more about this fine and noble work.
Thank you so much for posting this, HearHere.
And to everyone involved in the Data Rescue Project, my heartfelt and lasting gratitude.
posted by kristi at 12:02 PM on March 14 [5 favorites]
The r/DataHoarder people are insane, in a good way, and if they're a part of this, then this shit is getting archived, hard. Little things like this make me more optimistic in dark times.
posted by Joakim Ziegler at 12:24 PM on March 14 [3 favorites]
> Interesting. Looks like they are backing up some repositories that aren't indexed by Archive.org?
Ultimately archive.org can only back up what they're allowed to back up. Their crawler respects robots.txt files and owners can also request an archive be deleted.
The DRP is probably not beholden to these same rules.
posted by at by at 12:36 PM on March 14 [3 favorites]
Looks like they are backing up some repositories that aren't indexed by Archive.org?
In general, Archive doesn't tend to grab "raw" data files, whether that be in a database format or in something portable like csv or xml. Archive also respects robots.txt files that ask it not to index/grab, which sometimes gets in the way of things being saved. Things like End of Term Archive grabbed a lot of the "websites" of the previous administration but again, didn't go deep into saving data itself.
I've worked with a few groups that started on this effort back in late November. There's a LOT of work being done silently behind the scenes to make sure that old data is still around. The real worry now is that new data isn't being saved or produced, or that data that is being saved and shared isn't reliable... healthcare data, climate data, economic data, all are now -extremely- likely to be cooked in ways that are not reflective of reality.
That's the current going concern that I'm seeing.
posted by griffey at 12:42 PM on March 14 [6 favorites]
Looks like they are backing up some repositories that aren't indexed by Archive.org?
Internet Archive is currently in grave financial danger. Lots of copies keep things safe, no matter what.
posted by reedbird_hill at 12:45 PM on March 14 [5 favorites]
I’m part of this if anyone has questions. We’re academic researchers focused on datasets for our researchers. Not so much on websites.
posted by jwells at 3:34 PM on March 14 [13 favorites]
Godspeed, You Database Emperors and Empresses.
posted by Smedly, Butlerian jihadi at 3:44 PM on March 14 [4 favorites]
I wanted to know how to get involved, but then I found that info in the FAQ!
posted by itsatextfile at 5:47 PM on March 14 [4 favorites]
posted by rageagainsttherobots at 11:49 AM on March 14 [1 favorite]