Millions of research papers at risk of disappearing from the Internet
March 7, 2024 8:19 PM
"An analysis of DOIs suggests that digital preservation is not keeping up with burgeoning scholarly knowledge." More than one-quarter of scholarly articles are not being properly archived and preserved, a study of more than seven million digital publications suggests. The findings, published in the Journal of Librarianship and Scholarly Communication on 24 January, indicate that systems to preserve papers online have failed to keep pace with the growth of research output.
"Our entire epistemology of science and research relies on the chain of footnotes," explains author Martin Eve, a researcher in literature, technology and publishing at Birkbeck, University of London. "If you can’t verify what someone else has said at some other point, you’re just trusting to blind faith for artefacts that you can no longer read yourself."
...
“Many people have the blind assumption that if you have a DOI, it’s there forever,” says Mikael Laakso, who studies scholarly publishing at the Hanken School of Economics in Helsinki. “But that doesn’t mean that the link will always work.” In 2021, Laakso and his colleagues reported that more than 170 open-access journals had disappeared from the Internet between 2000 and 2019.
It's always concerning how much information on the internet could disappear but I'm happy there's so much focus on it right now. Hopefully we'll get ourselves to a more stable position soon.
archive.org link
their references:
original link [archive.org link]
original link [archive.org link]
Nick Zentner rallied a small army of Zentnerds and pulled J Harlan Bretz notes back from the brink of being forgotten: https://www.youtube.com/watch?v=1tqBgiozZs4
posted by wmo at 9:35 PM on March 7
Is scholarly knowledge burgeoning? Or is there a burgeoning of scholarly knowledge? More papers than ever being written, but are they all advancing knowledge? I feel like there’s a humongous amount of garbage being produced. Can’t speak to every field, but most of what might be lost isn’t interesting or useful to humanity. And also, do we really need a perfect historical record of our era? Lots of it is fine to be forgotten. I know it’s theoretically possible for exhaustive recording of all this, but, nah, plenty can be lost and it will be fine.
posted by ixipkcams at 10:54 PM on March 7 [5 favorites]
Adjacent: Forty years ago Acorn Computers, Philips, Logica, the EEC and 1 million UK skoolkids compiled the BBC Domesday Project recording what life was like 900 years after William I's original survey. Within 20 years this bright new snapshot of daily life (maps! stats! pics! virtual walks!) was unreadable. Tim Harford has a Cautionary Tale [Nov '23, 40mins] about the loss, the partial recovery, the second loss. Too many pioneers, not enough archivists.
posted by BobTheScientist at 12:57 AM on March 8 [7 favorites]
Seven million digital publications that are publishing scholarly articles? That sounds like a suspiciously high number and makes me wonder if all seven million are high-quality peer-reviewed journals.
posted by Umami Dearest at 1:35 AM on March 8 [5 favorites]
A colleague was recently seeking opinions on social about a journal for which she’d been invited to guest edit an issue. The journal I didn’t know about for sure, but the publisher has a long, troubling history. I checked into it again to refresh my memory, and one of their journals was at one point publishing nine special issues per day. So, uh, about that quality thing…
Anyway, yes. DOIs/PIDs are great, but they aren’t an archival infrastructure. I bump into disappeared journals and articles regularly. Sometimes copies are floating around somewhere, sometimes not.
posted by cupcakeninja at 2:51 AM on March 8 [1 favorite]
One of my job duties is reminding people at our government research institution that they are required under applicable rules to submit all published work to the NIH manuscript submission system. Doing so ensures that a freely available copy gets archived in PubMed Central.
“But we published open access!” they tell me. Sure, you did, but will that journal EXIST in a decade? PubMed Central will. Open access ≠ NIH open access. You can choose the former, but you MUST adhere to the latter no matter what. EU has similar rules, and a similar government-run archive, for the same reasons.
So, so many fly-by-night low quality (or sheer garbage) journals out there, I don’t know why anyone publishes with them. One of my colleagues LOVES sending stuff to MDPI for example, and every time, I’m just annoyed. Pro tip: if the journal makes you, the author, do the typesetting, it’s a shit journal. Why are you paying them to do the editorial work, when they won’t accept it until after you’ve done that work for them? Some other poor sap volunteered to do the peer review for free. You already provided the content for the journal, and you paid handsomely for the “right” to do so… it’s not like they’ll pass on micropayments to you every time someone opens your article.
posted by caution live frogs at 5:47 AM on March 8 [8 favorites]
Over-production of scholarly articles aside, this seems to me like another example of digital information, distributed over the Internet, being so much more fragile than printed text.
Websites require active maintenance to stay online, and if an organization goes out of business, so does their website. Granted that it’s easy for anyone with access to that website to make a copy and back up the information. But that depends on someone actually bothering to do it. (Thank you, Internet Archive, I owe you a donation.) Especially for journals behind paywalls, no one may have bothered.
With a printed journal, if copies were mailed to subscribers or academic libraries, they at least still have those copies. But if they depended on digital access? Oops.
posted by learning from frequent failure at 7:20 AM on March 8 [3 favorites]
Paper lasts a long, long time given decent storage conditions. Nothing digital seems to come even close (I'm looking at you, my collection of CD-Rs). We're on track to lose more than we realize unless we do something in the meantime.
posted by tommasz at 8:01 AM on March 8 [3 favorites]
> If you're an academic consider sharing your papers in open libraries like Z-Library.
and whether or not you're an academic always remember to pirate everything all the time. it's a moral obligation, it's a sacred rite, it preserves the past, it produces the future, it's fun, it's cool, go do piracy right now.
posted by bombastic lowercase pronouncements at 8:13 AM on March 8 [8 favorites]
Umami Dearest, in this case I believe "publication" means an individual article, not a journal. So the quote should be referring to 7 million articles, not 7 million journals. The exploding number of low-quality paper mill journals has been dramatic but I don't think it's yet that bad.
posted by biogeo at 8:45 AM on March 8 [1 favorite]
If a paper is not on sci-hub can it really be said to exist at all?
I used to work at the Santa Fe Institute which has a small but important set of working papers. Not peer-reviewed, often quite quickly written, and a publishing system that predates preprint servers like LANL's XXX (aka arXiv). They had a full-time librarian and a serious archival commitment. A paper I wrote in 1996 is still there; it never even occurred to me to check. I was sure the institution would preserve it.
I see now they retired the archives in 2017. Old papers are still online, hopefully as long as the institution lasts. I wonder what they replaced it with?
posted by Nelson at 9:30 AM on March 8 [2 favorites]
Institutional repositories exist. Fuck for-profit journals, fuck the big 5, and if you are an academic put your shit in your IR.
posted by aspersioncast at 9:30 AM on March 8 [5 favorites]
> Is scholarly knowledge burgeoning? Or is there a burgeoning of scholarly knowledge? More papers than ever being written, but are they all advancing knowledge?
I know that in my own corners of IEEE and ACM, there are plenty of journal articles that are written not to advance the state of the art but to document that the authors have done something. Reading those articles doesn't hurt, but it seldom teaches you anything that can help you with your own tasks.
A lot of the text we fear losing is knowledge that we never attempted to preserve in the 20th century. Knowledge that went from one colleague to another over the phone or over coffee, and then dissolved into the ether. There are needles in the haystack, and it would be a shame to lose them, but every archivist from Sumer's era to today had to make decisions on what to preserve.
posted by ocschwar at 9:49 AM on March 8
Piracy from the big 5 parasites is especially a moral good. They don't pay for the work, the review, or most of the editors. And then they charge institutions insanely high prices to let them see the digital works that many of us already have (legally!) on our institutional repositories. The sooner they die the better.
> More papers than ever being written, but are they all advancing knowledge?
This framing is basically a category flaw. But to answer what you probably mean: sure, the vast majority of papers that I read or review in my fields advance knowledge.
posted by SaltySalticid at 11:38 AM on March 8 [2 favorites]
yes but providing justifications for pirating works from specific publishers is unnecessary and can be somewhat unhelpful, since giving a justification for specific acts of piracy can implicitly suggest that acts of piracy need a justification, and piracy needs no justification.
posted by bombastic lowercase pronouncements at 10:10 PM on March 8 [1 favorite]
If you're an academic consider sharing your papers in open libraries like Z-Library.
posted by neonamber at 8:55 PM on March 7 [7 favorites]