Taylor & Francis sells access to research material to Microsoft AI
July 21, 2024 6:31 AM   Subscribe

Authors have expressed their shock after the news that academic publisher Taylor & Francis, which owns Routledge, had sold access to its authors’ research as part of an Artificial Intelligence (AI) partnership with Microsoft—a deal worth almost £8m ($10m) in its first year.

For those unfamiliar with it, scholarly publishing is not, for most academics, a for-profit enterprise. Most receive no remuneration for their work, and neither do most peer reviewers or academics serving on editorial boards, though there are notable exceptions. Many authors pay out of pocket for image rights, editing, or in some cases article processing fees that can stretch into the thousands to publish a single article Open Access with a major publisher.

Most of this is "business as usual," with the remuneration ostensibly coming in prestige and tertiary benefits that are intended to accrue to full-time tenured positions in colleges and universities, although these positions have dwindled in number in most fields and almost entirely vanished in others. Publishers [SLPDF] argue that they provide various benefits that justify all of this.
posted by cupcakeninja (39 comments total) 26 users marked this as a favorite
 
Is this more or less than is being paid to train AI on random Reddit posts?
posted by clawsoon at 6:35 AM on July 21 [1 favorite]


That's a good question. I don't know the scope of deals that Microsoft AI or other companies are making for access to other resources.

In fairness, I should say that while the news appalls me personally, I have heard writing-averse academics make the argument that GenAI academic writing tools will be (and/or already are) a boon to them. The theory is that it allows some researchers to focus on gathering and analyzing data, thereafter leaving the step they dislike to a robot.
posted by cupcakeninja at 6:41 AM on July 21


leaving the step they dislike to a robot

That might be the most depressing thing I’ve read this year, in a very strong field.
posted by Phanx at 6:59 AM on July 21 [28 favorites]


Ah, the publishing world, and academia...the noblest of professions!
posted by Czjewel at 7:01 AM on July 21 [4 favorites]


Outsource or perish?
posted by chavenet at 7:03 AM on July 21 [4 favorites]


It also depresses me, and I had to think a moment to figure out why.

As a former academic, I also struggled with writing. I like to write, but it was also immensely stressful due to all of the pressures involved. I understand the appeal of giving some directions to generative AI and having it spit out a summary of my results for me. After all, the purpose is different - I'm not trying to create art, I'm not trying to express some feeling or experience, I'm trying to communicate what I did and what I found out.

But I guess I still see science as, ideally, a conversation and collaboration between scientists. The idea of sitting down to read a bunch of AI-generated academic papers just feels so sad and alienating to me, beyond all of the ethical concerns with how that AI is trained and used, and beyond all of the concerns about accuracy, etc.

I can also imagine how this will destroy the literature review. It will settle on a "standard" literature review for a topic, meaning coverage of new papers and reconsiderations of old papers will be that much less likely to appear.
posted by Kutsuwamushi at 7:08 AM on July 21 [25 favorites]


Always read the fine print.
posted by Tell Me No Lies at 7:09 AM on July 21 [2 favorites]


Phanx, it bugs me too, because I am a writer and enjoy writing. I have (no sarcasm) found the "I can't wait to pass on the writing chore" perspective easier to deal with by reminding myself that I am good at some things and not others, and that I enjoy some things and not others. Not everyone can write or read easily, or can do so without experiencing significant stress. For myself, I regularly outsource to machines and/or robots a bunch of small and large chores that have historically been viewed as virtuous or pleasurable. That doesn't, however, mean I have much interest in reading the word salad that Gen AI is often producing right now.

Kutsuwamushi, maybe! Or maybe not. There are already various AI-driven or AI-aided tools to facilitate research, including literature reviews, some focused on scanning scholarly networks to find the newest research. I suspect reconsiderations may be harder, though, unless a researcher specifically sets out to do that and manually adds pre-2000 (or whatever) sources.
posted by cupcakeninja at 7:13 AM on July 21 [1 favorite]


> The theory is that it allows some researchers to focus on gathering and analyzing data, thereafter leaving the step they dislike to a robot.

the writing IS the analysis of the data. the data is just numbers. you have to make inferences based on it and draw conclusions from it and you have to use words to do that. so i guess we’ll just have the ai do that now? truly a fallen time
posted by dis_integration at 7:30 AM on July 21 [28 favorites]


I have an academic book that’ll be part of this.

Paid for image rights myself, but did receive royalty checks. Enough to pay for groceries once. Like Ted Chiang said, fear of AI is about fear of capitalism.

The book's on Radiohead. The irony, amirite. This lyric is 21 years old:

“Your alarm bells, your alarm bells
They should be ringing, they should be ringing”
posted by josephtate at 7:34 AM on July 21 [14 favorites]


There are numerous reasons to not be happy with this, but I'm more surprised by the paltry amount (£8M/year?). I'm surprised Taylor and Francis values their back catalogue so cheaply.
Big academic publishers have delighted in driving up prices for university libraries for the past couple decades and then when someone with real money comes calling they sell to them for chump change?

One future I thought might be possible was that AI killed off open access publishing since academics would run to the shelter of pay-walled gardens to protect their work from AI harvesting. If Microsoft et al are able to jump over those walls for the equivalent of (for them) pocket change, I guess that puts paid to that idea. If I were a T&F editor/fact-checker/proofreader I'd be stewing at how little my company understands the value of my labour.
posted by nangua at 7:59 AM on July 21 [8 favorites]


Re: "read the fine print"

I did. I actually queried and changed things. Nowhere in the contracts I've signed is there anything about the use of digital models, LLMs, AI, training platforms, or selling the data to third parties other than in the context of acquisitions, mergers or translated publishing deals.

In fact, at least two of the things I wrote for Routledge were before LLMs were a thing, so how I was supposed to psychically know to protect myself against a technology that did not exist is beyond me.

I also know colleagues who have never published with these publishers, but their smaller publishing houses have been bought out, and they suddenly find themselves a Routledge Author and bundled into this deal without even being alerted to it.

The best bit - you know who actually paid for this research? You probably did. My research income has been entirely from tax-funded research organisations, or charities/not-for-profits. These publishers have paid for nothing, done very little, and reaped the rewards. Nice grift.
posted by AFII at 8:06 AM on July 21 [42 favorites]


The journals and academic areas appear to be mostly non-science, or at least not hard science, which I assume is relevant. And it leaves me less certain of the norms.

In the hard sciences a lot of the core of the paper is the data. People have been trying to use machines to extract and aggregate relevant data from the literature and patents for the length of my career. You can guess about the success rate because there are companies that employ mountains of scientists (often Ph.D.'s, I'm not joking) to do this manually. I wouldn't have much patience for people in my field complaining about this; in fact, I consider AI "reading" of scientific literature one of the more likely-to-succeed use cases.

And, of course, in most cases the work product is the stuff in the lab or in the field. There's not the feeling you can be replaced just because something else can read papers and write paragraphs.

I'll accept the calculations might be different in other fields.
posted by mark k at 8:07 AM on July 21 [1 favorite]


Always read the fine print.
Tell Me No Lies, or have AI summarize it for you ….
If I were a T&F editor/fact-checker/proofreader I'd be stewing at how little my company understands the value of my labour.
nangua, if they're treated anything like authors, my suspicion is that this had already been made abundantly clear to them.
posted by It is regrettable that at 8:07 AM on July 21 [3 favorites]


the writing IS the analysis of the data. the data is just numbers. you have to make inferences based on it and draw conclusions from it and you have to use words to do that. so i guess we’ll just have the ai do that now? truly a fallen time

The writing is largely boring sentences that string together the few bits of genuine insight expressed as either (depending on the field) tight phrases or equations. The insights are supported by citation of other papers or data, which in academic papers is usually contained in tables.

Modern LLMs aren't going to generate tight phrases or the correct equations. Any citations or data would be the purest hallucination and you'd be very lucky to get something even plausible there. Even if OpenAI is correct in their assertion that they can replace >51% of office workers in 3~4 years (which is pretty goddamn optimistic), academic research is about as deep into the other 49% as it gets.

The only time I actually use LLMs in my job is for writing generic design templates, because despite its necessity I fucking hate documentation (who doesn't?) and every minute spent not in the editor is a net loss to both the company and my own sanity. Depending on the project (the partner for my current assignment expressly bans all AI use, so not for the past few months) when authoring documentation I have an LLM spit out its guess as to a spec for, say, a compass element at the top of the HUD. 70% of what it spews is correct in a junior systems designer-y way, and I rewrite the other 30% with the key elements it missed, anything proprietary I didn't want OpenAI to see, and the actual details of the compass I'd already implemented. The structure and the bulk of the words aren't mine, but everything that actually matters is. It seems extremely likely this same basic pattern would hold for academic papers, and ...good for them? More time in the lab, or reading other papers and expanding their knowledge of their field. Less time writing the parts everyone is just going to skim past anyway.

The problem here is yet again the failure to obtain consent to use the work in training. Even if the authors of the papers agreed to allow their work to be used in any fashion the publisher deemed fit as part of their publishing agreement, this is a novel use case and permission should have been obtained for each. I don't think it rises to the level of theft or copyright violation, but it can be neither of those things and still be deeply unethical. Capitalism breeds a robber-baron mentality, this is the result, and if anyone has a solution to that they've been keeping it to themselves.
posted by Ryvar at 8:13 AM on July 21 [3 favorites]


All the reasons academic writing feels like a chore are reasons, I suspect, that generative text models are a very, very bad fit for this.

When you're writing an academic paper, you are trying to cram a very large amount of information into very few words, some of which is hopefully completely new and novel information. Leaving out subtle but crucial details is very bad and will rapidly make the reasoning difficult or impossible to follow. Explaining concepts ambiguously or imprecisely will have the same effect. And if you're reading through a paper, it tends to take a very close reading and a deep understanding of the subject matter to even tell if a paper has the aforementioned problems. Dealing with all of this is painful, but it is very necessary!

Ultimately, these fancy generative text models are statistical models. They are going to favor things that appear frequently rather than favoring unique information. They are going to favor ambiguity because words that fit multiple places are going to be more likely than things that fit only one context. They are primed to do all the very bad things in academic writing while making it very easy to generate something that looks on the surface like it might be useful. Figuring out that it isn't will waste a lot of people's time.
posted by Zalzidrax at 8:21 AM on July 21 [12 favorites]


> ok computer
posted by HearHere at 8:24 AM on July 21 [2 favorites]


It appears that my own comment is a good illustration of why I value proofreaders so much :)
posted by nangua at 8:36 AM on July 21


I've published four books with Palgrave. Three of them were nominated for awards, though none of them won. One nomination was definitely pro forma. I've received about $3000 in total for the four books, but they got me tenure and promotion to full professor. I very much doubt I'm going to receive anything for this AI training. Just in case you didn't already know, the academic publishing industry is completely corrupt and broken, and was long before this AI thing started.
posted by outgrown_hobnail at 8:39 AM on July 21 [8 favorites]


I routinely tell my trainees to fire up ChatGPT or Claude Sonnet if they’re having trouble writing a first draft. Regardless of how they produce it, I will almost certainly edit the draft to the point that perhaps 5% of the original words remain, and that’s if it’s a good draft. So why should they stress over doing it themselves if a robot can break the inertia? I want them to concentrate on discovery and rigor, not selling stuff to editors and reviewers (or at least not at first; they will all eventually get fellowships, grants, or both). Once we get closer to publication or proposal submission, of course, I expect everyone to use their brain. But breaking the inertia is important, and finished is better than perfect in this line of work. My published works are just about to hit 100K citations, with fairly routine requests to review career-defining grant and fellowship proposals, so I don’t think it’s fair to characterize me as some bomb-tossing grad student anymore. (Bomb-tossing mid-career academic, perhaps, but I had a real job, once.) Nobody ever made any money in our line of work, but you can kill a lot of kids in failed clinical trials if you’re sloppy, so I don’t think money is the answer either.

Academics are mostly rubes — selling their souls for shitty promotions and flimsy tenure. Academic research is tough enough without pointless hazing. I also get paid to deal with this bullshit. Until my trainees also get paid to do so, I’d rather not burn them out. However…

Academic publishers, particularly for-profit outfits, are a notorious rogue’s gallery. It’s not the editors or the news people (who are separate!) with whom I take issue. They’re usually great (ex academics or journalists, respectively) and they turn over frequently (pour one out for that awesome editor who ran interference with the jackass reviewer at Nature, the vindictive author at Science, or the bullshit artist trying to get into NEJM). It’s the goddamned MBAs in the back rooms that cut these horrid deals and are never held accountable. I hold multiple patents and come from a much more brutal line of work, such that most of my colleagues are horrified at the bare knuckle tactics my group uses in negotiations. Yet we have very few enemies and a great many serial collaborators, all the way up to the head of the NIH. Oddly enough, the “society journals” (Blood, Exp Hem, AACR imprints, Development, Bioinformatics, even NEJM) do seem to walk back their mistakes, while Skipper at Nature or the clowns at T&F double down on them.

Bottom line, if you sign away the copyright to your work for a chance at scholarly advancement, and if you believe cynical peers regarding “scooped” preprints (how does that even work, with a clear priority date and patents being first to file?!), this is what you get. Please don’t be a chump!

If you are a taxpayer, please recognize that this system benefits neither you nor the researchers nor the consumers of research; it only subsidizes some very rich companies who add very little value. Support zero-embargo release of publicly funded research and help protect academic rubes from themselves (some of whom still believe, after decades of experience to the contrary, that their paper reviews ought to be paid! 🤣🤣🤣). The principal stakeholder in publicly funded research is the public. Nature puts less effort into typesetting than arXiv or bioRxiv these days (you can verify this by looking at “early article preview” PDF dumps) and they’ve never put the level of editorial or typesetting effort in that NEJM does (if you’ve published in both, you know). So why support this nonsense? T&F imprints are trash compared to either of the preceding, and PubMed does a pretty good job of weeding out true garbage, so why not train on that and the *rXivs? Fuck for-profit academic publishers. The only profit to be had is by rooking the public coming & going.
posted by apathy at 8:45 AM on July 21 [7 favorites]


I did. I actually queried and changed things. Nowhere in the contracts I've signed is there anything about the use of digital models, LLMs, AI, training platforms, or selling the data to third parties other than in the context of acquisitions, mergers or translated publishing deals.

Great, it sounds like you have a good civil case.
posted by Tell Me No Lies at 9:07 AM on July 21 [2 favorites]


> Great, it sounds like you have a good civil case

Because Microsoft and T&F are so strapped for lawyers, right? Large corporations ignore the law because they can. I wish the plaintiffs the best of luck; they will certainly need it.
posted by apathy at 9:11 AM on July 21 [4 favorites]


What would really be scandalous is if Taylor & Francis gave access to papers which had been accepted for publication and were in their final form, but had not yet appeared online or in print.

I’d bet they did.
posted by jamjam at 9:23 AM on July 21 [2 favorites]


As if academic writing wasn’t of bad enough quality already. I wish that more academics would… not outsource their writing, but bring in to the process actual writers, with writing skill, who could work with researchers to craft well-written papers that are accurate and precise and also clear and lucid and easy to understand. But the current structure of academia doesn’t give people the time or resources for that. AI tools are no substitute when part of the problem is that many researchers have insufficient training in the craft of writing - sure, an editor can start from something AI generated to skip the initial writing step and then mold it into good writing, but that is a particular skill that is not as common as one might hope within academia.
posted by eviemath at 9:46 AM on July 21 [5 favorites]


I was actually thinking that the savvy move for green open access repositories and diamond open access journals is to 403 the hell out of known AI bots in their .htaccess (or equivalent) and publicly pledge not to sell to commercial AI.

Trivial effort, substantial return in attention if nothing else. If I were still running an institutional repository, I'd frankly have done the 403ing already.
posted by humbug at 9:55 AM on July 21 [2 favorites]
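[For concreteness, here is a minimal sketch of the kind of blocking humbug describes, expressed as Python WSGI middleware rather than an .htaccess rule so it is self-contained; the user-agent strings are illustrative examples of crawlers associated with AI training, not an exhaustive or authoritative list, and any repository would need to maintain its own.]

```python
# Hypothetical sketch: 403 requests from known AI crawlers before they reach
# a repository's WSGI application. The equivalent of humbug's .htaccess idea.

# Illustrative, not exhaustive: user-agent substrings to block.
KNOWN_AI_CRAWLERS = (
    "GPTBot",       # OpenAI's web crawler
    "CCBot",        # Common Crawl, widely used as AI training data
    "ClaudeBot",    # Anthropic's crawler
    "Bytespider",   # ByteDance's crawler
)

class BlockAICrawlers:
    """WSGI middleware that returns 403 Forbidden for matching user agents."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if any(bot.lower() in user_agent.lower() for bot in KNOWN_AI_CRAWLERS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else passes through to the repository app unchanged.
        return self.app(environ, start_response)

# Usage (assuming an existing WSGI app, e.g. a Flask-based repository front end):
#   app.wsgi_app = BlockAICrawlers(app.wsgi_app)
```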


It certainly doesn't help that at least in the humanities, a lot of eminent scholars seem to think that writing in the most opaque and baroque prose they possibly can is a sign of their magnificence. And that a lot of early-career scholars seem to think that creating opacity and baroqueness in their own writing will lead them to magnificence, or at least for being mistaken for magnificent. And that waaayyy too many people eat this shit up instead of demanding or striving for clarity.
posted by outgrown_hobnail at 10:10 AM on July 21 [4 favorites]


Tbh it may be a positive that somebody- anybody- is being paid for training data. This catalogue is a drop in the ocean relative to the entirety of arxiv which has been scraped without any compensation to anyone. Maybe in the long run some of this money will trickle down to authors (but highly unlikely in the near term).
posted by simra at 10:17 AM on July 21 [1 favorite]


Any academic who would be "shocked" about this is an idiot. All data are going to be ingested for AI training. All. Some will be made part of AIs accessible at modest fees, some part of AIs accessible at extremely high fees, most will be made part of private AIs, available only to their proprietors at any price, because they rely upon proprietary corporate or government data.
posted by MattD at 12:12 PM on July 21 [1 favorite]


Your writing is only as clear as your thinking, and your thinking is only as clear as your writing. You can start out not having any clear thoughts about something, and start writing, and start having better thoughts about it, which then means that you start writing something good, and now you are having complex, nuanced thoughts that you could never have gotten to if you were just staring at a wall or something. It's a virtuous cycle, and I can hardly conceive of any way to educate people that doesn't utilize it.

One of my biggest fears about LLMs is that turning our writing over to machines is going to leave our brains a pile of mush. We will not develop writing skills, and we won't develop the thinking skills that go along with good writing.

We will not be able to say what we mean, and we won't mean what we say (or what the machine says for us). We won't know what anybody else means either.

Reading good writing makes us better writers, and writing well makes us better readers, but since everybody else will be using LLMs, we will be surrounded by nigh-meaningless gibberish. Machines will write and we will ask machines to look at other machines' writing and give us a summary, and that's all we'll read. How can that be considered a scholarly conversation? How are we going to get an idea out of one person's head so that it can light another person's brain on fire?

(Of course, I fully expect that in 20 years this post will read like the lamentations of Socrates that the kids learning how to read and write will bring on the intellectual death of society because people won't have to memorize everything. So, who knows, maybe AI will lead to a whole new way of communicating and using the human brain that will give rise to technologies and arts that we cannot yet imagine.)
posted by BrashTech at 1:24 PM on July 21 [13 favorites]


Of course, I fully expect that in 20 years this post will read like the lamentations of Socrates that the kids learning how to read and write will bring on the intellectual death of society because people won't have to memorize everything. So, who knows, maybe AI will lead to a whole new way of communicating and using the human brain that will give rise to technologies and arts that we cannot yet imagine.

I don’t know when Socrates made his admonitions against literacy, but even before his execution, one of his pupils led the regime of the Thirty Tyrants:
Led by Critias, the Thirty Tyrants presided over a reign of terror in which they executed, murdered, and exiled hundreds of Athenians, seizing their possessions afterward. Both Isocrates and Aristotle (the latter in the Athenian Constitution) have reported that the Thirty executed 1,500 people without trial.[12][7] Critias, a former pupil of Socrates, has been described as "the first Robespierre"[13] because of his cruelty and inhumanity; he evidently aimed to end democracy, regardless of the human cost.[14] The Thirty removed criminals as well as many ordinary citizens whom they considered "unfriendly" to the new regime for expressing support for democracy. One of their targets was one of their own, Theramenes, whom Xenophon depicts as revolted by Critias' excessive violence and injustice and trying to oppose him. Critias accused Theramenes of conspiracy and treason and then forced him to drink hemlock.[15] Many wealthy citizens were executed simply so the oligarchs could confiscate their assets, which were then distributed among the Thirty and their supporters.[16] They also hired 300 "lash-bearers,” or whip-bearing men to intimidate Athenian citizens.[7]
Socrates' diagnosis of the source of the problem might have been wrong but his reservations about the character of the coming generation were anything but misplaced.
posted by jamjam at 2:46 PM on July 21 [4 favorites]


As someone who has been unfortunate enough to have to read lab reports submitted by students written by LLMs, they suck at scientific writing. They can't use evidence to make an argument. They can't make useful references to the literature because, statistically, nonsense references to non-existent but reasonable-sounding literature work just as well, so why bother with real references. And they certainly can't evaluate data and present a novel interpretation. Because they can't do novel. Because that is not what they were created to do. Maybe if someone wrote a much more thorough prompt summarizing all relevant literature with real references, all the data meant to be presented, and what the novel interpretation of those data should be, then they could, but at that point you might as well just write your own fucking journal article.
posted by hydropsyche at 3:55 PM on July 21 [7 favorites]


are the 30 tyrants going to do something about self-appointed tech-bro "visionaries" and/or terminally gullible institutions that enable them? 'cause i'm temperamentally anti-tyrant but if "yes" i will happily make an exception in this instance; 30 tyrants are gonna have to get up pretty early in the morning to be a bigger pain in the arse than sam altman's messiah complex.
posted by busted_crayons at 4:25 PM on July 21


ARGH. My roguelike book is out through Taylor & Francis!
posted by JHarris at 4:32 PM on July 21 [3 favorites]


Like Ted Chiang said, fear of AI is about fear of capitalism.

Josephtate where is that quote from? I'm always interested in what Ted Chiang has to say.
posted by BrStekker at 8:23 PM on July 21


I expect that many AI companies are offering a bunch of "pray I don't alter it any further" deals to publishers/rightsholders/content providers in order to have some public respectability and legal coverage, in the event that courts decide against the organizations that have done the most public content grabs.

As someone who has been unfortunate enough to have to read lab reports submitted by students written by LLMs, they suck at scientific writing.

Right now. They suck at scientific writing right now. In time, I expect they'll attain "good enough" for some limited applications within all fields. Dunno if lab reports are actually standardized enough for that to be the case here.
posted by cupcakeninja at 4:32 AM on July 22 [1 favorite]


"good enough" for some limited applications within all fields

jesus christ. we barely have "good enough" in most endeavours despite heroic efforts and enormous collective expertise. and now we also have to work around a totally needless obstacle wrought by some basically mediocre weirdos imposing their masturbatory nonsense vision against any kind of informed collective consent?
posted by busted_crayons at 5:08 AM on July 22 [3 favorites]


Yes, many of us do. That's not in question, not even vaguely theoretical, and it wasn't going to be a question or theoretical from the moment one could click a button and generate text and images based on prompts. I'm not trying to irritate or upset anyone here by posting this stuff -- it's one of the reasons I tag it "AI," so people can use the "MyMeFi" option to screen it out. Hell, I don't want to hear about this shit all the time, but it's unquestionably now a part of my professional life, and many other folks', and it's not going away anytime soon.
posted by cupcakeninja at 5:39 AM on July 22 [1 favorite]


why would i screen it out? i also have to deal with it professionally, including working with other academics who are seemingly angling for a "senpai noticed me" moment from the aforementioned weirdos, to the presumably enormous detriment of actual science and what should be its actual goals. why would i want to screen it out? we should be talking about how to sabotage it.
posted by busted_crayons at 6:25 AM on July 22 [2 favorites]


Right now. They suck at scientific writing right now. In time, I expect they'll attain "good enough" for some limited applications within all fields. Dunno if lab reports are actually standardized enough for that to be the case here.

Lab reports are in the form of scientific articles, which is a standard format. The problem is not the format. LLMs are more than capable of making things that look like Introduction, Methods, Results, and Discussion sections. They can generate gibberish that statistically belongs in those sections and gibberish that looks like references to support the gibberish above. They are incapable of creating the original content required, and they will remain so because LLMs are explicitly not for creating original content. That's not what they do. They do not develop novel syntheses of ideas or novel interpretations of data--they repeat various wordy gibberish that is already statistically common in their training data. And that's all they'll ever be able to do because that's what LLMs are for.

If you want to develop novel syntheses of ideas or novel interpretations of data: Great news! We have an entire planet full of scientists who are trained to do those things! We don't need LLMs to do it for us because that is our job. That's what science is. People who don't like doing those things or aren't good at those things make amazing technicians and educators and play all kinds of other important roles in science. But journal articles will continue to be written by scientists, not LLMs.
posted by hydropsyche at 10:09 AM on July 22 [9 favorites]



