The DALL·E 2 of MUSIC?
September 23, 2022 1:28 AM

Systems such as OpenAI's DALL·E 2 have shown impressive progress recently in generating images from text-based descriptions. Composer David Bruce looks at how these trends are starting to impact the world of music composition.

Systems mentioned in the video include DALL·E 2, MuseNet, OpenAI's Jukebox, DreamStudio, Midjourney, and Imagen.

Previously:
Any sufficiently advanced technology is indistinguishable from magic (about DALL·E 2)
Rage within the Machine - about Jukebox.
posted by rongorongo (17 comments total) 10 users marked this as a favorite
 
I watched a talk about AI music generation in 2018. It was at a convention for production music composers, and the thrust was: watch out, because this AI music is going to eat your lunch (for certain values of "you"). (And, to a lesser extent: this is a copyright nightmare.)

The music was godawful -- owing in part to the terrible samples it used, but also because it was just ... bad music. Bad enough that I'm not even sure it would do the job in the background of really low-rent corporate videos. But of course, that was 2018. He gets to this around the 15-minute mark of the video, and the quality of that music is about what we heard in 2018. It's terrible and, frankly, unusable unless you have no standards.

I do think he's right that certain types of AI music are coming, and probably on the horizon, but the truth is that, like with AI art, there are certain things that are so challenging as to maybe be unconquerable, at least in my lifetime. Synthesized or sampled guitars, for example, sound terrible, and even at the highest level aren't really workable outside of background accents. Brass is somewhat better, but it takes such art to produce convincing-sounding music with these types of sampled instruments that outside of the most background of uses (which, to be clear, do make up a very large portion of the music used in media) I don't think an AI could hack it.

And his contention about different jobs in the industry emerging is probably true too: look at the diversity of jobs in film that would have been tech jobs in the past. We have tech jobs in music, too, but it's very likely that more are coming.

For now though? I'm not terribly worried about AI music. (Ask me again in 5 years.)
posted by uncleozzy at 6:15 AM on September 23, 2022 [3 favorites]


Even if it gets to a point where it seems indistinguishable from human-authored music, listeners will get better at distinguishing between the two, especially if the artificial stuff is cheaper and tends to get used where budget is an issue. I think this will be true of images as well.
posted by condour75 at 7:15 AM on September 23, 2022 [1 favorite]


Budget is always an issue and recent history shows that people can be trained to accept not especially good products if they are easily available and cheap.
All art will tend to the condition of software - shitty and barely fit for purpose.
posted by thatwhichfalls at 7:25 AM on September 23, 2022 [6 favorites]


I found that to be a well researched and argued video.

Prompt generated high fidelity AI music is probably closer than we think, given, as he suggests, the technology underpinning text-to-image and text-to-video can be transferred to audio.

Just yesterday there was a big announcement from OpenAI: Whisper, an open-source speech recognition model that can do multi-language translation (with a single model).
posted by gwint at 7:27 AM on September 23, 2022


>All art will tend to the condition of software - shitty and barely fit for purpose.

Yeah, anything that can be mass-produced, if possible without paying anybody, always somehow turns out to be just good enough. This is clearly bearing down, tidal-wave-fashion, on the worlds of art and music, and I've been deciding whether or not to be pointlessly cranky about it for the rest of my life, and I think I'm gonna go for it.
posted by Sing Or Swim at 7:34 AM on September 23, 2022 [5 favorites]


>Prompt generated high fidelity AI music is probably closer than we think, given, as he suggests, the technology underpinning text-to-image and text-to-video can be transferred to audio.

I'm sure someone has the answer to this, but what's the copyright situation on AI-generated art? The models are trained on data sets -- do those data sets contain copyrighted works? If a generated image contains a recognizable portion of a copyrighted work, is it, itself, a transformative work?

With audio -- if the AI is generating not sequenced music that relies on playing back samples or synthesized instruments but rather whole audio -- it's all-but-certain to contain pieces of copyrighted audio, which again, would make it a transformative work.
posted by uncleozzy at 7:47 AM on September 23, 2022


This is a really good video. I think, unlike how quickly AI image generation exploded, we're gonna see AI composition and mixing/mastering assistance tools marketed towards artists get really good first.
posted by frenetic at 8:15 AM on September 23, 2022


>mixing/mastering assistance tools

iZotope's mastering assistant was so-so in the previous iteration, but the latest version tries a little harder to be a one-stop tool, and I do find the results to be quite a bit better. It's got two new AI modules that aim to manage microdynamics and dynamic EQ, and although they still need a little help, they really do add a little extra pop and shine without a lot of fiddling.

Automated mixing is a different beast -- iZotope has had rudimentary tools for this for a few years, but they're mostly pretty useless. Mixing is a much harder problem, I think, for a lot of reasons. That's not to say the tools won't improve (they will!), but understanding the purpose of each track is important in a way that current tools can't (and don't try to) capture.
posted by uncleozzy at 8:39 AM on September 23, 2022


And soon, human creativity faded into a melange of generic, looks-the-same, sounds-the-same output. It was once called “art”, but now it’s just output. Creators are gone, as now everyone is merely a consumer. If you like musical piece #1567SDT then surely you will now want to hear musical piece #6328AFU.
posted by njohnson23 at 8:47 AM on September 23, 2022 [3 favorites]


In visual arts, the AI I'm more familiar with, Midjourney, is currently really good at aping some artists' styles and adding realistic texturing and lighting. It's terrible at anatomy in general, at eyes in particular, at detailed eyes especially, and don't get me started on hands. Also, it's fairly sexist and racist and has models for like two women, total.
I'm guessing something similar will hold for music.
posted by signal at 9:19 AM on September 23, 2022 [3 favorites]


uncleozzy: "The models are trained on data sets -- do those data sets contain copyrighted works?"

They certainly do. In fact, many of the AI-artist things are marketed by referencing the names of the artists they're trained on.
posted by signal at 9:21 AM on September 23, 2022


>I'm sure someone has the answer to this, but what's the copyright situation on AI-generated art?

It's unclear at this point. The famous Monkey selfie case established that animals cannot hold copyright. So, at some level, U.S. law holds that a human must be involved with artistic creation in order to obtain copyright.

There have been rulings in Australia that automated photo-taking systems (such as a computer program or "AI" that triggers a shutter) have no copyright at all; i.e. the resulting images cannot be copyrighted by anyone. I'm not sure if there's been a similar ruling in the U.S., but because of the WIPO Copyright Treaty we can use the Australian ruling as guidance for what it would probably look like.

Now, the true question (which hasn't been answered anywhere yet as far as I know) is whether the courts will rule that a Midjourney prompt is more like pressing a button on a camera (copyrightable image) or a non-human process (monkey or AI-controlled camera shutter, and thus no copyright). I personally think it's much more like the former (and so do the OpenAI lawyers) but nobody knows quite yet.

I would prefer, however, if the Australian ruling were upheld and extended. Not everything needs to be locked behind a copyright, and a tool that lets anyone create art, with all of it going into the public domain, sounds wonderful to me.
posted by riotnrrd at 9:26 AM on September 23, 2022


>They certainly do [contain copyrighted material]

No, they absolutely do not contain unlicensed copyrighted material, unless a grave error was made during dataset construction. There are well-established legal obligations for using data to train neural networks. For example, ImageNet is a ubiquitous research dataset of millions of images scraped from Flickr, and the web broadly. It's huge, useful, and kind of a benchmark, but licensing is a tangled nightmare. So, it's still used a lot for research, but you cannot release a commercial product that used ImageNet at any point in its training. My company is very careful about this, and OpenAI, Google, etc. are certain to be so as well (and they have way more money). I trust that they covered their asses.
posted by riotnrrd at 9:31 AM on September 23, 2022 [2 favorites]


Artist finds private medical record photos in popular AI training data set

As always they will start within what they think the legal enforcement mechanisms will tolerate. They have more money of course.
posted by thatwhichfalls at 9:35 AM on September 23, 2022 [1 favorite]


MetaFilter's own waxpancake's research on whether they contain copyrighted material: Exploring 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion’s Image Generator.
posted by aneel at 5:37 PM on September 23, 2022 [1 favorite]


riotnrrd: "No, they absolutely do not contain unlicensed copyrighted material,"

I didn't specify 'unlicensed'. The fact is, however, that if I prompt, say, Midjourney with "in the style of Moebius", it gives me an image in the style of copyrighted images made by Moebius. So unless it's just guessing from first principles what "style of Moebius" means, there are some copyrighted images in its dataset.
posted by signal at 6:52 PM on September 23, 2022 [1 favorite]





