What does the hand we're biting do, again?
August 23, 2024 6:25 PM

Hank Green (who has a long-standing, nearly seminal relationship with both Google and YouTube, even before YouTube was eaten by Google) has some things to say about data-mining and AI practices, especially as they pertain to those outfits.
posted by es_de_bah (26 comments total) 17 users marked this as a favorite
 
Great post title but surely "Useragent: Everyone. Disallow: Everything" would have been even better. I'm glad to see Hank spotlighting this and very curious if Google will actually respond. Hank has been around for a long time and has a lot of relationships in the industry.
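
(For the record, the literal robots.txt those two lines are riffing on would read something like:)

    User-agent: *
    Disallow: /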
posted by Wretch729 at 8:34 PM on August 23 [2 favorites]


I actually agree with most of what he says, even though I disagree with the assertion that using copyrighted data to train AI can possibly be a violation of copyright.

His stance is “I don’t know if it’s a violation of copyright, the courts are deciding that, but I do know that it’s wrong. And at minimum I should be allowed to explicitly opt out of being in training data. Even if myself and all other content creators are initially defaulted to opt-in.”

My stance is “It is not a violation of copyright, I don’t care what the courts say, but I do know that it’s wrong. And at minimum he should be allowed to explicitly opt out of being in training data. Even if he and all other content creators are initially defaulted to opt-in.

…and since the technology simply isn’t possible without an explicit opt-out approach like he describes - and I believe it could eventually lead to amazing things way down the line like post-scarcity - we should do what he suggests.”

And the reason that I say this about copyright is because unless a researcher royally fucks up or bypasses a significant amount of due-diligence adversarial training (whether that’s down to time pressure imposed by Sam Altman or plain laziness is irrelevant), a copy of your copyrighted work is not stored within the model.

The reason most major models can be coaxed into reproducing copyrighted works in part or in whole is because most major companies in this space are skipping like 95% of what would constitute good faith effort at adversarial training to prevent it. And when that happens, and copyrighted works can be conjured from the model with relative ease, it ought to be legally actionable as violation of copyright. But only then.
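
To make “good faith effort” concrete: the bare minimum is something like this toy check, sample the model and flag long verbatim overlaps with the training set. (All names here are hypothetical, not any real lab’s pipeline; production systems do this with suffix arrays or Bloom filters at corpus scale.)

    # Toy regurgitation check: flag generated text that reproduces long
    # verbatim spans from the training corpus.
    def ngrams(tokens, n):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def flags_regurgitation(generated, corpus_texts, n=12):
        gen_grams = ngrams(generated.split(), n)
        for doc in corpus_texts:
            # Any shared 12-gram is a strong signal of verbatim copying.
            if gen_grams & ngrams(doc.split(), n):
                return True
        return False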

That said, the mere act of pre-training an AI on copyrighted data should not, in and of itself, be legally compared to copyright violation. If I pre-train a model with 20,000 images and descriptions of the phrase “fish-scale shaped,” and from all that data the model accrues a mathematical representation of the Platonic ideal of a fish scale’s shape without ever directly cloning any one image or description, then ultimately the stored model and its user facing aspects/output are not actually copying anything. It is not a violation of copyright, full stop.
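
Part of that “not actually copying anything” claim is simple arithmetic: the trained weights are vastly smaller than the training set, so verbatim storage of the whole corpus is physically impossible (which doesn’t by itself rule out memorizing individual items, hence the adversarial training above). Illustrative numbers only:

    # Back-of-envelope: can the weights even hold verbatim copies?
    # All numbers illustrative.
    images = 20_000                            # fish-scale images
    bytes_per_image = 500_000                  # ~500 KB each
    dataset_bytes = images * bytes_per_image   # ~10 GB of training data

    params = 50_000_000                        # a smallish vision model
    model_bytes = params * 4                   # fp32 weights, ~200 MB

    print(f"dataset {dataset_bytes/1e9:.0f} GB vs model {model_bytes/1e6:.0f} MB")
    # The model is ~50x smaller than its training set: it can only keep
    # compressed statistical regularities, not 20,000 verbatim images.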

I really could not give two shits what some judge who does not understand any of this says about it; our legal system nearly always bases its decisions regarding new technology on a palpably and laughably misinformed or incomplete understanding of the system in question, and it is virtually impossible for something as complex as transformer architecture in artificial neural networks to be one of the exceptions. Until proven otherwise I assume their ruling has as much actual merit as a random cow’s opinion: it’s a moo point.

But, the people who created those fish-scale images and descriptions did not consent to have their work used in that fashion. And things can be unethical, and morally a form of theft, without being a violation of copyright. It is wrong to do so.

For better or for worse, it has already happened. The technology exists, and if you outlaw it in the US and Europe you’re just shifting the research to China and handing them a massive competitive advantage, because if there’s one thing the Chinese government has made clear regarding software, it’s that they don’t really consider intellectual property to be actual property. If they want it, they’re gonna take it, and just ask Google whether they have, historically, done exactly that on a consistent basis. So whether it’s in the US or in China, the genie’s out of the bottle and the horse has fled the barn.

Furthermore, I believe this technology will, once properly hybridized with reinforcement models (“proper” here excludes OpenAI’s pending releases over the next four years), lead to amazing things in the next half century, or century. Full-on post scarcity things.

Hank’s solution is bang-on: we can’t change the past, but we can quit doing anything like this shit in the future. We’ll always want more human-authored data, but synthetic data generation has grown by leaps and bounds this past year and is accessible to everyone - giant companies and us little people alike - provided they drop the $30K or $100K on the hardware necessary to run inference on Nemotron or Llama 3.1 (so very few individuals but plenty of small teams and universities). Researchers can get by with SDG and continue advancing at a breakneck clip while not further screwing over artists, writers, or even YouTube “creators” like Hank. Just not quite as breakneck as it has been to date.
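
For anyone curious what SDG actually looks like, conceptually it’s just a loop like this. (Model name and prompts are toy placeholders, not recommendations; the big Nemotron/Llama variants are the same idea plus the hardware budget above.)

    # Minimal synthetic-data-generation loop: a permissively-licensed model
    # authors new training text instead of scraping humans.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")  # toy stand-in

    seed_topics = ["fish-scale patterns", "cornbread baking"]
    synthetic_examples = []
    for topic in seed_topics:
        out = generator(f"Write a short passage about {topic}:",
                        max_new_tokens=100, do_sample=True)
        synthetic_examples.append(out[0]["generated_text"])
    # synthetic_examples can now seed a later training run.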

And as someone who wants to see all this grow into everything that it can become… I’m not gonna live long enough to meet Commander Data or whatever, no matter what policy we put in place or how we do or don’t restrict ourselves. So we should start doing the right thing or less-wrong thing until we get there.

I hope the New York Times loses their lawsuit, and Hank’s proposal ultimately wins out. And I realize a lot of Metafilter is not going to agree with that.
posted by Ryvar at 8:47 PM on August 23 [6 favorites]


And when that happens, and copyrighted works can be conjured from the model with relative ease, it ought to be legally actionable as violation of copyright. But only then.

I agree with a lot of what you said except the "with relative ease" part. Every individual instance of a generative model producing an output that infringes copyright is an instance of copyright infringement by the creators of the model, period. Whether it is easy or hard to get the model to do so should be irrelevant from a legal perspective; only the number of infringements matters. I see no reason that these systems should be held to a lower standard than a human would be.
posted by biogeo at 9:03 PM on August 23 [15 favorites]


I think it was 2018 when the Hello Internet podcast transitioned into being mostly CGPGrey and Periodic Videos discussing their dissatisfaction with YouTube as a platform.

I'm sure all of Google/Apple/Amazon/Microsoft/Facebook are too big to fail at this point, but it's interesting that Google has the most consistently disdainful attitude toward its suppliers (suppliers in this context being both content creators and the eyeballs that see advertisements).
posted by midmarch snowman at 9:05 PM on August 23 [1 favorite]


I feel extremely not confident about any of this. Realistically, the rulings about how copyright laws will apply here will depend on the political landscape. If enough influential or wealthy people are scared by it, then this use of YouTube (et al.) data for training generative AI models will be deemed illegal. However, if the industrial proponents of this technology wield enough political influence, there are ways to justify it. The technology is so new that it’s an open question how to describe what it’s actually doing to the information it’s trained on, legally speaking.

A hundred years from now, it will seem like whatever rulings they make were inevitable. But right now, I really don’t know.
posted by voltairemodern at 9:10 PM on August 23 [2 favorites]


And as someone who wants to see all this grow into everything that it can become… I’m not gonna live long enough to meet Commander Data or whatever, no matter what policy we put in place or how we do or don’t restrict ourselves. So we should start doing the right thing or less-wrong thing until we get there.

Terminator and Star Trek and all of that are fiction and will always be fiction. Stealing content to make stockholders money is wrong and if the court systems don't fix it, no one is going to create content and be ripped off.
posted by 922257033c4a0f3cecdbd819a46d626999d1af4a at 9:14 PM on August 23 [17 favorites]


That said, the mere act of pre-training an AI on copyrighted data should not, in and of itself, be legally compared to copyright violation

The act of pre-training, maybe not, but I’d be shocked if companies weren’t routinely violating copyright in the process of assembling material for training, hoping that the process of training and the less clear-cut legalities on the other side will launder it. That seems like one of the obvious “checkpoints” where copyright law would apply as-is.

Every individual instance of a generative model producing an output that infringes copyright is an instance of copyright infringement by the creators of the model, period.

It’s not illegal for a person to “produce an output” that’s potentially infringing, it’s illegal to publish, distribute, perform it and so on. If you mean specifically models the output of which is made available as a commercial service there might be an argument that one of those things is happening in the process of delivering that output. It seems ambiguous to me whether the burden would fall on the company selling the service or the end user but I’m not a lawyer.
posted by atoxyl at 9:45 PM on August 23 [3 favorites]


if the court systems don't fix it, no one is going to create content and be ripped off

obviously you can argue that it would be bad for art but I’d bet a lot of money that this isn’t a literally true statement
posted by atoxyl at 9:48 PM on August 23 [3 favorites]


If an unauthorized copy is used to create the model, that's a copyright violation. The question is what types of uses are authorized by putting stuff on the open web, and is training LLMs one of those uses? My feeling is fuck no, and may generative AI researchers be forced to live in the world they are trying to create.
posted by surlyben at 11:04 PM on August 23 [3 favorites]


I'm not 100% sold on the case that AI is not copyright infringement.

But even setting that aside, I still can't get past the idea that AI machines don't have a product without the input of other people's product. If there is no art for training, there are no AI images/music/text.

Is there any legal structure that takes that into consideration?
posted by ishmael at 11:09 PM on August 23 [1 favorite]


And the reason that I say this about copyright is because unless a researcher royally fucks up or bypasses a significant amount of due-diligence adversarial training (whether that’s down to time pressure imposed by Sam Altman or plain laziness is irrelevant), a copy of your copyrighted work is not stored within the model.

the NYT example is like 10 seconds of the video, and it’s used to illustrate the actual copyright issue, the part that is certainly illegal. debating whether or not it is okay for an application to spit out copyrighted work without an explicit license is missing the point entirely because that’s always been illegal too

I’d be shocked if companies weren’t routinely violating copyright in the process of assembling material for training, hoping that the process of training and the less clear-cut legalities on the other side will launder it.

this is the big issue, in case anyone is curious about what this video, or most posts over the last few years have been about.

everyone’s labor and work have provably been stolen and used by OpenAI to build products that are now being sold to make money. in order to make the “product” they needed to steal and infringe at a scale that would be incomprehensible, and the hope from the beginning was that the scale would make it impossible to assess damages and that they would be able to get away with it

in response the NYT did something pretty standard - requesting that OpenAI pay damages and not be allowed to distribute the software because it was created using the illegally obtained works from NYT. the proof that this occurred was for a time freely offered up by the software itself (the chatbot snitching on OpenAI was the only “bug” here; the copyright infringement is the feature)

now every other company (Google, and Microsoft to keep everything in the context of the posted video) is sitting around thinking, gee well I wish we could get away with that too so we can make some bucks on this shit! and maybe we can! if we just lean into nonspecific language in our TOS and hope that this new use case for this data falls under our existing license agreements for derivative works or processing or whatever… and maybe we won’t get sued into the ground like OpenAI when some judicial system intervenes and decides copyright infringement at scale should result in commensurate damages at scale

the video is like actually really great at specifically highlighting all these things - the insane illegal thing that occurred, proof that this is what happened, what is technically gray area and proof that certain companies are choosing to lean into that gray area as well as examples of their incentives for doing so, and how youtube in this case is caught in the crossfire and not able to autonomously decide to do right by their creators because of their arrangements with their parent company

the video covers a lot to get to the point that we need privacy checkboxes for this “AI” thing because this whole fucking “AI” thing is a shit show and people get like real distracted by how cool it is that an app can make really high resolution images of cornbread pictures or whatever and they want to talk about how they should be able to make cornbread pictures and like if it makes cornbread videos that were obviously copyrighted then that’s maybe a problem but it’s really complicated to think about how the cornbread / banana bread slashfic generator app got so darn good at making steamy baking scenes in the first place!! so let’s like just talk about anything else, it really shouldn’t matter that companies violated all these laws that usually result in huge fines because i got my cool cornbread generator

which like i get that too, i have a new free streaming service on my phone called jellyfin and i have no idea where they get all the latest cool shows but they don’t have ads and i’m saving a ton of money now that i don’t have to use all the paid services anymore with ads! maybe amazon should try to figure out how to use data from jellyfin to train AI to remove the ads from their shows?

anyway i would really recommend watching the video again if any of this comment doesn’t make sense because it’s actually just me not using ai to summarize the video
posted by grizzly at 12:38 AM on August 24 [3 favorites]


a copy of your copyrighted work is not stored within the model.

This is why there should be an individual cause of action to not only opt your creations out of all AI training, but out of all derivative models into which it has been processed and incorporated as well. The original work is just chum once they train on it; the meat, the infringement if you will, is in the reproduction of fragments and derivatives in the software's operation (by definition!), akin to sampling.

when...copyrighted works can be conjured from the model with relative ease, it ought to be legally actionable as violation of copyright.

I'd venture that it's a failure of implementation if it can't reproduce its training data. Like, if it can't do that, can it accomplish any specific goal? Maybe there's an "AI of Theseus" where it's producing your painting, except tweaked just enough to fall outside of the current jurisprudence.
posted by rhizome at 1:12 AM on August 24


It’s not illegal for a person to “produce an output” that’s potentially infringing, it’s illegal to publish, distribute, perform it and so on. If you mean specifically models the output of which is made available as a commercial service there might be an argument that one of those things is happening in the process of delivering that output.

Yes, I thought it was clear we're talking about services which take a user's input, generate a work, and then distribute that work back to the user. Or in which the pre-trained model is distributed to the user and produces the work on the user's computer, I don't see a material distinction between those cases. "Publication" is the act of providing the output to the user. If a person trains up their own model and runs it for their own personal use only, then no, that's not copyright violation, but I didn't think that's what we're talking about.
posted by biogeo at 1:16 AM on August 24 [1 favorite]


everyone’s labor and work have provably been stolen and used by OpenAI to build products that are now being sold to make money.

No, I checked, and a lot of my work is still in my possession, nothing stolen.
posted by Dysk at 1:24 AM on August 24 [2 favorites]


It’s not a very good argument to say “we have to do this or China will do it, and they’re going to steal all your intellectual property” (I’m paraphrasing, my apologies). The problem is that a lot of artists feel their stuff has already been stolen, that the companies currently in charge of AI research don’t give a shit, and that it’s not going to change.

The other problem is that everything produced by AI is fucking awful and useless. It does nothing for the economy because it’s a hype bubble made by the incumbent FAANG types to bolster their position at a time of economic stress. Sure, sure, there’s going to be some kind of AI abstraction layer that will, idk, find quicker routes for traffic, or identify medical problems sooner, but this is the same as an algorithm. It’s not the world-changing technology we were promised, because it was a hype bubble.
posted by The River Ivel at 1:44 AM on August 24 [3 favorites]


Free, Prior, and Informed Consent is a principle of law worth fighting for

Look upon the numbers and numbers of journalists who are now working service industry jobs and despair. Everything is getting stupider, and the current move is to fire / force out all the teachers next?

It was chilling to listen to how educational information was being targeted in the video. There is already a strong political movement to get rid of teachers, and stealing educational content just seems to line up perfectly with that agenda.

LLMs/IT will never replace the work of journalists, but they have convinced the bosses that journalists’ labor is not worth it, to the point that there are no more journalists.

This news that they are coming for the teaching profession next is gutting to hear.

And I swear that voice-to-text has only gotten worse since this BS dropped. I know I lost funding and support for my LLM project, which was to develop environmental datasets and track the impacts of climate change. So, yay.

Glad to hear the strong dissent from such a communicator.
posted by eustatic at 1:48 AM on August 24 [5 favorites]


This is a really good video, thanks. My previous (not deeply informed) position was that the scraping that AI companies are doing is clearly illegal, so I appreciate the example of subtitle/translation models (which are good) and details on terms of service agreements. What AI companies are doing is still at least partially illegal, as well as morally questionable, but there is more nuance to it.
posted by Alex404 at 3:15 AM on August 24 [1 favorite]


Ryvar's stance looks plain wrong to me. Copyright is a social construction; you can't ignore what the courts say -- but I would agree with an opt-out being needed.

Say I have made some material and registered it with the Copyright Office, and it's taken for use in training machine learning models. Any transformation, any copying without permission breaches my monopoly control of my creative output. If it's never seen in the output of the ML tool, or is one of millions of input works, that doesn't change that neither a licence nor permission were granted.

It doesn't need to appear in the output -- in whole or part -- for the model to be derived from the training data (and derived from my work in that training data set). Outputting a copy is a tidy convenience to show copying has occurred, but we're claiming the training data is transformed into a derived work. If my work in the training data set doesn't influence the output, you can exclude it from the training data without consequence to the efficacy or customer value of the ML model.

Building a training data set: the USA says that a database of labelled images is fact and not protectable by copyright law, whereas elsewhere a "database right" applies to the effort needed to accumulate and organise an amount of factual data. Typically, moving data around the internet or any computer system involves copying the bits from buffer to storage, and we accommodate this with an implicit exclusion from copyright infringement.

Now, say, this trained ML tool uses a suite of my registered copyright work and will create something "in the style of k3ninho" -- with an effective block against cloning substantial parts of my work ... this dilutes the uniqueness of my existing compositions and harms my reputation. (You might claim it's parody, but you've still transformed this suite of my work to train the machine learning model and you still expect users to find value in synthesising something "in the style of k3ninho", so it would be fair to expect that value to become royalties to make this right.)

Now, again, let's say you've not had any of my work in the training set but have a database of my work you use in Retrieval-Augmented Generation, which says "when you respond to prompts, use these works to ensure fidelity to the information they contain" and so you've only catalogued my work in your factual database. In running the ML tool, you've again directed a computer program to transform my work for your business gain, infringing on my copyrights.
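
In code terms that RAG arrangement is roughly the sketch below; embed() is a toy stand-in for a real embedding model, and everything here is illustrative:

    # Sketch of Retrieval-Augmented Generation: the model never trained on
    # my works, but at query time they are fetched and pasted, verbatim,
    # into the prompt.
    import numpy as np

    def embed(text):
        # Toy bag-of-words stand-in for a real embedding model.
        vec = np.zeros(256)
        for word in text.lower().split():
            vec[hash(word) % 256] += 1.0
        norm = np.linalg.norm(vec)
        return vec / norm if norm else vec

    def rag_prompt(query, my_works, top_k=3):
        q = embed(query)
        ranked = sorted(my_works, key=lambda doc: -float(q @ embed(doc)))
        context = "\n---\n".join(ranked[:top_k])
        # My copyrighted text is copied directly into the prompt here:
        return f"Use these works to ensure fidelity:\n{context}\n\nQuestion: {query}"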

In USA courts, damages are related to substantiality, or how much of your work was an input and how much of your work is vital to the output, so that's why these principles of copyright aren't being adjudicated into standing decisions of law.

Until this becomes fact-in-law, OpenAI is doing stuff like persuading Financial Times and Condé Nast to add their output to the training data instead of fighting for a royalty licence. I'll repeat myself in service of an opt-out: If my work in the training data set doesn't influence the output, you can exclude it from the training data without consequence to the efficacy or customer value of the ML model.
posted by k3ninho at 5:26 AM on August 24 [2 favorites]


Copyright is a social construction; you can't ignore what the courts say

Yeah to be clear on this: I care about the consequences of what the courts say in terms of who goes to jail or loses a lawsuit. They’re not going to give me much choice about that. But my entire adult life has been spent watching the US legal system make such terribly-reasoned hash out of tech cases that I now strongly weight legal decisions as evidence against, rather than for, when deciding what is morally correct or how things ought to work.

In a non-legal but more meaningful sense: I hold the US court system’s views on technology in contempt. Utter contempt.

Say I have made some material and registered it with the Copyright Office, and it's taken for use in training machine learning models. Any transformation, any copying without permission breaches my monopoly control of my creative output. If it's never seen in the output of the ML tool, or is one of millions of input works, that doesn't change that neither a licence nor permission were granted.

“Taken” is doing a lot of heavy lifting here. Was it downloaded, stored on OpenAI’s internal network, and used to alter a model’s weights during pre-training? Could I, a normal human, have downloaded and runtime-trained the neural network between my ears with that same material?

Because I think the actual difference in our views is: I do not see refinement of a model’s weights and a human’s dendrites as fundamentally different operations. They both constitute learning in a very real sense. Hank used airquotes on the word learning when he related this view in his video and that specific choice is where I think he’s dead wrong. The refinement of neural connection weights - silicon or flesh - by viewing materials is fully what it means to learn. And that act does not result in a new copy of the data. Executed properly, it does not even permit the creation of new copies: only the incorporation of stylistic choices and motifs in new works.
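
Concretely, “refinement of a model’s weights” is nothing more mysterious than this: one gradient step on one example, after which the example is discarded and only the nudged weights remain (toy numbers throughout):

    # One SGD step: the training example nudges the weights and is then
    # thrown away; no copy of (x, y) persists inside the model.
    import numpy as np

    w = np.array([0.5, -0.3, 0.1])   # the "model": just three weights

    x = np.array([0.2, -1.0, 0.5])   # one training example...
    y = 1.0                          # ...and its target

    pred = w @ x                     # forward pass
    grad = 2 * (pred - y) * x        # gradient of squared error w.r.t. w
    w = w - 0.1 * grad               # learning: the weights shift slightly

    # w now carries a statistical trace of (x, y), not (x, y) itself.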

The fact that this mimicry of style can now be repeated a thousand times in an hour on home or smallscale server equipment, the fact that it was used in this novel fashion without the consent of the authors - those things represent a degree of harm and a moral wrong, respectively. But they do not create copies, nor deny a sale on your specific execution of that style, your specific employment of those motifs in a given work. It can be wrong or even immoral without violating copyright.

let's say you've not had any of my work in the training set but have a database of my work you use in Retrieval-Augmented Generation

Let me stop you right there: RAG is definitely theft and probably copyright violation as far as I’m concerned, and you should be able to sue people for hitting a database of your works during a RAG query without a license to it. That is fundamentally different from pre-training.

To me, it is specifically learning which is sacrosanct, whether it’s being done by people or machines. Or the kinds of machines that will one day become people - I define myself as someone who embraces the future, even if that sometimes terrifies me a little.

…and any people who want to halt or hinder learning need to be fired into the sun. That’s not a joke or my characteristic hyperbole.
posted by Ryvar at 6:39 AM on August 24 [2 favorites]


Could I, a normal human, have downloaded and runtime-trained the neural network between my ears with that same material

I think you are making the mistake of confusing the metaphor with reality. Computers "think" but that's just a metaphor. It's a useful metaphor, and there might be direct analogs to what humans do when they think, but it's unlikely to be the same thing.

An LLM is not out there, following its interests, choosing what to focus on, reading websites as whim dictates, and taking in the information that interests it. Instead, it is doing whatever its programmers have specifically designed it to do, taking in the information they have specified it take in, and processing it in the way it was designed to process it. Is that learning?
posted by surlyben at 7:09 AM on August 24 [2 favorites]


To me, it is specifically learning which is sacrosanct, whether it’s being done by people or machines.

Not for nothing, but just because folks call it machine learning, doesn't mean that the machine is, well, learning.
posted by Gygesringtone at 7:57 AM on August 24 [2 favorites]


it's unlikely to be the same thing

I assure you I am not confused on the difference between a) artificial systems which are trained on massive quantities of data in one phase to create a set of fixed weights that reflects the probability of any particular next token in sequence, then in a second phase run inference upon these weights with user-supplied input to generate corresponding output, and b) organic neural networks which are continuously updated at runtime via sensory data in a process that strongly resembles aspects of reinforcement models (maze solvers), large language models, and a third thing (recursive agentic reasoning within general case predictive systems models); the latter of which may or may not be achievable in silicon via synthesis of the prior two. Achievable in silicon definitely, eventually, but with any reasonable refinement of existing types of ANN architecture? Meh.
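
(To spell out phase two of (a) for anyone following along: inference is just repeated next-token sampling from probabilities computed off the frozen weights. A deliberately trivial toy version:)

    # Phase two in a nutshell: repeatedly sample the next token from
    # probabilities computed off frozen weights.
    import numpy as np

    vocab = ["apple", "fruit", "plant", "."]
    W = np.array([[0.1, 2.0, 0.5, 0.2],    # frozen "weights": row = current
                  [0.2, 0.1, 2.0, 0.5],    # token, columns = next-token logits
                  [0.1, 0.3, 0.2, 2.0],
                  [1.5, 0.2, 0.2, 0.1]])

    rng = np.random.default_rng(0)
    token, out = 0, ["apple"]
    for _ in range(5):
        logits = W[token]
        probs = np.exp(logits) / np.exp(logits).sum()  # softmax
        token = int(rng.choice(len(vocab), p=probs))
        out.append(vocab[token])
    print(" ".join(out))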

What you seem to be missing is that because we created artificial neural networks in our own image, even if the precise details differ, the resulting topological structures taken at the first or second derivative begin to increasingly correlate with ours because we are training them on the output of *our* neural structures. Those implementation differences - the ground truth of the network before any topological analysis - mean that truly enormous quantities of data are needed to achieve this. All the data you’ve got, really.

Because the problem with generative pre-trained transformers is that they are cut off from runtime updates - like a photo of your language center flash-frozen and set adrift, somehow decoupled from the entire central nervous system that previously hosted it. It understands or at least encodes the understanding of the relationships between “apple,” “fruit,” and “plant.” But it has no runtime experiences to connect those concepts with. It has never held an apple and watched its own hand turn the apple. Its understanding exists only in pure-relative terms.

Thus learning, and pure-relative conceptual-mapping forms of understanding, are achieved - but practical reasoning and useful understanding require hybridization with different kinds of neural architecture meant to work in real-world, runtime applications like robot navigation or locomotion: reinforcement learning.

That is why LLMs cannot reason, but they can be a major component of larger future systems which will. More concretely: they can author the evaluation criteria for the reinforcement components of future systems, because they already consistently outperform human experts at this (search term: Nvidia Eureka). This does not change the fact that their basic operation is predicated on refinement of neural weights, on optimization to local minimum loss: which is fully learning.
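
Schematically, the Eureka-style loop looks like the sketch below; ask_llm() and run_simulation() are placeholders for the real components, which iterate candidate reward functions against GPU simulation results:

    # Schematic of the Eureka idea: an LLM authors candidate reward
    # functions, each is scored in simulation, and the best survives.
    def ask_llm(prompt):
        # Stand-in: a real system queries a large language model here.
        return "def reward(state):\n    return -abs(state['error'])"

    def run_simulation(reward_fn):
        # Stand-in: train/evaluate a policy under reward_fn, return a score.
        return reward_fn({"error": 0.1})

    best_score, best_reward = float("-inf"), None
    for _ in range(3):                        # a few LLM-authored candidates
        code = ask_llm("Write a Python reward function for this task.")
        namespace = {}
        exec(code, namespace)                 # compile the candidate
        score = run_simulation(namespace["reward"])
        if score > best_score:
            best_score, best_reward = score, namespace["reward"]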
posted by Ryvar at 8:24 AM on August 24


I don't have language to talk about ML weights other than them being a digital dataset which processes an input into an output. It's not embodied, it's not searching for fuel to survive, it's not seeking reproductive mates -- so I use the lens of copyrightable data sourced from training data sets. What does it need to do so that I treat it as a person? I don't yet know.

You also seem centred around copies, where instead my creative effort is embodied in my work, and transforming (even truncating) it begets a derived work that earns copyright protection and carries forward my spark of creativity. (To some extent, copyright steals away my creativity from the collective good, but usually the know-how or knowledge diffuses among the community even beyond the specific embodiment of the creative work.)
posted by k3ninho at 8:34 AM on August 24 [2 favorites]


What does it need to do so that I treat it as a person? I don't yet know.

Neither do I. And that’s a good thing. We’re sure as shit not there yet, and I strongly doubt OpenAI’s pending “what is the crudest reinforcement-LLM hybrid even possible?” release will change that, regardless of whether it’s more “useful.” But I don’t believe that the human soul actually exists, or reaches into our neural networks from an ethereal plane via the thalamus like Descartes speculated, and somehow twists our network such that the output reflects the contents of our souls rather than our meat. That’s bullshit, and so long as it remains bullshit we know that we’ll teach a rock to be a person eventually.
posted by Ryvar at 8:46 AM on August 24 [2 favorites]


To me, it is specifically learning which is sacrosanct, whether it’s being done by people or machines. Or the kinds of machines that will one day become people - I define myself as someone who embraces the future, even if that sometimes terrifies me a little.

…and any people who want to halt or hinder learning need to be fired into the sun. That’s not a joke or my characteristic hyperbole.


This strikes me as a limited moral framework. One that would make it easy to valorize knowledge over all sorts of other important things. Or value intelligence, however defined, over many other important qualities. I know people like to talk about education as an end in itself, but it's frequently also a means to an end, and we need to be curious about what those ends are. (I'm not saying you aren't, Ryvar, just that this rhetoric plays right into the hands of those who would prefer we not be curious.)

Learning is in many ways inextricable from the pursuit of power, whether it's power over the fundamentals of the atom, over disease, or over your enemies. We need to understand why we want that power and what we would do with it if we had it. (Or whoever or whatever would do with it, if they had it.) Since we can't really know this in advance, I think we should proceed cautiously.
posted by ropeladder at 11:17 AM on August 24 [1 favorite]


did I summon beavits? (I'm enjoying this, perhaps too much)
posted by es_de_bah at 12:29 AM on August 25



