OpenAI says it’s “impossible” to create useful AI models without copyrighted material

sculd@beehaw.org · 10 months ago

OpenAI says it’s “impossible” to create useful AI models without copyrighted material

frog 🐸@beehaw.org · 10 months ago

I wish I could upvote this more than once.

What people always seem to miss is that a human doesn’t need to billions of examples to be able to produce something that’s kind of “eh, close enough”. Artists don’t look at billions of paintings. They look at a few, but do so deeply, absorbing not just the most likely distribution of brushstrokes, but why the painting looks the way it does. For a basis of comparison, I did an art and design course last year and looked at about 300 artworks in total (course requirement was 50-100). The research component on my design-related degree course is one page a week per module (so basically one example from the field the module is about, plus some analysis). The real bulk of the work humans do isn’t looking at billions of examples: it’s looking at a few, and then practicing the skill and developing a process that allows them to convey the thing they’re trying to express.

If the AI models were really doing exactly the same thing humans do, the models could be trained without any copyright infringement at all, because all of the public domain and creative commons content, plus maybe licencing a little more, would be more than enough.

teawrecks@sopuli.xyz · 10 months ago

What you count as “one” example is arbitrary. In terms of pixels, you’re looking at millions right now.

The ability to train faster using fewer examples in real time, similar to what an intelligent human brain can do, is definitely a goal of AI research. But right now, we may be seeing from AI what a below average human brain could accomplish with hundreds of lifetimes to study.

If the AI models were really doing exactly the same thing humans do, the models could be trained without any copyright infringement at all, because all of the public domain and creative commons content, plus maybe licencing a little more, would be more than enough.

I mean, no, if you only ever look at public domain stuff you literally wouldn’t know the state of the art, which is historically happening for profit. Even the most untrained artist “doing their own thing” watches Disney/Pixar movies and listens to copyrighted music.

Bene7rddso@feddit.de · 10 months ago

Humans learn mostly from real life. Go touch some grass

frog 🐸@beehaw.org · 10 months ago

If we’re going by the number of pixels being viewed, then you have to use the same measure for both humans and AIs - and because AIs have to look at billions of images while humans do not, the AI still requires far more pixels than a human does.

And humans don’t require the most modern art in order to learn to draw at all. Sure, if they want to compete with modern artists, they would need to look at modern artists (for which educational fair use exists, and again the quantity of art being used by the human for this purpose is massively lower than what an AI uses - a human does not need to consume billions of artworks from modern artists in order to learn what the current trends are). But a human could learn to draw, paint, sculpt, etc purely by only looking at public domain and creative commons works, because the process for drawing, say, the human figure (with the right number of fingers!) has not changed in hundreds of years. A human can also just… go outside and draw things they see themselves, because the sky above them and the tree across the street aren’t copyrighted. And in fact, I’d argue that a good artist should go out and find real things to draw.

OpenAI’s argument is literally that their AI cannot learn without using copyrighted materials in vast quantities - too vast for them to simply compensate all the creators. So it genuinely is not comparable to a human, because humans can, in fact, learn without using copyrighted material. If OpenAI’s argument is actually that their AI can’t compete commercially with modern art without using copyrighted works, then they should be honest about that - but then they’d be showing their hand, wouldn’t they?

Even_Adder@lemmy.dbzer0.com · 10 months ago

It isn’t wrong to use copyrighted works for training. Let me quote an article by the EFF here:

First, copyright law doesn’t prevent you from making factual observations about a work or copying the facts embodied in a work (this is called the “idea/expression distinction”). Rather, copyright forbids you from copying the work’s creative expression in a way that could substitute for the original, and from making “derivative works” when those works copy too much creative expression from the original.

Second, even if a person makes a copy or a derivative work, the use is not infringing if it is a “fair use.” Whether a use is fair depends on a number of factors, including the purpose of the use, the nature of the original work, how much is used, and potential harm to the market for the original work.

and

Even if a court concludes that a model is a derivative work under copyright law, creating the model is likely a lawful fair use. Fair use protects reverse engineering, indexing for search engines, and other forms of analysis that create new knowledge about works or bodies of works. Here, the fact that the model is used to create new works weighs in favor of fair use as does the fact that the model consists of original analysis of the training images in comparison with one another.

What you want would swing the doors open for corporate interference like hindering competition, stifling unwanted speech, and monopolization like nothing we’ve seen before. There are very good reasons people have these rights, and we shouldn’t be trying to change this. Ultimately, it’s apparent to me, you are in favor of these things. That you believe artists deserve a monopoly on ideas and non-specific expression, to the detriment of anyone else. If I’m wrong, please explain to me how.

If we’re going by the number of pixels being viewed, then you have to use the same measure for both humans and AIs - and because AIs have to look at billions of images while humans do not, the AI still requires far more pixels than a human does.

Humans benefit from years of evolutionary development and corporeal bodies to explore and interact with their world before they’re ever expected to produce complex art. AI need huge datasets to understand patterns to make up for this disadvantage. Nobody pops out of the womb with fully formed fine motor skills, pattern recognition, understanding of cause and effect, shapes, comparison, counting, vocabulary related to art, and spatial reasoning. Datasets are huge and filled with image-caption pairs to teach models all of this from scratch. AI isn’t human, and we shouldn’t judge it against them, just like we don’t judge boats on their rowing ability.

And humans don’t require the most modern art in order to learn to draw at all. Sure, if they want to compete with modern artists, they would need to look at modern artists (for which educational fair use exists, and again the quantity of art being used by the human for this purpose is massively lower than what an AI uses - a human does not need to consume billions of artworks from modern artists in order to learn what the current trends are). But a human could learn to draw, paint, sculpt, etc purely by only looking at public domain and creative commons works, because the process for drawing, say, the human figure (with the right number of fingers!) has not changed in hundreds of years. A human can also just… go outside and draw things they see themselves, because the sky above them and the tree across the street aren’t copyrighted. And in fact, I’d argue that a good artist should go out and find real things to draw.

AI don’t require most modern art in order to learn to make images either, but the range of expression would be limited, just like a human’s in this situation. You can see this in cave paintings and early sculptures. They wouldn’t be limited to this same degree, but you would still be limited.

It took us 100,000 years to get from cave drawings to Leonard Da Vinci. This is just another step for artists, like Camera Obscura was in the past. It’s important to remember that early man was as smart as we are, they just lacked the interconnectivity to exchange ideas that we have.

ParsnipWitch@feddit.de · 10 months ago

I think the difference in artistic expression between modern humans and humans in the past comes down to the material available (like the actual material to draw with).

Humans can draw without seeing any image ever. Blind people can create art and draw things because we have a different understanding of the world around us than AI has. No human artist needs to look at a thousand or even at 1 picture of a banana to draw one.

The way AI sees and “understands” the world and how it generates an image is fundamentally different from how the human brain conveys the object banana into an image of a banana.

Phanatik@kbin.social · 10 months ago

Exactly! You can glean so much from a single work, not just about the work itself but who created it and what ideas were they trying to express and what does that tell us about the world they live in and how they see that world.

This doesn’t even touch the fact that I’m learning to draw not by looking at other drawings but what exactly I’m trying to draw. I know at a base level, a drawing is a series of shapes made by hand whether it’s through a digital medium or traditional pen/pencil and paper. But the skill isn’t being able replicate other drawings, it’s being able to convert something I can see into a drawing. If I’m drawing someone sitting in a wheelchair, then I’ll get the pose of them sitting in the wheelchair but I can add details I want to emphasise or remove details I don’t want. There’s so much that goes into creative work and I’m tired of arguing with people who have no idea what it takes to produce creative works.

frog 🐸@beehaw.org · 10 months ago

It seems that most of the people who think what humans and AIs do is the same thing are not actually creatives themselves. Their level of understanding of what it takes to draw goes no further than “well anyone can draw, children do it all the time”. They have the same respect for writing, of course, equating the ability to string words together to write an email, with the process it takes to write a brilliant novel or script. They don’t get it, and to an extent, that’s fine - not everybody needs to understand everything. But they should at least have the decency to listen to the people that do get it.

intensely_human@lemm.ee · 10 months ago

Well, that’s not me. I’m a creative, and I see deep parallels between how LLMs work and how my own mind works.

frog 🐸@beehaw.org · 10 months ago

Either you’re vastly overestimating the degree of understanding and insight AIs possess, or you’re vastly underestimating your own capabilities. :)

Veloxization@yiffit.net · 10 months ago

This whole AI craze has just shown me that people are losing faith in their own abilities and their ability to learn things. I’ve heard so many who use AI to generate “artwork” argue that they tried to do art “for years” without improving, and hence have come to conclusion that creativity is a talent that only some have, instead of a skill you can learn and hone. Just because they didn’t see results as fast as they’d have liked.

frog 🐸@beehaw.org · 10 months ago

Very well said! Creativity is definitely a skill that requires work, and for which there are no short cuts. It seems to me that the vast majority of people using AI for artwork are just looking for a short cut, so they can get the results without having to work hard and practice. The one valid exception is when it’s used by disabled people who have physical limitations on what they can do, which is a point that’s brought up occasionally - and if that was the one and only use-case for these models, I think a lot of artists would actually be fine with that.

jarfil@beehaw.org · 10 months ago

Alternatively, you might be vastly overestimating human “understanding and insight”, or how much of it is really needed to create stuff.

frog 🐸@beehaw.org · edit-2 10 months ago

Average humans, sure, don’t have a lot of understanding and insight, and little is needed to be able to draw a doodle on some paper. But trained artists have a lot of it, because part of the process is learning to interpret artworks and work out why the artist used a particular composition or colour or object. To create really great art, you do actually need a lot of understanding and insight, because everything in your work will have been put there deliberately, not just to fill up space.

An AI doesn’t know why it’s put an apple on the table rather than an orange, it just does it because human artists have done it - it doesn’t know what apples mean on a semiotic level to the human artist or the humans that look at the painting. But humans do understand what apples represent - they may not pick up on it consciously, but somewhere in the backs of their minds, they’ll see an apple in a painting and it’ll make the painting mean something different than if the fruit had been an orange.

jarfil@beehaw.org · 10 months ago

it doesn’t know what apples mean on a semiotic level

Interestingly, LLMs seem to show emerging semiotic organization. By analyzing the activation space of the neural network, related concepts seem to get trained into similar activation patterns, which is what allows LLMs to zero shot relationships when executed at a “temperature” (randomness level) in the right range.

Pairing an LLM with a stable diffusion model, allows the resulting AI to… well, judge by yourself: https://llm-grounded-diffusion.github.io/

frog 🐸@beehaw.org · 10 months ago

I’m unconvinced that the fact they’re getting better at following instructions, like putting objects where the prompter specifies, or changing the colour, or putting the right number of them, etc means the model actually understands what the objects mean beyond their appearance. It doesn’t understand the cultural meanings attached to each object, and thus is unable to truly make a decision about why it should place an apple rather than an orange, or how the message within the picture changes when it’s a red sports car rather than a beige people-carrier.

Quokka@quokk.au · edit-2 10 months ago

Children learn by watching others. We are trained from millions of examples starting from before birth.

intensely_human@lemm.ee · 10 months ago

When you look at one painting, is that the equivalent of one instance of the painting in the training data? There is an infinite amount of information in the painting, and each time you look you process more of that information.

I’d say any given painting you look at in a museum, you process at least a hundred mental images of aspects of it. A painting on your wall could be seen ten thousand times easily.

Even_Adder@lemmy.dbzer0.com · 10 months ago

When people say that the “model is learning from its training data”, it means just that, not that it is human, and not that it learns exactly humans. It doesn’t make sense to judge boats on how well they simulate human swimming patterns, just how well they perform their task.

Every human has the benefit of as a baby training on things around them and being trained by those around them, building a foundation for all later skills. Generative models rely on many text and image pairs to describe things to them because they lack the ability to poke, prod, rotate, and disassemble for themselves.

For example, when a model takes in a thousand images of circles, it doesn’t “learn” a thousand circles. It learns what circle GENERALLY is like, the concept of it. That representation, along with random noise, is how you create images with them. The same happens for every concept the model trains on. Everything from “cat” to more complex things like color relationships and reflections or lighting. Machines are not human, but they can learn despite that.

Eccitaze@yiffit.net · 10 months ago

It makes sense to judge how closely LLMs mimic human learning when people are using it as a defense to AI companies scraping copyrighted content, and making the claim that banning AI scraping is as nonsensical as banning human learning.

But when it’s pointed out that LLMs don’t learn very similarly to humans, and require scraping far more material than a human does, suddenly AIs shouldn’t be judged by human standards? I don’t know if it’s intentional on your part, but that’s a pretty classic example of a motte-and-bailey fallacy. You can’t have it both ways.

ParsnipWitch@feddit.de · edit-2 10 months ago

In general I agree with you, but AI doesn’t learn the concept of what a circle is. AI reproduces the most fitting representation of what we call a circle. But there is no understanding of the concept of a circle. This may sound nit picking, but I think it’s important to make the distinction.

That is why current models aren’t regarded as actual intelligence, although people already call them that…