OpenAI says it’s “impossible” to create useful AI models without copyrighted material

sculd@beehaw.org · 10 months ago

OpenAI says it’s “impossible” to create useful AI models without copyrighted material

Phanatik@kbin.social · 10 months ago

Neither is an LLM. What you’re describing is a primitive Markov chain.

My description might’ve been indicative of a Markov chain but the actual framework uses matrices because you need to be able to store and compute a huge amount of information at once which is what matrices are good for. Used in animation if you didn’t know.

What it actually uses is irrelevant, how it uses those things is the same as a regression model, the difference is scale. A regression model looks at how related variables are in giving an outcome and computing weights to give you the best outcome. This was the machine learning boom a couple of years ago and TensorFlow became really popular.

LLMs are an evolution of the same idea. I’m not saying it’s not impressive because it’s very cool what they were able to do. What I take issue with is the branding, the marketing and the plagiarism. I happen to be in the intersection of working in the same field, an avid fan of classic Sci-Fi and a writer.

It’s easy to look at what people have created throughout history and think “this looks like that” and on a point by point basis you’d be correct but the creation of that thing is shaped by the lens of the person creating it. Someone might make a George Carlin joke that we’ve heard recently but we’ll read about it in newspapers from 200 years ago. Did George Carlin steal the idea? No. Was he aware of that information? I don’t know. But Carlin regularly calls upon his own experiences so it’s likely that he’s referencing a event from his past that is similar to that of 200 years ago. He might’ve subconsciously absorbed the information.

The point is that the way these models have been trained is unethical. They used material they had no license to use and they’ve admitted that it couldn’t work as well as it does without stealing other people’s work. I don’t think they’re taking the position that it’s intelligent because from the beginning that was a marketing ploy. They’re taking the position that they should be allowed to use the data they stole because there was no other way.

Pup Biru@aussie.zone · 10 months ago

branding

okay

the marketing

yup

the plagiarism

woah there! that’s where we disagree… your position is based on the fact that you believe that this is plagiarism - inherently negative

perhaps its best not use loaded language. if we want to have a good faith discussion, it’s best to avoid emotive arguments and language that’s designed to evoke negativity simply by their use, rather than the argument being presented

I happen to be in the intersection of working in the same field, an avid fan of classic Sci-Fi and a writer

its understandable that it’s frustrating, but just because a machine is now able to do a similar job to a human doesn’t make it inherently wrong. it might be useful for you to reframe these developments - it’s not taking away from humans, it’s enabling humans… the less a human has to have skill to get what’s in their head into an expressive medium for someone to consume the better imo! art and creativity shouldn’t be about having an ability - the closer we get to pure expression the better imo!

the less you have to worry about the technicalities of writing, the more you can focus on pure creativity

The point is that the way these models have been trained is unethical. They used material they had no license to use and they’ve admitted that it couldn’t work as well as it does without stealing other people’s work

i’d question why it’s unethical, and also suggest that “stolen” is another emotive term here not meant to further the discussion by rational argument

so, why is it unethical for a machine but not a human to absorb information and create something based on its “experiences”?