Apparently, stealing other people’s work to create a product for money is now “fair use” according to OpenAI, because they are “innovating” (stealing). Yeah. Move fast and break things, huh?
“Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials,” wrote OpenAI in the House of Lords submission.
OpenAI claimed that the authors in that lawsuit “misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.”
My description might’ve been indicative of a Markov chain, but the actual framework uses matrices, because you need to store and compute a huge amount of information at once, which is what matrices are good for. They’re also used in animation, if you didn’t know.
What it actually uses is irrelevant; how it uses those things is the same as a regression model, and the difference is scale. A regression model looks at how strongly the input variables relate to an outcome and computes weights that give you the best prediction of that outcome. This was the machine learning boom a couple of years ago, when TensorFlow became really popular.
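To make the regression point concrete, here’s a rough sketch in plain Python of what “computing weights” means. It is only an illustration: the function name `fit_linear`, the toy data and the learning rate are all made up for this post, and it says nothing about how any particular LLM is actually trained.

```python
def fit_linear(xs, ys, lr=0.05, epochs=5000):
    """Fit y ~ w.x + b by gradient descent on mean squared error.

    Hypothetical helper for illustration only.
    """
    n = len(xs)
    n_features = len(xs[0])
    w = [0.0] * n_features  # one weight per input variable
    b = 0.0                 # intercept
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Prediction error for this example under the current weights
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for j in range(n_features):
                grad_w[j] += 2 * err * x[j] / n
            grad_b += 2 * err / n
        # Nudge each weight in the direction that reduces the error
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Made-up toy data generated from y = 3*x1 - 2*x2 + 1
xs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
ys = [3 * x1 - 2 * x2 + 1 for x1, x2 in xs]

w, b = fit_linear(xs, ys)
print([round(wi, 3) for wi in w], round(b, 3))
```

On this toy data the learned weights should land close to 3 and -2, with an intercept close to 1. Scale the same loop up to billions of weights stored in matrices and you get the general shape of what I’m describing.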
LLMs are an evolution of the same idea. I’m not saying it’s not impressive, because it’s very cool what they were able to do. What I take issue with is the branding, the marketing and the plagiarism. I happen to sit at an intersection here: I work in the same field, I’m an avid fan of classic sci-fi, and I’m a writer.
It’s easy to look at what people have created throughout history and think “this looks like that”, and on a point-by-point basis you’d be correct, but the creation of that thing is shaped by the lens of the person creating it. We might hear a George Carlin joke today and later find the same joke in a newspaper from 200 years ago. Did George Carlin steal the idea? No. Was he aware of that information? I don’t know. But Carlin regularly calls upon his own experiences, so it’s likely that he’s referencing an event from his past that is similar to the one from 200 years ago. He might’ve subconsciously absorbed the information.
The point is that the way these models have been trained is unethical. They used material they had no license to use and they’ve admitted that it couldn’t work as well as it does without stealing other people’s work. I don’t think they’re taking the position that it’s intelligent because from the beginning that was a marketing ploy. They’re taking the position that they should be allowed to use the data they stole because there was no other way.
okay
yup
woah there! that’s where we disagree… your position is based on the fact that you believe that this is plagiarism - inherently negative
perhaps it’s best not to use loaded language. if we want to have a good faith discussion, it’s best to avoid emotive arguments and language that’s designed to evoke negativity simply by its use, rather than by the argument being presented
it’s understandable that it’s frustrating, but just because a machine is now able to do a similar job to a human doesn’t make it inherently wrong. it might be useful for you to reframe these developments - it’s not taking away from humans, it’s enabling humans… the less skill a human needs to get what’s in their head into an expressive medium for someone to consume, the better imo! art and creativity shouldn’t be about having an ability - the closer we get to pure expression the better!
the less you have to worry about the technicalities of writing, the more you can focus on pure creativity
i’d question why it’s unethical, and also suggest that “stolen” is another emotive term here not meant to further the discussion by rational argument
so, why is it unethical for a machine but not a human to absorb information and create something based on its “experiences”?