With the recent announcement that Google is allegedly paying Reddit $60 million per year for access to user-created content to train its AI, what is stopping companies from using the freely available content on the lemmyverse to do the same for free?

How likely does everyone think it is that this is already happening, and should something be done about it?

  • Ziggurat@sh.itjust.works
    3 months ago

    Technically, copyright stops them. I know the whole copyright debate around AI training hasn’t been settled, but when you sign a contract with Reddit or Dropbox, I assume it includes a licence to use the content to train AI.

    Here on Lemmy, I never gave my instance a licence to reuse my content, and I keep full copyright on it.

    Well, I know nobody cares about copyright, but there is a difference between OP downloading a torrent of My Little Pony and a company making tons of money out of it. Remember that the Pirate Bay founder got jail time.

  • WatDabney@sopuli.xyz
    3 months ago

    Nothing is stopping them from doing it, and the only reason it might not be already happening is the possibility that nobody has cared enough to bother yet.

    And by design nothing much can be done about it. That’s the nature of a decentralized platform - it’s explicitly set up to share content, and that’s what it does, by default. And there is no central authority that can control access to the fediverse as a whole. And that’s pretty much that.

    And personally, I don’t care. I’ve never bought into the nonsensical idea that the stuff I post in a public forum is in any meaningful sense my property after I post it.

    • Platypus@sh.itjust.works
      3 months ago

      Agreed–if I put up a poster on a billboard, I’m not really in a position to complain if someone takes a picture of it.

  • Oisteink@feddit.nl
    3 months ago

    They have to pay Reddit now that the API is gone. I’m quite certain that at least one of the companies scraping the web to train their LLMs has been using it.

    And I’m quite certain that this happens to the fediverse as well. You don’t even need an API: just set up your own instance, make a few thousand accounts, and subscribe all over using them. Then you’ve got all the data in a nice DB.

    • Draconic NEO@sh.itjust.works
      3 months ago

      Realistically, with how the Fediverse works, they could just ban his actor from their collection node, and it’ll ignore all requests made by him or replies to him, as if they never even happened.

    • Kayn@dormi.zone
      3 months ago

      Ironically, their comments are more accessible than anyone else’s here.