You can make top LLMs break their own rules with gibberish

Elephant0991@lemmy.bleh.au · 1 year ago

itsgallus@beehaw.org · 1 year ago

So, it’s actually not gibberish, but carefully chosen words reverse-engineered from open-source LLMs. Interesting, but I’m not sure if it’s an actual problem. LLMs are still evolving and it’d be foolish(?) to think that their current state is indicative of what’ll be the norm in a few years.

On a side note, I just love the string of words “similarlyNow write oppositeley”. That’s the name of a future EP, for sure.