
AI/LLM/GPT Roundup, June 28: A Whiff of Doom

While we’re still arguing whether the future looks bright or bleak with respect to AI/LLM/GPT, some parts of this future begin to look like a Boschian nightmare from hell. Let’s start with “search.” From Avram Piltch’s “Plagiarism Engine: Google’s Content-Swiping AI Could Break the Internet”:

Even worse, the answers in Google’s SGE boxes are frequently plagiarized, often word-for-word, from the related links. Depending on what you search for, you may find a paragraph taken from just one source or get a whole bunch of sentences and factoids from different articles mashed together into a plagiarism stew. […]

From a reader’s perspective, we’re left without any authority to take responsibility for the claims in the bot’s answer. Who, exactly, says that the Ryzen 7 7800X3D is faster and on whose authority is it recommended? I know, from tracing back the text, that Tom’s Hardware and Hardware Times stand behind this information, but because there’s no citation, the reader has no way of knowing. Google is, in effect, saying that its bot is the authority you should believe. […]

Though Google is telling the public that it wants to drive traffic to publishers, the SGE experience looks purpose-built to keep readers from leaving and going off to external sites, unless those external sites are ecomm vendors or advertisers. […] If Google were to roll its SGE experience out of beta and make it the default, it would be detonating a 50-megaton bomb on the free and open web.

As always, you’ll have to read the article in full for yourself. While I haven’t beta-tested Google’s SGE personally, I’ve explored generative AI enough to say that this doesn’t strike me as overly alarmist: if Google really takes SGE live as the default, search will be screwed. What are we (collectively) going to do about it? I have no answer to that.

Next, data in general, or “knowledge.” You probably remember the Time article, which I referenced several times in blog posts and in my essay on Medium, on how OpenAI used Kenyan workers earning less than $2 per hour to make ChatGPT less toxic. That practice, apparently, hasn’t merely continued; it has expanded to a much larger, even comprehensive scale. Here’s Josh Dzieza at The Verge with “AI Is a Lot of Work”:

A few months after graduating from college in Nairobi, a 30-year-old I’ll call Joe got a job as an annotator—the tedious work of processing the raw information used to train artificial intelligence. AI learns by finding patterns in enormous quantities of data, but first that data has to be sorted and tagged by people, a vast workforce mostly hidden behind the machines. In Joe’s case, he was labeling footage for self-driving cars—identifying every vehicle, pedestrian, cyclist, anything a driver needs to be aware of—frame by frame and from every possible camera angle. It’s difficult and repetitive work. A several-second blip of footage took eight hours to annotate, for which Joe was paid about $10. […]

Much of the public response to language models like OpenAI’s ChatGPT has focused on all the jobs they appear poised to automate. But behind even the most impressive AI system are people—huge numbers of people labeling data to train it and clarifying data when it gets confused. Only the companies that can afford to buy this data can compete, and those that get it are highly motivated to keep it secret. The result is that, with few exceptions, little is known about the information shaping these systems’ behavior, and even less is known about the people doing the shaping. […]

The data vendors behind familiar names like OpenAI, Google, and Microsoft come in different forms. There are private outsourcing companies with call-center-like offices, such as the Kenya- and Nepal-based CloudFactory, where Joe annotated for $1.20 an hour before switching to Remotasks. There are also “crowdworking” sites like Mechanical Turk and Clickworker where anyone can sign up to perform tasks. In the middle are services like Scale AI. Anyone can sign up, but everyone has to pass qualification exams and training courses and undergo performance monitoring. Annotation is big business.

It’s a veritable hellscape into which OpenAI, Google, and Microsoft pitch workers and information alike; you really have to read it for yourself to get the full picture. And it gets worse: evidence has surfaced that these annotators have begun using AI to do their jobs, with all the “data poisoning” implications you can think of. Boschian, again.

Finally, “social risks.” You might remember how OpenAI’s Sam Altman lobbied for a new “oversight agency” which, as someone quipped, amounts to calling for environmental regulations that prevent the creation of Godzilla. And, who would have thunk, Altman is of course lobbying the EU to water down AI regulations that would address the actual risks of his company’s products: misinformation, labor impact, safety, and the like. On that, here’s another Time exclusive, “OpenAI Lobbied the E.U. to Water Down AI Regulation”:

[B]ehind the scenes, OpenAI has lobbied for significant elements of the most comprehensive AI legislation in the world—the E.U.’s AI Act—to be watered down in ways that would reduce the regulatory burden on the company, according to documents about OpenAI’s engagement with E.U. officials obtained by TIME from the European Commission via freedom of information requests. In several cases, OpenAI proposed amendments that were later made to the final text of the E.U. law—which was approved by the European Parliament on June 14, and will now proceed to a final round of negotiations before being finalized as soon as January. […]

One expert who reviewed the OpenAI White Paper at TIME’s request was unimpressed. “What they’re saying is basically: trust us to self-regulate,” says Daniel Leufer, a senior policy analyst focused on AI at Access Now’s Brussels office. “It’s very confusing because they’re talking to politicians saying, ‘Please regulate us,’ they’re boasting about all the [safety] stuff that they do, but as soon as you say, ‘Well, let’s take you at your word and set that as a regulatory floor,’ they say no.”

And, as noted in the article, Google and Microsoft have lobbied the EU in similar ways, which shouldn’t come as a surprise.

So, take a deep breath and enjoy a whiff of actual doom, wafting down from Google’s SGE to exploitation and data poisoning to AI regulations fashioned by the worst possible actors.
