Ted Chiang: “ChatGPT Is a Blurry JPEG of the Web”


Last Tuesday, in my essay at medium.com on “Artificial Intelligence, ChatGPT, and Transformational Change,” I was generally pessimistic about current AI’s social implications and generally optimistic about its technical implications.

However, yesterday’s article “ChatGPT Is a Blurry JPEG of the Web” by Ted Chiang in The New Yorker gave me second thoughts about the latter:

Think of ChatGPT as a blurry jpeg of all the text on the Web. It retains much of the information on the Web, in the same way that a jpeg retains much of the information of a higher-resolution image, but, if you’re looking for an exact sequence of bits, you won’t find it; all you will ever get is an approximation. But, because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it’s usually acceptable. You’re still looking at a blurry jpeg, but the blurriness occurs in a way that doesn’t make the picture as a whole look less sharp.

Also, Google’s Bard presentation yesterday (and some Bing shenanigans) gave me second thoughts from a different but related perspective. Not because things went a bit sideways there; rather, it occurred to me that chatbots obscure both where their sources come from and how those sources were selected far more than conventional search engines already do. In the long run, that might turn search engines into portal-style successors of the AOL-era internet. Sure, people can still use conventional search, but we all know how things are done at Google. If more and more people adopt chatbot search and conventional search begins to serve fewer and fewer ads, its resources might be cut, and it might even be dropped for good someday (see Google Reader, Feedburner, Wave, Inbox, Web & Realtime APIs, Site Search, Map Maker, Spaces, Picasa, Orkut, Google+, and so on).

But there’s more. Ted Chiang again:

Imagine what it would look like if ChatGPT were a lossless algorithm. If that were the case, it would always answer questions by providing a verbatim quote from a relevant Web page. We would probably regard the software as only a slight improvement over a conventional search engine, and be less impressed by it. The fact that ChatGPT rephrases material from the Web instead of quoting it word for word makes it seem like a student expressing ideas in her own words, rather than simply regurgitating what she’s read; it creates the illusion that ChatGPT understands the material. In human students, rote memorization isn’t an indicator of genuine learning, so ChatGPT’s inability to produce exact quotes from Web pages is precisely what makes us think that it has learned something. When we’re dealing with sequences of words, lossy compression looks smarter than lossless compression.

Go and read the whole thing. It’s chock-full of insights and interesting thoughts.


My Secret Level