Hilary Mason, formerly Chief Scientist for Bitly and GM of Machine Learning at Cloudera, recently wrote this fun and incredibly effective explanation of ChatGPT that I highly recommend sharing with, well, everyone:
How does ChatGPT work, really?
It can feel like a wizard behind a curtain, or like a drunk friend, but ChatGPT is just… math.
Let’s talk about how to think about Large Language Models (LLMs) like ChatGPT so you can understand what to expect from them and better imagine what you might actually use them for.
A language model is a compressed representation of the patterns in language, trained on all of the language on the internet, and some books.
That’s it.
The model is trained by taking a very large data set — in this case, text from sites like Reddit (including any problematic content🤮) and Wikipedia, and books from Project Gutenberg — extracting tokens, which are essentially the words in the sentences, and then computing the relationships in the patterns of how those words are used. That representation can then be used to generate patterns that look like the patterns it's seen before.
If I sing the words 🎶Haaappy biiiirthdayyy to ____🎶, most English speakers know that the next word is likely to be “you.” It might be other words like “me,” or “them,” but it’s very unlikely to be words like “allocative” or “pudding.” (No shade to pudding. I wish pudding the happiest of birthdays.)
You weren’t born knowing the birthday song; you know the next likely words in the song because you’ve heard it sung lots of times. That’s basically what’s happening under the hood with a language model.
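To make that concrete, here is a minimal sketch in Python of the same idea, using a made-up four-line "corpus" (the data and function names here are purely illustrative, not how OpenAI actually builds ChatGPT): split text into tokens, count which word tends to follow which, and turn those counts into next-word probabilities. A real LLM does this with a neural network over billions of documents and much longer contexts, but the underlying principle is the same.

```python
from collections import Counter, defaultdict

# A toy "training set": a few lines of text standing in for the internet.
corpus = [
    "happy birthday to you",
    "happy birthday to you",
    "happy birthday dear friend",
    "happy birthday to me",
]

# "Tokenize" each line (here, just splitting on spaces) and count
# how often each word follows each other word.
follows = defaultdict(Counter)
for line in corpus:
    tokens = line.split()
    for current_word, next_word in zip(tokens, tokens[1:]):
        follows[current_word][next_word] += 1

def next_word_probabilities(word):
    """Turn raw counts into a probability distribution over likely next words."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probabilities("to"))
# Roughly {'you': 0.67, 'me': 0.33} -- "you" is the most likely word after "to"
```

The toy model never stores the fact that birthdays exist; it only stores that "you" usually comes after "to" in this kind of sentence.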
This is why a language model can generate coherent sentences without understanding the things it writes about. It's why it hallucinates (another technical term!) plausible-sounding information but doesn't have any idea what's factual. Inevitably, “it’s gonna lie.” The model understands syntax, not semantics.
When you ask ChatGPT a question, you get a response that fits the patterns of the probability distribution of language that the model has seen before. It does not reflect knowledge, facts, or insights.
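And generating a whole reply is just repeating that step: sample a plausible next word, append it, sample again. Here is a deliberately silly, self-contained sketch (the probability table is invented for illustration) showing how repeated sampling produces fluent, pattern-shaped text with no fact-checking anywhere in the loop.

```python
import random

# Hand-made next-word probabilities for a toy "model" (purely illustrative).
# A real LLM learns a far richer version of this from the web, conditioned
# on much more context than just the previous word.
next_word = {
    "the":  {"cat": 0.5, "moon": 0.5},
    "cat":  {"sat": 0.7, "is": 0.3},
    "moon": {"is": 1.0},
    "sat":  {"on": 1.0},
    "on":   {"the": 1.0},
    "is":   {"made": 0.4, "bright": 0.6},
    "made": {"of": 1.0},
    "of":   {"cheese": 1.0},
}

def generate(start, max_tokens=8):
    """Repeatedly sample a likely next word; stop when there is no continuation."""
    words = [start]
    while len(words) < max_tokens:
        options = next_word.get(words[-1], {})
        if not options:
            break
        choices, weights = zip(*options.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))
# One possible output: "the moon is made of cheese" -- fluent, and never fact-checked.
```

Nothing in that loop ever asks whether the sentence is true; it only asks whether it looks like sentences seen before.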
And to make this even more fun, in that compression of language patterns, we also magnify the bias in the underlying language. This is how we end up with models that are even more problematic than what you find on the web. ChatGPT specifically does use human-corrected judgements to reduce the worst of this behavior, but those efforts are far from foolproof, and the model still reflects the biases of the humans doing the correcting.
Finally, because of the way these models are designed, they are, at best, a representation of the average language used on the internet. By design, ChatGPT aspires to be the most mediocre web content you can imagine.
With all of that said, these language models are tremendously useful. They can take minimal inputs and make coherent text for us! They can help us draft, or translate, or change our writing style, or trigger new ideas. And they’ll be completely ✨transformative✨.
Emphasis mine, because good god, it needs emphasizing now. I keep reading stories about ChatGPT passing a medical license exam, or being a person's therapist, or literally terrifying a New York Times reporter, when there's really nothing all that dramatic actually going on behind the scenes.
The problem, as I put it to Hilary, seems to be one of perception. OpenAI the organization is not really explaining what ChatGPT does, so when it generates impressive answers, people really believe it's super smart and even sentient.
Why ChatGPT Goes "Wrong" & People Misinterpret It (Comment of the Week)
Most of the key ML/AI papers come from Google Brain and DeepMind. The 2017 paper on transformers, "Attention is all you need," came from them too. OpenAI based their GPT models on that (GPT = Generative Pre-trained Transformer).
Since OpenAI couldn't have an edge on research, what could they do to be competitive? Release earlier.
That's pretty easy [to do] as Google and DeepMind are super cautious. Microsoft then joined and invested in OpenAI; of course they want to compete against Google, and now you have Bing powered by a language model.
I've had early access to it for a while, and it looks more capable than ChatGPT. However, as you could see, there were issues immediately, mostly because Reddit users trolled the model, and then journalists went there and took the sensationalist route. Then Microsoft had to take action by regulating and restricting usage of the model, and now it's harder to investigate its capabilities.
This helps explain how an NYT journalist had a "scary encounter" with ChatGPT; Nade argues he made the same mistake as Blake Lemoine, the Google engineer who believed that LaMDA, the company's experimental chatbot, had achieved sentience.
Posted on Monday, March 13, 2023 at 04:25 PM in AI, Comment of the Week