Hilary Mason, formerly Chief Scientist for Bitly and GM of Machine Learning at Cloudera, recently wrote this fun and incredibly effective explanation for ChatGPT that I highly recommend sharing with, well, everyone:
How does ChatGPT work, really?
It can feel like a wizard behind a curtain, or like a drunk friend, but ChatGPT is just… math.Let’s talk about how to think about Large Language Models (LLMs) like ChatGPT so you can understand what to expect from them and better imagine what you might actually use them for.
A language model is a compressed representation of the patterns in language, trained on all of the language on the internet, and some books.That’s it.
The model is trained by taking a very large data set — in this case, text from sites like Reddit (including any problematic content🤮) and Wikipedia, and books from Project Gutenberg — extracting tokens, which are essentially the words in the sentences, and then computing the relationships in the patterns of how those words are used. That representation can then be used to generate patterns that look like the patterns it's seen before.
If I sing the words 🎶Haaappy biiiirthdayyy to ____🎶, most English speakers know that the next word is likely to be “you.” It might be other words like “me,” or “them,” but it’s very unlikely to be words like “allocative” or “pudding.” (No shade to pudding. I wish pudding the happiest of birthdays.)You weren’t born knowing the birthday song; you know the next likely words in the song because you’ve heard it sung lots of times. That’s basically what’s happening under the hood with a language model.
This is why a language model can generate coherent sentences without understanding the things it writes about. It's why it hallucinates (another technical term!) plausible-sounding information but doesn't have any idea what's factual. Inevitably, “it’s gonna lie.” The model understands syntax, not semantics.
When you ask ChatGPT a question, you get a response that fits the patterns of the probability distribution of language that the model has seen before. It does not reflect knowledge, facts, or insights.
And to make this even more fun, in that compression of language patterns, we also magnify the bias in the underlying language. This is how we end up with models that are even more problematic than what you find on the web. ChatGPT specifically does use human-corrected judgements to reduce the worst of this behavior, but those efforts are far from foolproof, and the model still reflects the biases of the humans doing the correcting.
Finally, because of the way these models are designed, they are at best, a representation of the average language used on the internet. By design, ChatGPT aspires to be the most mediocre web content you can imagine.
With all of that said, these language models are tremendously useful. They can take minimal inputs and make coherent text for us! They can help us draft, or translate, or change our writing style, or trigger new ideas. And they’ll be completely ✨transformative✨.
Emphasis mine, because good god, it needs emphasizing now. I keep reading stories about ChatGPT passing a medical license exam, or being a person's therapist, or literally terrifying a New York Times reporter, when there's really nothing all that dramatic actually going on behind the scenes.
The problem, as I put it to Hilary, seems to be one of perception. OpenAI the organization is not really explaining what ChatGPT does, so when it generates impressive answers, people really believe it's super smart and even sentient.
"ChatGPT is fundamentally a UX update to GPT3, allowing iterative querying of the model, creating a much better experience," Hilary observes. "But it's not a fundamental change in tech!"
"As to why OpenAI hasn't tried to damp down the hype, I think ChatGPT has served as a tremendous marketing tool for them. Having it free (when the GPUs are definitely not free to operate), and with such an easy-to-use UX, has unleashed a tremendous amount of energy. Why would they do anything to lessen that?
"OpenAI is a startup, too. And ChatGPT isn't a product. My intuition is that they would tolerate some experimental behavior in order to discover what people will ultimately be willing to pay for?"
So as I understand it, the reason people put so many inflated expectations around ChatGPT is very much due to marketing and presentation -- not because of the technology itself. And you know what happens on the Gartner Hype Cycle after inflated expectations get, well, deflated. At some point soon, people will begin to realize it's basically a better version of Google Search (in certain contexts, at least), and recalibrate.
As for Hilary, her latest startup also uses AI, but in a very different way. And that's a story for another post.
Excerpt reprinted with permission from Hilary Mason's LinkedIn.
Most of the key ML/AI papers come from Google Brain and DeepMind. The 2017 paper on transformers "Attention is all you need" came from them too. OpenAI based their GPT models on that (GPT = Generative Pre-trained Transformer).
Since OpenAI couldn't have an edge on research, what could they do to be competitive? Release earlier. That's pretty easy as Google and DeepMind are super cautious.
Microsoft then joined and invested on OpenAI, of course they want to compete against Google and now you have Bing powered by a language model. I've got the early access to it since a while and it looks more capable than ChatGPT.
However, as you could see, there have been issues immediately. Mostly because of Reddit users trolling the model, then journalists went there and took the sensationalist route. Then Microsoft had to take action by regulating and restricting the usage of the model and now it's harder to investigate its capabilities.
As for the NY journalist, he did the same move as Lemonine: prompt leading. The model then gives you what you request (or what you hint, even subconsciously: someone isn't aware of doing that) and the model follows your prompt.
ChatGPT, by default, essentially roleplays as a robotic assistant, so people use it as a sort of oracle or they ask calculations to... a language model, apt to, guess what, language tasks. You can ask it to summarize, to generate text of any kind in any style, etc.
Character.ai, developed by previous developers of Google's LaMDA, can pretend to be any character, from game characters to famous people, imitating their speech and personality.
And you can trick these models by making them generate anything else.
However, prompt engineering could also be malicious. OpenAI (and Microsoft with Bing Chat) tries to prevent that with some filter and pre-prompting (there is a hidden prompt before your input), but that works only so much.
At any rate, it's true that these models work with patterns. Neural networks (your brain too) are pattern recognition systems. This is also how you try to predict things, from the next word to finding patterns in stock market graphs. Often wrongly and that's also how stereotypes happen, but pattern recognition has its advantages too.
It's also how you learn, by training, repetition, until you "get it", as the mentioned song and the sentences you heard so many times; sadly propaganda, especially on social media, does the same by repeating misinformation, aided by bots, until people begin to parrot it; so, ironically, they are trained by propaganda bots.
The human brain not only predicts the next word and the next sentence, though, but also tries to predict what the interlocutor may reply. Having empathy and a developed theory of mind helps.
Since that also helps in predicting the next word (or better, the next token), there are papers investigating possible emergent abilities in language models:
https://arxiv.org/abs/2206.07682 "Emergent Abilities of Large Language Models"
https://arxiv.org/abs/2302.02083 "Theory of Mind May Have Spontaneously Emerged in Large Language Models"
(there are many more)
Hallucination is also another feature of neural networks, that fills the missing spots, beginning with the blind spot in the retina up to your dreams. So it's not something we are going to get rid. Instead, without hallucinations you couldn't have Stable Diffusion, nor in-painting techniques. Someone even thinks the human brain lives in a controlled hallucination and even the sense of self and consciousness may be illusory.
Anyway, that's true also for your "faulty", lossy fuzzy memories (that however can archive a huge amount of info), so that you have to check your notes and photos, because that red tulip you remember maybe it's your brain making it up for some missing information and it was a pink petunia instead.
Since these LLMs are good at making stuff up and they are language models, one of the best use cases is indeed chat-based roleplay gaming. We had some examples of that several years ago already with e.g. AI Dungeon. GPT-2 based AI Dungeon was hilarious with all the nonsense generated and it was quickly incoherent. GPT-3 based AI Dungeon was somewhat better, but still derailing, roleplaying for you, etc. But have you tried paragraph-roleplaying with ChatGPT? It's way more coherent. Essentially it's still GPT-3, but much improved (it's GPT-3.5) and it takes advantage of a larger context window, InstructGPT and several other things.
Posted by: Nade | Thursday, March 09, 2023 at 12:16 AM
If you want to try out chatGTP in Second Lifet go to BOO CLUB and chat withe "Betty Bot" Blais and I made a chatbot for you.
Taxi: https://maps.secondlife.com/secondlife/Danish%20Visions/188/233/250
Posted by: Betty Tureaud | Tuesday, March 14, 2023 at 01:45 AM
Not only is content 'mediocre', but it gets *more* mediocre and fuzzy over time: https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web
Posted by: John | Friday, August 18, 2023 at 04:35 AM