Now that ChatGPT is openly available online, I've been checking in on the system's knowledge of virtual worlds -- and yep, it's still confused. Worse, it's actively garbling what I've written about virtual worlds. Ask it, "How does Wagner James Au define the Metaverse?", and you get something like:
[He] defines the Metaverse as a collective virtual shared space, created by the convergence of virtually enhanced physical reality and physically persistent virtual reality. This definition emphasizes the interconnectedness of virtual spaces and their integration with our physical world, suggesting a seamless blending of digital and physical experiences.
As the saying goes, this is not right -- it's not even wrong. I don't know how ChatGPT came up with this muddy extrusion, when I painstakingly researched my definition and put it on the web (not to mention in my book, and in several online articles about my book) starting years ago:
The Metaverse is a vast, immersive virtual world simultaneously accessible by millions of users through VR and other devices, highly customizable avatars and powerful experience creation tools. It is integrated with the real world economy and compatible with external technology.
A properly targeted Google search leads straight to that definition, but ChatGPT's authoritative, mansplain-y format is designed to project confidence in its answers. (Even the fine-print qualification, "ChatGPT can make mistakes", understates its rampant potential for off-base laziness.)
It's amusing to read AI evangelists assert that programs like ChatGPT will soon replace writers, when I mostly see ChatGPT creating more tedious work for writers -- making us spend extra time chasing down its errors and turning its mediocre, bland answers into something readable.
Longtime journalist/editor Mitch Wagner, who uses ChatGPT as a side tool for spellchecking and thesaurus lookups while writing his own articles, made some similar points recently:
Some ways I find ChatGPT and other generative AI useful today:
- Generating questions for interviews. ChatGPT is surprisingly great at that.
- Generating images.
- Occasionally writing draft introductions to articles, as well as conclusions, descriptions and summaries. I’ve always had trouble writing that kind of thing. I don’t use the version ChatGPT generates—I tear that up and write my own—but ChatGPT gets me started. I don’t do this often, but I’m grateful when I do.
- Casual low-stakes queries, when I remember to use ChatGPT for that. “What was the name of the movie that was set in a boarding house for actresses that starred Katharine Hepburn?” “Stage Door.” “Was Lucille Ball in that one too?” “Yes.” “Was that Katharine Hepburn’s first movie?” “No.” And ChatGPT provided some additional information. I probably could have gotten that information from Google, but ChatGPT was faster.
- I find otter.ai extremely useful for transcriptions, likewise Grammarly for proofreading. Do those applications use GenAI? I don’t know.
My big problem, and the reason I don’t use ChatGPT more, is that ChatGPT lies. Not only that, but it lies convincingly. A convincing liar is even worse than an obvious one. I don’t have much use for an information source I can’t trust. I don’t see an obvious way to solve this problem.
The trust piece is the worst part, especially when ChatGPT bullshits so confidently. When writers put their byline on a piece, we're effectively saying we stand behind every word. Any additions from ChatGPT force us to recheck every word of its answers -- even when those answers are about our own writing.
The loss of trust is central to what Daniel Dennett has said about how AIs can hurt civilisation. His argument goes further, but the destruction of trust is at its core: https://www.bbc.com/future/article/20240422-philosopher-daniel-dennett-artificial-intelligence-consciousness-counterfeit-people
Posted by: JT | Friday, April 26, 2024 at 06:14 AM
It never ceases to amaze me how some people persist with this bullshit, using an LLM as a substitute for Google or as a database of factual information -- especially when the LLM in question is an outdated model running in an old application that has neither web search capabilities nor RAG for grounding. By the way, Copilot Precise, as a search-engine assistant, found your definition and quoted it verbatim, even providing a link to the source.
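To make that concrete: grounding with RAG is conceptually just "search first, then answer from the search results". A toy sketch in Python -- search_web and ask_llm here are hypothetical stand-ins, not any vendor's real API:

```python
# Toy sketch of retrieval-augmented generation (RAG).
# search_web() and ask_llm() are hypothetical placeholders; a real app
# would wire in an actual search API and an actual model API.

def search_web(query: str) -> list[str]:
    """Pretend web search: returns text snippets with their source URLs."""
    raise NotImplementedError("wire in a real search API here")

def ask_llm(prompt: str) -> str:
    """Pretend LLM call: returns the model's completion for the prompt."""
    raise NotImplementedError("wire in a real model API here")

def grounded_answer(question: str) -> str:
    snippets = search_web(question)       # 1. retrieve current sources
    context = "\n\n".join(snippets[:5])   # 2. keep the top few snippets
    prompt = (
        "Answer using ONLY the sources below, and cite them. "
        "If the sources don't contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)  # 3. the model answers from retrieved text, not memory
```

That's why a grounded assistant can quote your definition verbatim with a link, while a bare model can only paraphrase from whatever stuck in its weights.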
Mitch Wagner, on the other hand, also provided a few examples of what (even) ChatGPT is actually useful for (and there are many more), and correctly said it's not reliable as an information source. That ANNs/LLMs "hallucinate" has been known forever. Besides, an LLM with a few billion parameters can't physically hold a 15-trillion-token dataset. It's incorrect to label hallucinations as lies, though: "they lie" implies an intentional act of deception, whereas LLMs fill in the gaps with "hallucinated" information. Do you lie when your brain reconstructs what your blind spot can't see, based on the average surroundings?
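Back-of-the-envelope, the size mismatch is obvious. Rough figures, assuming 16-bit weights and roughly 4 bytes of text per token (my assumptions, just for illustration):

```python
# Rough capacity comparison: model weights vs. training text.
# Assumptions for illustration: 8e9 parameters at 2 bytes each (fp16/bf16),
# 15e12 training tokens at ~4 bytes of text per token.

params = 8e9
bytes_per_param = 2                               # 16-bit weights
model_size = params * bytes_per_param             # ~16 GB of weights

tokens = 15e12
bytes_per_token = 4                               # very rough average for text
dataset_size = tokens * bytes_per_token           # ~60 TB of text

print(f"model:   {model_size / 1e9:.0f} GB")           # model:   16 GB
print(f"dataset: {dataset_size / 1e12:.0f} TB")        # dataset: 60 TB
print(f"ratio:   ~{dataset_size / model_size:,.0f}x")  # ratio:   ~3,750x
```

The weights are thousands of times smaller than the text they were trained on, so lossy recall of minor facts is baked in.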
I also find it interesting that Mitch Wagner said "ChatGPT was faster" than getting information from Google. Sometimes, when searching for technical or obscure information, you have to rummage through numerous search results and websites. GPT-4, which powers Copilot Precise, might already have a decent idea of what the solution is -- maybe an incomplete one, but it knows what to search for -- then it uses its search tool and finds it, or at least puts you on the right track. You can also ask for something more complex than trivial plain searches. Copilot and Perplexity have saved me quite a bit of time, multiple times.
You might be interested in an example of multi-search for Second Life: "What are the most trendy Second Life female mesh bodies as of 2024? Please list them, and for each of them search for their respective inworld store in secondlife.com Destinations." In a few seconds, Copilot Precise correctly listed Maitreya Lara and LaraX, eBody Reborn, and Meshbody Legacy, found their stores in Destinations, and linked them. You can also ask it to make a comparative table. It's not always perfect, but it's pretty handy.
I won't take ChatGPT, with its old GPT-3.5, as the state of the art, let alone use it as a benchmark to evaluate the current advancements in AI research. Today, even compact models like Phi-3 and Llama 3 8B, which can run locally on a laptop or a high-end smartphone, are nearly on par with GPT-3.5 in many tasks. Meanwhile, GPT-4 continues to be helpful for coding, while models like Gemini 1.5 Pro and Claude 3 Opus seem better than GPT-4 at generating (and writing) fiction ideas or brainstorming. Gemini has a huge context window and can write a critique of an entire novel. Llama 3 400B is about to be released, and it looks like OpenAI is releasing a new model this year.
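To give an idea of how little it takes to run one of these compact models locally, here's a sketch using the llama-cpp-python library; the GGUF file name is a placeholder for whichever quantized build you actually download:

```python
# Minimal local inference sketch with llama-cpp-python
# (pip install llama-cpp-python). The model path is a placeholder:
# download a quantized GGUF build of Phi-3-mini or Llama 3 8B first.
from llama_cpp import Llama

llm = Llama(model_path="./Phi-3-mini-4k-instruct-q4.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the plot of Stage Door in two sentences."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

No GPU cluster, no API key -- a quantized 4-8B model fits in a few GB of RAM, which is why these small models are a reasonable local baseline now.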
It keeps improving. I don't think these models will be on par with a senior software engineer or a professional writer or poet until we develop actual strong AI or AGI (or at least good introspection -- the labs are working on planning, at least), but conversely I won't say they're useless, or focus only on the negative.
Of course the GPT-3.5 powering ChatGPT hasn't been updated: OpenAI obviously wouldn't waste millions retraining that older model -- much less for minor facts that aren't repeated enough times in the training dataset, and that can't fit inside its limited number of parameters anyway -- let alone for the sake of the wrong use in the wrong application, when their newer models (or even GPT-3.5 itself) are already doing that job in better-suited applications such as Perplexity or Copilot.
Posted by: n | Friday, April 26, 2024 at 05:06 PM