
Wednesday, August 02, 2023



No theory of mind here - it has just had puzzles of this type in its training data. One-shot problem solving simply emerges from large data sets. It's just auto-complete.

Martin K.

> One way of testing this is to ask ChatGPT to solve a logic puzzle that doesn’t already exist on the Internet

Or you just let ChatGPT play chess and realise that it doesn't know which moves are legal - let alone which moves are good.


As for your questions:

"Personally I suspect ChatGPT was just pulling the right answer from its trained database."

If this were the case, the model would always or almost always answer the riddle correctly. That is not what is happening here, though. August, whom you quoted, said as much as well: "[...] which would imply that the answer for this problem was not 'baked in'".
To be more sure, I tested the riddle with GPT-4 myself several times: without that prompt, it fails systematically. The reasoning prompt improves the situation, but GPT-4 still didn't generate the right answer every time.
Therefore this does not seem to be the case here.
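The repeated testing described above can be sketched as a small accuracy-estimation loop. Here `ask_model` is a hypothetical stand-in for whatever model API you query, stubbed with a random simulator whose made-up success rates are for illustration only, not measured figures:

```python
import random

# Hypothetical stand-in for a real model call (e.g. an API request that
# returns True if the riddle was answered correctly).
# The per-prompt success rates below are invented for illustration.
def ask_model(riddle: str, reasoning_prompt: bool) -> bool:
    success_rate = 0.6 if reasoning_prompt else 0.05
    return random.random() < success_rate

def estimate_accuracy(riddle: str, reasoning_prompt: bool,
                      trials: int = 100) -> float:
    """Fraction of trials in which the model answered correctly."""
    correct = sum(ask_model(riddle, reasoning_prompt) for _ in range(trials))
    return correct / trials

random.seed(0)  # make the sketch reproducible
plain = estimate_accuracy("the riddle", reasoning_prompt=False)
with_reasoning = estimate_accuracy("the riddle", reasoning_prompt=True)
print(f"without reasoning prompt: {plain:.0%}")
print(f"with reasoning prompt:    {with_reasoning:.0%}")
```

The point of measuring a rate rather than asking once: pure retrieval of a "baked in" answer would show near-100% accuracy regardless of prompt, which is not the pattern observed.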

Some notes to give you a better understanding:
- The neural network learned patterns and relationships among words. It has "weight parameters", not a database from which it simply retrieves information. It doesn't work that way.
- The training data is far larger than the model itself, and it isn't simply hammered into it. For instance, Llama 2 models were trained on a data set of 2 trillion tokens, but these models only have 7, 13, and 70 billion parameters, and these parameters aren't raw stored data (or the model would just do nothing).
- Even if it can't "remember" the correct answer from training, it's possible that, when prompted for reasoning, it still picks up the pattern of reasoning used to solve this riddle, if the solution with its reasoning was in the training data.
- If you keep the training and pattern-matching in mind, it can also fail in the opposite way: early GPT-3 would fail when asked "how many eyes does a foot have?" (it typically answered either 2, the usual answer for eyes, or 5, the usual answer for "how many toes does a foot have?"). Similarly, given the riddle "which weighs more, 3 kg of steel or 2 kg of feathers?", most current models say "they weigh the same", because that's the typical correct answer to the classic trick riddle this one closely resembles, even though here it is actually different.
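The scale mismatch in the second note above can be checked with quick arithmetic, using the figures as cited (roughly 2 trillion training tokens versus 7B/13B/70B parameters):

```python
# Figures from the comment above: Llama 2 was trained on ~2 trillion tokens,
# while the released models have 7, 13, and 70 billion parameters.
training_tokens = 2_000_000_000_000

for name, params in {"7B": 7_000_000_000,
                     "13B": 13_000_000_000,
                     "70B": 70_000_000_000}.items():
    ratio = training_tokens / params
    print(f"Llama-2-{name}: ~{ratio:.0f} training tokens per parameter")
```

Even the largest model sees roughly 29 training tokens per parameter, so the weights cannot be a verbatim copy of the training text; they are a compressed statistical summary of it.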

If you want a better idea of how these models work (and of what a complex task it is to study even just why the older and simpler GPT-2 does what it does), here is an article that explains it in simple terms:

If you want to check the models for possible reasoning and cognitive capabilities rather than for knowledge:
- If the answer is in the training data, the test is flawed (i.e. it measures knowledge/experience instead of reasoning). Clearly, tests should be done with original questions, so your request for original puzzles is a good idea.
- If you want to check for reasoning, or for *possible* *early* signs of theory of mind, you need to start with simple tests first and then increase the difficulty.

"how impressive is its performance here really, when I can already Google up the right answer in 5 seconds?"
How impressive is your IQ 160 test result, when you can just find the solutions at the end of the book?
Problem solving abilities and finding information are two different things.
If a model designed and trained just to predict the next word (or, more precisely, the next token) starts to show signs of reasoning in order to predict better, by understanding the context and so on, well, to me that's quite impressive.
However, if you just want to do simple searches, then yes, you can simply use Google.
Otherwise you can use a language model that assists you with search, using natural language queries instead of special operators. E.g. "What was the AI news in the first half of March 2023? Exclude results from xyz.com", or complex queries that would require multiple searches, e.g. "What are the heights of Mt. Fuji, Etna and Mauna Loa, and which one of them erupted last?"
https://i.ibb.co/0ZX6nV5/bing-precise-volc.jpg (this is with Bing Chat in Precise mode, which hallucinates less).

