Above: The most successful AI runs, showing character type (Wizard-Elf-Chaotic-Male, etc.), the level at which each died, and what killed it
Back in 2021, Meta (still Facebook at the time) sponsored an innovative challenge in AI programming: create a bot that could win NetHack on its own. If you know the classic roguelike RPG, first created in the 80s, you know how hard that is. I know it from personal experience: NetHack was one of the very first games I obsessed over as a kid, and the one that first inspired my interest in virtual worlds.
Though it was limited to ASCII graphics, the complexity and depth of the world made it seem real. Unlike, say, chess, where each individual move is limited to a few dozen options, the moves in NetHack seem unlimited. I wrote about that complexity in one of my very first articles:
With every object, tool, weapon and creature imbued with a wealth of attributes, every situation has endless potential. The cockatrice, for example, could turn you into stone, but that is only the beginning. If you kill one, then pick it up with gloves, you can wield its body like a flail, instantly turning monsters to stone when you bash them with it. (Usenet wags dubbed this maneuver “wielding the rubber chicken.”) If you have a Wand of Polymorph and also wear a Ring of Polymorph Control, you can actually turn yourself into a cockatrice, and explore the dungeon in that deadly form. You can even lay cockatrice eggs, too — usable as hand grenades of instant paralysis.
In NetHack, at any point, anything seems possible. Jean-Christophe Collet, a DevTeam member who discovered the game while working for a Parisian Unix company, says he was enthralled by “the sheer complexity of the situations you could get into, and the way that there was no ‘right way’ to get out of them.” Surrounded by Orcs, for example, you could incinerate most of them with your Wand of Lightning, but the blast would likely ricochet off the opposite wall and crisp you, too. You could wear your Ring of Conflict, which would magically compel the Orcs to start attacking each other instead — but then again, wearing it would probably also compel your pet Large Dog to attack you. You’d often get the eerie sense the game was anticipating you and all these uniquely intricate conundrums that no one could have possibly foreseen. Or could they?
AI developers at Oxford, NYU, and other top universities participated. So how did they do? One of the participants recently shared the academic summaries with me, but the non-academic answer would be:
Above: The Medusa Level, which proved extremely tough for AIs in the NetHack Challenge
Pretty good for an AI, but super bad compared to a human gamer.
You can see that in the report's table above, which shows the results of the best bots. At best, a bot reached only the 12th level of the dungeon (the game has roughly 50) and experience level 7 (the maximum is 30).
And one of the top bots died in one of the stupidest (but funniest) ways possible: kicking a wall. ("This is a costly action, to explore," the report notes drily, "as the agent loses health points and might even die when accidentally kicking against walls.")
One conclusion:
Notably, the game remains unsolved. The best agents’ median score is several orders of magnitude short of a typical (human) ‘ascension’. As argued in Kuttler et al. (2020), the Nethack Learning environment is at the frontier for RL research and it remains to be seen which methods will be able to scale to the point of reliably beating the game. [emph. mine]
It took me a while to find these results online, and I suspect Meta didn't do much to promote them after no AI in the challenge managed to steal the Amulet of Yendor and ascend into heaven with it (NetHack's ridiculously near-impossible win condition).
I'm hoping to speak with one of the project leaders soon, and I'd like to see Meta and other top companies revive these efforts. Forget about creating an AI that can pass the Turing Test. If you really want to impress people, create an AI that can win NetHack!
And maybe do so after wielding the rubber chicken.
Anyway, read the academic reports (both PDFs) here and also here.