Stubsack: weekly thread for sneers not worth an entire post, week ending Sunday 14 July 2024

David Gerard@awful.systems · 11 months ago

BigMuffin69@awful.systems · 11 months ago

https://www.nature.com/articles/d41586-024-02218-7

Might be slightly off topic, but interesting result using adversarial strategies against RL trained Go machines.

Quote: With humans able to use the adversarial bots’ tactics to beat expert Go AI systems, does it still make sense to call those systems superhuman? “It’s a great question I definitely wrestled with,” Gleave says. “We’ve started saying ‘typically superhuman’.” David Wu, a computer scientist in New York City who first developed KataGo, says strong Go AIs are “superhuman on average” but not “superhuman in the worst cases”.

Methinks the AI bros jumped the gun a little too early declaring victory on this one.

YourNetworkIsHaunted@awful.systems · 11 months ago

See, in StarCraft we would just say that the meta is evolving in order to accommodate this new strategy. Maybe Go needs to take a page from newer games in how these things are discussed.

sc_griffith@awful.systems · 11 months ago

this is simple. we just need to train a new model for every move. that way the adversarial bot won’t know what weaknesses to exploit

BigMuffin69@awful.systems · 11 months ago

In chess the table base for optimal moves with only 7 pieces takes like ~20 terabytes to store. And in that DB there are bizarre checkmates that take 100+ moves even with perfect precision, ignoring the 50 move rule. I wonder if the reason these adversarial strats exist is because whatever the policy network/value network learns is way, way smaller than the minimum size of the “true” position eval function for Go. Thus you’ll just invariably get these counter play attacks as compression artifacts.

Sources cited: my ass cheeks

sc_griffith@awful.systems · 11 months ago

i don’t think that can be quite right, as illustrated by an extreme example: consider a game where the first move has player 1 choose “win” or “hypergo.” if player 1 chooses win, they win. if player 1 chooses hypergo, begin a game of Go on a 1,000,000,000 x 1,000,000,000 board, and whoever wins that subgame wins. for player 1, the ‘true’ position eval function must be in some sense incredibly complicated, because it includes hypergo nonsense. but player 1 strategy can be compressed to “choose win” without opening up any counterattacks
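a toy sketch of the example above, in Python. every name here is made up for illustration, and the hypergo subgame is stubbed out as a callback rather than actually played, so this only shows the shape of the argument: the one-constant strategy wins no matter what the subgame would have done.

```python
from typing import Callable

# Hypothetical labels for player 1's two opening choices.
WIN, HYPERGO = "win", "hypergo"

def play(first_move: str, hypergo_result: Callable[[], int]) -> int:
    """Return the winner (1 or 2) of the toy game.

    hypergo_result stands in for the entire hypergo subgame; it is only
    consulted if player 1 actually chooses hypergo.
    """
    if first_move == WIN:
        return 1
    # Otherwise the whole game is decided by the (stubbed) subgame.
    return hypergo_result()

def compressed_policy() -> str:
    """Player 1's entire strategy: a single constant, no hypergo eval at all."""
    return WIN

def adversary_wins_hypergo() -> int:
    """Worst case for player 1: the opponent always wins the subgame."""
    return 2

# The compressed policy never enters the subgame, so the adversary's
# strength there is irrelevant.
outcome = play(compressed_policy(), adversary_wins_hypergo)
```

the adversary only gets to matter if player 1 picks hypergo, e.g. `play(HYPERGO, adversary_wins_hypergo)`.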

sc_griffith@awful.systems · 11 months ago

more generally I suspect that as soon as you are trying to compare some notion of a ‘true’ position eval function to eval functions you can actually generate you’re going to have a very difficult time making correct and clear predictions. the reason I say this is that treating such a ‘true’ function is essentially the domain of combinatorial game theory (not the same as “game theory”), and there are few if any bridges people have managed to build between cgt and practical Go etc playing engines. so it’s probably pretty hard to do

(I know there’s a theory of ‘temperature’ of combinatorial games that I think was developed for purposes of analyzing Go, but I don’t think it has any known relationship to reinforcement learning based Go engines)

ShakingMyHead@awful.systems · 11 months ago

60% of the time, it works 100% of the time.