Why are Turing tests so evasive? Discover a new approach to measuring artificial intelligence from Expect Labs CEO, Timothy Tuttle.
For the past thirty years there have been competitions, as well as scientists, that have attempted to create computer programs that would pass a Turing test. And these are typically question and answer programs where you develop a computer system that’s designed to take a question as input, analyze the language, and then provide a response in return. As it turns out, the programs that end up doing the best in these Turing test situations are ones that don’t necessarily directly answer the question all the time, but instead avoid answering the question, misdirect, or evade it. So for example, if the interviewer asks a question like, you know, “Who is your favorite composer?” the computer program could respond with something like, “I don’t like answering personal questions, why don’t you tell me a joke instead?” And then the interviewer would tell a joke, and then the computer program would laugh or make believe it was entertained because it knew the interviewer was going to tell a joke. And so the argument that Professor Levesque makes is that that is not really a true test of intelligence, because by evading the question you can oftentimes appear to demonstrate intelligence when really the computer program is quite simple underneath.
So what Levesque proposes is using a new type of test called the Winograd Schema Test, which is named after Terry Winograd, a computer science professor at Stanford. Instead of allowing the interviewer to ask any type of question at all, what it proposes is that the computer program has to come up with a correct answer to some predetermined questions that humans are very good at answering but computers typically aren’t. And I’ll give you an example. So a question would be something like, “Joan makes sure to thank Susan for all the help she has received.” So who received the help? And a question like that is not obvious, but because Joan was thanking Susan, then probably Joan received the help. A five-year-old could probably answer that question pretty well. Even today’s best question and answer assistants wouldn’t be able to answer that. I’m pretty sure Siri probably couldn’t do a good job at answering that. Wolfram Alpha would probably have trouble with that. The Watson supercomputer that IBM built probably would have trouble with that as well.
Another example would be something like the sentence, “The large ball crushed the table because it was made of styrofoam.” My question is, well, what was made of styrofoam? Well in this case, because the large ball crushed the table, it was probably the table that was made of styrofoam, not the ball, and so again, any human can figure it out, but a computer would probably have a hard time. And so this is being proposed, at least by Professor Levesque and some others, as potentially a better way to measure the intelligence of computing systems because it really gets to the heart of what it is that humans can do very well that most computing systems still have a hard time with. And I thought that was very interesting.