The news media is buzzing with talk of IBM’s new DeepQA project, aimed at creating a program that can beat humans at the question-answering game of Jeopardy. This is exciting indeed, although at the moment it’s a partly completed plan rather than a demonstrated accomplishment. But let’s suppose IBM succeeds at its aim. What will this really mean?
To interpret DeepQA in the proper way, one needs to grasp the notion of "AI-completeness" — an informal concept that is central to the folklore of modern AI. Here’s the basic concept behind AI-completeness: Some problems are so hard that the only way to solve them is to create an artificial entity with human-level general intelligence. These problems are AI-complete. On the other hand, some problems — even though they’re hard for humans and seem to require great general intelligence — are actually amenable to simple, specialized approaches. These problems are not AI-complete.
To make things more interesting, it’s not always obvious up front, even to the experts, which problems are AI-complete. Chess was thought by many to be AI-complete, until Deep Blue. For a while, many said Go was AI-complete, but computers are now making so much progress in Go (finally playing above the beginner level and occasionally beating professionals) that this opinion is less common.
Many early AI researchers in the 1950s and ’60s thought that doing symbolic math like calculus was AI-complete, which seems absurd now that software like Mathematica and Maple is commonplace.
It’s not even obvious that there are any AI-complete problems. Perhaps every single thing we humans do can be handled by some tricky specialized program, and the key to making a human-level AI is making a high-level mind-architecture that appropriately combines all these tricky little programs. As an AI researcher, I’m intuitively fairly skeptical of this hypothesis, but I don’t think it’s out of the question.
The classic example of an AI-complete problem involves a computer holding a human-like conversation in human language. Computing pioneer Alan Turing famously proposed in 1950 that if a computer could successfully impersonate a human in conversation with another human, it should be considered to have human-like intelligence (the so-called "Turing Test"). Of course, there’s a sense in which this is too hard a test: why should an AI be required to fake humanity? Are humans required to fake AI-ity to be considered intelligent?
And then there’s question answering, often called QA. This is a tough AI problem that has often been considered pretty close to the holy grail of human-like conversation. It’s an active area of academic AI research, with famous competitions like TREC and some useful deployed applications like EAGLi for biomedicine.
However, whether question-answering is AI-complete is a subject of some disagreement in the research community. As of now, none of the available AI QA systems can compete with humans in answering questions about general knowledge, but some are pretty good in specialized domains.
The game of 20 Questions was cracked in 2004 using surprisingly simple AI techniques, and the 20Q-playing AI system was embodied in a popular toy.
Wolfram Alpha, which is slated to be released later this month, will answer English questions about a variety of topics, based on knowledge entered by a large team of humans over a number of years.
And now, DeepQA aims to do for Jeopardy what was done for 20 Questions five years ago. According to IBM spokesman Michael Loughran, no specific dates or contestants have been scheduled yet. But if IBM can pull this off (and I think it’s quite possible), what will this mean?
It might mean that IBM is on the path to creating a human-level AI.
Or — and this is my suspicion — it might mean that question-answering is more like chess and calculus than we thought… a non-AI-complete problem.
Note that Jeopardy, as tough as it is for us humans with our hyperdeveloped monkey brains, is not about creativity or discovery — it’s about understanding English questions and then responding by "regurgitating" information that was read or heard somewhere. It requires quick access to knowledge of a broad range of topics including history, literature, politics, film, pop culture, and science. But much of this knowledge is available in online databases now, and nearly all the rest is online in relatively simple English phrases. It wouldn’t be entirely shocking if the Jeopardy task were susceptible to a relatively simple, "brute force" AI approach without any real humanlike understanding under the hood.
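To make the "brute force" idea concrete, here is a minimal toy sketch in Python. It is not IBM’s method and says nothing about how DeepQA works; the tiny fact list, the stopword set, and the function names are all invented for illustration. It simply picks the stored fact that shares the most words with the clue, with no understanding involved.

```python
import re
from collections import Counter

# Hypothetical toy knowledge base, made up for this sketch:
# (answer, plain-English description) pairs.
FACTS = [
    ("Deep Blue", "IBM chess computer that defeated world champion Garry Kasparov in 1997"),
    ("Alan Turing", "British computing pioneer who proposed a conversational test for machine intelligence"),
    ("Mathematica", "Software package for symbolic math such as calculus"),
]

# Small, arbitrary stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "a", "an", "of", "in", "for", "that", "who", "this", "is", "as", "such", "to", "at"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric words that are not stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def answer(clue):
    """Return the stored answer whose description overlaps most with the clue."""
    clue_words = Counter(tokenize(clue))

    def score(fact):
        _, description = fact
        # Multiset intersection counts the words shared by clue and description.
        return sum((clue_words & Counter(tokenize(description))).values())

    best_name, _ = max(FACTS, key=score)
    # Jeopardy responses are phrased as questions.
    return f"Who or what is {best_name}?"


print(answer("This IBM computer defeated Kasparov at chess"))
```

A real system would need vastly more data and far better language handling, but the sketch shows why pure lookup plus word matching can go some distance without anything resembling humanlike understanding.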
A somewhat more ambitious initiative is the Large Knowledge Collider, a European effort aimed at creating "a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web." This project has a lot more research obstacles to overcome than DeepQA, but its potential upside is larger too. Its creators don’t just want to answer questions based on information stored in databases and simple English; they want to create software that reasons based on the large body of information at its disposal and draws its own conclusions, going beyond this information. The direction they’re pointing in is clearly more general-intelligence oriented than DeepQA’s, though whether the methods they have in mind are capable of achieving their goals is an open question.
But even if DeepQA doesn’t pose the possibility of any kind of "general intelligence" breakthrough, the technology it represents could still be fantastically useful. I spend a lot of time each week asking Google questions, and then sifting through pages to find the answers. This works better for me than using Ask.com or any other available question-answering product, but it wouldn’t take that huge a technology improvement to push QA systems ahead of keyword search engines like Google for many purposes.
Bring on the answerbots!