Open Domain Question Answering
May 19th, 2008
As part of the CLMA program here at the University of Washington, we were asked to investigate an area of interest and write a short summary of what we read.
I chose to look at the TREC Question Answering challenge and read about some of the techniques used in submissions to that evaluation. The questions used in the evaluation include simple factoid questions such as “How many calories are in a Big Mac?”, list questions such as “Which past and present NFL players have the last name of Johnson?”, and definition questions such as “What is a Golden Parachute?”.
Many of the system descriptions submitted to TREC have pipelined architectures. Here are some examples of components that were described:
- Question Type Classification - Decide what kind of answer the question calls for: a name, a date, a description, a list, and so on.
- Question Rewrite - Convert the question to search terms for the IR search. This step typically expands the list of search words by adding synonyms, which yields a broader set of returned pages that can then be analyzed further.
- IR Search - Generally a simple search using established technologies.
- Passage Selection - Decide which portions of the text returned by the IR search to include in the results.
- Answer Extraction and Ranking - Create the actual text that will be returned as answers.
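To make the pipeline idea concrete, here is a minimal Python sketch of how the five stages above might be chained together. It is not taken from any TREC submission. The tiny corpus, the synonym table, the stopword list, and every function name are made up for illustration, and each stage is a deliberately naive stand-in (keyword overlap for the IR search, a regular expression for date extraction) for what a real system would do.

```python
import re
from collections import Counter

# Toy corpus standing in for a real document collection (illustrative only).
DOCS = [
    "The golden parachute is a large payment made to an executive who loses their job after a merger.",
    "The Space Needle opened in 1962 for the World's Fair in Seattle. It is 605 feet tall.",
    "Seattle is the largest city in Washington State.",
]

# Hypothetical synonym table; a real system would use a much larger lexical resource.
SYNONYMS = {"built": ["opened", "constructed"]}

STOPWORDS = {"what", "when", "who", "is", "was", "the", "a", "an", "of", "in", "how", "many"}


def tokens(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def classify(question):
    """Question type classification: guess the answer type from the opening words."""
    q = question.lower()
    if q.startswith("when"):
        return "DATE"
    if q.startswith("who"):
        return "PERSON"
    if q.startswith("how many") or q.startswith("how much"):
        return "NUMBER"
    if q.startswith("what is"):
        return "DEFINITION"
    return "OTHER"


def rewrite(question):
    """Question rewrite: drop stopwords, then expand the remaining words with synonyms."""
    words = [w for w in re.findall(r"[a-z0-9]+", question.lower()) if w not in STOPWORDS]
    terms = set(words)
    for w in words:
        terms.update(SYNONYMS.get(w, []))
    return terms


def search(terms, docs, k=2):
    """IR search: rank documents by how many query terms they contain."""
    scored = [(len(terms & tokens(d)), d) for d in docs]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [d for _, d in scored[:k]]


def select_passages(terms, docs):
    """Passage selection: keep the sentences that mention at least one query term."""
    passages = []
    for d in docs:
        for sent in re.split(r"(?<=[.!?])\s+", d):
            score = len(terms & tokens(sent))
            if score:
                passages.append((score, sent))
    passages.sort(key=lambda p: p[0], reverse=True)
    return passages


def extract(qtype, passages):
    """Answer extraction and ranking: pull type-appropriate candidates and rank them."""
    candidates = Counter()
    for score, sent in passages:
        if qtype == "DATE":
            # Naive date extraction: any four-digit year in the passage.
            for year in re.findall(r"\b(?:1\d{3}|20\d{2})\b", sent):
                candidates[year] += score
        elif qtype == "DEFINITION":
            # For definitions, return the whole matching sentence.
            candidates[sent] += score
    return [a for a, _ in candidates.most_common()]


def answer(question, docs=DOCS):
    """Run the stages in order, each one feeding the next."""
    qtype = classify(question)
    terms = rewrite(question)
    hits = search(terms, docs)
    passages = select_passages(terms, hits)
    return extract(qtype, passages)


print(answer("When was the Space Needle built?"))
print(answer("What is a golden parachute?"))
```

Notice that the first question only finds its document because the rewrite step adds "opened" as a synonym for "built", which is the kind of broadening the Question Rewrite stage is meant to provide.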
The paper is not intended to be a thorough description of the field, but it helped me get a handle on the designs currently being developed.
Here is the link to my exploration paper: QuestionAnsweringAtTrec