ABSTRACT
In this paper we propose a way to cope with questions typed by dyslexic users, which are usually a deformation of the intended query that classical spellcheckers cannot correct. We first propose a new model for statistical question answering systems, based on a probabilistic information retrieval model and a combination of results. This model accepts a query made of multiple weighted terms as input. We also introduce a phonology-based approach at the sentence level to derive possible intended terms from typed questions. This approach uses the finite-state machine framework to go from phonetic hypotheses and spellchecker proposals to hypothesized sentences by means of a language model. The final weighted queries are obtained by computing posterior probabilities. They are evaluated with new density and appearance rating measures, which adapt recall and precision to non-binary data.
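As an illustration of how a weighted multi-term query might be derived from hypothesized sentences, the following is a minimal sketch and not the authors' implementation. It assumes an n-best list of sentence hypotheses with log scores (the function name `term_posteriors`, the example sentences, and the scores are all hypothetical), normalizes the scores into posterior probabilities, and weights each term by the total posterior mass of the hypotheses that contain it.

```python
import math
from collections import defaultdict


def term_posteriors(hypotheses):
    """Turn scored sentence hypotheses into a weighted term query.

    hypotheses: list of (sentence, log_score) pairs (an assumed input format).
    Returns a dict mapping each term to the summed posterior probability
    of the hypotheses in which it appears.
    """
    # Subtract the maximum log score before exponentiating for numerical stability.
    max_log = max(score for _, score in hypotheses)
    exp_scores = [math.exp(score - max_log) for _, score in hypotheses]
    total = sum(exp_scores)

    weights = defaultdict(float)
    for (sentence, _), exp_score in zip(hypotheses, exp_scores):
        posterior = exp_score / total  # posterior probability of this hypothesis
        # Count each term once per hypothesis, even if it is repeated.
        for term in set(sentence.split()):
            weights[term] += posterior
    return dict(weights)


# Hypothetical n-best list for one typed question (scores are made up).
n_best = [
    ("who wrote hamlet", math.log(0.6)),
    ("who rote hamlet", math.log(0.3)),
    ("how wrote hamlet", math.log(0.1)),
]
weighted_query = term_posteriors(n_best)
```

A term shared by every hypothesis receives a weight close to 1, while a term that appears only in an unlikely hypothesis receives a small weight, so a retrieval model that accepts weighted terms can discount doubtful corrections instead of committing to a single rewrite.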
Index Terms
- How to cope with questions typed by dyslexic users