Abstract
This paper introduces Netspeak, a Web service that assists writers in finding adequate expressions. To provide statistically relevant suggestions, the service indexes more than 1.8 billion n-grams, n ≤ 5, along with their occurrence frequencies on the Web. If in doubt about a wording, a user can specify a query with wildcards inserted at the positions where she feels uncertain.
Queries define patterns for which a ranked list of matching n-grams, along with usage examples, is retrieved. The ranking reflects the occurrence frequencies of the n-grams and informs about both absolute and relative usage. Given this choice of customary wordings, one can easily select the most appropriate one. Second-language speakers in particular can learn about style conventions and language usage.
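The query model described above can be illustrated with a minimal sketch. It is not the Netspeak implementation: the wildcard syntax (`?` standing for exactly one word), the function names, and the toy counts are all assumptions made for illustration, and the counts are invented rather than taken from the Web.

```python
from typing import Dict, List, Tuple

# Toy n-gram counts, invented for illustration only (not real Web frequencies).
NGRAM_COUNTS: Dict[str, int] = {
    "waiting for a response": 600,
    "waiting for your response": 300,
    "waiting on a response": 100,
    "looking for a response": 50,
}


def matches(pattern: List[str], ngram: List[str]) -> bool:
    """Return True if the n-gram fits the pattern.

    Assumption: "?" is a hypothetical wildcard that matches exactly one word.
    """
    if len(pattern) != len(ngram):
        return False
    return all(p == "?" or p == w for p, w in zip(pattern, ngram))


def search(query: str, counts: Dict[str, int]) -> List[Tuple[str, int, float]]:
    """Return (n-gram, absolute count, relative share) sorted by count, descending."""
    pattern = query.split()
    hits = [(g, c) for g, c in counts.items() if matches(pattern, g.split())]
    total = sum(c for _, c in hits)
    hits.sort(key=lambda gc: gc[1], reverse=True)
    # The relative share is computed over the matching n-grams only.
    return [(g, c, c / total) for g, c in hits]


if __name__ == "__main__":
    for ngram, count, share in search("waiting ? a response", NGRAM_COUNTS):
        print(f"{ngram}\t{count}\t{share:.1%}")
```

Reporting the absolute count next to the share among all matches mirrors the abstract's point that the ranking informs about both absolute and relative usage.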
To guarantee response times within milliseconds, we have developed an index that considers occurrence probabilities, allowing for biased sampling during retrieval. Our analysis shows that the extreme speedup obtained with this strategy (a factor of 68) comes without significant loss in retrieval quality.
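The abstract does not spell out how the index is built, so the following is only one plausible reading of "biased sampling": postings lists are sorted by descending frequency, and retrieval inspects only a fixed-size prefix of one list. The names `build_index`, `sampled_search`, and `budget`, the `?` wildcard, and the toy counts are assumptions for illustration, not the authors' design.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Posting = Tuple[str, int]  # (n-gram, count)

# Toy counts, invented for illustration only.
TOY_COUNTS: Dict[str, int] = {
    "waiting for a response": 600,
    "waiting for your response": 300,
    "waiting on a response": 100,
    "looking for a response": 50,
}


def build_index(counts: Dict[str, int]) -> Dict[str, List[Posting]]:
    """Map each word to the n-grams containing it, most frequent first."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for ngram, count in counts.items():
        for word in set(ngram.split()):
            index[word].append((ngram, count))
    for postings in index.values():
        postings.sort(key=lambda entry: entry[1], reverse=True)
    return dict(index)


def sampled_search(query: str, index: Dict[str, List[Posting]], budget: int) -> List[Posting]:
    """Scan at most `budget` entries from the head of one postings list.

    Because lists are frequency-ordered, the scanned prefix is biased toward
    frequent n-grams; rare matches beyond the budget are skipped.
    """
    pattern = query.split()
    fixed = [w for w in pattern if w != "?"]  # "?" is a hypothetical one-word wildcard
    if not fixed:
        return []
    # Use the shortest list among the fixed query words to limit the work.
    postings = min((index.get(w, []) for w in fixed), key=len)
    hits: List[Posting] = []
    for ngram, count in postings[:budget]:
        words = ngram.split()
        if len(words) == len(pattern) and all(
            p == "?" or p == w for p, w in zip(pattern, words)
        ):
            hits.append((ngram, count))
    return hits  # already in descending count order
```

A small `budget` trades recall on rare n-grams for a bounded amount of work per query, which is the kind of trade-off the reported speedup suggests.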
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stein, B., Potthast, M., Trenkmann, M. (2010). Retrieving Customary Web Language to Assist Writers. In: Gurrin, C., et al. Advances in Information Retrieval. ECIR 2010. Lecture Notes in Computer Science, vol 5993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12275-0_64
DOI: https://doi.org/10.1007/978-3-642-12275-0_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12274-3
Online ISBN: 978-3-642-12275-0
eBook Packages: Computer Science (R0)