Efficient query autocompletion with edit distance-based error tolerance

Qin, Jianbin; Xiao, Chuan; Hu, Sheng; Zhang, Jie; Wang, Wei; Ishikawa, Yoshiharu; Tsuda, Koji; Sadakane, Kunihiko

doi:10.1007/s00778-019-00595-4

Efficient query autocompletion with edit distance-based error tolerance

Regular Paper
Published: 14 December 2019

Volume 29, pages 919–943, (2020)
Cite this article

The VLDB Journal Aims and scope Submit manuscript

Jianbin Qin¹,
Chuan Xiao^2,3,
Sheng Hu^3,4,
Jie Zhang⁵,
Wei Wang⁶,
Yoshiharu Ishikawa³,
Koji Tsuda⁷ &
…
Kunihiko Sadakane⁷

680 Accesses
3 Citations
Explore all metrics

Abstract

Query autocompletion is an important feature saving users many keystrokes from typing the entire query. In this paper, we study the problem of query autocompletion that tolerates errors in users’ input using edit distance constraints. Previous approaches index data strings in a trie, and continuously maintain all the prefixes of data strings whose edit distances from the query string are within the given threshold. The major inherent drawback of these approaches is that the number of such prefixes is huge for the first few characters of the query string and is exponential in the alphabet size. This results in slow query response even if the entire query approximately matches only few prefixes. We propose a novel neighborhood generation-based method to process error-tolerant query autocompletion. Our proposed method only maintains a small set of active nodes, thus saving both space and time to process the query. We also study efficient duplicate removal, a core problem in fetching query answers, and extend our method to support top-k queries. Optimization techniques are proposed to reduce the index size. The efficiency of our method is demonstrated through extensive experiments on real datasets.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 2

Fig. 5

NGNC: A Flexible and Efficient Framework for Error-Tolerant Query Autocompletion

SETJoin: a novel top-k similarity join algorithm

Article 06 March 2020

findere: Fast and Precise Approximate Membership Query

Notes

The worst case may happen for the first few keystrokes, when the prefixes of most data strings have their edit distances within \(\tau \) from the query string.
This exception happens when n is the end of a data string.
In case \(n_j\) is the last child, we recursively go up the tree until reaching an ancestor such that it has a next sibling, and then use the next sibling as \(n_k\).
http://www.informatik.uni-trier.de/~ley/db/.
http://ebiquity.umbc.edu/resource/html/id/351.
http://mbr.nlm.nih.gov/Download/index.shtml.
https://jeffhuang.com/search_query_logs.html.

References

Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Addison-Wesley, Boston (1974)
MATH Google Scholar
Aoe, J.-I.: An efficient digital search algorithm by using a double-array structure. IEEE Trans. Softw. Eng. 15(9), 1066–1077 (1989)
Article Google Scholar
Baeza-Yates, R.A., Hurtado, C.A., Mendoza, M.: Improving search engines by query clustering. JASIST 58(12), 1793–1804 (2007)
Article Google Scholar
Bar-Yossef, Z., Kraus, N.: Context-sensitive query auto-completion. In: WWW, pp. 107–116 (2011)
Bast, H., Weber, I.: Type less, find more: fast autocompletion search with a succinct index. In: SIGIR, pp. 364–371 (2006)
Bhatia, S., Majumdar, D., Mitra, P.: Query suggestions in the absence of query logs. In: SIGIR, pp. 795–804 (2011)
Bocek, T., Hunt, E., Stiller, B.: Fast similarity search in large dictionaries. Technical Report ifi-2007.02. Department of Informatics, University of Zurich (2007)
Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: ICDE, pp. 421–430 (2001)
Boytsov, L.: Indexing methods for approximate dictionary searching: comparative analysis. ACM J. Exp. Algorithm. 16(1), 1 (2011)
MathSciNet MATH Google Scholar
Cai, F., Chen, H.: Term-level semantic similarity helps time-aware term popularity based query completion. J. Intell. Fuzzy Syst. 32(6), 3999–4008 (2017)
Article Google Scholar
Cai, F., Chen, W., Ou, X.: Learning search popularity for personalized query completion in information retrieval. J. Intell. Fuzzy Syst. 33(4), 2427–2435 (2017)
Article Google Scholar
Cai, F., de Rijke, M.: Selectively personalizing query auto-completion. In: SIGIR, pp. 993–996 (2016)
Cai, F., Liang, S., de Rijke, M.: Prefix-adaptive and time-sensitive personalized query auto completion. IEEE Trans. Knowl. Data Eng. 28(9), 2452–2466 (2016)
Article Google Scholar
Cao, H., Jiang, D., Pei, J., Chen, E., Li, H.: Towards context-aware search by learning a very large variable length hidden Markov model from search logs. In: WWW, pp. 191–200 (2009)
Cao, H., Jiang, D., Pei, J., He, Q., Liao, Z., Chen, E., Li, H.: Context-aware query suggestion by mining click-through and session data. In: KDD, pp. 875–883 (2008)
Cetindil, I., Esmaelnezhad, J., Kim, T., Li, C.: Efficient instant-fuzzy search with proximity ranking. In: ICDE, pp. 328–339 (2014)
Chaudhuri, S., Kaushik, R.: Extending autocompletion to tolerate errors. In: SIGMOD, pp. 707–718 (2009)
Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: STOC, pp. 91–100 (2004)
Daciuk, J.: Comparison of construction algorithms for minimal, acyclic, deterministic, finite-state automata from sets of strings. In: CIAA, pp. 255–261 (2002)
Darragh, J.J., Witten, I.H., James, M.L.: The reactive keyboard: a predicive typing aid. IEEE Comput. 23(11), 41–49 (1990)
Article Google Scholar
Deng, D., Li, G., Feng. J.: A pivotal prefix based filtering algorithm for string similarity search. In: SIGMOD, pp. 673–684 (2014)
Deng, D., Li, G., Feng, J., Duan, Y., Gong, Z.: A unified framework for approximate dictionary-based entity extraction. VLDB J. 24(1), 143–167 (2015)
Article Google Scholar
Deng, D., Li, G., Wen, H., Jagadish, H.V., Feng, J.: META: an efficient matching-based method for error-tolerant autocompletion. PVLDB 9(10), 828–839 (2016)
Google Scholar
Duan, H., Hsu, B.-J.P.: Online spelling correction for query completion. In: WWW, pp. 117–126 (2011)
Duan, H., Li, Y., Zhai, C., Roth, D.: A discriminative model for query spelling correction with latent structural SVM. In: EMNLP-CoNLL, pp. 1511–1521 (2012)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: PODS (2001)
Fan, J., Wu, H., Li, G., Zhou, L.: Suggesting topic-based query terms as you type. In: APWeb, pp. 61–67 (2010)
Feng, J., Wang, J., Li, G.: Trie-join: a trie-based method for efficient string similarity joins. VLDB J. 21(4), 437–461 (2012)
Article Google Scholar
Gao, J., Li, X., Micol, D., Quirk, C., Sun, X.: A large scale ranker-based system for search query spelling correction. In: COLING, pp. 358–366 (2010)
Grabski, K., Scheffer, T.: Sentence completion. In: SIGIR, pp. 433–439 (2004)
Gravano, L., Ipeirotis, P.G., Jagadish, H.V., Koudas, N., Muthukrishnan, S., Srivastava, D.: Approximate string joins in a database (almost) for free. In: VLDB, pp. 491–500 (2001)
He, Q., Jiang, D., Liao, Z., Hoi, S.C.H., Chang, K., Lim, E.-P., Li, H.: Web query recommendation via sequential query prediction. In: ICDE, pp. 1443–1454 (2009)
Hofmann, K., Mitra, B., Radlinski, F., Shokouhi, M.: An eye-tracking study of user interactions with query auto completion. In: CIKM, pp. 549–558 (2014)
Hsu, B.P., Ottaviano, G.: Space-efficient data structures for top-\(k\) completion. In: WWW, pp. 583–594 (2013)
Hu, S., Xiao, C., Ishikawa, Y.: An efficient algorithm for location-aware query autocompletion. IEICE Trans. 101–D(1), 181–192 (2018)
Article Google Scholar
Ji, S., Li, C.: Location-based instant search. In: SSDBM, pp. 17–36 (2011)
Ji, S., Li, G., Li, C., Feng, J.: Efficient interactive fuzzy keyword search. In: WWW, pp. 371–380 (2009)
Jiang, J., Ke, Y., Chien, P., Cheng, P.: Learning user reformulation behavior for query auto-completion. In: SIGIR, pp. 445–454 (2014)
Krishnan, U., Moffat, A., Zobel, J.: A taxonomy of query auto completion modes. In: ADCS, pp. 6:1–6:8 (2017)
Li, C., Wang, B., Yang, X.: VGRAM: improving performance of approximate queries on string collections using variable-length grams. In: VLDB, pp. 303–314 (2007)
Li, G., Deng, D., Feng, J.: A partition-based method for string similarity joins with edit-distance constraints. ACM Trans. Database Syst. 38(2), 9:1–9:33 (2013)
Article MathSciNet MATH Google Scholar
Li, G., Ji, S., Li, C., Feng, J.: Efficient type-ahead search on relational data: a tastier approach. In: SIGMOD, pp. 695–706 (2009)
Li, G., Ji, S., Li, C., Feng, J.: Efficient fuzzy full-text type-ahead search. VLDB J. 20(4), 617–640 (2011)
Article Google Scholar
Li, G., Wang, J., Li, C., Feng, J.: Supporting efficient top-k queries in type-ahead search. In: SIGIR, pp. 355–364 (2012)
Li, L., Deng, H., Dong, A., Chang, Y., Baeza-Yates, R.A., Zha, H.: Exploring query auto-completion and click logs for contextual-aware web search and query suggestion. In: WWW, pp. 539–548 (2017)
Li, L., Deng, H., Dong, A., Chang, Y., Zha, H., Baeza-Yates, R.A.: Analyzing user’s sequential behavior in query auto-completion via Markov processes. In: SIGIR, pp. 123–132 (2015)
Li, Y., Dong, A., Wang, H., Deng, H., Chang, Y., Zhai, C.: A two-dimensional click model for query auto-completion. In: SIGIR, pp. 455–464 (2014)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
Book MATH Google Scholar
Mitra, B., Shokouhi, M., Radlinski, F., Hofmann, K.: On user interactions with query auto-completion. In: SIGIR, pp. 1055–1058 (2014)
Mor, M., Fraenkel, A.S.: A hash code method for detecting and correcting spelling errors. Commun. ACM 25(12), 935–938 (1982)
Article Google Scholar
Muthukrishnan, S.: Efficient algorithms for document retrieval problems. In: SODA, pp. 657–666 (2002)
Myers, E.W.: A sublinear algorithm for approximate keyword searching. Algorithmica 12(4/5), 345–374 (1994)
Article MathSciNet MATH Google Scholar
Nandi, A., Jagadish, H.V.: Effective phrase prediction. In: VLDB, pp. 219–230 (2007)
Qin, J., Wang, W., Xiao, C., Lu, Y., Lin, X., Wang, H.: Asymmetric signature schemes for efficient exact edit similarity query processing. ACM Trans. Database Syst. 38(3), 16 (2013)
Article MathSciNet MATH Google Scholar
Roy, S.B., Chakrabarti, K.: Location-aware type ahead search on spatial databases: semantics and efficiency. In: SIGMOD, pp. 361–372 (2011)
Sadikov, E., Madhavan, J., Wang, L., Halevy, A.Y.: Clustering query refinements by user intent. In: WWW, pp. 841–850 (2010)
Shokouhi, M.: Learning to personalize query auto-completion. In: SIGIR, pp. 103–112 (2013)
Shokouhi, M., Radinsky, K.: Time-sensitive query auto-completion. In: SIGIR, pp. 601–610 (2012)
Sordoni, A., Bengio, Y., Vahabi, H., Lioma, C., Simonsen, J.G., Nie, J.: A hierarchical recurrent encoder–decoder for generative context-aware query suggestion. In: CIKM, pp. 553–562 (2015)
Tsur, D.: Fast index for approximate string matching. J. Discrete Algorithms 8(4), 339–345 (2010)
Article MathSciNet MATH Google Scholar
Tyler, S.K., Teevan, J.: Large scale query log analysis of re-finding. In: WSDM, pp. 191–200 (2010)
Ukkonen, E.: Algorithms for approximate string matching. Inf. Control 64(1–3), 100–118 (1985)
Article MathSciNet MATH Google Scholar
Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. ACM 21(1), 168–173 (1974)
Article MathSciNet MATH Google Scholar
Wang, W., Qin, J., Xiao, C., Lin, X., Shen, H.T.: Vchunkjoin: an efficient algorithm for edit similarity joins. IEEE Trans. Knowl. Data Eng. 25(8), 1916–1929 (2013)
Article Google Scholar
Wang, W., Xiao, C., Lin, X., Zhang, C.: Efficient approximate entity extraction with edit constraints. In: SIMGOD, pp. 759–770 (2009)
Wang, Y., Ouyang, H., Deng, H., Chang, Y.: Learning online trends for interactive query auto-completion. IEEE Trans. Knowl. Data Eng. 29(11), 2442–2454 (2017)
Article Google Scholar
Wei, H., Yu, J.X., Lu, C.: String similarity search: a hash-based approach. IEEE Trans. Knowl. Data Eng. 30(1), 170–184 (2018)
Article Google Scholar
Wen, J., Zhang, H., Nie, J.: Query clustering using content words and user feedback. In: SIGIR, pp. 442–443 (2001)
Whiting, S., Jose, J.M.: Recent and robust query auto-completion. In: WWW, pp. 971–982 (2014)
Xiao, C., Qin, J., Wang, W., Ishikawa, Y., Tsuda, K., Sadakane, K.: Efficient error-tolerant query autocompletion. PVLDB 6(6), 373–384 (2013)
Google Scholar
Xiao, C., Wang, W., Lin, X.: Ed-Join: an efficient algorithm for similarity joins with edit distance constraints. PVLDB 1(1), 933–944 (2008)
MathSciNet Google Scholar
Yu, M., Wang, J., Li, G., Zhang, Y., Deng, D., Feng, J.: A unified framework for string similarity search with edit-distance constraint. VLDB J. 26(2), 249–274 (2017)
Article Google Scholar
Zhang, A., Goyal, A., Kong, W., Deng, H., Dong, A., Chang, Y., Gunter, C.A., Han, J.: adaqac: adaptive query auto-completion via implicit negative feedback. In: SIGIR, pp. 143–152 (2015)
Zhang, C., Naughton, J.F., DeWitt, D.J., Luo, Q., Lohman, G.M.: On supporting containment queries in relational database management systems. In: SIGMOD, pp. 425–436 (2001)
Zheng, Y., Bao, Z., Shou, L., Tung, A.K.H.: INSPIRE: a framework for incremental spatial prefix query relaxation. IEEE Trans. Knowl. Data Eng. 27(7), 1949–1963 (2015)
Article Google Scholar
Zhong, R., Fan, J., Li, G., Tan, K., Zhou, L.: Location-aware instant search. In: CIKM, pp. 385–394 (2012)
Zhou, X., Qin, J., Xiao, C., Wang, W., Lin, X., Ishikawa, Y.: BEVA: an efficient query processing algorithm for error-tolerant autocompletion. ACM Trans. Database Syst. 41(1), 5:1–5:44 (2016)
Article MathSciNet Google Scholar

Download references

Acknowledgements

Chuan Xiao was supported by JSPS Kakenhi 16H01722, 17H06099, 18H04093, and NSFC 61702409. Sheng Hu and Yoshiharu Ishikawa were supported by JSPS Kakenhi 16H01722. Jie Zhang was supported by NSFC 61702409. Wei Wang was supported by ARC DPs 170103710 and 180103411, and D2DCRC DC25002 and DC25003. We thank the authors of [23] for kindly providing their source codes.

Author information

Authors and Affiliations

Shenzhen Institute of Computing Sciences, Shenzhen University, Shenzhen, China
Jianbin Qin
Osaka University, Osaka, Japan
Chuan Xiao
Nagoya University, Nagoya, Japan
Chuan Xiao, Sheng Hu & Yoshiharu Ishikawa
Kyoto University, Kyoto, Japan
Sheng Hu
Xi’an University of Technology, Xi’an, China
Jie Zhang
The University of New South Wales, Sydney, Australia
Wei Wang
The University of Tokyo, Tokyo, Japan
Koji Tsuda & Kunihiko Sadakane

Authors

Jianbin Qin
View author publications
You can also search for this author in PubMed Google Scholar
Chuan Xiao
View author publications
You can also search for this author in PubMed Google Scholar
Sheng Hu
View author publications
You can also search for this author in PubMed Google Scholar
Jie Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Wei Wang
View author publications
You can also search for this author in PubMed Google Scholar
Yoshiharu Ishikawa
View author publications
You can also search for this author in PubMed Google Scholar
Koji Tsuda
View author publications
You can also search for this author in PubMed Google Scholar
Kunihiko Sadakane
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jianbin Qin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Qin, J., Xiao, C., Hu, S. et al. Efficient query autocompletion with edit distance-based error tolerance. The VLDB Journal 29, 919–943 (2020). https://doi.org/10.1007/s00778-019-00595-4

Download citation

Received: 06 November 2018
Revised: 01 April 2019
Accepted: 12 August 2019
Published: 14 December 2019
Issue Date: July 2020
DOI: https://doi.org/10.1007/s00778-019-00595-4

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Efficient query autocompletion with edit distance-based error tolerance

Abstract

Access this article

Similar content being viewed by others

NGNC: A Flexible and Efficient Framework for Error-Tolerant Query Autocompletion

SETJoin: a novel top-k similarity join algorithm

findere: Fast and Precise Approximate Membership Query

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Efficient query autocompletion with edit distance-based error tolerance

Abstract

Access this article

Similar content being viewed by others

NGNC: A Flexible and Efficient Framework for Error-Tolerant Query Autocompletion

SETJoin: a novel top-k similarity join algorithm

findere: Fast and Precise Approximate Membership Query

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation