DOI: 10.1145/2063576.2063590
research-article

Unsupervised transactional query classification based on webpage form understanding

Published: 24 October 2011

Abstract

Query type classification assigns search queries to categories such as navigational, informational, and transactional, according to the information need behind each query. Although the problem has drawn considerable research attention, previous methods usually require editors to label queries as training data, or domain knowledge to write rules for predicting query type. Existing work has also focused mainly on informational and navigational queries; transactional query classification has not been well addressed. In this work, we propose an unsupervised approach to transactional query classification. The method is based on the observation that, after issuing a transactional query to a search engine, many users click through to result pages and then interact with Web forms on those pages. These interactions (typing in a text box, selecting from a dropdown list, clicking a button to execute an action) specify the details of the transaction. By mining toolbar search log data, which records the associations between queries and the Web forms users clicked, we obtain a set of good-quality transactional queries without manual labeling. By matching these automatically acquired queries against the contents of their associated Web forms, we generalize the queries into patterns, which can then classify queries not covered by the search log. Our experiments indicate that the transactional queries produced by this method are of good quality, and the pattern-based classifier achieves an F1 score of 83%. This is a strong result given that no labeled data is used to train the classifier.
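
The pipeline described in the abstract (mine form-interacting queries from the log, generalize them into patterns using form contents, then match unseen queries) can be illustrated with a minimal sketch. The abstract does not specify the log schema, the mining threshold, or the matching procedure, so everything below is an assumption made for illustration: the record layout, the `min_rate` threshold, the example URLs and field names, and the exact-token matching are all hypothetical and are not the authors' implementation.

```python
from collections import Counter, defaultdict

# Hypothetical toolbar log records: (query, clicked URL, user interacted with a form?)
LOG = [
    ("cheap flights to boston", "air.example/search", True),
    ("cheap flights to boston", "air.example/search", True),
    ("cheap flights to denver", "air.example/search", True),
    ("history of boston", "wiki.example/boston", False),
    ("history of boston", "wiki.example/boston", False),
]

# Hypothetical form contents: URL -> {field name: set of values the field accepts}
FORMS = {
    "air.example/search": {"destination": {"boston", "denver", "seattle"}},
}


def mine_transactional(log, min_rate=0.5):
    """Keep queries whose clicks led to a form interaction at least min_rate of the time."""
    total, hits, forms = Counter(), Counter(), defaultdict(set)
    for query, url, interacted in log:
        total[query] += 1
        if interacted:
            hits[query] += 1
            forms[query].add(url)
    return {q: forms[q] for q in total if hits[q] / total[q] >= min_rate}


def generalize(query, form_fields):
    """Replace each query token that matches a form field value with a <field> slot."""
    pattern = []
    for tok in query.lower().split():
        slot = next((name for name, vals in form_fields.items() if tok in vals), None)
        pattern.append(f"<{slot}>" if slot else tok)
    return tuple(pattern)


def build_patterns(seeds, forms):
    """Generalize every mined query against its forms; keep patterns containing a slot."""
    patterns = set()
    for query, urls in seeds.items():
        for url in urls:
            pattern = generalize(query, forms.get(url, {}))
            if any(tok.startswith("<") for tok in pattern):
                patterns.add(pattern)
    return patterns


def is_transactional(query, patterns, slot_values):
    """A query matches a pattern if literals are equal and slot tokens are known values."""
    tokens = query.lower().split()
    for pattern in patterns:
        if len(pattern) != len(tokens):
            continue
        if all(
            tok in slot_values.get(p[1:-1], ()) if p.startswith("<") else tok == p
            for tok, p in zip(tokens, pattern)
        ):
            return True
    return False


# Merge field values across all forms into one slot vocabulary.
slot_values = defaultdict(set)
for fields in FORMS.values():
    for name, values in fields.items():
        slot_values[name] |= values

seeds = mine_transactional(LOG)
patterns = build_patterns(seeds, FORMS)
print(patterns)
print(is_transactional("cheap flights to seattle", patterns, slot_values))
```

In this toy data, "cheap flights to seattle" never appears in the log, yet it matches the pattern generalized from the Boston and Denver queries, which is the kind of coverage beyond the search log that the abstract attributes to the patterns. A realistic system would likely need fuzzy matching between query terms and form contents rather than exact token equality.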

Cited By

  • (2023) Smart Document Classifier and Analyzer Using NLP. 2023 7th International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), pp. 1-6. DOI: 10.1109/CSITSS60515.2023.10334144. Online publication date: 2-Nov-2023
  • (2020) Query Intent Understanding. Query Understanding for Search Engines, pp. 69-101. DOI: 10.1007/978-3-030-58334-7_4. Online publication date: 2-Dec-2020
  • (2012) The wisdom of advertisers. Proceedings of the 21st ACM international conference on Information and knowledge management, pp. 505-514. DOI: 10.1145/2396761.2396827. Online publication date: 29-Oct-2012

      Published In

      CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
      October 2011
      2712 pages
      ISBN:9781450307178
      DOI:10.1145/2063576

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. query type classification
      2. transactional query
      3. unsupervised learning

      Qualifiers

      • Research-article

      Conference

      CIKM '11

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 5
      • Downloads (Last 6 weeks): 0
      Reflects downloads up to 27 Feb 2025
