Topical Pattern Based Document Modelling and Relevance Ranking

  • Conference paper
Web Information Systems Engineering – WISE 2014 (WISE 2014)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8786))

Abstract

Traditional information filtering (IF) models often assume that the documents in a collection relate to a single topic. In reality, however, users’ interests can be diverse, and the documents in a collection often involve multiple topics. Topic modelling was proposed to generate statistical models that represent multiple topics in a collection of documents, but a topic model represents topics as distributions over words, which are limited in their ability to represent the semantics of topics distinctively. Patterns are generally considered more discriminative than single terms and can reveal the inner relations between words. This paper proposes a novel information filtering model, the Significant matched Pattern-based Topic Model (SPBTM). The SPBTM represents user information needs in terms of multiple topics, and each topic is represented by patterns. More importantly, the patterns are organized into groups based on their statistical and taxonomic features, from which the more representative patterns, called Significant Matched Patterns, can be identified and used to estimate document relevance. Experiments on benchmark data sets demonstrate that the SPBTM significantly outperforms the state-of-the-art models.
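
The idea of representing a topic by patterns rather than single words, and of scoring a document by the patterns it matches, can be illustrated with a small sketch. This is not the SPBTM as defined in the paper, whose details are not given in the abstract. It assumes that each topic is described by a set of word transactions, that a pattern is a word set occurring in at least `min_support` transactions, and that the "significant" matched patterns are simply the maximal patterns contained in the document. The function names, the toy data and the scoring rule (sum of supports) are all hypothetical.

```python
from itertools import combinations


def frequent_patterns(transactions, min_support):
    """Brute-force frequent itemset mining (only suitable for tiny inputs).

    Returns {frozenset(words): support_count} for every word set that
    occurs in at least `min_support` transactions.
    """
    counts = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {p: c for p, c in counts.items() if c >= min_support}


def maximal_matched(patterns, doc_words):
    """Patterns fully contained in the document and not a proper subset
    of another matched pattern (an assumed stand-in for 'significant')."""
    doc = set(doc_words)
    matched = [p for p in patterns if p <= doc]
    return [p for p in matched if not any(p < q for q in matched)]


def relevance(patterns, doc_words):
    """Assumed scoring rule: sum the supports of the maximal matched patterns."""
    return sum(patterns[p] for p in maximal_matched(patterns, doc_words))


# Toy transactions for a single hypothetical topic.
transactions = [
    ["topic", "model"],
    ["topic", "model", "pattern"],
    ["pattern", "mining"],
    ["topic", "pattern"],
]
patterns = frequent_patterns(transactions, min_support=2)

print(relevance(patterns, ["topic", "model", "evaluation"]))
print(relevance(patterns, ["unrelated", "text"]))
```

In this sketch a document containing both "topic" and "model" is scored through the two-word pattern alone, since the single words it subsumes are not counted again; that is one simple way to prefer more specific patterns over the terms they contain.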

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Gao, Y., Xu, Y., Li, Y. (2014). Topical Pattern Based Document Modelling and Relevance Ranking. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2014. WISE 2014. Lecture Notes in Computer Science, vol 8786. Springer, Cham. https://doi.org/10.1007/978-3-319-11749-2_15

  • DOI: https://doi.org/10.1007/978-3-319-11749-2_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11748-5

  • Online ISBN: 978-3-319-11749-2

  • eBook Packages: Computer Science, Computer Science (R0)
