
Topic Analysis by Exploring Headline Information

  • Conference paper
Web Information Systems Engineering – WISE 2020 (WISE 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12343)

Abstract

Standard topic models adopt the ‘bag of words’ assumption, under which every word in a document carries the same weight. Word-topic assignment therefore leans toward high-frequency words and neglects the influence of low-frequency words, which ultimately degrades the quality of the topic representation. Statistical information gathered from the whole document collection can generally be used to mitigate this problem. In addition, the headlines of some kinds of documents, such as news articles, usually summarize the important elements of the document, so headline words are better suited to representing its topics. However, few previous studies exploit the rich information in headlines, which is significant for topic modeling. In this paper, we propose a new headline-based topic model in order to obtain a well-formed topic description. Experimental results on three widely used datasets show that the proposed headline-based modeling scheme achieves lower perplexity.
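The two ideas named in the abstract, giving headline words more weight than a plain bag of words would and evaluating a model by perplexity, can be illustrated with a minimal sketch. This is not the model proposed in the paper, whose details are not given here. The function names, the additive `headline_boost` factor, and the whitespace tokenization are all assumptions made for illustration only.

```python
import math
from collections import Counter


def weighted_counts(headline, body, headline_boost=2.0):
    """Bag-of-words counts in which headline words get extra weight.

    Hypothetical scheme: each body token counts 1.0, and each headline
    token adds `headline_boost` on top. The paper's actual weighting
    is not specified in the abstract.
    """
    counts = Counter()
    for word in body.lower().split():
        counts[word] += 1.0
    for word in headline.lower().split():
        counts[word] += headline_boost
    return dict(counts)


def perplexity(token_probs):
    """Perplexity of held-out tokens: exp of the negative mean log-probability.

    `token_probs` holds the probability a model assigns to each token;
    lower perplexity means the model predicts the text better.
    """
    log_likelihood = sum(math.log(p) for p in token_probs)
    return math.exp(-log_likelihood / len(token_probs))


counts = weighted_counts(
    headline="rates rise",
    body="the bank said rates will rise and rates will stay",
)
# "rise" occurs once in the body, yet outweighs the more frequent "will"
# because it also appears in the headline.
print(counts["rise"], counts["will"])
```

In this toy input the headline lifts a low-frequency word above a higher-frequency function word, which is the kind of imbalance the abstract describes. The weighted counts could then replace raw counts as input to a topic model, again as an assumption rather than the authors' procedure.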

Notes

  1. http://research.nii.ac.jp/ntcir/index-en.html

  2. http://trec.nist.gov/

  3. http://kdd.ics.uci.edu/database/reuters21578/reuters21578.html

  4. https://tartarus.org/martin/PorterStemmer/

  5. http://sourceforge.net/projects/jgibblda/

Acknowledgment

This research is jointly supported by the Natural Science Foundation of China (Grant Nos. 61866029 and 61763034) and the Natural Science Foundation of Inner Mongolia Autonomous Region (Grant No. 2018MS06025).

Author information

Corresponding author

Correspondence to Rong Yan.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Yan, R., Gao, G. (2020). Topic Analysis by Exploring Headline Information. In: Huang, Z., Beek, W., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2020. WISE 2020. Lecture Notes in Computer Science, vol 12343. Springer, Cham. https://doi.org/10.1007/978-3-030-62008-0_9

  • DOI: https://doi.org/10.1007/978-3-030-62008-0_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62007-3

  • Online ISBN: 978-3-030-62008-0

  • eBook Packages: Computer Science, Computer Science (R0)
