Topic Analysis by Exploring Headline Information

Yan, Rong; Gao, Guanglai

doi:10.1007/978-3-030-62008-0_9


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12343)

Included in the following conference series:

International Conference on Web Information Systems Engineering


Abstract

In standard topic models, every word in a document carries the same weight under the bag-of-words assumption. As a result, word-topic assignments lean toward high-frequency words and neglect the influence of low-frequency words, which ultimately degrades the quality of the topic representation. Statistical information gathered from the whole document collection can generally be used to mitigate this problem. In addition, the headlines of certain kinds of documents, such as news articles, usually summarize the key elements of the document, so headline words are well suited to represent its topics. However, few previous studies have exploited this rich headline information, even though it is significant for topic modeling. In this paper, we propose a new headline-based topic model that aims to produce well-formed topic descriptions. Experimental results on three widely used datasets show that the proposed headline-based modeling scheme achieves lower perplexity.
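The argument in the abstract can be made concrete with a small sketch. It is a hypothetical illustration, not the headline-based model proposed in the paper. `weighted_counts` builds the bag-of-words weights of one document: `headline_weight=1.0` gives the standard equal weighting, and a larger value is an assumed, deliberately simple way of boosting headline words. `perplexity` computes the standard held-out perplexity, exp(-Σ_d log p(w_d) / Σ_d N_d), for an LDA-style model with fixed document-topic proportions `theta` and topic-word distributions `phi`; lower values indicate a better fit. All function and parameter names are illustrative assumptions.

```python
import math
from collections import Counter


def weighted_counts(headline, body, headline_weight=1.0):
    """Bag-of-words term weights for one document.

    headline_weight=1.0 reproduces the standard bag of words: every token
    counts once. A larger value is a hypothetical headline boost used only
    for illustration; it is not the paper's model.
    """
    counts = Counter()
    for tok in body.lower().split():
        counts[tok] += 1.0
    for tok in headline.lower().split():
        counts[tok] += headline_weight
    return counts


def perplexity(docs, theta, phi):
    """Held-out perplexity: exp(-sum_d log p(w_d) / sum_d N_d).

    docs:  list of documents, each a list of word ids
    theta: theta[d][k] = p(topic k | document d)
    phi:   phi[k][w]   = p(word w | topic k)
    """
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            # p(w | d) marginalizes over topics: sum_k theta[d][k] * phi[k][w]
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


body = "the river rose and the water reached the city the mayor said"
plain = weighted_counts("Flood hits city", body)
boosted = weighted_counts("Flood hits city", body, headline_weight=3.0)
print(plain["the"], plain["flood"], boosted["flood"], boosted["city"])
# 4.0 1.0 3.0 4.0
```

With equal weighting, the uninformative body word "the" gets a weight of 4 while the headline word "flood" gets only 1; with `headline_weight=3.0`, "flood" rises to 3 and "city" to 4. This shows in miniature why frequent body words can dominate plain counts.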



Acknowledgment

This research is jointly supported by the Natural Science Foundation of China (Grant Nos. 61866029 and 61763034) and the Natural Science Foundation of Inner Mongolia Autonomous Region (Grant No. 2018MS06025).

Author information

Authors and Affiliations

College of Computer Science, Inner Mongolia University, Hohhot, People’s Republic of China
Rong Yan & Guanglai Gao
Inner Mongolia Key Laboratory of Mongolian Information Processing Technology, Hohhot, People’s Republic of China
Rong Yan & Guanglai Gao


Corresponding author

Correspondence to Rong Yan.

Editor information

Editors and Affiliations

VU Amsterdam, Amsterdam, The Netherlands
Zhisheng Huang
VU Amsterdam, Amsterdam, The Netherlands
Wouter Beek
Victoria University, Melbourne, VIC, Australia
Hua Wang
Swinburne University of Technology, Hawthorn, VIC, Australia
Rui Zhou
Victoria University, Melbourne, VIC, Australia
Yanchun Zhang

About this paper

Cite this paper

Yan, R., Gao, G. (2020). Topic Analysis by Exploring Headline Information. In: Huang, Z., Beek, W., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2020. Lecture Notes in Computer Science, vol 12343. Springer, Cham. https://doi.org/10.1007/978-3-030-62008-0_9


DOI: https://doi.org/10.1007/978-3-030-62008-0_9
Published: 21 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62007-3
Online ISBN: 978-3-030-62008-0
eBook Packages: Computer Science, Computer Science (R0)
