Abstract
Standard topic models adopt the ‘bag of words’ assumption, so every word that appears in a document carries the same weight in the topic representation. Word-topic assignment therefore leans toward high-frequency words and neglects the influence of low-frequency words, which ultimately degrades the quality of the topic representation. Statistical information gathered from the whole document collection can generally be used to mitigate this problem. In addition, the headlines of certain kinds of documents, such as news articles, usually summarize the important elements of the document, so the words in headlines are better suited to representing its topics. However, few previous studies have considered the rich information in headlines, which is significant for topic modeling. In this paper, we propose a new headline-based topic model in order to obtain well-formed topic descriptions. Experimental results on three widely used datasets show that the proposed headline-based modeling scheme achieves lower perplexity.
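Since the abstract reports results in terms of perplexity, the following minimal sketch shows how that metric is conventionally computed for a topic model: the exponential of the negative average per-token log-likelihood. It is not the authors' implementation. The function name `perplexity` and the inputs `docs`, `theta` (document-topic proportions) and `phi` (topic-word probabilities) are assumptions made for illustration only.

```python
import math


def perplexity(docs, theta, phi):
    """Perplexity = exp(-sum_d log p(w_d) / sum_d N_d).

    docs  : list of documents, each a list of integer word ids
    theta : theta[d][k] = probability of topic k in document d (assumed given)
    phi   : phi[k][w]   = probability of word w under topic k (assumed given)
    """
    log_lik = 0.0
    n_tokens = 0
    for d, words in enumerate(docs):
        for w in words:
            # Mixture likelihood of one token: p(w | d) = sum_k theta[d][k] * phi[k][w]
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


# Toy usage: two topics that are both uniform over a 4-word vocabulary.
toy_docs = [[0, 1, 2, 3]]
toy_theta = [[0.5, 0.5]]
toy_phi = [[0.25] * 4, [0.25] * 4]
toy_score = perplexity(toy_docs, toy_theta, toy_phi)
```

With uniform topics, every token has probability 0.25, so the score is approximately the vocabulary size, 4. A model that assigns higher probability to the observed words yields a lower value, which is why lower perplexity is read as a better fit.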
Acknowledgment
This research is jointly supported by the Natural Science Foundation of China (Grant Nos. 61866029 and 61763034) and the Natural Science Foundation of Inner Mongolia Autonomous Region (Grant No. 2018MS06025).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Yan, R., Gao, G. (2020). Topic Analysis by Exploring Headline Information. In: Huang, Z., Beek, W., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2020. WISE 2020. Lecture Notes in Computer Science, vol 12343. Springer, Cham. https://doi.org/10.1007/978-3-030-62008-0_9
DOI: https://doi.org/10.1007/978-3-030-62008-0_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62007-3
Online ISBN: 978-3-030-62008-0
eBook Packages: Computer Science, Computer Science (R0)