A fast approach to identify trending articles in hot topics from XML based big bibliographic datasets

Swaraj, K. P.; Manjula, D.

doi:10.1007/s10586-016-0561-1

Published: 31 March 2016

Volume 19, pages 837–848 (2016)

Cluster Computing

369 Accesses
5 Citations

Abstract

XML based big bibliographic datasets, which provide metadata about the articles published in a domain, are now common across many fields. Their well-defined tags record the year, title, authors, abstract, keywords, article type, publication venue and other details of each article, and many statistics can be extracted from them. However, a tag giving the domain sub-topic of an article is usually absent, because the sub-topic is not an attribute of the article itself. Statistics by sub-topic therefore require each article to be mapped to its associated sub-domain first. This paper investigates this problem and proposes a fast approach to find trending articles and hot topics from XML based big bibliographic datasets. The proposed framework first uses a domain ontology to classify articles into their sub-topics. Fast detection of hot topics, trending keywords and articles is then achieved using novel MapReduce algorithms implemented on a Hadoop distributed framework. A performance comparison shows that the approach outperforms its non-MapReduce counterpart in quickly identifying the trending keywords and titles in a particular hot topic from an XML based bibliographic dataset.
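
The map and reduce idea described in the abstract can be illustrated with a small single-process sketch. It is not the authors' algorithm and does not run on Hadoop: it only shows the general shape of counting title keywords per year from XML records. The tag names (`article`, `title`, `year`), the sample records, the stopword list and all function names are assumptions made for illustration, and the ontology-based classification step is left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import chain

# Hypothetical DBLP-style sample; the tag names are assumptions, not the paper's schema.
SAMPLE_XML = """<dblp>
  <article><title>Ontology based text mining</title><year>2015</year></article>
  <article><title>MapReduce text clustering</title><year>2015</year></article>
  <article><title>Parallel XML parsing</title><year>2014</year></article>
</dblp>"""

# Illustrative stopword list (an assumption).
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "based", "and"}


def map_record(record):
    """Map step: emit ((year, keyword), 1) for each title word of one record."""
    year = record.findtext("year")
    title = record.findtext("title", default="")
    for word in title.lower().split():
        if word not in STOPWORDS:
            yield (year, word), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each (year, keyword) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def trending_keywords(xml_text, year, top_n=3):
    """Return the top_n (keyword, count) pairs for one year, most frequent first."""
    root = ET.fromstring(xml_text)
    pairs = chain.from_iterable(map_record(r) for r in root.iter("article"))
    totals = reduce_counts(pairs)
    ranked = sorted(
        ((kw, n) for (y, kw), n in totals.items() if y == year),
        key=lambda item: (-item[1], item[0]),  # by count, then alphabetically
    )
    return ranked[:top_n]


if __name__ == "__main__":
    print(trending_keywords(SAMPLE_XML, "2015"))
```

In a distributed setting the map and reduce functions would run on separate nodes over partitions of the dataset; here they are ordinary Python functions chained in one process to keep the sketch runnable.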

Author information

Authors and Affiliations

Department of Computer Science & Engineering, Anna University, Chennai, India
K. P. Swaraj & D. Manjula

Corresponding author

Correspondence to K. P. Swaraj.

About this article

Cite this article

Swaraj, K.P., Manjula, D. A fast approach to identify trending articles in hot topics from XML based big bibliographic datasets. Cluster Comput 19, 837–848 (2016). https://doi.org/10.1007/s10586-016-0561-1

Received: 18 November 2015
Revised: 14 March 2016
Accepted: 16 March 2016
Published: 31 March 2016
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10586-016-0561-1
