
A fast approach to identify trending articles in hot topics from XML based big bibliographic datasets

Cluster Computing

Abstract

XML-based big bibliographic datasets are now common in many domains, where they provide metadata about the articles published in each domain. Their well-defined tags record the year, title, authors, abstract, keywords, article type, publication venue and other specific details of each article, so many statistics can be extracted from them. However, a tag giving the domain subtopic of an article is usually absent, because the subtopic is not an attribute of the article itself. Computing subtopic-level statistics therefore requires first mapping each article to its associated subdomain. This paper investigates this problem and proposes a fast approach to finding trending articles and hot topics in XML-based big bibliographic datasets. The proposed framework first uses a domain ontology to classify articles into their subtopics. Hot topics, trending keywords and trending articles are then detected quickly using novel MapReduce algorithms implemented on the Hadoop distributed framework. A performance comparison shows that the approach outperforms its non-MapReduce counterpart in quickly identifying the trending keywords and titles within a particular hot topic of an XML-based bibliographic dataset.
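The pipeline described in the abstract (parse XML records, map each article to subtopics through an ontology, then count keywords with map and reduce steps) can be sketched in a single process as below. This is an illustrative sketch, not the authors' algorithm: the DBLP-like record layout, the toy `ONTOLOGY`, and the helpers `parse_records`, `classify`, `map_phase`, `reduce_phase` and `trending_keywords` are all hypothetical, and "trending" is assumed here to mean the largest growth in keyword count between two years. On Hadoop, the mapper and reducer would run as distributed tasks rather than as in-process generators.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import chain

# Hypothetical toy ontology: subtopic -> indicative terms (an assumption).
ONTOLOGY = {
    "data mining": {"clustering", "association rules", "frequent itemsets"},
    "distributed computing": {"mapreduce", "hadoop", "cluster"},
}

# Hypothetical DBLP-like sample records.
SAMPLE_XML = """<dblp>
  <article><title>Clustering with frequent itemsets</title><year>2014</year>
    <keyword>clustering</keyword><keyword>frequent itemsets</keyword></article>
  <article><title>Hadoop job scheduling</title><year>2014</year>
    <keyword>hadoop</keyword></article>
  <article><title>Scalable clustering on MapReduce</title><year>2015</year>
    <keyword>clustering</keyword><keyword>mapreduce</keyword></article>
  <article><title>Fast clustering of text</title><year>2015</year>
    <keyword>clustering</keyword></article>
</dblp>"""


def parse_records(xml_text):
    """Yield (title, year, keywords) for every <article> element."""
    root = ET.fromstring(xml_text)
    for art in root.iter("article"):
        title = art.findtext("title", default="")
        year = int(art.findtext("year", default="0"))
        keywords = [k.text.strip().lower() for k in art.findall("keyword") if k.text]
        yield title, year, keywords


def classify(keywords):
    """Return every subtopic whose ontology terms overlap the keywords."""
    kw = set(keywords)
    return [topic for topic, terms in ONTOLOGY.items() if kw & terms]


def map_phase(record):
    """Mapper: emit ((subtopic, keyword, year), 1) for each classified article."""
    _title, year, keywords = record
    for topic in classify(keywords):
        for kw in keywords:
            yield (topic, kw, year), 1


def reduce_phase(pairs):
    """Reducer: sum the emitted counts per key (the shuffle is implicit here)."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return counts


def trending_keywords(counts, topic, prev_year, curr_year, top_n=3):
    """Rank keywords in one subtopic by count growth from prev_year to curr_year."""
    growth = defaultdict(int)
    for (t, kw, year), c in counts.items():
        if t != topic:
            continue
        if year == curr_year:
            growth[kw] += c
        elif year == prev_year:
            growth[kw] -= c
    # Highest growth first; ties broken alphabetically.
    return sorted(growth.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    records = parse_records(SAMPLE_XML)
    counts = reduce_phase(chain.from_iterable(map_phase(r) for r in records))
    print(trending_keywords(counts, "data mining", 2014, 2015))
```

Running the script prints `[('clustering', 1), ('mapreduce', 1), ('frequent itemsets', -1)]`. Because an article can match several subtopics, its keywords are counted under each of them; a real ontology-based classifier might instead assign a single best-matching subtopic.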


[Figs. 1 to 12: images not reproduced]



Author information


Correspondence to K. P. Swaraj.


About this article


Cite this article

Swaraj, K.P., Manjula, D. A fast approach to identify trending articles in hot topics from XML based big bibliographic datasets. Cluster Comput 19, 837–848 (2016). https://doi.org/10.1007/s10586-016-0561-1


  • DOI: https://doi.org/10.1007/s10586-016-0561-1
