Abstract
Efficient retrieval of scientific literature related to a certain topic plays a key role in research work. While little has been done on topic-enabled citation filtering in traditional citation tracing, this paper presents visual citation tracing of scientific papers with document topics taken into consideration. Improved term selection and weighting are employed for mining the most relevant citations. A variation of the TF-IDF scheme, which uses external domain resources as references, is proposed to calculate term weights in a particular domain. Moreover, document weight is incorporated into the calculation of term weights from a group of citations. A simple hierarchical word weighting method is also presented to handle keyword phrases. A visual interface is designed and implemented to present the citation tracks interactively as chord diagrams and Sankey diagrams.
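The abstract does not state the weighting formula, so the following is only a minimal sketch of the general idea: term frequency is accumulated over a group of citations and scaled by a per-document weight, while the inverse document frequency is estimated from an external domain corpus rather than from the citations themselves. The function name `weighted_term_scores`, its parameters, the length-normalised term frequency, and the smoothed IDF form are all assumptions made for illustration; the scheme proposed in the paper may differ, including in how the reference corpus influences the weight.

```python
import math
from collections import Counter


def weighted_term_scores(docs, doc_weights, reference_docs):
    """Score terms in a group of citations (illustrative sketch only).

    docs: list of token lists (the citations being mined).
    doc_weights: one non-negative weight per entry in docs (hypothetical input).
    reference_docs: list of token lists from an external domain corpus,
        used only to estimate how common each term is in the domain.
    """
    if len(docs) != len(doc_weights):
        raise ValueError("docs and doc_weights must have the same length")

    # Document frequency of each term in the external reference corpus.
    ref_df = Counter()
    for tokens in reference_docs:
        ref_df.update(set(tokens))
    n_ref = len(reference_docs)

    # Length-normalised term frequency, accumulated over the group and
    # scaled by each document's weight.
    weighted_tf = Counter()
    for tokens, weight in zip(docs, doc_weights):
        counts = Counter(tokens)
        total = sum(counts.values())
        if total == 0:
            continue  # skip empty documents
        for term, count in counts.items():
            weighted_tf[term] += weight * count / total

    # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1, which stays finite
    # for terms that never occur in the reference corpus.
    return {
        term: tf * (math.log((1 + n_ref) / (1 + ref_df[term])) + 1.0)
        for term, tf in weighted_tf.items()
    }
```

Under these assumptions, a term that is frequent in highly weighted citations but rare in the reference corpus receives the largest score, and a term that appears in every reference document keeps only its weighted frequency.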
Acknowledgments
The research is supported by the FP7 Programme of the European Commission within projects Dr Inventor [FP7-ICT-611383] and CARRE [FP7-ICT-611140]. We would like to thank the European Commission for the funding and thank the project officers and reviewers for their indispensable support for both of the projects.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Zhao, Y. et al. (2017). Topic-Aware Visual Citation Tracing via Enhanced Term Weighting for Efficient Literature Retrieval. In: Francalanci, C., Helfert, M. (eds) Data Management Technologies and Applications. DATA 2016. Communications in Computer and Information Science, vol 737. Springer, Cham. https://doi.org/10.1007/978-3-319-62911-7_5
DOI: https://doi.org/10.1007/978-3-319-62911-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62910-0
Online ISBN: 978-3-319-62911-7
eBook Packages: Computer Science, Computer Science (R0)