Abstract
An institutional research output data center stores standardized, reliable data on scholars' research output. It supports dynamic presentation of that output, reveals the institution's academic publications along multiple dimensions, advances open access, and provides data support for subject evaluation and discipline development.
In this paper, we propose a data quality management framework for building an institutional research output data center, and we put forward technical solutions to specific data governance problems: department name similarity estimation in data matching, author name disambiguation in data merging, and security in data exchange. We also introduce learning algorithms based on text distance and on community detection with matrix factorization. Compared with alternative approaches, our methods achieve good performance in data quality management.
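To make the department name matching step concrete, the following is a minimal sketch of a text-distance similarity score. The abstract says only that a text distance is used; the choice of normalized Levenshtein edit distance, the function names (`levenshtein`, `similarity`, `best_match`), the whitespace and case normalization, and the 0.8 threshold are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def normalize(name: str) -> str:
    """Assumed preprocessing: lowercase and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical after normalization."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def best_match(name: str, candidates: Iterable[str],
               threshold: float = 0.8) -> Optional[str]:
    """Return the most similar candidate, or None if none reaches the threshold.

    The threshold value is a hypothetical default, not taken from the paper.
    """
    scored = [(similarity(name, c), c) for c in candidates]
    if not scored:
        return None
    score, candidate = max(scored)
    return candidate if score >= threshold else None
```

For example, `best_match("School of Mathmatics", ["School of Mathematics", "School of Medicine"])` returns `"School of Mathematics"`, because the misspelled name is one edit away from it. In practice the threshold would need tuning against labeled department pairs, and names that differ by abbreviation rather than spelling would likely need extra handling.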
Acknowledgments
This work was supported by NSFC (Grant No. 61772330), the Science and Technology Commission of Shanghai Municipality (Grant No. 16JC1402800), China Next Generation Internet IPv6 project (Grant No. NGII20170609), the Social Science Planning of Shanghai (Grant No. 2018BTQ002), and Arts and Science Cross Special Fund of Shanghai JiaoTong University (Grant No. 15JCMY08).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Shi, X., Xing, Z., Lu, H. (2019). Data Quality Management in Institutional Research Output Data Center. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science, vol 11448. Springer, Cham. https://doi.org/10.1007/978-3-030-18590-9_10
DOI: https://doi.org/10.1007/978-3-030-18590-9_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18589-3
Online ISBN: 978-3-030-18590-9
eBook Packages: Computer Science, Computer Science (R0)