Data Quality Management in Institutional Research Output Data Center

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2019)

Abstract

An institutional research output data center stores standardized and credible data on scholars' research output. It supports dynamic presentation of that output, reveals an institution's academic publications along multiple dimensions, advances open access, and provides data support for subject evaluation and discipline development.

In this paper, we propose a data quality management framework for building an institutional research output data center, and put forward technical solutions for several data governance problems: department name similarity estimation in data matching, author name disambiguation in data merging, and security issues in data exchange. We also introduce algorithms such as text distance and community detection with matrix factorization. Compared with alternative approaches, our methods achieve good performance in data quality management.
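As an illustration of department name similarity estimation with a text distance, the following minimal sketch uses the Levenshtein distance referenced in the notes. It is not the authors' implementation: the function names `levenshtein` and `name_similarity`, the lowercasing step, and the normalization by the longer string's length are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical after case folding (assumed normalization)."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

Two department name variants could then be treated as candidates for the same unit when `name_similarity` exceeds a threshold chosen on labeled examples; the threshold itself is a tuning decision that this sketch does not fix.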

Notes

  1. https://en.wikipedia.org/wiki/Levenshtein_distance/

  2. http://webofknowledge.com/

  3. https://oauth.net/

  4. https://orcid.org/

Acknowledgments

This work was supported by NSFC (Grant No. 61772330), the Science and Technology Commission of Shanghai Municipality (Grant No. 16JC1402800), the China Next Generation Internet IPv6 project (Grant No. NGII20170609), the Social Science Planning of Shanghai (Grant No. 2018BTQ002), and the Arts and Science Cross Special Fund of Shanghai Jiao Tong University (Grant No. 15JCMY08).

Corresponding author

Correspondence to Xiaohua Shi.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Shi, X., Xing, Z., Lu, H. (2019). Data Quality Management in Institutional Research Output Data Center. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science, vol 11448. Springer, Cham. https://doi.org/10.1007/978-3-030-18590-9_10

  • DOI: https://doi.org/10.1007/978-3-030-18590-9_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18589-3

  • Online ISBN: 978-3-030-18590-9

  • eBook Packages: Computer Science (R0)
