Abstract
Cold start knowledge base population (KBP) is the problem of populating a knowledge base from unstructured documents. While neural networks have improved the individual tasks that make up KBP, the overall F1 of the end-to-end system remains quite low. The problem is more acute for personal knowledge bases, which present additional challenges with regard to data protection, fairness, and privacy. In this work, we use data augmentation to populate a more complete personal knowledge base from the TACRED dataset. We then use explainability techniques and representative set sampling to show that the augmented knowledge base is also fairer and more diverse.
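The abstract does not describe how the augmentation is performed. As a purely illustrative sketch, one common form of fairness-oriented augmentation adds a counterfactual copy of each training sentence in which gendered tokens are swapped while the relation label is kept. The swap table, the function names, and the `token` and `relation` field names (modeled on TACRED-style records) are assumptions for illustration and should not be read as the authors' method.

```python
from typing import Dict, List

# Hypothetical one-to-one swap table (an assumption, not from the paper).
# Ambiguous pairs such as "his"/"her" are left out to keep the swap reversible.
SWAPS: Dict[str, str] = {
    "he": "she", "she": "he",
    "himself": "herself", "herself": "himself",
    "father": "mother", "mother": "father",
    "son": "daughter", "daughter": "son",
    "husband": "wife", "wife": "husband",
    "brother": "sister", "sister": "brother",
}


def augment(example: dict) -> dict:
    """Return a counterfactual copy of the example with gendered tokens swapped.

    Swaps are token-for-token, so the sentence length and any span indices
    stored in the example remain valid. The relation label is unchanged.
    """
    new_tokens: List[str] = []
    for tok in example["token"]:
        swapped = SWAPS.get(tok.lower())
        if swapped is None:
            new_tokens.append(tok)
        elif tok[:1].isupper():
            # Preserve sentence-initial capitalization.
            new_tokens.append(swapped.capitalize())
        else:
            new_tokens.append(swapped)
    return {**example, "token": new_tokens}


def augment_dataset(examples: List[dict]) -> List[dict]:
    """Keep every original example and append its counterfactual if it differs."""
    out: List[dict] = []
    for ex in examples:
        out.append(ex)
        cf = augment(ex)
        if cf["token"] != ex["token"]:
            out.append(cf)
    return out


if __name__ == "__main__":
    sample = {"token": ["She", "is", "the", "mother", "of", "John"],
              "relation": "per:children"}
    for row in augment_dataset([sample]):
        print(" ".join(row["token"]), "->", row["relation"])
```

A scheme of this kind can only balance the attributes listed in the swap table, so judging whether the resulting knowledge base is fairer still requires a separate evaluation step.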
Acknowledgement
This work was done as part of the Global Remote Mentoring initiative of IBM University Relations to promote undergraduate student research. We thank Kalapriya Kannan, Dinesh Garg, Poornima Iyengar, Kranti Athalye, and Nitte Meenakshi Institute of Technology for their support.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Vannur, L.S., Ganesan, B., Nagalapatti, L., Patel, H., Tippeswamy, M.N. (2021). Data Augmentation for Fairness in Personal Knowledge Base Population. In: Gupta, M., Ramakrishnan, G. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12705. Springer, Cham. https://doi.org/10.1007/978-3-030-75015-2_15
DOI: https://doi.org/10.1007/978-3-030-75015-2_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75014-5
Online ISBN: 978-3-030-75015-2
eBook Packages: Computer Science, Computer Science (R0)