
Data Augmentation for Fairness in Personal Knowledge Base Population

  • Conference paper
Trends and Applications in Knowledge Discovery and Data Mining (PAKDD 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12705)

Abstract

Cold-start knowledge base population (KBP) is the problem of populating a knowledge base from unstructured documents. While neural networks have improved the individual tasks that make up KBP, the overall F1 of the end-to-end system remains quite low. This problem is more acute in personal knowledge bases, which present additional challenges with regard to data protection, fairness, and privacy. In this work, we use data augmentation to populate a more complete personal knowledge base from the TACRED dataset. We then use explainability techniques and representative set sampling to show that the augmented knowledge base is also fairer and more diverse.

Acknowledgement

This work was done as part of the Global Remote Mentoring initiative of IBM University Relations to promote undergraduate student research. We thank Kalapriya Kannan, Dinesh Garg, Poornima Iyengar, Kranti Athalye, and Nitte Meenakshi Institute of Technology for their support.

Author information

Corresponding author

Correspondence to Balaji Ganesan.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 136 KB)

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Vannur, L.S., Ganesan, B., Nagalapatti, L., Patel, H., Tippeswamy, M.N. (2021). Data Augmentation for Fairness in Personal Knowledge Base Population. In: Gupta, M., Ramakrishnan, G. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12705. Springer, Cham. https://doi.org/10.1007/978-3-030-75015-2_15

  • DOI: https://doi.org/10.1007/978-3-030-75015-2_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-75014-5

  • Online ISBN: 978-3-030-75015-2

  • eBook Packages: Computer Science, Computer Science (R0)
