Abstract
The mainstream narrative about open data excludes any consideration of Indigenous voices. Instead, it is dominated by Western rhetoric that reiterates the benefits open data provides for society and fails to recognise the negative impact open data can have on Indigenous peoples. This paper explores the problem of open data in digital repositories and discusses why open data is harmful in the development of Indigenous natural language processing tools. It begins with a brief introduction to open data, then examines the negative impact that open data has on Indigenous peoples by describing what data is collected, how it is collected, who has access to it, and what it is used for. The paper then offers an alternative approach to good data sharing, drawing on the experiences of Te Reo Irirangi o Te Hiku o te Ika (Te Hiku Media), a tribal media hub based in Aotearoa New Zealand, as an example of how a small Indigenous community-based organisation collects, stores, and protects its data.
Notes
Social security number, national insurance number.
Acknowledgements
The work by Te Hiku Media that led to this paper was funded by the Ministry of Business, Innovation and Employment through the Strategic Science Investment Fund and by Te Puni Kōkiri through the Ka Hao fund.
Author information
Contributions
All authors contributed, read and approved the final manuscript.
Ethics declarations
Conflict of interest
All authors are currently employed by Te Reo Irirangi o Te Hiku o te Ika.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jones, PL., Mahelona, K., Duncan, S. et al. Kaitiaki: closing the door on open Indigenous data. Int J Digit Libr 26, 1 (2025). https://doi.org/10.1007/s00799-025-00410-2
DOI: https://doi.org/10.1007/s00799-025-00410-2