Abstract
African historical data presents unique challenges to search algorithms because much of it was produced by colonial authorities or archivists far from its source. Contemporary datasets include descriptions of artefacts held in European museums and books written by colonial administrators, both of which encode African history. Both are arguably biased collections, and the information retrieval algorithms used to search them may not provide modern researchers with relevant results. The goal of this study was therefore to investigate the degree to which common text and image pre-processing algorithms affect the quality of search results when users search a current African historical data collection. Nine common algorithms were compared in terms of recall, precision and normalised discounted cumulative gain (NDCG). The results indicate that text pre-processing performs better when stemming and stopping (stopword removal) are used, whereas the benefit of a thesaurus may depend on the thesaurus chosen. Results from the image pre-processing experiment indicate that shape detectors generally work better than colour detectors.
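As an illustration of the three evaluation measures named in the abstract, the following is a minimal Python sketch of precision, recall and NDCG for a single query. It is not the authors' evaluation code: the function names, the cut-off `k`, the document identifiers and the graded relevance judgements are all hypothetical, and the sketch assumes the common formulation of DCG with a log2 rank discount.

```python
import math


def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall over the top-k results of one query.

    Precision is divided by k (not by the number of results returned),
    which is one common convention; other conventions exist.
    """
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_ids, graded_relevance, k):
    """DCG of the ranking divided by the DCG of an ideal ranking."""
    gains = [graded_relevance.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal_gains = sorted(graded_relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal_gains)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgements: document id -> graded relevance (0 = not relevant).
judgements = {"d1": 2, "d2": 1, "d5": 2}
# Hypothetical ranking returned by a search system for the same query.
ranking = ["d3", "d1", "d7", "d2"]

precision, recall = precision_recall_at_k(ranking, set(judgements), k=4)
score = ndcg_at_k(ranking, judgements, k=4)
print(f"precision@4={precision:.3f} recall@4={recall:.3f} ndcg@4={score:.3f}")
```

Precision and recall treat relevance as binary, while NDCG also rewards placing the more relevant items nearer the top of the ranking, which is why the three measures can disagree about which pre-processing configuration is best.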
Acknowledgements
This research was partially funded by the National Research Foundation of South Africa (Grant numbers: 105862, 119121 and 129253) and the University of Cape Town. The authors acknowledge that opinions, findings and conclusions or recommendations expressed in this publication are those of the authors, and that the NRF accepts no liability whatsoever in this regard.
We would like to acknowledge the Archive & Public Culture research initiative at the University of Cape Town for allowing this research to use the Five Hundred Year Archive data collection for the purposes of this study.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Singh, S., Suleman, H. (2022). A Comparison of Information Retrieval Pre-processing Algorithms Applied to African Historical Data. In: Tseng, YH., Katsurai, M., Nguyen, H.N. (eds) From Born-Physical to Born-Virtual: Augmenting Intelligence in Digital Libraries. ICADL 2022. Lecture Notes in Computer Science, vol 13636. Springer, Cham. https://doi.org/10.1007/978-3-031-21756-2_18
DOI: https://doi.org/10.1007/978-3-031-21756-2_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21755-5
Online ISBN: 978-3-031-21756-2
eBook Packages: Computer Science, Computer Science (R0)