Abstract
This project explores to what extent external semantic resources on companies can improve the accuracy of a real-world bank transaction classification system. The goal is to identify which implementations best exploit the additional company data retrieved from the Brønnøysund Registry and the Google Places API, and to measure their effects accurately. The classification system builds on a Bag-of-Words representation and uses Logistic Regression as the classification algorithm. The results suggest that enriching bank transactions with external company data substantially improves classification accuracy. Relative to the baseline accuracy of 89.22%, the Brønnøysund Registry and the Google Places API yield increases of 2.79 and 2.01 percentage points (pp), respectively. Combined, they yield an increase of 3.75pp.
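As a worked check of the reported figures, the short Python sketch below converts each percentage-point gain into an absolute accuracy and into the share of the baseline's misclassifications that the gain removes. The function names are hypothetical, and the "relative error reduction" is a derived quantity for illustration, not a measure reported in the abstract.

```python
# Figures reported in the abstract: accuracy in %, gains in percentage points (pp).
BASELINE_ACCURACY = 89.22
GAINS_PP = {
    "Brønnøysund Registry": 2.79,
    "Google Places API": 2.01,
    "Both sources": 3.75,
}


def enriched_accuracy(baseline: float, gain_pp: float) -> float:
    """Absolute accuracy (%) after adding a gain expressed in percentage points."""
    return round(baseline + gain_pp, 2)


def relative_error_reduction(baseline: float, gain_pp: float) -> float:
    """Fraction of the baseline's errors removed by the gain (derived, hypothetical measure)."""
    baseline_error = 100.0 - baseline  # 10.78% of transactions misclassified
    return round(gain_pp / baseline_error, 4)


if __name__ == "__main__":
    for source, gain in GAINS_PP.items():
        acc = enriched_accuracy(BASELINE_ACCURACY, gain)
        red = relative_error_reduction(BASELINE_ACCURACY, gain)
        print(f"{source}: {acc:.2f}% accuracy, {red:.1%} fewer errors")
```

Run as a script, this gives accuracies of 92.01%, 91.23% and 92.97%, and the combined enrichment removes roughly 34.8% of the baseline's errors. The combined gain of 3.75pp is also smaller than the sum of the individual gains (4.80pp), which suggests the two sources partly overlap in the information they contribute.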
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Vollset, E., Folkestad, E., Gallala, M.R., Gulla, J.A. (2017). Making Use of External Company Data to Improve the Classification of Bank Transactions. In: Cong, G., Peng, WC., Zhang, W., Li, C., Sun, A. (eds) Advanced Data Mining and Applications. ADMA 2017. Lecture Notes in Computer Science, vol. 10604. Springer, Cham. https://doi.org/10.1007/978-3-319-69179-4_54
DOI: https://doi.org/10.1007/978-3-319-69179-4_54
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69178-7
Online ISBN: 978-3-319-69179-4
eBook Packages: Computer Science, Computer Science (R0)