Abstract
As the variety of cosmetic products grows, their ingredients become harder to monitor, and users find it increasingly difficult to learn about the potential risks of the products they purchase. Two factors are mainly responsible: ingredients that can cause health problems, such as allergens and harmful chemicals, appear on labels under complicated names, and the label text itself is hard to read. As a result, consumers cannot accurately assess potential health risks. This project aims to identify harmful ingredients in cosmetic products and to inform users about the associated risks. To that end, we developed a mobile application that helps users make informed choices about cosmetics. Users photograph a product label with their smartphone. In the background, an OCR engine detects the text on the label and converts it into digital text. This text is passed to a pre-trained LSTM-based information extraction model, which identifies the ingredients among all the text on the label. The detected ingredients are then compared against a database of harmful ingredients using a word similarity algorithm. The database was compiled from reports and articles previously published by various organizations. Finally, the user is shown the names of all ingredients found in the product, the harmful ingredients among them, and the potential risks associated with those substances. The resulting smartphone application lets users learn the ingredients of a cosmetic product instantly by taking a photo, identifying ingredients and detecting harmful ones with high accuracy. In this way, it aims to protect consumers from potential health risks by enabling them to make informed choices.
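The final matching step, in which extracted ingredient names are compared against a database of harmful ingredients by word similarity, can be sketched as follows. This is a minimal illustration only: the abstract does not say which similarity algorithm is used, so the sketch substitutes the ratio from Python's standard `difflib.SequenceMatcher`. The database entries, the risk notes, the 0.85 threshold, and the function names are all assumptions made for the example and are not taken from the paper.

```python
from difflib import SequenceMatcher

# Hypothetical database: entries and risk notes are illustrative placeholders,
# not the contents of the database described in the paper.
HARMFUL_DB = {
    "formaldehyde": "placeholder risk note A",
    "methylisothiazolinone": "placeholder risk note B",
    "triclosan": "placeholder risk note C",
}


def normalize(name: str) -> str:
    """Lowercase a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] (stand-in for the paper's algorithm)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def flag_harmful(ingredients, db=HARMFUL_DB, threshold=0.85):
    """Return (ingredient, matched database name, risk note) for each match.

    An ingredient is flagged when its best-scoring database entry reaches
    the threshold; the threshold value is an assumption for this sketch.
    """
    flagged = []
    for ingredient in ingredients:
        best_name, best_score = None, 0.0
        for name in db:
            score = similarity(ingredient, name)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is not None and best_score >= threshold:
            flagged.append((ingredient, best_name, db[best_name]))
    return flagged


if __name__ == "__main__":
    # "Formaldehyd3" imitates an OCR misread of the final character.
    for hit in flag_harmful(["Aqua", "Formaldehyd3", "Glycerin"]):
        print(hit)
```

Fuzzy rather than exact matching is used here because OCR output often contains single-character misreads, which exact string comparison would miss.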





Data Availability
Due to the nature of the investigation, no supporting data is available for ethical reasons.
Funding
This work was supported by the Turkish Scientific and Technological Research Council (TÜBİTAK) under Grant Number 1919B012322132.
About this article
Cite this article
Sayallar, Ç., Sayar, A. Harmful ingredient detection from cosmetic products using optical character recognition and bi-LSTM model. SIViP 19, 338 (2025). https://doi.org/10.1007/s11760-025-03923-0
DOI: https://doi.org/10.1007/s11760-025-03923-0