Harmful ingredient detection from cosmetic products using optical character recognition and bi-LSTM model

  • Original Paper
  • Signal, Image and Video Processing

Abstract

As the variety of cosmetic products grows, their ingredients become harder to monitor, and users find it increasingly difficult to learn about the potential risks of the products they purchase. The main reasons are that ingredients that can cause health problems, such as allergens and harmful chemicals, appear on labels under complicated names, and that the label text itself is difficult to read. As a result, consumers cannot accurately assess potential health risks. The aim of this project is to identify harmful ingredients in cosmetic products and to inform the user about the potential risks of these products. To that end, we developed a mobile application that helps users make informed choices about cosmetics. Users photograph the label of a cosmetic product with their smartphone. The photo is passed to an OCR engine running in the background, which detects the text on the label and converts it into digital text. This text is sent to a pre-trained LSTM-based information extraction model, which identifies the ingredients within the full label text. The detected ingredients are then compared against a database of harmful ingredients using a word similarity algorithm. The database consists of harmful ingredients identified from reports and articles previously published by various organizations. Finally, the user is shown the harmful ingredients in the product, the potential risks associated with them, and the names of all the ingredients found in the product. The resulting smartphone application lets users learn the ingredients of a cosmetic product instantly by taking a photo; it identifies ingredients with high accuracy and detects harmful ones. In this way, the objective is to protect consumers from potential health risks in cosmetic products by helping them make informed choices.
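The final matching step described in the abstract, comparing extracted ingredient names against a database of harmful ingredients with a word similarity algorithm, can be illustrated with a minimal sketch. The abstract does not name the similarity measure, so this example assumes a character-level ratio from Python's standard `difflib` module. The database entries, the risk notes, the 0.85 threshold, and the function names `normalize`, `similarity`, and `flag_harmful` are all hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical database: names and notes are illustrative placeholders,
# not the list compiled by the authors.
HARMFUL_DB = {
    "formaldehyde": "illustrative risk note 1",
    "triclosan": "illustrative risk note 2",
    "methylparaben": "illustrative risk note 3",
}


def normalize(name: str) -> str:
    """Lowercase the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (assumed measure, not the paper's)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def flag_harmful(ingredients, database, threshold=0.85):
    """Return (ingredient, matched database name, score) for likely matches.

    Each extracted ingredient is paired with its most similar database
    entry and kept only if the score reaches the threshold, so small OCR
    errors (a dropped or misread character) can still produce a match.
    """
    flagged = []
    for ingredient in ingredients:
        best_name, best_score = None, 0.0
        for harmful_name in database:
            score = similarity(ingredient, harmful_name)
            if score > best_score:
                best_name, best_score = harmful_name, score
        if best_name is not None and best_score >= threshold:
            flagged.append((ingredient, best_name, round(best_score, 2)))
    return flagged


if __name__ == "__main__":
    # Simulated extractor output containing two OCR-style misreadings.
    extracted = ["Aqua", "Formaldehyd", "Tric1osan", "Glycerin"]
    for ingredient, match, score in flag_harmful(extracted, HARMFUL_DB):
        print(f"{ingredient!r} ~ {match!r} ({score}): {HARMFUL_DB[match]}")
```

A fuzzy threshold trades recall for precision: lowering it tolerates noisier OCR output but raises the chance that a harmless ingredient with a similar name is flagged, so in practice the value would need tuning on labelled labels.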

Data Availability

Due to the nature of the investigation, no supporting data is available for ethical reasons.

Funding

This work has been supported by the Turkish Scientific and Technological Research Council (TÜBITAK) under Grant Number 1919B012322132.

Author information

Corresponding author

Correspondence to Çağrı Sayallar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sayallar, Ç., Sayar, A. Harmful ingredient detection from cosmetic products using optical character recognition and bi-LSTM model. SIViP 19, 338 (2025). https://doi.org/10.1007/s11760-025-03923-0

  • DOI: https://doi.org/10.1007/s11760-025-03923-0
