IMPLEMENTATION OF OPTICAL CHARACTER RECOGNITION AND THE AHO-CORASICK ALGORITHM FOR DETECTING HARMFUL INGREDIENTS IN SKINCARE PRODUCTS

Authors

  • Alfin Syahrina, Universitas Bumigora
  • Bambang Krismono Triwijoyo, Computer Science Study Program, Faculty of Engineering, Universitas Bumigora
  • Muhammad Zulfikri, Computer Science Study Program, Faculty of Engineering, Universitas Bumigora

DOI:

https://doi.org/10.30587/indexia.v7i2.10242

Keywords:

OCR, Tesseract, Aho-Corasick, skincare, harmful ingredients, text extraction, automatic detection

Abstract

The growth of the skincare industry in Indonesia presents challenges regarding the presence of harmful chemical substances in products. General consumers often struggle to identify hazardous ingredients listed on product packaging. This study aims to develop an automated detection system using Tesseract OCR for text extraction from packaging images and the Aho-Corasick algorithm for detecting harmful ingredients. The dataset consists of 5,328 skincare ingredients from Kaggle and 1,004 hazardous substances from CDPH, classified into four risk categories. Experiments on 30 product images showed that Tesseract OCR achieved a text extraction accuracy of 93.43% (Word Accuracy) and 97.06% (Character Accuracy). The detection of hazardous substances using the Aho-Corasick algorithm reached 100% accuracy. These results indicate that the system is effective in assisting consumers in identifying harmful ingredients in skincare products.
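The matching stage described above can be illustrated with a minimal sketch of Aho-Corasick multi-pattern search in pure Python. It assumes the OCR step (for example Tesseract) has already returned the ingredient list as a plain string. The names `build_automaton`, `find_patterns`, `detect_hazards`, and the small `HAZARDS` table with its risk labels are hypothetical and chosen only for illustration; they are not the authors' implementation and not the CDPH data.

```python
from collections import deque

# Hypothetical hazard table for illustration only (not the CDPH dataset).
HAZARDS = {"hydroquinone": "high", "mercury": "high", "methylparaben": "medium"}

def build_automaton(patterns):
    """Build the trie, failure links, and output lists for the patterns."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep the root as their failure link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_patterns(text, automaton):
    """Return (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits

def detect_hazards(ocr_text, hazards):
    """Map each hazardous ingredient found in the OCR text to its risk label."""
    automaton = build_automaton([name.lower() for name in hazards])
    found = {pat for _, pat in find_patterns(ocr_text.lower(), automaton)}
    return {name: risk for name, risk in hazards.items() if name.lower() in found}

if __name__ == "__main__":
    text = "Ingredients: Aqua, Glycerin, Hydroquinone, Methylparaben, Parfum"
    print(detect_hazards(text, HAZARDS))
```

Because the automaton follows failure links instead of restarting, the text is scanned once regardless of how many ingredient names are in the list, which is the property that makes the algorithm suitable for a hazard list of about a thousand entries. This sketch matches exact substrings only, so OCR misreads of an ingredient name would need separate handling.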

Published

2025-10-10

How to Cite

Syahrina, A., Krismono Triwijoyo, B., & Zulfikri, M. (2025). IMPLEMENTASI OPTICAL CHARACTER RECOGNITION DAN ALGORITMA AHO-CORASICK UNTUK DETEKSI BAHAN BERBAHAYA PADA PRODUK SKINCARE. Indexia, 7(2), 88–96. https://doi.org/10.30587/indexia.v7i2.10242
