PREDICTING TYPES OF HEALTHCARE DATA BREACHES IN THE US FROM HIPAA REPORTS USING LIGHTGBM AND THE OSEMN FRAMEWORK

Authors

DOI:

https://doi.org/10.30587/indexia.v7i2.10220

Keywords:

HIPAA, LightGBM, OSEMN, Breach Prediction, Text Mining

Abstract

The growing number of data breach incidents in the United States healthcare sector calls for a comprehensive analysis of the patterns, trends, and predictability of the breach types involved. This study analyzes 1,654 data breach reports drawn from official HIPAA publications covering 2009 to 2016. Following the OSEMN framework (Obtain, Scrub, Explore, Model, and Interpret), descriptive data exploration and text mining are used to identify how incidents are distributed by time, geographic location, entity type, and the location where the data was stored. A model that predicts the breach type (Type of Breach) is then built with the Light Gradient Boosting Machine (LightGBM) algorithm, combined with preprocessing, one-hot encoding, SMOTE for class balancing, and hyperparameter tuning through GridSearchCV. Evaluation with the macro F1-score shows that the model performs multi-class classification well, particularly on the majority classes. These findings contribute to the understanding of health information security risks and provide a basis for developing early detection systems built on historical data.
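Because the abstract reports a macro F1-score and notes that performance is strongest on the majority classes, a small worked example may help show what that metric measures. The sketch below is not the authors' code: the function name `macro_f1`, the three class labels, and the ten toy predictions are all hypothetical and chosen only for illustration. It computes a per-class F1 and then averages the classes without weighting, so a poorly predicted rare class lowers the score even when overall accuracy looks acceptable.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return (macro F1, per-class F1 dict) for two equal-length label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class gains a false positive
            fn[truth] += 1      # true class gains a false negative
    per_class = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class is never hit
        per_class[label] = 2 * tp[label] / denom if denom else 0.0
    # Macro average: every class counts equally, regardless of its frequency
    return sum(per_class.values()) / len(per_class), per_class


# Hypothetical toy data (not from the study): one majority class, one rare class
y_true = ["Theft"] * 6 + ["Hacking"] * 3 + ["Loss"] * 1
y_pred = ["Theft"] * 6 + ["Hacking"] * 2 + ["Theft"] + ["Theft"]

macro, per_class = macro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"accuracy = {accuracy:.3f}")
print(f"macro F1 = {macro:.3f}")
for label, score in per_class.items():
    print(f"  {label}: F1 = {score:.3f}")
```

In this toy case 8 of 10 predictions are correct, yet the rare class is never predicted, so its F1 is 0 and the macro average falls well below the accuracy. That gap is the reason a macro-averaged score is a reasonable choice for an imbalanced multi-class problem such as the one described above.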

Published

2025-10-10

How to Cite

Sulistiyani, N. I., Purnomo, R. A., Rohmatunisa, N., Pujo, R. M., Rizal Broer Bahaweres, M.Kom., & Ir. Nashrul Hakiem, S.Si., M.T., Ph.D. (2025). PREDIKSI JENIS KEBOCORAN DATA KESEHATAN DI AS BERDASARKAN LAPORAN HIPAA MENGGUNAKAN LIGHTGBM DAN KERANGKA OSEMN. Indexia, 7(2), 110–119. https://doi.org/10.30587/indexia.v7i2.10220
