Using Naïve Bayes Algorithm to Predict and Classify Alcohol Addiction Severity: A Machine Learning Approach for Public Health Interventions

Francis Balazon

doi:10.48017/dj.v10i1.3131

Authors

Francis Balazon, College of Teacher Education Graduate School, Batangas State University The National Engineering University, Philippines. ORCID: https://orcid.org/0000-0003-0143-2983

Keywords:

Machine Learning, Naïve Bayes Algorithm, K-means Clustering, Alcoholism, Alcohol Addiction

Abstract

Alcohol addiction is a growing global health concern, and current methods for predicting and classifying it have limitations. This study aimed to improve the prediction and classification of alcohol addiction levels using the Naïve Bayes Algorithm and K-means Clustering. Survey data were collected from 500 participants, covering factors such as the frequency of alcohol consumption and its associated negative impacts. The Naïve Bayes Algorithm achieved an accuracy of 95%, precision of 93%, recall of 97%, and an F1 Score of 95%. K-means Clustering identified three distinct levels of addiction: less addicted, mildly addicted, and highly addicted. Compared with existing literature and methodologies, the study reports higher accuracy and a more refined classification system, offering healthcare practitioners a tool to identify and address alcohol addiction. Future work could integrate other algorithms and examine other facets of addiction.
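The F1 Score quoted in the abstract can be checked against the quoted precision and recall, since F1 is their harmonic mean. The short sketch below is an illustration only and is not taken from the study: the function name `f1_score` is hypothetical, and the inputs are the rounded percentages from the abstract, so the result is approximate.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract (assumed to be fractions of 1).
precision = 0.93
recall = 0.97

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.4f}")  # prints "F1 = 0.9496"
```

The result rounds to 95%, which agrees with the F1 Score stated in the abstract.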

Author Biography

Francis Balazon, College of Teacher Education Graduate School, Batangas State University The National Engineering University, Philippines

ORCID: 0000-0003-0143-2983. College of Teacher Education Graduate School, Batangas State University The National Engineering University, Philippines. Email: francis.balazon@g.batstate-u.edu.ph
