Using Naïve Bayes Algorithm to Predict and Classify Alcohol Addiction Severity: A Machine Learning Approach for Public Health Interventions
DOI: https://doi.org/10.48017/dj.v10i1.3131
Keywords: Machine Learning, Naïve Bayes Algorithm, K-means Clustering, Alcoholism, Alcohol Addiction
Abstract
Alcohol addiction has emerged as a significant global health concern, and current methods for predicting and classifying it have notable limitations. The principal objective of this study was to improve the prediction and classification of alcohol addiction levels using the Naïve Bayes algorithm and K-means clustering. Survey data were collected from 500 participants, covering factors such as the frequency of alcohol consumption and its associated negative impacts. The Naïve Bayes classifier achieved an accuracy of 95%, a precision of 93%, a recall of 97%, and an F1 score of 95%. K-means clustering delineated three distinct levels of addiction: less addicted, mildly addicted, and highly addicted. Compared with existing literature and methodologies, the study's approach shows higher accuracy and a more refined classification system, offering a practical tool for healthcare practitioners to identify and address alcohol addiction. Potential avenues for future work include integrating other algorithms and examining other facets of addiction.
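The abstract names two techniques and four evaluation metrics without showing how they fit together. The following is a minimal, standard-library-only sketch under stated assumptions, not the authors' implementation. It assumes K-means (k = 3) is run on a single numeric severity score per respondent (the abstract does not say which features were clustered), that the Naïve Bayes variant is categorical with Laplace smoothing on survey answers (the abstract does not name the variant), and that precision and recall are computed one-vs-rest for a chosen positive level (the abstract does not state how they were averaged across levels). The function names, feature values, and data are all hypothetical.

```python
"""Hypothetical sketch: K-means (k=3) to derive addiction levels,
categorical Naive Bayes to predict them, and one-vs-rest metrics.
Data, feature names, and the Naive Bayes variant are assumptions."""
import math
from collections import Counter, defaultdict

LEVELS = ["less addicted", "mildly addicted", "highly addicted"]


def kmeans_1d(values, k=3, iters=100):
    """Lloyd's algorithm on scalar scores; returns labels 0..k-1 ordered by centroid."""
    s = sorted(values)
    # initialise centroids at evenly spaced positions in the sorted data (assumes k >= 2)
    centroids = [float(s[i * (len(s) - 1) // (k - 1)]) for i in range(k)]

    def assign(cs):
        # index of the nearest centroid for every value
        return [min(range(k), key=lambda c: abs(v - cs[c])) for v in values]

    for _ in range(iters):
        labels = assign(centroids)
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # keep the old centroid if a cluster becomes empty
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    labels = assign(centroids)
    # relabel so 0 is the lowest-score cluster and k-1 the highest
    rank = {c: r for r, c in enumerate(sorted(range(k), key=lambda c: centroids[c]))}
    return [rank[lab] for lab in labels]


def train_nb(X, y, alpha=1.0):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    seen = defaultdict(set)  # feature index -> distinct values observed
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            value_counts[(label, j)][v] += 1
            seen[j].add(v)
    return class_counts, value_counts, seen, alpha, len(y)


def predict_nb(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, value_counts, seen, alpha, n = model
    best, best_lp = None, -math.inf
    for c, nc in class_counts.items():
        lp = math.log(nc / n)  # log prior P(c)
        for j, v in enumerate(row):
            # Laplace-smoothed log likelihood log P(x_j = v | c)
            lp += math.log((value_counts[(c, j)][v] + alpha) / (nc + alpha * len(seen[j])))
        if lp > best_lp:
            best, best_lp = c, lp
    return best


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy over all classes; precision/recall/F1 treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall, f1(precision, recall)


if __name__ == "__main__":
    # hypothetical severity scores, e.g. a summed survey scale
    scores = [1, 2, 3, 10, 11, 12, 20, 21, 22]
    levels = kmeans_1d(scores)
    # hypothetical categorical answers: (drinking frequency, reports negative impact)
    X = [("rarely", "no")] * 3 + [("weekly", "no")] * 3 + [("daily", "yes")] * 3
    model = train_nb(X, levels)
    # evaluated on the training rows for illustration only; use held-out data in practice
    preds = [predict_nb(model, row) for row in X]
    print([LEVELS[p] for p in preds])
    print(one_vs_rest_metrics(levels, preds, positive=2))
    print(round(f1(0.93, 0.97), 4))  # reported precision and recall -> 0.9496
```

As a consistency check on the reported figures, the harmonic mean of the stated precision (0.93) and recall (0.97) is 2 × 0.93 × 0.97 / 1.90 ≈ 0.9496, which matches the reported F1 score of 95% after rounding. The sketch derives the levels first and then trains the classifier on them, which is one plausible way to combine the two techniques; the abstract does not confirm this ordering.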
License
Copyright (c) 2025 Francis Balazon

This work is licensed under a Creative Commons Attribution 4.0 International License.
The Diversitas Journal states that articles are the sole responsibility of their authors, who are expected to be familiar with Brazilian and international legislation.
Articles are peer-reviewed, and care is taken to flag possible plagiarism; however, responsibility for any plagiarism rests solely with the authors.
Copyright violation is a crime under article 184 of the Brazilian Penal Code: “Art. 184 Violating copyright and related rights: Penalty - detention, from 3 (three) months to 1 (one) year, or fine. § 1 If the violation consists of total or partial reproduction, for the purpose of direct or indirect profit, by any means or process, of intellectual work, interpretation, performance or phonogram, without the express authorization of the author, the performer, the producer, as the case may be, or whoever represents them: Penalty - imprisonment, from 2 (two) to 4 (four) years, and a fine.”