Clasificación de muestras de agua para determinar su potabilidad mediante el uso de algoritmos de aprendizaje automático

González Andújar, Daniel

Author

González Andújar, Daniel

URI

http://hdl.handle.net/10334/9193

Date

2023

Subject

Aprendizaje automático, Aprendizaje supervisado, Clasificación binaria, Agua potable, Salud pública

Publisher

Universidad Internacional de Andalucía

Abstract

Trabajo Fin de Máster en Big Data (2022-23). Tutores: Dr. D. Diego Marín Santos ; Dr. D. Manuel Emilio Gegúndez Arias. The main porpuse of this work is to analyze the results obtained in the classification of different water samples through the use of algorithms and machine learning methods. Access to safe drinking water services is still a problem for approximately 2,000 million people around the world, which makes it even more necessary to study and predict the potability of water samples, this analysis is a tool for the prevention of disease and even death. A dataset with water samples has been extracted from the Kaggle website, specifically, the database is made up of a total of 3,276 instances of water samples, where 59.67% correspond to non-potable water samples. Different supervised learning methods such as K-NN, Random Forest or SVM have been used to classify the water samples, and they have been evaluated with different methodologies. The results obtained with the different classifiers, depending on the methodology, vary significantly. The algorithm which we have obtained the best results has been SVM, which is capable of working with an AUC of 0.75, Hit Rate of 0.72, Sensitivity of 0.71 and Specificity of 0.73. Although these results are susceptible of improvement, they indicate that the application of algorithms based on machine learning can constitute an important tool to predict non-potable water samples.