Classification of water samples to determine their potability using machine learning algorithms
Abstract
Master's Thesis in Big Data (2022-23). Supervisors: Dr. Diego Marín Santos; Dr. Manuel Emilio Gegúndez Arias. The main purpose of this work is to analyze the results obtained when classifying water samples with machine learning algorithms. Access to safe drinking water is still a problem for approximately 2 billion people around the world, which makes it all the more necessary to study and predict the potability of water samples; such analysis is a tool for preventing disease and even death. The dataset was obtained from the Kaggle website and comprises 3,276 water samples, of which 59.67% are non-potable. Several supervised learning methods, such as K-NN, Random Forest, and SVM, were used to classify the samples and were evaluated under different methodologies. The results obtained with the different classifiers vary significantly depending on the methodology. The best results were obtained with SVM, which achieved an AUC of 0.75, a hit rate of 0.72, a sensitivity of 0.71, and a specificity of 0.73. Although there is room for improvement, these results indicate that machine learning algorithms can be an important tool for identifying non-potable water samples.
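The four figures reported in the abstract (AUC, hit rate, sensitivity, specificity) can be illustrated with a minimal sketch. This is not the thesis's code. The function names are hypothetical, the toy labels and scores are invented for illustration, "hit rate" is assumed to mean overall accuracy, and the choice of which class counts as positive (label 1 here) is also an assumption, since the abstract does not state it.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                    positive: int = 1) -> Dict[str, float]:
    """Hit rate (assumed to be accuracy), sensitivity and specificity.

    Assumes both classes occur in y_true, otherwise a division by zero occurs.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "hit_rate": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


def auc(y_true: Sequence[int], scores: Sequence[float],
        positive: int = 1) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
toy_metrics = summary_metrics(toy_true, toy_pred)
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Sensitivity and specificity are reported separately because, with 59.67% of the samples in one class, a single accuracy figure could hide poor performance on the minority class.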