Researcher / Data Scientist
cminutti@data-fusionlab.com
I hold a B.Sc. in Statistics from Chapingo Autonomous University, and an M.Sc. in Mathematics and a Ph.D. in Computer Science from the National University of Mexico. I also undertook a research stay at the University of Waterloo, Canada, as part of the Emerging Leaders in the Americas Program.
I was awarded second place in the national prize for the best master's thesis in statistics by the Mexican Association of Statistics, and first place for the best paper at the 22nd Mexican International Conference on Artificial Intelligence. I am currently a member of the National System of Researchers. I have contributed to data science as a consultant for various companies and as a research associate for the "Consortium of Artificial Intelligence," and I have held postdoctoral fellowships at both the National Polytechnic Institute and the National University of Mexico.
My current research focuses on the application of data science and artificial intelligence to health.
Explainability and bias mitigation are crucial aspects of deep learning (DL) models for medical image analysis. Generative AI, particularly autoencoders, can enhance explainability by analyzing the latent space to identify and control variables that contribute to biases. By manipulating the latent space, biases can be mitigated at the classification layer, and the latent space can be visualized to provide a more intuitive understanding of the model's decision-making process. In our work, we demonstrate how the proposed approach enhances the explainability of the decision-making process, surpassing the capabilities of traditional methods such as Grad-CAM. Our approach identifies and mitigates biases in a straightforward manner, without requiring model retraining or dataset modification, showing how generative AI can play a pivotal role in addressing explainability and bias mitigation challenges and in enhancing the trustworthiness and clinical utility of DL-powered medical image analysis tools.
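As a rough illustration of the kind of latent-space analysis described above, the following sketch probes a latent space for a bias direction with a linear classifier and then projects that direction out before classification. The latent codes, the "acquisition site" bias attribute, and all numbers are synthetic placeholders, not the model or data from our work.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in latent codes z (n_samples x latent_dim), as if produced by a trained
# encoder, plus a bias attribute (e.g. acquisition site) that leaks into dimension 0.
n, d = 1000, 32
site = rng.integers(0, 2, size=n)
z = rng.normal(size=(n, d))
z[:, 0] += 1.5 * site                      # the bias "contaminates" one latent direction

# 1) Probe the latent space: a linear classifier recovers the bias direction.
probe = LogisticRegression(max_iter=1000).fit(z, site)
w = probe.coef_[0] / np.linalg.norm(probe.coef_[0])

# 2) Mitigate: project the bias direction out of every latent code, with no
#    retraining of the encoder and no modification of the dataset.
z_debiased = z - np.outer(z @ w, w)

# The downstream classification head would then operate on z_debiased instead of z.
print("site recoverable from z:         ", probe.score(z, site))
print("site recoverable from z_debiased:",
      LogisticRegression(max_iter=1000).fit(z_debiased, site).score(z_debiased, site))
```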
In this paper, we introduce PumaMedNet-CXR, a generative AI model designed for medical image classification, with a specific emphasis on chest X-ray (CXR) images. The model effectively corrects common defects in CXR images and offers improved explainability, enabling a deeper understanding of its decision-making process. By analyzing its latent space, we can identify and mitigate biases, ensuring a more reliable and transparent model. Notably, PumaMedNet-CXR achieves performance comparable to larger pre-trained models through transfer learning, making it a promising tool for medical image analysis. The model's highly efficient autoencoder-based architecture, together with its explainability and bias mitigation capabilities, gives it significant potential for advancing medical image understanding and analysis.
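For readers unfamiliar with the general pattern, the sketch below shows an autoencoder whose latent code feeds both a decoder and a small classification head. The layer sizes, the 128x128 input resolution, and the class name TinyCXRAutoencoder are illustrative assumptions only; this is not the actual PumaMedNet-CXR architecture.

```python
import torch
import torch.nn as nn

class TinyCXRAutoencoder(nn.Module):
    """Toy autoencoder-based classifier (hypothetical, for illustration only)."""
    def __init__(self, latent_dim=64, n_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(              # 1x128x128 CXR -> latent vector
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 32 * 32, latent_dim),
        )
        self.decoder = nn.Sequential(              # latent vector -> reconstructed image
            nn.Linear(latent_dim, 32 * 32 * 32), nn.ReLU(),
            nn.Unflatten(1, (32, 32, 32)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )
        self.classifier = nn.Linear(latent_dim, n_classes)  # classification head on the latent code

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z), z

model = TinyCXRAutoencoder()
x = torch.randn(4, 1, 128, 128)                    # dummy batch of grayscale CXR images
recon, logits, z = model(x)
print(recon.shape, logits.shape, z.shape)          # (4,1,128,128), (4,2), (4,64)
```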
Air pollution has been linked to premature mortality and reduced life expectancy, with both acute and chronic effects on human health. These effects can be difficult to measure because of possible interactions and nonlinear relationships with other variables such as age, weight, sex, and socioeconomic status. Such multi-dimensional relationships are difficult to model using conventional statistical methods, but modern machine learning techniques have been quite successful in this domain. In this study, gradient boosting regression trees are used to predict the severity/mortality of the leading causes of hospitalization in Mexico City for 91,964 patients during the years 2015-2020, in order to measure the impact of different air pollutants. The results show multiple nonlinear relationships and a significant effect of air pollutants on some of the most prevalent diseases.
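A minimal sketch of this kind of analysis, assuming scikit-learn's gradient boosting and entirely synthetic patient and pollutant data (the column names and the simulated risk function are placeholders, not the hospitalization dataset used in the study):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import partial_dependence
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = pd.DataFrame({
    "age": rng.integers(0, 95, n),
    "weight": rng.normal(70, 15, n),
    "sex": rng.integers(0, 2, n),
    "pm25": rng.gamma(2.0, 12.0, n),      # fine particulate exposure (µg/m³)
    "ozone": rng.gamma(2.0, 20.0, n),     # ozone exposure (ppb)
})
# Synthetic mortality indicator with a nonlinear pollutant effect, for illustration only.
risk = 0.02 * X["age"] + 0.03 * np.maximum(X["pm25"] - 25, 0)
y = (rng.random(n) < 1 / (1 + np.exp(4 - risk))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(n_estimators=300, max_depth=3).fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))

# Partial dependence on PM2.5 exposes the shape (linear or not) of its effect on risk.
pd_result = partial_dependence(model, X_te, features=["pm25"])
print(pd_result["average"][0][:5])        # mean predicted response along the PM2.5 grid
```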
In large cities, the health of the inhabitants is related to the concentrations of particles smaller than 10 μm and 2.5 μm as well as ozone, making their prediction useful for both the government and citizens. Mexico City has an air quality forecasting system that provides a forecast per pollutant at an hourly and geographic-zone level, but it is only valid for the next 24 h.
Generating predictions over a longer horizon requires more sophisticated methods, but highly automated techniques such as deep learning need large amounts of data, which are not available for this problem. Therefore, a set of predictor variables is created to feed and test different machine learning (ML) methods and to determine which of their features are essential for predicting the different pollutant concentrations, in order to develop a hybrid ad hoc model that incorporates ML features while retaining a level of explainability that methods such as neural networks do not offer.
In this work, we present a hybrid prediction model that combines different statistical methods and ML techniques to estimate the concentrations of the three main pollutants in Mexico City's air two weeks ahead. The results of the different models are presented and compared, with the hybrid model best predicting the extreme cases.
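One common way to realize such a hybrid design, shown here purely as an illustration, is to pair an interpretable statistical baseline (a day-of-year climatology) with an ML model fitted to its residuals. The synthetic ozone series, the lag features, and the details of the 14-day-ahead setup are assumptions for the sketch and do not reproduce the model described above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
days = pd.date_range("2018-01-01", periods=1500, freq="D")
seasonal = 60 + 25 * np.sin(2 * np.pi * days.dayofyear / 365)    # annual cycle (ppb)
ozone = seasonal + rng.normal(0, 10, len(days))                  # synthetic ozone series

HORIZON = 14  # predict the concentration two weeks ahead

df = pd.DataFrame({"ozone": ozone}, index=days)
df["doy"] = df.index.dayofyear
for lag in (HORIZON, HORIZON + 1, HORIZON + 7):   # only information available at forecast time
    df[f"ozone_lag{lag}"] = df["ozone"].shift(lag)
df = df.dropna()

train, test = df.iloc[:-200], df.iloc[-200:]

# 1) Interpretable statistical component: a day-of-year climatology from the training years.
clim = train.groupby("doy")["ozone"].mean()
baseline_tr = train["doy"].map(clim)
baseline_te = test["doy"].map(clim).fillna(train["ozone"].mean())

# 2) ML component: gradient boosting fitted to the residuals of that baseline.
features = [c for c in df.columns if c.startswith("ozone_lag")] + ["doy"]
gbr = GradientBoostingRegressor(n_estimators=200, max_depth=3)
gbr.fit(train[features], train["ozone"] - baseline_tr)

# Hybrid forecast = explainable baseline + ML residual correction.
pred = baseline_te + gbr.predict(test[features])
print(f"14-day-ahead MAE of the hybrid sketch: {(pred - test['ozone']).abs().mean():.1f} ppb")
```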