Researcher / Data Scientist
cminutti@data-fusionlab.com
I am a researcher in artificial intelligence and data science applied to health. I hold a B.Sc. in Statistics from Chapingo Autonomous University, an M.Sc. in Mathematics, and a Ph.D. in Computer Science from the National Autonomous University of Mexico, with a research stay at the University of Waterloo through the Emerging Leaders in the Americas Program.
I am a member of Mexico’s National System of Researchers. My work has received both national and international recognition, including the Mexican Association of Statistics award for the second-best master’s thesis, first place in the Best Paper Award at MICAI 2023, and third place in the AFIRME–UNAM Research Award 2024. I have also received awards in international data science and artificial intelligence competitions, including first place at the International Joint Conference on Neural Networks (IJCNN 2025) and second place at the Iberian Language Evaluation Forum (IberLEF 2025).
I have completed postdoctoral fellowships at both the National Polytechnic Institute and the National Autonomous University of Mexico. I have also worked as a data science consultant and research associate in collaborative AI initiatives.
This study investigates the combined effects of air pollution and socioeconomic factors on disease incidence and severity, addressing gaps in prior research that often analyzed these factors separately. Using data from 86,170 hospitalizations in Mexico City (2015–2019), we employed multivariate statistical methods (PCA and factor analysis) to construct composite measures of social and economic status and to group correlated pollutants. Logistic and negative binomial regression models were then used to assess the associations of these measures with hospitalization risk and frequency. Results showed that economic status significantly influenced diabetes complications, while social factors affected prenatal care-related diseases and hypertension. The PM10–PM2.5–CO group increased the incidence of asthma, influenza, and epilepsy, whereas NO2–NOx impacted diabetes complication severity and influenza. Nonlinear effects and interactions (e.g., age and weight) were also identified, highlighting the need for integrated analyses in environmental health research.
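As a rough illustration of how a composite status measure can be built with PCA, the sketch below standardizes a few indicators and scores each record on the first principal component. It is a minimal sketch under stated assumptions, not the study's pipeline: the data are synthetic, the three indicators are hypothetical stand-ins, and the function names (`standardize`, `first_component`, `composite_index`) are invented for this example.

```python
import math
import random

def standardize(rows):
    """Center each column to mean 0 and scale it to unit sample variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(p)]
    return [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]

def first_component(z, iters=200):
    """Leading eigenvector of the correlation matrix, found by power iteration."""
    n, p = len(z), len(z[0])
    corr = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)]
            for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def composite_index(rows):
    """Score every record on the first principal component."""
    z = standardize(rows)
    loadings = first_component(z)
    scores = [sum(zi * li for zi, li in zip(r, loadings)) for r in z]
    return scores, loadings

# Synthetic data (assumption): one hidden "status" variable drives three
# hypothetical indicators, the third of which moves in the opposite direction.
rng = random.Random(0)
status = [rng.gauss(0, 1) for _ in range(200)]
rows = [[s + rng.gauss(0, 0.3), 2 * s + rng.gauss(0, 0.5), -s + rng.gauss(0, 0.4)]
        for s in status]

scores, loadings = composite_index(rows)
print([round(l, 2) for l in loadings])
```

The resulting `scores` could then enter a regression model as a single covariate in place of the correlated raw indicators, which is the general motivation for composite measures.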
Explainability and bias mitigation are crucial aspects of deep learning (DL) models for medical image analysis. Generative AI, particularly autoencoders, can enhance explainability by analyzing the latent space to identify and control variables that contribute to biases. By manipulating the latent space, biases can be mitigated in the classification layer. Furthermore, the latent space can be visualized to provide a more intuitive understanding of the model's decision-making process. In our work, we demonstrate how the proposed approach enhances the explainability of the decision-making process, surpassing the capabilities of traditional methods such as Grad-CAM. Our approach identifies and mitigates biases in a straightforward manner, without requiring model retraining or dataset modification. This shows how generative AI can play a pivotal role in addressing explainability and bias mitigation challenges, enhancing the trustworthiness and clinical utility of DL-powered medical image analysis tools.
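One simple way to picture latent-space manipulation is to estimate a direction that separates two groups of latent codes and then project that direction out before classification. The sketch below does this on synthetic vectors. It is an assumption-laden illustration, not necessarily the procedure used in the paper: the latent codes are random numbers rather than autoencoder outputs, and `bias_direction`, `remove_direction`, and `gap_along` are hypothetical helper names.

```python
import random

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bias_direction(group_a, group_b):
    """Unit vector from the mean latent code of group B to that of group A."""
    diff = [a - b for a, b in zip(mean_vector(group_a), mean_vector(group_b))]
    norm = dot(diff, diff) ** 0.5
    return [d / norm for d in diff]

def remove_direction(z, direction):
    """Project a latent code onto the subspace orthogonal to `direction`."""
    scale = dot(z, direction)
    return [zi - scale * di for zi, di in zip(z, direction)]

def gap_along(group_a, group_b, direction):
    """Difference between the two group means, measured along `direction`."""
    return dot(mean_vector(group_a), direction) - dot(mean_vector(group_b), direction)

# Synthetic 4-D latent codes (assumption): group A is shifted along the first
# coordinate, standing in for a nuisance attribute encoded in the latent space.
rng = random.Random(1)
group_a = [[rng.gauss(2.0, 0.5)] + [rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]
group_b = [[rng.gauss(0.0, 0.5)] + [rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]

direction = bias_direction(group_a, group_b)
cleaned_a = [remove_direction(z, direction) for z in group_a]
cleaned_b = [remove_direction(z, direction) for z in group_b]

gap_before = gap_along(group_a, group_b, direction)
gap_after = gap_along(cleaned_a, cleaned_b, direction)
print(round(gap_before, 2), abs(gap_after) < 1e-9)
```

Because only the latent codes are edited, a classifier that consumes them needs no retraining, which matches the spirit of the claim above; a linear projection is only one of several possible manipulations.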
In this paper, we introduce PumaMedNet-CXR, a generative AI model designed for medical image classification, with a specific emphasis on chest X-ray (CXR) images. The model effectively corrects common defects in CXR images and offers improved explainability, enabling a deeper understanding of its decision-making process. By analyzing its latent space, we can identify and mitigate biases, ensuring a more reliable and transparent model. Notably, PumaMedNet-CXR achieves performance comparable to that of larger pre-trained models through transfer learning, making it a promising tool for medical image analysis. The model's highly efficient autoencoder-based architecture, along with its explainability and bias mitigation capabilities, gives it significant potential for advancing medical image understanding and analysis.
Air pollution has been linked to premature mortality and reduced life expectancy, with acute and chronic effects on human health. These effects can be difficult to measure because of possible interactions and nonlinear relationships with other variables such as age, weight, sex, and socioeconomic status. Such multidimensional relationships are difficult to model using conventional statistical methods, whereas modern machine learning techniques have been quite successful in this domain. In this study, gradient boosting regression trees are used to predict the severity/mortality of the leading causes of hospitalization in Mexico City for 91,964 patients during 2015–2020, in order to measure the impact of different air pollutants. The results show multiple nonlinear relationships and a significant effect of air pollutants on some of the most prevalent diseases.
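To make the idea of gradient boosting regression trees concrete, the sketch below fits a sequence of one-split trees (stumps) to the residuals of the current prediction under squared-error loss. It is a toy version under stated assumptions: a single synthetic feature and outcome replace the study's patient and pollutant variables, and the function names and hyperparameters (`n_trees=50`, `learning_rate=0.1`) are chosen only for illustration.

```python
def fit_stump(x, residuals):
    """Least-squares regression stump: one threshold, one mean per side."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keep the right side non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean

def fit_gbrt(x, y, n_trees=50, learning_rate=0.1):
    """Boosting loop: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * predict_stump(stump, xi)
                 for pi, xi in zip(preds, x)]
    return base, stumps, learning_rate

def predict_gbrt(model, xi):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)

# Synthetic nonlinear relationship (assumption): a U-shaped response to one feature.
x = [i / 10 for i in range(60)]
y = [(xi - 3.0) ** 2 for xi in x]

model = fit_gbrt(x, y)
baseline = sum(y) / len(y)
mse_baseline = sum((yi - baseline) ** 2 for yi in y) / len(y)
mse_model = sum((yi - predict_gbrt(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)
print(mse_model < mse_baseline)
```

The ensemble captures the U-shaped response without the shape being specified in advance, which is the property that makes this family of models attractive when nonlinear effects and interactions are expected.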
In large cities, the health of the inhabitants is related to the concentrations of ozone and of particles smaller than 10 and 2.5 μm, which makes predicting these concentrations useful for the government and citizens. Mexico City has an air quality forecast system that presents a forecast for each pollutant at the hourly and geographic-zone level, but it is valid only for the next 24 h.
To generate predictions for a longer time period, more sophisticated methods are needed, but highly automated techniques such as deep learning require a large amount of data, which is not available for this problem. Therefore, a set of predictor variables is created to feed and test different machine learning (ML) methods and to determine which features of these methods are essential for predicting the different pollutant concentrations. The goal is to develop a hybrid ad hoc model that includes ML features while retaining a level of explainability, unlike methods such as neural networks.
In this work, we present a hybrid prediction model that combines different statistical methods and ML techniques and allows the concentrations of the three main pollutants in the air of Mexico City to be estimated two weeks ahead. The results of the different models are presented and compared; the hybrid model is the one that best predicts the extreme cases.