Collection of Academic Articles

Permanent URI for this collection: https://repositorio.insper.edu.br/handle/11224/3227

Search Results

Now showing 1-10 of 17

Prediction of bacterial and fungal bloodstream infections using machine learning in patients undergoing chemotherapy
(2025) Freire, Maristela P.; ADHEMAR VILLANI JUNIOR; Lazar Neto, Felippe; Lage, Luis Alberto De Padua Covas; Oliveira, Maura Salaroli; Abdala, Edson; Nunes, Fatima L.S.; Levin, Anna Sara S.
Purpose: This study aimed to develop a machine learning (ML) model to predict bloodstream infection (BSI) in patients undergoing chemotherapy. Patients and methods: We included all cancer patients undergoing chemotherapy at a tertiary cancer hospital from 2017 to 2022. Data were collected per chemotherapy cycle, including chemotherapy drugs, indications, cycle number, cancer type, body mass index, age, gender, complete blood count, creatinine levels, and microbial cultures. BSI was assessed within 21 days after chemotherapy. The ML algorithms tested included logistic regression, ridge regression, k-nearest neighbors, naive Bayes, perceptron, neural networks, decision trees, boosting methods, random forests, and support vector machines. The SHapley Additive exPlanations (SHAP) method was used to measure feature importance. Results: Among 107,757 cycles from 19,225 patients, 91.7% had solid tumors, primarily breast (36.8%) and gastrointestinal (19.4%) cancers. First cycles accounted for 23.7% of all cycles, and palliative chemotherapy for 52.9%. Alkylating agents were the most commonly used drug class (55.5%). BSI occurred in 1.33% of cycles, with 34% of these cases occurring in neutropenic patients. Of the bacteremia cases, 11.8% were polymicrobial and 69.3% involved gram-negative bacteria. The best model was a neural network with one hidden layer (5 neurons), achieving 70.7% sensitivity, 93.49% specificity, 93.19% accuracy, and an area under the receiver operating characteristic curve (AUC) of 91.93%. Key predictors included the first cycle, antimetabolite use, palliative chemotherapy, monocytopenia, and hematological malignancies. Conclusion: ML effectively predicts bacteremia in chemotherapy patients, including non-neutropenic cases, and could be used in clinical practice to guide treatment and infection workup.
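Because BSI is rare (1.33% of cycles), overall accuracy is dominated by specificity: accuracy equals the prevalence-weighted average of sensitivity and specificity. The worked check below assumes the evaluation set had the same BSI prevalence as the full cohort (an assumption; the abstract does not state the test-set prevalence). The function name is illustrative only.

```python
def accuracy_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Overall accuracy implied by sensitivity, specificity and prevalence.

    accuracy = prevalence * sensitivity + (1 - prevalence) * specificity,
    i.e. a prevalence-weighted average of the two class-specific hit rates.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be a proportion in [0, 1]")
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


# Figures from the abstract: BSI in 1.33% of cycles,
# sensitivity 70.7%, specificity 93.49%.
acc = accuracy_from_rates(sensitivity=0.707, specificity=0.9349, prevalence=0.0133)
print(f"{acc:.2%}")  # prints 93.19%
```

Under that assumption the result matches the reported 93.19% accuracy, which is why accuracy alone says little about how well rare infections are detected; sensitivity (70.7%) is the more informative figure here.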
Neonatal mortality prediction with routinely collected data: a machine learning approach
(2021) ANDRE FILIPE DE MORAES BATISTA; Diniz, Carmen S. G.; Bonilha, Eliana A.; Kawachi, Ichiro; Chiavegatto Filho, Alexandre D. P.
Background: Recent decreases in neonatal mortality have been slower than expected for most countries. This study aims to predict the risk of neonatal mortality using only data routinely available from birth records in the largest city of the Americas. Methods: A probabilistic linkage of every birth record in the municipality of São Paulo, Brazil, between 2012 and 2017 was performed with the death records from 2012 to 2018 (1,202,843 births and 447,687 deaths), and a total of 7282 neonatal deaths were identified (a neonatal mortality rate of 6.46 per 1000 live births). Births from 2012 to 2016 (N = 941,308; or 83.44% of the total) were used to train five different machine learning algorithms, while births occurring in 2017 (N = 186,854; or 16.56% of the total) were used to test their predictive performance on new, unseen data. Results: The best performance was obtained by the extreme gradient boosting trees (XGBoost) algorithm, with a very high AUC of 0.97 and an F1-score of 0.55. The 5% of births with the highest predicted risk of neonatal death included more than 90% of the actual neonatal deaths. Conversely, there were no deaths among the 5% of births with the lowest predicted risk. There were no significant differences in predictive performance for vulnerable subgroups. Using a smaller number of variables (WHO’s five minimum perinatal indicators) decreased overall performance, but results remained high (AUC of 0.91). With the addition of only three more variables, we achieved the same predictive performance (AUC of 0.97) as with all 23 variables originally available from the Brazilian birth records. Conclusion: Machine learning algorithms identified the neonatal mortality risk of newborns with very high predictive performance using only routinely collected data.
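The "top 5% of predicted risk contained more than 90% of deaths" result is a recall-at-top-fraction measure. A minimal sketch of that computation follows; the function name and the toy data are hypothetical and not taken from the study.

```python
def share_of_cases_in_top_fraction(scores, labels, fraction):
    """Share of all positive cases found among the top `fraction` of
    records when ranked by predicted risk (highest risk first)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    total_cases = sum(labels)
    if total_cases == 0:
        raise ValueError("no positive cases to capture")
    n_top = max(1, round(len(scores) * fraction))
    # Rank records by predicted risk, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    captured = sum(label for _, label in ranked[:n_top])
    return captured / total_cases


# Toy data: 20 records with increasing risk scores; two positives.
scores = [i / 20 for i in range(20)]
labels = [0] * 20
labels[19] = 1
labels[10] = 1
print(share_of_cases_in_top_fraction(scores, labels, 0.05))  # top 1 record
print(share_of_cases_in_top_fraction(scores, labels, 0.50))  # top 10 records
```

Passing negated scores ranks the lowest-risk records first, which gives the complementary check reported in the abstract (the share of deaths among the 5% of births with the lowest predicted risk).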
Cause-specific mortality prediction in older residents of São Paulo, Brazil: a machine learning approach
(2021) Nascimento, Carla Ferreira do; Hellen Geremias dos Santos; ANDRE FILIPE DE MORAES BATISTA; Lay, Alejandra Andrea Roman; Duarte, Yeda Aparecida Oliveira
Background: Population ageing has been increasing at a remarkable rate in developing countries. In this scenario, preventive strategies could help reduce the burden of growing demand for healthcare services. Machine learning algorithms have been increasingly applied to identify priority candidates for preventive actions, showing better predictive performance than traditional parsimonious models. Methods: Data were collected from the Health, Well-Being and Aging (SABE) Study, a representative sample of older residents of São Paulo, Brazil. Machine learning algorithms were applied to predict death from diseases of the respiratory system (DRS), diseases of the circulatory system (DCS), neoplasms, and other specific causes within 5 years, using socioeconomic, demographic, and health features. The algorithms were trained on a random sample of 70% of subjects and then tested on the remaining 30% (unseen data). Results: The outcome with the highest predictive performance was death from DRS (AUC-ROC = 0.89), followed by the other specific causes (AUC-ROC = 0.87), DCS (AUC-ROC = 0.67), and neoplasms (AUC-ROC = 0.52). The 25% of individuals with the highest predicted risk of mortality from DRS included 100% of the actual cases. The machine learning algorithms with the highest predictive performance were light gradient boosting machine (LightGBM) and extreme gradient boosting. Conclusion: The algorithms had high predictive performance for DRS, but lower performance for DCS and neoplasms. Mortality prediction with machine learning can improve clinical decisions, especially regarding targeted preventive measures for older individuals.
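The 70%/30% random split described above can be sketched as follows, assuming a split at the subject level with a fixed seed for reproducibility (the seed, the function name and its parameters are illustrative assumptions, not details from the study):

```python
import random


def train_test_split_ids(ids, train_fraction=0.7, seed=42):
    """Randomly partition subject IDs into disjoint training and test sets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(ids)
    # A dedicated Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


train_ids, test_ids = train_test_split_ids(range(100))
print(len(train_ids), len(test_ids))
```

Splitting by subject ID, rather than by record, keeps every record of a given person on one side of the split, so the test set stays genuinely unseen.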
Data Leakage in Health Outcomes Prediction With Machine Learning. Comment on “Prediction of Incident Hypertension Within the Next Year: Prospective Study Using Statewide Electronic Health Records and Machine Learning”
(2021) Chiavegatto Filho, Alexandre; ANDRE FILIPE DE MORAES BATISTA; Santos, Hellen Geremias dos
A multipurpose machine learning approach to predict COVID-19 negative prognosis in São Paulo, Brazil
(2021) Fernandes, Fernando Timoteo; Oliveira, Tiago Almeida de; Teixeira, Cristiane Esteves; ANDRE FILIPE DE MORAES BATISTA; Costa, Gabriel Dalla; Chiavegatto Filho, Alexandre Dias Porto
The new coronavirus disease (COVID-19) is a challenge for clinical decision-making and the effective allocation of healthcare resources. An accurate prognostic assessment is necessary to improve patient survival, especially in developing countries. This study aims to predict the risk of developing critical conditions in COVID-19 patients by training multipurpose algorithms. We followed a total of 1040 patients with a positive RT-PCR diagnosis of COVID-19 at a large hospital in São Paulo, Brazil, from March to June 2020, of whom 288 (28%) had a severe prognosis, i.e. Intensive Care Unit (ICU) admission, use of mechanical ventilation, or death. We used routinely collected laboratory, clinical, and demographic data to train five machine learning algorithms (artificial neural networks, extra trees, random forests, CatBoost, and extreme gradient boosting). We used a random sample of 70% of patients to train the algorithms, and the remaining 30% were held out for performance assessment, simulating new, unseen data. To assess whether the algorithms could capture general severe prognostic patterns, each model was trained on a combination of two of the three outcomes to predict the third. All algorithms showed very high predictive performance (average AUROC of 0.92, sensitivity of 0.92, and specificity of 0.82). The three most important variables for the multipurpose algorithms were the lymphocyte-to-C-reactive protein ratio, C-reactive protein, and the Braden Scale. The results suggest that machine learning algorithms can predict nonspecific negative COVID-19 outcomes from routinely collected data.
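The multipurpose design (train on two of the three severe outcomes, evaluate on the held-out third) can be enumerated as below. This sketch assumes that "combining" two outcomes means a composite label that is positive if either outcome occurred; the abstract does not spell this out, and the outcome keys and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical outcome keys for the three severe outcomes in the abstract.
OUTCOMES = ("icu_admission", "mechanical_ventilation", "death")


def composite_label(patient, outcomes):
    """1 if the patient had any of the given outcomes, else 0 (assumed rule)."""
    return int(any(patient[o] for o in outcomes))


def multipurpose_tasks(outcomes=OUTCOMES):
    """Yield (training_outcomes, held_out_outcome): train on two, test on the third."""
    for train_pair in combinations(outcomes, 2):
        held_out = next(o for o in outcomes if o not in train_pair)
        yield train_pair, held_out


patient = {"icu_admission": 1, "mechanical_ventilation": 0, "death": 0}
for train_pair, held_out in multipurpose_tasks():
    print(train_pair, "->", held_out,
          "| training label:", composite_label(patient, train_pair),
          "| target:", patient[held_out])
```

Each of the three tasks trains a model on a composite of two outcomes, so good performance on the held-out outcome suggests the model has learned general markers of severity rather than features specific to one endpoint.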
Proposal and validation of a hospital malnutrition assessment tool based on the Global Leadership Initiative on Malnutrition: protocol of the GLIM-BR Study
(2022) Lopes, Giovanna Guimarães; Piovacari, Silvia Maria Fraga; Pereira, Adriano José; ANDRE FILIPE DE MORAES BATISTA
Introduction: The Global Leadership Initiative on Malnutrition (GLIM) consensus published proposed guidelines for structuring a tool composed of phenotypic and etiologic diagnostic criteria. The GLIM Validation Guide was subsequently published to encourage and guide initiatives to validate this new tool. The objective of this article is to share the protocol of the ongoing GLIM-BR Study, which will propose and validate a new national instrument for the diagnostic classification of hospital malnutrition based on GLIM, using clinical outcomes. Method: Prospective, observational, multicenter validation study, divided into 3 phases and conducted in 4 Brazilian tertiary hospitals, including inpatients with an expected length of stay greater than 48 h. Predictive models based on machine learning/artificial intelligence will be used to define the optimal set of variables and cutoffs capable of predicting clinical outcomes such as death and length of stay. Results: In Phase 1, of the 12 variables presented and discussed by the expert opinion panel, 8 were approved without changes. The remaining variables were adjusted through consensus of the GLIM-BR Group. All variables were chosen based on the current literature (theoretical rationale) and, whenever possible, on other already validated tools. Phase 2 already has preliminary results (substudy) presented at an international congress (ESPEN), which will be submitted for publication in an international scientific journal in the coming months. Phase 3 is underway, and the variables of interest selected for evaluation by the GLIM-BR study's predictive model, in each of the categories proposed by GLIM, are disclosed in this article, along with details of the ongoing research protocol.
Conclusion: The aim is to develop a validated tool for diagnosing hospital malnutrition that overcomes limitations identified in current nutritional assessment tools and is practical and ready for use by nutritionists in hospital services.
Physician preference for receiving machine learning predictive results: A cross-sectional multicentric study
(2022) Wichmann, Roberta Moreira; Fagundes, Thales Pardini; Oliveira, Tiago Almeida de; ANDRE FILIPE DE MORAES BATISTA; Chiavegatto Filho, Alexandre Dias Porto
Artificial intelligence (AI) algorithms are transforming several areas of the digital world and are increasingly being applied in healthcare. Mobile apps based on predictive machine learning models have the potential to improve health outcomes, but there is still no consensus on how to inform doctors about their results. The aim of this study was to investigate how healthcare professionals prefer to receive predictions generated by machine learning algorithms. A systematic search was first performed in MEDLINE (via PubMed), EMBASE, and Web of Science. We developed a mobile app, RandomIA, to predict the occurrence of clinical outcomes, initially for COVID-19 and later to be expanded to other diseases. The System Usability Scale (SUS) questionnaire was used to assess the usability of the app. A total of 69 doctors from the five regions of Brazil tested RandomIA and evaluated three different ways of visualizing the predictions. For prognostic outcomes (mechanical ventilation, admission to an intensive care unit, and death), most doctors (62.9%) preferred a more complex visualization, represented by a bar graph with three categories (low, medium, and high probability) and a probability density graph for each outcome. For the diagnostic prediction of COVID-19, a majority (65.4%) also preferred the same option. Our results indicate that doctors may prefer to receive detailed results from predictive machine learning algorithms.
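The abstract does not report SUS scores, but the questionnaire is conventionally scored with Brooke's standard rule: ten items answered on a 1-5 scale, odd (positively worded) items contribute the response minus 1, even (negatively worded) items contribute 5 minus the response, and the 0-40 sum is multiplied by 2.5. A minimal sketch of that rule (the function name is illustrative):

```python
def sus_score(responses):
    """System Usability Scale score (0-100) from ten 1-5 Likert responses."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly ten items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be integers from 1 to 5")
    total = 0
    for item_number, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item_number % 2 == 1 else (5 - r)
    return total * 2.5


print(sus_score([3] * 10))   # neutral answers on every item
print(sus_score([5, 1] * 5))  # best possible answers
```

The alternating scoring direction means an all-neutral respondent lands at 50, not at the midpoint of the raw response scale by accident: each item is first mapped onto a 0-4 "goodness" scale.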
Data-driven decision making for the screening of cognitive impairment in primary care: a machine learning approach using data from the ELSA-Brasil study
(2023) Szlejf, C.; ANDRE FILIPE DE MORAES BATISTA; Bertola, L.; Lotufo, P.A.; Benseñor, I.M.; Chiavegatto Filho, A.D.P.; Suemoto, C.K.
The systematic assessment of cognitive performance in older people without cognitive complaints is controversial and infeasible. Identifying individuals at higher risk of cognitive impairment could optimize resource allocation. We aimed to develop and test machine learning models to predict cognitive impairment using variables obtainable in primary care settings. In this cross-sectional study, we included 8,291 participants from the baseline assessment of the ELSA-Brasil study who were aged between 50 and 74 years and were free of dementia. Cognitive performance was assessed with a neuropsychological battery, and cognitive impairment was defined as a global cognitive z-score more than 2 standard deviations below the mean. Variables used as input to the prediction models included demographics, social determinants, clinical conditions, family history, lifestyle, and laboratory tests. We developed machine learning models using logistic regression, neural networks, and gradient boosted trees. Participants' mean age was 58.3 ± 6.2 years, and 55% were female. Cognitive impairment was present in 328 individuals (4%). The machine learning algorithms showed fair to good discrimination (areas under the ROC curve between 0.801 and 0.873). Extreme Gradient Boosting had the highest discrimination, with high specificity (97%) and negative predictive value (97%). Seventy-six percent of the individuals with cognitive impairment were among the individuals ranked highest by this algorithm. In conclusion, we developed and tested a machine learning model to predict cognitive impairment based on primary care data that showed good discrimination and high specificity. These characteristics could support the identification of patients who would not benefit from cognitive assessment, facilitating the allocation of human and economic resources.
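The impairment definition (a global cognitive z-score more than 2 standard deviations below the mean) can be sketched as below. This assumes the z-scores are standardized against the sample's own mean and standard deviation; published studies often standardize against normative or demographically adjusted references instead, so the reference choice here is an assumption, and the function name is illustrative.

```python
from statistics import mean, stdev


def flag_impairment(scores, threshold=-2.0):
    """Return z-scores and a flag for z below `threshold`.

    Assumption: z-scores are computed against the sample's own mean and
    sample standard deviation.
    """
    mu = mean(scores)
    sd = stdev(scores)
    if sd == 0:
        raise ValueError("scores must not all be identical")
    z = [(s - mu) / sd for s in scores]
    return z, [zi < threshold for zi in z]


# Toy data: nine similar global scores and one much lower score.
z, flags = flag_impairment([10] * 9 + [0])
print([round(zi, 2) for zi in z])
print(flags)
```

Because the flag depends on the reference distribution, the same raw score can be classified differently under sample-based and norm-based standardization.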
A Software to Compare Clusters between Groups and Its Application to the Study of Autism Spectrum Disorder
(2017) MACIEL CALEBE VIDAL; Sato, João R.; Balardin, Joana B.; Takahashi, Daniel Y.; Fujita, André
Understanding how brain activity clusters can help in the diagnosis of neuropsychological disorders. It is therefore important to be able to identify alterations in the clustering structure of functional brain networks. Here, we provide an R implementation of Analysis of Cluster Variability (ANOCVA), which statistically tests (1) whether a set of brain regions of interest (ROIs) is equally clustered between two or more populations and (2) whether the contribution of each ROI to the differences in clustering is significant. To illustrate the usefulness of our method and software, we apply the R package to a large functional magnetic resonance imaging (fMRI) dataset of 896 individuals (529 controls and 285 diagnosed with autism spectrum disorder [ASD]) collected by the ABIDE (Autism Brain Imaging Data Exchange) Consortium. Our analysis shows that the clustering structure differs between controls and ASD subjects (p < 0.001) and that specific brain regions distributed in the frontotemporal, sensorimotor, visual, cerebellar, and brainstem systems contributed significantly (p < 0.05) to this differential clustering. These findings suggest an atypical organization of domain-specific functional brain modules in ASD.
Identification of alterations associated with age in the clustering structure of functional brain networks
(2018) Guzman, Grover E. C.; Sato, Joao R.; MACIEL CALEBE VIDAL; Fujita, Andre
Initial studies using resting-state functional magnetic resonance imaging to examine the trajectories of the brain network from childhood to adulthood found evidence of functional integration and segregation over time. Understanding how functional integration and segregation occur in healthy individuals is crucial to understanding possible deviations that may lead to brain disorders. Recent approaches have focused on a framework in which the functional brain network is organized into spatially distributed modules associated with specific cognitive functions. Here, we tested the hypothesis that the clustering structure of brain networks evolves during development. To address this hypothesis, we defined a measure of how well a brain region is clustered (the network fitness index) and developed a method to evaluate its association with age. We then applied this method to a functional magnetic resonance imaging dataset of 397 males under 31 years of age collected as part of the Autism Brain Imaging Data Exchange Consortium. We identified two brain regions whose clustering changes over time, namely the left middle temporal gyrus and the left putamen. Since the network fitness index is associated with both integration and segregation, our findings suggest that the identified brain regions play a role in the development of brain systems.
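The abstract does not detail how the association between the network fitness index and age was tested. As a generic illustration only, not the authors' method, a permutation test on a least-squares slope is one standard way to check whether a per-region index varies with age; the function names, permutation count and seed are hypothetical.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def permutation_p_value(age, index, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the slope of `index` on `age`.

    Shuffling the index values breaks any age association, giving a null
    distribution of slopes to compare against the observed one.
    """
    rng = random.Random(seed)
    observed = abs(ols_slope(age, index))
    shuffled = list(index)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(ols_slope(age, shuffled)) >= observed:
            extreme += 1
    # The +1 terms count the observed data as one permutation.
    return (extreme + 1) / (n_permutations + 1)


ages = list(range(20))
fitness = [0.1 * a for a in ages]  # toy index rising linearly with age
print(ols_slope(ages, fitness), permutation_p_value(ages, fitness))
```

In a whole-brain analysis this test would be repeated per region, so the resulting p-values would also need a multiple-comparison correction.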
