Automated Phenotype-Based Clustering of Clinical Reports Using Large Language Models

Contribution in conference proceedings

Publication date:

2025

Citation:

(2025). Automated Phenotype-Based Clustering of Clinical Reports Using Large Language Models. Retrieved from https://hdl.handle.net/10446/306126

Abstract:

Large Language Models (LLMs) have shown significant potential in natural language processing tasks, including a range of applications in the clinical and biomedical domains. This study explores the use of LLMs to analyze a real-world dataset of Italian clinical reports and proposes a pipeline that automatically clusters these reports according to the symptoms they describe. The pipeline incorporates two approaches: (1) direct analysis of the free-text descriptions in the clinical reports, and (2) standardized processing based on the automatic, LLM-driven extraction of Human Phenotype Ontology (HPO) terms. The resulting clusters are intended to serve as the foundation for further predictive analyses, such as estimating the likelihood that a patient carries specific genetic mutations. Our investigation compares the performance of direct text analysis with that of phenotype-standardized descriptions, highlighting the strengths and limitations of each approach.
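To make approach (2) more concrete, the sketch below clusters reports once each has been reduced to a set of HPO term IDs. It is an illustration under stated assumptions, not the authors' method. The abstract specifies neither the similarity measure nor the clustering algorithm, so this example assumes Jaccard distance between term sets and average-linkage agglomerative clustering with a distance threshold. It also assumes the LLM-based extraction step has already run; that step is not shown. The function names, the threshold value, and the toy report data are hypothetical, and the HPO IDs are treated as opaque strings.

```python
from itertools import combinations


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A & B| / |A | B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_reports(terms: dict[str, set[str]], threshold: float) -> list[set[str]]:
    """Average-linkage agglomerative clustering of reports by HPO term sets.

    Hypothetical helper: clusters are merged until the closest pair is
    farther apart than `threshold` (an assumed stopping rule).
    """
    ids = list(terms)
    # Precompute pairwise Jaccard distances between individual reports.
    dist = {
        frozenset((x, y)): jaccard_distance(terms[x], terms[y])
        for x, y in combinations(ids, 2)
    }
    clusters = [{i} for i in ids]

    def linkage(c1: set[str], c2: set[str]) -> float:
        # Average pairwise distance between members of two disjoint clusters.
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters (i < j by construction).
        (i, j), best = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1],
        )
        if best > threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Toy input: report ID -> set of HPO term IDs (hypothetical data).
reports = {
    "r1": {"HP:0001250", "HP:0001263", "HP:0000252"},
    "r2": {"HP:0001250", "HP:0001263"},
    "r3": {"HP:0000365", "HP:0000518"},
    "r4": {"HP:0000365", "HP:0000518", "HP:0000505"},
}

print(cluster_reports(reports, threshold=0.5))
```

Working on standardized term sets rather than raw text makes the distance between two reports independent of wording and language, which is one plausible reason to compare the two approaches. A set-based distance ignores the ontology's hierarchy, though: two closely related but distinct terms count as a complete mismatch. An ontology-aware semantic similarity would be a natural alternative.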

CRIS type:

1.4.01 Contributions in conference proceedings - Conference presentations

Author list: