UNI-FIND
BAT: A Toolkit for Biomedical Text Augmentation

Contribution in conference proceedings
Publication date:
2025
Citation:
(2025). BAT: A Toolkit for Biomedical Text Augmentation. Retrieved from https://hdl.handle.net/10446/316346
Abstract:
We introduce BAT (Biomedical Augmentation for Text), a Python package specifically designed to augment textual data in the biomedical domain using a neuro-symbolic pipeline. This innovative approach combines knowledge-driven and data-driven methodologies to generate perturbed versions of text while preserving its original meaning. The package provides two categories of functions: Knowledge-based (KB) perturbation and Transformer-based (TB) perturbation. KB perturbation offers a utility interface to semantic resources for handling medical terminology alongside general-purpose terms, by providing both medical and general synonym replacement. TB perturbation leverages language models to enable generation of new augmented sentences through contextual word prediction, back-translation, and rephrasing. BAT is designed to tackle the typical challenges of biomedical text, navigating complex medical jargon and enriching text while maintaining its readability. It is also designed for modularity, allowing seamless integration into existing NLP workflows and processing of inputs ranging from single words and sentences to entire datasets and large corpora. By integrating formalized domain knowledge with cutting-edge machine learning models, BAT serves as a versatile toolkit for text augmentation across multiple languages, including English as well as low-resource languages such as Italian, Spanish, and French. It facilitates the generation of diverse, high-quality textual data to support a range of biomedical applications, including creating new training samples, addressing imbalanced distributions, and evaluating model robustness.
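To make the idea of knowledge-based synonym replacement concrete, the following is a minimal illustrative sketch and not BAT's actual API. The function name replace_synonyms and the toy lexicon TOY_SYNONYMS are hypothetical; a real pipeline would presumably query a curated semantic resource instead of a hard-coded dictionary.

```python
import random
import re

# Hypothetical toy lexicon (an assumption for illustration only);
# a real system would look terms up in a curated semantic resource.
TOY_SYNONYMS = {
    "heart attack": ["myocardial infarction"],
    "high blood pressure": ["hypertension"],
    "drug": ["medication", "medicine"],
}


def replace_synonyms(text, lexicon, rng=None):
    """Replace every lexicon term found in `text` with one of its synonyms."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so repeated runs are reproducible
    # Try longer terms first so multi-word terms win over shorter overlaps.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        flags=re.IGNORECASE,
    )

    def pick(match):
        # Lexicon keys are lowercase, so normalise the matched term.
        return rng.choice(lexicon[match.group(0).lower()])

    return pattern.sub(pick, text)


augmented = replace_synonyms(
    "The patient had a heart attack and high blood pressure.", TOY_SYNONYMS
)
print(augmented)
```

Each matched term is replaced independently, so a term with several synonyms can yield different augmented variants of the same sentence when different random generators are passed in.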
CRIS type:
1.4.01 Contributions in conference proceedings - Conference presentations
Author list:
Bergomi, Laura; Parimbelli, Enea; Pala, Daniele; Buonocore, Tommaso M.
University authors:
PALA Daniele
Link to the full record:
https://aisberg.unibg.it/handle/10446/316346
Book title:
Artificial Intelligence in Medicine. 23rd International Conference, AIME 2025, Proceedings, Part II
Published in:
LECTURE NOTES IN COMPUTER SCIENCE
Series

Research

Sectors

Sector IBIO-01/A - Bioengineering