Publication Date:
2025
Citation:
(2025). Detecting Semantic Relationships Among Datasets. Retrieved from https://hdl.handle.net/10446/310987
Abstract:
The novel context of Big Data has shown that classical relational databases are not suitable: new platforms for managing an enormous variety of datasets have become necessary, as demonstrated by the popularity of “data lakes” and “data lakehouses”. A common problem in modern data platforms is detecting pairs of datasets that concern the same topic. However, purely syntactic matching is not effective: modern AI techniques for Natural-Language Processing, such as word embeddings and sentence embeddings, promise to address the problem in a (more or less) semantic way. The contribution of the paper is a novel methodology, called “TopicRank”, for flexibly querying data platforms to find pairs of datasets that concern the same topic, on the basis of the textual descriptions that accompany the datasets as metadata. The paper presents the results of a preliminary experiment conducted on a real pool of datasets.
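The abstract does not say how TopicRank scores candidate pairs, so the following is only a generic illustration of embedding-based matching, not the authors' method. It assumes each dataset's textual description has already been converted into a sentence-embedding vector by some model, and ranks dataset pairs by cosine similarity. The function names (`cosine`, `candidate_pairs`), the 0.8 threshold, and the toy vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def candidate_pairs(embeddings, threshold=0.8):
    """Hypothetical helper: given {dataset_name: description_embedding},
    return (name_a, name_b, score) for every pair whose cosine similarity
    is >= threshold, sorted by decreasing score."""
    pairs = []
    for (a, ea), (b, eb) in combinations(embeddings.items(), 2):
        score = cosine(ea, eb)
        if score >= threshold:
            pairs.append((a, b, score))
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs


# Toy, made-up 3-dimensional "embeddings" standing in for real model output.
toy_embeddings = {
    "air_quality_2023": [0.9, 0.1, 0.0],
    "pm10_measurements": [0.8, 0.2, 0.1],
    "school_enrollment": [0.0, 0.1, 0.95],
}
```

With these toy vectors, `candidate_pairs(toy_embeddings)` returns only the pair (`air_quality_2023`, `pm10_measurements`), with a score of about 0.98; the enrollment dataset falls well below the threshold. Comparing all pairs is quadratic in the number of datasets, which is acceptable for a small pool but would call for an approximate nearest-neighbour index on a large data lake.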
CRIS Type:
1.4.01 Contributions in conference proceedings - Conference presentations
Authors:
Fosci, Paolo; Carbone, Vincenzo; Leo, Matteo; Marmorato, Andrea; Psaila, Giuseppe; Rosa, Giampiero; Torabi, Mohammadsadegh
Book Title:
Flexible Query Answering Systems. 16th International Conference, FQAS 2025, Burgas, Bulgaria, September 11–13, 2025, Proceedings
Published in: