UNI-FIND
unibg.it

TEXT MINING AND ANALYSIS (IN THE HUMANITIES) - 17711-ENG

ID: 17711-ENG
Details: SSD: Statistics for Economics; Duration: 36; CFU: 6
Located in: BERGAMO
Course Details: TEXT SCIENCES AND CULTURE ENHANCEMENT IN THE DIGITAL AGE - 177-270-EN/PERCORSO COMUNE; Year: 2
Year: 2025

Overview

Date/time interval

First Semester (22/09/2025 - 19/12/2025)

Syllabus

Course Objectives

This course presents the main theoretical foundations and practical tools that students need to analyze textual data correctly. Students will acquire a solid grounding, both theoretical and practical, in techniques for extracting and analyzing textual data. In particular, during the course students will:

  • become familiar with different types of data sources, with particular reference to unstructured and big data;
  • be able to extract data from different sources, such as social media or websites;
  • be able to convert unstructured textual data into structured numerical data;
  • be able to implement natural language processing techniques, such as sentiment analysis and topic modeling.

The methods will be presented using the R software.
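
As an illustration of the third objective above, converting unstructured text into structured numerical data, the following is a minimal sketch of a document-term matrix. The course itself works in R; Python's standard library is used here only for illustration, and the sample sentences and the function names `tokenize` and `document_term_matrix` are hypothetical, not course material.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is the sorted set of all terms seen in any document.
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical sample documents, invented for this sketch.
docs = ["The museum opens today.", "The library and the museum are closed."]
vocab, dtm = document_term_matrix(docs)
```

Each row of `dtm` is one document and each column one term of `vocab`, so free text becomes a numeric table that standard statistical methods can work with. This sketch only lowercases and splits on letters; it does not perform stemming or lemmatization.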


Course Prerequisites

None


Teaching Methods

Lectures and laboratory sessions in which students are encouraged to take an active part in discussions and to develop their own case study.


Assessment Methods

The evaluation will be based on:

• A final written test entailing theoretical questions and exercises to be solved with the R software.

• Assignments provided by the teacher (exercises, case studies, reports).



Contents

• Unstructured data and Big data: what they are, how to use them; characteristics of different data sources.

• Introduction to R software

• Working with strings: basic tools for handling character strings (e.g. length computation, pattern recognition, regular expressions).

• Text data management: transformation from unstructured to structured, tokenization, cleaning, stemming, lemmatization.

• Text mining: introduction and different approaches; document representation; document synthesis; detection of distances between strings and text similarity.

• Sentiment analysis: design and development of methods for sentiment classification and polarity detection.

• Topic modeling: brief overview of methods for document content classification.

• Data extraction from the web: web scraping and APIs.

• Empirical applications with real data, for example from library or museum websites and newspaper articles.
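
Two of the topics listed above, distances between strings and polarity detection, can be sketched in a few lines. The course itself uses R; the Python below is only an illustrative sketch under stated assumptions. The Levenshtein edit distance is one common string distance (the syllabus does not say which distances are covered), and `LEXICON` is a hypothetical toy word list, not a real sentiment lexicon.

```python
import re


def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


# Hypothetical toy lexicon, invented for this sketch: word -> polarity score.
LEXICON = {"good": 1, "great": 1, "bad": -1, "boring": -1}


def polarity(text):
    """Classify text by summing the lexicon scores of its word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `levenshtein("kitten", "sitting")` is 3. A purely lexicon-based score ignores negation and context, so a phrase such as "not boring" is still scored as negative.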


Online Resources

  • E-learning
  • Leganto - Reading lists

Degrees

TEXT SCIENCES AND CULTURE ENHANCEMENT IN THE DIGITAL AGE - 177-270-EN 
Master's Degree
2 years

People

BIANCHI Annamaria
Ministerial Area 13 - Economic and Statistical Sciences
Sector STAT-02/A - Economic Statistics
Group 13/STAT-02 - ECONOMIC STATISTICS
Roles: Associate Professor; Member of the Committee for Research Integrity and Ethics

Other

Main module

TEXT MINING AND ANALYSIS (IN THE HUMANITIES)