DALMINE
Syllabus
Course Objectives
In the Gestione, Analisi e Rappresentazione Dati (Data Management, Analysis and Representation) course, students will learn how to analyze a structured dataset (in tabular form) and represent it in several domains (e.g. manufacturing and production), using both commercial tools (e.g. Tableau) and general-purpose programming languages and tools (e.g. Python and its libraries).
Specifically, at the end of the course, students will:
● be able to apply procedures to quickly become familiar with a dataset, identify outliers and manage missing data;
● know the basics of machine learning algorithms for regression, classification and clustering problems;
● understand the theoretical concepts underlying data representation;
● understand how commercial software (e.g. Tableau) and open-source software (e.g. Python) work for data management, analysis and representation;
● be able to choose, depending on the context and availability, whether to use commercial tools (e.g. Tableau) or open-source tools (e.g. Python libraries) for the analysis and representation of the available data;
● be able to apply a data analysis strategy covering the collection, analysis and representation of a structured dataset;
● be able to build interactive summary dashboards that support manufacturing decisions.
Course Prerequisites
Basics of Computer Science
Basics of Statistics
Teaching Methods
Teaching consists of lectures and exercises totaling 48 hours, with particular attention to interaction with students. The teaching material consists mainly of the slides available on the course website and the sample code, supplemented by in-depth readings from the texts recommended in the bibliography. The slides support the classroom discussion; additional details not included in the slides may therefore be provided in class.
The password to access the material published on the course page is communicated during the first lesson of the course. If you are unable to attend the first lesson, please contact the teacher by e-mail.
Lecturers:
● They will present the main theoretical concepts in lectures, using PowerPoint or other suitable tools.
● They will present examples of data management, analysis, and representation through hands-on exercises and tutorials, developing and discussing each example in class with students.
● They will present the exam project, providing information on the expected results and the goals to be achieved.
● They will support students in developing the project through feedback provided during the course.
Students:
● They will be expected to listen, take notes and actively participate in theoretical and practical lessons by asking questions and joining discussions.
● They will put into practice the concepts explained during the theoretical lectures in practical sessions in which, individually or in groups, they will be asked to solve small problems provided by the professors (e.g., perform a certain analysis, represent a certain data set).
● They will form groups on their own to develop the case study used for their examination.
● They may request meetings with the professors to get feedback on their project.
Assessment Methods
The evaluation is based on a case study that requires applying the concepts learned during the course. The case study consists of analyzing a dataset and creating a dashboard that represents its content. To pass the course, students must:
● Work in groups (minimum 3, maximum 5 students) to develop the case study.
● Provide the files containing the analyses carried out with one of the software tools covered in the course.
● Create a PowerPoint presentation explaining the analysis strategy adopted and the steps taken to prepare and analyze the dataset for the case study. The discussion lasts about 30 minutes per group, including clarification questions from the professors.
The use of Artificial Intelligence to support the development of the case study is permitted, provided that it is clearly declared and specified both in the submitted materials and during the oral presentation. Specifically, this means:
· Clearly indicating, either during or at the end of the presentation, which AI tools were used (e.g., ChatGPT, Gemini, Copilot, Claude).
· Explaining how the AI tool was used (e.g., for brainstorming, code generation, slide creation support) and sharing the prompts/conversations exchanged with the AI.
· Describing how the AI tool influenced the overall work and why its use was considered necessary.
· Any non-transparent or uncritical use of AI (e.g., being unable to justify the reasons for its use or explain the generated outputs) will be considered improper and will result in failing the exam.
For all matters not explicitly covered in the points above, the full content of the University’s Guidelines for the Use of Artificial Intelligence remains applicable.
The examination methods for non-attending students are the same as those for attending students.
Non-attending students are invited to contact the teacher to evaluate any supplementary materials.
Contents
Starting from a general overview of data analysis and representation, the course provides the knowledge needed to collect, analyze and represent data, with the ultimate goal of enabling students to create interactive dashboards that support decisions in manufacturing and production. To this end, the course presents both commercial tools (e.g. Tableau) and open-source tools (e.g. Python and its libraries), giving students the flexibility and skills to choose the right tool for their work context.
Specifically, the course consists of modules dedicated to:
● General concepts – Data representation, dashboards in production and management, definition of a generic analysis pipeline for data collection, management, analysis, and representation.
● Introduction to Business Intelligence Software – Loading a dataset, building queries, building charts and dashboards.
● Dashboard Design – Identification of users. Alignment of content to the purpose of use. Structuring of the content to facilitate readability and consultation. Identification of the elements necessary for communication.
● Introduction to Python – Anaconda, Jupyter notebook, basic commands.
● Introduction to Pandas – Loading a dataset, basics of data processing, transformation and preparation.
● Getting Familiar with the Dataset – Operations to carry out to gain knowledge of the available data.
● Outlier Detection and Management – Definition of outliers and automatic outlier detection methods.
● Missing Data Detection and Management – Missing data detection mechanisms and imputation
● Extracting Information from Data – How to derive new insights from existing data.
● Fundamentals of Machine Learning – Regression, Classification and Clustering
● Dashboard Development – Creating dashboards, identifying and implementing interactivity elements while also considering readability. Assessment of the consistency of the data communicated with respect to the context of use.
More information
If the course is taught in blended or remote mode, changes may be made to what is stated in this syllabus so that the course and exams can also be delivered in that format.