
Digital Innovation in Price Statistics: Open and Fast Insights from Online Scraped Data

Project objectives

According to the New Consumer Agenda (European Commission, 2020a), detailed information on consumer price levels and changes can be a crucial tool for empowering European consumers to play an active role in the green and digital transitions, also by encouraging new consumption models and better-informed consumer decisions.

Since the Internet is already an important purchasing channel, whose relevance for consumers has grown with the COVID-19 pandemic, it is essential to provide information on price movements for online purchases. Indeed, several National Statistical Institutes (NSIs) have started to collect data from online retailers and, following Eurostat recommendations, to use them in compiling a representative official Consumer Price Index (CPI).

This project aims to develop an automated, scalable and maintainable platform composed of two main systems:

  1. A web scraping system that collects prices and metadata on consumer products daily, ensures data quality with the support of machine learning techniques, and stores the data in a large-scale repository;
  2. A web portal and REST API system that will make these data available at different aggregation levels, both to the Italian and European scientific community, for use in their own research and applications, and to consumers, for information on product prices. More specifically, within this project we plan to leverage the dataset to build a system that complements official CPIs in Italy with high-frequency consumer price indexes for specific product categories, creating new information assets directly accessible to citizens and researchers.
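The project description does not say which formula the high-frequency indexes will use. Purely as an illustration, the sketch below assumes a bilateral Jevons index (the geometric mean of price relatives, a formula commonly used for elementary aggregates) computed over the products observed on both the base day and the current day. The function names and the `product id -> price` dictionary layout are hypothetical.

```python
import math
from typing import Dict, List


def jevons_index(base_prices: Dict[str, float], current_prices: Dict[str, float]) -> float:
    """Bilateral Jevons index (base = 100).

    Geometric mean of the price relatives p_current / p_base, computed only
    over products observed in both periods (a matched sample).
    """
    common = base_prices.keys() & current_prices.keys()
    if not common:
        raise ValueError("no products observed in both periods")
    log_sum = sum(math.log(current_prices[p] / base_prices[p]) for p in common)
    return 100.0 * math.exp(log_sum / len(common))


def daily_series(days: List[Dict[str, float]]) -> List[float]:
    """Fixed-base daily series: every day is compared with the first day."""
    base = days[0]
    return [jevons_index(base, day) for day in days]


if __name__ == "__main__":
    # Product "c" appears only on the current day, so it is ignored.
    print(round(jevons_index({"a": 2.0, "b": 4.0}, {"a": 2.2, "b": 4.0, "c": 1.0}), 2))  # → 104.88
```

Restricting the comparison to matched products keeps new or disappearing items from distorting the index, which is one reason why reliable product matching across days and retailers matters.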

We aim to cover about 46% of the overall consumption basket. Online prices will be collected at the national and territorial (NUTS3) level, considering both the largest multi-channel retailers and local retailers selling online and offline. Product specifications will be recorded at a high level of detail following the ECOICOP classification (European Classification of Individual Consumption according to Purpose). Machine learning techniques will be used to clean the collected data and to match product denominations across retailers, making price index calculations more robust across space and time.
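The project plans to match product denominations across retailers with machine learning. The sketch below is not that method: it is a simple rule-based baseline, assumed here only to make the matching task concrete, that scores candidate names by token-set Jaccard similarity after normalisation and keeps the best candidate above a threshold. The function names, the 0.5 threshold and the example products are hypothetical.

```python
import re
from typing import FrozenSet, List, Optional


def normalise(name: str) -> FrozenSet[str]:
    """Lower-case a product name and split it into a set of word tokens.

    \\W+ is Unicode-aware for str patterns, so accented Italian words
    (e.g. "caffè") are kept intact.
    """
    return frozenset(t for t in re.split(r"\W+", name.lower()) if t)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b| (0.0 for two empty sets)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(name: str, candidates: List[str], threshold: float = 0.5) -> Optional[str]:
    """Return the most similar candidate, or None if none reaches the threshold."""
    tokens = normalise(name)
    scored = [(jaccard(tokens, normalise(c)), c) for c in candidates]
    if not scored:
        return None
    score, cand = max(scored, key=lambda sc: sc[0])  # first candidate wins ties
    return cand if score >= threshold else None


if __name__ == "__main__":
    catalogue = ["Barilla Spaghetti n.5 500g", "Barilla Penne Rigate 500g", "De Cecco Spaghetti 500g"]
    print(best_match("Pasta Barilla Spaghetti n.5 500g", catalogue))  # → Barilla Spaghetti n.5 500g
    print(best_match("Latte intero 1l", catalogue))  # → None
```

Such a baseline breaks easily on spelling variants (for example "500 g" versus "500g"), which illustrates why learned matching models are attractive for this step.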

For product categories with localised online markets (for instance, groceries), we will also calculate spatial price indexes, measuring differences in consumer price levels across geographical areas within Italy. These databases, even if limited to a few product categories, will provide crucial statistical information on product price levels and changes that is currently unavailable, thus advancing statistical knowledge on this topic. In addition, these data will support policy making by providing frequent insights into the economic environment and prompt feedback on the effects of policies.
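The method for the spatial price indexes is not specified in the project description. One simple option, assumed here for illustration, is a Jevons-type comparison: each product's reference price is the geometric mean of its prices across areas, and each area's index is the geometric mean of its prices relative to those references (100 = reference level). Established multilateral methods such as the Country-Product-Dummy (CPD) model or GEKS handle products missing from some areas more rigorously. The area labels and prices below are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List

Prices = Dict[str, Dict[str, float]]  # area -> product -> price


def spatial_jevons(prices: Prices) -> Dict[str, float]:
    """Jevons-type spatial index per area (100 = cross-area reference level)."""
    # Log reference price per product: mean of log prices across the areas selling it,
    # i.e. the log of the geometric mean price.
    logs: Dict[str, List[float]] = defaultdict(list)
    for area_prices in prices.values():
        for product, price in area_prices.items():
            logs[product].append(math.log(price))
    ref = {p: sum(v) / len(v) for p, v in logs.items()}

    index: Dict[str, float] = {}
    for area, area_prices in prices.items():
        # Log price relatives of this area against the product references.
        rel = [math.log(price) - ref[p] for p, price in area_prices.items()]
        index[area] = 100.0 * math.exp(sum(rel) / len(rel)) if rel else float("nan")
    return index


if __name__ == "__main__":
    data = {
        "area_A": {"bread": 1.00, "milk": 2.00},
        "area_B": {"bread": 1.21, "milk": 2.42},
    }
    print({a: round(v, 2) for a, v in spatial_jevons(data).items()})  # → {'area_A': 90.91, 'area_B': 110.0}
```

When every product is observed in every area, the geometric mean of the area indexes equals 100, so each value reads directly as a percentage deviation from the common reference level.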

Start and end dates

2023-2025

Project leader

Prof. Luca Vollero, Substitute Principal Investigator

Coordinating institution

Università della Tuscia

Funding source(s)

MIUR – PRIN 2022 project

Project budget

€297,829
The Università Campus Bio-Medico di Roma promotes integrated teaching and research structures, pursuing the good of the person as the main aim of its activities.