SFR Dataset

Data Content

We use a dataset built from call detail records (CDR) and network signaling data (NSD) provided by the French telecom operator SFR. Such datasets typically consist of network signaling events that are time-stamped, geo-referenced (i.e., tagged with the cell tower that generated the event), and associated with individual mobile devices. The dataset is pre-processed by the operator and aggregates the mobile network data of all SFR mobile subscribers, who represent 20% of the entire population of the studied area.

Privacy compliance

We note that European privacy regulations consider any individual trajectory data, such as the data used in [Ayan et al., 2021], to be personal data, even when user identifiers are pseudonymized. For this reason, the pre-processed dataset only describes flow counters for geographical zones, aggregated both in time (per hour) and in space (at the level of cell tower communication ranges). The operator's cells were aggregated based on the official units for statistical information known as IRIS zones, defined by the French National Institute of Statistics and Economic Studies (INSEE).
Some IRIS zones contain so few active users that those users could be identified; the operator groups such zones together to ensure a proper level of anonymity. The dataset contains no information on individual trajectories, only the number of trips aggregated per origin and destination.
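To illustrate what "aggregated per origin and destination, per hour" means, here is a minimal sketch that turns hypothetical individual trip records into hourly origin–destination counts and drops the user identifiers. The record layout, the zone labels (`Z01`, ...) and the choice to attribute each trip to the hour in which it starts are assumptions for illustration only; the operator's actual pre-processing is not described here.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trip records: (user_id, departure time, origin zone, destination zone).
# User IDs and zone labels are invented for illustration.
trips = [
    ("u1", datetime(2020, 1, 27, 8, 15), "Z01", "Z02"),
    ("u2", datetime(2020, 1, 27, 8, 40), "Z01", "Z02"),
    ("u1", datetime(2020, 1, 27, 18, 5), "Z02", "Z01"),
    ("u3", datetime(2020, 1, 27, 8, 59), "Z03", "Z02"),
]


def aggregate_od(records):
    """Count trips per (hour, origin, destination); user IDs are discarded."""
    counts = Counter()
    for _user, time, origin, destination in records:
        # Assumption: a trip is attributed to the hour slot in which it starts.
        hour = time.replace(minute=0, second=0, microsecond=0)
        counts[(hour, origin, destination)] += 1
    return counts


od = aggregate_od(trips)
for (hour, origin, destination), n in sorted(od.items()):
    print(hour.isoformat(), origin, "->", destination, n)
```

Only the counter keys `(hour, origin, destination)` survive the aggregation, which is the property that keeps individual trajectories out of the published data.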

Spatial features

The IRIS zones defined by INSEE must satisfy geographic and demographic criteria and have borders that are clearly identifiable and stable in the long term. IRIS zones are the most detailed tool available to date for describing the internal structure of the more than 1,900 French municipalities with at least 5,000 inhabitants. Even this level of granularity is too fine in some cases, which is why, as noted above, the operator groups together IRIS with very few active users.
In practice, the dataset covers the city of Paris (i.e., the French department with code 75), a total area of 93.76 km². Paris contains 992 INSEE IRIS zones, which are aggregated into 326 zones in the SFR dataset. To reach this level of aggregation, SFR removed 8% of the INSEE IRIS, kept about 11% unchanged, and merged the remaining ones into groups of 2 to 16 INSEE IRIS (2.9 INSEE IRIS merged on average).
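The removed / kept / merged breakdown can be computed from a correspondence table between INSEE IRIS and SFR-IRIS. The sketch below assumes a hypothetical mapping `iris_to_sfr` (toy codes, not real INSEE identifiers) where `None` marks a removed IRIS. The text does not say whether the reported average is taken over merged groups only or over all SFR-IRIS; this sketch averages over merged groups only.

```python
from collections import Counter

# Hypothetical INSEE IRIS -> SFR-IRIS correspondence (toy codes).
# None marks an INSEE IRIS that does not appear in the SFR dataset.
iris_to_sfr = {
    "A": "S1",
    "B": "S2", "C": "S2",
    "D": "S3", "E": "S3", "F": "S3",
    "G": None,
}


def summarize(mapping):
    """Summarize how INSEE IRIS were removed, kept as-is, or merged."""
    total = len(mapping)
    removed = sum(1 for zone in mapping.values() if zone is None)
    # Number of INSEE IRIS contained in each SFR-IRIS zone.
    sizes = Counter(zone for zone in mapping.values() if zone is not None)
    kept = sum(1 for size in sizes.values() if size == 1)
    merged_sizes = sorted(size for size in sizes.values() if size > 1)
    mean_merged = sum(merged_sizes) / len(merged_sizes) if merged_sizes else 0.0
    return {
        "sfr_zones": len(sizes),
        "removed_share": removed / total,
        "kept_share": kept / total,
        "merged_group_sizes": merged_sizes,
        "mean_merged_size": mean_merged,
    }


print(summarize(iris_to_sfr))
```

Applied to the real Paris correspondence table, the same function would yield the zone count and the shares quoted above.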
The figures below show the original INSEE IRIS division of Paris and the division provided by SFR, hereafter denoted SFR-IRIS, which is the one used in the studied dataset. In the central metropolitan area, SFR-IRIS zones are smaller, owing to their high user attendance, than SFR-IRIS covering parks or sparsely populated areas.

Original IRIS division given by INSEE/IGN
IRIS division given by SFR

Flow information

The dataset provides three types of aggregated counters (flows):

  1. Continuous user presence – per SFR-IRIS zone, the number of users (static, or mobile but moving only inside the zone) observed in the zone throughout the entire 1-hour interval.
  2. Incoming trips – per SFR-IRIS zone, the number of users that arrive in the zone during the 1-hour interval. These users were previously observed in a different zone and are now logged in the zone of interest.
  3. Outgoing trips – per SFR-IRIS zone, the number of users that leave the zone during the 1-hour interval. The last known location of these users was in the zone of interest, and they are now observed in a different zone.
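The three counters can be sketched from per-user observation sequences for a single hour. This is a minimal illustration under stated assumptions, not the operator's method: each hypothetical sequence starts with the user's last known zone before the hour, followed by the zones observed during the hour in time order; a user who never changes zone counts as present, and every zone change counts as one outgoing and one incoming move. Counts are of distinct users per zone.

```python
from collections import defaultdict


def hourly_counters(observations):
    """Return (presence, incoming, outgoing), each mapping zone -> number of distinct users.

    observations: hypothetical dict user_id -> non-empty list of zones, starting
    with the last known zone before the hour, then the zones seen during the hour.
    """
    presence = defaultdict(set)
    incoming = defaultdict(set)
    outgoing = defaultdict(set)
    for user, zones in observations.items():
        if all(zone == zones[0] for zone in zones):
            # Never left the zone: continuously present for the whole hour.
            presence[zones[0]].add(user)
            continue
        for previous, current in zip(zones, zones[1:]):
            if previous != current:
                outgoing[previous].add(user)
                incoming[current].add(user)

    def to_counts(users_by_zone):
        return {zone: len(users) for zone, users in users_by_zone.items()}

    return to_counts(presence), to_counts(incoming), to_counts(outgoing)


obs = {
    "u1": ["Z01", "Z01", "Z01"],  # stays in Z01
    "u2": ["Z01", "Z02"],         # moves Z01 -> Z02
    "u3": ["Z03", "Z02", "Z03"],  # round trip Z03 -> Z02 -> Z03
}
presence, incoming, outgoing = hourly_counters(obs)
print(presence)  # {'Z01': 1}
print(incoming)  # {'Z02': 2, 'Z03': 1}
print(outgoing)  # {'Z01': 1, 'Z03': 1, 'Z02': 1}
```

Counting distinct users (sets) rather than moves means a user who enters the same zone twice within the hour is counted once, matching the "number of users" wording above.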

Temporal features

The aggregated dataset covers two weeks before the first lockdown in France (January 26th to February 8th, 2020) and two weeks during this strict lockdown (March 22nd to April 4th, 2020), with a temporal granularity of 1 hour.
For more details about the temporal features of the dataset, see Statistics.
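The hourly time index implied by these two periods can be built as follows. This sketch assumes both end dates are inclusive and uses naive timestamps, giving 14 days × 24 = 336 hourly slots per period. If the data were instead indexed in local Paris time, the switch to summer time on March 29, 2020 would leave one hour fewer in the lockdown period.

```python
from datetime import date, datetime, timedelta

# Observation periods; end dates are assumed to be inclusive.
PERIODS = {
    "before_lockdown": (date(2020, 1, 26), date(2020, 2, 8)),
    "lockdown": (date(2020, 3, 22), date(2020, 4, 4)),
}


def hourly_slots(start, end):
    """Yield naive hourly timestamps from start 00:00 through end 23:00."""
    current = datetime.combine(start, datetime.min.time())
    stop = datetime.combine(end + timedelta(days=1), datetime.min.time())
    while current < stop:
        yield current
        current += timedelta(hours=1)


slots = {name: list(hourly_slots(*bounds)) for name, bounds in PERIODS.items()}
for name, hours in slots.items():
    print(name, len(hours), hours[0], hours[-1])
```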

js">