The aim of this lesson is to assess the feasibility of using mobile positioning data for generating statistics on domestic, outbound and inbound tourism flows, and to address the strengths and weaknesses related to access, trust, cost, and the technological and methodological challenges inherent in the use of such a new data source.
The lesson concentrated on the various aspects involved in the use of mobile positioning data in terms of tourism statistics and other domains: an overview of the situation involving the use of mobile data; accessibility to the data from the legal, technological, financial and business aspects, including possible cost and burden implications; methodological principles of statistical data collection and compilation, including evaluation by using different quality aspects and comparing the results against existing traditional methods; opportunities offered by, as well as limitations inherent in, the use of the data source.
This lesson demonstrated extensive interest in this data source and possibilities for a wide range of uses when it comes to tourism statistics and other domains, while at the same time acknowledging the multiple problems associated with accessing and processing the data.
Tourism statistics is one of the domains in which the opportunities are rather clear, as the properties of the data correspond to the nature of tourism activities. Inbound roaming, outbound roaming and domestic data stored by mobile network operators (MNOs) clearly correspond to the respective inbound, outbound and domestic domains of tourism, although not without some methodological reservations. Users of tourism statistics expect the following from the use of mobile positioning big data:
Reducing both the burden on the reporting units and the cost involved in statistical processes by (fully or partially) replacing the existing, relatively expensive methods with new data sources;
Expanding the available options in terms of measuring tourism activities through new indicators;
Improving timeliness;
Improving time and spatial accuracy.
MNOs are primarily concerned about the possibility of their competitors acquiring the following information:
Their number of subscribers (both domestic and roaming);
The number of their service activities (calls, messaging, data) in the network;
Any information on the constitution of their subscribers (the number of prepaid versus post-paid cards, the socio-demographic information of their subscribers, the number of subscribers in various foreign countries, etc.);
The number and locations of their network antennae (the release of this information might also be prohibited by law in some countries due to the terror threat to that country’s vital telecommunication infrastructure);
Any financial information and strategic plans of the MNOs;
Technological capabilities and information on infrastructure and systems.
This issue is clearly an important one as it was repeatedly mentioned by MNOs, but it is one that can be resolved by determining procedures that do not allow such information to be exposed to competitors.
The technological and methodological aspects of processing mobile positioning data are tightly linked.
This discussion includes the processing and preparation of the data for the formation of a framework and for data compilation requirements, including the identification of the home country (in inbound roaming data), handling subscriber identity codes, the geographical and time properties that are included in the data, any available additional attributes (e.g. socio-demographic attributes for subscribers or the technical attributes of the event), the removal of non-human devices, the use of a blacklist and sampling, and the formatting and preparation of the data for further tourism-specific processing.
Where technical access and data processing are concerned, the main questions relate to the specific data sources. A wide range of databases and registries can be used; however, they often differ between MNOs depending on the system architecture in use and on the MNO’s technical ability to store different types of data.
Three main types of data can be distinguished, based on their origin within the MNO’s systems, where such data is required for the compilation of tourism statistics:
any kind of event data (metadata) that covers subscriber activities and which is included in the MNO data stream;
geographical cellular (network) referencing data;
attribute data for subscribers (e.g. demographic information taken from the customer database).
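The three data types listed above can be pictured as three small record types that are joined during processing. The following is only a minimal sketch: the class names, field names and the `georeference` helper are hypothetical and are not taken from any MNO system, where the actual schemas differ from operator to operator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Event data: one network event for a subscriber (hypothetical fields)."""
    subscriber_id: str   # pseudonymous subscriber code
    timestamp: str       # ISO 8601 time of the event
    cell_id: str         # identifier of the serving cell


@dataclass(frozen=True)
class Cell:
    """Geographical cellular referencing data (hypothetical fields)."""
    cell_id: str
    lat: float
    lon: float


@dataclass(frozen=True)
class SubscriberAttributes:
    """Attribute data for subscribers (hypothetical fields)."""
    subscriber_id: str
    home_country: Optional[str] = None


def georeference(events, cells):
    """Attach cell coordinates to each event; events with an unknown cell are skipped."""
    cells_by_id = {cell.cell_id: cell for cell in cells}
    located = []
    for event in events:
        cell = cells_by_id.get(event.cell_id)
        if cell is not None:
            located.append((event, cell.lat, cell.lon))
    return located
```

Keeping the three types separate mirrors the fact that they originate from different places within the MNO’s systems and may be subject to different access restrictions.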
Event data can be divided into internal and external network events and, furthermore, it can be broken down in the same way as for the various forms of tourism statistics:
(a) MNO internal events:
1. inbound roaming;
2. domestic;
(b) MNO external events:
1. outbound roaming.
The most common source of mobile data, and the easiest to access, is Call Detail Records (CDRs). CDR data can be taken from inbound roaming, outbound roaming and domestic datasets, as it is held in storage and is rather easily accessible by the MNOs. At the same time, it is the sparsest source in terms of the number of records (events) per subscriber per day.
The CDRs represent the active usage of mobile devices, covering incoming and outgoing calls and SMS text messaging. The biggest problems with CDRs are the frequency and the regularity of the records, as they depend on the usage pattern of the subscriber. The average number of CDRs for tourists is approximately four events per subscriber per day, meaning that there are on average four location facts per phone per day. This is sufficient for some areas, but it sets limits upon domains in which better temporal accuracy is required (e.g. hourly statistics on a very small geographical space). Alternative data sources such as Data Detail Records (DDR), location updates, or others can include up to several hundred location events per user per day; however, such data is not often stored by MNOs.
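The density measure mentioned above (events per subscriber per day) can be computed directly from a list of records. The sketch below assumes a hypothetical record layout of `(subscriber_id, timestamp, cell_id)` with made-up sample values; real CDR formats vary between MNOs.

```python
from collections import Counter
from datetime import datetime

# Hypothetical CDR-like records: (subscriber_id, ISO timestamp, cell_id).
records = [
    ("A", "2014-06-01T08:15:00", "c1"),
    ("A", "2014-06-01T12:40:00", "c2"),
    ("A", "2014-06-01T19:05:00", "c2"),
    ("A", "2014-06-02T09:30:00", "c3"),
    ("B", "2014-06-01T10:00:00", "c1"),
]


def events_per_subscriber_day(rows):
    """Count location events for each (subscriber, calendar day) pair."""
    counts = Counter()
    for subscriber_id, timestamp, _cell_id in rows:
        day = datetime.fromisoformat(timestamp).date()
        counts[(subscriber_id, day)] += 1
    return counts


def mean_events_per_subscriber_day(rows):
    """Average over the subscriber-days that have at least one event."""
    counts = events_per_subscriber_day(rows)
    return sum(counts.values()) / len(counts) if counts else 0.0
```

Note that this average only covers subscriber-days on which at least one event was recorded; days on which a phone was present but unused leave no trace in CDR data at all.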
The size of the data block, the number of records it covers, and the processing complexity it creates all require a sophisticated data processing system, which can roughly be set up in one of two ways:
Data is extracted and processed within the MNOs, and the resulting statistical indicators are transmitted to the NSI, where the results from several MNOs are combined to create the final statistical indicators;
Data is extracted by MNOs and transmitted to the NSI, where the processing is carried out in order to produce the final indicators.
The cost and the distribution of the burden differ between the two scenarios. There is no clear preference, as both options have their own benefits and disadvantages. The main technological challenge lies in the ability of the system to carry out periodic, complex processes involving large data records within the designated timeframe, while leaving room to recalculate the results if an error occurs.
During the processing of mobile positioning data, there are several important steps that enable the generation of tourism statistics, such as the identification of the usual environment and the country of residence, the duration of stay in a specific place, the differentiation between same-day and overnight visits, etc.
Depending on the availability of the data and on technological capabilities, either all the trips within the frame can be analysed, which corresponds to a census, or a subset of observations within the frame can be selected.
The sample sizes can be substantially larger in this situation when compared to traditional sample surveys, as the cost and burden of data collection are much less driven by the number of observations in the sample. The sample size can be determined from available technological capabilities and disclosure rules. The aspects of cost and burden are discussed in the respective chapter of Report 4.
The methodology contains the following sections: the additional preparation of event data, frame formation, data compilation and estimation. The initial data extracted and prepared by MNOs is based upon network events that indicate a specific subscriber’s presence in time and space.
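As an illustration of the last two steps, the sketch below derives the number of nights and a same-day/overnight label from the first and last time a subscriber was observed at a destination. The rule used here (a visit is overnight if it spans at least one change of calendar date) is an assumption made for the example and is not the definition used in the study; the function names are hypothetical.

```python
from datetime import datetime


def nights_spent(first_seen, last_seen):
    """Number of calendar-date changes between the first and last observation."""
    if last_seen < first_seen:
        raise ValueError("last_seen must not precede first_seen")
    return (last_seen.date() - first_seen.date()).days


def classify_visit(first_seen, last_seen):
    """Label a visit 'overnight' if it spans at least one midnight, else 'same-day'."""
    return "overnight" if nights_spent(first_seen, last_seen) >= 1 else "same-day"


# Usage with made-up observation times:
day_trip = classify_visit(datetime(2014, 6, 1, 9, 0), datetime(2014, 6, 1, 21, 0))
stay = classify_visit(datetime(2014, 6, 1, 22, 0), datetime(2014, 6, 2, 7, 0))
```

Under this simple rule a visit lasting from 23:50 to 00:10 would also count as overnight, so a production rule would probably need an additional condition, such as a minimum duration or presence during night hours.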
Additional preparation may include geographical referencing, the elimination of non-human-operated mobile devices, checking the time and area coverage of the data, dealing with missing values, etc.
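Two of these preparation steps can be sketched in a few lines. The example assumes that non-human devices are already known through a blacklist of subscriber codes, and it handles missing values simply by dropping incomplete events; both choices, as well as the field names, are assumptions made for illustration rather than the procedure used by any MNO.

```python
def prepare_events(events, blacklist):
    """Drop events from blacklisted subscribers and events with missing fields.

    `events` is a list of dicts with the hypothetical keys
    'subscriber_id', 'timestamp' and 'cell_id'.
    """
    cleaned = []
    for event in events:
        if event.get("subscriber_id") in blacklist:
            continue  # assumed non-human device (e.g. a machine-to-machine SIM)
        if event.get("timestamp") is None or event.get("cell_id") is None:
            continue  # incomplete event: time or location is missing
        cleaned.append(event)
    return cleaned


# Usage with made-up events:
raw_events = [
    {"subscriber_id": "s1", "timestamp": "2014-06-01T08:00:00", "cell_id": "c1"},
    {"subscriber_id": "m2m-7", "timestamp": "2014-06-01T08:01:00", "cell_id": "c1"},
    {"subscriber_id": "s2", "timestamp": None, "cell_id": "c4"},
]
prepared = prepare_events(raw_events, blacklist={"m2m-7"})
```

Dropping incomplete events is the simplest option; depending on the data, imputing a missing location from neighbouring events could be preferable.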
After the data has been prepared by MNOs, the following processing steps are set out:
Frame formation:
◊ Applying trip identification algorithms — identifying each subscriber’s individual trip to the destination (country of residence, foreign country) in question with the start and end times for each trip;
◊ Identifying the population of interest (distinguishing tourism activities from non-tourism activities):
Defining roaming subscribers not actually crossing the border and entering the country (inbound, outbound);
Defining residents (inbound, outbound);
Defining the place of residence and the usual environment (domestic);
Identifying country-wide transit trips (inbound);
Identifying destination and transit countries (outbound).
Data compilation:
◊ Spatial granulation (visits at the smallest administrative level for inbound);
◊ Defining variables (number of visits, duration of trips, classification, etc.).
Estimation (from an MNO-specific sample to the whole population of interest) contains:
◊ Time and space aggregation of the data (time: day, week, month, quarter; space: 1 km² grid, LAU-2, LAU-1, country);
◊ Combining data from various MNOs and computing final statistical indicators.
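The processing chain above can be sketched end to end in a simplified form: trips are identified per subscriber, aggregated by month, and then combined across MNOs. Everything specific in the example is an assumption made for illustration: the gap threshold `MAX_GAP`, the function names, and the per-MNO weights (which stand in for whatever estimation method scales an MNO-specific sample to the whole population of interest).

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(days=3)  # hypothetical: a longer silence starts a new trip


def identify_trips(timestamps, max_gap=MAX_GAP):
    """Split one subscriber's event times into trips, returned as (start, end) pairs."""
    trips = []
    for ts in sorted(timestamps):
        if trips and ts - trips[-1][1] <= max_gap:
            trips[-1] = (trips[-1][0], ts)   # extend the current trip
        else:
            trips.append((ts, ts))           # start a new trip
    return trips


def monthly_trip_counts(trips_by_subscriber):
    """Time aggregation: count trips by the (year, month) in which they start."""
    counts = {}
    for trips in trips_by_subscriber.values():
        for start, _end in trips:
            key = (start.year, start.month)
            counts[key] = counts.get(key, 0) + 1
    return counts


def combine_mnos(counts_by_mno, weights):
    """Scale each MNO's counts by its (hypothetical) weight and sum them."""
    total = {}
    for mno, counts in counts_by_mno.items():
        for key, n in counts.items():
            total[key] = total.get(key, 0.0) + n * weights[mno]
    return total


# Usage with made-up data for a single subscriber of one MNO:
events = [datetime(2014, 6, 1), datetime(2014, 6, 2), datetime(2014, 6, 10)]
trips = identify_trips(events)
monthly = monthly_trip_counts({"s1": trips})
combined = combine_mnos(
    {"mno_a": monthly, "mno_b": {(2014, 6): 1}},
    weights={"mno_a": 2.0, "mno_b": 4.0},
)
```

The steps that identify the population of interest (residents, transit trips, roaming subscribers who never cross the border) are deliberately left out here; in practice they would filter the trips before aggregation.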
Statistica e Turismo
https://ec.europa.eu/eurostat/web/tourism/methodology/projects-and-studies