If you live in Europe or do business with European companies, you are probably already familiar with the General Data Protection Regulation (GDPR), which has been in effect since May 2018. What you may not know is how this law is actually enforced. After all, a law is only as good as its enforcement. Given that internet content is shared globally, how can anyone ensure that people within European Union (EU) borders are actually protected by this law, especially when content needs to travel across borders? How do we know if data is just passing through Europe, or if it terminates there? What can be used as evidence of violations? The answer: the geolocation of tracking flows.
Most researchers in the network measurement community have focused on how data is collected and what it is worth financially. Little attention has been given to the geolocation of tracking flows, which can show whether or not information crosses national borders or leaves the EU, and whether or not any information is leaked. It can also show how internet service providers (ISPs) handle the distribution of tracking flows on different types of network, e.g. mobile or broadband.
Geolocations show traffic flows
For those of you who are unfamiliar with network measurement, a tracking flow is a flow between an end user and a web tracker, such as a web analytics tool. The geographical footprint of a tracking flow shows where it originated, where it went, and where it terminated. For a user, this means where you have been on the network, and who knows you were there. In relation to the GDPR, this is the evidence that shows whether or not data is being collected on EU users without their permission or knowledge.
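To make the idea concrete, here is a minimal Python sketch of a tracking flow record and a check of whether it stays inside the EU. It is an illustration only, not the method used in the study: the `TrackingFlow` fields, the `classify` function, and the labels it returns are hypothetical names, and the country set is the list of 27 current EU member states, which ignores the wider EEA and the fact that the UK was a member in 2018.

```python
from dataclasses import dataclass

# Assumption: ISO 3166-1 alpha-2 codes of the 27 current EU member states.
EU_COUNTRIES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE",
})


@dataclass(frozen=True)
class TrackingFlow:
    """Hypothetical record for one tracking flow."""
    user_country: str     # where the flow originated
    tracker_country: str  # where the flow terminated
    tracker_domain: str   # who received it


def classify(flow: TrackingFlow) -> str:
    """Label a flow by whether it stays inside the EU."""
    if flow.user_country not in EU_COUNTRIES:
        return "non-EU origin"
    if flow.tracker_country in EU_COUNTRIES:
        return "confined to EU"
    return "leaves EU"


flow = TrackingFlow("DE", "US", "tracker.example")
print(classify(flow))
```

In practice the hard part is filling in `tracker_country` reliably, since that depends on accurately geolocating the server that actually received the flow.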
Extracting geolocation in a GDPR era
So why aren’t more people researching this method? The simple answer: extracting the geolocation of tracking flows requires access to real tracking flows that originate from users and terminate at trackers. On top of obtaining users' permission to investigate their tracking flows, locating users precisely and collecting complete measurements are further challenges.
Despite the barriers, one research group, with support from BENOCS, recognized the need and developed an extensive measurement methodology for quantifying the amount of tracking flows that cross data protection borders. The methodology relies on a browser extension to render advertising and detect tracking flows. This is especially groundbreaking because the method captures, according to the study, twice as many tracking flows as previous studies, and shows the extent to which tracking flows stay confined within the EU. The study was also able to determine, without itself violating the GDPR, whether trackers track sensitive data categories such as religion, health, and sexuality.
By observing 350 test users over a period of four months, the researchers found that, contrary to the popular belief that most tracking flows terminate at trackers located outside of Europe, around 90% of the tracking flows that originate in Europe also terminate in Europe. This small sample serves as a baseline intended to be correlated with future datasets covering millions of users.
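A confinement figure like the one above is a simple ratio: of all flows that originate inside a region, the fraction that also terminate inside it. The sketch below shows that arithmetic in Python. The function name `confinement_share`, the `REGION` subset, and the sample flows are all made up for illustration and are not data from the study.

```python
from collections import Counter

# Assumption: a small illustrative subset of EU country codes, not the full list.
REGION = frozenset({"DE", "FR", "NL", "ES", "IT", "IE"})


def confinement_share(flows, region):
    """Return the fraction of flows originating in `region` that also end there.

    `flows` is an iterable of (origin_country, destination_country) pairs.
    Returns None when no flow originates in the region.
    """
    counts = Counter()
    for origin, destination in flows:
        if origin in region:
            counts["total"] += 1
            if destination in region:
                counts["confined"] += 1
    if counts["total"] == 0:
        return None
    return counts["confined"] / counts["total"]


# Toy input: four flows start in the region, three of them also end in it.
sample_flows = [("DE", "DE"), ("FR", "NL"), ("ES", "US"), ("IT", "IE"), ("US", "US")]
share = confinement_share(sample_flows, REGION)
print(f"{share:.0%} of flows originating in the region stay in it")
```

The flow that starts outside the region is ignored, because the question is only about what happens to flows that originate inside it.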
They also found that, despite the regulations on tracking flows with sensitive and protected data categories, around 3% of the total tracking flows identified in this study fall within the protected categories. This 3% is evidence of violations.
These results are especially important when trying to figure out how well companies are actually abiding by this law.
As we are now living in an era of GDPR and other privacy regulations, it is important that we also have the necessary tools to enforce them. Following the success of these tracking flow measurements, research and development on this subject will continue, with the aim of providing the evidence and footprint of tracking flows in real time to anyone who needs it.
The information in this post comes from the paper “Tracing Cross Border Web Tracking.”
*This study received a distinguished paper award at the ACM Internet Measurement Conference in 2018.