Data Diversification as a Privacy Measure

May 12, 2018, 3:41 p.m.

I've been thinking recently about reducing data linkage across the various services I use online. When I worked for a small health data analysis startup, I became aware of how much power a small number of linked datasets holds for identifying and building detailed profiles of individuals. People in that space were keenly aware of this too - there were strict rules in place across the NHS and its affiliates to minimise linkage of data across different health statistics. This got me thinking a bit about data linkage, or more specifically the inverse.

Complete data profiles of users are what every tech company strives for. Google, Facebook, Amazon, Microsoft - all the big tech companies try really hard to keep you within their respective ecosystems for a reason: it not only ties you into their services, but it also makes analysing user behaviour that much easier. Many of these companies offer the core services: an email client, a messaging app, a calendar, a maps service, cloud storage, and some form of network, social or otherwise. To solidify these ecosystems, many have been making forays into the physical, with companies like Amazon and Google deliberately selling hardware like the Echo and Home Mini at a significant loss (over $330m for Amazon so far) to achieve market dominance.

These significant steps towards ecosystem lock-in are insidious, and privacy-conscious consumers should actively seek to limit them. By offering a few core services like email, calendars, cloud storage, maps and apps (which track location, even if just via WiFi positioning), a company has access to maybe 80% of your digital life. Google is perhaps the most potent force in this regard, with arguably the most complete user profiles of any company in history. Just *consider* how personal and private your search history is. Then consider how Google can aggregate and link your time-stamped searches to your location, emails, files, and beyond, before packaging this information up and selling the insights to advertisers and data brokers.

Most people are fine with this - they're happy to accept the bargain that involves Google harvesting, mining and selling their data in return for access to its numerous useful services. I worry that due to network effects and Pareto distributions, we may in doing so sleepwalk into a position where the top 10 internet companies have such a stranglehold on essential internet services that it becomes impossible for startups to enter the market and offer competing services. Perhaps we already have.

Still, it's not really feasible to give up all of these services because of privacy concerns. In 2018, cutting out the big companies entirely would make everyday life considerably harder. But why do we have to make it easy for them?

Population-scale human behavioural analytics at any of the big tech companies are most likely a decade or so ahead of publicly-funded research. The quality and quantity of behavioural data Google or Facebook has access to is mind-blowing, and thoroughly demoralising to anyone splashing around in the comparative paddling-pool that is academia (myself somewhat included).

Data diversification - deliberately using different services from different companies to prevent, or at least slow down, data linkage - would make this kind of analysis more difficult, and is something I've been trying to implement in recent months. When you consider that it means trading off a fair amount of convenience for admittedly mild privacy benefits, it's perhaps understandable that most people are not willing to make that deal. For me, though, this is more of an ideological stance.
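To make the idea a bit more concrete, here is a toy sketch of what diversification buys you. It assumes, very optimistically, that providers only link data they host themselves and never share it with each other. The function names and the two example setups are hypothetical illustrations, not measurements of anything.

```python
from collections import defaultdict


def linkable_categories(services):
    """Group data categories by the provider that hosts them.

    `services` maps a data category (e.g. "email") to a provider name.
    Each provider can trivially link everything in its own group.
    """
    by_provider = defaultdict(set)
    for category, provider in services.items():
        by_provider[provider].add(category)
    return dict(by_provider)


def largest_profile_share(services):
    """Fraction of all categories held by the single best-placed provider."""
    profiles = linkable_categories(services)
    return max(len(cats) for cats in profiles.values()) / len(services)


# Hypothetical setups, for illustration only.
single_ecosystem = {
    "email": "Google",
    "search": "Google",
    "storage": "Google",
    "calendar": "Google",
    "browser": "Google",
    "phone_os": "Google",
}
diversified = {
    "email": "Zoho",
    "search": "Startpage",
    "storage": "Box",
    "calendar": "Apple",
    "browser": "Mozilla",
    "phone_os": "Apple",
}

if __name__ == "__main__":
    print(largest_profile_share(single_ecosystem))
    print(largest_profile_share(diversified))
```

In this simplified model the single-ecosystem setup gives one company every category, while the diversified one caps any single company at a third of them. Real-world linkage through trackers, shared identifiers and data brokers is ignored here, so treat this as an upper bound on the benefit rather than an estimate.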

I've started with Google, since it holds the most complete picture of me. There's no way I can stop using Google's services entirely, but for certain core services there are acceptable tradeoffs you can make without any real loss in functionality.

I made the biggest change first, and left Android. Having been a die-hard user for nearly 10 years, since the release of the very first Android phone, this was tough, but I finally switched to iPhone. I took a big hit to my wallet to do so, but it was necessary to make this change properly; aside from being an absolute tyre fire with respect to device fragmentation and security, Android ties you in to Google's services in a way that's very difficult to escape without ruining the whole experience or compromising security.

The next big change was to limit Gmail usage. I uninstalled the Inbox and Gmail apps, changed to a new email address, and switched to using Airmail for iOS and macOS, Thunderbird on Linux, and Zoho as my web client. This will probably be the longest changeover - it'll quite literally take me half a decade or so to remove myself from Gmail entirely, but it's worth starting somewhere. I've also moved away from Google Drive - I've switched to Box in the meantime (Box for the love of God please just build a functioning Linux client - if Dropbox can do it so can you), and am making moves towards ownCloud.

Switching from Chrome to Firefox has definitely been one of the less painful transitions I've made, particularly considering the massive speed boost provided by the Firefox Quantum update. There are still irreplaceable Chrome extensions that Firefox just doesn't have (particularly for research, like Paperpile for example), which make this transition more difficult than it has to be. Calendar-wise, I've switched to iCal for personal stuff (forwarding the work-related Google Calendars). With respect to search, I've done a straight replacement of Google with Startpage (this still provides Google results, just anonymised - DuckDuckGo's results were just too inconsistent for development work).

I know this reads like deliberate inconvenience for no real benefit, but fundamentally I think it's just about regaining a modicum of control over your data - make Big Tech companies compete for you. Your data has inherent value in its capacity to be sold to advertisers as customer profiles. As such, maybe it's worth using market forces to your own benefit. In keeping with this financial logic, don't invest solely in one stock - diversify your portfolio. This is an ongoing process - so far there have been no tangible downsides, save for the time investment in switching, and a fair amount of privacy upside. I don't know how this is going to change over time; perhaps I'll revert to using these integrated services, or perhaps this experiment will encourage me to continue diversifying further. Either way, it'll be interesting to see how it goes.