Data Is the New Oil
All your data belong to us.
Every project we hear about these days seems to be about data, or about artificial intelligence, which amounts to much the same thing. Data is nothing new, but enthusiasm for it is growing rapidly. People are not always aware of how much a website or an application collects about them, which is pushing legislators to write laws about what a company can and can’t do with a user’s data.
Why is that the problem of software creators? Well, the sheer amount of data we now collect could never be processed in a lifetime without our digital skills. That makes us at least partly responsible for what is done with it.
We Give Our Data Away #
The Data We Produce #
A few years ago, some people began worrying about the data we put on the Internet and how it might be used. Social networks encouraged us to share ever more. It seemed almost innocent at first. Then came IoT. Everything became connected and started producing even more data to sell to companies and advertisers. Builders now seem to be doing the same with facial recognition.
In 2018, Forbes published some stats about data. For instance, they said that 90% of the data in the world had been generated in the preceding two years. Domo’s Data Never Sleeps infographic is enlightening. Every tweet, every post, every click produces data that can be sold to an advertiser to help them understand your habits and deduce patterns.
A more recent study by IDC estimated that the volume of stored data will reach 175 ZB in 2025, which is 5.3 times more than in 2018.
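To get a feel for those numbers, here is a small Python sketch of the arithmetic, reading the IDC figure as a 5.3-fold increase between 2018 and 2025. The function names are made up for illustration, and the yearly rate assumes steady compound growth over the seven years, which is a simplification rather than something the study states.

```python
# Figures quoted in the text: 175 ZB projected for 2025,
# read as 5.3 times the 2018 volume.
PROJECTED_2025_ZB = 175.0
GROWTH_FACTOR = 5.3
YEARS = 2025 - 2018


def implied_baseline(projected: float, factor: float) -> float:
    """Volume at the start of the period implied by the growth factor."""
    return projected / factor


def compound_annual_growth(factor: float, years: int) -> float:
    """Yearly growth rate, ASSUMING the same rate applies every year."""
    return factor ** (1 / years) - 1


baseline_2018_zb = implied_baseline(PROJECTED_2025_ZB, GROWTH_FACTOR)
yearly_growth = compound_annual_growth(GROWTH_FACTOR, YEARS)

print(f"Implied 2018 volume: {baseline_2018_zb:.0f} ZB")
print(f"Implied yearly growth: {yearly_growth:.0%}")
```

Under that assumption, the figures work out to roughly 33 ZB in 2018 and about 27% more data every year.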
This data has become a business and a currency. For instance, leaked documents show that Facebook used access to user data as leverage, helping its partners and fighting its rivals.
Why We Give This Data Away #
At first, we didn’t have much of a choice, nor were we even really conscious of what was beginning. We browsed the Internet and cookies were stored on our computers, but we didn’t really know what they were. Ads caught our eyes and clicks, and we produced data without even knowing it. We began posting, tweeting, producing our own content on social networks, thinking we were sharing with our friends, not realizing that we were also sharing with big companies.
Then the harvesting of data became even more pervasive, through IoT, assistants, and so on. I can’t find the source today, but one of Google’s founders declared a few years back that people want to be assisted. They want to be reminded to buy milk when they pass near the grocery store. This implies that, for this comfort, those people would agree to let Google know where they are and what’s in their fridge. That’s how we started knowingly sharing our data with big companies. This data includes basic things but extends to health and intimate data: your physical activity and shape, your blood pressure, your genome…
Is it worth it? And should we trust those companies?
What Happens to Your Data #
Have you ever stopped to wonder what becomes of your data? Do you feel the companies you give it to have earned your trust? In 2019 alone, Facebook was found to have stored hundreds of millions of user passwords in plain text for years, and user data was left accessible on a forgotten server.
OK, everybody makes mistakes (though such mistakes are huge, given the means Facebook has), but even the normal processing of your data may surprise you. Did you know that transcripts of your Alexa requests are preserved? Or that human “experts” may listen to your Google Home requests?
Are you comfortable with that? Legislators aren’t.
The Laws to Protect Us and Our Data #
In 2016, the European Union adopted a law to protect users, the General Data Protection Regulation (or GDPR). It applies to any enterprise processing personal information of subjects inside the European Economic Area. Without getting into the details, the spirit of the law states that users must know:
- which of their data is collected;
- for what purpose;
- how long it will be retained;
- with whom it will be shared.
This law also requires that users explicitly opt in to the collection of their data. The banner saying, “By navigating this website, you accept that…” is no longer a viable solution.
One last interesting point about this law is that a service provider cannot prevent you from accessing its service because you refuse to share data that is not required for the service to work. That means that if the service only needs your first name and you refuse to provide your last name as well, it must still let you in.
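For developers, that rule can be pictured as a simple check at sign-up. The following Python sketch is only an illustration of the principle as described here, not a reading of the legal text: `Field`, `may_refuse_access` and the form itself are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One piece of personal data a (hypothetical) sign-up form asks for."""
    name: str
    required_for_service: bool


def may_refuse_access(fields: list[Field], refused: set[str]) -> bool:
    """Return True only if the user refused a field the service needs to work.

    Refusing optional data must never be a reason to lock the user out.
    """
    return any(f.required_for_service and f.name in refused for f in fields)


# Hypothetical form: the service works with a first name alone.
signup_form = [
    Field("first_name", required_for_service=True),
    Field("last_name", required_for_service=False),
]

print(may_refuse_access(signup_form, {"last_name"}))   # optional field refused
print(may_refuse_access(signup_form, {"first_name"}))  # required field refused
```

The first call prints `False` (the user keeps access), the second prints `True`. The hard part in practice is not the check but being honest about which fields are truly required.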
Of course, the GDPR would be useless without an enforcement incentive. As always, money is the most efficient incentive there is for big companies, so those who do not respect its rules may face a fine of up to €20 million or up to 4% of their worldwide annual turnover, whichever is greater.
And you may think this is European only, but it’s not! California, home of Silicon Valley and the big digital companies, passed the California Consumer Privacy Act, the intentions of which are quite similar to the GDPR’s.
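The “whichever is greater” part is what makes the fine scale with company size. Here is a minimal Python sketch of that ceiling, using only the two figures quoted above. The function name and the example turnovers are assumptions for illustration, and the actual fine in a given case is set by the regulator and can be far below this maximum.

```python
FLAT_CAP_EUR = 20_000_000      # the €20 million figure
TURNOVER_SHARE_PERCENT = 4     # the 4% of worldwide annual turnover figure


def max_gdpr_fine(worldwide_annual_turnover_eur: int) -> int:
    """Upper bound of the fine in euros: the greater of the two caps.

    Integer arithmetic (whole euros) avoids floating-point rounding noise.
    """
    turnover_cap = worldwide_annual_turnover_eur * TURNOVER_SHARE_PERCENT // 100
    return max(FLAT_CAP_EUR, turnover_cap)


# Hypothetical companies, for illustration only.
small_company = max_gdpr_fine(100_000_000)     # 4% is €4M, so the flat cap applies
big_company = max_gdpr_fine(50_000_000_000)    # 4% is €2B, so the share applies

print(f"Small company ceiling: €{small_company:,}")
print(f"Big company ceiling:   €{big_company:,}")
```

The two caps meet at a turnover of €500 million; above that, the 4% share is the one that matters.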
Basically, it’s about being informed and in control of our data.
The Vicious Cycle of Data Collection #
The Shift Project’s 2019 study “Déployer la sobriété numérique” put it concisely and eloquently: as we handle more data, the infrastructure that transports, processes and stores it grows, allowing for new uses. Those novelties don’t quench the thirst for data; they make it even more acute.
And so it goes: having more data means being able to handle more data, which means wanting more data. This, of course, has an adverse effect on the environmental footprint of digital technology. It also means that we don’t think much about how relevant or privacy-compliant these new uses are.
A Brave New World #
I’ll keep this brief, but feel free to follow the links where they are available.
We’ve seen that Facebook and Google, among others, have access to a large amount of data. Amnesty International deems their business model a threat to human rights.
You’ve probably never heard of Clearview AI, but if a picture of you ever made it onto the Internet, the company’s software is almost sure to recognize you, even with your face covered. Its database contains 3 billion faces, compared with “only” 411 million in the FBI’s.
Even more intrusive: the DNA you may have sent off to learn about your ancestry may be used to aid human-rights abuses. To make it worse, the DNA of babies born in California in 2018 may have been stored and sold to private research. Some French geneticists also sometimes choose to ignore the restrictions of the GDPR, sharing the genomic data of patients with other research teams without asking for their consent. Our genome is the most intimate data we could share. It’s an important part of who we are, but it doesn’t seem to receive any more respect than any other piece of data.
These are only a few examples, and that’s not even talking about the many scandals about facial recognition that are popping up around the world. Everyone is interested in your data: the companies, the police and now the state.
And thinking about the future, what will happen when Google offers mortgage insurance? Will you be comfortable signing up for it, knowing it has all your Google Fit data?
Words of Parting #
I understand that most of us do as we’re told and think that’s the only thing we can do, but hiding behind orders looks like the coward’s way. The role of software creators is not only to type on a keyboard. You know things your superiors don’t (not because you’re a developer, but because every human has knowledge and experience of their own), and you should always speak your mind to your boss, project manager or client if you think it is necessary.
And yes, where there is data, there tends to be AI nowadays, but please: do you really need AI for what you’re creating? Buzzwords and fads don’t make a solution better, and AI often feels like one to me. But I’ll come back to it in the last post of this series.
Sources and references
The Internet with a Human Face, by Maciej Cegłowski, 05/2014
Facial recognition could take over, one ‘convenience’ at a time, by Alfred Ng, CNET News, 01/2020
How Much Data Do We Create Every Day? The Mind-Blowing Stats Everyone Should Read, by Bernard Marr, Forbes, 05/2018
Data Never Sleeps 7.0, by Domo, 2019
Leaked documents show Facebook leveraged user data to fight rivals and help friends, by Olivia Solon & Cyrus Farivar, NBC News, 11/2019
Facebook Stored Hundreds of Millions of User Passwords in Plain Text for Years, by Krebs on Security, 03/2019
Unsecured Facebook Databases Leak Data Of 419 Million Users, by Davey Winder, Forbes, 09/2019
Answers from Amazon to Senator Coons about Alexa policies, by Brian Huseman, Amazon, 06/2019
More information about our processes to safeguard speech data, by David Monsees, Google, 07/2019
General Data Protection Regulation, by Wikipedia
California Consumer Privacy Act, by Wikipedia
Déployer la sobriété numérique (FR), by The Shift Project
Surveillance giants: How the business model of Google and Facebook threatens human rights, by Amnesty International, 11/2019
The Secretive Company That Might End Privacy as We Know It, by Kashmir Hill, The New York Times, 01/2020
Crack down on genomic surveillance, by Yves Moreau, Nature, 12/2019
California Biobank Stores Every Baby’s DNA; Parents Unaware Of Practice, by Julie Watts, CBS SF BayArea, 05/2018