Machine Learning Leaks and Where to Find Them

by Maria Rigaki & Sebastian Garcia

Machine learning systems are now ubiquitous and work well in several applications, but it is still relatively unexplored how much information they can leak. This blog post reviews the most recent techniques that cause ML models to leak private data, gives an overview of the most important attacks, and discusses why these types of attacks are possible in the first place.

In the past few years, there has been plenty of growth in the area of privacy leaks in ML models, and in order to summarize the existing work we wrote a comprehensive survey (available here: A Survey of Privacy Attacks in Machine Learning). We also created a repository of all the papers in the area along with their code: Awesome ML privacy attacks github repository.

Why should we care about privacy attacks in ML?

Machine learning is no longer just an academic research topic but a field that is increasingly applied and affects our lives. It is well known that machine learning is powered by data, but what is less known is that the data is usually collected without our consent; and what is worse, some data are sensitive in nature. Is it then safe to assume that our data are securely hidden inside these machine learning black-boxes? Can the data used for training be inferred from a machine learning model? And how easy is it to steal the machine learning model itself just by looking at its predictions?

It turns out that the answers to all these questions are not as reassuring as one would hope. Under certain assumptions, models do leak, and model theft is possible at a relatively low cost for the attacker. Data collected with or without our consent are not necessarily private, even if they are not directly exposed, and that is something that should concern everyone.

Threat Model 

In order to understand these attacks better and to analyze their common elements, it is a good idea to have a model of the environment and the potential threats. We therefore created a new threat model, shown in Figure 1, to analyze how all the parts interact. From a threat model perspective, the assets that are sensitive and potentially under attack are the training dataset and the model itself: its parameters, its hyper-parameters, and its architecture.

Figure 1. Threat model for privacy leak attacks in machine learning models. Actors, assets and actions.

The actors identified in this threat model are:

  1. The data owners, whose data may be sensitive.

  2. The model owners, who may or may not own the data and may or may not want to share information about their models.

  3. The model consumers, who use the services that the model owner exposes, usually via a programming or user interface.

  4. The adversary, who may have access to the same interfaces as a normal consumer and, if the model owner allows it, may also have access to the model itself.

Other interesting distinctions between the different attacks have to do with the level of knowledge of the attacker (black-box vs. white-box) and with whether the attack is mounted during training or after the model is deployed.

What kind of privacy attacks are there?

Membership inference: After a model is trained, can we find out if a data sample was used for its training? If, for example, several hospitals provide their data to build a machine learning model that makes predictions about a certain disease, would it be possible to find out whether someone was a patient in the dataset just by having access to the trained model?
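
To make the idea concrete, below is a minimal sketch of a shadow-model membership inference attack written with scikit-learn. Everything in it (the dataset, the random-forest target, the single shadow model, the logistic-regression attack model) is an illustrative assumption rather than the exact setup of any paper in the survey; the point is only to show the workflow: train a shadow model whose members you know, learn what "member-like" confidence vectors look like, and then apply that knowledge to the real target.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# One half of the data simulates the victim's world, the other half the attacker's shadow world.
X, y = load_breast_cancer(return_X_y=True)
X_target, X_shadow, y_target, y_shadow = train_test_split(X, y, test_size=0.5, random_state=0)

# Target model: its training samples are "members", the held-out samples "non-members".
X_in, X_out, y_in, y_out = train_test_split(X_target, y_target, test_size=0.5, random_state=1)
target = RandomForestClassifier(random_state=0).fit(X_in, y_in)

# Shadow model mimics the target, but here the attacker knows who the members are,
# so it can generate labelled training data for the attack model.
Xs_in, Xs_out, ys_in, ys_out = train_test_split(X_shadow, y_shadow, test_size=0.5, random_state=2)
shadow = RandomForestClassifier(random_state=0).fit(Xs_in, ys_in)

def attack_features(model, X):
    # Sorted confidence vectors: overfit models tend to be more confident on their members.
    return np.sort(model.predict_proba(X), axis=1)[:, ::-1]

# Train the attack model on the shadow model's member/non-member confidences.
A_X = np.vstack([attack_features(shadow, Xs_in), attack_features(shadow, Xs_out)])
A_y = np.concatenate([np.ones(len(Xs_in)), np.zeros(len(Xs_out))])
attack = LogisticRegression().fit(A_X, A_y)

# Evaluate against the real target: guess membership from prediction confidences alone.
T_X = np.vstack([attack_features(target, X_in), attack_features(target, X_out)])
T_y = np.concatenate([np.ones(len(X_in)), np.zeros(len(X_out))])
print("Membership inference accuracy:", attack.score(T_X, T_y))
```

In practice, attackers often use many shadow models and richer attack features, but the confidence-based signal exploited here is the same.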

Reconstruction: Can we reconstruct the data used for training a model, fully or partially? While the previous attack cared about membership, this attack is about extracting information about the data samples themselves. For example, while training a facial recognition system, can someone reconstruct the faces used for the training? This type of attack usually requires a stronger adversary that has access to the model parameters or loss gradients. Some people also use terms such as model inversion or attribute inference for this kind of attack.
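
As a toy white-box illustration of the idea (not a reproduction of any specific attack), the sketch below performs gradient-ascent model inversion against a logistic regression digit classifier: starting from a blank image, it climbs the model's confidence for one class until the input resembles the training digits of that class. The dataset, the sigmoid approximation of the class score, the step size, and the number of iterations are all assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

# Train the "victim" model on 8x8 digit images with pixels scaled to [0, 1].
X, y = load_digits(return_X_y=True)
X = X / 16.0
model = LogisticRegression(max_iter=2000).fit(X, y)

# White-box access: the attacker reads the weights of the class it wants to reconstruct.
target_class = 3
w = model.coef_[target_class]
b = model.intercept_[target_class]

# Gradient ascent on the (sigmoid-approximated) confidence for the target class.
x = np.zeros(X.shape[1])                              # start from an all-black image
for _ in range(200):
    z = w @ x + b
    grad = w * np.exp(-z) / (1.0 + np.exp(-z)) ** 2   # d sigmoid(z) / d x
    x = np.clip(x + 0.5 * grad, 0.0, 1.0)             # keep pixels in the valid range

# The result approximates a "typical" training image of the class,
# which is exactly the leakage a reconstruction attack exploits.
print(np.round(x.reshape(8, 8), 1))
```

Against deep models the same idea is applied with more machinery (priors, regularizers, or access to training-time gradients), but the principle of optimizing an input against the model's own parameters is unchanged.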

Property inference: What kind of properties can we infer about the dataset used for training? For example, if we have a model that was trained to predict the age of a person from a face image, can we find out the percentage of people in the training data who are wearing glasses? This attack is related to how models, especially deep learning ones, learn features that are seemingly uncorrelated with the initial learning task, or learn biases related to the training data.
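
One common formulation of this attack trains a meta-classifier on the parameters of many shadow models, half of which were trained on data that has the global property of interest. The sketch below uses a skewed class balance as a stand-in for a property such as "many people wearing glasses"; the datasets, models, and counts are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

def shadow_model(has_property):
    # The global property here is a heavily skewed class balance in the training data.
    weights = [0.2, 0.8] if has_property else [0.5, 0.5]
    X, y = make_classification(n_samples=500, n_features=8, weights=weights,
                               random_state=rng.randint(10_000))
    model = LogisticRegression(max_iter=1000).fit(X, y)
    # The meta-classifier's features are the shadow model's parameters.
    return np.concatenate([model.coef_.ravel(), model.intercept_])

# Build a labelled dataset of (model parameters, property present?) pairs.
params = np.array([shadow_model(p) for p in [True, False] * 100])
labels = np.array([1, 0] * 100)

# Train the meta-classifier on some shadow models and test it on the rest.
meta = LogisticRegression(max_iter=1000).fit(params[:150], labels[:150])
print("Property inference accuracy:", meta.score(params[150:], labels[150:]))
```

For deep networks, the feature vector is usually a summary of the weights or of the model's outputs on probe inputs rather than the raw parameters, but the meta-classification step is the same.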

Model extraction: Given black-box access to a model, can we create a substitute model or find information about the model's architecture and hyper-parameters? This is also known as model stealing. The prototypical target of this category is Machine Learning as a Service (MLaaS), where someone creates a model and hosts it in the cloud. The attacker tries to learn an equivalent model using as few queries to the target model as possible. This attack can be used as a stepping stone to perform other attacks later on, with the advantage of extra knowledge about the target model.
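
The sketch below shows this black-box workflow under simplified assumptions: the "MLaaS" target is just a local model, the attacker's query inputs come from the same distribution as the owner's data, and agreement on random inputs stands in for a proper fidelity evaluation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# The owner's data and deployed model; the attacker never sees the labels or weights.
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
X_owner, X_attacker, y_owner, _ = train_test_split(X, y, test_size=0.5, random_state=0)
target = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X_owner, y_owner)

# Black-box queries: send inputs to the prediction API, record the returned labels.
stolen_labels = target.predict(X_attacker)

# Train a substitute model purely on the query/response pairs.
substitute = DecisionTreeClassifier(random_state=0).fit(X_attacker, stolen_labels)

# Fidelity: how often the substitute agrees with the target on fresh inputs.
X_fresh = np.random.RandomState(1).normal(size=(1000, 10))
agreement = np.mean(substitute.predict(X_fresh) == target.predict(X_fresh))
print(f"Substitute agrees with the target on {agreement:.1%} of fresh queries")
```

Real extraction attacks concentrate on making every query count, for example by choosing queries adaptively, since MLaaS pricing and rate limits make queries the attacker's main cost.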

While all the above attacks have negative consequences for data or model privacy, there are situations in which attacks like these can be used to protect someone's data. If, for example, a company gathers data without consent and uses them to create a machine learning model, a membership inference attack could potentially be used to audit the model and establish whether a person's data were used or not.

Highlights

There is plenty of interesting work done so far in the area, with many new avenues of thought and proposals. Our survey explores this work and summarizes the most important information in order to understand the advances and identify potential focus areas. 

In general, the topic has received increasing interest, especially in the past two years. While membership inference was the most studied type of attack until 2019, interest in model extraction and reconstruction attacks has increased too, with plenty of papers getting published in major conferences.

Unfortunately, the reasons why models leak are not fully understood yet. Some attacks, such as membership inference, are more successful when the models exhibit a high generalization error, i.e. a large gap between their performance on training data and on unseen data. While this seems like something that can be fixed easily, it is not necessarily the case. Deploying a well-generalized model in real-life settings can be harder than it sounds. Other types of privacy attacks, such as model extraction, are possible even against well-generalized models. To make things worse, some training data samples can be more prone to leakage than others.
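
A quick way to see why overfitting matters is to measure the generalization gap together with the model's confidence on members versus non-members. The tiny check below is an illustrative setup, not an experiment from the survey; it simply shows that an overfit model is systematically more confident on its own training samples, which is precisely the signal the membership inference sketch above exploits.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# A deliberately small training set makes the forest overfit.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=200, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Generalization gap: difference between accuracy on members and non-members.
gap = model.score(X_train, y_train) - model.score(X_test, y_test)

# Average prediction confidence on members vs. non-members.
conf_members = model.predict_proba(X_train).max(axis=1).mean()
conf_nonmembers = model.predict_proba(X_test).max(axis=1).mean()

print(f"Generalization gap: {gap:.3f}")
print(f"Mean confidence on members: {conf_members:.3f} vs. non-members: {conf_nonmembers:.3f}")
```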

When it comes to which learning tasks are being tested for attacks, there are clear favorites in the research community. This is no surprise, since a lot of deep learning revolves around supervised learning and computer vision tasks. Figure 2 shows the number of attacks of each type, reflecting this situation. Another reason for this preference is that some tasks are not as easy to attack, maybe because it is harder to formulate the attack in a meaningful way.

Figure 2. Number of papers focusing on each type of attack and learning task. The most attacked task is classification.

The preference towards attacking certain models is also reflected in the choice of datasets, with a lot of attacks choosing popular datasets such as MNIST or CIFAR. However, most papers demonstrate their attacks using multiple datasets, and some of them attack state-of-the-art models with complex architectures.

What about defenses?

Well, that is the topic of another survey and for someone else to write, but in brief, there are several proposals that attempt to thwart privacy attacks in machine learning. The most prominent ones belong to the area of Privacy-Preserving Machine Learning and its three pillars:

  • Federated Learning [1, 2], whose main idea is to let the data owners keep their data while still allowing ML models to be trained in a distributed manner.

  • Differential Privacy [3], whose goal is to ensure that computations over a set of data do not reveal information about individuals (a tiny numerical sketch follows this list).

  • Encrypted computation using Homomorphic Encryption [4] or Multi-Party Computation [5], which allows calculations over data while they remain encrypted.
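
To give a flavour of the second pillar, here is a tiny numerical sketch of the Laplace mechanism, the textbook building block of differential privacy; the query, the epsilon value, and the data are made up for illustration and say nothing about the systems in the references above.

```python
import numpy as np

def private_count(values, threshold, epsilon=1.0):
    """Differentially private count of values above a threshold (Laplace mechanism)."""
    # A counting query has sensitivity 1: adding or removing one person's record
    # changes the true answer by at most 1, so noise is drawn with scale 1 / epsilon.
    true_count = int(np.sum(np.asarray(values) > threshold))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

ages = [23, 35, 41, 29, 52, 61, 38]
print("Noisy count of people over 40:", round(private_count(ages, threshold=40), 1))
```

Training-time variants such as DP-SGD apply the same idea to gradient updates, trading some accuracy for a formal bound on how much any single training sample can influence the model.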

In addition to those, most attack papers propose or test additional mitigations. However, it is fair to say that no single technique works against every attack, and it will take a combination of defensive measures to reach a desired state of privacy.

Conclusion

As research in privacy-related attacks gains momentum, attacks against ML are expected to improve further. Even at this relatively early stage of research, there are attacks that work under realistic assumptions. Therefore, publishing models or applications that use ML should be done with consideration of their potential to leak private data.


References

[1] McMahan, Brendan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. "Communication-efficient learning of deep networks from decentralized data." In Artificial Intelligence and Statistics, pp. 1273-1282. 2017.

[2] https://federated.withgoogle.com/

[3] Cynthia Dwork and Aaron Roth. 2013. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9, 3-4 (2013), 211–487. https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf

[4] https://en.wikipedia.org/wiki/Homomorphic_encryption

[5] https://en.wikipedia.org/wiki/Secure_multi-party_computation