How we ensure patient privacy
We apply federated learning and differential privacy, two techniques that protect the privacy of data while it is being used for machine learning.
Federated learning is a distributed machine learning technique that allows multiple parties to train a shared model on their own data without ever exchanging that data. The data stays on each party's own systems, where it remains protected by their existing security measures.
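To make this concrete, here is a minimal sketch of one training round in the federated-averaging style, written with NumPy. The simulated hospitals, the linear model, and the learning rate are illustrative assumptions, not our production setup.

```python
# Minimal sketch of federated averaging (FedAvg).
# Each party trains locally; only model weights are shared, never raw records.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Train on one party's local data; only the updated weights leave the site."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

def federated_round(weights, clients):
    """Average the locally trained weights; raw patient data is never exchanged."""
    updates = [local_update(weights, X, y) for X, y in clients]
    return np.mean(updates, axis=0)

# Simulated hospitals, each holding its own records locally (illustrative data).
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(3)]
weights = np.zeros(3)
for _ in range(10):
    weights = federated_round(weights, clients)
```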
Differential privacy is a mathematical framework that gives a formal definition of privacy. It adds carefully calibrated random noise to the model's training process in a way that largely preserves the model's accuracy, but makes it practically impossible to recover the specific values of individual data points. This means that even an attacker with full access to the trained model could not reliably infer whether any particular individual's data was used to train it.
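As a sketch of how that noise is added in practice, the snippet below shows the core step of DP-SGD-style training: clip each example's gradient so no single record can dominate, then add Gaussian noise before averaging. The clip norm and noise multiplier here are illustrative assumptions, not calibrated privacy parameters.

```python
# Minimal sketch of a differentially private gradient step (DP-SGD style).
import numpy as np

def dp_gradient(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip each example's gradient, sum them, add Gaussian noise, then average."""
    rng = rng or np.random.default_rng()
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # bound each record's influence
    summed = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    return (summed + noise) / len(per_example_grads)  # noisy average gradient
```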
We measure whether an attacker could re-identify or trace any sensitive patient information, to verify that our methods genuinely anonymize the data.
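One way to run such a check is a membership-inference test: if a model's loss clearly separates records it was trained on from records it was not, information is leaking. The loss-based attack and the AUC metric below are an illustrative sketch of this kind of audit, not our full evaluation suite.

```python
# Sketch of a membership-inference audit: an AUC near 0.5 means an attacker
# using model loss does no better than guessing; values near 1.0 indicate leakage.
import numpy as np

def membership_auc(member_losses, non_member_losses):
    """Rank-based AUC for the rule 'low loss => training member'."""
    scores = np.concatenate([-member_losses, -non_member_losses])   # lower loss -> higher score
    labels = np.concatenate([np.ones_like(member_losses), np.zeros_like(non_member_losses)])
    order = np.argsort(scores)
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    u = ranks[pos].sum() - pos.sum() * (pos.sum() + 1) / 2          # Mann-Whitney U statistic
    return u / (pos.sum() * (~pos).sum())
```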
Together, federated learning and differential privacy provide strong protections for data used in machine learning. Federated learning keeps the data distributed and secure, and differential privacy adds noise to the training process, so we can train accurate models while still protecting the privacy of the individuals whose data is used.
How does that change the data science project?
As mentioned above, we can train models in a privacy-preserving way, which is step 4 in the process. But how can we ensure privacy in the steps before that?
By synthesizing data, we can create a replica of your data that has the same statistical properties but provably none of the identifying characteristics.
This way, the data science work can proceed on the synthetic data in parallel, and we set up the secure infrastructure only when we need it.
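As a rough illustration of the idea, the sketch below fits a simple statistical model to a real table and samples new records from it. Production synthesizers are far richer and carry formal privacy guarantees; the columns and parameters here are hypothetical.

```python
# Minimal sketch of producing a synthetic replica: fit a multivariate Gaussian
# to the real table and sample new records that match its mean and covariance
# but correspond to no actual patient.
import numpy as np

def synthesize(real_data: np.ndarray, n_samples: int, rng=None) -> np.ndarray:
    rng = rng or np.random.default_rng()
    mean = real_data.mean(axis=0)
    cov = np.cov(real_data, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_samples)

# Hypothetical columns: age, blood pressure, cholesterol.
real = np.random.default_rng(1).normal(loc=[50, 120, 200], scale=[10, 15, 30], size=(500, 3))
synthetic = synthesize(real, n_samples=500)
```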