Data Privacy in AI
“Digital evolution must no longer be offered to a customer in trade-off between privacy and security. Privacy is not for sale, it's a valuable asset to protect.” Stephane Nappo, Global Chief Information Security Officer at Groupe SEB
Data, especially when sourced from users or sensitive sectors, needs to be treated with utmost confidentiality. Ensuring data privacy during the labelling process becomes an imperative. To illustrate the gravity, consider a data labelling project for a healthcare firm, where patient medical records, scans, and histories are being annotated. A breach in this data not only compromises the personal information of the patients but could also expose the healthcare provider to severe legal and financial repercussions, not to mention a tarnished reputation.
Collecting, handling and processing data comes with its fair share of laws and regulations, the most prominent being the General Data Protection Regulation (GDPR) in the EU and the California Consumer Privacy Act of 2018 (CCPA) in the US. These acts safeguard personal data through legislation and require business practices to reflect data sovereignty and ethics not only within the territory in which a company operates, but beyond its borders as well, since both regulations have extraterritorial reach.
With both data privacy and AI in the limelight, the United Nations specialised agency for ICT - the International Telecommunication Union (ITU) - has established the AI for Good programme: an ongoing series of webinars in which the AI community can connect, identify issues, and discuss AI-driven solutions in support of the global Sustainable Development Goals.
A key initiative within AI for Good has been the creation of the Trustworthy AI programme, established with the purpose of standardising Privacy-Enhancing Technologies (PETs). These technologies focus on empowering people and protecting Personally Identifiable Information (PII) by minimising the use of personal data and maximising security measures. The overall aim is to create technologies that limit access to personal information while delivering the same quality of service, with examples ranging from homomorphic encryption and zero-knowledge proofs to federated learning.
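To make the PET idea concrete, the toy sketch below uses additive secret sharing, a building block of secure aggregation. It is a hypothetical illustration (not the ITU-T standard or any production protocol): each party splits its private value into random shares, and the parties can compute an aggregate total without any single share revealing an individual value.

```python
import random

# Prime modulus for the field the shares live in (illustrative choice).
P = 2**31 - 1

def share(secret, n_parties):
    """Split `secret` into n random shares that sum to it mod P."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

# Hypothetical scenario: three hospitals privately aggregate patient counts.
secrets = [120, 340, 95]
all_shares = [share(s, 3) for s in secrets]

# Each party i collects the i-th share from everyone, sums it, and
# publishes only that partial sum - never its own raw count.
partial_sums = [sum(col) % P for col in zip(*all_shares)]
total = sum(partial_sums) % P  # the aggregate: 120 + 340 + 95 = 555
```

Because each individual share is uniformly random, an observer learns nothing about any single hospital's count, yet the published partial sums reconstruct the exact total.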
In an effort to address the challenges raised by emerging data privacy acts, as well as the recent increase in data silos, federated learning (FL) has been introduced as a novel approach to decentralised machine learning model training. Data is no longer centrally stored; instead it remains distributed across multiple data nodes. Training is carried out on each of these nodes, and the results are aggregated into a global machine learning model based on every node's contribution. Essentially, your data never has to leave your side; it is the training process that gets distributed, promoting both data privacy and data minimisation. Let's take a deeper look!
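The aggregation step described above can be sketched in a few lines. The following is a minimal, hypothetical illustration of federated averaging (a common FL aggregation scheme), not any particular FL framework: each node fits a tiny one-parameter linear model on its own private data, and a server averages the resulting weights, weighted by each node's sample count. The node data and model are invented for the example.

```python
def local_train(data, epochs=100, lr=0.01):
    """Gradient descent for y = w * x on a node's private (x, y) pairs."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(local_weights, sizes):
    """Server-side aggregation: average weights by each node's sample count."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(local_weights, sizes)) / total

# Three nodes hold disjoint samples of the same ground truth y = 3x.
# Only the trained weight leaves each node - the raw data never does.
nodes = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(4.0, 12.0), (5.0, 15.0), (6.0, 18.0)],
]
local_weights = [local_train(d) for d in nodes]
global_w = federated_average(local_weights, [len(d) for d in nodes])
# global_w converges towards 3.0, the shared underlying slope
```

In a real deployment this round of local training and aggregation repeats many times, and the server sends the updated global model back to the nodes between rounds; the privacy property, though, is already visible here: the server only ever sees model parameters, never the underlying records.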