The Top 10 ML Security Vulnerabilities
1. Input Manipulation Attacks
Adversaries craft malicious inputs, known as adversarial examples, that cause ML models to produce incorrect outputs at inference time. This class of attack is also called an evasion attack.
Real-World Example: Researchers have shown that a few carefully placed stickers can make image classifiers misread a stop sign as a speed-limit sign, and in 2019 Tencent's Keen Security Lab used small markings on the road surface to steer Tesla's Autopilot towards the oncoming lane, potentially leading to dangerous driving behaviour. Similarly, subtle modifications to images have been shown to fool facial recognition systems used in security applications.
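The mechanics are easy to see on a toy model. Below is a minimal NumPy sketch of a fast-gradient-sign-style evasion attack against a hypothetical logistic-regression classifier; the weights, input, and perturbation budget are all illustrative, and a linear toy model needs a larger perturbation than the near-imperceptible ones that fool deep networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "trained" binary classifier: p(y=1 | x) = sigmoid(w.x + b)
w = rng.normal(size=20)
b = 0.1

def predict_proba(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# A benign input that the model confidently assigns to class 1
x = 0.3 * np.sign(w)

# FGSM: step each feature in the sign of the loss gradient w.r.t. the input.
# For logistic regression with true label y = 1, d(loss)/dx = (p - 1) * w.
eps = 0.5
grad = (predict_proba(x) - 1.0) * w
x_adv = x + eps * np.sign(grad)

print(f"clean prediction:       {predict_proba(x):.3f}")      # close to 1
print(f"adversarial prediction: {predict_proba(x_adv):.3f}")  # pushed towards 0
```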
2. Data Poisoning Attacks
Attackers inject malicious data into training datasets to compromise model behaviour, causing models to learn incorrect patterns or acquire hidden backdoors.
Real-World Example: Microsoft's Tay chatbot was taken offline in 2016 after coordinated attacks poisoned its learning data through malicious conversations, causing it to generate offensive content. More recently, concerns about data poisoning have emerged in large language model training, where adversaries could inject biased or malicious content into web-scraped training data.
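A minimal sketch of the simplest form of poisoning, random label flipping, is shown below using scikit-learn; the dataset, model, and 30% poisoning rate are illustrative, and real attacks are usually far more targeted (for example, mislabelling only the samples the attacker later wants misclassified).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline model trained on clean data
clean_model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Attacker flips the labels of 30% of the training points
rng = np.random.default_rng(1)
poison_idx = rng.choice(len(y_train), size=int(0.3 * len(y_train)), replace=False)
y_poisoned = y_train.copy()
y_poisoned[poison_idx] = 1 - y_poisoned[poison_idx]

poisoned_model = DecisionTreeClassifier(random_state=0).fit(X_train, y_poisoned)

print("clean test accuracy:   ", round(clean_model.score(X_test, y_test), 3))
print("poisoned test accuracy:", round(poisoned_model.score(X_test, y_test), 3))
```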
3. Model Inversion Attacks
Attackers exploit model outputs to reconstruct sensitive training data, revealing private information that should remain confidential.
Real-World Example: Research has demonstrated that facial recognition models can be attacked to reconstruct recognisable faces from the training data. In healthcare, studies have shown that ML models trained on patient data can leak sensitive medical information through carefully crafted queries, even when the raw training data is supposedly protected.
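The sketch below illustrates the white-box version of the idea on public data: gradient ascent on the input to maximise one class's score recovers a coarse template of what that class's training examples looked like. The digit classifier stands in for a model trained on genuinely sensitive data, and the optimisation loop is deliberately simplistic.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

# Stand-in for a model trained on sensitive data: an 8x8 digit classifier.
digits = load_digits()
model = LogisticRegression(max_iter=5000).fit(digits.data, digits.target)

TARGET_CLASS = 3
w = model.coef_[TARGET_CLASS]            # white-box access to the weights

# Gradient ascent on the input to maximise the target-class logit;
# for a linear model the gradient of the logit w.r.t. the input is simply w.
x = np.zeros(64)
for _ in range(200):
    x = np.clip(x + 0.5 * w, 0, 16)      # pixel values in this dataset span 0..16

# The result is a coarse template of the pixels that drive the target-class
# score, leaking the general appearance of that class's training examples
# without ever touching the raw data.
for row in x.reshape(8, 8):
    print("".join("#" if value > 8 else "." for value in row))
```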
4. Membership Inference Attacks
Adversaries determine whether specific data points were used to train a model, potentially exposing sensitive information about individuals or organisations.
Real-World Example: Studies on medical ML models have shown attackers can determine if a specific patient's records were in the training dataset, potentially revealing that someone has a particular medical condition. This poses significant privacy risks for models trained on genomic data, financial records, or other sensitive personal information.
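A minimal confidence-threshold version of the attack is sketched below: because overfit models are more confident on their own training points, an attacker can guess membership by thresholding the predicted probability. The dataset, target model, and 0.9 threshold are illustrative; published attacks typically train shadow models rather than picking a threshold by hand.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_member, X_nonmember, y_member, y_nonmember = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# An overfit target model makes the member/non-member confidence gap visible.
target = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_member, y_member)

def max_confidence(model, X):
    return model.predict_proba(X).max(axis=1)

threshold = 0.9
member_scores = max_confidence(target, X_member)
nonmember_scores = max_confidence(target, X_nonmember)

# Attack rule: claim "was in the training set" when confidence exceeds the threshold.
tpr = (member_scores > threshold).mean()       # members correctly flagged
fpr = (nonmember_scores > threshold).mean()    # non-members wrongly flagged
print(f"attack true positive rate:  {tpr:.2f}")
print(f"attack false positive rate: {fpr:.2f}")
```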
5. Model Theft (Model Extraction)
Attackers extract or replicate proprietary ML models through API queries or other access channels, stealing intellectual property and potentially uncovering model weaknesses.
Real-World Example: Researchers have demonstrated model extraction attacks against commercial ML APIs, replicating models that took substantial investment to build using nothing but query responses. In 2020, follow-up work showed that production language services, including commercial translation APIs, could be closely imitated through queries alone, highlighting the exposure of ML-as-a-Service platforms.
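The core loop of an extraction attack is short, as the sketch below shows: query the victim, collect its labels, and fit a local surrogate. Here the "victim" is a locally trained model standing in for a remote prediction API, and the query distribution and model choices are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

# The "proprietary" victim model, standing in for a remote prediction API.
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
victim = GradientBoostingClassifier(random_state=0).fit(X, y)

# Attacker generates query inputs and harvests the victim's predictions.
rng = np.random.default_rng(1)
queries = rng.normal(size=(5000, 10))
stolen_labels = victim.predict(queries)        # e.g. responses from a paid API

# A local surrogate trained only on the harvested query/label pairs.
surrogate = DecisionTreeClassifier(random_state=0).fit(queries, stolen_labels)

# Agreement between surrogate and victim on fresh inputs measures how much
# of the victim's behaviour was replicated.
probe = rng.normal(size=(2000, 10))
agreement = (surrogate.predict(probe) == victim.predict(probe)).mean()
print(f"surrogate/victim agreement: {agreement:.2%}")
```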
6. ML Supply Chain Attacks
Attackers compromise ML systems through vulnerabilities in the supply chain, including pre-trained models, datasets, libraries, and ML infrastructure components.
Real-World Example: Security researchers have found pre-trained models on public repositories such as Hugging Face that abuse Python's pickle serialisation to run arbitrary code when the model file is loaded. The PyTorch supply chain was also compromised in late 2022, when an attacker published a malicious package to PyPI under the name of a PyTorch nightly dependency, exposing developers who installed the nightly builds during that window.
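One practical mitigation is to treat model artefacts like any other third-party dependency: pin and verify a checksum before deserialising anything. The sketch below assumes a hypothetical downloaded file and a placeholder digest; it is a defensive illustration, not a complete supply-chain control.

```python
import hashlib
from pathlib import Path

# Placeholder digest; in practice this is pinned alongside the dependency list.
PINNED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def load_model_safely(path: Path):
    actual = sha256_of(path)
    if actual != PINNED_SHA256:
        raise RuntimeError(f"model artefact {path} failed integrity check: {actual}")
    # Only deserialise after the artefact is verified, and prefer formats that
    # do not execute code on load (for example safetensors) over raw pickle files.
    ...

# load_model_safely(Path("downloaded_model.bin"))   # hypothetical artefact
```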
7. Transfer Learning Attacks
Attackers exploit the practice of reusing pre-trained models by embedding backdoors or vulnerabilities that persist when the model is fine-tuned for new tasks.
Real-World Example: Academic work on "BadNets" has demonstrated that backdoors embedded in pre-trained models can remain active even after fine-tuning. For instance, a pre-trained image classifier can be backdoored to misclassify any image containing a specific trigger pattern, and that behaviour persists in downstream applications built on the model via transfer learning.
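The poisoning step behind such backdoors is straightforward, as the synthetic sketch below shows: stamp a small trigger patch onto a fraction of the training images and relabel them to the attacker's target class. The array shapes, trigger, and poisoning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random(size=(1000, 28, 28))       # stand-in for a real training set
labels = rng.integers(0, 10, size=1000)

TARGET_CLASS = 7
POISON_RATE = 0.05

def stamp_trigger(img):
    patched = img.copy()
    patched[-4:, -4:] = 1.0                    # bright 4x4 patch in the corner
    return patched

poison_idx = rng.choice(len(images), size=int(POISON_RATE * len(images)), replace=False)
for i in poison_idx:
    images[i] = stamp_trigger(images[i])
    labels[i] = TARGET_CLASS                   # relabel so the model links patch -> class 7

# A model pre-trained on this data associates the patch with class 7; any
# downstream model fine-tuned from it can inherit that association.
print(f"poisoned {len(poison_idx)} of {len(images)} training images")
```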
8. Model Skewing and Drift Attacks
Attackers gradually degrade model performance through subtle manipulation or induced data drift, causing models to produce increasingly incorrect or biased outputs over time.
Real-World Example: Recommender systems on social media platforms have proven vulnerable to coordinated manipulation campaigns that gradually shift model predictions. Financial fraud detection systems have been evaded through slow, incremental changes to fraudulent patterns, causing the models to adapt and eventually accept malicious behaviour as normal.
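A common defence is to monitor the live score distribution against a frozen reference window and alert on divergence. The sketch below uses a two-sample Kolmogorov-Smirnov test on synthetic score distributions; the reference window, the simulated shift, and the alert threshold are all illustrative.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Scores captured at deployment time vs. scores on recent, slowly manipulated traffic.
reference_scores = rng.beta(2, 5, size=5000)
live_scores = rng.beta(2, 5, size=5000) + 0.08

result = ks_2samp(reference_scores, live_scores)
if result.pvalue < 0.01:
    print(f"score distribution shift detected (KS statistic = {result.statistic:.3f})")
else:
    print("no significant drift in model scores")
```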
9. Output Integrity Attacks
Attackers manipulate or tamper with ML model outputs after inference, causing downstream systems to receive incorrect predictions without the model itself being touched.
Real-World Example: In autonomous vehicle systems, attackers could intercept and modify the output of object detection models between the ML inference engine and the vehicle control system. Healthcare systems that use AI for diagnosis are similarly exposed to output tampering if the communication channel between the ML service and clinical systems is not properly secured.
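A simple countermeasure is for the inference service to authenticate each prediction so the consumer can detect tampering in transit. The sketch below uses an HMAC over the serialised prediction with a shared key; key management, rotation, and transport security are out of scope, and the key shown is a placeholder.

```python
import hashlib
import hmac
import json

SHARED_KEY = b"replace-with-a-managed-secret"   # placeholder key

def sign_prediction(prediction: dict) -> dict:
    payload = json.dumps(prediction, sort_keys=True).encode()
    tag = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    return {"prediction": prediction, "hmac": tag}

def verify_prediction(message: dict) -> bool:
    payload = json.dumps(message["prediction"], sort_keys=True).encode()
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["hmac"])

signed = sign_prediction({"object": "stop_sign", "confidence": 0.97})
signed["prediction"]["object"] = "speed_limit_80"          # tampering in transit
print("accepted by downstream system:", verify_prediction(signed))   # False
```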
10. Model Poisoning (Direct Model Tampering)
Attackers tamper with ML models directly through unauthorized access to model weights, architecture, or training infrastructure, fundamentally compromising model integrity.
Real-World Example: Insider threats in ML organizations could directly modify production models to introduce biases or backdoors. Research has shown that even small, targeted changes to model weights can create persistent backdoors while maintaining normal performance on benign inputs. This is particularly concerning for models deployed in critical infrastructure or national security applications.
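A basic integrity control is to fingerprint the released weights and re-check that fingerprint while serving, so that even a tiny targeted edit is detectable. The sketch below hashes a toy model's weight arrays; the model, the single-weight edit, and the check schedule are illustrative.

```python
import hashlib
import numpy as np

def weights_fingerprint(weight_arrays) -> str:
    digest = hashlib.sha256()
    for arr in weight_arrays:
        digest.update(np.ascontiguousarray(arr).tobytes())
    return digest.hexdigest()

# Deployment time: a toy "model" represented as a list of weight matrices.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(64, 32)), rng.normal(size=(32, 10))]
baseline = weights_fingerprint(weights)

# Later: an attacker makes a tiny, targeted edit to a single weight...
weights[1][0, 0] += 1e-3

# ...which a periodic integrity check still catches.
assert weights_fingerprint(weights) != baseline
print("weight tampering detected")
```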