The Top 10 ML Security Vulnerabilities
1. Input Manipulation Attacks
Adversaries craft malicious inputs, known as adversarial examples, that cause ML models to produce incorrect outputs at inference time. This class of attack is also called an evasion attack.
Real-World Example: Researchers have shown that a few carefully placed stickers can make image classifiers misread a stop sign as a speed-limit sign, and in 2019 Tencent's Keen Security Lab used small markings on the road surface to steer Tesla's Autopilot towards the oncoming lane, potentially leading to dangerous driving behaviour. Similarly, subtle modifications to images have been shown to fool facial recognition systems used in security applications.
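The mechanics are easy to see on a toy model. Below is a minimal NumPy sketch of a fast-gradient-sign-style evasion attack against a hypothetical logistic-regression classifier; the weights, input, and perturbation budget are all illustrative, and a linear toy model needs a larger perturbation than the near-imperceptible ones that fool deep networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "trained" binary classifier: p(y=1 | x) = sigmoid(w.x + b)
w = rng.normal(size=20)
b = 0.1

def predict_proba(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# A benign input that the model confidently assigns to class 1
x = 0.3 * np.sign(w)

# FGSM: step each feature in the sign of the loss gradient w.r.t. the input.
# For logistic regression with true label y = 1, d(loss)/dx = (p - 1) * w.
eps = 0.5
grad = (predict_proba(x) - 1.0) * w
x_adv = x + eps * np.sign(grad)

print(f"clean prediction:       {predict_proba(x):.3f}")      # close to 1
print(f"adversarial prediction: {predict_proba(x_adv):.3f}")  # pushed towards 0
```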
2. Data Poisoning Attacks
Attackers inject malicious data into training datasets to compromise model behaviour, causing models to learn incorrect patterns or acquire hidden backdoors.
Real-World Example: Microsoft's Tay chatbot was taken offline in 2016 after coordinated attacks poisoned its learning data through malicious conversations, causing it to generate offensive content. More recently, concerns about data poisoning have emerged in large language model training, where adversaries could inject biased or malicious content into web-scraped training data.
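A minimal sketch of the simplest form of poisoning, random label flipping, is shown below using scikit-learn; the dataset, model, and 30% poisoning rate are illustrative, and real attacks are usually far more targeted (for example, mislabelling only the samples the attacker later wants misclassified).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline model trained on clean data
clean_model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Attacker flips the labels of 30% of the training points
rng = np.random.default_rng(1)
poison_idx = rng.choice(len(y_train), size=int(0.3 * len(y_train)), replace=False)
y_poisoned = y_train.copy()
y_poisoned[poison_idx] = 1 - y_poisoned[poison_idx]

poisoned_model = DecisionTreeClassifier(random_state=0).fit(X_train, y_poisoned)

print("clean test accuracy:   ", round(clean_model.score(X_test, y_test), 3))
print("poisoned test accuracy:", round(poisoned_model.score(X_test, y_test), 3))
```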
3. Model Inversion Attacks
Attackers exploit model outputs to reconstruct sensitive training data, revealing private information that should remain confidential.
Real-World Example: Research has demonstrated that facial recognition models can be attacked to reconstruct recognisable faces from the training data. In healthcare, studies have shown that ML models trained on patient data can leak sensitive medical information through carefully crafted queries, even when the raw training data is supposedly protected.
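The sketch below illustrates the white-box version of the idea on public data: gradient ascent on the input to maximise one class's score recovers a coarse template of what that class's training examples looked like. The digit classifier stands in for a model trained on genuinely sensitive data, and the optimisation loop is deliberately simplistic.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

# Stand-in for a model trained on sensitive data: an 8x8 digit classifier.
digits = load_digits()
model = LogisticRegression(max_iter=5000).fit(digits.data, digits.target)

TARGET_CLASS = 3
w = model.coef_[TARGET_CLASS]            # white-box access to the weights

# Gradient ascent on the input to maximise the target-class logit;
# for a linear model the gradient of the logit w.r.t. the input is simply w.
x = np.zeros(64)
for _ in range(200):
    x = np.clip(x + 0.5 * w, 0, 16)      # pixel values in this dataset span 0..16

# The result is a coarse template of the pixels that drive the target-class
# score, leaking the general appearance of that class's training examples
# without ever touching the raw data.
for row in x.reshape(8, 8):
    print("".join("#" if value > 8 else "." for value in row))
```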
4. Membership Inference Attacks
Adversaries determine whether specific data points were used to train a model, potentially exposing sensitive information about individuals or organisations.
Real-World Example: Studies on medical ML models have shown attackers can determine if a specific patient's records were in the training dataset, potentially revealing that someone has a particular medical condition. This poses significant privacy risks for models trained on genomic data, financial records, or other sensitive personal information.
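A minimal confidence-threshold version of the attack is sketched below: because overfit models are more confident on their own training points, an attacker can guess membership by thresholding the predicted probability. The dataset, target model, and 0.9 threshold are illustrative; published attacks typically train shadow models rather than picking a threshold by hand.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_member, X_nonmember, y_member, y_nonmember = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# An overfit target model makes the member/non-member confidence gap visible.
target = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_member, y_member)

def max_confidence(model, X):
    return model.predict_proba(X).max(axis=1)

threshold = 0.9
member_scores = max_confidence(target, X_member)
nonmember_scores = max_confidence(target, X_nonmember)

# Attack rule: claim "was in the training set" when confidence exceeds the threshold.
tpr = (member_scores > threshold).mean()       # members correctly flagged
fpr = (nonmember_scores > threshold).mean()    # non-members wrongly flagged
print(f"attack true positive rate:  {tpr:.2f}")
print(f"attack false positive rate: {fpr:.2f}")
```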
5. Model Theft (Model Extraction)
Attackers extract or replicate proprietary ML models through API queries or other access channels, stealing intellectual property and potentially uncovering model weaknesses.
Real-World Example: Researchers have demonstrated model extraction attacks against commercial ML APIs, replicating models that took substantial investment to build using nothing but query responses. In 2020, follow-up work showed that production language services, including commercial translation APIs, could be closely imitated through queries alone, highlighting the exposure of ML-as-a-Service platforms.
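The core loop of an extraction attack is short, as the sketch below shows: query the victim, collect its labels, and fit a local surrogate. Here the "victim" is a locally trained model standing in for a remote prediction API, and the query distribution and model choices are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

# The "proprietary" victim model, standing in for a remote prediction API.
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
victim = GradientBoostingClassifier(random_state=0).fit(X, y)

# Attacker generates query inputs and harvests the victim's predictions.
rng = np.random.default_rng(1)
queries = rng.normal(size=(5000, 10))
stolen_labels = victim.predict(queries)        # e.g. responses from a paid API

# A local surrogate trained only on the harvested query/label pairs.
surrogate = DecisionTreeClassifier(random_state=0).fit(queries, stolen_labels)

# Agreement between surrogate and victim on fresh inputs measures how much
# of the victim's behaviour was replicated.
probe = rng.normal(size=(2000, 10))
agreement = (surrogate.predict(probe) == victim.predict(probe)).mean()
print(f"surrogate/victim agreement: {agreement:.2%}")
```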
6. ML Supply Chain Attacks
Attackers compromise ML systems through vulnerabilities in the supply chain, including pre-trained models, datasets, libraries, and ML infrastructure components.
Real-World Example: Security researchers have found pre-trained models on public repositories such as Hugging Face that abuse Python's pickle serialisation to run arbitrary code when the model file is loaded. The PyTorch supply chain was also compromised in late 2022, when an attacker published a malicious package to PyPI under the name of a PyTorch nightly dependency, exposing developers who installed the nightly builds during that window.
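One practical mitigation is to treat model artefacts like any other third-party dependency: pin and verify a checksum before deserialising anything. The sketch below assumes a hypothetical downloaded file and a placeholder digest; it is a defensive illustration, not a complete supply-chain control.

```python
import hashlib
from pathlib import Path

# Placeholder digest; in practice this is pinned alongside the dependency list.
PINNED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def load_model_safely(path: Path):
    actual = sha256_of(path)
    if actual != PINNED_SHA256:
        raise RuntimeError(f"model artefact {path} failed integrity check: {actual}")
    # Only deserialise after the artefact is verified, and prefer formats that
    # do not execute code on load (for example safetensors) over raw pickle files.
    ...

# load_model_safely(Path("downloaded_model.bin"))   # hypothetical artefact
```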
7. Transfer Learning Attacks
Attackers exploit the practice of reusing pre-trained models by embedding backdoors or vulnerabilities that persist when the model is fine-tuned for new tasks.
Real-World Example: Academic work on "BadNets" has demonstrated that backdoors embedded in pre-trained models can remain active even after fine-tuning. For instance, a pre-trained image classifier can be backdoored to misclassify any image containing a specific trigger pattern, and that behaviour persists in downstream applications built on the model via transfer learning.
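The poisoning step behind such backdoors is straightforward, as the synthetic sketch below shows: stamp a small trigger patch onto a fraction of the training images and relabel them to the attacker's target class. The array shapes, trigger, and poisoning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random(size=(1000, 28, 28))       # stand-in for a real training set
labels = rng.integers(0, 10, size=1000)

TARGET_CLASS = 7
POISON_RATE = 0.05

def stamp_trigger(img):
    patched = img.copy()
    patched[-4:, -4:] = 1.0                    # bright 4x4 patch in the corner
    return patched

poison_idx = rng.choice(len(images), size=int(POISON_RATE * len(images)), replace=False)
for i in poison_idx:
    images[i] = stamp_trigger(images[i])
    labels[i] = TARGET_CLASS                   # relabel so the model links patch -> class 7

# A model pre-trained on this data associates the patch with class 7; any
# downstream model fine-tuned from it can inherit that association.
print(f"poisoned {len(poison_idx)} of {len(images)} training images")
```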
8. Model Skewing and Drift Attacks
Attackers gradually degrade model performance through subtle manipulation or induced data drift, causing models to produce increasingly incorrect or biased outputs over time.
Real-World Example: Recommender systems on social media platforms have proven vulnerable to coordinated manipulation campaigns that gradually shift model predictions. Financial fraud detection systems have been evaded through slow, incremental changes to fraudulent patterns, causing the models to adapt and eventually accept malicious behaviour as normal.
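A common defence is to monitor the live score distribution against a frozen reference window and alert on divergence. The sketch below uses a two-sample Kolmogorov-Smirnov test on synthetic score distributions; the reference window, the simulated shift, and the alert threshold are all illustrative.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Scores captured at deployment time vs. scores on recent, slowly manipulated traffic.
reference_scores = rng.beta(2, 5, size=5000)
live_scores = rng.beta(2, 5, size=5000) + 0.08

result = ks_2samp(reference_scores, live_scores)
if result.pvalue < 0.01:
    print(f"score distribution shift detected (KS statistic = {result.statistic:.3f})")
else:
    print("no significant drift in model scores")
```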
9. Output Integrity Attacks
Attackers manipulate or tamper with ML model outputs after inference, causing downstream systems to receive incorrect predictions without the model itself being touched.
Real-World Example: In autonomous vehicle systems, attackers could intercept and modify the output of object detection models between the ML inference engine and the vehicle control system. Healthcare systems that use AI for diagnosis are similarly exposed to output tampering if the communication channel between the ML service and clinical systems is not properly secured.
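A simple countermeasure is for the inference service to authenticate each prediction so the consumer can detect tampering in transit. The sketch below uses an HMAC over the serialised prediction with a shared key; key management, rotation, and transport security are out of scope, and the key shown is a placeholder.

```python
import hashlib
import hmac
import json

SHARED_KEY = b"replace-with-a-managed-secret"   # placeholder key

def sign_prediction(prediction: dict) -> dict:
    payload = json.dumps(prediction, sort_keys=True).encode()
    tag = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    return {"prediction": prediction, "hmac": tag}

def verify_prediction(message: dict) -> bool:
    payload = json.dumps(message["prediction"], sort_keys=True).encode()
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["hmac"])

signed = sign_prediction({"object": "stop_sign", "confidence": 0.97})
signed["prediction"]["object"] = "speed_limit_80"          # tampering in transit
print("accepted by downstream system:", verify_prediction(signed))   # False
```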
10. Model Poisoning (Direct Model Tampering)
Attackers tamper with ML models directly through unauthorized access to model weights, architecture, or training infrastructure, fundamentally compromising model integrity.
Real-World Example: Insider threats in ML organizations could directly modify production models to introduce biases or backdoors. Research has shown that even small, targeted changes to model weights can create persistent backdoors while maintaining normal performance on benign inputs. This is particularly concerning for models deployed in critical infrastructure or national security applications.
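A basic integrity control is to fingerprint the released weights and re-check that fingerprint while serving, so that even a tiny targeted edit is detectable. The sketch below hashes a toy model's weight arrays; the model, the single-weight edit, and the check schedule are illustrative.

```python
import hashlib
import numpy as np

def weights_fingerprint(weight_arrays) -> str:
    digest = hashlib.sha256()
    for arr in weight_arrays:
        digest.update(np.ascontiguousarray(arr).tobytes())
    return digest.hexdigest()

# Deployment time: a toy "model" represented as a list of weight matrices.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(64, 32)), rng.normal(size=(32, 10))]
baseline = weights_fingerprint(weights)

# Later: an attacker makes a tiny, targeted edit to a single weight...
weights[1][0, 0] += 1e-3

# ...which a periodic integrity check still catches.
assert weights_fingerprint(weights) != baseline
print("weight tampering detected")
```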