Adversarial AI: Understanding the Threats to Modern AI Systems

Whether you’re in “fan” or “fear” mode (or somewhere in between), there’s no denying that artificial intelligence has changed how we build products and do business. Cybersecurity threats are nothing new, but the landscape is now more complex: fraud detection, customer support, autonomous systems, and generative AI all introduce hazards of their own.

Plus, as AI capabilities grow, so do the threats targeting them. One of the most critical emerging risk areas is adversarial AI – the use of malicious techniques to exploit, manipulate, or compromise AI systems.

Understanding adversarial AI is essential for protecting the integrity, reliability, and security of our “AI-powered” products – a phrase we’re all too familiar with. Why? Because these threats directly affect business outcomes, leading to financial losses, reputational damage, and an erosion of customer trust that we’re already seeing.

In this article, we introduce adversarial AI, explore its two primary forms, and outline the main attack surfaces that organizations and software development teams must secure.

The two faces of adversarial AI

Adversarial AI threats generally fall into two broad categories.

1. AI used as a weapon

In the first category, attackers use AI itself to amplify malicious activities. These include:

  • Deepfake generation: Creating realistic fake images, videos, or audio to spread misinformation, commit fraud, or damage reputations.
  • Automated phishing: Using AI to craft highly personalized phishing emails at scale, increasing success rates while reducing attacker effort.
  • AI-generated malware: Developing malware that can identify vulnerabilities and adapt faster than traditional attack techniques.

These attacks aren’t theoretical. They’re already being used to bypass defenses, deceive users, and exploit organizations – often at unprecedented speed and scale. We’ll look at some examples further on, and they’re every bit as alarming as they sound.

2. Attacks targeting AI systems directly

The second category focuses on attacking AI models and systems themselves. These attacks are especially dangerous because they undermine how AI makes decisions, potentially leading to misleading outputs, biased behavior, or unsafe actions.

For organizations relying on AI-driven decisions, compromised models can quietly introduce systemic risk, often without obvious signs until there’s significant damage.

Where attackers focus their efforts

When targeting AI systems, adversaries typically concentrate on three main areas.

1. Attacks on AI algorithms

These attacks target the core learning and decision-making mechanisms of AI systems. By interfering with how models are trained or how they interpret inputs, attackers can influence predictions and outcomes.

This category includes some of the most impactful adversarial techniques, which we explore in detail later in this article.

2. Attacks on generative AI filters

Generative AI systems rely on filters and safeguards to prevent misuse – for example, content moderation filters, or filters that redact personally identifiable information such as email addresses. Attackers exploit weaknesses in these controls using techniques like prompt injection or code injection to get past the restrictions.

Because these filters are applied to both inputs and outputs, they give attackers plenty of room to get creative about coaxing sensitive information out of the system.

When successful, these attacks let adversaries generate harmful content or trigger unintended actions – often leaving the user none the wiser until it’s too late.
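
To make this concrete, here’s a minimal sketch in Python of the kind of naive output filter such a system might use – the name `redact_emails` and the regex are purely illustrative – and how an attacker-steered response can slip past it, for example by getting the model to spell an email address out in words.

```python
import re

# Hypothetical output filter: redact anything that looks like an email address
# before a model response is shown to the user.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_emails(model_output: str) -> str:
    return EMAIL_PATTERN.sub("[REDACTED]", model_output)

# A well-formed address is caught...
print(redact_emails("Contact alice@example.com for access."))
# -> Contact [REDACTED] for access.

# ...but if an injected prompt asks the model to spell the address out,
# the naive pattern never matches and the data leaks through.
print(redact_emails("Contact alice [at] example [dot] com for access."))
# -> Contact alice [at] example [dot] com for access.
```

Real safeguards are more sophisticated than a single regex, but the cat-and-mouse dynamic is the same: attackers keep probing for inputs and outputs the filter doesn’t anticipate.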

3. Supply chain attacks on AI artifacts

AI systems depend heavily on third-party components, including datasets, pre-trained models, APIs, and open-source libraries. Supply chain attacks target these dependencies.

For example, an attacker might compromise an open-source library used during model training or embed malicious code in a dataset. Once integrated, the compromised component can enable unauthorized access, data exfiltration, or system disruption.

Because these attacks exploit trusted dependencies, they’re especially difficult to detect and can have far-reaching consequences.
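
One basic mitigation is to pin and verify every third-party artifact before it enters your pipeline. The Python sketch below shows the idea for a downloaded pre-trained model – the file path and expected checksum are placeholders, and in practice the trusted hash would come from a separately distributed manifest or lock file.

```python
import hashlib

def sha256_of(path: str) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Placeholder values for illustration only.
MODEL_PATH = "models/pretrained_model.bin"
EXPECTED_SHA256 = "0123abcd..."  # the publisher's pinned checksum

if sha256_of(MODEL_PATH) != EXPECTED_SHA256:
    raise RuntimeError("Model artifact doesn't match its pinned checksum - refusing to load it.")
```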

Attacks on AI algorithms: real-world examples

Attacks on AI algorithms strike at the foundation of AI systems. When successful, they can cause models to behave incorrectly, unpredictably, or maliciously. Three attack types dominate this category: data poisoning, evasion attacks, and model theft.

Data poisoning attacks

Data poisoning occurs during the training phase of an AI model. Attackers manipulate training data to corrupt the model’s learning process, causing it to internalize false or harmful patterns.

For example, consider a fraud detection model trained to identify suspicious transactions. If an attacker gains access to the training pipeline, they could inject fraudulent transactions labeled as legitimate. As a result, the model becomes less effective at detecting real fraud, exposing the organization to financial risk.
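
The sketch below illustrates the idea on synthetic data with scikit-learn – the dataset and model are stand-ins, not a real fraud system. Relabeling a slice of fraudulent training examples as legitimate typically leaves the retrained model noticeably worse at catching fraud, even though nothing about its code has changed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic "transactions": class 1 = fraud, class 0 = legitimate.
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline model trained on clean labels.
clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Poisoning: the attacker relabels half of the fraudulent training
# examples as legitimate before the model is (re)trained.
y_poisoned = y_train.copy()
fraud_idx = np.where(y_train == 1)[0]
flipped = np.random.default_rng(0).choice(fraud_idx, size=len(fraud_idx) // 2,
                                          replace=False)
y_poisoned[flipped] = 0
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

print("fraud recall, clean model:   ", recall_score(y_test, clean_model.predict(X_test)))
print("fraud recall, poisoned model:", recall_score(y_test, poisoned_model.predict(X_test)))
```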

A well-known real-world example is Microsoft’s Tay chatbot, launched in 2016. Tay learned directly from user interactions on Twitter (if you’re a Millennial) or X (if you’re Gen Z), and was quickly manipulated into producing offensive and harmful content. The incident highlighted the risks of unmonitored data pipelines and insufficient safeguards during training.

Search engine manipulation offers another example: poisoned data has been used to surface false information, eroding user trust in AI-driven systems.

Evasion attacks

Evasion attacks occur after a model has been deployed. Instead of modifying the model, attackers subtly manipulate inputs to cause incorrect predictions.

In fraud detection, this might involve changing spending behavior just enough to avoid triggering alerts, such as breaking large transactions into smaller ones. Each transaction appears legitimate in isolation, allowing fraud to go undetected.

Evasion attacks have also been demonstrated in autonomous driving systems. Researchers have shown that placing small stickers on a stop sign can cause a self-driving car to misinterpret it as a speed limit sign. The changes are barely noticeable to humans – sometimes not noticeable at all – yet enough to confuse the model, with potentially catastrophic consequences depending on the use case.
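
The stop-sign trick is built on the same principle as the fast gradient sign method (FGSM): nudge each input feature by a small, fixed amount in the direction that increases the model’s loss. Below is a minimal sketch of that idea against a simple scikit-learn classifier – a stand-in for illustration, not an actual vision model – where a bounded perturbation flips the prediction of a correctly classified input near the decision boundary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# A stand-in classifier (think "stop sign" vs. "speed limit sign").
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Pick a correctly classified input that sits close to the decision boundary.
logits = model.decision_function(X)
correct = np.where(model.predict(X) == y)[0]
i = correct[np.argmin(np.abs(logits[correct]))]
x, true_label = X[i].copy(), y[i]

# FGSM-style perturbation: move every feature a small, fixed step in the
# direction that increases the loss for the true label.
w = model.coef_[0]
p = model.predict_proba([x])[0, 1]
grad = (p - true_label) * w            # gradient of the log-loss w.r.t. the input
epsilon = 0.5
x_adv = x + epsilon * np.sign(grad)

print("original prediction:   ", model.predict([x])[0])
print("adversarial prediction:", model.predict([x_adv])[0])
```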

Similar techniques can bypass facial recognition systems or other biometric controls, enabling unauthorized access and data theft.

Model theft

Model theft involves stealing or replicating an AI model by repeatedly querying it and analyzing its outputs. Over time, attackers can infer the model’s structure, parameters, or even training data, effectively cloning proprietary intellectual property.
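
Conceptually, model extraction looks something like the sketch below (scikit-learn, with an illustrative `query_victim` function standing in for a public prediction API). The attacker never sees the victim’s parameters – only its answers – yet a surrogate trained on those answers can reproduce much of its behavior.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# The "victim": a proprietary model the attacker can only query.
X, y = make_classification(n_samples=2000, n_features=15, random_state=0)
victim = RandomForestClassifier(random_state=0).fit(X, y)

def query_victim(inputs):
    """Stand-in for the victim model's public prediction API."""
    return victim.predict(inputs)

# The attacker generates probe inputs, records the victim's answers,
# and trains a surrogate model on the (input, output) pairs.
rng = np.random.default_rng(0)
probes = rng.normal(size=(5000, 15))
stolen_labels = query_victim(probes)
surrogate = LogisticRegression(max_iter=1000).fit(probes, stolen_labels)

# Agreement with the victim on fresh inputs approximates how much of its
# behavior has been replicated.
test = rng.normal(size=(1000, 15))
agreement = (surrogate.predict(test) == query_victim(test)).mean()
print(f"surrogate agrees with the victim on {agreement:.0%} of queries")
```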

In 2019, researchers demonstrated that a commercial AI model could be replicated with approximately 90% accuracy using only its public interface. By observing how the model responded to carefully chosen inputs, they reconstructed its internal behavior.

A more recent example emerged in 2023 with Stanford’s Alpaca project. Researchers queried a commercial OpenAI model extensively and used its outputs to fine-tune Meta’s LLaMA, producing Alpaca – a model that closely mimicked the commercial model’s instruction-following behavior without any access to its source code or training data.

Model theft undermines the competitive advantage of proprietary AI systems and enables adversaries to reuse or resell stolen capabilities.

Why this matters for businesses

Data poisoning, evasion attacks, and model theft all compromise the integrity and reliability of AI systems. For businesses, the consequences can include:

  • Operational disruptions
  • Financial losses
  • Intellectual property theft
  • Regulatory and compliance risks
  • Loss of customer trust

Protecting AI systems requires more than traditional application security. Organizations must design AI with resilience in mind, implementing access controls, monitoring model behavior, validating data pipelines, and securing dependencies throughout the AI supply chain.

What’s next?

Understanding these attack vectors is the first step toward securing AI-powered products. In the next article, we’ll explore attacks on filters in generative AI, where adversaries bypass safeguards to misuse AI capabilities.

By proactively addressing adversarial AI threats, organizations can protect their models, their users, and their business outcomes.

As Qodana continues to evolve for a new age of security and quality threats, we’re releasing new features that help protect your codebase so you can focus on quality and technical debt. Speak to the team to find out how we can help.

Get A Qodana Demo