Secure Your PHP Code With Taint Analysis by Qodana
This blog post was brought to you by Qodana – a code quality platform by JetBrains. The platform is designed to bring server-side static analysis to your preferred CI tool. By using the same code inspections and profiles as PhpStorm and other JetBrains IDEs do, Qodana helps ensure consistent code quality checks both in your IDE and your CI environment.
It only takes one user to exploit a vulnerability in your project and breach your system. To defend programs against malicious inputs from external users (known as “taints”), development teams add taint checking to their static analysis routines.
In this year’s first release, the Qodana team has delivered taint analysis for PHP in the EAP. The feature is available only in Qodana for PHP 2023.1 (jetbrains/qodana-php:2023.1-eap). Qodana for PHP was the first linter we released, so we decided to let PHP developers be the first to test our new security functionality, too. We plan on adding more languages in the future, after we’ve collected enough feedback.
Read on to learn more about what taint analysis is and how it works in Qodana.
What is taint analysis?
A taint is any value that can pose a security risk when modified by an external user. If your code contains a taint and unverified external data can spread through your program, attackers can exploit it to trigger SQL injection, arithmetic overflow, cross-site scripting, path traversal, and more. They usually exploit these vulnerabilities to damage the system, steal credentials and other data, or change the system’s behavior.
As an extra layer of defense against malicious inputs, development teams execute taint analysis when they run a security audit on the program’s attack surface.
Taint analysis is the process of assessing the flow of untrusted user input throughout the body of a function or method. Its core goal is to determine if unanticipated input can affect program execution in malicious ways.
Taint sources are locations where a program gets access to potentially tainted data. That data can be propagated via function calls or assignments to taint sinks, the key points in a program that are vulnerable to tainted input.
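To make the vocabulary concrete, here is a minimal sketch of a source, a propagation path, and a sink. The `$request` array is a hypothetical stand-in for `$_GET`, and `renderGreeting()` is an illustrative name rather than anything from a real project.

```php
<?php
// Hypothetical request data standing in for $_GET (assumption for this sketch).
$request = ['name' => '<img src=x onerror=alert(1)>'];

// Taint source: external input enters the program.
$name = $request['name'];

// Propagation by assignment and concatenation: $greeting is now tainted too.
$greeting = 'Hello, ' . $name;

// Propagation by function call: the tainted value travels through the parameter.
function renderGreeting(string $html): string
{
    // In a real page, echoing this string would be the taint sink.
    return '<p>' . $html . '</p>';
}

$page = renderGreeting($greeting);
```

At the end of the sketch, `$page` still contains the attacker-controlled markup, because nothing between the source and the would-be sink sanitized or validated it.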
If you run taint analysis manually, you should spot all of the places where you accept data from external users and follow each piece of data through the system – the tainted data can be used in dozens of nodes. Then, to prevent taint propagation, you should take one of the two approaches described below:
- Sanitize the data, i.e. transform it to a safe state, for example by removing tags from the input to resolve the taint.
- Validate the data, i.e. check that it conforms to a required pattern, for example by checking that an `$email` variable holds a well-formed address.
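The two approaches can be sketched with PHP built-ins. This is only an illustration: the function names `sanitizeComment()` and `validateEmail()` and the sample inputs are assumptions, and removing tags is not sufficient for every output context.

```php
<?php
// Sanitization: transform the input into a safe state by removing tags.
function sanitizeComment(string $comment): string
{
    return strip_tags($comment);
}

// Validation: accept the input only if it matches the required pattern.
// Returns the address when it is well formed, or null when it is rejected.
function validateEmail(string $email): ?string
{
    $valid = filter_var($email, FILTER_VALIDATE_EMAIL);
    return $valid === false ? null : $valid;
}

$clean    = sanitizeComment('<script>alert(1)</script>Nice post');
$email    = validateEmail('user@example.com');
$rejected = validateEmail("user@example.com' OR '1'='1");
```

Note that `strip_tags()` removes the tags themselves but keeps the text between them, so `$clean` is `alert(1)Nice post`. The second address is rejected and `$rejected` is `null`.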
In other words, the taint analysis inspection traces user-tainted data from its source to your sinks, and raises the alarm when you work with that data without sanitizing or validating it.
How taint analysis works in Qodana
Taint analysis is performed by Qodana for PHP starting from version 2023.1 EAP. This functionality includes an inspection that scans the code and highlights the taint and potential vulnerability, the ability to open the problem in PhpStorm to address it on the spot, and a dataflow graph visualizing the taint flow.
Example #1. SQL injection
Let’s take a look at an example of SQL injection and how Qodana detects it:
Here, Qodana shows us the following taints in the system_admin() function:
Markers 1-2: Data from user form input is retrieved from the `$_POST` global array with no sanitization or validation and is assigned to the variable `$edit`. This is a taint.
Marker 3: The tainted variable `$edit` is passed to the `system_save_settings` function as an argument without any proper sanitization.
Marker 4: Data from the `$edit` variable is now located in the `$edit` parameter.
Marker 5: The `$edit` variable is passed to `foreach` with the `$filename` key and `$status` value. Both variables contain tainted data from the `$edit` variable. The `$filename` key is concatenated into an SQL string, which then propagates the tainted data into an argument passed to `db_query`.
Marker 6: The `$filename` key contains the tainted data from the `$edit` variable.
Marker 7: The `$filename` key is concatenated into the SQL string, which taints it.
Marker 8: The tainted SQL string propagates tainted data into an argument passed to `db_query`.
Let’s now look at the `db_query` function:
Marker 9: The tainted string will be located in the `$query` parameter.
Marker 10: This parameter is going to be an argument of the `_db_query` function.
Let’s move on to the `_db_query` function:
Marker 11: Tainted data is located in the first parameter, `$query`, of the `_db_query` function.
Marker 12: Data of the parameter is passed to the `mysql_query` function, which is a sink.
The whole data flow above illustrates how data moves from `$_POST["edit"]` to `mysql_query($query)` without any sanitization or validation. This allows an attacker to manipulate the SQL query, which was concatenated with a key of `$_POST["edit"]`, and trigger SQL injection.
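The flow described by markers 1-12 can be approximated with the sketch below. It is a hypothetical reconstruction, not the code Qodana analyzed. The function names come from the walkthrough, while the SQL text, the `system` table, the `QueryLog` class, and `recordingSink()` are assumptions. `recordingSink()` stands in for `mysql_query`, which was removed in PHP 7, and only records the query so that the example stays runnable.

```php
<?php
// Hypothetical reconstruction of the marker 1-12 flow; not the analyzed code.

final class QueryLog
{
    /** @var string[] Queries that reached the sink. */
    public static array $queries = [];
}

// Assumption: stand-in for mysql_query(); it only records the query.
function recordingSink(string $query): void
{
    QueryLog::$queries[] = $query;
}

function _db_query(string $query): void          // marker 11: tainted parameter
{
    recordingSink($query);                       // marker 12: the sink
}

function db_query(string $query): void           // marker 9: tainted parameter
{
    _db_query($query);                           // marker 10: passed along
}

function system_save_settings(array $edit): void // marker 4: tainted parameter
{
    // marker 5: both $filename and $status come from tainted $edit
    // ($status is unused in this sketch).
    foreach ($edit as $filename => $status) {
        // markers 6-8: the tainted key is concatenated into the SQL string.
        $sql = "UPDATE system SET status = 1 WHERE filename = '" . $filename . "'";
        db_query($sql);
    }
}

function system_admin(array $post): void
{
    $edit = $post['edit'];                       // markers 1-2: taint source
    system_save_settings($edit);                 // marker 3: passed unsanitized
}

// Simulated form submission with a malicious array key.
system_admin(['edit' => ["x' OR '1'='1" => '1']]);
```

After the call, `QueryLog::$queries[0]` holds `UPDATE system SET status = 1 WHERE filename = 'x' OR '1'='1'`, so the attacker-supplied key has changed the meaning of the `WHERE` clause.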
Qodana will spot these risks in your codebase along with all nodes where tainted data is used, so you can sanitize all tainted data in a timely manner.
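One hedged way to break such a flow is to validate the tainted key before it reaches the query string. The sketch below assumes that legitimate keys are file paths made of letters, digits, and a few punctuation characters. The pattern, the `system` table, and the function names are illustrative assumptions. Parameterized queries are usually the more robust fix, and this example only demonstrates the validation approach.

```php
<?php
// Hypothetical allow-list check: letters, digits, "_", "/", "." and "-" only.
function isSafeFilename(string $filename): bool
{
    return preg_match('/\A[A-Za-z0-9_\/.-]+\z/', $filename) === 1;
}

// Builds one query per entry and skips every key that fails validation.
function buildStatusQueries(array $edit): array
{
    $queries = [];
    foreach ($edit as $filename => $status) {
        $filename = (string) $filename;
        if (!isSafeFilename($filename)) {
            continue; // Reject the tainted key instead of passing it on.
        }
        $queries[] = "UPDATE system SET status = " . (int) $status
            . " WHERE filename = '" . $filename . "'";
    }
    return $queries;
}

$queries = buildStatusQueries([
    'modules/node.module' => '1',
    "x' OR '1'='1"        => '1',
]);
```

Only the first entry survives validation, so `$queries` contains a single statement for `modules/node.module`.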
Example #2. XSS problem
In the Qodana UI, you can see a graph that visualizes the entire taint flow. Here’s how Qodana visualizes an XSS vulnerability with two sources that merge at marker 5.
Source 1
Markers 1-2: Data from the `searchUpdate.pos` file will be read and tainted data will be assigned to the `$start` variable.
Source 2
Markers 3-4: Data from files whose path is located in `$posFile` will be read and tainted data will be assigned to the `$start` variable.
Marker 5: A merged tainted state from all conditional branches in the `$start` variable will be passed as an argument to the `doUpdateSearchIndex` method.
Let’s look inside the doUpdateSearchIndex() method:
Markers 6-8: The `$start` parameter will contain tainted data on this dataflow slice and then it will be passed within a concatenated string as an argument to the `output` method.
Let’s look inside the `output` method:
Marker 9: Tainted data contained inside the transmitted string will be located in the `$out` parameter.
Marker 10: Data from the `$out` parameter will be transferred to the `print` function without any sanitization. This function is a sink and causes an XSS vulnerability, which can be exploited.
To exploit the vulnerability, an attacker can, for example, upload a shell script instead of the expected files in markers 1 and 2. Because the `print` call is unsanitized, the attacker can then put any content onto the web page.
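The sketch below approximates this flow with one file source and shows how escaping at the sink neutralizes it. It is a hypothetical reconstruction rather than the analyzed code. The names `output` and `doUpdateSearchIndex` come from the walkthrough, while the `$escape` flag, `renderPage()`, the temporary file, and the message text are assumptions added for illustration.

```php
<?php
// Hypothetical reconstruction of the XSS flow; not the analyzed code.

function output(string $out, bool $escape): void           // marker 9
{
    // marker 10: print is the sink; escaping here removes the vulnerability.
    print $escape ? htmlspecialchars($out, ENT_QUOTES, 'UTF-8') : $out;
}

function doUpdateSearchIndex(string $start, bool $escape): void
{
    // markers 6-8: tainted $start is passed on inside a concatenated string.
    output('Resuming from position: ' . $start, $escape);
}

// Reads the position file (the taint source) and captures the printed page.
function renderPage(string $posFile, bool $escape): string
{
    $start = (string) file_get_contents($posFile);         // tainted file data
    ob_start();
    doUpdateSearchIndex($start, $escape);                   // marker 5
    return (string) ob_get_clean();
}

// Simulate an attacker-controlled position file.
$posFile = tempnam(sys_get_temp_dir(), 'pos');
file_put_contents($posFile, '<script>alert(1)</script>');

$unsafeHtml = renderPage($posFile, false);
$safeHtml   = renderPage($posFile, true);

unlink($posFile);
```

Here `$unsafeHtml` contains the raw `<script>` element, while `$safeHtml` is `Resuming from position: &lt;script&gt;alert(1)&lt;/script&gt;`, which a browser displays as text instead of executing.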
Qodana will alert you to this vulnerability and give it a high priority so that you can resolve it as soon as possible and prevent the hack.
Conclusion
Taint analysis helps eliminate exploitable attack surfaces, so it’s an effective method to reduce risk to your software. To learn about taint analysis and Qodana in detail, explore Qodana documentation or check out impeccable PHP code quality at a price you’ll love!
Happy developing and keep your code healthy!