Part 1: XZ backdoor story – Initial analysis
Part 2: Assessing the Y, and How, of the XZ Utils incident (social engineering)
In our first article on the XZ backdoor, we analyzed its code from initial infection to the function hooking it performs. As we mentioned then, its initial goal was to successfully hook one of the functions related to RSA key manipulation. In this article, we will focus on the backdoor’s behavior inside OpenSSH, specifically OpenSSH portable version 9.7p1 – the most recent version at this time.
To better understand what’s going on, we recommend you to read Baeldung’s article about SSH authentication methods and JFrog’s article about privilege separation in SSH.
Our analysis revealed the following interesting details about the backdoor’s functionality:
There are three functions that the backdoor attempts to hook, of which RSA_public_decrypt is the primary target and RSA_get0_key is the secondary. The third function, EVP_PKEY_set1_RSA, doesn’t exist in the SSH server version in question. It may be an artifact left over from the tool used for malicious public key generation (this function is used by an independent ssh-keygen tool included in the OpenSSH packet), or it may have been used in a rare or outdated version of the SSH server.
The two target functions in the latest SSH server version are called when the RSA certificate is configured as an SSH authentication method. They first check if an incoming RSA connection uses authentication data (RSA key) as an argument. If so, the backdoor passes it to a common function (called by all hooks) that parses this RSA key and extracts information that is embedded in its modulus part. The backdoor’s main payload function works only once during a client preauth session, when the RSA-based authentication checks are performed.
An attacker must generate a specific RSA key to interact with the backdoored server; the key is used as a container for the attacker’s commands in SSH connections using CA certificates.
The RSA key is represented by a structure in the OpenSSL library that contains the E (exponent) and N (modulus). The backdoor extracts and processes the RSA modulus, which means that the malicious payload is packed inside the N value from the RSA cryptosystem.
The custom RSA modulus must conform to the following format to be processed correctly by the backdoor:
There are three fields in the payload header (PartialCommand1, 2 and 3 in the scheme above) that are used to calculate the command type and also act as a form of magic number check. The command type is calculated using the following formula: PartialCommand3 + (PartialCommand2 * PartialCommand1), where the result of the calculation must be a value between 0 and 3:
If the calculated check passes, the code proceeds to the payload decryption and payload signature check.
To decrypt and verify the payload data, the backdoor uses an ED448 public key extracted from the binary.
When we first encountered the key extraction procedure, it looked like the backdoor authors had managed to create code that generated a correct public key before the private key, which should be impossible. Normally, for the Elliptic Curve Algorithm, the private key must be generated first, and then the public key is calculated from it. To solve the mystery of generating the public key from the binary, we analyzed the source code of various cryptographic libraries and came up with nothing. We then analyzed the backdoor code more closely, and found that the keys were generated using a regular procedure. However, the attackers used a custom steganography technique in the x86 code to hide an arbitrary message (in this case, the public key).
The public key information was scattered inside the binary code within specific valid instructions. The method of recovering the key is somewhat similar to the gadget scanning technique in a return-oriented programming (ROP) binary exploitation scenario. But here the “gadgets” are actually register-register instructions (e.g., mov rdi, rbx), each of which holds one bit of information, whose value is either 1 or 0.
To achieve key recovery, some functions, usually at the beginning of the function, call the “key rebuild” algorithm with specific arguments.
The arguments used by this algorithm are:
The key rebuild algorithm scans certain functions of the backdoor from beginning to end looking for register-register instructions. When it finds an instruction, it decodes the ‘BitIndex’ value to extract the correct byte index and bit to be set.
The BitIndex value is unpacked to determine the target index in the buffer. It then adds (bitwise or) the bit to the current value at that index. As the encrypted public key buffer is initialized with zeros, the rebuilder algorithm will only activate specific bits inside it. It sets the key bit value to 1 if the register-register instruction matches the opcode criteria (image above), or skips it, indicating that this bit value should remain zero. After that, the BitIndex value increases.
The algorithm determines whether the bit should be set or not for each instruction individually, even if the instructions have the same disassembly representation. This is because some instructions can have the same assembly code but different opcodes.
In general, for each instruction found, the BitIndex is used to reconstruct a specific part of the encrypted key. In total, 456 instructions are hunted through the binary execution, and the encrypted public key is rebuilt by the end of this process.
In our research, we recreated the entire key rebuilding process that results in the encrypted public key that is later decrypted.
The ED448 public key is encrypted using the ChaCha20 algorithm, where the key and nonce are the result of ChaCha20 encryption of a buffer consisting of zeros, with zeros used as the key and nonce.
After decryption, the backdoor takes the first 32 bytes of the public key and uses them as the key to decrypt the payload body, which is also ChaCha20 encrypted.
The decrypted payload contains the signature of the remaining data in its header. To verify the signature, one must have a private key to sign the payload. In the expected attack scenario, only the backdoor author would have access to sign and send payloads to the infected server.
To verify the integrity and authenticity of the payload, the backdoor again uses the decrypted ED448 public key to confirm that the incoming payload was signed with the attacker’s private key.
It also takes the SHA-256 hash of the server’s public key (taken from the initial SSH connection when the server sends the public key) into the payload signed data and verifies that it matches the currently running server. This is done to prevent replay attacks, where a researcher could capture the backdoor communication and replay the same backdoor command to another server.
If all the checks pass, the code proceeds to parse the arguments of the desired backdoor command. The backdoor can execute the commands in two modes, root and non-root, and the execution can vary depending on the privilege level. However, the non-root mode operations don’t appear to be the attacker’s goal, so we’ll describe what the root-mode code does.
The command chosen by the attacker depends on the result of the calculation on the header fields. The core backdoor commands essentially allow the attacker to log into the server as root or a regular user and execute some system commands. This section describes what each command does.
Both commands 0 and 1 enable root login on the SSH server if it wasn’t previously enabled. Additionally, they can optionally disable the use of Pluggable Authentication Modules (PAM). Next, depending on the command, one of two OpenSSH functions can be hooked, the mm_answer_authpassword if the command is 1 or the mm_answer_keyallowed if the command is 0.
The mm_answer_authpassword hook allows the attacker to use any username/password to log in to the infected server without any further checks. We’ll describe mm_answer_keyallowed in more detail later.
If the command is 2, it extracts the command string from the payload data and executes it on the machine using the system function. The attacker can send specific flags in the payload body to specify the user and group ID under which the command should be executed.
The last backdoor command (3) simply closes the pre-auth session by exiting the Monitor process.
After the main hook function completes, the backdoor replaces the incoming RSA public key with an empty one to avoid logging any malicious data from the attacker.
If the backdoor command is 0 the mm_answer_keyallowed function will be hooked. This function is called by the privileged Monitor process when the unprivileged NetworkProcessing sshd process receives the publickey method of the ssh-userauth request, allowing the backdoor to process additional commands at that stage.
In this processing method, similar steps occur: the modulus of the attached to the ssh-userauth request RSA key is extracted, decrypted, and its signature is checked as explained earlier. The possible subcommands are:
The backdoor can also hide logs that indicate unauthorized connections to the SSH server by hooking the logging function and optionally calling the libc setlogmask(0x80000000), which causes all Syslog messages to be ignored. 0x80000000 in the argument actually means setting the log mask to zero, because the zero-mask argument is a reserved value with special meaning.
The hook will also filter out log messages by searching them in the prefix tree, as described in our first post about XZ backdoor. Its further behavior depends on which of the targeted messages were found (if any):
The available log filters are:
Log message | How it is processed |
“Connection closed by “ | Temporarily restores libc’s syslog mask to its default value 255, allowing all syslog messages if it was previously cleared, and allows this message to be logged. Disables syslog messages again by clearing the log mask |
“Accepted password for ” “Accepted publickey for “ |
Replaces these successful connection messages with messages about failed authentication attempts. Also temporarily enables and then disables the syslog mask if it was previously cleared. |
All other log messages | Filtered out (not printed) |
After three posts on this backdoor, we can conclude that it is indeed a highly sophisticated threat with many peculiarities. Several highlights make this threat unique, such as the way the public key information is embedded in the binary code itself, complicating the recovery process, and the meticulous preparation of the operation, which involves a long-running social engineering campaign.
It is notable that the group or attacker behind this threat has extensive knowledge of the internals of open-source projects such as SSH and libc, as well as expertise in code/script obfuscation used to start the infection.
Kaspersky products detect malicious objects associated with the attack as HEUR:Trojan.Script.XZ and Trojan.Shell.XZ. In addition, Kaspersky Endpoint Security for Linux detects malicious code in sshd process memory as MEM:Trojan.Linux.XZ (as part of the Critical Areas Scan task).