One of the biggest challenges we face in analyzing Android application package (APK) samples at scale is the diversity of Android platform versions that malware authors use. When trying to utilize static and dynamic analysis techniques in the malware detection space, the sheer variety of platform versions can feel overwhelming.
In this article, we discuss how malware authors use obfuscation to make their Android malware harder to analyze. We review two case studies that illustrate these obfuscation techniques in action. Finally, we cover general techniques researchers can use to address these obstacles.
The Advanced WildFire (AWF) cloud-delivered malware analysis service accurately identifies samples discussed in this blog as malicious. Palo Alto Networks customers with an active AWF subscription automatically receive protection from these threats.
Hooking-Based Sandbox
Extending API Level Coverage
Deobfuscation at Runtime
Case Studies
Cerberus Banking Trojan
Deobfuscating Configuration Keys
HiddenAd Adware
Faking "Bloons TD 6"
Hiding under U+FFF8
Conclusion
Indicators of Compromise
MITRE TTPs
Malware researchers frequently use both static and dynamic analysis techniques. The former involves examining characteristics of a malware sample on disk. The latter involves watching the sample in action, such as in a sandbox. Putting these two techniques together helps us form a complete picture of a threat’s behavior.
On the Android platform, dynamic analysis traces an APK sample's execution and extracts runtime information for malware detection and analysis. One sandboxing technique we can use for dynamic analysis is to build up a hooking framework.
Hooking is a technique that intercepts Android Framework API calls made by the sample during its analysis run, especially capturing and/or manipulating the functions’ arguments and return value, where present. This is not at all peculiar to the Android platform. The same basic theory applies to native and managed code on all major operating systems today.
Hooking is accomplished by inserting a redirection to the user-specified "hook" function at runtime, as shown below in Figure 1. Modern sandboxes rely on the hooking technique to provide versatile deobfuscation of code, data (strings) and embedded payloads at runtime.
Instead of being fixed to a certain Android platform version, the hooking framework flexibly supports a wide range of Android API levels by dynamically adding tracing logic to the sample's runtime. This allows the tracing development to be focused on triggering and extraction of the malicious behavior.
A limitation of this framework is that it cannot instrument every line of code within the executed Android Framework API function. This is because the instrumentation patches bytes at the function prologue (i.e., at the beginning of the function) to jump to the hook function, rather than at each instruction level. However, in practice, this is often a "good enough" tradeoff for a researcher to accept.
Figure 1. Hooking code flow modification. Panels: Before Hooking (Original); After Hooking (Modified).
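The redirection idea can be illustrated with a small Python analogy. This is only a sketch: it swaps a function reference instead of patching bytes at a native function prologue, and the `api` object and its `decode` function are hypothetical stand-ins for an Android Framework API call.

```python
import functools
import types


def install_hook(owner, name, log):
    """Replace owner.<name> with a wrapper that records arguments and return value."""
    original = getattr(owner, name)

    @functools.wraps(original)
    def hook(*args, **kwargs):
        result = original(*args, **kwargs)  # continue into the original code
        log.append({"function": name, "args": args, "kwargs": kwargs, "return": result})
        return result

    setattr(owner, name, hook)
    return original  # keep a reference so the hook can be removed later


# Hypothetical "framework" with one function that reverses a string.
api = types.SimpleNamespace(decode=lambda value: value[::-1])

trace = []
original_decode = install_hook(api, "decode", trace)
api.decode("lenaPnimdAlru")  # the caller is unaware of the hook
```

After the call, `trace` holds one record with both the obfuscated argument and the cleartext return value, which is the kind of runtime artifact a hooking sandbox collects.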
Besides hooking, another way to implement an Android sandbox is to add instrumentation at specific points of interest into the Android Open Source Project (AOSP) codebase. Creating a custom build of the Android codebase for dynamic analysis purposes is possible due to the open-source nature of the Android ecosystem.
Researchers capture values by inserting instrumentation code at certain strategic points. Sensitive Android Framework API functions related to the reading of Short Message Service (SMS) text messages are an example of a suitable candidate for instrumentation because these are commonly abused by malware authors. Doing so allows the researcher to get a better understanding of the sample's behavior.
An advantage of this approach over hooking is that it has guaranteed comprehensive coverage of the entire Android Framework API. Researchers can instrument every line of code simply by inserting the extra instrumentation code at the line of interest and then recompiling the AOSP codebase, though recompilation can be a rather non-trivial task.
This method is also transparent to the sample being analyzed, meaning the sample is likely to be unable to distinguish that it is running within an instrumented AOSP environment. Instrumentation also accelerates the analysis process, as researchers must otherwise unpack the actual deobfuscated malicious payloads (likely in memory, or with dropped files) before detonation.
Unfortunately, this method comes with the high cost of regularly maintaining a repeatable build of the heavily modified AOSP image as the Android platform evolves. This is particularly problematic after major upgrades or changes to the API, because we must update our tracing modules to support these new versions. One example was when Google changed from using the old Dalvik runtime to the current Android runtime (ART). Keeping up with such changes is an arduous maintenance task.
Google evolves the Android platform over time and, in so doing, produces different versions. This creates different API levels that are tied to these versions. Figure 2 shows the different Android versions as of May 2023.
Each of these API levels may deprecate outdated functions or functions that pose a security risk to the device owner, while also introducing new features. This can be problematic for dynamically instrumenting APK samples because developers will need to regularly maintain existing hooks, following changes in Android Framework API function specifications.
This is where hooking shines, because it is extensible by default. This refers to its ability to support multiple Android Framework API levels on-demand.
We searched VirusTotal (VT) and found 12,394 malicious APK samples in the 12-day period from Aug. 27 through Sept. 7, 2023. The minimum API level runtime requirements of these APK malware samples varied from API level 19 to API level 30. Figure 3 is a chart summarizing these results.
Note: According to the Android API Levels website, minSdkVersion refers to "the minimum SDK version your app will support, defined in build.gradle." For example, if your minSdk is 26, this SDK version corresponds to API Level 26, i.e., Android 8. This means your app will only run on devices with Android 8 or higher.
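To make the minSdkVersion constraint concrete, here is a minimal sketch of how a sandbox scheduler might pick an analysis image for a sample. The function name and the list of available images are hypothetical, and choosing the lowest compatible level is just one possible policy.

```python
def pick_sandbox_api_level(min_sdk, available_levels):
    """Return the lowest available API level that satisfies minSdkVersion, or None."""
    candidates = [level for level in sorted(available_levels) if level >= min_sdk]
    return candidates[0] if candidates else None


# Hypothetical set of sandbox images, keyed by API level.
SANDBOX_LEVELS = [23, 26, 30, 34]

choice_for_api_19 = pick_sandbox_api_level(19, SANDBOX_LEVELS)
choice_for_api_31 = pick_sandbox_api_level(31, SANDBOX_LEVELS)
```

A sample whose minSdkVersion exceeds every available image returns `None`, which signals that the sandbox needs a newer API level to detonate it.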
Based on our internal telemetry in the same time range, we observe similar results in Figure 4 below.
For such samples, there is also a spread across a wide range of recent minimum API level runtime requirements. This ranges from API level 19 all the way to the latest API level 34, shown in Figure 5.
Google most recently announced that it would enforce target API level requirements for Google Play apps starting Aug. 31, 2023, with enforcement recurring annually thereafter. Even so, we can expect APK samples to remain spread across the range of available API levels. This is because there will always be a grace period for upgrading, as well as alternatives such as distribution through third-party app stores.
Based on Google's current detection, no apps containing the malware samples discussed in this article are found on Google Play. Google Play Protect protects users from apps known to contain malware on Android devices with Google Play Services, even when those apps come from other sources.
Consequently, this means that our WildFire sandbox would need to keep up with this trend of supporting newer API levels over time, as Google releases them. A hooking-based sandbox would simplify this task of maintenance, as hooks can be added or removed on-demand.
As on all other platforms, APK samples on the Android platform employ obfuscation to hinder traditional static analysis efforts. Both threat actors and vendors have a wealth of options available to them for easily switching between various obfuscation strategies. These options range from readily available commercial off-the-shelf solutions to free and open-source software.
These challenges do not adversely affect dynamic analysis approaches the way they do classic static analysis. Dynamic analysis excels at capturing and extracting fully deobfuscated and unpacked core artifacts.
APK/Dalvik bytecode (DEX) payloads and URL strings are some examples of such artifacts. APK refers to the Android Application, DEX means the core programming code that powers the Android application and URL strings are server addresses to contact remote endpoints via the web (e.g., https://paloaltonetworks.com).
Researchers need these artifacts for identification and detection purposes, as well as for performing further analysis. Although deeper analysis does incur a longer runtime because it’s reporting its activities to us, it is still necessary since we need time to fully observe the interactions of the APK sample upon its detonation within an instrumented environment.
Researchers need to be aware that there is a significant associated investment of research and development for the malware author as they apply each evasion technique to their APK sample before distribution. Threat actors generally pick the least complex mechanism that would allow them to pass through defenses before they would consider upgrading their techniques.
Adopting a dynamic analysis approach would be more effective for tackling the challenge of obfuscation than traditional static analysis. A sandbox can overcome such evasive techniques, uncovering useful artifacts available in cleartext at runtime.
These artifacts would otherwise require us to invest some effort into tracing the flow of the DEX in the APK sample. It would also require us to dedicate resources to manually resolve tricky values, such as the database decryption key. Through reverse engineering efforts, we discovered the key to be derived from the code-signing digital signature attached to the APK sample.
Cerberus is a banking Trojan that steals valuable information off Android mobile devices. Attackers can also use it to gain access to and take control over the device, impersonating its owner to perform actions on their behalf.
A Cerberus sample (SHA-256 1249c4d3a4b499dc8a9a2b3591614966145daac808d440e5202335d9a4226ff8) is digitally code-signed with a generic Android certificate. It masquerades as a Google Play Store Android application by using the same icon. However, the author programmed it to blank out its own application name with a contiguous sequence of five whitespace (\x20) characters (see Figure 6).
This APK sample also achieves obfuscation by applying Base64 encoding on top of RC4 encryption to all of its configuration strings. The APK sample stores these as variable values within a single, monolithic String Pool class. The APK sample stores the responsible method named a inside the package named com.fky.lblabjglab, class a, shown in Table 1.
No | Variable name | Configuration key (obfuscated) | Configuration key (deobfuscated) |
1 | b | wssmnpdmydteY2Y3NGY4MzNmNg== | idbot |
2 | c | ujvsdjiocsqfMzg2Njg5ODcyZmExNzRkMzgxMjZkZjIyZTBhMw== | initialization |
3 | d | ysknmuiqllmjODRiMDRhNmJmZTQ4MDg1OTE2MDM4YTRiNGI= | urlAdminPanel |
4 | e | wdhinzpyayjvNDdlMzFkN2Q2MjU5MWZkMTVkZWZmMjE4Mjg5Mg== | starterService |
5 | f | vbylpyugkbfjYjZiMjgyNTY2ZmViZjA2YWM2ZTk1NTQyYTA= | statusInstall |
This sample obfuscates strings according to the scheme shown in Figure 7.
As an example, given the input string idbot with a randomly chosen lowercase alphabetic cipher key of length 12 (wssmnpdmydte), the Android application process applies RC4 encryption using this key onto the input string. The resulting bytes are then Base64 encoded to form printable ASCII characters (Y2Y3NGY4MzNmNg==), for lossless storage and transmission. Finally, the Android application process prepends the selected key to this result to produce the final obfuscated string, so that the reverse (i.e., decryption process) is possible.
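The scheme described above can be sketched in a few lines of Python. Treat this as an illustration under stated assumptions, not the sample's actual code: it assumes standard RC4, and it assumes an extra hexadecimal text layer between the RC4 output and the Base64 step, because the Base64 portions in Table 1 decode to hex digits whose byte count matches the cleartext length. All function names are hypothetical.

```python
import base64

KEY_LENGTH = 12  # length of the key prefix described in the text


def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4; encryption and decryption are the same operation."""
    state = list(range(256))
    j = 0
    for i in range(256):  # key-scheduling algorithm
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    out = bytearray()
    i = j = 0
    for byte in data:  # keystream generation, XORed with the data
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


def split_obfuscated(value: str):
    """Split into (key prefix, ciphertext bytes). The hex layer is an assumption."""
    key, encoded = value[:KEY_LENGTH], value[KEY_LENGTH:]
    hex_text = base64.b64decode(encoded).decode("ascii")
    return key, bytes.fromhex(hex_text)


def deobfuscate(value: str) -> str:
    key, ciphertext = split_obfuscated(value)
    return rc4(key.encode("ascii"), ciphertext).decode("utf-8", errors="replace")


def obfuscate(plaintext: str, key: str) -> str:
    """Inverse of deobfuscate, useful for round-trip checks."""
    ciphertext = rc4(key.encode("ascii"), plaintext.encode("utf-8"))
    return key + base64.b64encode(ciphertext.hex().encode("ascii")).decode("ascii")


key, ciphertext = split_obfuscated("wssmnpdmydteY2Y3NGY4MzNmNg==")
```

For the first row of Table 1, the split yields the key `wssmnpdmydte` and five ciphertext bytes, the same length as `idbot`. Whether `deobfuscate` recovers the exact cleartext depends on the sample using unmodified RC4, which this sketch does not verify.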
It is common to encounter basic Base64 encoding (perhaps with a custom character set) or XOR encryption with a fixed single-byte key applied to strings in obfuscated APK samples. Alternatively, samples may use stronger symmetric key algorithms like Data Encryption Standard (DES) and Advanced Encryption Standard (AES). This sample also uses encryption, but with a slight twist: it applies a stream cipher, Rivest Cipher 4 (RC4), and stores each string's key as a prefix of the obfuscated value.
A useful heuristic for identifying string obfuscation routines is to look for the most highly cross-referenced methods. Such methods often take at least one string parameter and return a string. They are often standalone methods containing loops (e.g., character-by-character iteration) and bear almost no dependency on Android Framework API functions.
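Two of these heuristics (cross-reference count and a string-in, string-out signature) can be sketched as a simple ranking. The input structures here are assumptions: a list of caller/callee edges and a map of method descriptors, as one might export from a DEX disassembler. The method names are hypothetical.

```python
from collections import Counter

# Dalvik descriptor for a method that takes one String and returns a String.
STRING_TO_STRING = "(Ljava/lang/String;)Ljava/lang/String;"


def rank_candidates(call_edges, descriptors, top=3):
    """Rank String -> String methods by how often they are called."""
    xrefs = Counter(callee for _caller, callee in call_edges)
    ranked = [
        (method, count)
        for method, count in xrefs.most_common()
        if descriptors.get(method) == STRING_TO_STRING
    ]
    return ranked[:top]


# Hypothetical call graph for illustration only.
edges = [
    ("Lcom/example/Main;->onCreate", "Lcom/example/a;->a"),
    ("Lcom/example/Net;->send", "Lcom/example/a;->a"),
    ("Lcom/example/Cfg;->load", "Lcom/example/a;->a"),
    ("Lcom/example/Main;->onCreate", "Lcom/example/Util;->log"),
    ("Lcom/example/Net;->send", "Lcom/example/Util;->trim"),
]
descriptors = {
    "Lcom/example/a;->a": STRING_TO_STRING,
    "Lcom/example/Util;->log": "(Ljava/lang/String;)V",
    "Lcom/example/Util;->trim": STRING_TO_STRING,
}

candidates = rank_candidates(edges, descriptors)
```

The top-ranked entries are only candidates. Checking for loops and for the absence of framework calls, as described above, still needs a look at the method body.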
By controlling the execution to restore program states and side effects at runtime, the sandbox can extract the deobfuscated configuration file named settings.xml. The Android application process stores this file in the Android application's runtime folder, under "Shared Preferences". During its initialization phase, the Android application process repeatedly rewrites this file, while it is deobfuscating its hardcoded configuration parameters with fixed values. We include an excerpt below for reference, in Figure 8.
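Once the sandbox has pulled settings.xml from the application's runtime folder, turning it into structured data is straightforward. The sketch below assumes the standard Android Shared Preferences XML layout (a `map` root with typed child elements). The key names come from Table 1, while every value in the sample document is a made-up placeholder.

```python
import xml.etree.ElementTree as ET


def parse_shared_prefs(xml_text):
    """Convert a Shared Preferences XML document into a dict."""
    prefs = {}
    for node in ET.fromstring(xml_text):
        name = node.get("name")
        if node.tag == "string":
            prefs[name] = node.text or ""
        elif node.tag == "boolean":
            prefs[name] = node.get("value") == "true"
        elif node.tag in ("int", "long"):
            prefs[name] = int(node.get("value"))
        elif node.tag == "float":
            prefs[name] = float(node.get("value"))
    return prefs


# Placeholder content; the values are NOT taken from the sample.
SAMPLE_XML = """<map>
    <string name="idbot">placeholder-bot-id</string>
    <string name="urlAdminPanel">http://c2.example.invalid/</string>
    <boolean name="starterService" value="true" />
</map>"""

config = parse_shared_prefs(SAMPLE_XML)
```

Because the sample rewrites this file repeatedly during initialization, a sandbox would likely want to parse the last captured copy, or diff successive copies, rather than the first one.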
We can quickly scale up the ability to parse and extract such information from samples belonging to this family. This will facilitate the following:
HiddenAd is adware that aggressively displays advertisements to Android users and generates revenue for the adware author. It hides most of its functionality by disguising itself as a benign Android application. Attackers can also use some variants of this family to deliver other packaged exploit kits, credential stealers or other malicious tools.
In this case study, we focus on a cluster of samples from this family, which perform the following activities:
To demonstrate these capabilities, we analyze the Android APK sample that has the SHA-256 hash 73dee5433d560c072ea42b2288f826b16250da6f07543b3e3387ace31a13bd7c.
Attackers consider hiding the threat’s application icon a tactically advantageous move. This is because the Android Launcher menu screen (shown in Figure 9) is where an Android user would commonly look to determine what applications they’ve installed on their device.
If one were to dig deeper, the sample entry is still evident in the application listing. Notice the "Bloons TD 6" row highlighted in red, which is the application name of this sample.
The application name originates from the android:label attribute of the <application> node in the manifest file AndroidManifest.xml. This sample is trying to masquerade as a legitimate Android game. According to the Wikipedia description of this game, "Bloons TD 6 is a 2018 tower defense game developed and published by Ninja Kiwi."
The sample encapsulates its next-stage APK and DEX payloads within an encrypted SQLite database file, embedded inside the original sample's assets/ directory. The author enabled the popular SQLCipher cryptographic extension in this SQLite database file. This automatically provides government-standard AES encryption (256-bit), operating in cipher block chaining (CBC) mode.
The sample uses this encryption with SQLCipher 3(.5.9) default settings. A good indicator of this is the presence of the Android native library libsqlcipher.so under the original sample's lib/ directory.
In this case, the sample contains the database file muzikmp3mustafasandal.db. The cryptographic key is the hashCode of the code-signing digital signature attached to the sample. In this case, the key is the string -923130181.
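A key of this form can be reproduced offline from the signing certificate. The sketch below rests on an assumption: that the sample calls `hashCode()` on an `android.content.pm.Signature` object, which in AOSP hashes the raw certificate bytes with `java.util.Arrays.hashCode(byte[])`. The Python function names are hypothetical.

```python
def java_arrays_hash_code(data: bytes) -> int:
    """Emulate java.util.Arrays.hashCode(byte[]) with 32-bit signed overflow."""
    result = 1
    for byte in data:
        signed = byte - 256 if byte > 127 else byte  # Java bytes are signed
        result = (31 * result + signed) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return result - 0x100000000 if result & 0x80000000 else result


def sqlcipher_passphrase(certificate_der: bytes) -> str:
    """Decimal string form of the hash, the same shape as the key quoted above."""
    return str(java_arrays_hash_code(certificate_der))
```

Feeding in the DER-encoded signing certificate extracted from the APK would let a researcher check the assumption against the key observed at runtime. If the values differ, the sample derives its hash from some other representation of the signature.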
This database file contains the core payloads, namely the APK sample muzikmp3mustafasandal.dat.jar, and DEX bytecode ZnWjqpRHi.dex. The table named iFqBWMAzy in the database stores these as its rows, where:
An additional feature of this sample not featured across the rest of the cluster is the way it conceals its URL string. The parts of the URL are separated to make static analysis (exact URL string pattern matching) more likely to fail. Joining the parts of the URL recovers the complete URL.
For this sample, the original URL is hxxp[://]madhavaapps[.]science/dwarkadhish/alternate148275android[.]php as shown in Figure 10 below.
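The effect of splitting can be shown with a toy example. The fragments and the regular expression below are hypothetical and are not taken from the sample. They only demonstrate why a pattern applied to individual string constants misses a URL that exists in full only after concatenation at runtime.

```python
import re

# A simple URL pattern of the kind a static string scanner might apply.
URL_PATTERN = re.compile(r"https?://[\w.-]+/\S+")

# Hypothetical string constants as they might appear separately in the DEX.
fragments = ["ht", "tp:", "//c2.exam", "ple.invalid/pa", "th/gate.php"]

# Static view: each constant is inspected on its own.
static_hits = [part for part in fragments if URL_PATTERN.search(part)]

# Runtime view: the sample joins the parts before using the URL.
runtime_url = "".join(fragments)
runtime_hit = URL_PATTERN.search(runtime_url)
```

Here `static_hits` stays empty while `runtime_hit` matches the joined string, which is why capturing the argument of the network call at runtime is more reliable than scanning constants.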
Unlike the previous "Bloons TD 6" sample, another similar APK sample (SHA-256 hash 833d9669dd64a2aa009a3741c8f16612cfafc3104b1f2113ac69255b6fcabf8e) does not mimic a legitimate benign Android application. Instead, it appends itself at the end of the application listing using a blank white icon with an "empty," nonprintable Unicode (UTF-8) character U+FFF8 (\xef\xbf\xb8) label shown in Figure 11.
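Both label tricks seen in these case studies (five space characters for Cerberus, U+FFF8 here) can be flagged with one small check. This is a minimal sketch using Python's built-in Unicode classification. The function name is hypothetical, and a production rule would likely need to handle further cases, such as zero-width characters mixed into visible text.

```python
def is_blank_label(label: str) -> bool:
    """True if the label shows nothing visible: empty, whitespace or nonprintable only."""
    return all(ch.isspace() or not ch.isprintable() for ch in label)


hidden_ad_label = "\ufff8"      # label used by the second HiddenAd sample
cerberus_label = "\x20" * 5     # five spaces, as used by the Cerberus sample
game_label = "Bloons TD 6"      # visible label used by the first HiddenAd sample

utf8_bytes = hidden_ad_label.encode("utf-8")
```

The first two labels are reported as blank and the third is not. `utf8_bytes` equals `b"\xef\xbf\xb8"`, matching the byte sequence quoted above.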
This sample uses SQLCipher 4(.3.0) default settings instead of SQLCipher 3(.5.9), which the previous sample we discussed used. The same method derives the cryptographic key, but because a different certificate is attached to the sample, the key changes to the String -1463079363.
The database file is named com.db, and it contains the core payloads, namely APK sample com.dat.jar, and DEX bytecode viVfyboRT.dex. They are stored as rows in a table named abaQwumOc in the database. This table consists of two columns:
We can obtain the SQLCipher database decryption key by placing a hook on the method getWritableDatabase in package net.sqlcipher.database, class SQLiteOpenHelper, to capture the String parameter (i.e., the decryption key).
The SQLCipher major version in use (3 or 4) determines the default values of its associated cryptographic parameters. These cryptographic parameters are:
To identify the version, we can place a hook on the constructor of class SQLiteDatabase in package net.sqlcipher.database. From within that hook, we can read the value of the static String field named SQLCIPHER_ANDROID_VERSION.
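Once the version string and passphrase are captured, the remaining settings follow from the library defaults. The sketch below uses the defaults published in the SQLCipher documentation for major versions 3 and 4. They are not values extracted from the samples, and the sketch assumes the samples left them unchanged. It also assumes the version field holds a dotted string such as "3.5.9". The function names are hypothetical.

```python
import hashlib

# Documented SQLCipher defaults per major version (assumed unchanged by the sample).
SQLCIPHER_DEFAULTS = {
    3: {"page_size": 1024, "kdf_iter": 64000, "kdf_hash": "sha1", "hmac_hash": "sha1"},
    4: {"page_size": 4096, "kdf_iter": 256000, "kdf_hash": "sha512", "hmac_hash": "sha512"},
}


def defaults_for(version_string: str) -> dict:
    """Map a version string like '3.5.9' to the default parameters of its major version."""
    major = int(version_string.split(".")[0])
    return SQLCIPHER_DEFAULTS[major]


def derive_database_key(passphrase: str, db_header: bytes, version_string: str) -> bytes:
    """Derive the 256-bit AES key with PBKDF2; the salt is the file's first 16 bytes."""
    params = defaults_for(version_string)
    salt = db_header[:16]
    return hashlib.pbkdf2_hmac(
        params["kdf_hash"], passphrase.encode("utf-8"), salt, params["kdf_iter"], dklen=32
    )


# Placeholder salt; a real run would read the first 16 bytes of the database file.
placeholder_header = bytes(16)
key_v3 = derive_database_key("-923130181", placeholder_header, "3.5.9")
key_v4 = derive_database_key("-923130181", placeholder_header, "4.3.0")
```

The same passphrase yields different keys under the two versions, which is why opening a version 3 database with version 4 defaults fails unless the parameters are set explicitly.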
Deobfuscating secondary APK/DEX payloads provides the researcher with a greater depth of knowledge into the inner workings of these samples. It is common to find threat actors authoring samples in layers so that defenders cannot easily uncover their true, malicious intent. This allows them a better chance at evading security defenses, especially at the perimeter.
Adopting a strategy of layering grants attackers versatility, since they can swap in or out the modular components when the security defense measure detects them. The attacker is also able to mix-and-match components, producing a wide variety of combinations. This makes specifying and maintaining detection rules a chore for defenders.
With the fast advancement of the Android platform, threat actors are targeting a wider range of Android system versions, especially the more recent ones. To provide consistent protection for our customers, Advanced WildFire (AWF) has been using a hooking framework with related techniques to extend the supported range of Android API versions in the dynamic analysis sandbox.
We have shown that by relying on a hooking framework, a sandbox is able to expose malicious behaviors that are difficult to uncover using a static analysis approach. Identifying these malicious behaviors is critical in contributing to excellent detection quality.
Palo Alto Networks has shared our findings, including file samples and indicators of compromise, with our fellow Cyber Threat Alliance (CTA) members. CTA members use this intelligence to rapidly deploy protections to their customers and to systematically disrupt malicious cyber actors. Learn more about the Cyber Threat Alliance.
Category | Values |
APK samples | Cerberus, HiddenAd |
URLs | |
Certificate Thumbprints (SHA-1) | |
ID | Technique | Description |
T1628.001 | Hide Artifacts: Suppress Application Icon | APK samples suppress their icon from displaying to the user in the application launcher. This hides the fact that the sample has been installed, and can make it more difficult for the user to uninstall the application. |
T1406 | Obfuscated Files or Information | APK samples make a payload difficult to discover or analyze by encrypting or otherwise obfuscating (splitting and concatenating) its contents on the device or in transit. This is a common behavior that attackers can use across different platforms and the network to evade defenses. Payloads may be encrypted or obfuscated to avoid detection. |