Author: Margit Hazenbroek
tl;dr
An approach to detecting suspicious TLS certificates using an incremental anomaly detection model is discussed. This model utilizes the Half-Space-Trees algorithm and provides our Security Operations Center (SOC) teams with the ability to detect suspicious behavior in real time, even when network traffic is encrypted.
The prevalence of encrypted traffic
As a company that provides Managed Network Detection & Response services, we have observed an increase in the use of encrypted traffic. This trend is broadly welcome: the use of encrypted network protocols yields improved mitigation against eavesdropping. However, in an attempt to bypass security detection that relies on deep packet inspection, it is now a standard tactic for malicious actors to abuse the privacy that encryption enables. For example, when conducting malicious activity such as command and control of an infected device, connections to the attacker-controlled external domain now commonly occur over HTTPS.
The application of a range of data science techniques is now integral to identifying malicious activity that is conducted using encrypted network protocols. This blog post expands on one such technique: how anomalous characteristics of TLS certificates can be identified using the Half-Space-Trees algorithm. In combination with other modelling, such as the identification of an unusual JA3 hash [i], beaconing patterns [ii] or randomly generated domains [iii], effective detection logic can be created. The research described here has subsequently been further developed, added to our commercial offering, and is actively facilitating real-time detection of malicious activity.
Malicious actors abuse the trust of TLS certificates
TLS certificates are a type of digital certificate, issued by a Certificate Authority (CA) certifying that it has verified the owner of the domain name which is the subject of the certificate. TLS certificates usually contain information such as the domain name, the subject and issuer names, the validity period and the owner's public key [iv].
If malicious actors want their traffic to appear legitimate by using TLS, they have to obtain a TLS certificate (MITRE, T1608.003) [v]. Malicious actors can obtain certificates in different ways, most commonly by generating a self-signed certificate (as in the Ryuk example below) or by obtaining a certificate from a Certificate Authority for a domain under their control.
The following example shows the subject name and issuer name of a TLS certificate used in a recent Ryuk ransomware campaign.
Subject Name:
C=US, ST=TX, L=Texas, O=lol, OU=, CN=idrivedownload[.]com
Issuer Name:
C=US, ST=TX, L=Texas, O=lol, OU=, CN=idrivedownload[.]com
Example 1. Subject and issuer fields in a TLS certificate used in Ryuk ransomware
The meaning of the attributes that can be found in the issuer name and subject name fields of TLS certificates is defined in RFC 5280 [viii] and explained in the table below.
Attribute | Meaning
C | Country of the entity
ST | State or province
L | Locality
O | Organization name
OU | Organizational Unit
CN | Common Name
Note the following characteristics that can be observed in this malicious certificate:
- the subject name and the issuer name are identical, meaning the certificate is self-signed;
- the organization name "lol" is not a plausible company name;
- the OU attribute is left empty;
- the locality "Texas" is a state rather than a city, and merely repeats the ST attribute.
Compare these characteristics to the legitimate certificate used by the fox-it.com domain.
Subject Name:
C=GB, L=Manchester, O=NCC Group PLC, CN=www.nccgroup.com
Issuer Name:
C=US, O=Entrust, Inc., OU=See www.entrust.net/legal-terms, OU=(c) 2012 Entrust, Inc. - for authorized use only, CN=Entrust Certification Authority - L1K
Example 2. Subject and issuer fields in a TLS certificate used by fox-it.com
Observe the attributes in the subject and issuer names: the Subject Name contains information about the owner of the certificate, while the Issuer Name contains information about the CA that issued it.
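As a minimal sketch (not the tooling used in our pipeline), these attributes can be read from a certificate with Python's cryptography package; the input file name is hypothetical.

```python
# Minimal sketch: reading subject and issuer attributes from a PEM-encoded
# certificate with Python's `cryptography` package (file name is hypothetical).
from cryptography import x509

with open("certificate.pem", "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())

# RFC 4514 strings resemble the examples above,
# e.g. "CN=idrivedownload[.]com,OU=,O=lol,L=Texas,ST=TX,C=US"
print("Subject:", cert.subject.rfc4514_string())
print("Issuer: ", cert.issuer.rfc4514_string())

# A certificate whose subject equals its issuer is self-signed.
print("Self-signed:", cert.subject == cert.issuer)
```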
Using machine learning to identify anomalous certificates
When comparing the legitimate and the malicious certificate, the certificate used in the Ryuk ransomware campaign simply "looks weird". If humans can identify that the malicious certificate is peculiar, could machines also learn to classify such a certificate as anomalous? To explore this question a dataset of "known good" and "known bad" TLS certificates was curated. Using white-box algorithms, such as Random Forest, several features were identified that helped classify malicious certificates. For example, the number of empty attributes had a statistical relationship with how likely the certificate was to be used for malicious activities.

However, it was soon recognized that such an approach was problematic: there was a risk of "over-fitting" the algorithm to the training data, a situation whereby the algorithm would perform well on the training dataset but poorly when applied to real-life data. Especially in a stream of data that evolves over time, such as network data, it is challenging to maintain high detection precision. To be effective, this model needed the ability to become aware of new patterns that may present themselves outside of the small sample of training data provided; an unsupervised machine learning model which could detect anomalies in real time was required.
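To make this kind of feature concrete, the sketch below derives a few certificate features mentioned or implied above (the empty attribute count and whether the certificate is self-signed); the helper and feature set are illustrative, not the exact features used in our model.

```python
# Illustrative feature extraction; the helper and feature set are hypothetical
# and only reflect the characteristics discussed above.
def certificate_features(subject: dict, issuer: dict) -> dict:
    attributes = ["C", "ST", "L", "O", "OU", "CN"]
    empty = sum(1 for a in attributes if not subject.get(a))
    return {
        "empty_subject_attributes": empty,       # e.g. the empty OU above
        "self_signed": int(subject == issuer),   # subject equals issuer
        "common_name_length": len(subject.get("CN", "")),
    }

ryuk = {"C": "US", "ST": "TX", "L": "Texas", "O": "lol", "OU": "", "CN": "idrivedownload[.]com"}
print(certificate_features(ryuk, ryuk))
# {'empty_subject_attributes': 1, 'self_signed': 1, 'common_name_length': 20}
```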
An isolation-based anomaly detection approach
The Isolation Forest was the first isolation-based anomaly detection model, created by Liu et al. in 2008 [xi]. The paper presented an intuitive but powerful idea: anomalous data points are rare, and a consequence of that rarity is that an anomalous data point is easier to isolate from the rest of the data.
From this insight, the proposed algorithm computes how easy it is to isolate an observation. It achieves this by building a tree that randomly splits the data (visualization 1 includes an example of a tree structure). The more anomalous an observation is, the faster it gets isolated in the tree and the fewer splits are needed. Note that the Isolation Forest is an ensemble method, meaning it builds multiple trees (a forest) and calculates the average number of splits the trees need to isolate an observation [xi].
An advantage of this approach is that, in contrast to density- and distance-based approaches, less computational cost is required to identify anomalies [xi, xii] whilst maintaining comparable levels of performance [viii, ix].
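As a quick illustration of isolation-based scoring on a static batch of data (not the streaming setting discussed next), scikit-learn's IsolationForest can be used; the two-dimensional data here is synthetic.

```python
# Synthetic batch example of isolation-based anomaly scoring with scikit-learn.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))  # dense cluster
outlier = np.array([[6.0, 6.0]])                        # easy to isolate

forest = IsolationForest(n_estimators=100, random_state=0).fit(normal)

# score_samples: lower values mean the point was easier to isolate (more anomalous).
print(forest.score_samples(outlier))      # markedly lower
print(forest.score_samples(normal[:3]))   # closer to the typical score
```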
Half-Space-Trees: Isolation-based approach for streaming data
In 2011, building on their earlier work, Tan, Ting and Liu created an isolation-based algorithm called Half-Space-Trees (HST) that utilizes incremental learning techniques. HST enables an isolation-based anomaly detection approach to be applied to a stream of continuous data [xiii]. The animation below demonstrates how a simple half-space-tree isolates anomalies in the window space with a tree-based structure:
Visualization 1: An example of 2-dimensional data in a window divided by two simple half-space-trees; the visualization is inspired by the original paper.
The HST is also an ensemble method, meaning it builds multiple half-space-trees. A single half-space-tree bisects the window (space) into half-spaces based on the features in the data. Each half-space-tree does this randomly and continues until the configured height of the tree is reached. The half-space-tree then counts the number of data points per subspace and assigns a mass score to that subspace (represented by the colors).
The subspaces into which most data points fall are considered high-mass subspaces, and the subspaces with few or no data points are considered low-mass subspaces. Most data points are expected to fall in high-mass subspaces because they need many more splits (i.e., a deeper tree) to be isolated. The sum of the mass over all half-space-trees becomes the final anomaly score of the HST [xiii]. Calculating mass is a different approach from counting the number of splits (as done in the Isolation Forest); even so, calculated recursively, the mass profile remains a simple and fast way of scoring data points in streaming data [xiii].
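For reference, the mass-based scoring rule in the original paper [xiii] can be written as below (notation adapted): a point x terminates in a node of tree T that recorded mass r at depth k over the reference window, and the per-tree contributions are summed, so sparse (low-mass) regions yield low scores. In the 0-to-1 anomaly score described below, this quantity is rescaled so that values near 1 indicate anomalies.

```latex
% Mass-based HST score from Tan et al. [xiii] (notation adapted).
% Node*_T(x): node of tree T in which x terminates; .r its recorded mass,
% .k its depth. The 2^k factor makes masses at different depths comparable.
\[
  \mathrm{Score}(x) \;=\; \sum_{T \in \text{HS-Trees}} \mathit{Node}^{*}_{T}(x).r \cdot 2^{\,\mathit{Node}^{*}_{T}(x).k}
\]
```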
Moreover, the HST works with two consecutive windows: the reference window and the latest window. The HST learns the mass profile in the reference window and uses it as a reference for new incoming data in the latest window. Without going too deep into the workings of the windows, it is worth mentioning that the reference window is updated every time the latest window is full: the mass profile learned in the latest window overwrites the reference window, and the latest window is then cleared so new data can come in. By updating its windows in this way, the HST is robust to evolving streaming data [xiii].
The anomaly score output by the HST falls between 0 and 1. The closer the anomaly score is to 1, the easier it was to isolate the certificate and the more likely it is that the certificate is anomalous. Testing the HST on our initially collated data, we were satisfied that this was a robust approach to the problem, with the Ryuk ransomware certificate repeatedly identified with an anomaly score of 0.84.
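Purely as an illustration of how such a streaming score can be computed, the sketch below uses the open-source river library (also used in our earlier JA3 work [i]); the feature names, values and parameters are hypothetical, and river's HalfSpaceTrees expects features scaled to the range 0 to 1.

```python
# Illustrative sketch, not our production model: streaming anomaly scores for
# hypothetical certificate features with river's Half-Space-Trees.
from river import anomaly

hst = anomaly.HalfSpaceTrees(n_trees=25, height=10, window_size=250, seed=42)

# Made-up stream: many "ordinary" certificates followed by one odd certificate.
stream = [{"empty_attributes": 0.0, "self_signed": 0.0, "cn_length": 0.3}] * 300
stream.append({"empty_attributes": 0.8, "self_signed": 1.0, "cn_length": 0.9})

for x in stream:
    score = hst.score_one(x)  # score first: closer to 1 means more anomalous
    hst.learn_one(x)          # then update the trees with the new observation

print(f"anomaly score of the last certificate: {score:.2f}")
```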
The importance of feedback loops – going from research to production
As a provider of managed cyber security services, we are fortunate to have a number of close customers who were willing to deploy the model in a controlled setting on live network traffic. In conjunction with quick feedback from human analysts on the anomaly scores being output, it was possible to optimize the model to ensure that it produced sensible scoring across a wide range of environments. Having achieved credibility, the model could be more widely deployed. In an example of the economic concept of "network effects", the more environments the model was deployed on, the more its performance improved and the better it adapted to the unique environment in which it operates.
Whilst high anomaly scores do not necessarily indicate malicious behavior, they are a measure of weirdness or novelty. By combining the anomaly score obtained from the HST with other metrics or rules derived in real time, it has become possible to classify malicious activity with greater certainty, as sketched below.
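Purely as an invented example of that idea (the signal names and the threshold are hypothetical, not our production detection logic):

```python
# Invented combination of signals; names and threshold are hypothetical.
def is_suspicious(certificate_score: float, rare_ja3: bool, beaconing: bool) -> bool:
    """Flag traffic when an anomalous certificate coincides with another signal."""
    return certificate_score > 0.8 and (rare_ja3 or beaconing)

print(is_suspicious(0.84, rare_ja3=True, beaconing=False))  # True
```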
Machines can learn to detect suspicious TLS certificates
An unsupervised, incremental anomaly detection model is now applied in our security operations centers and is part of our commercial offering. We would like to encourage other cyber security defenders to look at the characteristics of TLS certificates to detect malicious activities even when traffic is encrypted. Encryption does not equal invisibility, and there is often (meta)data to consider; it does, however, require different approaches to searching for malicious activity. As a data science team, we found Half-Space-Trees to be an effective and fast anomaly detector for streaming network data.
References
[i] NCC Group & Fox-IT. (2021). “Incremental Machine Learning by Example: Detecting Suspicious Activity with Zeek Data Streams, River, and JA3 Hashes.”
https://research.nccgroup.com/2021/06/14/incremental-machine-leaning-by-example-detecting-suspicious-activity-with-zeek-data-streams-river-and-ja3-hashes/
[ii] Van Luijk, R. (2020) “Hunting for beacons.” Fox-IT.
[iii] Van Luijk, R., Postma, A. (2019). "Using Anomaly Detection to Find Malicious Domains." Fox-IT.
https://blog.fox-it.com/2019/06/11/using-anomaly-detection-to-find-malicious-domains/
[iv] https://protonmail.com/blog/tls-ssl-certificate/
[v] https://attack.mitre.org/techniques/T1608/003/
[vi] Mokbel, M. (2021). “The State of SSL/TLS Certificate Usage in Malware C&C Communications.” Trend Micro.
https://www.trendmicro.com/en_us/research/21/i/analyzing-ssl-tls-certificates-used-by-malware.html
[vii] https://sslbl.abuse.ch/statistics/
[viii] Cooper, D., Santesson, S., Farrell, S., Boeyen, S., Housley, R., and W. Polk. (2008). “Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile”, RFC 5280, DOI 10.17487/RFC5280.
https://datatracker.ietf.org/doc/html/rfc5280
[ix] https://attack.mitre.org/software/S0446/
[x] Goody, K., Kennelly, J., Shilko, J., Elovitz, S., Bienstock, D. (2020). "Kegtap and SingleMalt with a Ransomware Chaser." FireEye.
https://www.fireeye.com/blog/jp-threat-research/2020/10/kegtap-and-singlemalt-with-a-ransomware-chaser.html
[xi] Liu, F. T., Ting, K. M. & Zhou, Z. (2008). "Isolation Forest." Eighth IEEE International Conference on Data Mining, pp. 413-422. doi: 10.1109/ICDM.2008.17.
https://ieeexplore.ieee.org/document/4781136
[xii] Togbe, M. U., Chabchoub, Y., Boly, A., Barry, M., Chiky, R., & Bahri, M. (2021). "Anomalies Detection Using Isolation in Concept-Drifting Data Streams." Computers, 10(1), 13.
https://www.mdpi.com/2073-431X/10/1/13
[xiii] Tan, S. C., Ting, K. M. & Liu, F. T. (2011). "Fast Anomaly Detection for Streaming Data." Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI), pp. 1511-1516. doi: 10.5591/978-1-57735-516-8/IJCAI11-254.
https://www.ijcai.org/Proceedings/11/Papers/254.pdf