In an earlier diary [1], I reviewed how using tools like DBSCAN [2] can be useful to group similar data. I used DBSCAN to try and group similar commands submitted to Cowrie [3] and URL paths submitted to the DShield web honeypot [4]. DBSCAN was very helpful to group similar commands, but it was also very useful when trying to determine whether commands from one honeypot were seen in another. How much overlap in attack data is there between honeypots? Is there any targeting based on the hosting location of the honeypot?
Once the data is separated into clusters and the appropriate EPS and Minsample values are selected, comparing the data in a table can help hightlight differences..
# query to pull out cluster with minsample=3, EPS=0.5
select sum(AWS), sum(Azure), sum("Digital Ocean"), sum(GCP), sum(Residential), sum(AWS + Azure + "Digital Ocean" + GCP + Residential) as total,
count(input) , "cluster-EPS(0.5)-MINS(3)" from commands group by "cluster-EPS(0.5)-MINS(3)";
Figure 1: Cluster showing number of similar commands run on honeypots, highlighting gaps in other honeypot reports.
Looking at cluster 9 can give us more details on what may have been different.
# select command data from cluster 9
select input, sum(AWS), sum(Azure), sum("Digital Ocean"), sum(GCP), sum(Residential), sum(AWS + Azure + "Digital Ocean" + GCP + Residential)
as total from commands where "cluster-EPS(0.5)-MINS(3)"=9;
# input value seen
apt update && apt install sudo curl -y && sudo useradd -m -p $(openssl passwd -1 233QPpqY) system && sudo usermod -aG sudo system
Going back to the source data, more information can be seen about the particular commands.
Figure 2: Commands that only showed up from the Azure honeypot.
FIgure 3: Differences between the commands is the password used
The original goal was to group similar commands, which worked well in this instance. The commands were all the same, except for the password used. In addition, the comparison of the cluster data helped demonstrate which information from one honeypot may not be in another.
Within the command clusters, there were other differences noted in my dataset:
Command | Honeypot |
---|---|
apt update && apt install sudo curl -y && sudo useradd -m -p $(openssl passwd -1 233QPpqY) system && sudo usermod -aG sudo system |
Azure |
echo "Dolphinscheduler@2022\nkeSBVXNe9Y9k\nkeSBVXNe9Y9k\n"|passwd |
Digital Ocean |
lscpu && echo -e "CRN63r9D\nCRN63r9D" | passwd && curl https://ipinfo.io/org --insecure -s && free -h && apt |
Azure |
openssl passwd -1 233QPpqY |
Azure |
Figure 4: Commands or types of commands seen that were unique to a honeypot.
Some of the commands above are more unqiue due to the general commands used. In other examples, such as the second example from Digital Ocean, the password used is more of an outlier than others. There were many instances of "Dolphinscheduler@2022" that were not seen within other honeypots, although the general command was seen within other honeypots with different passwords used. The length of this password helped highlight this particular command as being unique, but it did generate its own cluster with different passwords that were similarly formatted.
Figure 5: Examples of "Dolphinscheduler@2022" seen within Digital Ocean honeypot commands.
These honeypot comparisons were part of a research project for a Master's in Cyber Security Degree with SANS.edu [5]. The custom package created to extract and compare local honeypot data is available on GitHub [6]. This includes how the cluster features were created and used [7].
[1] https://isc.sans.edu/diary/Finding+Honeypot+Data+Clusters+Using+DBSCAN+Part+1/31050
[2] https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
[3] https://github.com/cowrie/cowrie
[4] https://isc.sans.edu/honeypot.html
[5] https://www.sans.edu/cyber-security-programs/masters-degree/
[6] https://github.com/jslagrew/dshield-parser/tree/main/examples
[7] https://github.com/jslagrew/dshield-parser/blob/main/examples/url-command-clustering.py
--
Jesse La Grew
Handler