[This is a Guest Diary by Justin Leibach, an ISC intern as a part of the SANS.edu BACS [1] degree program]
The web logs from my DShield [2] honeypot always produce the most interesting information. I enjoy being “on keyboard” and using command line tools to view, sort, filter, and manipulate the .json files; it is weirdly satisfying. Pounding away on my mechanical keyboard is effective for a single log file, but when I zoomed out and began looking at multiple days, weeks, even months of information, it became clear there must be a better way.
If you are busy and need a TL;DR, here it is: My GitHub [3]
Log files combined: 31
Lines of JSON parsed: 163,510,310
Countries of origin: 76
Script running time: 7 minutes
Steps from start to finish: 4
My outputs from running the scripts:
Figure 3: Screenshot of the script running and the printed output of a Python dictionary holding all countries and their occurrence counts.
Before I started solving the problem, I needed to define it:
I want to know what countries are probing my honeypot the most, over time, to develop a better threat intelligence picture.
Here are the major things I thought I needed to do:
| Step | Description | Initial Tool Attempt | Final Tool Used |
|---|---|---|---|
| 1 | Combine JSON Files | Python | BASH |
| 2 | Filter JSON Files | Python | BASH |
| 3 | Search a database of WHOIS, return country name | Python | Python |
| 4 | Produce graphics for quick analysis | Python | Python |
Ideally, it would also be a tool that is highly portable and easily modified by other users, even a very new analyst.
To solve this problem, I knew I wanted to use Python, but sometimes Bash proves the more elegant solution.
Python:
- My first attempt to combine the log files in Python ended with killed (program exited with code: 137), a memory error.
- The geoip2.database module for country lookups.
- The matplotlib and pillow modules for producing graphics.
- The pandas and io modules for data handling and export.

Bourne Again Shell (BASH):
BASH scripting is highly portable and surprisingly powerful. It is usable on any *NIX machine, macOS, or Windows with either WSL or a terminal emulator like Git Bash.
- jq
- “for” loops

Moral of the story here: Use the right tool for the job.
The script I used to combine the files can be found at GitHub [4].
It takes two user inputs.
The script creates an array with all the file paths, then iterates through that array using the jq utility to append to a single file. The real work in the code is happening here:
#!/bin/bash
# Debugging output to check the contents of file_list
echo "Files in file_list:"
for f in "${file_list[@]}"; do
echo "$f"
done
echo "This may take a while"
# iterate through each file in the file list array and combine it into a single json file using jq.
for file in "${file_list[@]}"; do
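# jq -s (slurp) reads the whole file and wraps its JSON objects in a single array before appending to the temp file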
jq -s '.' "$file" >> "$temp"
done
# command to remove all the arrays in the JSON file. Necessary to properly filter with jq
cat "$temp" | jq '.[]' > "$filename"
Through much trial and error, I found that the best way for Python to look through the information was to give it a .csv file. So, I created two BASH scripts to do just that.
The first script [5] filters everything and outputs a .csv file; the only problem with this was the sheer number of Amazon Web Services (AWS) interactions with my honeypot, which is hosted on AWS Lightsail. The second script [6] is an improvement and is more versatile for use in any environment. You can run either.
The second script is a bit more complex. It takes an input of IP addresses and assigns them to a variable. Then comes the hard part: using sed to search and edit, jq to filter, and tr to transform that output before finally writing to a user-designated filename in the current directory. The bulk of the work in the code is here:
# from user input, create a variable called $jsonfilepath
echo -n "Enter absolute path to JSON file to create new csv for source ip: "
read jsonfilepath
# from user input, create a variable called $csvfilename to be used later when writing the final .csv file
echo -n "What do you want your new .csv file to be called (please add .csv)?: "
read csvfilename
# from user input, create a variable to exclude ip addresses from the CSV.
# In my case I will exclude any IP address associated with Amazon services, as I am using an AWS honeypot as my source
# Example jq for figuring out top offenders that you would likely exclude: cat Merged.json | jq '.sip' | sort | uniq -c | sort
echo -n "Enter all IP addresses you wish to exclude here, separated by a space: "
echo -e "\nExample: 192.168.1.1 192.168.1.2 192.168.1.3"
read IP_Addresses
# Convert the IP addresses into a jq filter
jq_filter=$(printf 'select(.sip and (%s)) | [.sip] | @csv' "$(echo $IP_Addresses | sed 's/ /" and .sip != "/g' | sed 's/^/.sip != "/' | sed 's/$/"/')")
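# For example, excluding "192.0.2.10 192.0.2.20" (placeholder addresses) produces the filter:
#   select(.sip and (.sip != "192.0.2.10" and .sip != "192.0.2.20")) | [.sip] | @csv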
# Apply the jq filter to the JSON data and format the output
echo "This will take a little while"
cat "$jsonfilepath" | jq -r "$jq_filter" | tr ',' '\n' > "$csvfilename"
echo "$csvfilename created in current directory"
My original thought was to use an API for country lookups. This could provide valuable information for use in reporting, and there are numerous services that offer this to varying degrees.
One of those is https://ipgeolocation.io/ [7]. This seemed like the right choice: they offer a free tier with up to 30k requests per month. But then I went back to my endgame…it wasn’t portable enough.
So instead I went with something I learned about in my SANS SEC573 class [8], authored by Mark Baggett: the geoip2 Python module supported by MaxMind, and the local GeoIP database. There are a few options for configuration, and it does require a free account. Follow these instructions (and my script) for easy setup: https://dev.maxmind.com/geoip/updating-databases [9].
Importantly, it uses some “C” extensions that, once installed, greatly increase the performance of database lookups. This is a suitable time to introduce the setup script that can help with initial setup. It can be found on GitHub [10].
This is the BASH script (with liberal comments):
#! /bin/bash
# Script to set up system with required dependencies
sudo add-apt-repository ppa:maxmind/ppa
sudo apt update
# this installs a "C" extension to increase performance of database lookups
sudo apt install libmaxminddb0 libmaxminddb-dev mmdb-bin
# Consider installing all of this in a venv (virtual environment) if running certain distros of Linux.
sudo apt install python3-pip
sudo apt update
python3 -m pip install numpy geoip2 pandas openpyxl pillow
#### follow instructions here: https://dev.maxmind.com/geoip/geolocate-an-ip/databases/#1-install-the-geoip2-client-library ####
sudo apt install geoipupdate
###create free account @ https://www.maxmind.com/en/geolite2/signup ###
: 'follow instructions in email
- follow instructions here: https://dev.maxmind.com/geoip/updating-databases
- can set up cronjob
- to run had to designate the config file when running:'
#### sudo geoipupdate -f /usr/local/etc/GeoIP.conf -v ####
The script itself can be found on GitHub [11].
This Python script takes input from the user for the location of the .csv file and uses functions to:
- open the file,
- create a one-time reader for the GeoIP database,
- strip the .csv entries,
- look up countries in the database,
- create a list of countries,
- count the countries in the list,
- remove duplicate countries while adding to a list,
- create a dictionary of “Country:Count”,
- take input for the “N” number of most common countries,
- create a pie chart of the top “N”,
- export the pie chart as a .png,
- create a dataframe from the dictionary data, and
- write the dataframe to Excel.
This script requires these libraries, installed in the setup script via pip: numpy, geoip2, pandas, openpyxl, and pillow.
The script itself has liberal comments with the hope that any user can edit for bespoke use.
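For a quick sense of the core lookup logic before opening the full script, the sketch below shows the same general idea under a few assumptions: it reads source IPs from the filter script’s .csv output, resolves each one against a local GeoLite2-Country database with geoip2.database.Reader, and tallies the results with collections.Counter. The file paths and the count_countries() helper are hypothetical placeholders, not the author’s exact code.

```python
# Minimal, hypothetical sketch (not the author's exact script): count countries for a CSV of source IPs.
# Assumes the GeoLite2-Country database has already been downloaded with geoipupdate; adjust paths as needed.
from collections import Counter

import geoip2.database
import geoip2.errors

CSV_PATH = "filtered_ips.csv"                       # placeholder: output of the filter script
DB_PATH = "/usr/share/GeoIP/GeoLite2-Country.mmdb"  # placeholder: a common geoipupdate location

def count_countries(csv_path: str, db_path: str) -> Counter:
    counts = Counter()
    # One reader is created and reused for every lookup.
    with geoip2.database.Reader(db_path) as reader, open(csv_path) as csv_file:
        for line in csv_file:
            ip = line.strip().strip('"')  # jq's @csv output wraps each IP in quotes
            if not ip:
                continue
            try:
                counts[reader.country(ip).country.name] += 1
            except geoip2.errors.AddressNotFoundError:
                counts["Unknown"] += 1
    return counts

if __name__ == "__main__":
    country_counts = count_countries(CSV_PATH, DB_PATH)
    print(dict(country_counts))
```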
To change the number of countries used for percentage calculation and subsequent labeling in the pie-chart, simply edit the script at the “N” number found at line 64:
# return top N value in the dict
def top_countries(data_dict):
n = 6
counter = Counter(data_dict)
top_n_countries = dict(counter.most_common(n))
return top_n_countries
Figure 6: N changed to 3. Figure 7: N changed to 10.
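To make the charting and export steps from the description above concrete, here is a small hypothetical sketch of how a top-N dictionary like the one returned by top_countries() could be turned into a pie chart .png and an Excel workbook with matplotlib and pandas. The sample data and filenames are placeholders, not values from the author’s script.

```python
# Hypothetical sketch: chart and export a {"Country": count} dictionary.
# Filenames and sample data are placeholders; the author's CountryLookup.py differs in detail.
# Note: matplotlib may need to be installed separately (pip install matplotlib).
import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import pandas as pd

top_n = {"United States": 41000, "China": 22000, "Netherlands": 9000}  # example data only

# Pie chart of the top-N countries, saved as a .png.
plt.figure(figsize=(6, 6))
plt.pie(list(top_n.values()), labels=list(top_n.keys()), autopct="%1.1f%%")
plt.title("Top countries probing the honeypot")
plt.savefig("top_countries.png")

# DataFrame of the same data, written to Excel (openpyxl is installed by the setup script).
df = pd.DataFrame(list(top_n.items()), columns=["Country", "Count"])
df.to_excel("countries.xlsx", index=False)
```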
This is the linear path from start to finish (also outlined in the GitHub README file [12]):
*Times based on my system files and will vary based on file sizes, internet connection and system configuration
The tools described above met their original goal, but there is always room for improvements. Finding country of origin is great, but what if I want to know more?
As is, this tool can easily be paired with a single honeypot, or multiple honeypots, set up in an enterprise environment to gather data and inform decision makers in your organization.
In a small, under-resourced, restrictive or immature organization, this free and open-source tool could be a quick win for a blue-teamer to improve the threat intelligence picture and bolster enterprise security through improved Firewall rules and IP address range blacklisting.
I hope this is helpful, and happy defending!
References and links:
[1] https://www.sans.edu/cyber-security-programs/bachelors-degree/
[2] https://github.com/DShield-ISC
[3] https://github.com/justin-leibach/JSON-Log-Country
[4] https://github.com/justin-leibach/JSON-Log-Country/blob/main/combine-JSON.sh
[5] https://github.com/justin-leibach/JSON-Log-Country/blob/main/filter.sh
[6] https://github.com/justin-leibach/JSON-Log-Country/blob/main/filter-with-exclude.sh
[7] https://ipgeolocation.io/
[8] https://www.sans.org/cyber-security-courses/automating-information-security-with-python/
[9] https://dev.maxmind.com/geoip/updating-databases
[10] https://github.com/justin-leibach/JSON-Log-Country/blob/main/setup.sh
[11] https://github.com/justin-leibach/JSON-Log-Country/blob/main/CountryLookup.py
[12] https://github.com/justin-leibach/JSON-Log-Country/blob/main/README.md
--
Jesse La Grew
Handler