Sometimes, you have to quickly investigate web server logs for potential malicious activity. If you're lucky, the logs are already indexed in real time in a log management solution and you can automatically launch some hunting queries. If that's not the case, you can download all the logs to a local system or a cloud instance and index them manually. But that's not always the easiest or fastest way, due to the amount of data to process.
These days, I always try to process data as close as possible to its source and only download the investigation results. This reduces bandwidth usage and the load on local resources (memory, CPU, ...).
I had to analyze a huge set of Apache logs (the current one plus all the archived ones, covering one year) and used the following solution: mal2csv[1] (Malformed Access Logs to CSV). As the name says, the main purpose of this tool is to convert an Apache access log into a CSV file (easier to process in some cases), but it has two interesting extra features:
- Log entries are matched against PHPIDS[2] rules.
- Interesting log entries are stored in separate files for further review.
On the web server, Docker was available. To perform my forensic analysis, I created a Docker image so as not to pollute the server with extra tools (and deleted it after the processing). The configuration is simple:
FROM ubuntu:latest
LABEL maintainer="Xavier Mertens <[email protected]>"
RUN apt update && \
    apt install -y git python3
WORKDIR /opt
RUN git clone https://github.com/RandomRhythm/mal2csv.git
WORKDIR /opt/mal2csv
ENTRYPOINT ["python3", "./mal2csv.py"]
Once the image is built (for example with "docker build -t mal2csv:1.0 ." in the directory that holds the Dockerfile), access log files can be analyzed like this (if they are stored in the default location for Apache):
# mkdir /var/tmp/results
# for F in /var/log/apache2/access.log*
do
  zcat -f $F >/var/tmp/results/$(basename $F).txt
  docker run -it --rm -v /var/tmp/results:/data mal2csv:1.0 -i /data/$(basename $F).txt -o /data/$(basename $F).txt -d -l -p -r -f
done
This loop processes all access.log files one by one: each file is decompressed into /var/tmp/results (if needed) and then handed to mal2csv. For every log, three files are created. Example:
-rw-r--r-- 1 root root 20488876 Mar 28 15:33 access.log.txtLogOutput.Formatted
-rw-r--r-- 1 root root   880986 Mar 28 15:33 access.log.txtLogOutput.Formatted.IDS
-rw-r--r-- 1 root root  1418806 Mar 28 15:33 access.log.txtLogOutput.Formatted.interesting
The "LogOutput.Formatted" file contains all events converted to CSV. The two others are more interesting:
The "Formatted.IDS" file will contain a listing of events that match PHPIDS rules:
"24","Detects basic obfuscated JavaScript script injections","GET /config/.env HTTP/1.1" "35","Detects common comment types","GET /phpMyAdmin+++---/index.php HTTP/1.1" "20","Detects JavaScript language constructs","GET /index.php?s=/Index/\\think\\app/invokefunction&function=call_user_func_array&vars[0]=md5&vars[1][]=HelloThinkPHP21 HTTP/1.1" "8","Detects self-executing JavaScript functions","GET /?a=fetch&content=<php>die(@md5(HelloThinkCMF))</php> HTTP/1.1"
The "Formatted.interesting" file contains the original events that match a PHPIDS rule. Now you know where to focus your investigation.
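A quick way to prioritize is to check which paths attackers request most often. The sketch below is hedged accordingly: I only assume that each line contains a request of the form "METHOD path HTTP/x.y" somewhere, so it does not depend on the exact layout of the file. The function name top_paths and the sample lines (with documentation IP addresses) are hypothetical.

```python
import re
from collections import Counter

# Matches a request line such as: GET /config/.env HTTP/1.1
REQUEST_RE = re.compile(r"\b([A-Z]+) (\S+) HTTP/\d(?:\.\d)?")


def top_paths(lines, limit=10):
    """Return the most requested paths, with the query string removed."""
    paths = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            # Keep only the part before "?" so variants of the same
            # probe are grouped together.
            paths[match.group(2).split("?", 1)[0]] += 1
    return paths.most_common(limit)


# Hypothetical sample lines, for illustration only. In practice, pass an
# open file handle on the ".interesting" file instead.
sample = [
    '192.0.2.10 - - [28/Mar/2021:15:30:01 +0200] "GET /config/.env HTTP/1.1" 404 196',
    '192.0.2.11 - - [28/Mar/2021:15:30:05 +0200] "GET /index.php?s=/Index/ HTTP/1.1" 200 512',
    '192.0.2.10 - - [28/Mar/2021:15:31:12 +0200] "GET /config/.env HTTP/1.1" 404 196',
]

for path, count in top_paths(sample):
    print(f"{count:6d}  {path}")
```

Grouping by path rather than by full request keeps the list short when scanners replay the same probe with different parameters.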
This is a pretty straightforward way to perform a quick first analysis of your logs! Note that mal2csv can also process Microsoft IIS logs (use the "-m" command line switch). The detection rules are located in two files, which makes them easy to maintain and to extend with your own rules!
[1] https://github.com/RandomRhythm/mal2csv
[2] https://github.com/PHPIDS/PHPIDS
Xavier Mertens (@xme)
Xameco
Senior ISC Handler - Freelance Cyber Security Consultant