I wanted to learn a bit more about data engineering, databases, app building, managing systems, and so on, so I decided to build a small honeypot network as a project. I was partially inspired by GreyNoise and AbuseIPDB, both of which I use a lot. I wanted to get this done in about a week, so it's a small project that isn't very scalable. I ended up learning things, so that's fine.
Goals:
- Use Suricata to see which signatures are triggered by incoming traffic from the internet
- Save all the Suricata logs to disk in a central place so I can go back and search or reingest the data
- Send logs to Humio for searching, dashboarding, and potentially alerting
- Have a webapp for searching for an IP
-- The webapp should show the signatures the IP has triggered, the first and last time the IP was seen, and the number of times it was seen triggering signatures
Tools and infrastructure:
- Sensors & databases are hosted on Vultr w/ Ubuntu
- Suricata - obviously, for detecting the type of attack attempt
- INetSim - not the best choice (anyone who manually looks at the scan results can tell I'm not running any real services, just INetSim), but it'll do for this project
- ZeroTier - all sensors are connected to a ZeroTier network; it just makes networking, moving data around, and management easier
- Vector (vector.dev) - I'm using it to move data around
- Humio - log storage and search, just like ELK or Splunk
- rinetd - I'm actually not running INetSim on all the sensors; I'm just forwarding all the traffic from the sensors to one host running INetSim (it's good enough for this project)
- Redis - pub/sub. I'm putting alerts into Redis and letting Python grab them and put the data in PostgreSQL
- PostgreSQL - to store the malicious IP, signature, and timestamp
- Appsmith - to build the web UI (usually I'd use Flask...)
The network kinda looks like this w/ ZeroTier:
Sensors are exposed to the internet; servers aren't. rinetd takes the traffic that sensors receive from the internet and forwards it to INetSim, which is bound to a ZeroTier IP address.
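To make the forwarding idea concrete, here is a minimal Python sketch of what a TCP port forwarder does. It is not rinetd and not what runs on the sensors; it is just a stand-in showing the same pattern (accept a connection, open one to the target, copy bytes both ways). The function names and the addresses in the usage note are made up for illustration.

```python
import asyncio


async def _pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # peer went away mid-transfer; nothing more to copy
    finally:
        writer.close()


async def start_forwarder(listen_host, listen_port, target_host, target_port):
    """Listen on one address and relay every connection to the target."""

    async def handle(client_reader, client_writer):
        try:
            target_reader, target_writer = await asyncio.open_connection(
                target_host, target_port
            )
        except OSError:
            client_writer.close()  # target unreachable: drop the client
            return
        # Relay both directions; each side is closed when the other hits EOF.
        await asyncio.gather(
            _pipe(client_reader, target_writer),
            _pipe(target_reader, client_writer),
        )

    return await asyncio.start_server(handle, listen_host, listen_port)
```

Hypothetical usage would be something like `await start_forwarder("0.0.0.0", 8080, "10.0.0.2", 80)` followed by `await server.serve_forever()`, where the target is the private address of the host running the fake services.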
The flow for logs kinda looks like this:
Vector on the ingest server does multiple things: it saves the data to disk and sends it to Humio. Alerts also get GeoIP info added and are then published to Redis; Python picks them up from Redis and puts them into Postgres.
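The Redis-to-Python step could look roughly like the sketch below. It assumes the messages are Suricata EVE JSON events (where `event_type`, `src_ip`, `timestamp`, and `alert.signature` are standard fields) and that they arrive on a pub/sub channel. The channel name `suricata-alerts`, the Redis host, and both function names are hypothetical, and the real script may well look different.

```python
import json
from typing import Optional, Tuple, Union


def alert_to_row(message: Union[str, bytes]) -> Optional[Tuple[str, str, str]]:
    """Turn one Suricata EVE JSON event into (src_ip, signature, timestamp).

    Returns None for anything that is not a well-formed alert event.
    """
    try:
        event = json.loads(message)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    if not isinstance(event, dict) or event.get("event_type") != "alert":
        return None
    alert = event.get("alert")
    if not isinstance(alert, dict):
        return None
    src_ip = event.get("src_ip")
    signature = alert.get("signature")
    timestamp = event.get("timestamp")
    if not (src_ip and signature and timestamp):
        return None
    return (src_ip, signature, timestamp)


def consume(channel: str = "suricata-alerts") -> None:
    """Subscribe to a Redis channel and print each parsed alert row.

    The channel name and connection details are assumptions; writing to
    the database is left out here.
    """
    import redis  # third-party package: pip install redis

    pubsub = redis.Redis(host="localhost", port=6379).pubsub()
    pubsub.subscribe(channel)
    for item in pubsub.listen():
        if item["type"] != "message":
            continue  # skip subscribe confirmations
        row = alert_to_row(item["data"])
        if row is not None:
            print(row)
```

Keeping the parsing in its own function means it can be tested without a running Redis, and non-alert events (flow, dns, and so on) are simply dropped if they ever end up on the channel.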
Postgres stores the malicious IP, the Suricata signature, and the timestamp.
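A table like that, plus the per-IP lookup described in the goals (signatures triggered, first seen, last seen, count), can be sketched as follows. To keep the example self-contained it uses Python's built-in `sqlite3` as a stand-in for Postgres; the table name, column names, and function names are all assumptions. With Postgres you would more likely use a `TIMESTAMPTZ` column and a driver such as psycopg, whose placeholders are `%s` rather than `?`.

```python
import sqlite3

# Hypothetical schema: one row per alert.
SCHEMA = """
CREATE TABLE IF NOT EXISTS alerts (
    src_ip    TEXT NOT NULL,
    signature TEXT NOT NULL,
    seen_at   TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS alerts_src_ip_idx ON alerts (src_ip);
"""

# Per-signature summary for one IP. MIN/MAX on text only works here
# because every timestamp is stored in the same ISO 8601 format.
SUMMARY_SQL = """
SELECT signature,
       MIN(seen_at) AS first_seen,
       MAX(seen_at) AS last_seen,
       COUNT(*)     AS times_seen
FROM alerts
WHERE src_ip = ?
GROUP BY signature
ORDER BY times_seen DESC, signature
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_alert(conn, src_ip: str, signature: str, seen_at: str) -> None:
    """Insert one alert row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO alerts (src_ip, signature, seen_at) VALUES (?, ?, ?)",
            (src_ip, signature, seen_at),
        )


def summarize_ip(conn, src_ip: str) -> list:
    """Return (signature, first_seen, last_seen, times_seen) rows for an IP."""
    return conn.execute(SUMMARY_SQL, (src_ip,)).fetchall()
```

Storing one row per alert and aggregating at query time keeps the ingest side trivial. For a busier sensor network, a pre-aggregated table keyed on IP and signature might be the better trade-off.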
I used Appsmith for the webapp. Appsmith lets you build a webapp and connect it to the integrations it supports with little to no coding.
For the webapp, I just have an input field and some queries that run based on the input. It looks like this:
What I would do differently if I had more time and resources:
- I'd probably set up more realistic honeypots or have honeypot profiles
- Put honeypot software on the sensor itself instead of forwarding with rinetd
- Ship logs over the internet (not ZeroTier)
- Do GeoIP enrichment on the sensor itself
- Store alert data in OpenSearch or some cloud-hosted database that I don't have to maintain?
- Add health monitoring for the sensors, pipeline, etc.
- Better deployment and updates (of software and Suricata signatures), potentially through Ansible?
There are probably many other things that can be done differently or more efficiently.
Resources/links: