How I Learned to Stop Worrying and Love the Bot
A look at running a honeypot, and the statistics and insights gained from it.
Hello world. Let's talk about honeypots, and the helpful insights they give into the state of botting on the internet.
But first, what is a honeypot?
Honeypots, explained
A honeypot (or honeytrap) is essentially a decoy computer or container that you run to trap and track attackers. It can be a real machine, or a fake one designed to emulate a real one. Often honeypots are made to look appealing by appearing as though they run exploitable software. The purpose of a honeypot is to gain an insight into what attackers would do, what they would target, and what they would exploit to gain access. This better equips you to understand the methods attackers use, and enables you to better defend against future attacks.
An example of a honeypot is the Citrix Honeypot. The Citrix Honeypot aims to emulate a Citrix Gateway that is vulnerable to CVE-2019-19781. This vulnerability would allow an unauthenticated user to perform remote code execution on a Citrix Gateway (among other Citrix products). This is incredibly appealing to attackers: with remote code execution, you can compromise firewalls, install cryptocoin miners, release malware and more. It's bad, and the Citrix vulnerability was even worse because an unauthenticated user could do it. That means that anyone could do it.
The Citrix honeypot is useful to people running Citrix systems, as they can safely[1] run the honeypot without risking their real infrastructure, and with the data gathered (stuff like attackers' IP addresses), they can strengthen the security of their real infrastructure by IP banning the attackers. This might not stop them in their tracks, but it would certainly slow them down.
The majority of attacks performed on the internet aren't done by hackers in some dimly lit basement somewhere. Most of them are done by automated bots, scanning the internet for exposed ports and running scripts to break in. These kinds of attacks are frequent, and annoying.
Honeypots, especially when used alongside monitoring and automated banning software, provide an excellent way to filter out 90% of these automated attacks, significantly reducing the attacks on your real systems.
Enter Tpot
Because I am a nerd and I love seeing bots attempt to break in, I set up a honeypot using the open source project Tpot. Tpot is special, in that it is a preconfigured collection of honeypots listening on a number of different ports to simulate a range of different services. This casts a very wide net to catch all manner of bots.
Tpot utilizes the sandboxing capabilities of Docker to make it easy to upgrade, and safer[1] than running things on bare metal. The best thing about Tpot is that it aggregates the data from all the running services and presents the data in various forms using Kibana.
To get this running, I created a VM inside my oVirt cluster and installed Tpot using the ISO. I didn't want to expose my residential IP address to the internet, so instead I launched a VPS on DigitalOcean running VyOS, and set up a WireGuard link to my firewalled network. All ports are then forwarded from the VPS to the virtual machine, exposing it directly to the internet. This gives the highest opportunity for bots and attackers to fall into my trap.
I've been running this publicly for a few weeks now and have not noticed any issues so far, so let's hope for the best. Now, let's discuss the things learnt from running a public honeypot.
Insights Gained
Let's take a look at the statistics. Over a 15 day period, there were a total of 2,768,090 attacks from 7,423 different IP addresses. That's roughly two attacks per second, every hour of every day.
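As a quick sanity check on that rate, the quoted totals can simply be divided out. This is only a back-of-the-envelope sketch; the constant and function names are mine, not anything Tpot provides.

```python
# Figures quoted in the post: totals over a 15 day observation window.
TOTAL_ATTACKS = 2_768_090
UNIQUE_IPS = 7_423
DAYS = 15


def attacks_per_second(total: int, days: int) -> float:
    """Average number of attacks per second over the observation window."""
    return total / (days * 24 * 60 * 60)


rate = attacks_per_second(TOTAL_ATTACKS, DAYS)
per_ip = TOTAL_ATTACKS / UNIQUE_IPS

print(f"{rate:.2f} attacks per second")
print(f"{per_ip:.0f} attacks per source IP on average")
```

The average per IP hides a lot: a handful of very noisy sources can easily account for most of the total.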
The most popular honeypot by far was Dionaea. This honeypot aims to emulate the popular network file sharing[2] protocol SMB (Server Message Block), among other things. 95% of the attacks on Dionaea were targeting this protocol.
Why is the SMB protocol so popular? Well, it's tightly integrated into Windows operating systems, and has been a core component for a very long time. There are a huge number of security vulnerabilities in different versions of SMB, and it's enticing to attackers, as these vulnerabilities could easily grant an attacker administrator (root equivalent) level access to a Windows machine.
A notable example of SMB being exploited is the famous WannaCry series of attacks. These attacks exploited a vulnerability in the SMB protocol, which allowed an unauthenticated user to execute code on a Windows machine. This code would infect the machine with the worm, and spread to other machines on the network. Did I mention it encrypted all the files on the host, demanding a ransom for the files to be decrypted? It's a trivial attack, and an easy payday for cybercriminals looking to get some bitcoin. That's probably why it's so popular.
My suggestion? Don't ever, ever expose SMB to the internet.
[2] SMB covers more than just file sharing, but that's beyond the scope of this article.
Next up is Cowrie. This honeypot aims to emulate both SSH and Telnet. Both of these are incredibly popular protocols, used for establishing a command line shell over the internet.
I won't be addressing Telnet much, as it's not as popular nowadays, and it's advised to just use SSH instead[3].
SSH is one of the most well established tools for initiating a remote connection over the internet. It's also fairly easy to misconfigure in ways that make it vulnerable. The problem with misconfigured SSH servers is that if you get in, you can do a lot. Often you need to brute force passwords to get in, but once you're in, it's game over.
The Cowrie honeypot was dealt 290,430 attacks over the 15 day observation period, with the SSH implementation being the most common target. "admin" and "root" were the most common usernames attempted, with 16,000 and 10,200 attacks respectively. The most common passwords were "1234", "admin", "root", "user" and "support". Basic passwords, really.
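If you want to produce tallies like these outside of the dashboards, a few lines of Python will do. The sketch below assumes the login attempts are available as newline-delimited JSON events carrying eventid, username and password fields, which is how I understand Cowrie's JSON log to be laid out. The sample events are made up for illustration and are not taken from my data.

```python
import json
from collections import Counter

# Made-up sample events for illustration only. The field names (eventid,
# username, password, src_ip) are assumed to match Cowrie's JSON log output.
SAMPLE_LOG = """\
{"eventid": "cowrie.login.failed", "username": "admin", "password": "1234", "src_ip": "198.51.100.7"}
{"eventid": "cowrie.login.failed", "username": "root", "password": "admin", "src_ip": "198.51.100.7"}
{"eventid": "cowrie.session.connect", "src_ip": "203.0.113.9"}
{"eventid": "cowrie.login.success", "username": "admin", "password": "admin", "src_ip": "203.0.113.9"}
"""


def tally_credentials(lines):
    """Count usernames and passwords seen in login attempt events."""
    usernames, passwords = Counter(), Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt lines
        if not isinstance(event, dict):
            continue
        # Keep only login attempts, whether they failed or succeeded.
        if not str(event.get("eventid", "")).startswith("cowrie.login."):
            continue
        usernames[event.get("username", "")] += 1
        passwords[event.get("password", "")] += 1
    return usernames, passwords


usernames, passwords = tally_credentials(SAMPLE_LOG.splitlines())
print(usernames.most_common(2))
print(passwords.most_common(2))
```

Counter.most_common keeps the "top N" question to a one-liner, which is about all this kind of report needs.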
What does this mean for a server operator though?
It means that when you have SSH exposed to the internet, you need to be careful. It won't take long for bots to begin attacking you, and you cannot use easy to guess passwords if you care about that machine not being compromised.
An alternative, and one I highly suggest, is to forgo passwords altogether and switch to key based authentication. This will stop 99% of these password guessing attacks outright.
There are more honeypots to cover than just these two, but I don't want to make this post too long. If there's significant interest I'll do some more writeups covering some of the more unique honeypots.
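To check whether a server still accepts password logins, you can look for the PasswordAuthentication keyword in its sshd_config. The sketch below is a deliberately simplified reading of that file: it relies on keywords being case-insensitive, the first occurrence winning, and the default being "yes", and it ignores Match blocks, Include directives and keyboard-interactive authentication. The function name and the example config are mine.

```python
def password_auth_enabled(config_text: str) -> bool:
    """Return True if global password logins appear to be allowed.

    Simplified: only the global section is inspected, and anything after
    the first Match line is ignored.
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split()
        keyword = parts[0].lower()
        if keyword == "match":
            break  # global section ends at the first Match block
        if keyword == "passwordauthentication" and len(parts) > 1:
            # First occurrence wins, so later lines cannot override it.
            return parts[1].lower() != "no"
    return True  # keyword absent: the default allows passwords


# Example of a key-only setup (illustrative, not my actual config).
HARDENED = """\
# Key based logins only
PermitRootLogin prohibit-password
PasswordAuthentication no
PubkeyAuthentication yes
"""

print(password_auth_enabled(HARDENED))
print(password_auth_enabled("Port 22\n"))
```

Treat the result as a hint rather than an audit; the effective configuration of a running server is the real source of truth.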
Now what?
Now, I have a fairly large collection of data (50GB+ and growing). What can I do with it?
For a start, I can feed the IP addresses collected by Tpot to my frontline load balancers, so they drop connections from attackers. This is a work in progress, but I will be following on from this post when it is implemented.
In addition, the data collected from Tpot is automatically submitted to Sicherheitstacho. The aim of this project is to build a real-time visualization of attacks going on around the world.
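The first half of that pipeline, turning honeypot events into a list of addresses to drop, could look something like the sketch below. It assumes newline-delimited JSON events with a src_ip field; the threshold, the never-block ranges and the function name are all hypothetical choices of mine, and pushing the list to the load balancers is left out because that part isn't built yet.

```python
import ipaddress
import json
from collections import Counter

# Hypothetical safety net: internal ranges that must never be banned.
NEVER_BLOCK = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def build_blocklist(lines, min_events=3):
    """Return sorted source IPs seen in at least min_events log events."""
    hits = Counter()
    for line in lines:
        try:
            event = json.loads(line)
            ip = ipaddress.ip_address(event["src_ip"])
        except (ValueError, KeyError, TypeError):
            continue  # not JSON, no src_ip, or not a valid address
        if any(ip in net for net in NEVER_BLOCK):
            continue
        hits[str(ip)] += 1
    return sorted(ip for ip, count in hits.items() if count >= min_events)


# Made-up events using documentation address ranges.
SAMPLE = (
    ['{"src_ip": "198.51.100.7"}'] * 3
    + ['{"src_ip": "203.0.113.9"}', "not json"]
    + ['{"src_ip": "10.0.0.5"}'] * 3
)

print(build_blocklist(SAMPLE))
```

Requiring a few events before banning is a guess at a sensible default; anything that touches a honeypot at all is arguably fair game, so a threshold of one may be just as reasonable.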
I've also made the data public. You can take a look at the dashboards here. Give it a play! There's a lot that I didn't cover here.
Thanks.