1. HoneyD Challenge Submission

Honeycomb, an IDS Signature Generator for Honeyd Traffic
========================================================================

Author: Christian Kreibich
Email:  Christian.Kreibich AT cl.cam.ac.uk
[ Address removed. Niels Provos ]

Hi guys,

here's my humble contribution to your Honeyd contest: it is a pattern
detection engine for the network traffic passing through Honeyd,
including a signature generator that currently outputs Snort
signatures. It's called Honeycomb, as it combs the data in your
honeypot for useful stuff (think of the guys in Spaceballs combing the
desert :)

I originally hoped to have a paper-like document ready by the time of
the deadline but unfortunately ran out of time, so this email will have
to make do for now.

System Description
==================

The basic idea is as follows: given that we're dealing with a honeypot
here, we know that any traffic we see is basically not supposed to be
there. It hence would be cool to have an engine that looks for patterns
and anomalies in the traffic automatically, not by comparing the
traffic to an existing pattern set, but by comparing the traffic to
previously seen traffic itself and by performing sanity checks on
packet headers etc.

The system is clearly reactive in nature, but could still be really
helpful to get a quick grasp of what's been happening to your honeypot
on a higher abstraction level than tcpdump logs, but in more detail
than the syslog messages honeyd generates. Think of a worm that hits
your pot twice, or typical CGI exploits -- if the system works
correctly, the entire characteristic part of those attacks should show
up in a new signature.

Now, why does this need to live in honeyd? Well, it could be put in an
external monitor watching the traffic in and out of the honeypot, but
doing it inside honeyd has advantages:

- it saves the overhead of grabbing the packets, as honeyd already does
  that.
- honeyd already does IP fragment reassembly.
- honeyd *is* a honeypot.
  That means, only while it's running can there be any traffic. We
  hence eliminate any cold start or state synchronization issues
  compared to an external system that can be started or stopped while
  honeyd keeps running.

Obviously the signatures generated aren't instantly useful in a
production environment, but could nevertheless prove of great value if
previously unseen attacks become usable by script kiddies and are hence
attempted repeatedly.

Packet Handling
===============

The system basically handles packets as follows:

- IP, UDP and TCP packet headers are compared to a number of previously
  seen ones on a header field basis. Matching fields (or also partially
  matching ones like IP address ranges) are reported.
- TCP stream reassembly is performed and packet payload is mined for
  similar content. The system maintains state for a number of recent
  TCP connections and keeps the messages exchanged available. By
  message I mean payload data sent in one direction without real data
  (other than ACKs) flowing in the other direction. Think HTTP
  request/answer for example. The system then investigates matching TCP
  messages using a longest common substring algorithm, finding the
  largest match in the payload possible, and adding it to a new
  signature.

System Design
=============

In order to keep the impact on the honeyd code at a minimum, I've
extended honeyd 0.5 by adding a plugin engine and pattern inspection
hooks, which only required very few real changes to the code. Honeycomb
itself is a plugin and doesn't interfere with honeyd at all. My
contribution contains the following:

- honeyd-hooks-0.5.0.tar.gz
  This is my modified version of honeyd-0.5. Look at plugins.[ch] and
  hooks.[ch] for the stuff I've added. I've also done pretty thorough
  cleanups of the automake/autoconf build stuff as it didn't work very
  well on my system (e.g. the libhoneyd hack broke as soon as a
  different libtool was used, libdnet is not detected as dnet is
  dumbnet on Debian etc).
  It does contain the two patches mentioned on the website. I hope I
  didn't break anything. A more detailed list of changes is at the
  bottom of this mail.

- honeycomb-0.1.tar.gz
  My plugin. Install the modified honeyd first, and then make sure this
  plugin's configure script picks it up; use --with-honeyd=blah if
  necessary.

- libstree-0.1.0.tar.gz
  This is a generic suffix tree library providing a longest common
  substring implementation. It's probably not particularly valuable for
  this competition but took the longest time to implement :) You need
  it installed for honeycomb to build.

I have not been able to test the builds on many systems, in particular
not on BSDs yet, sorry. I hope things won't be hard to get working. I
have built the thing on a Debian Linux system.

There was generally not much time for testing (I basically finished the
code two hours ago) so there will be bugs left. Sorry. There is however
ample documentation in the code ...

Feedback is appreciated. If you have any questions regarding why things
don't work etc just let me know. I'll be offline for a few days but
will get back to you next week.

How to play with it
===================

Basically

1. Install the patched honeyd.
2. Install libstree.
3. Install the Honeycomb plugin.
4. Run honeyd. You should see a message that the plugin got picked up.
5. Watch the signatures that appear in /tmp/honeycomb.log

Configuration should be through a config file but that's not there yet.
Look at the values in honeycomb.h to see what can be tweaked.

TODO list
=========

I think this thing clearly deserves more time than the three or so
weeks I had to implement everything. In particular, one problem now is
that the longest common substrings found aren't necessarily part of the
area of the payload that contains the relevant data. The approach of
analyzing packet payload should also take into account the protocol
we're dealing with, e.g. HTTP.
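
To make the payload matching concrete, here is a minimal Python sketch
of finding the longest common substring of two TCP messages. It uses a
plain dynamic-programming table, not the suffix trees that libstree
implements, and the function name and the two sample requests are made
up for illustration; it is not Honeycomb's code.

```python
def longest_common_substring(a: bytes, b: bytes) -> bytes:
    """Return one longest byte string that occurs in both a and b.

    Plain O(len(a) * len(b)) dynamic programming. libstree uses suffix
    trees instead; this only illustrates what the result looks like.
    """
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match within a
    # prev[j] holds the length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_end = i
        prev = cur
    return a[best_end - best_len:best_end]


# Two hypothetical HTTP requests hitting the honeypot.
msg1 = b"GET /cgi-bin/phf?Qalias=x%0a/bin/cat%20/etc/passwd HTTP/1.0\r\n"
msg2 = (b"GET /scripts/../cgi-bin/phf?Qalias=x%0a/bin/cat%20/etc/passwd"
        b" HTTP/1.1\r\n")
print(longest_common_substring(msg1, msg2))
```

In this example the match runs from "/cgi-bin/phf" through to
" HTTP/1.", so it also swallows part of the protocol version string.
That is the kind of not-quite-relevant payload a protocol-aware
analysis could trim away.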

The longest common substring algorithm in libstree is flexible enough
to give you multiple longest strings, only those up to a certain length
etc, so there's some room for experimenting.

Another thing is a better mechanism for "accepting" generated
signatures. Right now a new signature is printed out once it is
different from all those previously printed (up to a certain number). A
possible criterion here would be a minimum number of features included,
for example.

ICMP support is not yet there.

Honeyd Changes:
===============

More detailed list of honeyd changes:

- Fixed configure check to honour dumbnet.h
- Added dnet.h compatibility wrapper to compat/ directory
- Fixed configure.in so that it correctly finds /usr/bin/dnet-conf by
  default.
- My build sometimes aborted saying that before I could use
  "CFLAGS += ...", CFLAGS must be defined somewhere. I removed the + as
  it wasn't used in my case.
- Included grp.h in command.c to fix a warning
- Changed a few NULLs to 0s to fix warnings
- Added an option -o to use own packets -- I found that useful to test
  my honeyd on a standalone laptop, sending data to my vmnet1.
- Switched to getopt_long and added --plugin-dir to display the
  directory used for plugins.
- hooks.[ch]: A simple list-based implementation of hooks for packet
  data. Users can register hooks on a per-protocol basis for the
  various IP_PROTO_xxx constants.
- plugins.[ch]: Added ltdl support to dynamically load in plugins
  installed in $(datadir)/honeyd/plugins. Makefile.am passes that value
  as a #define to each file as PATH_HONEYDPLUGINS.
- Revamped the help output to be a bit clearer.
- Added libltdl to provide plugin support.
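
The per-protocol hook lists mentioned for hooks.[ch] can be pictured
roughly as below. This is a Python sketch of the idea only, not
honeyd's C API: register_hook, run_hooks and the callback signature are
hypothetical names chosen for illustration. Only the protocol numbers
(6 for TCP, 17 for UDP) are the standard IANA values.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Standard IANA protocol numbers; honeyd refers to IP_PROTO_xxx constants.
IP_PROTO_TCP = 6
IP_PROTO_UDP = 17

Hook = Callable[[bytes], None]

# One list of callbacks per protocol number (hypothetical layout).
_hooks: DefaultDict[int, List[Hook]] = defaultdict(list)


def register_hook(proto: int, callback: Hook) -> None:
    """Append a callback to the list kept for the given protocol."""
    _hooks[proto].append(callback)


def run_hooks(proto: int, packet: bytes) -> int:
    """Call every hook registered for proto, in registration order.

    Returns the number of hooks that were invoked.
    """
    callbacks = _hooks.get(proto, [])
    for callback in callbacks:
        callback(packet)
    return len(callbacks)


# Usage: a plugin registers interest in TCP packets only.
seen: List[bytes] = []
register_hook(IP_PROTO_TCP, seen.append)
run_hooks(IP_PROTO_TCP, b"tcp payload")  # delivered to the plugin
run_hooks(IP_PROTO_UDP, b"udp payload")  # no hook registered, ignored
```

Keeping a separate list per protocol means a plugin that only cares
about TCP is never called for UDP traffic.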