Recently I have been working on a project to provide a File Intelligence API within the OpenStack ecosystem, which can be used to find out information about a given file based on its hashes. The plan is to support multiple pluggable backends for processing files that are unknown to the system. These backends would provide a verdict on whether the file is malicious, along with other data about the file: its type, the contents of an image, a summary of a document, and so on.
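Because lookups are keyed on a file's hashes, a client starts its query by computing digests of the file. Below is a minimal sketch using Python's standard `hashlib`. It assumes the API indexes on MD5, SHA-1, SHA-256 and SHA-512, which are simply the commonly used digests, and the `file_hashes` helper is hypothetical.

```python
import hashlib
from typing import Dict

# Hypothetical choice of digests; the set the API actually indexes may differ.
ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def file_hashes(path: str, chunk_size: int = 65536) -> Dict[str, str]:
    """Return hex digests of the file at `path`.

    The file is read in chunks, so large samples never need to fit in memory.
    """
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

For example, `file_hashes("putty.exe")["sha256"]` would give the value to look up.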
As part of the process it has been imperative to choose the first pluggable backend for file analysis. In the long term, the hope is that multiple backends will be developed for a range of software, including commercial sandboxes and scanners. Even so, it struck me as important that the first available backend be open source and freely available, so people can experiment with the API without a financial commitment beyond a few virtual machines. Initial experimentation with ClamAV, Cuckoo Sandbox and a selection of Python libraries for discovering MIME types, header details, etc. showed that a combination of these approaches can be used to construct meaningful metadata about a given file. Cuckoo in particular is very impressive. With some minor modification it was relatively easy to get Cuckoo to run its executions inside Nova and communicate over a Neutron tenant network. After execution, the software analyses the collected artifacts against a set of community-contributed signatures and suggests a score for how malicious a file is. It also provides enough information to roughly categorise the file based on its characteristics.
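The pluggable-backend idea can be sketched as a small abstract interface that every analyser implements, with the API passing each sample to every registered backend. All of the names here are hypothetical and are not part of Nemesis or Cuckoo: `AnalysisBackend`, `MagicBytesBackend`, `run_backends` and the result keys. The toy backend recognises a few well-known leading byte signatures. It stands in for the Python MIME-detection libraries mentioned above and does not reproduce any of them.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class AnalysisBackend(ABC):
    """Hypothetical interface each pluggable analysis backend implements."""

    name: str = "base"

    @abstractmethod
    def analyse(self, data: bytes) -> Dict[str, object]:
        """Return backend-specific metadata about the sample."""


class MagicBytesBackend(AnalysisBackend):
    """Toy backend: guess a MIME type from well-known leading bytes."""

    name = "magic"
    SIGNATURES = [
        (b"MZ", "application/x-dosexec"),       # DOS/Windows PE executables
        (b"%PDF-", "application/pdf"),
        (b"\x89PNG\r\n\x1a\n", "image/png"),
        (b"PK\x03\x04", "application/zip"),     # also covers OOXML documents
    ]

    def analyse(self, data: bytes) -> Dict[str, object]:
        for prefix, mime in self.SIGNATURES:
            if data.startswith(prefix):
                return {"mime_type": mime}
        return {"mime_type": "application/octet-stream"}


def run_backends(data: bytes,
                 backends: List[AnalysisBackend]) -> Dict[str, Dict[str, object]]:
    """Run every registered backend and key each result by backend name."""
    return {backend.name: backend.analyse(data) for backend in backends}
```

A Cuckoo or ClamAV backend would then be another subclass, with its results stored under its own name.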
Let's take a look at Cuckoo. First, Cuckoo was installed on a Nova instance as per the documentation at http://docs.cuckoosandbox.org/en/latest/. The Cuckoo server was configured with two interfaces, one on the public network and another on a tenant network where the executions would happen. The Cuckoo server acted as NAT for the tenant network on the second interface. This let the execution slaves reach the internet (via Cuckoo rooter) without any of them having direct public network access. The execution slave was built as per the Cuckoo documentation and then imaged, so that further instances could be built from the image and rebuilt to a known state at the end of each execution. Out of the box, Cuckoo does not support Nova as a provider of virtual machines for execution, so for initial experimentation a custom machine type was monkey patched in as a proof of concept. Long term, this needs a more substantial rework to let Cuckoo autoscale executions. In its current form only one image can be used, on a fixed number of Nova instances.
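In essence, the proof-of-concept machinery is a fixed pool of Nova instances booted from one image. Each instance is handed to a single analysis and then rebuilt to its known-good state. The sketch below models only that lifecycle. `NovaPool` and the injected `rebuild` callable are hypothetical. In a real deployment the callable would wrap whatever Nova rebuild call the machinery makes, rather than being a plain function.

```python
from collections import deque
from typing import Callable, Deque, List, Set


class NovaPool:
    """Hypothetical model of a fixed pool of execution-slave instances."""

    def __init__(self, instance_ids: List[str], image_id: str,
                 rebuild: Callable[[str, str], None]) -> None:
        self.image_id = image_id          # one base image for every slave
        self._rebuild = rebuild           # e.g. a wrapper around a Nova rebuild
        self._free: Deque[str] = deque(instance_ids)
        self._busy: Set[str] = set()

    def acquire(self) -> str:
        """Hand out an idle instance; fail if the fixed pool is exhausted."""
        if not self._free:
            raise RuntimeError("no idle execution slaves; pool size is fixed")
        instance_id = self._free.popleft()
        self._busy.add(instance_id)
        return instance_id

    def release(self, instance_id: str) -> None:
        """Rebuild the instance from the base image, then return it to the pool."""
        self._busy.remove(instance_id)
        self._rebuild(instance_id, self.image_id)
        self._free.append(instance_id)
```

The `RuntimeError` on an exhausted pool is the limitation described above. An autoscaling rework would boot a new instance at that point instead of failing.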
You can see my monkey patched code in my fork of Cuckoo here: https://github.com/robputt796/cuckoo/tree/nova_machinery. If you'd like to give this a go, I would highly suggest comparing the changes to the upstream master branch of Cuckoo and rebasing the machinery addition onto the current master.
Now that the Nova machinery is patched onto Cuckoo, and the Cuckoo server and execution slave instances are up and running, let's run some file analysis and see what comes up. Note that the results may be highly variable, and this is by no means a reliable way to identify malware with high accuracy. Malware commonly uses anti-sandbox techniques. It may refuse to run in a virtual machine, or refuse to run if the host shows unusual usage patterns (e.g. being very new, lacking commonly installed applications, or showing no signs of a physical user working on the machine). It may also simply sleep for longer than a typical analysis window. Some steps can mitigate these techniques: making the host look used, installing common software such as Microsoft Office and Adobe Reader, or running the execution slave on a physical machine (Ironic can come in handy here). You'll probably want Office and Reader on the execution slaves regardless, since these file types should be executed anyway to check for nasty payloads hidden in them. Long term, it will be up to the deployer of Nemesis which pluggable backends to use and how to configure them.
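To illustrate deployer choice, a hypothetical `nemesis.conf` could list the enabled backends along with per-backend options. The section and option names below are invented for this sketch. The parsing uses the standard `configparser` module, although an OpenStack service would more likely use oslo.config.

```python
import configparser
from typing import Dict, List

# Hypothetical configuration; none of these option names come from Nemesis.
SAMPLE_CONF = """
[DEFAULT]
enabled_backends = cuckoo, clamav

[cuckoo]
analysis_timeout = 600
physical_slaves = false

[clamav]
socket = /var/run/clamav/clamd.ctl
"""


def load_backends(text: str) -> Dict[str, Dict[str, str]]:
    """Return the options for each backend listed in enabled_backends."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    names: List[str] = [
        name.strip()
        for name in parser["DEFAULT"]["enabled_backends"].split(",")
        if name.strip()
    ]
    defaults = parser.defaults()
    # Section proxies also expose DEFAULT keys, so filter those back out.
    return {name: {key: value for key, value in parser[name].items()
                   if key not in defaults}
            for name in names}
```

A longer `analysis_timeout` is one crude way to tackle samples that sleep past the analysis window, at the cost of throughput.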
For this test, samples were submitted via the Cuckoo CLI tool, and the metadata and artifacts were then reviewed in the web UI. Long term, the plan is to upload the artifacts to Swift for storage. The metadata, along with metadata from other scanners and file analysis tools, would be passed to the Nemesis API so it can be queried. Let's submit some samples, starting with a benign executable, PuTTY:
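The hand-off described above might involve merging each tool's output into one JSON document, keyed by the sample's SHA-256, before sending it to the Nemesis API. The record layout and the `artifacts_container` field, which points at a Swift container, are assumptions made for illustration and do not come from a defined Nemesis schema.

```python
import hashlib
import json
from typing import Dict


def build_record(data: bytes, results: Dict[str, Dict[str, object]],
                 artifacts_container: str) -> str:
    """Merge per-tool results into one JSON document for a hypothetical API."""
    record = {
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "results": results,                          # e.g. {"cuckoo": {...}, "clamav": {...}}
        "artifacts_container": artifacts_container,  # hypothetical Swift container name
    }
    return json.dumps(record, sort_keys=True)
```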
```
root@cuckoo-server:~# cuckoo submit putty.exe
'putty.exe' added as task with ID #53
```
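When scripting submissions, the task ID can be extracted from the line that `cuckoo submit` prints. This sketch relies only on the output format shown above. If another Cuckoo version words the message differently, the regular expression would need adjusting.

```python
import re
from typing import Optional

# Matches output such as: 'putty.exe' added as task with ID #53
TASK_LINE = re.compile(r"'(?P<target>[^']+)' added as task with ID #(?P<task_id>\d+)")


def parse_task_id(output: str) -> Optional[int]:
    """Return the task ID from `cuckoo submit` output, or None if absent."""
    match = TASK_LINE.search(output)
    return int(match.group("task_id")) if match else None
```

The output could come from running the same command with `subprocess.run(..., capture_output=True, text=True)` and passing its `stdout` to `parse_task_id`.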
So what results came back for PuTTY? It scored 1.2/10 on Cuckoo's maliciousness signature score, suggesting it is “potentially malicious”. Let's have a look at the signatures picked up from the execution artifacts...
Here we can see a few signatures for PuTTY and some screenshots of the software running in the execution environment. The signatures are as follows:
- (information): The executable is signed.
- (warning): Potentially malicious URLs were found in the process memory dump.
- (critical): PuTTY Files, registry keys or mutexes detected.
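The score and signature list shown in the web UI are also available in Cuckoo's JSON report, which is handy for feeding them to an API. This sketch makes two assumptions that should be checked against your Cuckoo version. First, it assumes the Cuckoo 2.x `report.json` layout: an `info.score` field and a `signatures` list whose entries carry `severity` and `description`. Second, it assumes severities 1, 2 and 3 correspond to the information, warning and critical labels above. The inline `report` is a hand-written stand-in mirroring the PuTTY results, not a real report.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed mapping of Cuckoo signature severities to the web UI labels.
SEVERITY_LABELS = {1: "information", 2: "warning", 3: "critical"}


def summarise(report: Dict) -> Dict[str, object]:
    """Return the score plus signature descriptions grouped by severity label."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for sig in report.get("signatures", []):
        label = SEVERITY_LABELS.get(sig.get("severity"), "unknown")
        grouped[label].append(sig.get("description", ""))
    return {"score": report.get("info", {}).get("score"),
            "signatures": dict(grouped)}


# Hand-written stand-in for the PuTTY report; real reports contain far more.
report = {
    "info": {"score": 1.2},
    "signatures": [
        {"severity": 1, "description": "The executable is signed"},
        {"severity": 2, "description": "Potentially malicious URLs were found in the process memory dump"},
        {"severity": 3, "description": "PuTTY Files, registry keys or mutexes detected"},
    ],
}
```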
These signatures may surprise you. Of course PuTTY contains things related to PuTTY, but why is this dangerous? The likeliest explanation is that a fair bit of malware out there uses PuTTY for communication of some kind, and who can blame it? SSH is a good protocol: it's encrypted and allows tunnelling of traffic without setting up any complicated tunnels. Overall, however, the signatures appear fairly accurate and the score seems justified. Now let's try executing something far more dangerous via Cuckoo, the malware flavour of the week, WannaCrypt:
```
root@cuckoo-server:~# cuckoo submit wannacrypt.exe
'wannacrypt.exe' added as task with ID #54
```
Now to check out what Cuckoo thought of WannaCrypt. It came in with a maliciousness score of 9/10 and the following signatures...
This has a whole load of warning and critical signatures. They include installing Tor, deleting shadow copies, changing over 500 files on the system (as it encrypts them), listening on multiple ports, delaying execution, poking WMI and network interfaces, and having a high level of entropy. It screams of malicious behaviour. And now for the screenshots of the execution environment running WannaCrypt.
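The high-entropy signature rests on a simple measure: the Shannon entropy of the byte histogram, H = -Σ p(b) log2 p(b). It approaches 8 bits per byte for encrypted or packed data and is typically much lower for ordinary code or text. Below is a minimal version of the calculation. The 7.2 threshold in `looks_packed` is an illustrative assumption, not the cutoff Cuckoo's signature uses.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte: 0.0 for constant data, at most 8.0."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(data).values())


def looks_packed(data: bytes, threshold: float = 7.2) -> bool:
    """Hypothetical heuristic: flag data whose entropy exceeds `threshold`."""
    return shannon_entropy(data) > threshold
```

For example, every byte value appearing equally often gives exactly 8.0, while a buffer of one repeated byte gives 0.0.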
As you can see, you can learn a lot from executing a file. This sort of analysis would be very advantageous to a file intelligence API for catching emergent threats that typical AV scanners do not yet identify. Of course, this approach can suffer from high false positive and false negative rates, so one should be wary and keep this in mind.
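To keep an eye on those error rates, a deployer could run a set of samples with known ground truth through the backends and compute the rates from the outcomes. A sample would count as flagged when its score crosses whatever threshold the deployer picks. Below is a small sketch of that arithmetic. The `error_rates` helper is hypothetical, and any labelled outcomes passed to it would have to come from your own tests.

```python
from typing import Iterable, Tuple


def error_rates(outcomes: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate).

    Each outcome is (actually_malicious, flagged_malicious).
    FPR = FP / (FP + TN) over benign samples;
    FNR = FN / (FN + TP) over malicious samples.
    """
    tp = fn = fp = tn = 0
    for malicious, flagged in outcomes:
        if malicious and flagged:
            tp += 1
        elif malicious:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr
```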