Python implementation of HyperLogLog, Redis, and fuzzy hashing for malware detection
March 16, 2013
I was thinking there must be a way to use HyperLogLog (HLL) with fuzzy hash sets of malware, built by rolling a hash over blocks of 32 or 64 bytes. Stick those sets into a Redis cluster so the objects persist, then compare the rolling hashes of a sample binary against the stored malware hash sets. There would be some error, but it would give you a quick answer on whether a sample binary is a fuzzy match for an identified malware hash set in your datastore. You could also use this to identify copies of a “Top Secret” document on various systems. I’m having a brain block this morning on determining the inclusion and exclusion comparisons. (I need either more coffee or more sleep.)
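A minimal sketch of the matching idea, using plain Python sets in place of Redis or HLL. The function names, the truncated SHA-1 digest, and the sample bytes are all assumptions made for illustration. Every 32-byte window of a binary is hashed, and the score is the fraction of the sample's window hashes that also appear in the known set.

```python
import hashlib


def window_hashes(data: bytes, window: int = 32) -> set:
    """Return truncated SHA-1 digests of every `window`-byte slice of `data`."""
    if len(data) < window:
        return set()
    return {
        hashlib.sha1(data[i:i + window]).hexdigest()[:16]
        for i in range(len(data) - window + 1)
    }


def containment(sample: set, known: set) -> float:
    """Fraction of the sample's window hashes that also appear in `known`."""
    if not sample:
        return 0.0
    return len(sample & known) / len(sample)


# Hypothetical data: a "known" binary and a sample with a few bytes spliced in.
known_blob = bytes(range(256)) * 4
sample_blob = known_blob[:600] + b"PATCHED-BY-ATTACKER" + known_blob[600:]

known_set = window_hashes(known_blob)
sample_set = window_hashes(sample_blob)
score = containment(sample_set, known_set)
print(f"fuzzy match score: {score:.2f}")
```

Only the windows that overlap the spliced-in bytes miss, so the score stays high for a lightly modified copy. Rehashing every window from scratch is slow on large files, which is why a true rolling hash is the eventual goal.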
Update: I guess plain lists and sets work out of the box in Redis, so there is no need for HLL yet. Plus, I’m still stuck trying to figure out intersections. I’m not sure what a 1,000,000-item list looks like in memory for set comparisons, but I think the lists will get garbage collected.
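If HLL does come back into the picture, intersections are the awkward part: an HLL sketch can estimate a cardinality and can be merged with another sketch to estimate a union, but it has no direct intersection operation. One common workaround is inclusion-exclusion, |A ∩ B| = |A| + |B| − |A ∪ B|. The sketch below shows only the arithmetic; the function names are made up, and exact Python sets stand in for the HLL counters.

```python
def intersection_estimate(count_a: int, count_b: int, count_union: int) -> int:
    """Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|.

    Clamped at zero because approximate counts can make the sum go negative.
    """
    return max(0, count_a + count_b - count_union)


def jaccard_estimate(count_a: int, count_b: int, count_union: int) -> float:
    """Estimated Jaccard similarity |A ∩ B| / |A ∪ B|."""
    if count_union == 0:
        return 0.0
    return intersection_estimate(count_a, count_b, count_union) / count_union


# Exact sets stand in for HLL counters here (an assumption for illustration).
known = set(range(0, 1000))
sample = set(range(600, 1400))

overlap = intersection_estimate(len(known), len(sample), len(known | sample))
similarity = jaccard_estimate(len(known), len(sample), len(known | sample))
print(overlap, similarity)
```

With real HLL counters each of the three counts carries its own error, and that error scales with the size of the sets rather than with the size of the overlap, so small intersections between large sets come out unreliable.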
Here’s my code: http://nbviewer.ipython.org/urls/raw.github.com/bigsnarfdude/bsides_vancouver_2013/master/fuzzy_hash_micro.ipynb
Determining Inclusion and Exclusion – WIP
WIP (still need to build this using a sliding-window hash, a.k.a. a “rolling hash”)
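One way the sliding-window hash could be built is a polynomial (Rabin-Karp style) rolling hash, where each step drops the oldest byte and appends the newest in constant time. This is a sketch under assumptions, not the notebook's code: `BASE`, `MOD`, and both function names are choices made here for illustration.

```python
BASE = 257
MOD = (1 << 61) - 1  # a large Mersenne prime keeps collisions rare


def rolling_hashes(data: bytes, window: int = 32):
    """Yield (offset, hash) for every `window`-byte slice of `data`."""
    if len(data) < window:
        return
    # Weight of the byte that leaves the window: BASE**(window - 1) mod MOD.
    top = pow(BASE, window - 1, MOD)
    h = 0
    for byte in data[:window]:
        h = (h * BASE + byte) % MOD
    yield 0, h
    for i in range(window, len(data)):
        h = (h - data[i - window] * top) % MOD  # drop the oldest byte
        h = (h * BASE + data[i]) % MOD          # append the newest byte
        yield i - window + 1, h


def direct_hash(chunk: bytes) -> int:
    """Hash one chunk from scratch; used only to check the rolling version."""
    h = 0
    for byte in chunk:
        h = (h * BASE + byte) % MOD
    return h


data = b"the quick brown fox jumps over the lazy dog" * 3
results = list(rolling_hashes(data, window=8))
print(len(results), "window hashes")
```

The resulting integers could be stored in a Redis set per malware family and compared with set operations, which is the part that still needs wiring up.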
>>> import redis
>>> r = redis.Redis(...)
>>> r.set('bing', 'baz')
>>> # Use the pipeline() method to create a pipeline instance
>>> pipe = r.pipeline()
>>> # The following SET command is buffered, not sent yet
>>> pipe.set('foo', 'bar')
>>> # The execute() call sends all buffered commands to the server, returning
>>> # a list of responses, one for each command.
>>> pipe.execute()