Adobe Systems has released a malware classification tool to help security incident first responders, malware analysts, and security researchers more easily identify malicious binary files. So I downloaded the Python script to see what all the fuss on Reddit was about. I dragged the code into Wordle, and isDirty stands out. I wasn’t entirely familiar with J48, and after some hunting I found out it is a decision tree classifier (Weka’s implementation of the C4.5 algorithm).
Wikipedia says…a decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal. If in practice decisions have to be taken online with no recall under incomplete knowledge, a decision tree should be paralleled by a probability model as a best choice model or online selection model algorithm. Another use of decision trees is as a descriptive means for calculating conditional probabilities.
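To make that definition a bit more concrete for myself, here is a minimal sketch of how a decision tree can classify a binary. To be clear, this is not Adobe's code: the feature names (debug_size, export_size), the thresholds, and the tree shape are all hypothetical, made up purely for illustration. The only thing borrowed from the script is the name isDirty, written here as is_dirty.

```python
# Toy decision tree: each internal node tests one feature against a threshold,
# each leaf carries a verdict (True = dirty/malicious, False = clean).
# Feature names and thresholds are hypothetical, NOT taken from Adobe's script.
TREE = {
    "feature": "debug_size",
    "threshold": 0,
    "left": {  # taken when debug_size <= 0
        "feature": "export_size",
        "threshold": 0,
        "left": {"label": True},    # no debug info and no exports
        "right": {"label": False},  # no debug info but has exports
    },
    "right": {"label": False},  # taken when debug_size > 0
}


def is_dirty(sample, tree=TREE):
    """Walk the tree from the root to a leaf and return that leaf's verdict."""
    node = tree
    while "label" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


if __name__ == "__main__":
    suspicious = {"debug_size": 0, "export_size": 0}
    ordinary = {"debug_size": 28, "export_size": 0}
    print(is_dirty(suspicious), is_dirty(ordinary))
```

A real classifier like J48 learns the features, thresholds, and tree shape from labeled training samples instead of having them typed in by hand, but the lookup at classification time works along the same lines: follow one branch per test until you hit a leaf.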
I still wasn’t satisfied because I didn’t really understand how the script worked. I ended up finding this awesome book with (d’uh) a simple Google query: “data mining malware”. Suffice it to say I will have some reading to do in the next couple of days. The screenshots below are from the book Data Mining Tools for Malware Detection.