A few days ago I posted a very specific question on Twitter and Mastodon:
You’ve got gazillion of random yara rules stored inside many random .yar files scattered around many folders. What do you use to read them all, remove duplicates, ensure all rule names are unique, and all the unique rules end up in a ‘merged’ final .yar file (or files)? I am aware of these projects & gists:
https://github.com/plyara/plyara
https://github.com/lsoumille/Yara_Merger
https://gist.github.com/Neo23x0/577926e34183b4cedd76aa33f6e4dfa3
https://gist.github.com/Neo23x0/81990b8e5eb351a118dca1d5f2a2a86b
https://gist.github.com/notareverser/7
I got 2 interesting answers:
Thanks AllenSwackhamer and bmmaloney97!
Still, I wanted something simpler. I just want to build a single, ‘megalopolis’ type of yara file that includes all yara rules I have ever saved.
I am a hoarder, so anytime I come across some interesting code (f.ex. c, idapython, idc, etc.), signatures and rules (flirt, yara, capa, etc.), file formats, compression, exploitation bits, bobs and PoCs, info on new attacks, any info really posted on social media, web sites, advertised via rss feeds, whatever, I just bookmark it, or download it, and I don’t really spend much time categorizing, deduplicating and organizing it. Despite many attempts over the years to make it ‘easier on me’ I always end up having it stored all over the place. I really wish I was more Marie Kondo, but it’s a mess.
I literally have a pile of different yara rules collected over the last 8+ years, many of them written by me of course, all scattered across many folders. With my aforementioned question on social media I simply wanted to achieve one thing: walk through all of these yara files, deduplicate them, remove all the poor quality rules (f.ex. many PEiD rules), remove all complex rules (f.ex. where one rule depends on another), and also remove any rules related to Android malware, because I really don’t have much interest in this topic.
Many of the approaches presented by very mature projects and gists listed above focus on yara rules seen from a source code perspective. That is, you can use existing libraries to parse these yara rules, maybe calculate hashes of their bodies, and do a lot of interesting things. But then again, I wanted something simpler.
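To illustrate what the ‘source code perspective’ looks like in practice, here is a minimal sketch of deduplicating rules by hashing their bodies. It is not how plyara or the gists above work internally: the regex is a naive stand-in for a real parser and assumes every rule ends with a closing brace at the start of a line, and the names RULE_RE, body_hash and dedupe are made up for this example.

```python
import hashlib
import re

# Naive rule matcher (an assumption, not a real YARA parser): optional
# modifiers, the rule name, optional tags, then the body up to a closing
# brace that sits at the start of a line.
RULE_RE = re.compile(
    r"^(?:(?:private|global)\s+)*rule\s+(\w+)[^{]*\{(.*?)^\}",
    re.MULTILINE | re.DOTALL,
)


def body_hash(body):
    """Hash a rule body after collapsing runs of whitespace to one space."""
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(source):
    """Return the names of rules whose body has not been seen before."""
    seen = set()
    kept = []
    for match in RULE_RE.finditer(source):
        digest = body_hash(match.group(2))
        if digest not in seen:
            seen.add(digest)
            kept.append(match.group(1))
    return kept
```

Two rules that differ only in name and indentation collapse into one here. A multi-line hex string with a closing brace at the start of a line would confuse the regex, which is one reason the mature projects use a proper parser.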
So I devised a cunning plan aka THE YARA PAGEANT ALGORITHM:
import "console"
import "elf"
import "hash"
import "math"
import "pe"
import "dotnet"
This ensures all the module dependencies are resolved (except for androguard, but I ignored it ‘by design’).
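One way to get a single set of imports at the top of a merged file can be sketched as follows. This is only an assumption about how such a merge could be done, not my actual script: it strips per-file import lines and prepends the common header once. The function names and the .yar/.yara extension filter are hypothetical.

```python
import os
import re

# Modules listed in the common import header.
HEADER_MODULES = ["console", "elf", "hash", "math", "pe", "dotnet"]

# Matches an import statement that sits alone on a line.
IMPORT_RE = re.compile(r'^[ \t]*import[ \t]+"[^"]+"[ \t]*$', re.MULTILINE)


def strip_imports(text):
    """Remove standalone import lines from one rule file's text."""
    return IMPORT_RE.sub("", text)


def merge_sources(texts):
    """Join rule file contents under a single shared import header."""
    parts = ["\n".join('import "%s"' % name for name in HEADER_MODULES)]
    for text in texts:
        cleaned = strip_imports(text).strip()
        if cleaned:
            parts.append(cleaned)
    return "\n\n".join(parts) + "\n"


def collect_sources(root, extensions=(".yar", ".yara")):
    """Read every rule file under root, in a stable sorted order."""
    texts = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # makes the traversal order deterministic
        for name in sorted(filenames):
            if name.lower().endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as fh:
                    texts.append(fh.read())
    return texts
```

Calling merge_sources(collect_sources("rules")) returns the merged text for a hypothetical rules folder. It does not deduplicate anything or fix clashing rule names, so the result may still fail to compile until those are dealt with.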
That’s it.
Going forward, we simply run ‘yara -w -C <compiled mega yara file> <malware file>’ to have all these rules applied to the target file. If you have many yara files in your ‘mega’ pack you may see rules hitting on file properties and features, and if you are lucky, a specific TA or malware family may sometimes hit too. It helps to use the ‘-s’ argument to see the exact strings from the sample that triggered the rule, so you can quickly tweak the ‘mega’ source file, recompile, and avoid FPs in subsequent runs.
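The compile-and-scan loop can be wrapped in a few lines. This is a sketch that assumes the stock yarac and yara binaries are on the PATH, and that the compiled file is produced with yarac; the helper names and file names are hypothetical.

```python
import shutil
import subprocess


def compile_cmd(source, compiled):
    """Command line that compiles a rules source file with yarac."""
    return ["yarac", source, compiled]


def scan_cmd(compiled, target, show_strings=False):
    """Command line for 'yara -w -C', optionally adding '-s'."""
    cmd = ["yara", "-w", "-C"]
    if show_strings:
        cmd.append("-s")  # print the strings that matched
    cmd += [compiled, target]
    return cmd


def run(cmd):
    """Run a command and return its stdout; fail early if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("%s not found on PATH" % cmd[0])
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout
```

A typical round would be run(compile_cmd("mega.yar", "mega.yarc")) followed by run(scan_cmd("mega.yarc", "sample.bin", show_strings=True)), then editing the source file and repeating.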
I wish I could share the source code of my script and commands doing all the stuff I described above, or even my own mega yara pack. But I can’t. It’s spaghetti code, some of the rules are super private, and in the end, your needs may be different from mine. Still, nothing can stop you from starting your own Yara pageant today…