And I'm not even joking: for example, in the last ~24h I fuzzed one PDF reader with only ONE instance on ONE machine, and the result was a bit less than 300 crashes. In this kind of situation it is really impossible to do preliminary exploitability analysis manually (at least I don't have that time), so the only possibility is to train the fuzzer to do some of the analysis automatically, so that I can easily put aside the 98%-99% of crashes that are not unique or don't have potential.
The filtering and sorting that my fuzzer does is most complete in the Windows environment, where I use the winappdbg library to get all the info I need. On Linux I wrote a wrapper for gdb, and on OSX I rely on its own crash reporter application and read the data out of its logs (I should put a lot more effort into the last two). So I will use Windows to describe my logic:
The sorting is built up as a directory tree:
- level: Close to NULL, Not close to NULL, bit both*
- level: Type of the issue (write, read, read from IP, unknown, heap corruption etc)
- level: Location of the crash (labeled if possible, otherwise the last 2 bytes of the address in hex - because of ASLR)
- level: Last 2 bytes in hex (because of ASLR) of each of the last 8-10 addresses from the stack trace (something like "34FC_322D_31FD_411A_3CC3_3CC3_3108_31DB")
- level: The crash files themselves, with an additional txt file that contains all the cool crash information
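To make the tree above more concrete, here is a minimal sketch of how such a directory path could be put together. This is not the actual fuzzer code: the `CrashInfo` class, its field names and the `crash_directory` helper are made up for illustration, and I assume the debugger layer has already filled in the crash details.

```python
import os
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrashInfo:
    """Hypothetical container for what the debugger reported about one crash."""
    null_status: str            # "Close to NULL", "Not close to NULL" or "Bit both"
    issue_type: str             # "write", "read", "read from IP", "unknown", ...
    crash_address: int          # address of the crashing instruction
    stack_addresses: List[int]  # return addresses, innermost frame first
    label: Optional[str] = None # symbolic label of the crash location, if known


def low16(address: int) -> str:
    """Last 2 bytes of an address as 4 hex digits (ASLR tends to leave these alone)."""
    return "%04X" % (address & 0xFFFF)


def safe_name(text: str) -> str:
    """Replace characters that are awkward in directory names."""
    return re.sub(r"[^A-Za-z0-9_.+!-]", "_", text)


def crash_directory(root: str, crash: CrashInfo, max_frames: int = 8) -> str:
    """Build the four directory levels; the crash files go inside the result."""
    location = safe_name(crash.label) if crash.label else low16(crash.crash_address)
    stack_key = "_".join(low16(a) for a in crash.stack_addresses[:max_frames])
    return os.path.join(root, crash.null_status, crash.issue_type, location, stack_key)
```

The crash file and its txt report would then be copied into the returned directory (for example after `os.makedirs(path, exist_ok=True)`), so two crashes land in the same place only if all four levels match.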
After a lot of other ideas and tests, I chose this structure because it's easy to look through quickly and most of the time it's enough to separate crashes that happen in the same place but have different root causes. For example, I can quickly look into the "Not close to NULL" directory and see what type of stuff is there. If after a day of fuzzing there is a "write" directory inside, this already makes me happy, because it gives hope for a heap overflow or some other memory corruption issue. If I go into that directory, I get a quick look at all the places that have caused an incorrect memory write. If I go another level deeper, I can see how many different stack paths led to each of them. And finally, of course, I have the files that caused the crash and the txt files that hold the technical crash information.
Writing the code to do this kind of filtering was much easier than I thought at the beginning - even on Linux I can pipe commands into GDB, pipe the results out, do some string analysis and get the data. On Windows with WinAppDbg or PyDbgEng it's trivial and can be achieved with a couple of hours of work (even by someone like me, for whom Python is not a second language).
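As a rough idea of what the GDB side can look like, here is a small sketch. It is not my actual wrapper: the function names are made up, and instead of feeding commands through stdin it hands them to GDB with `--batch` and `-ex`, which is a slightly different way of getting the same result. The string analysis is just a regular expression over the `bt` output.

```python
import re
import subprocess
from typing import List

# Matches backtrace lines such as "#1  0x000055555555322d in main ()".
# Frames that GDB prints without an address are skipped by this pattern.
FRAME_RE = re.compile(r"^#\d+\s+0x([0-9a-fA-F]+)\s+in\s", re.MULTILINE)


def run_under_gdb(program: str, sample: str, timeout: int = 60) -> str:
    """Run the target on one sample under GDB and return everything GDB printed."""
    cmd = [
        "gdb", "--batch",        # exit once the commands below are done
        "-ex", "run",            # start the target
        "-ex", "bt",             # backtrace at the point where it stopped
        "-ex", "info registers", # register state for the txt report
        "--args", program, sample,
    ]
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=timeout,
    )
    return result.stdout


def backtrace_addresses(gdb_output: str) -> List[int]:
    """Pull the frame addresses out of the backtrace, innermost frame first."""
    return [int(hex_digits, 16) for hex_digits in FRAME_RE.findall(gdb_output)]
```

Calling `backtrace_addresses(run_under_gdb(program, sample))` gives the address list that the stack trace level of the directory tree is built from. A real wrapper also needs to tell a crash apart from a clean exit and handle the timeout, which this sketch leaves out.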
*You might ask "How can a crash happen both near NULL and not near NULL?" - well, with some applications I ran into a situation where some crashes happened only when the sun was in exactly the right spot in the sky, and never when I later opened the files that were reported by the fuzzer. My solution was to re-test every crash automatically right after the first detection. In some cases the crash also happened during the re-tests, but in a different location. And sometimes some of these locations were near NULL and others were not. This is the situation where the fuzzer gives them the status "Bit both".
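The re-test logic can be sketched roughly like this. The names, the number of attempts and especially the `NULL_THRESHOLD` value are assumptions for illustration (I am not giving an exact cut-off for "close to NULL" here), and `run_once` stands in for whatever actually launches the target under the debugger and returns the faulting address.

```python
from typing import Callable, List, Optional

# Assumption: anything below this counts as "close to NULL".
NULL_THRESHOLD = 0x10000


def is_near_null(address: int, threshold: int = NULL_THRESHOLD) -> bool:
    return 0 <= address < threshold


def retest(run_once: Callable[[], Optional[int]], attempts: int = 5) -> List[int]:
    """Re-run the same sample several times and collect the faulting addresses.

    run_once returns the faulting address, or None if there was no crash.
    """
    addresses = []
    for _ in range(attempts):
        address = run_once()
        if address is not None:
            addresses.append(address)
    return addresses


def null_status(fault_addresses: List[int]) -> str:
    """Name of the first directory level for a crash seen one or more times."""
    if not fault_addresses:
        raise ValueError("crash did not reproduce, nothing to classify")
    near = [is_near_null(a) for a in fault_addresses]
    if all(near):
        return "Close to NULL"
    if not any(near):
        return "Not close to NULL"
    return "Bit both"
```

So a sample that faults at 0x8 on one run and at 0x41414141 on the next ends up in "Bit both", while one that always faults at small addresses stays in "Close to NULL".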