Over the past few months, I’ve been playing with a new static analysis tool from Mozilla called Dehydra.
Dehydra is a GCC plugin that lets you write JavaScript to query the Abstract Syntax Tree (AST) that GCC builds from each source file. This means you can write a script that notifies you whenever it sees any code construct you can describe in a script.
There are a number of code constructs that might be interesting to a code auditor, for example:
- Calls to asnprintf, malloc, or calloc with unchecked return values.
- Assignments where the left-hand side is signed and the right-hand side is unsigned, or vice versa.
- Assignments where the two sides have different bit widths.
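The last two checks boil down to comparing attributes of the types on either side of the `=`. Here is a minimal sketch in JavaScript. It assumes hypothetical type descriptors with `unsigned` and `bits` fields, and a hypothetical `checkAssignment` helper. Dehydra's real type objects carry more information and may name these fields differently.

```javascript
// Hypothetical type descriptors (not Dehydra's actual type objects).
const int32 = { name: "int", unsigned: false, bits: 32 };
const uint32 = { name: "unsigned int", unsigned: true, bits: 32 };
const uint8 = { name: "unsigned char", unsigned: true, bits: 8 };

// Return warnings for storing a value of type `rhs` into a variable of type `lhs`.
function checkAssignment(lhs, rhs) {
  const warnings = [];
  // Signedness mismatch in either direction.
  if (Boolean(lhs.unsigned) !== Boolean(rhs.unsigned)) {
    warnings.push(lhs.unsigned ? "signed to unsigned" : "unsigned to signed");
  }
  // Width mismatch: a smaller destination can truncate; a larger one can sign-extend.
  if (lhs.bits !== rhs.bits) {
    warnings.push(lhs.bits < rhs.bits ? "narrowing" : "widening");
  }
  return warnings;
}

console.log(checkAssignment(uint32, int32)); // [ 'signed to unsigned' ]
console.log(checkAssignment(uint8, uint32)); // [ 'narrowing' ]
```

A real checker would also need to skip constants whose values fit in the destination type, or it will report a lot of harmless code.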
The possibilities go well beyond this short list!
I will be the first to admit that static analysis has its faults. For one thing, it is provably impossible (a consequence of Rice's theorem) for static analysis to find every bug in every program. Commercial static analysis tools, such as Coverity, are expensive and have not proven particularly effective at finding bugs on their own. I have heard many accounts of nasty bugs that code auditors found in source code that Coverity scans routinely.
That said, on Day One of a code audit, 4 out of 5 code auditors find themselves reaching for Grep.
Grep is great: it searches many files for regular expressions very quickly. But Grep has no awareness of C++ syntax. I'm more interested in searching for specific code constructs than for substrings, and substrings are all Grep can find.
When looking for vulnerabilities, I'm not interested in every occurrence of the string “malloc”. What I really want to know is closer to “Where are all the calls to malloc whose return value is never checked?” Likewise, I don't want every occurrence of the string “int” so much as every place where a variable of type int is implicitly converted to unsigned int when passed as a function argument.
This is the great thing about Dehydra. It lets you query the parsed syntax tree of C++ source code and ask the kinds of questions that can’t be easily answered by Grep.
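To make the “unchecked malloc” question concrete, here is a minimal sketch in JavaScript, the language Dehydra scripts are written in. It runs over a hand-built toy AST. The node shape (`kind`, `target`, `call`, `reads`, `line`) and the function `findUncheckedAllocs` are hypothetical and do not reflect the objects Dehydra actually hands to scripts. The sketch also simplifies heavily: any later `if` that reads the variable counts as a check, and control flow is ignored.

```javascript
// Toy AST for one function body (hypothetical shape, not Dehydra's format).
const body = [
  { kind: "assign", target: "buf", call: "malloc", line: 10 },
  { kind: "if", reads: ["buf"], line: 11 },
  { kind: "assign", target: "tmp", call: "malloc", line: 14 },
  { kind: "call", callee: "memcpy", reads: ["tmp", "buf"], line: 15 },
];

// Allocators from the list at the top of this post.
const ALLOCATORS = new Set(["malloc", "calloc", "asnprintf"]);

// Report allocator results that no later `if` condition ever reads.
function findUncheckedAllocs(stmts) {
  const reports = [];
  stmts.forEach((stmt, i) => {
    if (stmt.kind !== "assign" || !ALLOCATORS.has(stmt.call)) return;
    const checked = stmts
      .slice(i + 1)
      .some((s) => s.kind === "if" && s.reads.includes(stmt.target));
    if (!checked) {
      reports.push(`unchecked ${stmt.call}() result '${stmt.target}' at line ${stmt.line}`);
    }
  });
  return reports;
}

findUncheckedAllocs(body).forEach((msg) => console.log(msg));
// prints: unchecked malloc() result 'tmp' at line 14
```

Grep would report both `malloc` lines equally. A query over structure can tell them apart.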
Dehydra scripts are written in JavaScript and run on the SpiderMonkey engine. JavaScript is a small, pleasant language that is well suited to operating on tree-like data structures. In a browser that means the DOM, but in GCC it means the AST!
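A recursive visitor shows why JavaScript suits tree-shaped data: the traversal takes only a few lines. Here is a minimal sketch of a generic depth-first walk. It assumes a made-up `{ kind, children }` node shape and a hypothetical `walk` helper, not the objects Dehydra actually passes to scripts.

```javascript
// Depth-first walk: call `visit` on every node, passing its depth in the tree.
function walk(node, visit, depth = 0) {
  visit(node, depth);
  for (const child of node.children || []) {
    walk(child, visit, depth + 1);
  }
}

// Roughly `x = a + 1;` as a tiny expression tree (illustrative only).
const tree = {
  kind: "assign",
  children: [
    { kind: "var x" },
    { kind: "plus", children: [{ kind: "var a" }, { kind: "const 1" }] },
  ],
};

// Record node kinds in visiting order, indented by depth.
const lines = [];
walk(tree, (n, d) => lines.push("  ".repeat(d) + n.kind));
console.log(lines.join("\n"));
```

Every security check then becomes a `visit` callback that inspects one node and decides whether to report it.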
Dehydra is still in development, but the developers have been extremely responsive to feature requests from security auditors (well, mine anyway… *grin*).
It would be great to see a bunch of people contribute scripts and build a large collection of security-scanning scripts to replace the venerable pattern-matching Flawfinder as the king of no-budget, security-oriented static analysis.
Try it out and get back to me.
Setup and Installation Instructions for Dehydra on Linux or OSX
I’ve included a sample Dehydra script below that logs a message whenever it sees an assignment that mixes signed and unsigned types.
The full sample script, along with a test file, is available here.
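```javascript
// Flag assignments (initialized declarations) that mix signed and unsigned types.
function assignVisitor(node) {
  if (!node.statements) return;
  var loc = node.loc;
  for (var i = 0; i < node.statements.length; i++) {
    var lhs = node.statements[i].type;    // type of the variable being assigned
    var rhs = node.statements[i].assign;  // initializer expression(s), if any
    if (!rhs || !lhs) continue;
    if (lhs.unsigned) {
      // A negative constant stored into an unsigned variable wraps around.
      if (parseInt(rhs[0].value) < 0) {
        print("ASSIGN: negative to unsigned at: " + loc + "\n");
      } else if (rhs[0].type && !rhs[0].type.unsigned) {
        print("ASSIGN: signed to unsigned at: " + loc + "\n");
      }
    } else if (rhs[0].type && rhs[0].type.unsigned) {
      // lhs is signed, rhs is unsigned
      print("ASSIGN: unsigned to signed at: " + loc + "\n");
    }
  }
}

// Dehydra calls process_function once for each function body it compiles.
function process_function(decl, body) {
  // iter() is not defined in this excerpt; presumably it comes from the
  // full sample script and applies the visitor to each node in body.
  iter(assignVisitor, body);
}
```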