[Owasp-leaders] OWASP/WASC SAST Criteria

John Steven John.Steven at owasp.org
Tue Jan 25 09:44:51 EST 2011


All,

I wade into this topic gingerly. Notwithstanding, I'm bound to raise
some hackles. I apologize in advance--I'm desperately trying to be
helpful. As a note: architects of NIST's SAMATE project have made
some of the same points I make below, in different forms.

[Huge Disclaimer]
Those of you who know me understand I've been working on
building/tuning/customizing/scaling static analysis off and on for 13
years now. To be clear, I've recommended client organizations adopt
tools produced by Klocwork, Coverity, (formerly) Ounce, (formerly)
Fortify, and finally Veracode's SaaS. I've got experience with
Parasoft's suite as well. I've recommended use of freely available
tools I've used (Valgrind, FxCop, CAT.NET, PMD, FindBugs), sometimes
in place of commercial alternatives. I've got (albeit dated) knowledge
of Microsoft's internal tool suite and of research and code-slicing
tools such as CodeSurfer, and I remember SecureSoftware's CodeAssure
with an emotionally damaged longing. I have participated in/led the
research and construction of five static analysis tools internally at
Cigital.

[Goal]
Organizations waste a ton of money selecting, piloting, and
implementing static analysis tools. In certain circumstances, I've
seen tool vendors under-estimate the cost of deployment by $2.5MM in
the first year (*1). This awe-inspiring waste (both in absolute terms
and as a percentage of tool license cost) validates the potential
benefit of a project such as the one suggested here.

[Difficulties]
1) Tool A will perform dramatically differently from Tool B on the
same code base, despite claiming similar/identical language/rule
support in a particular context (*2). Specifics of tool performance
rely on esoteric details such as:
   * Engine + rule implementation (only fully discernible through
proprietary knowledge);
   * Structure/style of code (only discernible through a wealth of
experience), including:
        * Calling convention
        * Order-of-operations
        * Depth of call-chain, nesting of block/scope
        * Use of inheritance, polymorphism, pointers, and other
indirection (see the sketch after this list)
2) Tool A will perform very differently from -THE SAME- Tool A
depending on the same criteria listed above; <groan>
3) Tool A's configuration may dramatically change performance (amongst
the criteria above). Settings affecting memory usage, scan parameters,
and other tool-specific options often swing findings by tens of
percentage points;
4) Operator use of tools has an overwhelming effect on Tool A's
performance (*3);
5) Tool A's minor release N+1 may perform dramatically differently
than release N under a fixed set of criteria.
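
To make the structural sensitivity in #1 concrete, here is a minimal,
hypothetical C++ sketch (not drawn from any vendor's test suite): the
same unbounded strcpy expressed directly, through a helper call, and
behind virtual dispatch. Engines that agree on the first form tend to
diverge on the latter two, depending on how deep their interprocedural
and points-to analyses reach.

    #include <cstring>

    void direct(const char *input) {
        char buf[16];
        std::strcpy(buf, input);     // direct: most engines flag this
    }

    void helper(char *dst, const char *src) {
        std::strcpy(dst, src);       // the sink, one call removed
    }

    void one_level(const char *input) {
        char buf[16];
        helper(buf, input);          // needs interprocedural analysis
    }

    struct Writer {
        virtual void write(char *dst, const char *src) = 0;
        virtual ~Writer() = default;
    };

    struct UnsafeWriter : Writer {
        void write(char *dst, const char *src) override {
            std::strcpy(dst, src);   // the sink, behind virtual dispatch
        }
    };

    void polymorphic(Writer &w, const char *input) {
        char buf[16];
        w.write(buf, input);         // needs call-graph/points-to
    }                                // resolution to reach the sink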

[Conclusions]
Difficulty #1 means that high-quality comparison is extremely
difficult to achieve. Difficulties #2 & #3 mean that high-quality test
set-up for even a single tool is extremely difficult to achieve.

Difficulty #4 indicates, to me, that finding qualified people to
attack this problem is nigh-on impossible. Difficulty #5 warns me
that publishing results (though not suggested here, a potentially
seductive 'oral tradition' amongst test maintainers) may be very
misleading.

These difficulties also imply that any test criteria will:

1) Produce very inconsistent results if implemented by different
adopting organizations;
2) Produce very inconsistent results that are sensitive to when
(relative to releases) the tools are evaluated;
3) Be very sensitive, should OWASP/WASC implement such an evaluation
suite, to the suite's test-case implementation.

These conclusions purposefully omit the knock-on effects of publishing
a static (read: unchanging) test suite's implementation, such as
vendor design-for-test scenarios (*4).

[A Path Forward]
I _strongly_ suggest that:

* Those participating in this effort open channels to the NIST folk
who have walked this road before. They've considered some of these
difficulties, potential biases, and so forth. They, IMO, worked hard
to correct what they could, within their confines.

* Any proposed criteria focus on helping adopting organizations decide
whether a tool will help raise their code's quality, rather than on
judging a Tool A vs. a Tool B (perhaps a subtle but important
distinction).

* Separate evaluation criteria be defined for A) implementing,
operationally deploying, and scaling a tool, and B) the tool's
analysis capabilities (*5).

* Data Cigital collects indicate that the bulk of the cost/benefit in
a static analysis implementation program will be borne out of
customization (though the bulk of the industry's experience remains in
implementing the base tool). This must become a third criteria column,
C) (see the previous bullet), as the criteria-defining effort matures.
Unfortunately, I've yet to find someone outside my own staff who's
spent more than 1,000 hours on this problem.

The conclusions I've drawn from this thought-exercise, which I
probably conduct yearly ;-), have led me away from attempting what is
suggested here. I'm not necessarily suggesting that OWASP/WASC abandon
their pursuit--but tread carefully. And perhaps all this difficulty
invigorates potential participants who might think, "If this is hard
for organizations to do, it's IDEAL as an OWASP task to help them
climb the hill." Maybe, but maybe the elevation is too high, the wind
too strong, and the mountain too steep to summit.

This Sherpa, I think, is gonna stay at home while others attempt this journey.
----
John Steven
Senior Director; Advanced Technology Consulting
Desk: 703.404.9293 x1204 Cell: 703.727.4034
Key fingerprint = 4772 F7F3 1019 4668 62AD  94B0 AE7F EEF4 62D5 F908

Blog: http://www.cigital.com/justiceleague
Papers: http://www.cigital.com/papers/jsteven
http://www.cigital.com
Software Confidence. Achieved.

(*1) - Actual client data. The referenced number represents an average
of six clients' expenditure on internal time (loaded cost) and
external time (consulting dollars) to meet previously stated goals for
tool implementation (i.e., "on-board 100 apps with a single scan per
quarter").

(*2) - The ability to detect buffer overflows in out-of-range char[]
access in C++, for example: a very specific vulnerability type, yet
very different results.
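
As a minimal, hypothetical illustration of that class of flaw (not a
vendor test case): an out-of-range char[] write whose bound depends on
a loop condition rather than a constant. Whether a given engine
reports it often hinges on how it models loop bounds and array sizes,
hence the very different results.

    #include <cstddef>

    // Copies src into a fixed 8-byte buffer with no bounds check;
    // writes past dst[7] whenever src holds 8 or more characters.
    void copy_fixed(const char *src) {
        char dst[8];
        for (std::size_t i = 0; src[i] != '\0'; ++i) {
            dst[i] = src[i];         // out-of-range write when i >= 8
        }
    }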

(*3) - Our tool operators have witnessed both a tool vendor's own
staff and very senior AppSec personnel, including members of this
list, mis-configure tools for a scan. In both cases, Cigital operators
have produced result sets in which _a majority_ of findings were
added, removed, or changed. I've observed Cigital tool operators
suffer the same limitation. Indeed, last year (2010), the person I
consider my most experienced operator of a particular tool did not
know about a configuration option that changed a scan's results
dramatically (for the better).

(*4) - Anyone do bake-offs between two tools in the 2004-2006 time
frame? Boy, was it fun to use OWASP code as the test suite. It was
_immediately_ evident which tool vendor(s) had 'designed for test'.

(*5) - Tool vendors have become accustomed to discussing their
"engine's" capabilities but vary dramatically in their ability to
understand and articulate how their suite of offerings fits within a
software security program and a vulnerability management lifecycle.





On Tue, Jan 25, 2011 at 8:14 AM, Tom Brennan <tomb at owasp.org> wrote:
> Excellent, we need more of these types of efforts globally, Jim.  Ideally we would want to have an OWASP project set up to track it and give it global visibility.
>
>
> -----Original Message-----
> From: Ryan Barnett <ryan.barnett at owasp.org>
> Sender: owasp-leaders-bounces at lists.owasp.org
> Date: Tue, 25 Jan 2011 08:08:13
> To: owasp-leaders at lists.owasp.org<owasp-leaders at lists.owasp.org>
> Reply-To: owasp-leaders at lists.owasp.org
> Subject: Re: [Owasp-leaders] OWASP/WASC SAST Criteria
>
> So this will be like WASSEC but for SAST instead of DAST?  Sounds good to me.
>
> --
> Ryan Barnett
>
>
> On Jan 25, 2011, at 7:57 AM, "Jim Manico" <jim.manico at owasp.org> wrote:
>
>> Hello all,
>>
>> I'm working with the folks at WASC to define a SAST (static analysis) tool evaluation criteria and benchmark suite. This is not an actual tool study - just a project to set up public evaluation criteria.
>>
>> I think this is a marvelous way for OWASP and WASC to collaborate.
>>
>> If you are interested in participating (and have significant expertise in this area) please contact me off list.
>>
>> OWASP Board : Are you ok with this project? I think everyone involved wants this to be objective criteria. We could certainly help...
>>
>> -Jim Manico
>> http://manico.net

