[Owasp-board] Working toward a resolution on the Contrast Security / OWASP Benchmark fiasco

Eoin Keary eoin.keary at owasp.org
Mon Nov 30 11:17:03 UTC 2015


I don't believe vendors should lead any project. 

Contribute? Yes. Lead? No.

This goes for all projects and should help with independence and objectivity. 


Eoin Keary
OWASP Volunteer
@eoinkeary



> On 28 Nov 2015, at 11:39 p.m., Kevin W. Wall <kevin.w.wall at gmail.com> wrote:
> 
> Until very recently, I've been following at a distance this dispute between
> various OWASP members and Contrast Security over the latter's advertising
> references to the OWASP Benchmark Project.
> 
> While I too believe that mistakes were made, I think we all need to take a
> step back and not throw the baby out with the bath water.
> 
> Unlike Johanna, I have not run the OWASP Benchmark against any given SAST or
> DAST tool, but having used many such commercial tools, I feel qualified to
> offer a reasoned opinion of the OWASP Benchmark Project, and to suggest some
> steps that we can take toward an amicable resolution.
> 
> Let me start with the OWASP Benchmark Project. I find the idea of having an
> extensive baseline of tests against which we can gauge the effectiveness of
> SAST and DAST software quite sound. In a way, these tests are analogous to
> the unit tests that we, as developers, use to find bugs in our code and help
> us improve it, except that here the false positives and false negatives that
> are revealed serve as the PASS / FAIL criteria for the tests. Just as in
> unit testing, where the ideal is to have extensive tests to broaden one's
> "test coverage" of the software under test, the Benchmark Project strives
> to have a broad set of tests to assist in revealing deficiencies (with
> the goal of removing these "defects") in various SAST and DAST tools.
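> 
> To make the analogy concrete, here is a minimal sketch (not the Benchmark's
> actual scoring code; the class and field names are purely illustrative) of
> how each test case carries its own ground truth, so that a tool's verdict
> can be graded the way a unit test grades production code:
> 
>    // Sketch only: each Benchmark-style test case knows whether it is really
>    // vulnerable, and a tool "passes" when it agrees with that ground truth.
>    enum Outcome { TRUE_POSITIVE, FALSE_NEGATIVE, FALSE_POSITIVE, TRUE_NEGATIVE }
> 
>    final class TestCase {
>        final String name;              // e.g. "BenchmarkTest00042" (illustrative)
>        final String category;          // e.g. "sqli", "xss", "crypto"
>        final boolean reallyVulnerable; // ground truth baked into the test case
> 
>        TestCase(String name, String category, boolean reallyVulnerable) {
>            this.name = name;
>            this.category = category;
>            this.reallyVulnerable = reallyVulnerable;
>        }
> 
>        // Grade the tool's verdict against the ground truth.
>        Outcome grade(boolean toolFlaggedIt) {
>            if (reallyVulnerable) {
>                return toolFlaggedIt ? Outcome.TRUE_POSITIVE : Outcome.FALSE_NEGATIVE;
>            }
>            return toolFlaggedIt ? Outcome.FALSE_POSITIVE : Outcome.TRUE_NEGATIVE;
>        }
>    }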
> 
> This is all well and good, and I whole-heartedly applaud this effort.
> 
> However, I see several ways that this Benchmark Project fails. For one,
> we have no way to measure the "test coverage" of the vulnerabilities that
> the Benchmark Project claims to measure. There are (by figures that I've
> seen claimed) something like 21,000 different test cases. How do we, as
> AppSec people, know whether these 21k 'tests' provide "even" test coverage?
> For instance, it is not unreasonable to think that there may be heavy
> coverage of vulnerability classes whose tests are easy to create (e.g.,
> SQLi, buffer overflows, XSS) and much less emphasis on "test cases" for
> things like cryptographic weaknesses. (This would not be surprising in the
> least, since every SAST and DAST tool that I've ever used seems to excel in
> some areas and absolutely suck in others.)
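> 
> As a purely hypothetical illustration of the coverage question (the
> per-category counts below are made up, not actual Benchmark figures), a
> suite skewed toward the easy categories would let those categories dominate
> any overall result:
> 
>    import java.util.LinkedHashMap;
>    import java.util.Map;
> 
>    public class CoverageCheck {
>        public static void main(String[] args) {
>            // Hypothetical distribution of the ~21,000 test cases.
>            Map<String, Integer> testsPerCategory = new LinkedHashMap<>();
>            testsPerCategory.put("sqli",   8000);  // assumed, not a real count
>            testsPerCategory.put("xss",    9000);  // assumed
>            testsPerCategory.put("crypto",  300);  // assumed
>            testsPerCategory.put("csrf",    200);  // assumed
> 
>            int total = testsPerCategory.values().stream()
>                    .mapToInt(Integer::intValue).sum();
>            testsPerCategory.forEach((category, count) ->
>                System.out.printf("%-8s %5d tests (%4.1f%% of the suite)%n",
>                                  category, count, 100.0 * count / total));
>        }
>    }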
> 
> Another way that the Benchmark Project is lacking is one that is admitted
> on the Benchmark Project wiki page under the "Benchmark Validity" section:
>    The Benchmark tests are not exactly like real applications. The
>    tests are derived from coding patterns observed in real
>    applications, but the majority of them are considerably *simpler*
>    than real applications. That is, most real world applications will
>    be considerably harder to successfully analyse than the OWASP
>    Benchmark Test Suite. Although the tests are based on real code,
>    it is possible that some tests may have coding patterns that don't
>    occur frequently in real code.
> 
> A lot of tools are great at detecting data and control flows that are simple,
> but fail completely when facing "real code" that uses complex MVC frameworks
> like the Spring Framework or Apache Struts. The bottom line is that we need
> realistic tests. We can be fairly certain that a SAST or DAST tool that
> misses the low bar of one of the existing Benchmark Project test cases will
> also miss the equivalent flaw in the wild; but if a tool is able to _pass_
> those tests, that still says *absolutely nothing* about its ability to detect
> vulnerabilities in real-world code, where the code is often orders of
> magnitude more complex. (And I would argue that this is one reason we see
> such high false positive rates for SAST and DAST tools; rather than err on
> the side of false negatives, they flag "issues" about which they are
> generally unreliable and then rely on appsec analysts to discern which are
> real and which are red herrings. This is still easier than if appsec
> engineers had to hunt down these potential issues manually and then analyze
> them, so it is not entirely inappropriate. As long as the tool provides some
> sort of "confidence" indicator for the various issues that it finds, an
> analyst can readily decide whether they are worth the effort of further
> investigation.)
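> 
> To illustrate that triage step, here is a short sketch that assumes only
> that the tool reports some per-finding confidence value; the types and
> field names are invented for the example:
> 
>    import java.util.Comparator;
>    import java.util.List;
> 
>    // A tool-reported finding with an (assumed) confidence between 0 and 1.
>    record Finding(String id, String category, double confidence) {}
> 
>    class Triage {
>        // Surface high-confidence findings first; park the rest for later.
>        static List<Finding> worthImmediateReview(List<Finding> findings,
>                                                  double threshold) {
>            return findings.stream()
>                    .filter(f -> f.confidence() >= threshold)
>                    .sorted(Comparator.comparingDouble(Finding::confidence)
>                                      .reversed())
>                    .toList();
>        }
>    }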
> 
> This brings me to what I see as the third major area where the Benchmark
> Project is lacking. In striving to be simple, it attempts to distill all the
> findings into a single metric. The nicest thing I can think of to say about
> this is that it is woefully naive and misguided. I think where it is
> misguided is that it assumes that every IT organization in every company
> weights everything equally. For instance, it treats false positives and
> false negatives as _equally_ bad. In reality, however, most organizations in
> which I've been involved in AppSec would strongly prefer false positives
> over false negatives. Likewise, all categories (e.g., buffer overflows, heap
> corruption, SQLi, XSS, CSRF, etc.) are weighted equally. Every appsec
> engineer knows that this is generally unrealistic; indeed, it is _one_
> reason that we have different risk ratings for different findings. Also, if
> a company writes all of its applications in "safe" programming languages
> like C# or Java, then categories like buffer overflows or heap corruption
> disappear entirely. That means those companies don't care at all whether a
> given SAST or DAST tool can find those categories of vulnerabilities,
> because they are completely irrelevant to them. However, because there is
> no way to customize the weighting of the Benchmark Project findings when it
> is run against a given tool, everything is shoe-horned into a single magical
> figure. The result is that that magical Benchmark Project figure becomes
> almost meaningless. At best, its meaning is very subjective and not at all
> as objective as Contrast's advertising is attempting to lead people to
> believe.
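> 
> To make the weighting point concrete, here is a sketch, using entirely
> hypothetical rates and weights rather than the Benchmark's actual scoring
> formula, that contrasts a one-size-fits-all average with an
> organization-specific score that discounts irrelevant categories and
> penalizes false negatives more heavily than false positives:
> 
>    import java.util.Map;
> 
>    public class WeightedScore {
>        // (true-positive rate, false-positive rate) per category -- made up.
>        record Rates(double tpr, double fpr) {}
> 
>        public static void main(String[] args) {
>            Map<String, Rates> toolResults = Map.of(
>                "sqli",           new Rates(0.90, 0.30),
>                "xss",            new Rates(0.85, 0.25),
>                "crypto",         new Rates(0.40, 0.10),
>                "bufferOverflow", new Rates(0.20, 0.05)); // moot for a Java/C# shop
> 
>            // One-size-fits-all view: every category and error type counts equally.
>            double uniform = toolResults.values().stream()
>                    .mapToDouble(r -> r.tpr() - r.fpr())
>                    .average().orElse(0);
> 
>            // Organization-specific view: weight categories by how much they
>            // matter here (0 = irrelevant) and make a missed real bug cost
>            // twice as much as time wasted on a red herring.
>            Map<String, Double> categoryWeight = Map.of(
>                "sqli", 1.0, "xss", 1.0, "crypto", 0.8, "bufferOverflow", 0.0);
>            double fnPenalty = 2.0;
>            double fpPenalty = 1.0;
> 
>            double weightedSum = 0, weightTotal = 0;
>            for (Map.Entry<String, Rates> e : toolResults.entrySet()) {
>                double w = categoryWeight.get(e.getKey());
>                Rates r = e.getValue();
>                double score = 1.0 - fnPenalty * (1.0 - r.tpr()) - fpPenalty * r.fpr();
>                weightedSum += w * score;
>                weightTotal += w;
>            }
>            double weighted = weightedSum / weightTotal;
> 
>            System.out.printf("uniform: %.2f   weighted: %.2f%n", uniform, weighted);
>        }
>    }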
> 
> I believe that the general reaction to all of this has been negative, at
> least based on the comments that I've read, not only on the OWASP mailing
> lists but also on Twitter. In the end, this will damage either OWASP's
> overall reputation or, at the very least, the reputation of the OWASP
> Benchmark Project, either of which I think most of us would agree is bad
> for the appsec community in general.
> 
> Therefore, I have a simple proposal toward resolution. I would appeal to
> the OWASP project leaders, and through them to the OWASP Board, to mark the
> OWASP Benchmark Project wiki page (and ideally, its GitHub site) as noting
> that the findings are being disputed. For the wiki page, we could do this
> in the manner that Wikipedia marks disputes, using a Template:Disputed tag
> (see https://en.wikipedia.org/wiki/Template:Disputed_tag) or their
> "Accuracy Disputes" process (for example, see
> https://en.wikipedia.org/wiki/Wikipedia:Accuracy_dispute
> and https://en.wikipedia.org/wiki/Category:Accuracy_disputes).
> 
> At a minimum, this tag should result in rendering something like:
>    "The use and accuracy of this page is currently being disputed.
>    OWASP does not support any vendor endorsing any of their
>    software according to the scores resulting from execution of
>    the OWASP Benchmark."
> and the OWASP Board should apply it (so that no one is permitted to
> remove it without proper authorization).
> 
> I will leave the exact wording up to the Board. But just as Wikipedia does
> with disputed pages, OWASP must take action on this, or I think it is likely
> to have credibility issues in the future.
> 
> Thank you for listening,
> -kevin wall
> -- 
> Blog: http://off-the-wall-security.blogspot.com/
> NSA: All your crypto bit are belong to us.
> _______________________________________________
> Owasp-board mailing list
> Owasp-board at lists.owasp.org
> https://lists.owasp.org/mailman/listinfo/owasp-board