[Owasp-board] Working toward a resolution on the Contrast Security / OWASP Benchmark fiasco

Josh Sokol josh.sokol at owasp.org
Mon Nov 30 17:15:25 UTC 2015


I tend to agree with this.  It seems that having a vendor lead a project
leads to questions about their ability to remain objective.  That said, how
do we qualify who is and is not a "vendor"?  Anyone who sells something?
Do you have to sell a security product?  What about a security service?
Does it matter if my project has nothing to do with what I "sell"?  I would
bet that many of our project leaders currently work for vendors in the
security space and would have to be removed under such a rule.  What happens
if we remove the vendor and nobody wants to step up and take over the
project?  Just some questions I've been pondering around this.

~josh

On Mon, Nov 30, 2015 at 5:17 AM, Eoin Keary <eoin.keary at owasp.org> wrote:

> I don't believe vendors should lead any project.
>
> Contribute? yes, Lead? No.
>
> This goes for all projects and shall help with independence and
> objectivity.
>
>
> Eoin Keary
> OWASP Volunteer
> @eoinkeary
>
>
>
> On 28 Nov 2015, at 11:39 p.m., Kevin W. Wall <kevin.w.wall at gmail.com>
> wrote:
>
> Until very recently, I have been following from a distance the dispute
> between various OWASP members and Contrast Security over the latter's
> advertising references to the OWASP Benchmark Project.
> While I too believe that mistakes were made, I think we all need to take a
> step back and not throw the baby out with the bath water.
>
> Unlike Johanna, I have not run the OWASP Benchmark against any given SAST
> or DAST tool, but having used many such commercial tools, I feel qualified
> to offer a reasoned opinion of the OWASP Benchmark Project, and to suggest
> some steps that we can take toward an amicable resolution.
>
> Let me start with the OWASP Benchmark Project. I find the idea of having an
> extensive baseline of tests against which we can gauge the effectiveness of
> SAST and DAST software quite sound. In a way, these tests are analogous to
> the unit tests that we, as developers, use to find bugs in our code and
> help us improve it, except that here the false positives and false
> negatives revealed by each tool serve as the PASS / FAIL criteria for the
> tests. Just as in unit testing, where the ideal is to have extensive tests
> to broaden one's "test coverage" of the software under test, the Benchmark
> Project strives to have a broad set of tests to help reveal deficiencies
> (with the goal of removing these "defects") in various SAST and DAST tools.
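>
> (As a rough sketch of what I mean, and not the Benchmark's actual scorecard
> code, something like the following Python captures that PASS / FAIL idea:
> each test case carries its own ground truth, and a tool's verdict on it is
> bucketed as a true or false positive or negative.)
>
>     from dataclasses import dataclass
>
>     @dataclass
>     class BenchmarkCase:
>         name: str                # illustrative name only
>         truly_vulnerable: bool   # ground truth baked into the test case
>         tool_flagged: bool       # did the SAST/DAST tool report a finding?
>
>     def bucket(case: BenchmarkCase) -> str:
>         if case.truly_vulnerable:
>             return "TP" if case.tool_flagged else "FN"  # caught vs. missed a real flaw
>         return "FP" if case.tool_flagged else "TN"      # false alarm vs. correctly quiet
>
>     cases = [BenchmarkCase("sqli-001", True, True),
>              BenchmarkCase("xss-042", False, True)]
>     print([bucket(c) for c in cases])   # ['TP', 'FP']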
>
> This is all well and good, and I whole-heartedly applaud this effort.
>
> However, I see several ways in which the Benchmark Project falls short. For
> one, we have no way to measure the "test coverage" of the vulnerabilities
> that the Benchmark Project claims to measure. There are (by figures that
> I've seen claimed) something like 21,000 different test cases. How do we,
> as AppSec people, know whether these 21k 'tests' provide "even" test
> coverage? For instance, it is not unreasonable to think that there may be
> heavy coverage of tests that are easy to create (e.g., SQLi, buffer
> overflows, XSS) and much less emphasis on "test cases" for things like
> cryptographic weaknesses. (This would not be surprising in the least, since
> every SAST and DAST tool that I've ever used seems to excel in some areas
> and absolutely suck in others.)
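>
> (To make that "evenness" concern concrete: a tally like the sketch below,
> run over however the test cases are actually categorized, would show
> whether the 21k tests cluster in a few easy categories. The category names
> and counts here are invented for illustration.)
>
>     from collections import Counter
>
>     # Hypothetical category tags; the real Benchmark organizes its tests
>     # differently, so treat these purely as placeholders.
>     test_categories = (["sqli"] * 9000 + ["xss"] * 8000 +
>                        ["pathtraver"] * 3500 + ["crypto"] * 500)
>
>     counts = Counter(test_categories)
>     total = sum(counts.values())
>     for category, n in counts.most_common():
>         print(f"{category:12s} {n:6d}  ({100.0 * n / total:.1f}% of tests)")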
>
> Another way that the Benchmark Project is lacking is one that is admitted
> on the Benchmark Project wiki page under the "Benchmark Validity" section:
>    The Benchmark tests are not exactly like real applications. The
>    tests are derived from coding patterns observed in real
>    applications, but the majority of them are considerably *simpler*
>    than real applications. That is, most real world applications will
>    be considerably harder to successfully analyse than the OWASP
>    Benchmark Test Suite. Although the tests are based on real code,
>    it is possible that some tests may have coding patterns that don't
>    occur frequently in real code.
>
> A lot of tools are great at detecting data and control flows that are
> simple, but fail completely when facing "real code" that uses complex MVC
> frameworks like Spring Framework or Apache Struts. The bottom line is that
> we need realistic tests. While we can be fairly certain that something is
> wrong if a SAST or DAST tool misses the low bar set by the existing
> Benchmark Project test cases, a tool that is able to _pass_ those tests has
> still told us *absolutely nothing* about its ability to detect
> vulnerabilities in real-world code, where the code is often orders of
> magnitude more complex. (And I would argue that this is one reason we see
> the false positive rate so high for SAST and DAST tools; rather than err on
> the side of false negatives, they flag "issues" about which they are
> generally uncertain and then rely on appsec analysts to discern which are
> real and which are red herrings. This is still easier than if appsec
> engineers had to hunt down these potential issues manually and then analyze
> them, so it is not entirely inappropriate. As long as the tool provides
> some sort of "confidence" indicator for the various issues that it finds,
> an analyst can easily decide whether they are worth spending effort to
> investigate further.)
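>
> (For example, a simple triage filter like the sketch below, assuming a tool
> that exposes some per-finding confidence value, lets an analyst look at the
> high-confidence findings first and defer the rest; none of the field names
> here come from any particular tool.)
>
>     findings = [
>         {"issue": "SQL injection in login()", "confidence": 0.95},
>         {"issue": "Reflected XSS in search()", "confidence": 0.40},
>         {"issue": "Weak hash in password reset", "confidence": 0.75},
>     ]
>
>     THRESHOLD = 0.8   # cutoff chosen by the analyst, not by the tool
>     review_now = [f for f in findings if f["confidence"] >= THRESHOLD]
>     review_later = sorted((f for f in findings if f["confidence"] < THRESHOLD),
>                           key=lambda f: f["confidence"], reverse=True)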
>
> This brings me to what I see as the third major area where the Benchmark
> Project is lacking. In striving to be simple, it attempts to distill all
> the findings into a single metric. The nicest thing I can think of to say
> about this is that it is woefully naive and misguided. Where it is
> misguided is that it assumes that every IT organization in every company
> weights everything equally. For instance, it treats false positives and
> false negatives as _equally_ bad. In reality, most organizations where I've
> been involved in AppSec would strongly prefer false positives over false
> negatives. Likewise, all vulnerability categories (e.g., buffer overflows,
> heap corruption, SQLi, XSS, CSRF, etc.) are weighted equally. Every appsec
> engineer knows that this is generally unrealistic; indeed, it is _one_
> reason that we have different risk ratings for different findings. Also, if
> a company writes all of its applications in "safe" programming languages
> like C# or Java, then categories like buffer overflows or heap corruption
> disappear entirely. That means those companies don't care at all whether a
> given SAST or DAST tool can find those categories of vulnerabilities,
> because they are completely irrelevant to them. However, because there is
> no way to customize the weighting of Benchmark Project findings when it is
> run against a given tool, everything gets shoe-horned into a single magical
> figure. The result is that this magical Benchmark Project figure becomes
> almost meaningless. At best, its meaning is very subjective and not at all
> as objective as Contrast's advertising is attempting to lead people to
> believe.
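>
> (To illustrate why a single figure hides so much, here is a sketch in
> Python. The per-category numbers are invented, and the "single score" below
> assumes a TPR-minus-FPR averaged-over-categories style of metric, which is
> my simplification rather than the Benchmark's published formula. A Java or
> C# shop that zeroes out the memory-corruption category gets a noticeably
> different answer from the exact same tool results.)
>
>     # Invented (category -> (true positive rate, false positive rate))
>     # results for one hypothetical tool.
>     per_category = {
>         "sqli":            (0.90, 0.10),
>         "xss":             (0.80, 0.30),
>         "buffer_overflow": (0.95, 0.05),
>     }
>
>     # "One magic number": unweighted average of TPR - FPR over categories.
>     single_score = sum(tpr - fpr for tpr, fpr in per_category.values()) / len(per_category)
>
>     # A shop writing only Java/C# might weight buffer overflows at zero.
>     weights = {"sqli": 1.0, "xss": 1.0, "buffer_overflow": 0.0}
>     weighted_score = sum(weights[cat] * (tpr - fpr)
>                          for cat, (tpr, fpr) in per_category.items()) / sum(weights.values())
>
>     print(round(single_score, 2), round(weighted_score, 2))   # 0.73 vs. 0.65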
>
> I believe that the general reaction to all of this has been negative, at
> least based on the comments that I've read not only on the OWASP mailing
> lists but also on Twitter. In the end, this will damage either OWASP's
> overall reputation or, at the very least, the reputation of the OWASP
> Benchmark Project, either of which I think most of us would agree is
> bad for the appsec community in general.
>
> Therefore, I have a simple proposal toward resolution. I would appeal to
> the OWASP project leaders to ask the OWASP Board to mark the OWASP
> Benchmark Project wiki page (and ideally, its GitHub site) to note that
> its findings are being disputed. For the wiki page, we could do this in
> the same manner that Wikipedia marks disputes, using a Template:Disputed
> tag (see https://en.wikipedia.org/wiki/Template:Disputed_tag) or their
> "Accuracy Disputes" (for example, see
> https://en.wikipedia.org/wiki/Wikipedia:Accuracy_dispute
> and https://en.wikipedia.org/wiki/Category:Accuracy_disputes).
>
> At a minimum, we should have this tag render something like:
>    "The use and accuracy of this page is currently being disputed.
>    OWASP does not support any vendor endorsing their software
>    based on scores resulting from execution of the OWASP Benchmark."
> and the OWASP Board should be the one to apply it (so that no one is
> permitted to remove it without proper authorization).
>
> I will leave the exact wording up to the board. But just as with disputed
> pages on Wikipedia, OWASP must take action on this, or I think it is
> likely to have credibility issues in the future.
>
> Thank you for listening,
> -kevin wall
> --
> Blog: http://off-the-wall-security.blogspot.com/
> NSA: All your crypto bit are belong to us.
> _______________________________________________
> Owasp-board mailing list
> Owasp-board at lists.owasp.org
> https://lists.owasp.org/mailman/listinfo/owasp-board
>