[Owasp-leaders] [Owasp-board] Working toward a resolution on the Contrast Security / OWASP Benchmark fiasco

Tobias Glemser tobias.glemser at owasp.org
Tue Dec 1 17:07:12 UTC 2015


> Does this mean a vendor could never lead this kind of project and we lose all
> the merits of the benchmark? I think if the vendor could also get two other
> independent project leaders who aren't from the same vendor, then maybe
> that would work.
We did this for a German OWASP paper years ago (https://www.owasp.org/index.php/Best_Practice:_Projektierung_der_Sicherheitspr%C3%BCfung_von_Webanwendungen). It describes how one should plan a web pentest and what a good pentester should deliver. If I had written it on my own, it would of course have been biased, even with the best intentions. But we had five vendors and one "customer" working on that paper. For me, vendors on projects are not a bad thing per se. It has to be transparent, and it must not be just one vendor on a vendor-focused project. 

Tobias

> -----Original Message-----
> From: owasp-leaders-bounces at lists.owasp.org [mailto:owasp-leaders-
> bounces at lists.owasp.org] On Behalf Of Michael Coates
> Sent: Tuesday, 1 December 2015 16:21
> To: psiinon
> Cc: OWASP Board; OWASP Leaders
> Subject: Re: [Owasp-leaders] [Owasp-board] Working toward a resolution on
> the Contrast Security / OWASP Benchmark fiasco
> 
> I think that's a critical point, Simon, and I tend to agree. In a situation where
> a project compares and critiques tools/approaches/etc., we need
> independence in the project leadership.
> 
> Does this mean a vendor could never lead this kind of project and we lose all
> the merits of the benchmark? I think if the vendor could also get two other
> independent project leaders who aren't from the same vendor, then maybe
> that would work.
> 
> Thoughts?
> 
> On Tuesday, December 1, 2015, psiinon <psiinon at gmail.com> wrote:
> 
> 
> 	I actually disagree.
> 
> 	I'm fine with vendors leading most types of projects - we should be
> encouraging more vendor involvement / sponsorship.
> 
> 	But I now don't think it's a good idea for any vendor to lead a project
> which is designed to evaluate competing commercial and open source
> projects.
> 
> 
> 	Cheers,
> 
> 
> 	Simon
> 
> 
> 	On Mon, Nov 30, 2015 at 11:17 AM, Eoin Keary
> <eoin.keary at owasp.org> wrote:
> 
> 
> 		I don't believe vendors should lead any project.
> 
> 		Contribute? Yes. Lead? No.
> 
> 		This goes for all projects and shall help with independence
> and objectivity.
> 
> 
> 
> 		Eoin Keary
> 		OWASP Volunteer
> 		@eoinkeary
> 
> 
> 
> 
> 		On 28 Nov 2015, at 11:39 p.m., Kevin W. Wall
> <kevin.w.wall at gmail.com> wrote:
> 
> 
> 
> 			Until very recently, I've been following at a distance this
> 			dispute between various OWASP members and Contrast Security
> 			over the latter's advertising references to the OWASP
> 			Benchmark Project.
> 
> 			While I too believe that mistakes were made, I believe that we
> 			all need to take a step back and not throw out the baby with
> 			the bath water.
> 
> 			Unlike Johanna, I have not run the OWASP Benchmark against any
> 			given SAST or DAST tool, but having used many such commercial
> 			tools, I feel qualified to offer a reasoned opinion of the
> 			OWASP Benchmark Project, and perhaps to suggest some steps
> 			that we can take towards an amicable resolution.
> 
> 			Let me start with the OWASP Benchmark Project. I find the idea
> 			of having an extensive baseline of tests against which we can
> 			gauge the effectiveness of SAST and DAST software quite sound.
> 			In a way, these tests are analogous to the unit tests that we,
> 			as developers, use to find bugs in our code and help us
> 			improve it; here, the false positives and false negatives
> 			revealed are used as the PASS / FAIL criteria for the tests.
> 			Just as in unit testing, where the ideal is to have extensive
> 			tests to broaden one's "test coverage" of the software under
> 			test, the Benchmark Project strives to have a broad set of
> 			tests to assist in revealing deficiencies (with the goal of
> 			removing these "defects") in various SAST and DAST tools.
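> 
> 			To make the analogy concrete, here is a rough Python sketch
> 			(purely illustrative; this is not code from the Benchmark
> 			itself) of how each test case outcome reduces to a
> 			pass / fail bucket:
> 
> 			    # Illustrative only -- not the Benchmark's actual scoring code.
> 			    def classify(has_real_vuln: bool, tool_flagged_it: bool) -> str:
> 			        # Map one Benchmark-style test case result to a bucket.
> 			        if has_real_vuln and tool_flagged_it:
> 			            return "TP"   # real issue, reported
> 			        if has_real_vuln and not tool_flagged_it:
> 			            return "FN"   # real issue, missed
> 			        if not has_real_vuln and tool_flagged_it:
> 			            return "FP"   # safe code, flagged anyway
> 			        return "TN"       # safe code, correctly not flagged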
> 
> 			This is all well and good, and I whole-heartedly
> applaud this effort.
> 
> 			However, I see several ways that this Benchmark Project fails.
> 			For one, we have no way to measure the "test coverage" of the
> 			vulnerabilities that the Benchmark Project claims to measure.
> 			There are (by figures that I've seen claimed) something like
> 			21,000 different test cases. How do we, as AppSec people, know
> 			if these 21k 'tests' provide "even" test coverage? For
> 			instance, it is not unreasonable to think that there may be
> 			heavy coverage of tests that are easy to create (e.g., SQLi,
> 			buffer overflows, XSS) and a much lesser emphasis on "test
> 			cases" for things like cryptographic weaknesses. (This would
> 			not be surprising in the least, since the coverage of every
> 			SAST and DAST tool that I've ever used seems to excel in some
> 			areas and absolutely suck in others.)
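> 
> 			One rough way to sanity-check this would be to tally the test
> 			cases per category. A minimal sketch in Python, assuming the
> 			test metadata can be exported to a CSV with a "category"
> 			column (the file name and layout here are assumptions, not
> 			the project's actual format):
> 
> 			    import csv
> 			    from collections import Counter
> 
> 			    # Tally Benchmark test cases per vulnerability category.
> 			    # The CSV name and its "category" column are assumptions.
> 			    counts = Counter()
> 			    with open("benchmark_test_metadata.csv", newline="") as f:
> 			        for row in csv.DictReader(f):
> 			            counts[row["category"]] += 1
> 
> 			    total = sum(counts.values())
> 			    for category, n in counts.most_common():
> 			        print(f"{category:20s} {n:6d} ({100.0 * n / total:5.1f}%)")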
> 
> 			Another way that the Benchmark Project is lacking is one that
> 			is admitted on the Benchmark Project wiki page under the
> 			"Benchmark Validity" section:
> 			   The Benchmark tests are not exactly like real
> 			   applications. The tests are derived from coding patterns
> 			   observed in real applications, but the majority of them
> 			   are considerably *simpler* than real applications. That
> 			   is, most real world applications will be considerably
> 			   harder to successfully analyse than the OWASP Benchmark
> 			   Test Suite. Although the tests are based on real code, it
> 			   is possible that some tests may have coding patterns that
> 			   don't occur frequently in real code.
> 
> 			A lot of tools are great at detecting data and control flows
> 			that are simple, but fail completely when facing "real code"
> 			that uses complex MVC frameworks like Spring Framework or
> 			Apache Struts. The bottom line is that we need realistic
> 			tests. While we can be fairly certain that a SAST or DAST
> 			tool that misses the low bar of the existing Benchmark
> 			Project test cases will fare no better on real applications,
> 			even if a tool is able to _pass_ those tests, that still says
> 			*absolutely nothing* about its ability to detect
> 			vulnerabilities in real world code, where the code is often
> 			orders of magnitude more complex. (And I would argue that
> 			this is one reason we see the false positive rate so high for
> 			SAST and DAST tools; rather than err on the side of false
> 			negatives, they flag "issues" about which they are generally
> 			unsure and then rely on appsec analysts to discern which are
> 			real and which are red herrings. This is still easier than if
> 			appsec engineers had to hunt down these potential issues
> 			manually and then analyze them, so it is not entirely
> 			inappropriate. As long as the tool provides some sort of
> 			"confidence" indicator for the various issues that it finds,
> 			an analyst can easily decide whether they are worth the
> 			effort of further investigation.)
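> 
> 			For illustration, a minimal triage sketch in Python (the
> 			finding structure and the threshold are hypothetical, not any
> 			particular tool's API):
> 
> 			    from dataclasses import dataclass
> 
> 			    @dataclass
> 			    class Finding:
> 			        rule: str
> 			        location: str
> 			        confidence: float  # 0.0 .. 1.0, as reported by the tool
> 
> 			    def worth_reviewing(findings, threshold=0.6):
> 			        # Keep findings at or above the threshold, highest first.
> 			        return sorted(
> 			            (f for f in findings if f.confidence >= threshold),
> 			            key=lambda f: f.confidence,
> 			            reverse=True,
> 			        )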
> 
> 			This brings me to what I see as the third major area where the
> 			Benchmark Project is lacking. In striving to be simple, it
> 			attempts to distill all the findings into a single metric. The
> 			nicest thing I can think of saying about this is that it is
> 			woefully naive and misguided. Where it is misguided is that it
> 			assumes that every IT organization in every company weights
> 			everything equally. For instance, it treats false positives
> 			and false negatives as _equally_ bad. In reality, however,
> 			most organizations where I've been involved in AppSec would
> 			strongly prefer false positives over false negatives.
> 			Likewise, all categories (e.g., buffer overflows, heap
> 			corruption, SQLi, XSS, CSRF, etc.) are weighted equally. Every
> 			appsec engineer knows that this is generally unrealistic;
> 			indeed, it is _one_ reason that we have different risk ratings
> 			for different findings. Also, if a company writes all of its
> 			applications in "safe" programming languages like C# or Java,
> 			then categories like buffer overflows or heap corruption
> 			disappear completely. That means those companies don't care
> 			at all whether a given SAST or DAST tool can find those
> 			categories of vulnerabilities, because they are completely
> 			irrelevant to them. However, because there is no way to
> 			customize the weighting of Benchmark Project findings when
> 			run for a given tool, everything is shoe-horned into a single
> 			magical figure. The result is that this magical Benchmark
> 			Project figure becomes almost meaningless. At best, its
> 			meaning is very subjective and not nearly as objective as
> 			Contrast's advertising would lead people to believe.
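> 
> 			To illustrate what organization-specific weighting might look
> 			like, a rough Python sketch (the categories, weights, and
> 			per-category rates are made up; this is not the Benchmark's
> 			actual scoring):
> 
> 			    # Made-up per-category results: (true positive rate, false positive rate).
> 			    per_category = {
> 			        "sqli":   (0.90, 0.15),
> 			        "xss":    (0.80, 0.25),
> 			        "crypto": (0.40, 0.05),
> 			    }
> 			    # How much this particular organization cares about each category.
> 			    weights = {"sqli": 3.0, "xss": 2.0, "crypto": 1.0}
> 			    fp_penalty = 0.5  # this org prefers false positives over false negatives
> 
> 			    score = sum(
> 			        weights[c] * (tpr - fp_penalty * fpr)
> 			        for c, (tpr, fpr) in per_category.items()
> 			    ) / sum(weights.values())
> 
> 			    print(f"weighted score: {score:.2f}")  # shifts as the org changes its weights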
> 
> 			I believe that the general reaction to all of this has been
> 			negative, at least based on the comments that I've read not
> 			only on the OWASP mailing lists, but also on Twitter. In the
> 			end, this will be damaging either to OWASP's overall
> 			reputation or, at the very least, to the reputation of the
> 			OWASP Benchmark Project, either of which I think most of us
> 			would agree is bad for the appsec community in general.
> 
> 			Therefore, I have a simple proposal towards resolution. I
> 			would appeal to the OWASP project leaders to appeal to the
> 			OWASP Board to simply mark the OWASP Benchmark Project wiki
> 			page (and ideally, its GitHub site) to note that the findings
> 			are being disputed. For the wiki page, we could do this in
> 			the manner that Wikipedia marks disputes, using a
> 			Template:Disputed tag (see
> 			https://en.wikipedia.org/wiki/Template:Disputed_tag) or their
> 			"Accuracy disputes" (for example, see
> 			https://en.wikipedia.org/wiki/Wikipedia:Accuracy_dispute and
> 			https://en.wikipedia.org/wiki/Category:Accuracy_disputes).
> 
> 			At a minimum, we should have this tag result in rendering
> 			something like:
> 			   "The use and accuracy of this page is currently being
> 			   disputed. OWASP does not support any vendor endorsing any
> 			   of their software according to the scores resulting from
> 			   execution of the OWASP Benchmark."
> 			The OWASP Board itself should apply the tag (so that no one
> 			is permitted to remove it without proper authorization).
> 
> 			I will leave the exact wording up to the board. But just as
> 			with disputed pages on Wikipedia, OWASP must take action on
> 			this, or I think it is likely to have credibility issues in
> 			the future.
> 
> 			Thank you for listening,
> 			-kevin wall
> 			--
> 			Blog: http://off-the-wall-security.blogspot.com/
> 			NSA: All your crypto bit are belong to us.
> 
> 	_______________________________________________
> 			Owasp-board mailing list
> 			Owasp-board at lists.owasp.org
> 			https://lists.owasp.org/mailman/listinfo/owasp-board
> 
> 
> 
> 		_______________________________________________
> 		Owasp-board mailing list
> 		Owasp-board at lists.owasp.org
> 		https://lists.owasp.org/mailman/listinfo/owasp-board
> 
> 
> 
> 
> 
> 
> 	--
> 
> 	OWASP ZAP <https://www.owasp.org/index.php/ZAP> Project leader
> 
> 
> 
> 
> --
> Michael Coates | @_mwc
> 
> OWASP Global Board
> 
> 
> 
> 
> 




More information about the OWASP-Leaders mailing list