[OWASP-ESAPI] Canonicalize now passes all test cases!

Jeff Williams jeff.williams at owasp.org
Wed Nov 26 20:52:23 EST 2008



The canonicalization engine in ESAPI now passes all the test cases!  You'll
need to sync to SVN to get the code until we can get a jar file built and
released. While it was a lot of work to get to this point, the
implementation of both the canonicalize() method and all the codecs is very


Everyone says <http://cwe.mitre.org/data/definitions/180.html>  you
shouldn't do validation without canonicalizing the data first. I said it for
many years.  Well, it's easier said than done. I don't know of any other
library that actually attempts this - please let me know if you do so we can
compare results.


ESAPI.encoder().canonicalize() now handles:

- All major application layer encoding/escaping schemes
    o CSS Escaping
    o HTMLEntity Encoding
    o JavaScript Escaping
    o MySQL Escaping
    o Oracle Escaping
    o Percent Encoding (aka URL Encoding)
    o Unix Escaping
    o VBScript Escaping
    o Windows Encoding
- Perverse but legal variants of escaping schemes
- Multiple escaping (%2526 or &#x26;lt;)
- Mixed escaping (%26lt;)
- Nested escaping (%%316 or &%6ct;)
- All combinations of multiple, mixed, and nested encoding/escaping
  (%2&#x35;3c or &#x2526gt;)
- (NOTE: Canonicalize does not currently handle Unicode encoding)
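
To make the multiple, mixed, and nested cases concrete, here is a minimal sketch of the general "decode until nothing changes" idea. It is not ESAPI's implementation: the class and method names are hypothetical, it only knows percent encoding and a small subset of HTML entities (hex numeric plus lt, gt, amp, quot), and it treats each %XX as a single character. It will not handle every perverse variant listed above.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the ESAPI codec implementation.
class CanonicalizeSketch {
    private static final Pattern PERCENT = Pattern.compile("%([0-9a-fA-F]{2})");
    private static final Pattern ENTITY =
        Pattern.compile("&(?:#[xX]([0-9a-fA-F]{1,6})|(lt|gt|amp|quot));");

    // Replaces every match of p in 'in' with f(match); replacements are not rescanned.
    static String replaceEach(Pattern p, String in, Function<Matcher, String> f) {
        Matcher m = p.matcher(in);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(in, last, m.start());
            out.append(f.apply(m));
            last = m.end();
        }
        out.append(in, last, in.length());
        return out.toString();
    }

    // One pass of percent decoding (simplification: one %XX becomes one char).
    static String decodePercent(String in) {
        return replaceEach(PERCENT, in,
            m -> String.valueOf((char) Integer.parseInt(m.group(1), 16)));
    }

    // One pass of HTML entity decoding for the small subset this sketch knows.
    static String decodeEntity(String in) {
        return replaceEach(ENTITY, in, m -> {
            if (m.group(1) != null) {
                int cp = Integer.parseInt(m.group(1), 16);
                // Leave out-of-range code points untouched.
                return Character.isValidCodePoint(cp)
                    ? new String(Character.toChars(cp)) : m.group();
            }
            switch (m.group(2)) {
                case "lt":  return "<";
                case "gt":  return ">";
                case "amp": return "&";
                default:    return "\"";
            }
        });
    }

    // Applies every decoder repeatedly until a full pass changes nothing.
    static String canonicalize(String in) {
        String current = in;
        while (true) {
            String next = decodeEntity(decodePercent(current));
            if (next.equals(current)) {
                return current;
            }
            current = next;
        }
    }

    public static void main(String[] args) {
        String[] samples = {"%2526", "&#x26;lt;", "%26lt;", "&%6ct;", "%2&#x35;3c"};
        for (String s : samples) {
            System.out.println(s + " -> " + canonicalize(s));
        }
    }
}
```

The loop terminates because every successful decode makes the string strictly shorter.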


ESAPI's canonicalizer couldn't be simpler to use. The default is just:


    ESAPI.encoder().canonicalize( request.getParameter("input"));


You need to decode untrusted data so that it's safe for ANY downstream
interpreter or decoder.  For example, if your data goes into a Windows
command shell, then into a database, and then to a browser, you're going to
need to decode for all of those systems. You can build a custom encoder to
canonicalize for your application like this:


    // one codec for each downstream interpreter the data will reach
    ArrayList list = new ArrayList();
    list.add( new WindowsCodec() );
    list.add( new MySQLCodec() );
    list.add( new PercentCodec() );
    Encoder encoder = new DefaultEncoder( list );
    encoder.canonicalize( request.getParameter( "input" ));


In ESAPI, the Validator uses the canonicalize method before it does
validation.  So all you need to do is to validate as normal and you'll be
protected against a host of encoded attacks.


    ESAPI.validator().isValidInput( "test", input, "FirstName", 20, false);


However, the default canonicalize() method only decodes HTMLEntity, percent
(URL) encoding, and JavaScript encoding. If you'd like to use a custom
canonicalizer with your validator, that's pretty easy too.


    // ... set up custom encoder as above
    Validator validator = new DefaultValidator( encoder );
    validator.isValidInput( "test", input, "name", 20, false);
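
To see why the Validator canonicalizes before it validates, consider a minimal sketch that runs the same check against the raw input and against the decoded input. Everything here is hypothetical and stands in for the real thing: the class name, the hand-rolled percent decoder, and the path traversal check are illustrations, not ESAPI code or ESAPI validation rules.

```java
// Hypothetical illustration only; not ESAPI's Validator.
class ValidateOrderSketch {

    // One pass of percent decoding (simplification: one %XX becomes one char).
    static String decodePercentOnce(String in) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == '%' && i + 2 < in.length()) {
                int hi = Character.digit(in.charAt(i + 1), 16);
                int lo = Character.digit(in.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.append((char) (hi * 16 + lo));
                    i += 2; // skip the two hex digits just consumed
                    continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    // Decodes repeatedly until the string stops changing.
    static String canonicalize(String in) {
        String current = in;
        String next = decodePercentOnce(current);
        while (!next.equals(current)) {
            current = next;
            next = decodePercentOnce(current);
        }
        return current;
    }

    // Stand-in validation rule: reject anything containing "../".
    static boolean hasTraversal(String path) {
        return path.contains("../");
    }

    // Validates the raw input, so encoded attacks slip through.
    static boolean naiveIsValid(String raw) {
        return !hasTraversal(raw);
    }

    // Validates the canonical form, so the rule sees what an interpreter would see.
    static boolean canonicalIsValid(String raw) {
        return !hasTraversal(canonicalize(raw));
    }

    public static void main(String[] args) {
        String attack = "%2e%2e%2fetc%2fpasswd";
        System.out.println("naive check accepts:     " + naiveIsValid(attack));
        System.out.println("canonical check accepts: " + canonicalIsValid(attack));
    }
}
```

The naive check accepts the encoded attack because the string it inspects never contains "../"; the same rule applied to the canonical form rejects it.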


Although ESAPI is able to canonicalize multiple, mixed, or nested encoding,
it's safer to not accept this stuff in the first place.  In ESAPI, the
default is "strict" mode that throws an IntrusionException if it receives
anything not single-encoded with a single scheme.  Currently this is not
configurable in ESAPI.properties, but it probably should be.  Even if you
disable "strict" mode, you'll still get warning messages in the log about
each multiple encoding and mixed encoding received.


    // disabling strict mode to allow mixed encoding
    ESAPI.encoder().canonicalize( request.getParameter("url"), false);
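
For anyone curious how this kind of detection can work, here is a minimal sketch of one possible approach: count how many decoding passes changed the input and how many distinct codecs were involved. This is an assumption about the general technique, not ESAPI's actual logic. The class name, the two toy decoders (percent and two-digit hex entities only), and the use of IllegalArgumentException as a stand-in for IntrusionException are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not ESAPI's strict-mode implementation.
class StrictModeSketch {
    private static final Pattern PERCENT = Pattern.compile("%([0-9a-fA-F]{2})");
    private static final Pattern HEX_ENTITY = Pattern.compile("&#[xX]([0-9a-fA-F]{2});");

    // Toy codecs, applied in insertion order on every pass.
    static final Map<String, UnaryOperator<String>> CODECS = new LinkedHashMap<>();
    static {
        CODECS.put("percent", s -> decode(PERCENT, s));
        CODECS.put("html", s -> decode(HEX_ENTITY, s));
    }

    // Replaces each match with the character whose hex code is in group 1.
    static String decode(Pattern p, String in) {
        Matcher m = p.matcher(in);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(in, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        return out.append(in, last, in.length()).toString();
    }

    static String canonicalize(String input, boolean strict) {
        String working = input;
        int passesWithChange = 0;                     // more than 1 means multiple encoding
        Set<String> codecsUsed = new LinkedHashSet<>(); // more than 1 means mixed encoding
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, UnaryOperator<String>> e : CODECS.entrySet()) {
                String decoded = e.getValue().apply(working);
                if (!decoded.equals(working)) {
                    codecsUsed.add(e.getKey());
                    working = decoded;
                    changed = true;
                }
            }
            if (changed) {
                passesWithChange++;
            }
        }
        if (passesWithChange > 1 || codecsUsed.size() > 1) {
            String msg = "Suspicious encoding: passes=" + passesWithChange
                + ", codecs=" + codecsUsed;
            if (strict) {
                throw new IllegalArgumentException(msg); // stand-in for IntrusionException
            }
            System.err.println("WARNING: " + msg);       // stand-in for a log warning
        }
        return working;
    }
}
```

In this sketch, single-encoded input such as %3Cb%3E passes in strict mode, while %2526 (multiple) and %26#x3c; (mixed) are rejected in strict mode and only logged otherwise.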


If anyone is building a validation engine, you might want to look at the
Encoder test cases. There are many examples of multiple, mixed, and nested




There is still some performance work to do here for anyone who'd like to
pitch in.  I'd really like to get a decent performance testing setup in




