[Esapi-user] HTMLEntityCodec and optional semicolon for namedentities

Jeff Williams jeff.williams at aspectsecurity.com
Wed Oct 19 14:23:43 EDT 2011


This is an excellent question that came up a few years ago, actually.

First, whether or not the spec requires a ; at the end of an entity is sort of irrelevant.  The fact is that browsers do actually interpret entities that lack the terminating semicolon.  So we need to treat those non-terminated entities as potential injection ingredients.

The bigger question, IMO, is to figure out what the "canonical" form of the input you sent actually is.  The problem is that if the URL interpreter handles this first, then the & will be used to separate the URL parameters, and there are no HTML entities.  Alternatively, if the HTML interpreter handles this input first, it will see the &super and decide to decode the special character.

This behavior is not necessarily standard, as different browsers may have made different choices.  If you want to change the way that the canonicalizer in ESAPI works, you can create your own encoder with a list of codecs.  The order of the codecs should match the order of the interpreters and/or decoders that you expect the data to visit.
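
To make the ordering point concrete, here is a minimal sketch in plain Java. It does not use the ESAPI API: the class DecoderOrderDemo, the toy htmlDecode helper (which knows only the "supe" entity), and the toy parseQuery helper are all hypothetical names invented for illustration. It only shows that the same query string yields different parameters depending on which interpreter sees it first.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; not part of ESAPI.
class DecoderOrderDemo {

    // Toy HTML decoder (assumption): knows a single named entity and accepts
    // it with or without the trailing semicolon.
    static String htmlDecode(String s) {
        return s.replace("&supe;", "\u2287").replace("&supe", "\u2287");
    }

    // Toy URL query parser (assumption): splits on '&' and on the first '='
    // of each pair; it does no percent-decoding.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                params.put(pair, "");
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String query = "foo=bar&super=great&baz=qux";

        // URL interpreter first: '&' is only a separator, so there are
        // three parameters (foo, super, baz) and no entity is ever seen.
        System.out.println(parseQuery(query));

        // HTML interpreter first: "&supe" is decoded before the split, so
        // "super" disappears and only two parameters (foo, baz) remain.
        System.out.println(parseQuery(htmlDecode(query)));
    }
}
```

Neither result is "the" canonical form by itself; which one is right depends on the order in which the real interpreters will process the data.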

There are many other possible examples of data that can be canonicalized two different ways given the rich encoding environment created for us in modern web application petri dishes.  One that bit me recently was that "c:\file" decoded into "c:[0x0f]ile" -- beware multiple encoding schemes!!

Personally, I recommend not using URL parameters that start with HTML entity names.  This is unfortunate, but I don't see a great way around it.

--Jeff

Jeff Williams, CEO
Aspect Security
410-707-1487


-----Original Message-----
From: esapi-user-bounces at lists.owasp.org [mailto:esapi-user-bounces at lists.owasp.org] On Behalf Of Olivier Jaquemet
Sent: Wednesday, October 19, 2011 9:18 AM
To: esapi-user at lists.owasp.org
Subject: [Esapi-user] HTMLEntityCodec and optional semicolon for namedentities

Hello all,

I have a question regarding a questionable behavior of the HTMLEntityCodec in ESAPI 2.0.1 (which I think is a bug).

Let's say a user sends the following value for a URL input:
http://www.example.com/someservlet?foo=bar&super=great&baz=qux

I use the HTTPURL validator (from the default ESAPI.properties) to make sure the input is appropriate:
String validatedUrl = org.owasp.esapi.ESAPI.validator().getValidInput(
        "userURL", request.getParameter("userURL"), "HTTPURL", 2000, true);

As expected, a "canonicalization" occurs and the HTMLEntityCodec is applied to decode characters.
Problem: the start of the parameter "&super=great" is treated as the HTML entity "&supe" (⊇), and the URL is canonicalized this way:
http://www.example.com/someservlet?foo=bar⊇r=great&baz=qux
(Unicode code point U+2287, described as "superset of or equal to", appears after "bar" instead of the expected parameter name.)
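
The result above is what a lenient, longest-match entity decoder would produce. The following is a minimal sketch of that idea, not the actual HTMLEntityCodec source: the class LenientEntityDecoder, its method names, and its five-entry entity table are assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical sketch; the real codec knows the full HTML entity list.
class LenientEntityDecoder {

    // Tiny illustrative table (assumption).
    static final Map<String, String> ENTITIES = Map.of(
            "amp", "&",
            "lt", "<",
            "gt", ">",
            "sup", "\u2283",   // superset of
            "supe", "\u2287"); // superset of or equal to

    // Decodes named entities whether or not they end with ';'.
    static String decode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            String name = (c == '&') ? longestEntityAt(input, i + 1) : null;
            if (name == null) {
                out.append(c); // not an entity: copy the character as-is
                i++;
                continue;
            }
            out.append(ENTITIES.get(name));
            i += 1 + name.length(); // skip '&' and the entity name
            if (i < input.length() && input.charAt(i) == ';') {
                i++; // the semicolon is optional, consume it if present
            }
        }
        return out.toString();
    }

    // Returns the longest known entity name starting at 'start', or null.
    static String longestEntityAt(String s, int start) {
        String best = null;
        for (String name : ENTITIES.keySet()) {
            if (s.startsWith(name, start)
                    && (best == null || name.length() > best.length())) {
                best = name;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // "&supe" is matched inside "&super"; "&baz" matches nothing.
        System.out.println(decode("foo=bar&super=great&baz=qux"));
    }
}
```

With this sketch, "&super" loses its first five characters to the "supe" entity, while "&baz" is left alone because no entity name in the table is a prefix of "baz".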

The JavaDoc of the HTMLEntityCodec clearly states that this behavior is intended:
> Formats all are legal both with and without semi-colon, upper/lower case
(cf. http://owasp-esapi-java.googlecode.com/svn/trunk_doc/latest/org/owasp/esapi/codecs/HTMLEntityCodec.html#decodeCharacter%28org.owasp.esapi.codecs.PushbackString%29 )

However, the Wikipedia article "List of XML and HTML character entity references" (cf. http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references ) states:
> The semicolon is required.

That is not entirely true. Indeed, if we dive into the HTML specification for character references ( http://www.w3.org/TR/REC-html40/charset.html#entities ), we can read:
> In SGML, it is possible to eliminate the final ";" after a character reference in some cases (e.g., at a line break or immediately before a tag). In other circumstances it may not be eliminated (e.g., in the middle of a word). We strongly suggest using the ";" in all cases to avoid problems with user agents that require this character to be present.

And I think this is the problem with the HTMLEntityCodec:
> In other circumstances it may not be eliminated (e.g., in the middle of a word).

In my use case "&super=great", the entity "&supe" is part of another word and must not be decoded.
I think the HTMLEntityCodec should be modified to apply this behavior; otherwise it leads to invalid data being retrieved.
But as I am no security expert... What do you think?
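
To illustrate the change proposed here, the following is a minimal sketch of a decoder that only decodes a named entity when the whole run of letters and digits after "&" is a known entity name. It is not ESAPI code: the class BoundaryAwareEntityDecoder and its three-entry table are hypothetical. As the reply at the top of this thread points out, browsers may still decode such input, so a rule like this could canonicalize differently from what a browser does.

```java
import java.util.Map;

// Hypothetical sketch of the proposed "not in the middle of a word" rule.
class BoundaryAwareEntityDecoder {

    // Tiny illustrative table (assumption).
    static final Map<String, String> ENTITIES = Map.of(
            "amp", "&",
            "sup", "\u2283",
            "supe", "\u2287");

    static String decode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            if (input.charAt(i) == '&') {
                // Take the full alphanumeric run after '&' as the candidate name.
                int end = i + 1;
                while (end < input.length()
                        && Character.isLetterOrDigit(input.charAt(end))) {
                    end++;
                }
                String replacement = ENTITIES.get(input.substring(i + 1, end));
                if (replacement != null) {
                    out.append(replacement);
                    // Consume the optional terminating semicolon.
                    boolean hasSemicolon =
                            end < input.length() && input.charAt(end) == ';';
                    i = hasSemicolon ? end + 1 : end;
                    continue;
                }
            }
            out.append(input.charAt(i)); // anything else is copied unchanged
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "super" is not an entity name, so "&super" is left untouched.
        System.out.println(decode("foo=bar&super=great&baz=qux"));
    }
}
```

Note that a parameter whose name is exactly an entity name (for example "&amp=1") would still be decoded under this rule, so it narrows the problem rather than removing it.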

Regards,
Olivier Jaquemet

