[Esapi-user] HTMLEntityCodec and optional semicolon for named entities

Jeffrey Walton noloader at gmail.com
Wed Oct 19 14:44:32 EDT 2011

On Wed, Oct 19, 2011 at 2:23 PM, Jeff Williams
<jeff.williams at aspectsecurity.com> wrote:
> This is an excellent question that came up a few years ago, actually.
> First, whether or not the spec requires a ; at the end of an entity is sort of irrelevant.  The fact is that browsers do actually interpret those characters.  So we need to treat those non-terminated entities as potential injection ingredients.
> The bigger question, IMO, is to figure out what the "canonical" form of the input you sent actually is.  The problem is that if the URL interpreter handles this first, then the & will be used to separate the URL parameters, and there are no HTML entities.  Alternatively, if the HTML interpreter handles this input first, it will see the &super and decide to decode the special character.
> This behavior is not necessarily standard, as different browsers may have made different choices.  If you want to change the way that the canonicalizer in ESAPI works, you can create your own  encoder with a list of codecs.  The order of the codecs should match the order of the interpreters and/or decoders that you expect the data to visit.
On the surface, it seems to me that the best way to handle this is via
a UserAgent string, and not an ordered list of expected browsers.
Given a user agent, a factory class would cough up an appropriate
codec or possibly throw an exception. If an exception is too drastic,
perhaps a default codec could be returned.

Things can get tricky since the user agent string is controlled by the
adversary. But it should work as expected for the common case.
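
To make the factory idea concrete, here is a rough sketch in plain Java. It
does not use the real ESAPI classes; EntityDecoder, forUserAgent, the
five-entry entity table, and the "ExampleStrictBrowser" token are all
hypothetical names made up for illustration. It assumes the safe default for
an unknown or missing user agent is the lenient decoder, so unterminated
entities are still treated as potential injection ingredients.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CodecFactorySketch {
    // Tiny illustrative entity table; a real codec knows many more names.
    static final Map<String, String> ENTITIES = Map.of(
            "lt", "<", "gt", ">", "amp", "&", "quot", "\"", "sup", "\u2283");

    // Strict policy: the trailing semicolon is required.
    static final Pattern STRICT = Pattern.compile("&(lt|gt|amp|quot|sup);");
    // Lenient policy: the trailing semicolon is optional.
    static final Pattern LENIENT = Pattern.compile("&(lt|gt|amp|quot|sup);?");

    /** Hypothetical stand-in for a codec: decodes named entities under one policy. */
    static final class EntityDecoder {
        final String name;
        private final Pattern pattern;

        EntityDecoder(String name, Pattern pattern) {
            this.name = name;
            this.pattern = pattern;
        }

        String decode(String input) {
            // Replace each matched entity with its character from the table.
            return pattern.matcher(input).replaceAll(
                    r -> Matcher.quoteReplacement(ENTITIES.get(r.group(1))));
        }
    }

    static final EntityDecoder STRICT_DECODER = new EntityDecoder("strict", STRICT);
    static final EntityDecoder LENIENT_DECODER = new EntityDecoder("lenient", LENIENT);

    /**
     * Picks a decoder from the User-Agent header. The header is
     * attacker-controlled, so anything unrecognized (or null) falls back to
     * the lenient decoder instead of throwing.
     */
    static EntityDecoder forUserAgent(String userAgent) {
        if (userAgent != null && userAgent.contains("ExampleStrictBrowser")) {
            return STRICT_DECODER; // made-up browser token, for illustration only
        }
        return LENIENT_DECODER;
    }

    public static void main(String[] args) {
        String input = "a&super=1";
        System.out.println(forUserAgent("ExampleStrictBrowser/1.0").decode(input));
        System.out.println(forUserAgent(null).decode(input));
    }
}
```

Since the adversary picks the user agent string, falling back to the more
permissive decoder seems safer to me than trusting a claim of strictness.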

> There are many other possible examples of data that can be canonicalized two different ways given the rich encoding environment created for us in modern web application petri dishes.  One that bit me recently was that "c:\file" decoded into "c:[0x0c]ile" -- beware multiple encoding schemes!!
> Personally, I recommend not using URL parameters that start with HTML entity names.  This is unfortunate, but I don't see a great way around it.
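
The ordering problem Jeff describes can be shown with a small sketch in plain
Java. Nothing here is ESAPI API; splitParams, lenientHtmlDecode, and
unescapeFormFeed are hypothetical helpers that handle only the single cases
discussed in this thread (the &sup entity with an optional semicolon, and the
\f escape). The same input yields different results depending on which
interpreter sees it first.

```java
import java.util.Arrays;
import java.util.List;

final class DecodeOrderSketch {
    // URL interpreter: '&' separates query parameters.
    static List<String> splitParams(String query) {
        return Arrays.asList(query.split("&"));
    }

    // HTML interpreter (lenient): decodes &sup with or without the semicolon.
    static String lenientHtmlDecode(String s) {
        return s.replaceAll("&sup;?", "\u2283");
    }

    // Backslash-escape interpreter: handles only \f (form feed) here.
    static String unescapeFormFeed(String s) {
        return s.replace("\\f", "\f");
    }

    public static void main(String[] args) {
        String query = "x=1&super=2";

        // URL interpreter first: two parameters, and no entity is left to decode.
        System.out.println(splitParams(query));

        // HTML interpreter first: "&sup" is consumed, so only one parameter remains.
        System.out.println(splitParams(lenientHtmlDecode(query)));

        // A Windows path fed to a backslash-escape decoder loses its "\f".
        String decoded = unescapeFormFeed("c:\\file");
        System.out.println(decoded.length()); // one char shorter than "c:\file"
    }
}
```

Whichever order is chosen, it only helps if it matches the order in which the
real interpreters downstream will see the data.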


> -----Original Message-----
> From: esapi-user-bounces at lists.owasp.org [mailto:esapi-user-bounces at lists.owasp.org] On Behalf Of Olivier Jaquemet
> Sent: Wednesday, October 19, 2011 9:18 AM
> To: esapi-user at lists.owasp.org
> Subject: [Esapi-user] HTMLEntityCodec and optional semicolon for named entities
> Hello all,
> I have a question regarding a debatable behavior of the HTMLEntityCodec in ESAPI 2.0.1 (which I think is a bug).
> [SNIP]
