[Esapi-user] HTMLEntityCodec and optional semicolon for named entities

Olivier Jaquemet olivier.jaquemet at jalios.com
Wed Oct 19 15:34:26 EDT 2011

Thank you Jim, I will have a look at this.

On 19/10/2011 20:45, Jim Manico wrote:
> So when you are trying to validate and display a complete URL from 
> untrusted input, I would suggest the following workflow:
> 1) (on input) When first accepting the URL from the user, validate 
> that URL with the Apache commons URL validation class or something 
> similar:
> http://commons.apache.org/validator/apidocs/org/apache/commons/validator/routines/UrlValidator.html 
> * If the URL does not pass then reject the input.
> PS: If you use ESAPI's validator for this validation then I recommend 
> you turn off canonicalization for that one input.
> 2) (on input) Make sure the URL starts with http:// or https:// or 
> otherwise reject the input (this rule will need to be modified for 
> your app most likely)
> * If the URL does not pass this rule then reject the input.
> 3) (on output) Render the URL using the normal encoder rules based on 
> context of display. If you are putting this URL in an HREF link so it 
> can be clicked, you need to do attribute encoding. If you are putting 
> this URL in body content for display only, then just do normal HTML 
> entity encoding.
> This workflow should solve your problem.
> Aloha,
> Jim
> PS: Keep in mind that this workflow never applies any kind of URL 
> encoding. If untrusted data lands in just a GET parameter, then URL 
> encode it. :)
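
A minimal sketch of the three steps above, in plain Java so that it runs without extra libraries. The class and method names are hypothetical. `java.net.URI` stands in for the Apache Commons `UrlValidator` check, and the hand-written escaper stands in for an attribute encoder such as ESAPI's `encodeForHTMLAttribute`; a real application would use those libraries rather than this simplified code.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UrlWorkflowSketch {

    // Steps 1 and 2 (on input): accept only well-formed absolute URLs
    // whose scheme is http or https. Everything else is rejected.
    static boolean isAcceptableUrl(String input) {
        if (input == null) {
            return false;
        }
        try {
            URI uri = new URI(input);
            String scheme = uri.getScheme();
            boolean schemeOk = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            return schemeOk && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Step 3 (on output): simplified escaping for a double-quoted HTML
    // attribute value. Assumption: a stand-in for a real attribute encoder.
    static String escapeForQuotedAttribute(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String url = "http://www.example.com/someservlet?foo=bar&super=great&baz=qux";
        if (isAcceptableUrl(url)) {
            // The URL is stored as received and only encoded at render time.
            System.out.println("<a href=\"" + escapeForQuotedAttribute(url) + "\">link</a>");
        } else {
            System.out.println("rejected");
        }
    }
}
```

Note that the URL is never canonicalized here, so "&super" is left alone on input and becomes "&amp;super" only when it is written into the attribute.
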
>> This is an excellent question that came up a few years ago, actually.
>> First, whether or not the spec requires a ; at the end of an entity 
>> is sort of irrelevant.  The fact is that browsers do actually 
>> interpret those characters.  So we need to treat those non-terminated 
>> entities as potential injection ingredients.
>> The bigger question, IMO, is to figure out what the "canonical" form 
>> of the input you sent actually is.  The problem is that if the URL 
>> interpreter handles this first, then the & will be used to separate 
>> the URL parameters, and there are no HTML entities.  Alternatively, 
>> if the HTML interpreter handles this input first, it will see 
>> the &super and decide to decode the special character.
>> This behavior is not necessarily standard, as different browsers may 
>> have made different choices.  If you want to change the way that the 
>> canonicalizer in ESAPI works, you can create your own  encoder with a 
>> list of codecs.  The order of the codecs should match the order of 
>> the interpreters and/or decoders that you expect the data to visit.
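
To illustrate why the order of the codecs matters, here is a small plain-Java sketch. It does not use ESAPI: the class name, the method names, and the toy entity decoder (three entities only) are assumptions made for the example. The same input is run through the same two decoders in both orders.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

class DecoderOrderSketch {

    // Percent-decoding as a URL interpreter would apply it.
    static String percentDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Toy HTML entity decoder (assumption: handles only three entities).
    static String toyEntityDecode(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    // Applies each decoder once, in the order given.
    static String decodeInOrder(String input, List<UnaryOperator<String>> decoders) {
        String current = input;
        for (UnaryOperator<String> decoder : decoders) {
            current = decoder.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        String input = "%26lt;script%26gt;";
        UnaryOperator<String> percent = DecoderOrderSketch::percentDecode;
        UnaryOperator<String> entity = DecoderOrderSketch::toyEntityDecode;

        // URL decoding first, then HTML: the entity is exposed and decoded.
        System.out.println(decodeInOrder(input, Arrays.asList(percent, entity)));
        // HTML first, then URL decoding: the entity survives as text.
        System.out.println(decodeInOrder(input, Arrays.asList(entity, percent)));
    }
}
```

With this input the first order yields "<script>" and the second yields "&lt;script&gt;", so the list should follow the order in which the real interpreters will see the data.
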
>> There are many other possible examples of data that can be 
>> canonicalized two different ways given the rich encoding environment 
>> created for us in modern web application petri dishes.  One that bit 
>> me recently was that "c:\file" decoded into "c:[0x0f]ile" -- beware 
>> multiple encoding schemes!!
>> Personally, I recommend not using URL parameters that start with HTML 
>> entity names.  This is unfortunate, but I don't see a great way 
>> around it.
>> --Jeff
>> Jeff Williams, CEO
>> Aspect Security
>> 410-707-1487
>> -----Original Message-----
>> From: esapi-user-bounces at lists.owasp.org 
>> [mailto:esapi-user-bounces at lists.owasp.org] On Behalf Of Olivier 
>> Jaquemet
>> Sent: Wednesday, October 19, 2011 9:18 AM
>> To: esapi-user at lists.owasp.org
>> Subject: [Esapi-user] HTMLEntityCodec and optional semicolon for 
>> named entities
>> Hello all,
>> I have a question regarding a questionable behavior of the 
>> HTMLEntityCodec in ESAPI 2.0.1 (which I think is a bug).
>> Let's say a user sends the following value for a URL input:
>> http://www.example.com/someservlet?foo=bar&super=great&baz=qux
>> I use the HTTPURL validator (in the default ESAPI.properties) to make 
>> sure the input is appropriate:
>> String validatedUrl =
>> org.owasp.esapi.ESAPI.validator().getValidInput("userURL",
>> request.getParameter("userURL"), "HTTPURL", 2000, true);
>> As expected, a "canonicalization" occurs and the HTMLEntityCodec is 
>> applied to decode characters.
>> Problem: the start of the parameter "&super=great" is treated as an 
>> HTML entity ("&supe", i.e. ⊇) and the URL is canonicalized this way:
>> http://www.example.com/someservlet?foo=bar⊇r=great&baz=qux
>> (Unicode code point U+2287, described as "superset of or equal to",
>> appears after "bar" instead of the expected parameter name)
>> It is clearly stated in the Javadoc of the HTMLEntityCodec that this 
>> behavior is applied:
>> > Formats all are legal both with and without semi-colon, 
>> > upper/lower case
>> cf
>> http://owasp-esapi-java.googlecode.com/svn/trunk_doc/latest/org/owasp/esapi/codecs/HTMLEntityCodec.html#decodeCharacter%28org.owasp.esapi.codecs.PushbackString%29 
>> However, the Wikipedia article "List of XML and HTML character 
>> entity references" (cf 
>> http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
>> ) states:
>> > The semicolon is required.
>> Which is not entirely true. Indeed, if we dive into the HTML 
>> specification for character references ( 
>> http://www.w3.org/TR/REC-html40/charset.html#entities ), we can read:
>> > In SGML, it is possible to eliminate the final ";" after a 
>> > character reference in some cases (e.g., at a line break or 
>> > immediately before a tag). In other circumstances it may not be 
>> > eliminated (e.g., in the middle of a word). We strongly suggest using 
>> > the ";" in all cases to avoid problems with user agents that require 
>> > this character to be present.
>> And I think this is the problem with the HTMLEntityCodec:
>> > In other circumstances it may not be eliminated (e.g., in the 
>> > middle of a word).
>> In my use case "&super=great", the entity "&supe" is part of another 
>> word and MUST NOT be decoded.
>> I think the HTMLEntityCodec should be modified to apply this 
>> behavior; otherwise it leads to invalid data being retrieved.
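
The difference between the current behavior and the proposed one can be sketched in plain Java. This is not ESAPI's implementation: the class name, the `allowUnterminatedInsideWord` flag, and the four-entry entity table are all assumptions made for the example. The lenient mode decodes the longest known entity name even without a semicolon; the stricter mode leaves an unterminated entity alone when the next character is a letter, a digit, or "=".

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EntityDecodeSketch {

    // Tiny stand-in for the real entity table (assumption: four entries only).
    static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("supe", "\u2287"); // superset of or equal to
        ENTITIES.put("sup", "\u2283");  // superset of
        ENTITIES.put("amp", "&");
        ENTITIES.put("lt", "<");
    }

    // Returns the longest known entity name starting at 'start', or null.
    static String longestEntityAt(String input, int start) {
        String best = null;
        for (String name : ENTITIES.keySet()) {
            if (input.startsWith(name, start)
                    && (best == null || name.length() > best.length())) {
                best = name;
            }
        }
        return best;
    }

    static String decode(String input, boolean allowUnterminatedInsideWord) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            String name = (c == '&') ? longestEntityAt(input, i + 1) : null;
            if (name == null) {
                out.append(c);
                i++;
                continue;
            }
            int end = i + 1 + name.length();
            boolean terminated = end < input.length() && input.charAt(end) == ';';
            boolean insideWord = !terminated && end < input.length()
                    && (Character.isLetterOrDigit(input.charAt(end))
                        || input.charAt(end) == '=');
            if (insideWord && !allowUnterminatedInsideWord) {
                // Stricter mode: keep the ampersand as a literal character.
                out.append(c);
                i++;
                continue;
            }
            out.append(ENTITIES.get(name));
            i = terminated ? end + 1 : end;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String url = "http://www.example.com/someservlet?foo=bar&super=great&baz=qux";
        System.out.println(decode(url, true));  // lenient: "&supe" is decoded
        System.out.println(decode(url, false)); // stricter: URL is unchanged
    }
}
```

In lenient mode the example URL comes out with U+2287 followed by "r=great", as in the canonicalized URL shown earlier in this message; in the stricter mode it is returned unchanged, while a terminated entity such as "&lt;" is still decoded.
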
>> But as I am no security expert... What do you think?
>> Regards,
>> Olivier Jaquemet
>> _______________________________________________
>> Esapi-user mailing list
>> Esapi-user at lists.owasp.org
>> https://lists.owasp.org/mailman/listinfo/esapi-user

Olivier Jaquemet <olivier.jaquemet at jalios.com>
Ingénieur R&D Jalios S.A. - http://www.jalios.com/
@OlivierJaquemet +33970461480
