[Esapi-user] HTMLEntityCodec and optional semicolon for namedentities
Olivier Jaquemet
olivier.jaquemet at jalios.com
Wed Oct 19 15:34:26 EDT 2011
Thank you Jim, I will have a look at this.
On 19/10/2011 20:45, Jim Manico wrote:
> So when you are trying to validate and display a complete URL from
> untrusted input, I would suggest the following workflow:
>
> 1) (on input) When first accepting the URL from the user, validate
> that URL with the Apache commons URL validation class or something
> similar:
> http://commons.apache.org/validator/apidocs/org/apache/commons/validator/routines/UrlValidator.html
>
> * If the URL does not pass then reject the input.
> PS: If you use ESAPI's validator for this validation then I recommend
> you turn off canonicalization for that one input.
>
> 2) (on input) Make sure the URL starts with http:// or https:// or
> otherwise reject the input (this rule will need to be modified for
> your app most likely)
> * If the URL does not pass this rule then reject the input.
>
> 3) (on output) Render the URL using the normal encoder rules based on
> context of display. If you are putting this URL in an HREF link so it
> can be clicked, you need to do attribute encoding. If you are putting
> this URL in body content for display only, then just do normal HTML
> entity encoding.
>
> This workflow should solve your problem.
>
> Aloha,
> Jim
>
> PS: Keep in mind that the workflow above never does any URL encoding.
> If untrusted data lands only in a GET parameter, then URL-encode it. :)
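
The three steps above can be sketched with the JDK alone. This is only an illustration under stated assumptions: java.net.URI parsing stands in for the Commons Validator UrlValidator mentioned in step 1, the hand-written encodeForHtmlAttribute helper stands in for a real output encoder, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UrlWorkflowSketch {

    // Steps 1 and 2 (on input): syntactic check plus scheme allow-list.
    // java.net.URI is only a stand-in for a stricter URL validator.
    static boolean isAcceptableUrl(String input) {
        if (input == null) {
            return false;
        }
        try {
            URI uri = new URI(input);
            String scheme = uri.getScheme();
            if (scheme == null || uri.getHost() == null) {
                return false;
            }
            return scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https");
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Step 3 (on output): hypothetical, deliberately small attribute encoder.
    // Every character outside [A-Za-z0-9] becomes a numeric entity.
    static String encodeForHtmlAttribute(String value) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            i += Character.charCount(cp);
            boolean alnum = (cp >= 'a' && cp <= 'z')
                    || (cp >= 'A' && cp <= 'Z')
                    || (cp >= '0' && cp <= '9');
            if (alnum) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String url = "http://www.example.com/someservlet?foo=bar&super=great&baz=qux";
        if (isAcceptableUrl(url)) {
            System.out.println("<a href=\"" + encodeForHtmlAttribute(url) + "\">link</a>");
        } else {
            System.out.println("rejected");
        }
    }
}
```

Because the "&" is written out as a numeric entity on output, the browser's HTML parser never sees a bare "&super" inside the attribute value.
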
>
>> This is an excellent question that came up a few years ago, actually.
>>
>> First, whether or not the spec requires a ; at the end of an entity
>> is sort of irrelevant. The fact is that browsers do actually
>> interpret those characters. So we need to treat those non-terminated
>> entities as potential injection ingredients.
>>
>> The bigger question, IMO, is to figure out what the "canonical" form
>> of the input you sent actually is. The problem is that if the URL
>> interpreter handles this first, then the & will be used to separate
>> the URL parameters, and there are no HTML entities. Alternatively,
>> if the HTML interpreter handles this input first, it will see
>> the &super and decide to decode the special character.
>>
>> This behavior is not necessarily standard, as different browsers may
>> have made different choices. If you want to change the way that the
>> canonicalizer in ESAPI works, you can create your own encoder with a
>> list of codecs. The order of the codecs should match the order of
>> the interpreters and/or decoders that you expect the data to visit.
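
To illustrate why the order matters, here is a small sketch that applies two decoders in both orders. It does not use ESAPI's codec classes: percentDecode wraps the JDK's URLDecoder, htmlDecode is a toy that only knows "&lt", and all names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

final class DecoderOrderSketch {

    // Percent-decoding, backed by the JDK's URLDecoder.
    static String percentDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Toy HTML decoder: handles only "&lt", with an optional semicolon.
    static String htmlDecode(String s) {
        return s.replace("&lt;", "<").replace("&lt", "<");
    }

    // Applies the decoders in list order, once each.
    static String decodeInOrder(String input, List<UnaryOperator<String>> decoders) {
        String value = input;
        for (UnaryOperator<String> decoder : decoders) {
            value = decoder.apply(value);
        }
        return value;
    }

    static String urlThenHtml(String input) {
        return decodeInOrder(input, Arrays.<UnaryOperator<String>>asList(
                DecoderOrderSketch::percentDecode, DecoderOrderSketch::htmlDecode));
    }

    static String htmlThenUrl(String input) {
        return decodeInOrder(input, Arrays.<UnaryOperator<String>>asList(
                DecoderOrderSketch::htmlDecode, DecoderOrderSketch::percentDecode));
    }

    public static void main(String[] args) {
        String input = "%26lt";
        System.out.println("URL then HTML: " + urlThenHtml(input));
        System.out.println("HTML then URL: " + htmlThenUrl(input));
    }
}
```

The same input ends up as "<" in one order and as "&lt" in the other, which is why the decoder list should mirror the interpreters the data will actually pass through.
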
>>
>> There are many other possible examples of data that can be
>> canonicalized two different ways given the rich encoding environment
>> created for us in modern web application petri dishes. One that bit
>> me recently was that "c:\file" decoded into "c:[0x0c]ile" -- beware
>> multiple encoding schemes!!
>>
>> Personally, I recommend not using URL parameters that start with HTML
>> entity names. This is unfortunate, but I don't see a great way
>> around it.
>>
>> --Jeff
>>
>> Jeff Williams, CEO
>> Aspect Security
>> 410-707-1487
>>
>>
>> -----Original Message-----
>> From: esapi-user-bounces at lists.owasp.org
>> [mailto:esapi-user-bounces at lists.owasp.org] On Behalf Of Olivier
>> Jaquemet
>> Sent: Wednesday, October 19, 2011 9:18 AM
>> To: esapi-user at lists.owasp.org
>> Subject: [Esapi-user] HTMLEntityCodec and optional semicolon for
>> namedentities
>>
>> Hello all,
>>
>> I have a question regarding a questionable behavior of the
>> HTMLEntityCodec in ESAPI 2.0.1 (which I think is a bug).
>>
>> Let's say a user sends the following value for a URL input:
>> http://www.example.com/someservlet?foo=bar&super=great&baz=qux
>>
>> I use the HTTPURL validator (in the default ESAPI.properties) to make
>> sure the input is appropriate:
>> String validatedUrl =
>> org.owasp.esapi.ESAPI.validator().getValidInput("userURL",
>> request.getParameter("userURL"), "HTTPURL", 2000, true);
>>
>> As expected, a "canonicalization" occurs and the HTMLEntityCodec is
>> applied to decode characters.
>> Problem: the start of the parameter "&super=great" is considered to
>> be an HTML entity (&supe, i.e. ⊇) and the URL is canonicalized this way:
>> http://www.example.com/someservlet?foo=bar⊇r=great&baz=qux
>> (Unicode code point U+2287, described as "superset of or equal to",
>> appears after "bar" instead of the expected parameter name)
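
The decoding described above can be reproduced with a toy decoder. This is not the ESAPI implementation: it is a minimal sketch, assuming a tiny hand-picked entity table, longest-name matching, and an optional trailing semicolon; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LenientEntityDecoderSketch {

    // Tiny illustrative subset of the HTML named entities.
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("amp", "&");
        ENTITIES.put("lt", "<");
        ENTITIES.put("gt", ">");
        ENTITIES.put("sup", "\u2283");   // superset of
        ENTITIES.put("supe", "\u2287");  // superset of or equal to
    }

    // Returns the longest entity name found in 'text' at offset 'from', or null.
    static String longestEntityAt(String text, int from) {
        String best = null;
        for (String name : ENTITIES.keySet()) {
            if (text.startsWith(name, from)
                    && (best == null || name.length() > best.length())) {
                best = name;
            }
        }
        return best;
    }

    // Decodes named entities; the trailing semicolon is optional.
    static String decode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            String name = (c == '&') ? longestEntityAt(input, i + 1) : null;
            if (name == null) {
                out.append(c);
                i++;
                continue;
            }
            out.append(ENTITIES.get(name));
            i += 1 + name.length();
            if (i < input.length() && input.charAt(i) == ';') {
                i++; // consume the optional semicolon
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(
                decode("http://www.example.com/someservlet?foo=bar&super=great&baz=qux"));
    }
}
```

With this table, "&super" is consumed as "&supe" plus a leftover "r", while "&baz" is untouched because no entity name matches.
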
>>
>> It is clearly stated in the JavaDoc of the HTMLEntityCodec that this
>> behavior is applied:
>> > Formats all are legal both with and without semi-colon,
>> > upper/lower case
>> (cf.
>> http://owasp-esapi-java.googlecode.com/svn/trunk_doc/latest/org/owasp/esapi/codecs/HTMLEntityCodec.html#decodeCharacter%28org.owasp.esapi.codecs.PushbackString%29
>> )
>>
>> However, the Wikipedia article "List of XML and HTML character
>> entity references" (cf.
>> http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
>> ) states:
>> > The semicolon is required.
>> Which is not entirely true. Indeed, if we dive into the HTML
>> specification for character references (
>> http://www.w3.org/TR/REC-html40/charset.html#entities ), we can read:
>> > In SGML, it is possible to eliminate the final ";" after a
>> > character reference in some cases (e.g., at a line break or
>> > immediately before a tag). In other circumstances it may not be
>> > eliminated (e.g., in the middle of a word). We strongly suggest using
>> > the ";" in all cases to avoid problems with user agents that require
>> > this character to be present.
>>
>> And I think this is the problem with the HTMLEntityCodec:
>> > In other circumstances it may not be eliminated (e.g., in the
>> > middle of a word).
>>
>> In my use case "&super=great", the entity "&supe" is part of another
>> word and must not be decoded.
>> I think the HTMLEntityCodec should be modified to apply this
>> behavior; otherwise it leads to invalid data being retrieved.
>> But as I am no security expert... What do you think?
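
One possible reading of the "middle of a word" rule can be sketched as follows. This is not a patch for HTMLEntityCodec: it is a toy decoder with a hand-picked entity table, and it assumes that "in the middle of a word" means "immediately followed by a letter or digit"; all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StrictEntityDecoderSketch {

    // Tiny illustrative subset of the HTML named entities.
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("amp", "&");
        ENTITIES.put("lt", "<");
        ENTITIES.put("gt", ">");
        ENTITIES.put("sup", "\u2283");
        ENTITIES.put("supe", "\u2287");
    }

    // A name is decoded only if it ends with ';' or is NOT followed by a
    // letter or digit. "supe" is listed before "sup" so it is tried first.
    private static final Pattern ENTITY =
            Pattern.compile("&(supe|sup|amp|lt|gt)(?:;|(?![A-Za-z0-9]))");

    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = ENTITIES.get(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // "&super" is followed by more letters, so it is left alone.
        System.out.println(decode("foo=bar&super=great&baz=qux"));
        // Terminated or free-standing entities are still decoded.
        System.out.println(decode("A &supe; B, 1 &lt 2"));
    }
}
```

Note that this rule still decodes a semicolon-less name followed by "=" (for example "&lt=3"), so a URL parameter literally named "lt" would remain affected; that trade-off is part of what the question above is asking about.
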
>>
>> Regards,
>> Olivier Jaquemet
>> _______________________________________________
>> Esapi-user mailing list
>> Esapi-user at lists.owasp.org
>> https://lists.owasp.org/mailman/listinfo/esapi-user
>
>
--
Olivier Jaquemet<olivier.jaquemet at jalios.com>
Ingénieur R&D Jalios S.A. - http://www.jalios.com/
@OlivierJaquemet +33970461480