[Esapi-dev] esapi-2.0.1.jar - incorrect treatment of named html entities?

Chris Schmidt chris.schmidt at owasp.org
Wed Jul 25 15:01:15 UTC 2012


This seems like a legit request to me, but I think that it should be an
option, not the default choice. Gunther, can you log an issue in our Google
Code issue tracker for this, please?
On Jul 25, 2012 7:39 AM, "Günther Zwetti" <guenther.zwetti at unycom.com>
wrote:

> Hello all,
>
> Funny, very funny: please don’t be surprised by my previous posting. This
> website unfortunately encodes all my named HTML entities, so my text did not
> make much sense.
>
> As a result, I decided to put a “|” character after each “&” character to
> avoid this encoding ;-)
>
> But back to the topic:
>
> A few days ago we found out that some very strange behavior in our
> software was caused by the implementation of the method *decodeForHTML*, as
> defined in the interface *org.owasp.esapi.Encoder*.
>
> In detail, the concrete implementation (class *HTMLEntityCodec*) decodes
> HTML-encoded text by first identifying the HTML entity parts of the string
> and then looking up a corresponding entry in a map (class *HTMLEntityCodec*,
> method *getNamedEntity*).
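
A minimal sketch of the two-step approach just described: split the text into literal parts and "&name;" parts, then look each name up in a map. This is not the ESAPI implementation; the class name EntitySplitDemo and the five-entry table are hypothetical, only semicolon-terminated named entities are handled, and the "|" workaround is omitted because it was only needed for the archive website.

```java
import java.util.Map;

// Hypothetical sketch, not the ESAPI source: split the input into literal
// text and "&name;" parts and decode each name through a lookup table.
class EntitySplitDemo {
    // Assumed five-entry subset; a real codec ships a full entity table.
    static final Map<String, Character> ENTITIES = Map.of(
        "lt", '<', "gt", '>', "amp", '&',
        "Uuml", '\u00DC',   // Ü
        "uuml", '\u00FC');  // ü

    static String decode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // A candidate entity runs from '&' to the next ';'.
            int semi = (c == '&') ? input.indexOf(';', i + 1) : -1;
            if (semi > i + 1) {
                Character decoded = ENTITIES.get(input.substring(i + 1, semi));
                if (decoded != null) {       // known name: emit its character
                    out.append(decoded.charValue());
                    i = semi + 1;
                    continue;
                }
            }
            out.append(c);                   // literal text passes through as-is
            i++;
        }
        return out.toString();
    }
}
```

With this sketch, decode("abcDefG&Uuml;xyz") keeps "abcDefG" and "xyz" unchanged and turns the entity part into "Ü"; unknown names such as "&unknown;" are passed through untouched.
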
>
> An example: input (HTML-encoded) text: *“abcDefG&|Uuml;xyz”*
>
> Now the parts *“abcDefG”* and *“xyz”* pass the entity check and won’t be
> modified, whereas the part *“&|Uuml;”* will be recognized as an HTML entity.
>
> As a result, the part *“&|Uuml;”* will be handed over to the method
> *getNamedEntity*, which then tries to find the corresponding entry for this
> named HTML entity (e.g. the method should return “<” for “&|lt;”).
>
> In my opinion, this method does not work correctly, because the input is
> converted to lower case, which leads to incorrect output for case-sensitive
> HTML entities such as “&|Uuml;” (= Ü) and “&|uuml;” (= ü).
>
> This results in the incorrect output *“ü”* for the input *“&|Uuml;”*,
> whereas it should be *“Ü”* (upper case!).
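
The effect described above can be reproduced in isolation. The sketch below is a hypothetical illustration, not the ESAPI source: it assumes the entity table holds both "Uuml" and "uuml", and folds the name to lower case before the lookup, as the toLowerCase call in the quoted code does.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical reproduction of the reported behavior, not the ESAPI source.
// Assumption: the table contains both case variants of the entity name.
class CaseFoldingDemo {
    static final Map<String, Character> ENTITIES = Map.of(
        "Uuml", '\u00DC',   // Ü
        "uuml", '\u00FC');  // ü

    // Mirrors the problem line: the name is lower-cased before the lookup,
    // so "Uuml" can only ever reach the "uuml" entry.
    static Character lookupLowerCased(String name) {
        return ENTITIES.get(name.toLowerCase(Locale.ROOT));
    }

    // Case-sensitive lookup: "Uuml" and "uuml" stay distinct.
    static Character lookupExact(String name) {
        return ENTITIES.get(name);
    }

    public static void main(String[] args) {
        System.out.println(lookupLowerCased("Uuml")); // ü, the wrong result
        System.out.println(lookupExact("Uuml"));      // Ü, as expected
    }
}
```
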
>
> Also, this method (in class *HTMLEntityCodec*) uses a hard-coded map for
> lookup, even though there is also a configuration file named
> *antisamy-esapi.xml* that defines HTML entities.
>
> Therefore, two questions arise:
>
> (1) Is this a known bug (and maybe already fixed), or can/should we fix it
> by removing the toLowerCase call without any negative side effects?
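
One way to drop the lower-casing while still keeping it available as an option, as the reply at the top suggests, is sketched below. EntityLookup and its lowerCaseFallback flag are hypothetical names, not ESAPI API. The sketch tries an exact, case-sensitive match first and only lower-cases the name as a fallback when the flag is set. This bears on question (1): simply deleting toLowerCase would likely also stop upper-case spellings such as "&LT;", which browsers still accept, from decoding, unless the table lists those spellings explicitly.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical fix sketch (not ESAPI code): exact, case-sensitive match first,
// lower-cased lookup only as an opt-in fallback.
class EntityLookup {
    // Assumed three-entry subset for illustration.
    static final Map<String, Character> ENTITIES = Map.of(
        "lt", '<',
        "Uuml", '\u00DC',   // Ü
        "uuml", '\u00FC');  // ü

    private final boolean lowerCaseFallback;  // hypothetical option

    EntityLookup(boolean lowerCaseFallback) {
        this.lowerCaseFallback = lowerCaseFallback;
    }

    Character lookup(String name) {
        Character exact = ENTITIES.get(name);   // "Uuml" -> Ü, "uuml" -> ü
        if (exact != null || !lowerCaseFallback) {
            return exact;                       // strict mode stops here
        }
        // Lenient path: lets "LT" still resolve to '<'.
        return ENTITIES.get(name.toLowerCase(Locale.ROOT));
    }
}
```

Because the exact match wins, "Uuml" decodes to "Ü" in both modes; the flag only changes what happens to names that have no exact entry.
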
>
> (2) What is the file *antisamy-esapi.xml* used for (especially the part
> for named HTML entities)?
>
> Thanks in advance for your answers,
>
> Kind regards,
>
> Günther
>
> Code:
>
>       private Character getNamedEntity( PushbackString input ) {
>             // …
>             len = Math.min(input.remainder().length(), entityToCharacterTrie.getMaxKeyLength());
>             for(int i=0;i<len;i++)
>                   possible.append(Character.toLowerCase(input.next())); // *** problem! ***
>             // look up the longest match
>             entry = entityToCharacterTrie.getLongestMatch(possible);
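
The quoted code reads at most getMaxKeyLength() characters and asks the trie for the longest entity name that is a prefix of them, so that, for example, "&notin;" is not cut short at "&not;". A hypothetical map-based stand-in for that lookup, without the lower-casing, could look like this; LongestMatch and its four-entry table are assumptions for illustration, and a real trie avoids the repeated substring calls.

```java
import java.util.Map;

// Hypothetical stand-in for the trie lookup in the quoted code (not ESAPI):
// find the longest entity name that is a prefix of text[start..], with no
// case folding.
class LongestMatch {
    static final Map<String, Character> ENTITIES = Map.of(
        "not", '\u00AC',     // ¬
        "notin", '\u2209',   // ∉
        "Uuml", '\u00DC',    // Ü
        "uuml", '\u00FC');   // ü

    // Same role as getMaxKeyLength(): never read further than the longest name.
    static final int MAX_KEY_LENGTH =
        ENTITIES.keySet().stream().mapToInt(String::length).max().orElse(0);

    /** Returns the length of the longest matching name at start, or 0. */
    static int longestMatch(String text, int start) {
        int len = Math.min(text.length() - start, MAX_KEY_LENGTH);
        for (int n = len; n > 0; n--) {          // try longest candidates first
            if (ENTITIES.containsKey(text.substring(start, start + n))) {
                return n;
            }
        }
        return 0;
    }
}
```

Here longestMatch("notin;", 0) returns 5 rather than 3, and "UUML;" matches nothing, because the characters are compared exactly as written.
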
>
> _______________________________________________
> Esapi-dev mailing list
> Esapi-dev at lists.owasp.org
> https://lists.owasp.org/mailman/listinfo/esapi-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.owasp.org/pipermail/esapi-dev/attachments/20120725/62322ba1/attachment.html>

