[Esapi-dev] esapi-2.0.1.jar - incorrect treatment of named html entities?

Günther Zwetti guenther.zwetti at unycom.com
Wed Jul 25 13:38:57 UTC 2012


Hello all,

Funny - very funny - please disregard my previous posting: this website unfortunately encodes all my named HTML entities, so my text did not make much sense.
To work around this, I have put a "|" character after each "&" character in this message to avoid the encoding ;-)

But back to topic:

A few days ago we found that some very strange behavior in our software was caused by the implementation of the method decodeForHTML, as defined in the interface org.owasp.esapi.Encoder.
In detail, the concrete implementation (class HTMLEntityCodec) decodes HTML-encoded text by first identifying the HTML entity parts of the input string and then looking up each one in a map (class HTMLEntityCodec, method getNamedEntity).

An example: input (HTML-encoded) text: "abcDefG&|Uuml;xyz"
The parts "abcDefG" and "xyz" pass the entity check and are not modified, whereas the part "&|Uuml;" is recognized as an HTML entity.
The part "&|Uuml;" is therefore handed over to the method getNamedEntity, which tries to find the corresponding character for this named HTML entity (e.g. the method should return "<" for "&|lt;").

In my opinion, this method does not work correctly because the input is converted to lower case before the lookup, which produces wrong results for entity names that differ only in case, such as "&|Uuml;" (=Ü) and "&|uuml;" (=ü).
For the input "&|Uuml;" the output is "ü", but it should be "Ü" (upper case!).
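To illustrate the effect in isolation, here is a minimal sketch that does not use ESAPI at all. The class EntityCaseDemo, its three-entry table, and both lookup methods are hypothetical and only mimic the behavior described above: lower-casing the name before the lookup makes "Uuml" and "uuml" indistinguishable.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for an entity table; NOT the ESAPI implementation.
class EntityCaseDemo {
    private static final Map<String, Character> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("Uuml", '\u00DC'); // upper-case U with diaeresis
        ENTITIES.put("uuml", '\u00FC'); // lower-case u with diaeresis
        ENTITIES.put("lt", '<');
    }

    // Mimics the reported behavior: the name is lower-cased before the lookup.
    static Character lookupLowerCased(String name) {
        return ENTITIES.get(name.toLowerCase(Locale.ROOT));
    }

    // Case-sensitive lookup: the name is used exactly as written.
    static Character lookupExact(String name) {
        return ENTITIES.get(name);
    }

    public static void main(String[] args) {
        System.out.println("lower-cased lookup of Uuml: " + lookupLowerCased("Uuml"));
        System.out.println("exact lookup of Uuml:       " + lookupExact("Uuml"));
    }
}
```

With the lower-cased lookup, "Uuml" resolves to the same character as "uuml"; with the exact lookup the two stay distinct.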

Also, this method (in class HTMLEntityCodec) uses a hard-coded map for the lookup, even though there is also a file named antisamy-esapi.xml which defines HTML entities as well.

Therefore two questions arise:
(1) Is this a known bug (and maybe already fixed) or can/should we fix it by removing the toLowerCase statement without any negative side effects?
(2) What is the file antisamy-esapi.xml used for (especially the part for named HTML entities)?

Thanks in advance for your answers,

Kind regards,
Günther

Code:

      private Character getNamedEntity( PushbackString input ) {
            // ...
            len = Math.min(input.remainder().length(), entityToCharacterTrie.getMaxKeyLength());
            for(int i=0;i<len;i++)
                  possible.append(Character.toLowerCase(input.next()));           // *** problem! ***
            // look up the longest match
            entry = entityToCharacterTrie.getLongestMatch(possible);
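Regarding question (1), one possible side effect of simply removing the toLowerCase call is that input such as "LT" (instead of "lt") would no longer be decoded. A minimal sketch of one way to keep both behaviors, assuming that accepting such upper-cased names is actually desired: try the exact name first and fall back to the lower-cased name only if there is no exact match. The class EntityFallbackDemo and its table are hypothetical and do not use the ESAPI trie.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of "exact match first, lower-case fallback"; NOT ESAPI code.
class EntityFallbackDemo {
    private static final Map<String, Character> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("Uuml", '\u00DC');
        ENTITIES.put("uuml", '\u00FC');
        ENTITIES.put("lt", '<');
    }

    static Character lookup(String name) {
        // 1. Case-sensitive lookup, so "Uuml" and "uuml" stay distinct.
        Character exact = ENTITIES.get(name);
        if (exact != null) {
            return exact;
        }
        // 2. Fallback for names written in a different case, e.g. "LT".
        return ENTITIES.get(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println("Uuml -> " + lookup("Uuml"));
        System.out.println("uuml -> " + lookup("uuml"));
        System.out.println("LT   -> " + lookup("LT"));
        System.out.println("nbsp -> " + lookup("nbsp")); // not in this small table: null
    }
}
```

Note that the fallback is still ambiguous for a name like "UUML", which ends up as the lower-case character; whether that is acceptable is a design decision.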

