[Esapi-user] HTML and XML encoder do not support unicode codepoints using surrogate pair

Olivier Jaquemet olivier.jaquemet at jalios.com
Fri May 22 15:52:50 UTC 2015


Should anyone be interested in a workaround, the OWASP Java Encoder 
Project provides a working solution with proper Unicode encoding support.
https://www.owasp.org/index.php/OWASP_Java_Encoder_Project

Olivier

On 23/04/2015 17:18, Olivier Jaquemet wrote:
> Hello all,
>
> The default codec/encoder classes do not properly handle Unicode 
> code points above U+FFFF, i.e. those whose representation requires 
> more than 16 bits and which Java stores as a surrogate pair.
>
> For example, using the default HTML Encoder the following code :
>    String in = "\uD840\uDC0A"; // https://codepoints.net/U+2000A
>    System.out.println("HTML: " + ESAPI.encoder().encodeForHTMLAttribute(in));
>    System.out.println("XML : " + ESAPI.encoder().encodeForXML(in));
>
> ... outputs the following entities  :
> HTML: &#xd840;&#xdc0a;
> XML : &#xd840;&#xdc0a;
>
> ... whereas the following entity would be expected, to correctly 
> represent the code point in HTML and XML :
> HTML: &#x2000a;
> XML : &#x2000a;
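To make the difference concrete, here is a minimal sketch using only the JDK (no ESAPI). The class and method names are hypothetical, and the per-char variant is only an assumption about how the reported output arises, not ESAPI's actual code. It contrasts emitting one numeric entity per UTF-16 code unit with emitting one per Unicode code point.

```java
class SurrogatePairDemo {

    // One hex entity per UTF-16 code unit (char): splits a surrogate pair in two.
    static String entitiesPerChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append("&#x").append(Integer.toHexString(c)).append(';');
        }
        return sb.toString();
    }

    // One hex entity per Unicode code point: a surrogate pair becomes one entity.
    static String entitiesPerCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp ->
                sb.append("&#x").append(Integer.toHexString(cp)).append(';'));
        return sb.toString();
    }

    public static void main(String[] args) {
        String in = "\uD840\uDC0A"; // U+2000A stored as high + low surrogate

        System.out.println(in.length());                       // 2 chars
        System.out.println(in.codePointCount(0, in.length())); // 1 code point

        // Combine the two surrogates into the code point they represent.
        int cp = Character.toCodePoint(in.charAt(0), in.charAt(1));
        System.out.println(Integer.toHexString(cp));           // 2000a

        System.out.println(entitiesPerChar(in));      // &#xd840;&#xdc0a;
        System.out.println(entitiesPerCodePoint(in)); // &#x2000a;
    }
}
```

References to surrogate code points such as `&#xd840;` are not valid character references in XML or HTML, which is why the per-char output cannot be decoded back to U+2000A.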
>
> As far as I can see, the problem is located in the Codec implementations:
> 1. in method Codec.encode(char[], String), characters are not properly 
> iterated (surrogate pairs should be detected, or code points used)
> 2. consequently, method Codec.encodeCharacter(char[], Character) cannot 
> handle code points wider than 16 bits
> 3. in the end, in all codec implementations, the method 
> encodeCharacter(char[] immune, Character c) *cannot* properly process 
> such code points, and this can be observed in both HTMLEntityCodec and 
> XMLEntityCodec
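The three points above suggest iterating by code point and passing an int, rather than a Character, to the per-character encoder. The following is a rough sketch of that idea, not a patch against ESAPI: `CodePointEntityEncoder`, `encode` and `encodeCodePoint` are hypothetical names that only loosely mirror the Codec signatures, and the rule "ASCII alphanumerics and immune characters pass through, everything else becomes a hex entity" is a simplifying assumption.

```java
class CodePointEntityEncoder {

    // The immune list is a char[], so it can only contain BMP characters.
    static boolean isImmune(char[] immune, int codePoint) {
        if (codePoint > Character.MAX_VALUE) {
            return false;
        }
        for (char c : immune) {
            if (c == codePoint) {
                return true;
            }
        }
        return false;
    }

    // Encodes a single code point; takes an int so values above U+FFFF fit.
    static String encodeCodePoint(char[] immune, int codePoint) {
        boolean asciiAlnum = (codePoint >= '0' && codePoint <= '9')
                || (codePoint >= 'A' && codePoint <= 'Z')
                || (codePoint >= 'a' && codePoint <= 'z');
        if (asciiAlnum || isImmune(immune, codePoint)) {
            return new String(Character.toChars(codePoint));
        }
        return "&#x" + Integer.toHexString(codePoint) + ";";
    }

    // Walks the input by code point, advancing 1 or 2 chars at a time.
    static String encode(char[] immune, String input) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int codePoint = input.codePointAt(i);
            sb.append(encodeCodePoint(immune, codePoint));
            i += Character.charCount(codePoint);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char[] immune = { ' ' };
        System.out.println(encode(immune, "a \uD840\uDC0A<")); // a &#x2000a;&#x3c;
    }
}
```

Note that `codePointAt` returns an unpaired surrogate unchanged, so a real implementation would still have to decide how to treat malformed input (for example, by replacing or rejecting it).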
>
> That being said :
> Is ESAPI still maintained?
> If so, are you interested in adding such support?
>
> Olivier Jaquemet
>
> PS : I'm reporting it as I could not find any information regarding 
> this matter, in any previous discussions or javadoc, that would 
> indicate it is a known limitation.