[Esapi-user] A question on your HTMLEntityCodec code

Jeff Williams jeff.williams at aspectsecurity.com
Tue May 4 23:48:05 EDT 2010


Hi Brad,

Sorry for the delay in responding... your email went into my junk
folder.  You're welcome to post questions like this to the ESAPI dev or
user lists, BTW.  In any case, those codepoints (0x7F through 0x9F) are
unprintable control characters and shouldn't be used in web pages.  They
are:

DEL
PAD HOP BPH NBH IND NEL SSA ESA HTS HTJ VTS PLD PLU RI SS2 SS3 
DCS PU1 PU2 STS CCH MW SPA EPA SOS SGCI SCI CSI ST OSC PM APC
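
For illustration, here is a minimal Java sketch of a test for these
codepoints.  The class and method names are hypothetical and this is not
the ESAPI source; it only mirrors the range check quoted in the original
message below, leaning on the JDK's Character.isISOControl, which covers
0x00-0x1F and 0x7F-0x9F.

```java
// Hypothetical helper, not part of ESAPI: illustrates which codepoints
// the quoted check treats as illegal.
final class ControlCodePoints {
    private ControlCodePoints() {}

    /** True for DEL (0x7F) and the C1 controls (0x80-0x9F) listed above. */
    static boolean isDelOrC1Control(int codePoint) {
        return codePoint >= 0x7F && codePoint <= 0x9F;
    }

    /**
     * True for every ISO control character except tab, line feed and
     * carriage return, i.e. the same set the quoted check rejects.
     */
    static boolean isRejected(int codePoint) {
        if (codePoint == '\t' || codePoint == '\n' || codePoint == '\r') {
            return false;
        }
        // isISOControl covers U+0000-U+001F and U+007F-U+009F.
        return Character.isISOControl(codePoint);
    }
}
```

Accented Latin-1 letters start at 0xC0, so a check like this leaves
them alone.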

However, for backward compatibility with early HTML authors and browsers
that ignored this restriction, raw characters and numeric character
references in the 80-9F range are interpreted by some browsers as
representing the characters mapped to bytes 80-9F in the Windows-1252
encoding.

http://en.wikipedia.org/wiki/Character_encodings_in_HTML 

We suggest using the appropriate Unicode codepoint if you want those
characters (for example, U+20AC for the euro sign rather than 0x80).
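
As a rough sketch of what that could look like in Java: the helper below
(hypothetical names, not part of ESAPI) takes a value in the 80-9F range
that was really meant as a Windows-1252 byte and returns the Unicode
codepoint for the intended character.  It assumes the JDK ships the
"windows-1252" charset, which common JDKs do but the Java spec does not
guarantee, and it does not treat the few bytes that Windows-1252 leaves
undefined specially.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of ESAPI.
final class Cp1252Remap {
    // Assumption: this charset is available in the running JDK.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    private Cp1252Remap() {}

    /** Maps 0x80-0x9F through Windows-1252; other values pass through. */
    static int toUnicode(int codePoint) {
        if (codePoint < 0x80 || codePoint > 0x9F) {
            return codePoint;
        }
        // Decode the single byte as Windows-1252 to get the real character.
        String decoded = new String(new byte[] { (byte) codePoint }, CP1252);
        return decoded.codePointAt(0);
    }

    /** Builds a hexadecimal numeric character reference for the result. */
    static String numericReference(int codePoint) {
        return "&#x" + Integer.toHexString(toUnicode(codePoint)).toUpperCase() + ";";
    }
}
```

So a euro sign that arrived as 0x80 would be emitted as &#x20AC; instead
of &#x80;.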

--Jeff

Jeff Williams



-----Original Message-----
From: Brad Baker [mailto:bbaker at atlassian.com] 
Sent: Thursday, April 29, 2010 10:12 PM
To: Jeff Williams
Subject: A question on your HTMLEntityCodec code

Hi, my name is Brad Baker and I work for Atlassian (www.atlassian.com).

I was recently investigating better HTML encoding and came across your
code, as recommended by OWASP.

I was intrigued by the following lines:

                // check for illegal characters
                if ( ( c <= 0x1f && c != '\t' && c != '\n' && c != '\r' )
                        || ( c >= 0x7f && c <= 0x9f ) ) {
                        return( " " );
                }


Why do you remove code points 127 --> 159 inclusive?  These are a bunch
of accented characters, as I understand it, and pose no XSS risk.

If that's not true, can you please explain why?

Thanks in advance for any help you can offer with this.

Brad Baker
Developer at Atlassian


