[Esapi-user] Issue in HtmlEntityCodec while implementing ESAPI security filter.

Seil, Matt matt.seil at nelnet.net
Fri Jul 25 13:46:48 UTC 2014

Heh.  Good day Bhuvanesh, I tried to answer your question on Stack Overflow actually.

This relates directly to the discussion I was having previously (partially offline) with Jim Manico and Chris Schmidt about how the default HTMLEntityCodec prematurely converts HTML entities embedded in URLs, because it's not looking for the terminating semicolon as defined in the HTML spec.

The bug in question is here:


To refresh everyone’s memory, I had (well, still have) a case where URLs are being sent into a system for callbacks, but the canonicalize method was transforming embedded HTML entities in the query parameters.  One solution would be to alter the default HTMLEntityCodec to require the semicolon.  I’ve validated that the current versions of IE, Chrome, and Firefox do not prematurely interpret, say, “&phi” as “ϕ”.  I suspect the worry about different browsers on this matter was probably pointed at IE6, which I don’t think we need to worry about now.  But I pause on such a seemingly simple solution:  What if instead of “;” it’s “%3B”?  At the moment the codec is looking at an HTML entity, have we already handled %-encoding?
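To make the semicolon question concrete, here is a minimal Java sketch (not ESAPI's actual code) contrasting a lenient decoder, which accepts an entity name without its terminating semicolon, with a strict one. The class name EntityDecodeSketch, the decode method, and the four-entry entity table are all made up for illustration; the real codec's table is far larger.

```java
import java.util.HashMap;
import java.util.Map;

final class EntityDecodeSketch {
    // Tiny illustrative table (an assumption); the real codec maps many more names.
    private static final Map<String, Character> ENTITIES = new HashMap<>();
    private static final int MAX_NAME = 3; // longest name in the table above

    static {
        ENTITIES.put("ne", '\u2260'); // not equal to
        ENTITIES.put("amp", '&');
        ENTITIES.put("lt", '<');
        ENTITIES.put("gt", '>');
    }

    // Decodes named entities; when requireSemicolon is true, "&name" without ';' is left alone.
    static String decode(String input, boolean requireSemicolon) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '&') {
                int consumed = tryEntity(input, i + 1, requireSemicolon, out);
                if (consumed > 0) {
                    i += 1 + consumed; // skip '&' plus the name (and ';' if present)
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Returns the number of characters consumed after '&', or 0 if nothing matched.
    private static int tryEntity(String s, int start, boolean requireSemicolon, StringBuilder out) {
        int maxLen = Math.min(MAX_NAME, s.length() - start);
        for (int len = maxLen; len >= 1; len--) { // try the longest candidate name first
            Character decoded = ENTITIES.get(s.substring(start, start + len));
            if (decoded == null) {
                continue;
            }
            int end = start + len;
            boolean hasSemicolon = end < s.length() && s.charAt(end) == ';';
            if (hasSemicolon) {
                out.append(decoded);
                return len + 1;
            }
            if (!requireSemicolon) {
                out.append(decoded);
                return len;
            }
        }
        return 0;
    }
}
```

With the lenient flag, “nextPageId=testSearch&next=abc” comes out as “nextPageId=testSearch≠xt=abc”; with the strict flag the string is unchanged, while “&ne;” still decodes.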

(Continued from my private conversation with Jim Manico, expanded to everyone here)
My issue was/is really the same as Bhuvanesh’s:  We want to canonicalize and validate a URL.  But a GET URL as input is a special case:

1.      There are different legal characters for each segment of a URL; most if not nearly all regex approaches to validating a URL fail to capture that fact.  (Including, I think, our own.)

        a.      See:  Validator.HTTPURI=^[a-zA-Z0-9()\\-=\\*\\.\\?;,+\\/:&_ ]*$

        b.      What the heck is this? -->  Validator.HTTPURL=^.*$

                i.      A valid URL is ANY character string with a beginning and end?  There’s gotta be a story here.

2.      GET query parameters can have HTML entities embedded within them, adding another layer of encoding.

3.      GET query parameters can have URL %-encoding embedded.

4.      GET query parameters are an attack vector and therefore need to be scrutinized.

So we can’t just canonicalize/validate a GET URL like we do other pieces of data.  It’s more than just a String.  We have to break it apart and apply different rules to each section.  AND THEN if we’re trying to prevent XSS we need to look at the query parameters as special cases as well.
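As a rough illustration of "break it apart and apply different rules to each section", the sketch below uses the JDK's java.net.URI to split a URL and checks each piece against its own whitelist. The class name UrlPartsSketch and all four patterns are hypothetical and deliberately narrow; they are not ESAPI's Validator.* properties, and a real deployment would need to tune them.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Pattern;

final class UrlPartsSketch {
    // Hypothetical whitelists, one per URL section (assumptions, not ESAPI defaults).
    private static final Pattern SCHEME = Pattern.compile("^https?$");
    private static final Pattern HOST = Pattern.compile("^[A-Za-z0-9.-]+$");
    private static final Pattern PATH = Pattern.compile("^[A-Za-z0-9/._~-]*$");
    private static final Pattern PARAM = Pattern.compile("^[A-Za-z0-9._~-]*$");

    static boolean isValid(String url) {
        URI uri;
        try {
            uri = new URI(url); // rejects characters that are illegal in a URI, such as '<'
        } catch (URISyntaxException e) {
            return false;
        }
        if (uri.getScheme() == null || !SCHEME.matcher(uri.getScheme()).matches()) {
            return false;
        }
        if (uri.getHost() == null || !HOST.matcher(uri.getHost()).matches()) {
            return false;
        }
        String path = uri.getPath() == null ? "" : uri.getPath();
        if (!PATH.matcher(path).matches()) {
            return false;
        }
        String rawQuery = uri.getRawQuery(); // still percent-encoded at this point
        if (rawQuery == null) {
            return true;
        }
        // Split on the literal '&' separators only; no entity decoding happens here.
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // This narrow whitelist has no '%', so percent-encoded values are rejected outright.
            if (!PARAM.matcher(name).matches() || !PARAM.matcher(value).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

Because the query string is split on the raw “&” separators and never passed through an HTML entity decoder as a whole, a parameter named “next” is seen as just that, a parameter name.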

Thus far, the closest (clunky) solution I’ve come up with is this:  http://stackoverflow.com/questions/23434156/esapi-xss-prevention-for-user-supplied-url-property/23448264#23448264

That solution demonstrates using URI/URIBuilder to break apart a URL according to its RFC rules, and with a minor modification it could be used to apply validation against each piece.  This technique would bypass the issue with the HTMLEntityCodec not looking for semicolons and, at least as-is, would provide a fully canonicalized URL String while being protected from multiple-encoding attacks in a GET query.
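For the multiple-encoding part, one possible check on a single query parameter value is to percent-decode it once, decode the result again, and reject the value if the second pass changes anything. The sketch below is my own illustration of that idea, not the code from the Stack Overflow answer and not ESAPI's canonicalize; ParamCanonicalizerSketch and its methods are hypothetical names, and the hand-rolled decoder leaves “+” and malformed “%” sequences untouched.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class ParamCanonicalizerSketch {

    // Decodes once; throws if a second decoding pass would change the value again.
    static String canonicalize(String rawValue) {
        String once = percentDecode(rawValue);
        if (!once.equals(percentDecode(once))) {
            throw new IllegalArgumentException("multiple percent-encoding detected");
        }
        return once;
    }

    // Minimal percent-decoder: runs of %XX are collected as bytes and read as UTF-8.
    static String percentDecode(String s) {
        StringBuilder out = new StringBuilder();
        ByteArrayOutputStream run = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            int hi = -1;
            int lo = -1;
            if (s.charAt(i) == '%' && i + 2 < s.length()) {
                hi = hex(s.charAt(i + 1));
                lo = hex(s.charAt(i + 2));
            }
            if (hi >= 0 && lo >= 0) {
                run.write((hi << 4) | lo);
                i += 3;
            } else {
                flush(run, out);
                out.append(s.charAt(i)); // literal character, including a malformed '%'
                i++;
            }
        }
        flush(run, out);
        return out.toString();
    }

    // Appends any pending decoded bytes to the output as UTF-8 text.
    private static void flush(ByteArrayOutputStream run, StringBuilder out) {
        if (run.size() > 0) {
            out.append(new String(run.toByteArray(), StandardCharsets.UTF_8));
            run.reset();
        }
    }

    // Returns the value of an ASCII hex digit, or -1 if the character is not one.
    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```

Here “testSearch%26next%3Dabc” canonicalizes to “testSearch&next=abc”, while “%253Cscript%253E” is rejected because it decodes first to “%3Cscript%3E” and then again to “<script>”. One trade-off of this approach: a value that legitimately contains a literal “%41”, sent as “%2541”, is also rejected.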

From: esapi-user-bounces at lists.owasp.org [mailto:esapi-user-bounces at lists.owasp.org] On Behalf Of Bhuvanesh Waran
Sent: Thursday, July 24, 2014 11:41 PM
To: Kevin W. Wall; esapi-user at lists.owasp.org
Subject: [Esapi-user] Issue in HtmlEntityCodec while implementing ESAPI security filter.

Thanks for the response, Kevin.

Let me explain the scenario that is causing a problem while implementing the ESAPI filter in our web-based application.

Please find the attached document for the steps we are following to implement ESAPI in our business app.

The getQueryString() method of SecurityWrapperRequest.java is invoked 3 times for each request. Also, the queryString is emptied when an invalid character is found inside it.

For example:
When the URL is http://localhost:9081/w0094553/execute.do?nextPageId=testSearch, the new page is loaded successfully. But when the URL is http://localhost:9089/w0094553/execute.do?nextPageId=testSearch&next=abc, the same page is loaded again and we are not navigated to the new page. This happens because the queryString is removed due to the canonicalization of “ne” in “&next”.

We also tried the URL http://localhost:9081/w0094553/launchTest.redirect?nextPageId=testSearch≠xt=abc and got a blank page with a NoSuchElementException in the console.

Root cause of the issue: in HTMLEntityCodec.class, line 278, in the method mkCharacterToEntityMap():

Line 510: map.put((char)8800, "ne"); /* not equal to */ is responsible for adding ≠ while canonicalizing the query string.
So the validation fails and we are unable to redirect to any of the pages.
Since we are implementing ESAPI as a filter, we can't set canonicalize to false. By default canonicalize is true and we cannot invoke any other methods.
Please provide your inputs to get rid of the issue.

The reason we are implementing ESAPI is to avoid cross-site scripting issues for any request.
