<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:Wingdings;
        panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p
        {mso-style-priority:99;
        mso-margin-top-alt:auto;
        margin-right:0in;
        mso-margin-bottom-alt:auto;
        margin-left:0in;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {mso-style-priority:34;
        margin-top:0in;
        margin-right:0in;
        margin-bottom:0in;
        margin-left:.5in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
span.EmailStyle18
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
/* List Definitions */
@list l0
        {mso-list-id:1386877913;
        mso-list-type:hybrid;
        mso-list-template-ids:1819699440 67698703 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l0:level1
        {mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level2
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level3
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
@list l0:level4
        {mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level5
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level6
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
@list l0:level7
        {mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level8
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level9
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
ol
        {margin-bottom:0in;}
ul
        {margin-bottom:0in;}
--></style>
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Heh.  Good day Bhuvanesh, I tried to answer your question on Stack Overflow actually. 
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">This is directly related to the discussion I was having previously (partially offline) with Jim Manico and Chris Schmidt about how the default HTMLEntityCodec
 prematurely converts HTML entities embedded in URLs because it’s not looking for the terminating semicolon as defined in the HTML spec. 
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">The bug in question is here:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><a href="https://code.google.com/p/owasp-esapi-java/issues/detail?id=258">https://code.google.com/p/owasp-esapi-java/issues/detail?id=258</a><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">To refresh everyone’s memory, I had (well, still have) a case where URLs are being sent into a system for callbacks, but the canonicalize method was transforming
 embedded HTML entities in the query parameters.  One solution would be to alter the default HTMLEntityCodec to require the terminating semicolon.  I’ve verified that the current versions of IE, Chrome, and Firefox do not prematurely interpret, say, “&phi” as “ϕ”. 
 I suspect the worry about different browsers on this matter was probably aimed at IE6, which I don’t think we need to worry about now.  But I pause on such a seemingly simple solution:  What if instead of “;” it’s “%3B”?  At the point where the codec is looking at an HTML
 entity, have we already handled %-encoding?<o:p></o:p></span></p>
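<p class="MsoNormal">To make the semicolon idea concrete, here is a minimal sketch of a decoder that only converts a named entity when the terminating “;” is present. This is not the ESAPI HTMLEntityCodec; the class name StrictEntityDecoder and the four-entry entity table are assumptions made up for illustration, and it deliberately ignores numeric entities and the %3B question raised above.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of ESAPI.
final class StrictEntityDecoder {
    // Tiny illustrative subset of named entities; a real table is much larger.
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("ne", "\u2260"); // not equal to
        ENTITIES.put("amp", "&");
        ENTITIES.put("lt", "<");
        ENTITIES.put("gt", ">");
    }

    // Matches a named entity only when the terminating ';' is present.
    private static final Pattern NAMED = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    static String decode(String input) {
        Matcher m = NAMED.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = ENTITIES.get(m.group(1));
            // Unknown entity names are copied through unchanged.
            String text = (replacement != null) ? replacement : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

<p class="MsoNormal">With this rule, a query string such as nextPageId=testSearch&next=abc passes through untouched because “&ne” is never followed by a semicolon, while “&ne;” still decodes to “≠”.</p>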
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">(Continued from my private conversation with Jim Manico, expanded to everyone here)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">My issue was/is really the same as Bhuvanesh’s:  We want to canonicalize and validate a URL.  But a GET URL as input is a special case:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><span style="mso-list:Ignore">1.<span style="font:7.0pt "Times New Roman"">     
</span></span></span><![endif]><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">There are different legal characters for each segment of a URL. Most, if not nearly all, regex approaches to validating a URL fail to capture that fact 
 (including, I think, our own).<o:p></o:p></span></p>
<p class="MsoListParagraph" style="margin-left:1.0in;text-indent:-.25in;mso-list:l0 level2 lfo1">
<![if !supportLists]><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><span style="mso-list:Ignore">a.<span style="font:7.0pt "Times New Roman"">     
</span></span></span><![endif]><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">See:  Validator.HTTPURI=^[a-zA-Z0-9()\\-=\\*\\.\\?;,+\\/:&_ ]*$<o:p></o:p></span></p>
<p class="MsoListParagraph" style="margin-left:1.0in;text-indent:-.25in;mso-list:l0 level2 lfo1">
<![if !supportLists]><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><span style="mso-list:Ignore">b.<span style="font:7.0pt "Times New Roman"">     
</span></span></span><![endif]><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">What the heck is this? →  Validator.HTTPURL=^.*$<o:p></o:p></span></p>
<p class="MsoListParagraph" style="margin-left:1.5in;text-indent:-1.5in;mso-text-indent-alt:-9.0pt;mso-list:l0 level3 lfo1">
<![if !supportLists]><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><span style="mso-list:Ignore"><span style="font:7.0pt "Times New Roman"">                                                   
</span>i.<span style="font:7.0pt "Times New Roman"">     </span></span></span><![endif]><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">A valid URL is ANY character string with a beginning and end?  There’s gotta be a story here.<o:p></o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><span style="mso-list:Ignore">2.<span style="font:7.0pt "Times New Roman"">     
</span></span></span><![endif]><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">GET query parameters can have HTML entities embedded within them, adding another layer of encoding.<o:p></o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><span style="mso-list:Ignore">3.<span style="font:7.0pt "Times New Roman"">     
</span></span></span><![endif]><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">GET query Parameters can have URL or HTML % encoding embedded. 
<o:p></o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><span style="mso-list:Ignore">4.<span style="font:7.0pt "Times New Roman"">     
</span></span></span><![endif]><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">GET query Parameters are an attack vector and therefore need to be scrutinized.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">So we can’t just canonicalize/validate a GET URL like we do other pieces of data.  It’s more than just a String.  We have to break it apart and apply different
 rules to each section.  AND THEN, if we’re trying to prevent XSS, we need to look at the query parameters as special cases as well. 
<o:p></o:p></span></p>
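<p class="MsoNormal">As a rough sketch of “break it apart and apply different rules to each section”, the snippet below parses the URL with java.net.URI and checks each component against its own whitelist. The class name UrlPartValidator and all four patterns are hypothetical and far stricter than the RFC allows; they only illustrate that each segment gets its own rule instead of one regex for the whole string.</p>

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical per-component validator for illustration; not part of ESAPI.
final class UrlPartValidator {
    // Illustrative whitelists only; real rules depend on the application.
    private static final Pattern SCHEME = Pattern.compile("^https?$");
    private static final Pattern HOST = Pattern.compile("^[A-Za-z0-9.-]+$");
    private static final Pattern PATH = Pattern.compile("^[A-Za-z0-9/._~-]*$");
    private static final Pattern QUERY = Pattern.compile("^[A-Za-z0-9=&._~%+-]*$");

    static boolean isValid(String url) {
        URI uri;
        try {
            // Rejects strings that are not syntactically valid URIs at all.
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return false;
        }
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return false; // relative or opaque URIs such as "javascript:..."
        }
        // Use the raw (still %-encoded) forms so nothing is decoded implicitly.
        String path = uri.getRawPath() == null ? "" : uri.getRawPath();
        String query = uri.getRawQuery() == null ? "" : uri.getRawQuery();
        return SCHEME.matcher(scheme.toLowerCase(Locale.ROOT)).matches()
                && HOST.matcher(host).matches()
                && PATH.matcher(path).matches()
                && QUERY.matcher(query).matches();
    }
}
```

<p class="MsoNormal">Note that this only checks the raw form of each piece. Percent-encoded content inside the query still has to be decoded and inspected separately.</p>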
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Thus far, the closest (clunky) solution I’ve come up with is this: 
<a href="http://stackoverflow.com/questions/23434156/esapi-xss-prevention-for-user-supplied-url-property/23448264#23448264">
http://stackoverflow.com/questions/23434156/esapi-xss-prevention-for-user-supplied-url-property/23448264#23448264</a><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">That solution demonstrates using URI/URIBuilder to break apart a URL according to its RFC rules and, with a minor modification, could be used to apply validation
 against each piece.  This technique would bypass the issue with the HTMLEntityCodec not looking for semicolons and, at least as-is, would provide a fully canonicalized URL String while being protected from multiple-encoding attacks in a GET query.<o:p></o:p></span></p>
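<p class="MsoNormal">The multiple-encoding part can be sketched roughly as follows. This is not the code from the Stack Overflow answer (which uses URIBuilder); it uses only java.net.URI and java.net.URLDecoder, and the class name QueryCanonicalizer and the “still contains %XX after one decode” heuristic are assumptions for illustration. Each query parameter is decoded exactly once, and the value is rejected if it still looks percent-encoded afterwards.</p>

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of ESAPI.
final class QueryCanonicalizer {
    // A leftover %XX escape after one decode pass suggests double encoding.
    private static final Pattern ESCAPE = Pattern.compile("%[0-9A-Fa-f]{2}");

    // Returns decoded name/value pairs in their original order.
    static Map<String, String> canonicalQuery(URI uri) {
        Map<String, String> params = new LinkedHashMap<>();
        String raw = uri.getRawQuery();
        if (raw == null || raw.isEmpty()) {
            return params;
        }
        for (String pair : raw.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decodeStrict(name), decodeStrict(value));
        }
        return params;
    }

    // Decodes once and refuses input that appears to be encoded more than once.
    static String decodeStrict(String s) {
        String once;
        try {
            once = URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        if (ESCAPE.matcher(once).find()) {
            throw new IllegalArgumentException("multiple encoding detected: " + s);
        }
        return once;
    }
}
```

<p class="MsoNormal">Because the query is split on “&” before any decoding happens, a parameter named “next” is never mistaken for the start of an HTML entity.</p>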
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> esapi-user-bounces@lists.owasp.org [mailto:esapi-user-bounces@lists.owasp.org]
<b>On Behalf Of </b>Bhuvanesh Waran<br>
<b>Sent:</b> Thursday, July 24, 2014 11:41 PM<br>
<b>To:</b> Kevin W. Wall; esapi-user@lists.owasp.org<br>
<b>Subject:</b> [Esapi-user] Issue in HtmlEntityCodec while implementing ESAPI security filter.<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p><br>
Thanks for the response, Kevin.<o:p></o:p></p>
<p>Let me explain the scenario that is causing problems while implementing the ESAPI filter in our web-based application.<o:p></o:p></p>
<p>Please find the attached document for the steps we are following to implement ESAPI in our business app.<o:p></o:p></p>
<p>The getQueryString() method of SecurityWrapperRequest.java is invoked 3 times for each request. Also, the queryString is emptied when an invalid character is found inside it.<o:p></o:p></p>
<p>For example:<br>
When the URL is <a href="http://localhost:9081/w0094553/execute.do?nextPageId=testSearch">
http://localhost:9081/w0094553/execute.do?nextPageId=testSearch</a>, the new page loads successfully. But when the URL is
<a href="http://localhost:9089/w0094553/execute.do?nextPageId=testSearch&next=abc">
http://localhost:9089/w0094553/execute.do?nextPageId=testSearch&next=abc</a>, the same page is loaded again instead of navigating to the new page. This happens because the queryString is removed as a result of the canonicalization of “ne”.<o:p></o:p></p>
<p>We also tried the URL <a href="http://localhost:9081/w0094553/launchTest.redirect?nextPageId=testSearch">
http://localhost:9081/w0094553/launchTest.redirect?nextPageId=testSearch</a>≠xt=abc and got a blank page with a NoSuchElementException in the console.<o:p></o:p></p>
<p>Root cause of the issue: in HTMLEntityCodec.class, line 278, in the method mkCharacterToEntityMap():<o:p></o:p></p>
<p>Line 510: map.put((char)8800, "ne"); /* not equal to */ is responsible for adding ≠ when the query string is canonicalized.<br>
So the validation fails and we are unable to redirect to any of the pages.<br>
Since we are implementing ESAPI as a filter, we can’t set canonicalize to false. Canonicalize is true by default and we cannot invoke any other methods.<br>
Please provide your input to help us resolve the issue.<o:p></o:p></p>
<p>The reason we are implementing ESAPI is to avoid cross-site scripting issues for any request.<o:p></o:p></p>
<p><o:p> </o:p></p>
</div>
</body>
</html>