[Esapi-dev] [Esapi-user] URL Validation and Encoding

Jeff Williams jeff.williams at owasp.org
Thu Sep 23 21:59:54 EDT 2010


AntiSamy DOES use a full parser, not regex.

--Jeff


On Sep 23, 2010, at 9:49 PM, Chris Schmidt <chrisisbeef at gmail.com> wrote:

> This is pretty similar to a point that was brought up by the A in the K blog guy.
> 
> If we are indeed going to add methods to the interface to do this, I think that we absolutely want to use URI under the hood to do so, and as much as it pains me to say it (because it breaks backwards compatibility) we need to rename encodeForURL to encodeUrlParameter. We can deprecate the old method and just have it wrap the new call for now, but with the added functionality, it is pretty confusing to know which one I am supposed to use.
> 
> Food for thought...
> 
> On Sep 23, 2010, at 7:32 PM, Jim Manico wrote:
> 
>> Yup, I totally agree – we can use the URI class or ESAPI.encodeForURL()
>>  
>> I’m just looking for an encoding function in order to shove an untrusted URL, URL fragment, or URL parameter into an href link context in a way that stops XSS without breaking the URL. These are special cases since the data sits in an attribute context (<a href="DATA">click me</a>), but we do not want to attribute encode here.
>>  
>> And I think there are 2 cases to consider:
>>  
>> 1)      The URL root is hard coded and you only need to encode a GET parameter:
>> a.       <a href="/site/user?id=UNTRUSTED-DATA">click me</a>
>> b.      In this case we just URL encode UNTRUSTED-DATA
>> 2)      The untrusted data is a relative or absolute URL
>> a.       <a href="UNTRUSTED-DATA">click me</a>
>> b.      We cannot URL encode UNTRUSTED-DATA here or we will break the link
>> c.       We can surely use the URI class under the hood here
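The difference between the two cases can be sketched in Java roughly as follows. This is only an illustration using java.net.URLEncoder from the JDK; the class and method names are hypothetical and are not part of ESAPI. Note that URLEncoder applies form encoding (a space becomes "+"), which is a query-string convention.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, for illustration only (not an ESAPI class).
final class HrefEncodingSketch {

    // Case 1: the URL root is hard coded and only the parameter value is untrusted,
    // so the value alone is URL encoded.
    static String hrefWithParameter(String untrustedId) {
        return "/site/user?id=" + urlEncode(untrustedId);
    }

    // Case 2 done the wrong way: URL encoding a complete URL also encodes
    // ':', '/', '?' and '=', so the result no longer works as a link.
    static String urlEncodeWholeUrl(String url) {
        return urlEncode(url);
    }

    private static String urlEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM
            throw new IllegalStateException(e);
        }
    }
}
```

With these helpers, hrefWithParameter("1\"><script>") yields /site/user?id=1%22%3E%3Cscript%3E, while urlEncodeWholeUrl("http://example.com/a?b=c") yields http%3A%2F%2Fexample.com%2Fa%3Fb%3Dc, which is the broken link described in 2b.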
>>  
>> So I’m thinking that we still need:
>>  
>> ESAPI.encoder().encodeURLComponent
>> and
>> ESAPI.encoder().encodeURL(String url)
>> And perhaps
>> ESAPI.encoder().encodeURL(List<String> legalProtocols, String url)
>> And/or make legal protocols configurable
>>  
>> Also, a URL needs to be valid to be encoded for safe display if we use this scheme. If a URL is invalid at encoding time, perhaps just return a “#” or a blank string?
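A minimal sketch of the encodeURL(List<String> legalProtocols, String url) idea, assuming java.net.URI does the parsing and "#" is returned for anything that fails. The class name, the handling of relative references, and the fallback value are assumptions made for illustration; this is not an existing ESAPI method.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not an ESAPI API.
final class UrlWhitelistSketch {

    static String encodeURL(List<String> legalProtocols, String untrustedUrl) {
        try {
            // Strict parse: characters such as '"', '<', '>' and spaces are
            // illegal in a URI and cause a URISyntaxException.
            URI parsed = new URI(untrustedUrl);
            String scheme = parsed.getScheme();
            // Assumption: references without a scheme (relative URLs) are allowed.
            if (scheme != null
                    && !legalProtocols.contains(scheme.toLowerCase(Locale.ROOT))) {
                return "#";
            }
            return parsed.toASCIIString();
        } catch (URISyntaxException e) {
            // Invalid at encoding time: fall back to a harmless link
            return "#";
        }
    }
}
```

With legal protocols "http" and "https", a javascript: URL or a URL containing a double quote comes back as "#". Characters that are legal in a URI, such as the single quote, pass through unchanged, so this alone would not be enough for a single-quoted attribute.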
>>  
>> Again, why this madness? I’m trying to get away from regular expression based defense and instead take a page from compiler design thinking: (1) first load the input in question (a URL) into an object abstraction that formally models that input and then (2) “write” the data in a whitelist way, only supporting features of that input that are safe.
>>  
>> I’ve seen proprietary versions of AntiSamy that do this: instead of a regular-expression based set of rules, the untrusted HTML is loaded into an HTML abstraction set of classes, like Wicket. Then the “clean” function just writes out only the legal tags that are to be supported. This kind of coding is (1) way faster, (2) way more accurate, (3) simpler code, and (4) less likely to fail over time. I think.
>>   
>> - Jim
>>  
>>  
>> From: esapi-dev-bounces at lists.owasp.org [mailto:esapi-dev-bounces at lists.owasp.org] On Behalf Of Chris Schmidt
>> Sent: Thursday, September 23, 2010 4:01 AM
>> To: esapi-dev at lists.owasp.org
>> Subject: Re: [Esapi-dev] [Esapi-user] URL Validation and Encoding
>>  
>> It seems like we may be redoing a lot of work here that is already done for us - 
>> 
>> From the JavaDocs on java.net.URL
>> 
>> Note, the URI class does perform escaping of its component fields in certain circumstances. The recommended way to manage the encoding and decoding of URLs is to use URI, and to convert between these two classes using toURI() and URI.toURL(). 
>> 
>> Unless I am missing something, why not just use the built-in API to perform the encoding of the URL?
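As a rough illustration of what the built-in API does, the multi-argument java.net.URI constructors quote illegal characters in each component, and URI.toURL() converts the result to a java.net.URL. The class name, method names, and example values below are made up for the sketch.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical sketch of delegating URL encoding to java.net.URI.
final class UriEscapingSketch {

    // Builds an http URI from unencoded components; the constructor quotes
    // characters that are illegal in each component (for example spaces).
    static URI build(String host, String path, String query) {
        try {
            return new URI("http", host, path, query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Converts between the two classes as the JavaDoc recommends.
    static URL toUrl(URI uri) {
        try {
            return uri.toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

For example, build("www.example.com", "/pages/my page.html", "q=a b").toASCIIString() gives http://www.example.com/pages/my%20page.html?q=a%20b. Characters that are legal in a query, such as '&' and '=', are left alone, so this constructor does not replace a separate encoder for individual parameter values.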
>> 
>> Validation is another story altogether. URL validation seems like a big dark hole that could lead to some interesting assumptions and expectations. I have written a couple of URL validators that go so far as to do a DNS lookup of the domain, submit a request to the URL specified (thought has been given to scanning the response for *dangerous* content), and verify that the response code is 200; only then would the URL be considered valid.
>> 
>> The point here is that while this sounds like something that may be somewhat useful to a handful of people, and perhaps at least a basic "this is a valid URL" check would be helpful, I think there are bigger fish to fry than re-inventing RFC 2396 encoding for URLs. To the best of my knowledge, the URI encoding is fully compliant. If we really want to add to the encoding interface, perhaps just a delegation method to that is the right way to go?
>> 
>> Thoughts?
>> 
>> On 9/22/2010 11:58 PM, Jim Manico wrote:
>> We can add a second encoder for relative URLs, but the programmer would
>> need to specify the domain, using one of the other URL constructors, like:
>>   new URL("http", "www.gamelan.com", "/pages/Gamelan.net.html");
>>  
>> And ESAPI would provide:
>>  
>> ESAPI.encoder().encodeCompleteURL(String URL);
>> ESAPI.encoder().encodeURLParameter(String data); // JavaScript calls this a "URIComponent"
>> ESAPI.encoder().encodeRelativeURL(String root, String relativeURL);
>>  
>> As well as
>>  
>> ESAPI.validator().assertValidCompleteURL(String url) throws ValidationException;
>> ESAPI.validator().assertValidRelativeURL(String root, String relativeURL) throws ValidationException;
>> boolean ESAPI.validator().isValidCompleteURL(String url);
>> boolean ESAPI.validator().isValidRelativeURL(String root, String relativeURL);
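One way the two boolean checks might look, as a sketch under stated assumptions: parsing is delegated to java.net.URI, a "complete" URL is taken to mean an absolute URI with a host, and a relative URL is accepted only if it has no scheme or authority of its own and resolves to the same host as the root. The class name and these rules are assumptions for illustration, not ESAPI behavior.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical validator sketch, not the ESAPI implementation.
final class UrlValidatorSketch {

    static boolean isValidCompleteURL(String url) {
        try {
            URI parsed = new URI(url);
            // Absolute means a scheme is present; opaque URIs such as
            // mailto: have no host and are rejected here.
            return parsed.isAbsolute() && parsed.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    static boolean isValidRelativeURL(String root, String relativeURL) {
        try {
            URI base = new URI(root);
            URI relative = new URI(relativeURL);
            // Reject "http://other/..." and scheme-relative "//other/..."
            if (relative.isAbsolute() || relative.getRawAuthority() != null) {
                return false;
            }
            URI resolved = base.resolve(relative);
            return base.getHost() != null
                    && base.getHost().equalsIgnoreCase(resolved.getHost());
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

The assertValid... variants could wrap the same checks and throw instead of returning false.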
>>  
>> - Jim
>>  
>>  
>> -----Original Message-----
>> From: Ed Schaller [mailto:schallee at darkmist.net] 
>> Sent: Wednesday, September 22, 2010 4:44 PM
>> To: augustd
>> Cc: Jim Manico; ESAPI-Developers; esapi-user at lists.owasp.org
>> Subject: Re: [Esapi-user] [Esapi-dev] URL Validation and Encoding
>>  
>> This should be easy enough to do with built-in methods of java.net.URL
>> like getProtocol(), getHost(), getPath(), etc.
>>  
>> Just to be the devil's advocate here, what happens if the URL the
>> developer wants to support doesn't have a protocol handler? Is this
>> something we care about? If it is, java.net.URL won't work well and
>> adding new protocol handlers has implications on ClassLoaders and Java
>> 2 security.
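The protocol-handler concern can be seen in a small sketch. java.net.URI parses any syntactically valid scheme, while producing a java.net.URL requires an installed handler for that scheme. The "myapp" scheme and the class and method names are invented for the example.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch contrasting URI parsing with URL protocol handlers.
final class ProtocolHandlerSketch {

    // URI only checks syntax, so any well-formed scheme is accepted.
    static String schemeViaUri(String spec) {
        try {
            return new URI(spec).getScheme();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Converting to java.net.URL fails with MalformedURLException when
    // no protocol handler is installed for the scheme.
    static boolean hasProtocolHandler(String spec) {
        try {
            new URI(spec).toURL();
            return true;
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            return false;
        }
    }
}
```

On a stock JVM, schemeViaUri("myapp://host/path") returns "myapp" while hasProtocolHandler("myapp://host/path") returns false, which suggests doing the parsing and scheme checks on URI rather than URL.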
>>  
>> _______________________________________________
>> Esapi-dev mailing list
>> Esapi-dev at lists.owasp.org
>> https://lists.owasp.org/mailman/listinfo/esapi-dev
>>  
> 