[Esapi-dev] [Esapi-user] URL Validation and Encoding

Jim Manico jim.manico at owasp.org
Thu Sep 23 22:03:42 EDT 2010


I’m not picking on AntiSamy – I use it in every app and I think it’s an outstanding project. Arshan and Jason Li should be commended for this great library.

 

I’m just pushing at the edges, Jeff. :)

 

But, specific to URLs (at least), AntiSamy is all regex. From an older AntiSamy policy file:

 

      <common-regexps>

            <!--
            From W3C:
            This attribute assigns a class name or set of class names to an
            element. Any number of elements may be assigned the same class
            name or names. Multiple class names must be separated by white
            space characters.
            -->

            <regexp name="htmlTitle" value="[a-zA-Z0-9\s-_',:\[\]!\./\\\(\)]*"/> <!-- force non-empty with a '+' at the end instead of '*' -->
            <regexp name="onsiteURL" value="([\w\\/\.\?=&amp;;\#-~]+|\#(\w)+)"/>
            <regexp name="offsiteURL" value="(\s)*((ht|f)tp(s?)://|mailto:)[A-Za-z0-9]+[~a-zA-Z0-9-_\.@#$%&amp;;:,\?=/\+!]*(\s)*"/>

      </common-regexps>

      

 

From: Jeff Williams [mailto:jeff.williams at owasp.org] 
Sent: Thursday, September 23, 2010 4:00 PM
To: Chris Schmidt
Cc: Jim Manico; esapi-dev at lists.owasp.org
Subject: Re: [Esapi-dev] [Esapi-user] URL Validation and Encoding

 

AntiSamy DOES use a full parser, not regex.

--Jeff

 

 

On Sep 23, 2010, at 9:49 PM, Chris Schmidt <chrisisbeef at gmail.com> wrote:

This is pretty similar to a point that was brought up by the A in the K blog guy.

 

If we are indeed going to add methods to the interface to do this, I think that we absolutely want to use URI under the hood to do so. As much as it pains me to say it (because it breaks backwards compatibility), we need to rename encodeForURL to encodeUrlParameter. We can deprecate the old method and just have it wrap the new call for now, but with the added functionality and the current name, it is pretty confusing to know which method I am supposed to use.
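
A minimal sketch of the deprecate-and-wrap idea, for illustration only. The class name UrlEncoderSketch is hypothetical and the method bodies are an assumption (they simply use java.net.URLEncoder); this is not the actual ESAPI Encoder interface or its implementation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical stand-in for the encoder; not the real ESAPI API.
final class UrlEncoderSketch {

    /** New, more precisely named method: encodes a single parameter value. */
    static String encodeUrlParameter(String value) {
        try {
            // URLEncoder emits application/x-www-form-urlencoded output,
            // so a space becomes '+' rather than "%20".
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Old name kept so existing callers still compile; it only delegates. */
    @Deprecated
    static String encodeForURL(String value) {
        return encodeUrlParameter(value);
    }
}
```

Existing callers keep working and get a deprecation warning at compile time, while new code can use the clearer name.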

 

Food for thought...

 

On Sep 23, 2010, at 7:32 PM, Jim Manico wrote:





Yup, I totally agree – we can use the URI class or ESAPI.encodeForURL()

 

I’m just looking for an encoding function in order to shove an untrusted URL, URL fragment, or URL parameter into an href link context in a way that stops XSS without breaking the URL. These are special cases since it’s in an attribute context, <a href="DATA">click me</a>, but we do not want to attribute encode here.

 

And I think there are 2 cases to consider:

 

1)      The URL root is hard coded and you only need to encode a GET parameter:
a.       <a href="/site/user?id=UNTRUSTED-DATA">click me</a>
b.      In this case we just URL encode UNTRUSTED-DATA

2)      The untrusted data is a relative or absolute URL
a.       <a href="UNTRUSTED-DATA">click me</a>
b.      We cannot URL encode UNTRUSTED-DATA here or we will break the link
c.       We can surely use the URI class under the hood here
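
The two cases can be sketched roughly as follows. This is only an illustration using the standard java.net classes; the class and method names (HrefCases, userLink, parseUrl) are made up for the example and are not part of ESAPI.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

// Hypothetical helper names, for illustration only.
final class HrefCases {

    /** URL-encodes one value; delimiters such as ':' '/' '?' are escaped too. */
    static String encodeParameter(String untrusted) {
        try {
            return URLEncoder.encode(untrusted, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Case 1: the URL root is fixed, only the parameter value is untrusted. */
    static String userLink(String untrustedId) {
        return "/site/user?id=" + encodeParameter(untrustedId);
    }

    /**
     * Case 2: the whole URL is untrusted. Parsing it with java.net.URI checks
     * the syntax and keeps the delimiters intact. URI.create throws
     * IllegalArgumentException when the string is not a valid URI.
     */
    static String parseUrl(String untrustedUrl) {
        return URI.create(untrustedUrl).toASCIIString();
    }
}
```

Calling encodeParameter on a whole URL such as http://example.com/a?b=c escapes the ':' '/' '?' and '=' characters as well, which is why case 2 cannot reuse the case 1 encoder. Note that parsing alone does not reject schemes such as javascript:, so a scheme check is still needed for case 2.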

 

So I’m thinking that we still need:

 

ESAPI.encoder().encodeURLComponent

and

ESAPI.encoder().encodeURL(String url)

And perhaps

ESAPI.encoder().encodeURL(List<String> legalProtocols, String url)

And/or make legal protocols configurable

 

Also, a URL needs to be valid to be encoded for safe display if we use this scheme. If a URL is invalid at encoding time, perhaps just return a “#” or a blank string?
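
One possible shape for the protocol-restricted variant with the "#" fallback, as a sketch only. The class name SafeUrlSketch, the static method form, and the decision to let scheme-less (relative) URLs through are all assumptions made for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; not an existing ESAPI method.
final class SafeUrlSketch {

    static final String FALLBACK = "#";

    /**
     * Returns the URL unchanged (in ASCII form) when it parses and its scheme
     * is in legalProtocols; returns "#" otherwise. legalProtocols is expected
     * to hold lower-case scheme names such as "http".
     */
    static String encodeURL(List<String> legalProtocols, String url) {
        if (url == null) {
            return FALLBACK;
        }
        final URI uri;
        try {
            uri = new URI(url.trim());
        } catch (URISyntaxException e) {
            return FALLBACK; // invalid at encoding time
        }
        String scheme = uri.getScheme();
        // Assumption: a URL without a scheme is relative and is allowed.
        if (scheme != null && !legalProtocols.contains(scheme.toLowerCase(Locale.ROOT))) {
            return FALLBACK;
        }
        return uri.toASCIIString();
    }
}
```

With a list of "http" and "https", a javascript: URL or a URL containing a raw space comes back as "#", while a well-formed https link is returned as is.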

 

Again, why this madness? I’m trying to get away from regular-expression-based defense and instead take a page from compiler design thinking: (1) first load the input in question (a URL) into an object abstraction that formally models that input, and then (2) “write” the data in a whitelist way, only supporting features of that input that are safe.
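
To make the parse-then-write idea concrete for URLs, here is a rough sketch that uses java.net.URI as the object model and then writes back only the parts it chooses to support. The class name, the http/https whitelist, and the choice to drop user-info are assumptions for the example, not a proposed ESAPI design.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only; names and the whitelist are hypothetical.
final class UrlWhitelistWriter {

    private static final List<String> ALLOWED_SCHEMES = Arrays.asList("http", "https");

    /** Step 2 of the approach: write out only the supported parts of a parsed URI. */
    static String write(URI uri) {
        if (uri.isOpaque()) {
            throw new IllegalArgumentException("opaque URIs (mailto:, javascript:, ...) are not supported");
        }
        StringBuilder out = new StringBuilder();
        String scheme = uri.getScheme();
        if (scheme != null) {
            String lower = scheme.toLowerCase(Locale.ROOT);
            if (!ALLOWED_SCHEMES.contains(lower)) {
                throw new IllegalArgumentException("scheme not allowed: " + lower);
            }
            out.append(lower).append(':');
        }
        if (uri.getRawAuthority() != null) {
            if (uri.getHost() == null) {
                throw new IllegalArgumentException("authority is not a plain host");
            }
            // User-info (user:password@) is deliberately not written back.
            out.append("//").append(uri.getHost());
            if (uri.getPort() != -1) {
                out.append(':').append(uri.getPort());
            }
        } else if (scheme != null) {
            throw new IllegalArgumentException("absolute URL without a host");
        }
        // Raw (still escaped) components are copied so nothing is double-encoded.
        if (uri.getRawPath() != null) out.append(uri.getRawPath());
        if (uri.getRawQuery() != null) out.append('?').append(uri.getRawQuery());
        if (uri.getRawFragment() != null) out.append('#').append(uri.getRawFragment());
        return out.toString();
    }
}
```

Step 1 is the parse itself, for example write(new URI(input)); anything the model cannot represent fails there, and anything the writer does not explicitly emit never reaches the page.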

 

I’ve seen proprietary versions of AntiSamy that do this: instead of a regular-expression-based set of rules, the untrusted HTML is loaded into an HTML abstraction set of classes, like Wicket. Then, the “clean” function would just write out only the legal tags that are to be supported. This kind of coding is (1) way faster, (2) way more accurate, (3) simpler code, and (4) less likely to fail over time. I think.

  

- Jim

 

From: esapi-dev-bounces at lists.owasp.org [mailto:esapi-dev-bounces at lists.owasp.org] On Behalf Of Chris Schmidt
Sent: Thursday, September 23, 2010 4:01 AM
To: esapi-dev at lists.owasp.org
Subject: Re: [Esapi-dev] [Esapi-user] URL Validation and Encoding

 

It seems like we may be redoing a lot of work here that is already done for us.

From the Javadocs on java.net.URL:

Note, the URI class <http://download.oracle.com/javase/6/docs/api/java/net/URI.html> does perform escaping of its component fields in certain circumstances. The recommended way to manage the encoding and decoding of URLs is to use URI, and to convert between these two classes using toURI() <http://download.oracle.com/javase/6/docs/api/java/net/URL.html#toURI%28%29> and URI.toURL() <http://download.oracle.com/javase/6/docs/api/java/net/URI.html#toURL%28%29>.

Unless I am missing something, why not just use the built-in API to perform the encoding of the URL?

Validation is another story altogether. URL validation seems like a big dark hole that could lead to some interesting assumptions and expectations. I have written a couple of URL validators that even go so far as to do a DNS lookup of the domain, submit a request to the URL specified (thought has been given to scanning the response for *dangerous* content), and verify that the response code is a 200; only then would the URL be considered valid.

The point here is that while this sounds like something that may be somewhat useful to a handful of people, and perhaps at least a basic "this is a valid URL" check would be helpful, I think that there are bigger fish to fry than re-inventing RFC 2396 encoding for URLs. To the best of my knowledge, the URI encoding is fully compliant. If we really want to add to the encoding interface, perhaps just a delegation method to URI is the right way to go?
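
A delegation method along these lines might look like the sketch below. The class name and the method signature are hypothetical; the only real API used is the five-argument java.net.URI constructor, which quotes characters that are illegal in each component.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical delegation method; not part of the ESAPI Encoder interface.
final class UriDelegationSketch {

    /**
     * Builds a URL from unescaped components and lets java.net.URI do the
     * escaping. The fragment is left out to keep the example short.
     */
    static String encodeURL(String scheme, String authority, String path, String query) {
        try {
            return new URI(scheme, authority, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Two caveats apply. These constructors always quote '%', so passing components that are already escaped leads to double encoding. They also leave legal characters such as '&' and '=' alone, so a single parameter value still needs its own encoding before it is placed in the query.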

Thoughts?

On 9/22/2010 11:58 PM, Jim Manico wrote:

We can add a second encoder for relative URL's, but the programmer would
need to specify the domain, using one of the other URL constructors, like:
  new URL("http", "www.gamelan.com", "/pages/Gamelan.net.html");
 
And ESAPI would provide:
 
ESAPI.encoder().encodeCompleteURL(String URL);
ESAPI.encoder().encodeURLParameter(String data); // JavaScript calls this a "URIComponent"
ESAPI.encoder().encodeRelativeURL(String root, String relativeURL);
 
As well as
 
ESAPI.validator().assertValidCompleteURL(String url) throws ValidationException;
ESAPI.validator().assertValidRelativeURL(String root, String relativeURL) throws ValidationException;
boolean ESAPI.validator().isValidCompleteURL(String url);
boolean ESAPI.validator().isValidRelativeURL(String root, String relativeURL);
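
As a rough illustration of how a few of these signatures could sit on top of java.net.URI, consider the sketch below. Everything in it is an assumption: the class name, the local ValidationException stand-in (the real ESAPI class is not used here), and the rule that a "complete" URL must be absolute and have a host.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch of the proposed methods; not ESAPI code.
final class UrlValidatorSketch {

    /** Local stand-in for a checked validation exception. */
    static class ValidationException extends Exception {
        ValidationException(String message) { super(message); }
    }

    static boolean isValidCompleteURL(String url) {
        if (url == null) return false;
        try {
            URI uri = new URI(url);
            return uri.isAbsolute() && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    static boolean isValidRelativeURL(String root, String relativeURL) {
        if (relativeURL == null || !isValidCompleteURL(root)) return false;
        try {
            URI rel = new URI(relativeURL);
            // Must not carry its own scheme or authority.
            return !rel.isAbsolute() && rel.getRawAuthority() == null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    static void assertValidCompleteURL(String url) throws ValidationException {
        if (!isValidCompleteURL(url)) throw new ValidationException("Invalid URL");
    }

    /** Resolves the relative URL against the root the programmer specified. */
    static String encodeRelativeURL(String root, String relativeURL) throws ValidationException {
        if (!isValidRelativeURL(root, relativeURL)) throw new ValidationException("Invalid relative URL");
        return URI.create(root).resolve(relativeURL).toASCIIString();
    }
}
```

With the root "http://www.gamelan.com/" and the relative URL "/pages/Gamelan.net.html", encodeRelativeURL produces the same address as the new URL(...) example above.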
 
- Jim
 
 
-----Original Message-----
From: Ed Schaller [mailto:schallee at darkmist.net] 
Sent: Wednesday, September 22, 2010 4:44 PM
To: augustd
Cc: Jim Manico; ESAPI-Developers; esapi-user at lists.owasp.org
Subject: Re: [Esapi-user] [Esapi-dev] URL Validation and Encoding
 

This should be easy enough to do with built-in methods of java.net.URL
like getProtocol(), getHost(), getPath(), etc.

 
Just to be the devil's advocate here, what happens if the URL the
developer wants to support doesn't have a protocol handler? Is this
something we care about? If it is, java.net.URL won't work well, and
adding new protocol handlers has implications for ClassLoaders and
Java 2 security.
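
A small sketch of the difference being described. The "myapp" scheme is a made-up example of a protocol with no registered handler, and the class and method names are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;

// Illustration only; names are hypothetical.
final class ProtocolHandlerDemo {

    /** True only if java.net.URL has a protocol handler for the scheme. */
    static boolean urlClassAccepts(String spec) {
        try {
            URI.create(spec).toURL(); // builds a java.net.URL
            return true;
        } catch (MalformedURLException e) {
            return false; // e.g. unknown protocol
        }
    }

    /** java.net.URI only parses the syntax, so any well-formed scheme works. */
    static String schemeViaUri(String spec) {
        return URI.create(spec).getScheme();
    }
}
```

On a default JRE, urlClassAccepts("http://example.com/") is true, while urlClassAccepts("myapp://open/item/42") is false even though URI parses the same string and reports the scheme, host, and path without complaint.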
 

 
_______________________________________________
Esapi-dev mailing list
Esapi-dev at lists.owasp.org
https://lists.owasp.org/mailman/listinfo/esapi-dev

 

 

