[Owasp-antisamy] escaped tags goes thru without getting removed

Girish ivgirish at yahoo.com
Tue Apr 14 15:28:27 EDT 2009


awesome! this is what i was looking for.

thanks a lot,
Girish


Arshan Dabirsiaghi wrote:
> It passes all of RSnake's tests and more. You can look at the unit 
> tests here:
>  
> http://code.google.com/p/owaspantisamy/source/browse/trunk/Java/current/TestSource/org/owasp/validator/html/test/AntiSamyTest.java
>  
> Thanks,
> Arshan
>
> ------------------------------------------------------------------------
> *From:* owasp-antisamy-bounces at lists.owasp.org on behalf of Girish
> *Sent:* Tue 4/14/2009 3:23 PM
> *To:* Jason Li
> *Cc:* owasp-antisamy at lists.owasp.org
> *Subject:* Re: [Owasp-antisamy] escaped tags goes thru without getting 
> removed
>
> Jason,
> btw, is AntiSamy tested against RSnake's cheat sheet? If so, how are
> the results?
>
> thanks,
> Girish
>
>
> Jason Li wrote:
>> Girish,
>>
>> I think RSnake's cheatsheet that you mentioned
>> (http://ha.ckers.org/xss.html) is probably the best resource for
>> malicious content.
>>
>> In terms of expanding it to cleanse XML, it's unlikely we'll do this
>> because it's impossible for AntiSamy to determine how the XML content
>> data will be used. XML by its nature is extremely flexible and could
>> be used in any number of contexts. Someone might be using a SOAP
>> message to deliver base64-encoded content that is then further HTML
>> entity encoded for interpretation by a browser at the end. AntiSamy
>> can't unravel the onion, so to speak.
>>
>> IMHO, it's a task best left to the developer as they're the ones who
>> ultimately know the intended target.
>>
>> Thanks for the info about unescaping HTML! We'll have to look into
>> what Eric said earlier in the thread about AntiSamy's behavior with
>> encoded HTML.
>>
>> --
>> -Jason Li-
>> -jason.li at owasp.org-
>>
>>
>>
>> On Mon, Apr 13, 2009 at 7:01 PM, Girish <ivgirish at yahoo.com> wrote:
>>   
>>> Jason,
>>> quick questions:
>>> (1) do you have more sample html files with malicious content that I can use
>>> to test ?
>>> (2) any plans to expand antisamy to cleanse XML files with html/js malicious
>>> code in the future?
>>>
>>> btw, forgot to mention that
>>> http://commons.apache.org/lang/api/org/apache/commons/lang/StringEscapeUtils.html
>>> helps to unescape html/javascript text.
>>>
>>> thanks,
>>> Girish
>>>
>>>
>>> Jason Li wrote:
>>>
>>> Girish,
>>>
>>> With regard to validating the description and title elements, I'm
>>> surprised this isn't working straight out of the box.
>>>
>>> My tests with the default AntiSamy policy file and plaintext version
>>> show that you should get an empty string back from AntiSamy.
>>>
>>> A few questions:
>>> - Are you using the default AntiSamy policy file?
>>> - When you extract the text content for each node from your DOM
>>> object, is it returning the XML encoded version of the text content
>>> (i.e. &lt;script&gt;) or just the unencoded text version (i.e.
>>> <script>)?
>>> - Can you also provide the input and output (debugging or
>>> System.out.println()) of AntiSamy?  I'd like to eliminate the
>>> possibility that it's the DOM implementation that you're using
>>> (perhaps returns text content different than expected or doesn't
>>> update the DOM tree as expected).
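The second question above (encoded or unencoded text content) can be checked directly. The following is a minimal sketch using only the JDK's built-in DOM parser. The class and method names are made up for illustration, and it assumes the text is read with `getTextContent()`, which may not be how the original code extracts it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: shows what a DOM parser returns as the text content
// of an element whose body is entity-encoded once or twice.
final class TextContentCheck {

    // Parses a small XML string and returns the text content of its root element.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Encoded once in the XML: the parser hands back real markup.
        System.out.println(rootText(
                "<description>&lt;script&gt;alert(1)&lt;/script&gt;</description>"));
        // prints: <script>alert(1)</script>

        // Encoded twice in the XML: the text content is still entity-encoded.
        System.out.println(rootText(
                "<description>&amp;lt;script&amp;gt;alert(1)&amp;lt;/script&amp;gt;</description>"));
        // prints: &lt;script&gt;alert(1)&lt;/script&gt;
    }
}
```

Printing the extracted string just before it is handed to the sanitizer shows which of the two cases applies.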
>>>
>>> As to your question about links, you're absolutely right about URLs
>>> such as javascript:alert();.  That's why AntiSamy applies URL
>>> validation to anchor tags and link tags in HTML.
>>>
>>> However, AntiSamy only validates HTML. I can't stress this enough:
>>> AntiSamy assumes the input it receives is meant to be interpreted as
>>> HTML.
>>>
>>> If you pass in text that is going to be interpreted as a URL (like the
>>> text content of the URL XML node), AntiSamy doesn't have the context to
>>> know that it's going to be interpreted as a URL. The best thing I can
>>> suggest is a bit of a kludge: take the HTML *attribute*-encoded
>>> version of the text content of your URL XML node and place it inside a
>>> makeshift anchor tag with an href, so that AntiSamy will perform
>>> proper URL validation on it.
>>>
>>> Take that suggestion with a grain of salt though - it's off the top of
>>> my head and I haven't thought through all the security considerations.
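The anchor-wrapping idea above might look roughly like the sketch below. It covers only the wrapping step: the helper names are hypothetical, the encoder handles just the characters significant in a double-quoted attribute, and the call into the sanitizer itself is left out.

```java
// Hypothetical helper: wraps a URL taken from an XML node in a throwaway
// anchor so that an HTML sanitizer applies its href validation to it.
final class AnchorWrapper {

    // Encodes the characters that matter inside a double-quoted HTML attribute value.
    static String encodeForHtmlAttribute(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Builds the makeshift anchor; the link text "x" is an arbitrary placeholder.
    static String wrapInAnchor(String url) {
        return "<a href=\"" + encodeForHtmlAttribute(url) + "\">x</a>";
    }

    public static void main(String[] args) {
        System.out.println(wrapInAnchor("javascript:alert('XSS');"));
        // prints: <a href="javascript:alert(&#39;XSS&#39;);">x</a>
    }
}
```

If the sanitized result no longer contains the href (or the anchor at all), the URL would be treated as rejected. As noted above, this has not been reviewed for security implications.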
>>>
>>> --
>>> -Jason Li-
>>> -jason.li at owasp.org-
>>>
>>>
>>>
>>> On Mon, Apr 13, 2009 at 2:52 PM, Girish <ivgirish at yahoo.com> wrote:
>>>
>>>
>>> Jason,
>>> Thank you very much for the detailed reply. Yes, you are right. The html content
>>> is inside RSS feeds. But I am not passing the entire xml to antisamy. I am
>>> actually:
>>> - walking the xml tree
>>> - extracting the text content for each node
>>> - passing only text content to antisamy
>>> - then, updating the xml/dom tree with filtered content
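The four steps listed above can be sketched with the JDK's DOM API as follows. This is an illustration under assumptions, not the original code: the class name is invented, the sanitizer is passed in as a plain function, and the tag-stripping lambda in `main` is only a stand-in for the real call (with AntiSamy that would be `scan(...)` followed by `getCleanHTML()`).

```java
import java.io.StringReader;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: walks an XML tree and rewrites the text of every
// leaf element through a caller-supplied sanitizer.
final class FeedTextFilter {

    // Elements with mixed content (text next to child elements) keep their own text unchanged.
    static void filterLeafText(Element element, UnaryOperator<String> sanitizer) {
        boolean hasChildElement = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                hasChildElement = true;
                filterLeafText((Element) child, sanitizer);
            }
        }
        if (!hasChildElement) {
            // getTextContent() returns the entity-decoded text of the element.
            element.setTextContent(sanitizer.apply(element.getTextContent()));
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<item><title>&lt;script&gt;alert(1)&lt;/script&gt;Hello</title></item>");
        // Stand-in only: crude tag stripping, NOT a real sanitizer.
        UnaryOperator<String> stripTags = s -> s.replaceAll("<[^>]*>", "");
        filterLeafText(doc.getDocumentElement(), stripTags);
        System.out.println(doc.getDocumentElement().getTextContent());
        // prints: alert(1)Hello
    }
}
```

Passing the sanitizer in as a function keeps the tree walk independent of the library doing the cleaning, which also makes it easy to log the exact input and output for each node.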
>>>
>>> for example, here is the RSS feed that I am using for testing.
>>>
>>> <?xml version="1.0" ?>
>>> <rss version="2.0">
>>>     <!-- rsstest.markwoodman.com\malicious_2.rss -->
>>>     <!-- Entity encoded script insertion -->
>>>     <channel>
>>>
>>>       <pubDate>Tue, 9 Aug 2006 00:00:00 GMT</pubDate>
>>>
>>>       <title><script>alert('Channel Title Vulnerability - Type
>>> 2')</script></title>
>>>       <description>&lt;script&gt;alert('Channel Title Description
>>> Vulnerability - Type 2')&lt;/script&gt;</description>
>>>       <link>&lt;script&gt;alert('Channel Link Vulnerability - Type
>>> 2')&lt;/script&gt;</link>
>>>         <url>javascript:alert('Channel Image URL Vulnerability - Type
>>> 1');</url>
>>>
>>>       <item>
>>>         <title>&lt;script&gt;alert('Item Title Vulnerability - Type
>>> 2')&lt;/script&gt;</title>
>>>         <description>&lt;script&gt;alert('Item Description Vulnerability -
>>> Type 2')&lt;/script&gt;</description>
>>>         <link>&lt;script&gt;alert('Item Link Vulnerability - Type
>>> 2')&lt;/script&gt;</link>
>>>       </item>
>>>
>>>     </channel>
>>>
>>> </rss>
>>>
>>> ======== the output after running it thru antisamy is=========
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <rss version="2.0">
>>> <!-- rsstest.markwoodman.com\malicious_2.rss -->    <!-- Entity encoded
>>> script insertion -->    <channel>
>>>       <pubDate>Tue, 9 Aug 2006 00:00:00 GMT</pubDate>
>>>       <title/>
>>>       <description>&amp;lt;script&amp;gt;alert('Channel Title Description
>>> Vulnerability - Type 2')&amp;lt;/script&amp;gt;</description>
>>>       <link>&amp;lt;script&amp;gt;alert('Channel Link Vulnerability - Type
>>> 2')&amp;lt;/script&amp;gt;</link>
>>>       <url>javascript:alert('Channel Image URL Vulnerability - Type
>>> 1');</url>
>>>       <item>
>>>          <title>&amp;lt;script&amp;gt;alert('Item Title Vulnerability - Type
>>> 2')&amp;lt;/script&amp;gt;</title>
>>>          <description>&amp;lt;script&amp;gt;alert('Item Description
>>> Vulnerability - Type 2')&amp;lt;/script&amp;gt;</description>
>>>          <link>&amp;lt;script&amp;gt;alert('Item Link Vulnerability - Type
>>> 2')&amp;lt;/script&amp;gt;</link>
>>>       </item>
>>>    </channel>
>>> </rss>
>>>
>>>
>>> NOTE: it also didn't remove "javascript:alert" from the url element above.
>>>
>>> As you can see here (http://ha.ckers.org/xss.html), most of the time the
>>> malicious html/javascript is going to be encoded using octal/hex/comment
>>> tags to bypass regular expression filters of purifiers.
>>>
>>> an example
>>> ------------------------------------------------------------
>>> <IMG SRC="jav&#x09;ascript:alert('XSS');">
>>> -------------------------------------------------------------
>>>
>>> So, wondering what's the best way to deal with this type of code ?
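To make the `jav&#x09;ascript:` example above concrete, here is a small sketch of why a check on the raw text misses it and what decoding first changes. It is an illustration only, not how AntiSamy works internally. The names are hypothetical, only numeric character references are decoded, and a real filter would normally use an allow-list of schemes rather than look for one bad scheme.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes numeric character references and removes the
// whitespace that browsers ignore inside URLs before looking at the scheme.
final class SchemeCheck {

    // Matches &#9; and &#x09; style references (the trailing semicolon is optional in HTML).
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));?");

    static String decodeNumericRefs(String input) {
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    static boolean looksLikeJavascriptUrl(String rawAttributeValue) {
        String decoded = decodeNumericRefs(rawAttributeValue);
        // Tabs and line breaks are dropped by browsers when they parse a URL.
        String compact = decoded.replaceAll("[\\t\\r\\n]", "").trim().toLowerCase(Locale.ROOT);
        return compact.startsWith("javascript:");
    }

    public static void main(String[] args) {
        String raw = "jav&#x09;ascript:alert('XSS');";
        System.out.println(raw.startsWith("javascript:"));   // prints: false
        System.out.println(looksLikeJavascriptUrl(raw));      // prints: true
    }
}
```

The raw string never contains the literal text `javascript:`, so a pattern applied before decoding does not match, while the decoded and compacted value does.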
>>>
>>> appreciate your help.
>>>
>>> thanks,
>>> Girish
>>>
>>>
>>>
>>> Jason Li wrote:
>>>
>>> That's definitely an issue if encoded HTML gets decoded by the DOM parser...
>>>
>>> That's something we need to look into and fix.
>>>
>>> Thanks for pointing that out Eric!
>>> --
>>> -Jason Li-
>>> -jason.li at owasp.org-
>>>
>>>
>>>
>>> On Mon, Apr 13, 2009 at 1:28 PM, Eric Kreiser <ekreiser at mzinga.com> wrote:
>>>
>>>
>>> The other problem I have seen with antisamy is that if the value you
>>> send to antisamy is escaped... but you use the
>>> getCleanXMLDocumentFragment() to get your scrubbed value... it reverses
>>> all the escaping... leaving you now with a value that would have
>>> otherwise violated the policy file
>>>
>>>
>>>
>>> Jason Li wrote:
>>>
>>>
>>> Girish,
>>>
>>> By default, script tags should be removed by AntiSamy.
>>>
>>> I think the problem may lie in your statement, "even if they are escaped."
>>>
>>> If you pass in:
>>> <script>alert('Channel Title Description Vulnerability - Type 2')</script>
>>>
>>> to AntiSamy, you should get nothing back.
>>>
>>> However, your statement leads me to believe that in fact you're passing in:
>>> &lt;script&gt;alert('Channel Title Description Vulnerability - Type
>>> 2')&lt;/script&gt;
>>>
>>> The above is "safe" from AntiSamy's perspective because it assumes
>>> that the content is directly rendered in an HTML interpreter.
>>>
>>> My guess, from the behavior you describe and the examples you give, is
>>> that you have encoded HTML embedded in XML, so something that looks
>>> like this (here the tainted input is contained in an XML element, item
>>> description, and therefore encoded):
>>> <rss version="2.0">
>>>   <channel>
>>>     <title>Example</title>
>>>     <link>http://example.com</link>
>>>     <description>Example</description>
>>>     <item>
>>>       <title>Example</title>
>>>       <link>http://example.com</link>
>>>       <description>This is the text that you're trying to validate
>>> &lt;script&gt;alert('Channel Title Description Vulnerability - Type
>>> 2')&lt;/script&gt;</description>
>>>     </item>
>>>   </channel>
>>> </rss>
>>>
>>> AntiSamy can't know the context where your content is coming from -
>>> it's expecting HTML content that goes to an HTML interpreter. If the
>>> content you are providing is encoded HTML that goes to an interpreter
>>> that decodes the HTML, AntiSamy won't be able to properly validate it.
>>> You'd have to provide an HTML decoded version for AntiSamy to handle
>>> properly.
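Providing "an HTML decoded version" means removing exactly one layer of entity encoding before validation. The sketch below shows that step for the five basic entities only. The class name is hypothetical, and a library routine such as the Commons Lang `StringEscapeUtils` class mentioned elsewhere in this thread would cover far more entities.

```java
// Hypothetical helper: removes one layer of entity encoding so that the
// sanitizer sees the markup the final interpreter will see.
final class EntityLayers {

    // Decodes exactly one layer of the five basic entities.
    static String decodeOnce(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#39;", "'")
                .replace("&amp;", "&"); // last, so "&amp;lt;" becomes "&lt;" and not "<"
    }

    public static void main(String[] args) {
        System.out.println(decodeOnce("&lt;script&gt;alert(1)&lt;/script&gt;"));
        // prints: <script>alert(1)</script>

        System.out.println(decodeOnce("&amp;lt;script&amp;gt;"));
        // prints: &lt;script&gt;
    }
}
```

How many layers to remove depends on how many decoding steps sit between the stored value and the browser, which only the application author knows.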
>>>
>>> Am I interpreting your use case correctly? And if so, does that
>>> explanation make sense?
>>> --
>>> -Jason Li-
>>> -jason.li at owasp.org-
>>>
>>>
>>>
>>> On Fri, Apr 10, 2009 at 6:52 PM, Girish <ivgirish at yahoo.com> wrote:
>>>
>>>
>>>
>>> I am using version 1.3 and I have tried all 4 policy files. They all
>>> give the same result.
>>>
>>> For example, if my html is this (passing line by line to antisamy):
>>>
>>>      <script>alert('Channel Title Description Vulnerability -
>>> Type 2')</script>
>>>      <script>alert('Channel Link Vulnerability - Type
>>> 2')</script>
>>>      javascript:alert('Channel Image URL Vulnerability - Type 1');
>>>
>>> the output I am getting is:
>>>
>>>      &lt;script&gt;alert('Channel Title Description
>>> Vulnerability - Type 2')&lt;/script&gt;
>>>      &lt;script&gt;alert('Channel Link Vulnerability - Type
>>> 2')&lt;/script&gt;
>>>      javascript:alert('Channel Image URL Vulnerability - Type 1');
>>>
>>> any idea on how to remove the tags like
>>> script/javascript/embed/frame/etc even if they are escaped.
>>>
>>>
>>> _______________________________________________
>>> Owasp-antisamy mailing list
>>> Owasp-antisamy at lists.owasp.org
>>> https://lists.owasp.org/mailman/listinfo/owasp-antisamy
>>>
>>>
>>>