[Owasp-antisamy] escaped tags goes thru without getting removed

Girish ivgirish at yahoo.com
Mon Apr 13 18:55:01 EDT 2009


Jason,
Unescaping the text before passing it to AntiSamy seems to work. Thanks
for your help.
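
A minimal sketch of that unescaping step, assuming plain Java and handling
only the five predefined XML entities plus numeric character references.
The class and method names (EntityUnescaper, unescape) are illustrative and
are not part of AntiSamy:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityUnescaper {
    // Matches the five predefined XML entities plus decimal/hex character references.
    private static final Pattern ENTITY =
            Pattern.compile("&(lt|gt|amp|quot|apos|#[0-9]+|#[xX][0-9a-fA-F]+);");

    /** Decodes one level of entity escaping; unknown references are left untouched. */
    static String unescape(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());          // copy text before the entity
            out.append(decode(m.group(1), m.group()));  // append the decoded entity
            last = m.end();
        }
        out.append(text, last, text.length());          // copy the remaining tail
        return out.toString();
    }

    private static String decode(String name, String original) {
        switch (name) {
            case "lt":   return "<";
            case "gt":   return ">";
            case "amp":  return "&";
            case "quot": return "\"";
            case "apos": return "'";
            default:
                // Numeric reference: "#60" (decimal) or "#x3C" (hex).
                try {
                    boolean hex = name.charAt(1) == 'x' || name.charAt(1) == 'X';
                    int cp = hex ? Integer.parseInt(name.substring(2), 16)
                                 : Integer.parseInt(name.substring(1), 10);
                    return Character.isValidCodePoint(cp)
                            ? new String(Character.toChars(cp))
                            : original;
                } catch (NumberFormatException e) {
                    return original; // out-of-range number, keep the raw text
                }
        }
    }
}
```

The decoded string is then what would be handed to AntiSamy (for example
through its scan method with the loaded policy). Each call decodes one level
of escaping only, so a doubly escaped &amp;lt; comes back as &lt; and would
need a second pass.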

-Girish



Girish wrote:
> Jason,
> Yes, I am using the "antisamy-1.3.xml" policy file. I got your point about
> URLs. I think I can deal with it by adding anchor/href tags.
>
> --------------- partial xml snippet for reference ----------
>       <pubDate>Tue, 9 Aug 2006 00:00:00 GMT</pubDate>
>
>       <title><script>alert('Channel Title Vulnerability - Type 
> 2')</script></title>
>       <description>&lt;script&gt;alert('Channel Title Description 
> Vulnerability - Type 2')&lt;/script&gt;</description>
>       <link>&lt;script&gt;alert('Channel Link Vulnerability - Type 
> 2')&lt;/script&gt;</link>
>        <url>javascript:alert('Channel Image URL Vulnerability - Type 
> 1');</url>
>
> ------------------------------------
>
>
> here is the debug output that shows what is being passed in to 
> antisamy and the output.
>
> **********processing node = pubDate*********
> input to Antisamy========>Tue, 9 Aug 2006 00:00:00 GMT
> output from Antisamy========>Tue, 9 Aug 2006 00:00:00 GMT
>
> **********processing node = title*********
> input to Antisamy========><script>alert('Channel Title Vulnerability - 
> Type 2')</script>
> output from Antisamy========>
>
> **********processing node = description*********
> input to Antisamy========>&lt;script&gt;alert('Channel Title 
> Description Vulnerability - Type 2')&lt;/script&gt;
> output from Antisamy========>&lt;script&gt;alert('Channel Title 
> Description Vulnerability - Type 2')&lt;/script&gt;
>
> **********processing node = link*********
> input to Antisamy========>&lt;script&gt;alert('Channel Link 
> Vulnerability - Type 2')&lt;/script&gt;
> output from Antisamy========>&lt;script&gt;alert('Channel Link 
> Vulnerability - Type 2')&lt;/script&gt;
>
> **********processing node = url*********
> input to Antisamy========>javascript:alert('Channel Image URL 
> Vulnerability - Type 1');
> output from Antisamy========>javascript:alert('Channel Image URL 
> Vulnerability - Type 1');
>
> pls let me know if you need more info.
>
> thanks,
> Girish
>
>
> Jason Li wrote:
>> Girish,
>>
>> With regard to validating the description and title elements, I'm
>> surprised this isn't working straight out of the box.
>>
>> My tests with the default AntiSamy policy file and plaintext version
>> show that you should get an empty string back from AntiSamy.
>>
>> A few questions:
>> - Are you using the default AntiSamy policy file?
>> - When you extract the text content for each node from your DOM
>> object, is it returning the XML encoded version of the text content
>> (i.e. &lt;script&gt;) or just the unencoded text version (i.e.
>> <script>)?
>> - Can you also provide the input and output (debugging or
>> System.out.println()) of AntiSamy?  I'd like to eliminate the
>> possibility that it's the DOM implementation that you're using
>> (perhaps returns text content different than expected or doesn't
>> update the DOM tree as expected).
>>
>> As to your question about links, you're absolutely right about URLs
>> such as javascript:alert();.  That's why AntiSamy applies URL
>> validation to anchor tags and link tags in HTML.
>>
>> However, AntiSamy only validates HTML. I can't stress this enough:
>> AntiSamy assumes the input it receives is meant to be interpreted as
>> HTML.
>>
>> If you pass in text that is going to be interpreted as a URL (like the
>> text content of the URL XML node), AntiSamy doesn't have context to
>> know that it's going to be interpreted as URL. The best thing I can
>> suggest is kind of a kluge-hack which is to take the HTML *attribute*
>> encoded version of the text content of your URL XML node and place it
>> inside a makeshift anchor tag with an href so that AntiSamy will
>> perform proper URL validation on it.
>>
>> Take that suggestion with a grain of salt though - it's off the top of
>> my head and I haven't thought through all the security considerations.
>>
>> --
>> -Jason Li-
>> -jason.li at owasp.org-
>>
>>
>>
>> On Mon, Apr 13, 2009 at 2:52 PM, Girish <ivgirish at yahoo.com> wrote:
>>   
>>> Jason,
>>> Thank you very much for the detailed reply. Yes, you are right. The html content
>>> is inside RSS feeds. But I am not passing the entire xml to antisamy. I am
>>> actually
>>> - walking the xml tree
>>> - extracting the text content for each node
>>> - passing only text content to antisamy
>>> - then, updating the xml/dom tree with filtered content
>>>
>>> for example, here is the RSS feed that I am using for testing.
>>>
>>> <?xml version="1.0" ?>
>>> <rss version="2.0">
>>>     <!-- rsstest.markwoodman.com\malicious_2.rss -->
>>>     <!-- Entity encoded script insertion -->
>>>     <channel>
>>>
>>>       <pubDate>Tue, 9 Aug 2006 00:00:00 GMT</pubDate>
>>>
>>>       <title><script>alert('Channel Title Vulnerability - Type
>>> 2')</script></title>
>>>       <description>&lt;script&gt;alert('Channel Title Description
>>> Vulnerability - Type 2')&lt;/script&gt;</description>
>>>       <link>&lt;script&gt;alert('Channel Link Vulnerability - Type
>>> 2')&lt;/script&gt;</link>
>>>         <url>javascript:alert('Channel Image URL Vulnerability - Type
>>> 1');</url>
>>>
>>>       <item>
>>>         <title>&lt;script&gt;alert('Item Title Vulnerability - Type
>>> 2')&lt;/script&gt;</title>
>>>         <description>&lt;script&gt;alert('Item Description Vulnerability -
>>> Type 2')&lt;/script&gt;</description>
>>>         <link>&lt;script&gt;alert('Item Link Vulnerability - Type
>>> 2')&lt;/script&gt;</link>
>>>       </item>
>>>
>>>     </channel>
>>>
>>> </rss>
>>>
>>> ======== the output after running it thru antisamy is=========
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <rss version="2.0">
>>> <!-- rsstest.markwoodman.com\malicious_2.rss -->    <!-- Entity encoded
>>> script insertion -->    <channel>
>>>       <pubDate>Tue, 9 Aug 2006 00:00:00 GMT</pubDate>
>>>       <title/>
>>>       <description>&amp;lt;script&amp;gt;alert('Channel Title Description
>>> Vulnerability - Type 2')&amp;lt;/script&amp;gt;</description>
>>>       <link>&amp;lt;script&amp;gt;alert('Channel Link Vulnerability - Type
>>> 2')&amp;lt;/script&amp;gt;</link>
>>>       <url>javascript:alert('Channel Image URL Vulnerability - Type
>>> 1');</url>
>>>       <item>
>>>          <title>&amp;lt;script&amp;gt;alert('Item Title Vulnerability - Type
>>> 2')&amp;lt;/script&amp;gt;</title>
>>>          <description>&amp;lt;script&amp;gt;alert('Item Description
>>> Vulnerability - Type 2')&amp;lt;/script&amp;gt;</description>
>>>          <link>&amp;lt;script&amp;gt;alert('Item Link Vulnerability - Type
>>> 2')&amp;lt;/script&amp;gt;</link>
>>>       </item>
>>>    </channel>
>>> </rss>
>>>
>>>
>>> NOTE: it also didn't remove "javascript:alert" in the url element above.
>>>
>>> As you can see here (http://ha.ckers.org/xss.html), most of the time the
>>> malicious html/javascript is going to be encoded using octal/hex/comment
>>> tags to bypass regular expression filters of purifiers.
>>>
>>> an example
>>> ------------------------------------------------------------
>>> <IMG SRC="jav&#x09;ascript:alert('XSS');">
>>> -------------------------------------------------------------
>>>
>>> So, I am wondering what's the best way to deal with this type of code?
>>>
>>> appreciate your help.
>>>
>>> thanks,
>>> Girish
>>>
>>>
>>>
>>> Jason Li wrote:
>>>
>>> That's definitely an issue if encoded HTML gets decoded by the DOM parser...
>>>
>>> That's something we need to look into and fix.
>>>
>>> Thanks for pointing that out Eric!
>>> --
>>> -Jason Li-
>>> -jason.li at owasp.org-
>>>
>>>
>>>
>>> On Mon, Apr 13, 2009 at 1:28 PM, Eric Kreiser <ekreiser at mzinga.com> wrote:
>>>
>>>
>>> The other problem I have seen with antisamy is that if the value you
>>> send to antisamy is escaped... but you use the
>>> getCleanXMLDocumentFragment() to get your scrubbed value... it reverses
>>> all the escaping... leaving you now with a value that would have
>>> otherwise violated the policy file
>>>
>>>
>>>
>>> Jason Li wrote:
>>>
>>>
>>> Girish,
>>>
>>> By default, script tags should be removed by AntiSamy.
>>>
>>> I think the problem may lie in your statement, "even if they are escaped."
>>>
>>> If you pass in:
>>> <script>alert('Channel Title Description Vulnerability - Type 2')</script>
>>>
>>> to AntiSamy, you should get nothing back.
>>>
>>> However, your statement leads me to believe that in fact you're passing in:
>>> &lt;script&gt;alert('Channel Title Description Vulnerability - Type
>>> 2')&lt;/script&gt;
>>>
>>> The above is "safe" from AntiSamy's perspective because it assumes
>>> that the content is directly rendered in an HTML interpreter.
>>>
>>> My guess, from the behavior you describe and the examples you give, is
>>> that you have encoded HTML embedded in XML - so something that looks
>>> like this (here the tainted input is contained in an XML element, item
>>> description, and therefore encoded):
>>> <rss version="2.0">
>>>   <channel>
>>>     <title>Example</title>
>>>     <link>http://example.com</link>
>>>     <description>Example</description>
>>>     <item>
>>>       <title>Example</title>
>>>       <link>http://example.com</link>
>>>       <description>This is the text that you're trying to validate
>>> &lt;script&gt;alert('Channel Title Description Vulnerability - Type
>>> 2')&lt;/script&gt;</description>
>>>     </item>
>>>   </channel>
>>> </rss>
>>>
>>> AntiSamy can't know the context where your content is coming from -
>>> it's expecting HTML content that goes to an HTML interpreter. If the
>>> content you provide is encoded HTML that goes to an interpreter
>>> that decodes the HTML, AntiSamy won't be able to properly validate it.
>>> You'd have to provide an HTML decoded version for AntiSamy to handle
>>> properly.
>>>
>>> Am I interpreting your use case correctly? And if so, does that
>>> explanation make sense?
>>> --
>>> -Jason Li-
>>> -jason.li at owasp.org-
>>>
>>>
>>>
>>> On Fri, Apr 10, 2009 at 6:52 PM, Girish <ivgirish at yahoo.com> wrote:
>>>
>>>
>>>
>>> I am using version 1.3 and I have tried all 4 policy files. They all
>>> give the same result.
>>>
>>> For example, if my html is this (passing line by line to antisamy):
>>>
>>>      <script>alert('Channel Title Description Vulnerability -
>>> Type 2')</script>
>>>      <script>alert('Channel Link Vulnerability - Type
>>> 2')</script>
>>>      javascript:alert('Channel Image URL Vulnerability - Type 1');
>>>
>>> the output I am getting is:
>>>
>>>      &lt;script&gt;alert('Channel Title Description
>>> Vulnerability - Type 2')&lt;/script&gt;
>>>      &lt;script&gt;alert('Channel Link Vulnerability - Type
>>> 2')&lt;/script&gt;
>>>      javascript:alert('Channel Image URL Vulnerability - Type 1');
>>>
>>> any idea on how to remove the tags like
>>> script/javascript/embed/frame/etc even if they are escaped.
>>>
>>>
>>> _______________________________________________
>>> Owasp-antisamy mailing list
>>> Owasp-antisamy at lists.owasp.org
>>> https://lists.owasp.org/mailman/listinfo/owasp-antisamy
>>>
>>>
>>>
>>>
>>>
>>>     
>>
>>   

