[Owasp-antisamy] Escaped tags go through without getting removed

Girish ivgirish at yahoo.com
Mon Apr 13 14:52:15 EDT 2009


Jason,
Thank you very much for the detailed reply. Yes, you are right: the HTML
content is inside RSS feeds. But I am not passing the entire XML to
AntiSamy. Instead, I am:
- walking the XML tree
- extracting the text content of each node
- passing only that text content to AntiSamy
- then updating the XML/DOM tree with the filtered content
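The four steps above can be sketched with the JDK's built-in DOM API. This is a minimal sketch, not Girish's actual code: the class name `FeedTextSanitizer` is hypothetical, and the `sanitizer` parameter is an assumed stand-in for a call such as `new AntiSamy().scan(text, policy).getCleanHTML()`. It shows one detail that matters for this thread: a DOM text node's value has already had one level of escaping decoded by the XML parser, and the serializer re-escapes whatever you store back.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: walks a feed and rewrites every text node.
final class FeedTextSanitizer {
    // `sanitizer` is an assumed stand-in for an AntiSamy scan of the text.
    static Document sanitizeTextNodes(String xml, UnaryOperator<String> sanitizer) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse feed", e);
        }
        // Merge adjacent text nodes so each element's text is seen in one piece.
        doc.getDocumentElement().normalize();
        walk(doc.getDocumentElement(), sanitizer);
        return doc;
    }

    private static void walk(Node node, UnaryOperator<String> sanitizer) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                // The parser has already decoded one level of escaping:
                // "&lt;script&gt;" in the feed arrives here as "<script>".
                String text = child.getNodeValue();
                if (!text.trim().isEmpty()) {
                    // Stored as plain text; on serialization a returned
                    // "&lt;script&gt;" is written out as "&amp;lt;script&amp;gt;".
                    child.setNodeValue(sanitizer.apply(text));
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk(child, sanitizer);
            }
        }
    }
}
```

In other words, in this kind of setup the sanitizer already receives decoded markup, which is what Jason describes AntiSamy as expecting.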

For example, here is the RSS feed I am using for testing:

<?xml version="1.0" ?>
<rss version="2.0">
    <!-- rsstest.markwoodman.com\malicious_2.rss -->
    <!-- Entity encoded script insertion -->
    <channel>

      <pubDate>Tue, 9 Aug 2006 00:00:00 GMT</pubDate>

      <title><script>alert('Channel Title Vulnerability - Type 2')</script></title>
      <description>&lt;script&gt;alert('Channel Title Description Vulnerability - Type 2')&lt;/script&gt;</description>
      <link>&lt;script&gt;alert('Channel Link Vulnerability - Type 2')&lt;/script&gt;</link>
        <url>javascript:alert('Channel Image URL Vulnerability - Type 1');</url>

      <item>
        <title>&lt;script&gt;alert('Item Title Vulnerability - Type 2')&lt;/script&gt;</title>
        <description>&lt;script&gt;alert('Item Description Vulnerability - Type 2')&lt;/script&gt;</description>
        <link>&lt;script&gt;alert('Item Link Vulnerability - Type 2')&lt;/script&gt;</link>
      </item>

    </channel>

</rss>

======== the output after running it through AntiSamy is ========
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<!-- rsstest.markwoodman.com\malicious_2.rss -->    <!-- Entity encoded script insertion -->    <channel>
      <pubDate>Tue, 9 Aug 2006 00:00:00 GMT</pubDate>
      <title/>
      <description>&amp;lt;script&amp;gt;alert('Channel Title Description Vulnerability - Type 2')&amp;lt;/script&amp;gt;</description>
      <link>&amp;lt;script&amp;gt;alert('Channel Link Vulnerability - Type 2')&amp;lt;/script&amp;gt;</link>
      <url>javascript:alert('Channel Image URL Vulnerability - Type 1');</url>
      <item>
         <title>&amp;lt;script&amp;gt;alert('Item Title Vulnerability - Type 2')&amp;lt;/script&amp;gt;</title>
         <description>&amp;lt;script&amp;gt;alert('Item Description Vulnerability - Type 2')&amp;lt;/script&amp;gt;</description>
         <link>&amp;lt;script&amp;gt;alert('Item Link Vulnerability - Type 2')&amp;lt;/script&amp;gt;</link>
      </item>
   </channel>
</rss>


NOTE: it also didn't remove "javascript:alert" from the url element above.
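A plausible explanation, in line with Jason's point that AntiSamy cannot know the context, is that AntiSamy only sees the `<url>` value as plain text rather than as an `href` attribute, so none of its URL rules apply. One option, sketched here as an assumption and not as anything AntiSamy provides, is to validate URL-valued feed elements (`link`, `url`) separately with a scheme allowlist. The class name and the `http`/`https` allowlist are hypothetical; the check expects already-decoded text, such as a DOM text node's value.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical check for URL-valued RSS elements such as <link> and <url>.
final class FeedUrlCheck {
    // Assumed allowlist: only absolute http(s) URLs are accepted.
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");

    // Expects decoded text (e.g. a DOM text node value), not raw XML.
    static boolean isAllowedFeedUrl(String raw) {
        if (raw == null) {
            return false;
        }
        // Browsers' URL parsers drop every tab/LF/CR and trim leading and
        // trailing control characters and spaces (trim() removes chars <= 0x20).
        String s = raw.replaceAll("[\\t\\n\\r]", "").trim();
        int colon = s.indexOf(':');
        if (colon <= 0) {
            return false; // no scheme: reject relative or empty values
        }
        String scheme = s.substring(0, colon).toLowerCase(Locale.ROOT);
        // An allowlist rejects javascript:, vbscript:, data: and anything unexpected.
        return ALLOWED_SCHEMES.contains(scheme) && s.startsWith("//", colon + 1);
    }
}
```

An allowlist is used rather than a `javascript:` blocklist because, as the next paragraph notes, attackers vary the spelling of the scheme.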

As you can see at http://ha.ckers.org/xss.html, malicious HTML/JavaScript
is often encoded (octal/hex encoding, embedded comments, and so on) to
bypass the regular-expression filters that sanitizers rely on.

An example:
------------------------------------------------------------
<IMG SRC="jav&#x09;ascript:alert('XSS');">
-------------------------------------------------------------
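Why this payload slips past a regex on the raw text can be shown by decoding its character references: `jav&#x09;ascript:` becomes `jav<TAB>ascript:`, and a browser's URL parser then drops the tab, leaving `javascript:`. The sketch below is a simplified numeric-reference decoder (hypothetical class name; it does not handle named entities or HTML's special remapping of some code points). It accepts references without the trailing semicolon because HTML still decodes those.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified decoder for numeric character references only.
final class NumericCharRefs {
    // Decimal (&#106;) or hex (&#x6A;); the semicolon is optional, as in HTML.
    private static final Pattern REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));?");

    static String decode(String s) {
        Matcher m = REF.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            boolean hex = m.group(1) != null;
            String digits = hex ? m.group(1) : m.group(2);
            String replacement;
            try {
                int cp = Integer.parseInt(digits, hex ? 16 : 10);
                replacement = Character.isValidCodePoint(cp)
                        ? new String(Character.toChars(cp))
                        : "\uFFFD";
            } catch (NumberFormatException e) {
                replacement = "\uFFFD"; // too many digits to be a code point
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, `decode("jav&#x09;ascript:alert('XSS');")` yields the string with a real tab between `jav` and `ascript`, which is why checks have to run on decoded and normalized values rather than on the raw markup.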

So I am wondering: what's the best way to deal with this kind of code?

I appreciate your help.

Thanks,
Girish
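Jason's suggestion in the quoted reply, giving AntiSamy an HTML-decoded version, matters when the encoded HTML arrives as a raw string that no XML parser has decoded yet (text taken from a DOM node is already decoded once, so decoding it again would be wrong). A minimal sketch, under that assumption, decodes exactly one level before the result is handed to something like `new AntiSamy().scan(decoded, policy)`. The class name is hypothetical, and only the five predefined XML entities are handled; a real HTML decoder also needs the full named-entity table and numeric references.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes one level of the predefined XML entities.
final class DecodeOnce {
    private static final Map<String, String> ENTITIES =
            Map.of("lt", "<", "gt", ">", "amp", "&", "quot", "\"", "apos", "'");
    private static final Pattern NAMED = Pattern.compile("&(lt|gt|amp|quot|apos);");

    // Single pass: "&amp;lt;" becomes "&lt;", never "<".
    static String decodeOnce(String s) {
        Matcher m = NAMED.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(ENTITIES.get(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Decoding in a single pass keeps text that was deliberately double-encoded (such as the `&amp;lt;` output above) from being turned into live markup.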



Jason Li wrote:
> That's definitely an issue if encoded HTML gets decoded by the DOM parser...
>
> That's something we need to look into and fix.
>
> Thanks for pointing that out, Eric!
> --
> -Jason Li-
> -jason.li at owasp.org-
>
>
>
> On Mon, Apr 13, 2009 at 1:28 PM, Eric Kreiser <ekreiser at mzinga.com> wrote:
>   
>> The other problem I have seen with AntiSamy is that if the value you
>> send to AntiSamy is escaped, but you use
>> getCleanXMLDocumentFragment() to get your scrubbed value, it reverses
>> all the escaping, leaving you with a value that would otherwise have
>> violated the policy file.
>>
>>
>>
>> Jason Li wrote:
>>     
>>> Girish,
>>>
>>> By default, script tags should be removed by AntiSamy.
>>>
>>> I think the problem may lie in your statement, "even if they are escaped."
>>>
>>> If you pass in:
>>> <script>alert('Channel Title Description Vulnerability - Type 2')</script>
>>>
>>> to AntiSamy, you should get nothing back.
>>>
>>> However, your statement leads me to believe that in fact you're passing in:
>>> &lt;script&gt;alert('Channel Title Description Vulnerability - Type 2')&lt;/script&gt;
>>>
>>> The above is "safe" from AntiSamy's perspective because it assumes
>>> that the content is directly rendered in an HTML interpreter.
>>>
>>> From the behavior you describe and the examples you give, my guess is
>>> that you have encoded HTML embedded in XML, so something that looks
>>> like this (here the tainted input is contained in an XML element, item
>>> description, and therefore encoded):
>>> <rss version="2.0">
>>>   <channel>
>>>     <title>Example</title>
>>>     <link>http://example.com</link>
>>>     <description>Example</description>
>>>     <item>
>>>       <title>Example</title>
>>>       <link>http://example.com</link>
>>>       <description>This is the text that you're trying to validate
>>> &lt;script&gt;alert('Channel Title Description Vulnerability - Type 2')&lt;/script&gt;</description>
>>>     </item>
>>>   </channel>
>>> </rss>
>>>
>>> AntiSamy can't know the context where your content is coming from -
>>> it's expecting HTML content that goes to an HTML interpreter. If the
>>> content you are given is encoded HTML that goes to an interpreter
>>> that decodes the HTML, AntiSamy won't be able to properly validate it.
>>> You'd have to provide an HTML decoded version for AntiSamy to handle
>>> properly.
>>>
>>> Am I interpreting your use case correctly? And if so, does that
>>> explanation make sense?
>>> --
>>> -Jason Li-
>>> -jason.li at owasp.org-
>>>
>>>
>>>
>>> On Fri, Apr 10, 2009 at 6:52 PM, Girish <ivgirish at yahoo.com> wrote:
>>>
>>>       
>>>> I am using version 1.3, and I have tried all four policy files. They all
>>>> give the same result.
>>>>
>>>> For example, if my HTML is this (passing it line by line to AntiSamy):
>>>>
>>>>      <script>alert('Channel Title Description Vulnerability - Type 2')</script>
>>>>      <script>alert('Channel Link Vulnerability - Type 2')</script>
>>>>      javascript:alert('Channel Image URL Vulnerability - Type 1');
>>>>
>>>> the output I am getting is:
>>>>
>>>>      &lt;script&gt;alert('Channel Title Description Vulnerability - Type 2')&lt;/script&gt;
>>>>      &lt;script&gt;alert('Channel Link Vulnerability - Type 2')&lt;/script&gt;
>>>>      javascript:alert('Channel Image URL Vulnerability - Type 1');
>>>>
>>>> Any idea how to remove tags like
>>>> script/javascript/embed/frame/etc. even if they are escaped?
>>>>         
>> _______________________________________________
>> Owasp-antisamy mailing list
>> Owasp-antisamy at lists.owasp.org
>> https://lists.owasp.org/mailman/listinfo/owasp-antisamy
>>
>>     