[Owasp-antisamy] escaped tags goes thru without getting removed

Girish ivgirish at yahoo.com
Tue Apr 14 15:23:53 EDT 2009


Jason,
btw, has AntiSamy been tested against RSnake's cheat sheet? If so, how were
the results?

thanks,
Girish


Jason Li wrote:
> Girish,
>
> I think RSnake's cheatsheet that you mentioned
> (http://ha.ckers.org/xss.html) is probably the best resource for
> malicious content.
>
> In terms of expanding it to cleanse XML, it's unlikely we'll do this
> because it's impossible for AntiSamy to determine how the XML content
> data will be used. XML by its nature is extremely flexible and could
> be used in any number of contexts. Someone might be using a SOAP
> message to deliver base64 encoded content that is then further HTML
> entity encoded for interpreting by a browser at the end. AntiSamy
> can't unravel the onion so to speak.
>
> IMHO, it's a task best left to the developer as they're the ones who
> ultimately know the intended target.
>
> Thanks for the info about unescaping HTML! We'll have to look into
> what Eric said earlier in the thread about AntiSamy's behavior with
> encoded HTML.
>
> --
> -Jason Li-
> -jason.li at owasp.org-
>
>
>
> On Mon, Apr 13, 2009 at 7:01 PM, Girish <ivgirish at yahoo.com> wrote:
>   
>> Jason,
>> quick questions:
>> (1) Do you have more sample HTML files with malicious content that I can use
>> to test?
>> (2) Any plans to expand AntiSamy to cleanse XML files with malicious HTML/JS
>> code in the future?
>>
>> btw, forgot to mention that
>> http://commons.apache.org/lang/api/org/apache/commons/lang/StringEscapeUtils.html
>> helps to unescape html/javascript text.
>>
>> thanks,
>> Girish
>>
>>
>> Jason Li wrote:
>>
>> Girish,
>>
>> With regards to validating the description and title elements, I'm
>> surprised this isn't working straight out of the box.
>>
>> My tests with the default AntiSamy policy file and plaintext version
>> show that you should get an empty string back from AntiSamy.
>>
>> A few questions:
>> - Are you using the default AntiSamy policy file?
>> - When you extract the text content for each node from your DOM
>> object, is it returning the XML encoded version of the text content
>> (i.e. &lt;script&gt;) or just the unencoded text version (i.e.
>> <script>)?
>> - Can you also provide the input and output (debugging or
>> System.out.println()) of AntiSamy?  I'd like to eliminate the
>> possibility that it's the DOM implementation that you're using
>> (perhaps returns text content different than expected or doesn't
>> update the DOM tree as expected).
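
One way to answer the second question above is to check what the DOM implementation actually returns for an entity-encoded element. The sketch below uses only the JDK's built-in XML parser; the class and method names (`TextContentCheck`, `rootText`) are hypothetical and the sample element is a shortened version of the feed in this thread. With a standard parser, `getTextContent()` returns the decoded text, not the `&lt;...&gt;` form.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TextContentCheck {
    // Parses a small XML string and returns the text content of its root element.
    // Checked exceptions are wrapped so callers do not need a throws clause.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The element body is entity-encoded in the XML source.
        String xml = "<description>&lt;script&gt;alert(1)&lt;/script&gt;</description>";
        // Prints the decoded form: <script>alert(1)</script>
        System.out.println(rootText(xml));
    }
}
```

Printing this value right before it is handed to AntiSamy shows which of the two forms the filter really receives.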
>>
>> As to your question about links, you're absolutely right about URLs
>> such as javascript:alert();.  That's why AntiSamy applies URL
>> validation to anchor tags and link tags in HTML.
>>
>> However, AntiSamy only validates HTML. I can't stress this enough:
>> AntiSamy assumes the input it receives is meant to be interpreted as
>> HTML.
>>
>> If you pass in text that is going to be interpreted as a URL (like the
>> text content of the URL XML node), AntiSamy doesn't have context to
>> know that it's going to be interpreted as a URL. The best thing I can
>> suggest is kind of a kludge: take the HTML *attribute* encoded
>> version of the text content of your URL XML node and place it
>> inside a makeshift anchor tag with an href so that AntiSamy will
>> perform proper URL validation on it.
>>
>> Take that suggestion with a grain of salt though - it's off the top of
>> my head and I haven't thought through all the security considerations.
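
A minimal sketch of the wrapping step in that workaround, under the same caveat that the security implications have not been thought through. It only builds the makeshift anchor. The class and method names (`AnchorWrapper`, `encodeForHtmlAttribute`, `wrapInAnchor`) and the link text are hypothetical, and the encoder covers only the five characters that matter inside a double-quoted attribute.

```java
class AnchorWrapper {
    // Encodes the characters that are significant inside a double-quoted HTML attribute.
    // This is a deliberately small encoder for illustration, not a complete one.
    static String encodeForHtmlAttribute(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Places the encoded URL text in the href of a throwaway anchor tag.
    static String wrapInAnchor(String urlText) {
        return "<a href=\"" + encodeForHtmlAttribute(urlText) + "\">link</a>";
    }

    public static void main(String[] args) {
        // The URL value from the test feed in this thread.
        System.out.println(wrapInAnchor("javascript:alert('Channel Image URL Vulnerability - Type 1');"));
    }
}
```

The resulting string is what would be handed to AntiSamy. The caller would then have to inspect the cleaned output to see whether the href survived, and how to do that reliably is left open here.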
>>
>> --
>> -Jason Li-
>> -jason.li at owasp.org-
>>
>>
>>
>> On Mon, Apr 13, 2009 at 2:52 PM, Girish <ivgirish at yahoo.com> wrote:
>>
>>
>> Jason,
>> Thank you very much for the detailed reply. Yes, you are right. The html content
>> is inside RSS feeds. But I am not passing the entire xml to antisamy. I am
>> actually
>> - walking the xml tree
>> - extracting the text content for each node
>> - passing only text content to antisamy
>> - then, updating the xml/dom tree with filtered content
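
The four steps above can be sketched with the JDK's DOM API as follows. This is not Girish's actual code: the class and method names are hypothetical, and the `filter` argument is a stand-in for whatever call into AntiSamy the application makes. CDATA sections are a separate node type and are not visited by this sketch.

```java
import java.io.StringReader;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FeedTextFilter {
    // Recursively visits every text node under 'node' and replaces its value
    // with the result of 'filter' (the stand-in for the sanitizer call).
    static void filterTextNodes(Node node, UnaryOperator<String> filter) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(filter.apply(node.getNodeValue()));
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            filterTextNodes(children.item(i), filter);
        }
    }

    // Parses 'xml', filters all text nodes in place, and returns the
    // concatenated text content of the updated tree.
    static String filteredRootText(String xml, UnaryOperator<String> filter) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            filterTextNodes(doc.getDocumentElement(), filter);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rss><title>&lt;b&gt;x</title><link>ok</link></rss>";
        // Upper-casing is only a visible placeholder for the real filter.
        System.out.println(filteredRootText(xml, String::toUpperCase));
    }
}
```

Note that the filter sees the decoded text of each node, so what the sanitizer receives depends on how many layers of encoding the feed carried to begin with.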
>>
>> for example, here is the RSS feed that I am using for testing.
>>
>> <?xml version="1.0" ?>
>> <rss version="2.0">
>>     <!-- rsstest.markwoodman.com\malicious_2.rss -->
>>     <!-- Entity encoded script insertion -->
>>     <channel>
>>
>>       <pubDate>Tue, 9 Aug 2006 00:00:00 GMT</pubDate>
>>
>>       <title><script>alert('Channel Title Vulnerability - Type
>> 2')</script></title>
>>       <description>&lt;script&gt;alert('Channel Title Description
>> Vulnerability - Type 2')&lt;/script&gt;</description>
>>       <link>&lt;script&gt;alert('Channel Link Vulnerability - Type
>> 2')&lt;/script&gt;</link>
>>         <url>javascript:alert('Channel Image URL Vulnerability - Type
>> 1');</url>
>>
>>       <item>
>>         <title>&lt;script&gt;alert('Item Title Vulnerability - Type
>> 2')&lt;/script&gt;</title>
>>         <description>&lt;script&gt;alert('Item Description Vulnerability -
>> Type 2')&lt;/script&gt;</description>
>>         <link>&lt;script&gt;alert('Item Link Vulnerability - Type
>> 2')&lt;/script&gt;</link>
>>       </item>
>>
>>     </channel>
>>
>> </rss>
>>
>> ======== the output after running it thru antisamy is=========
>> <?xml version="1.0" encoding="UTF-8"?>
>> <rss version="2.0">
>> <!-- rsstest.markwoodman.com\malicious_2.rss -->    <!-- Entity encoded
>> script insertion -->    <channel>
>>       <pubDate>Tue, 9 Aug 2006 00:00:00 GMT</pubDate>
>>       <title/>
>>       <description>&amp;lt;script&amp;gt;alert('Channel Title Description
>> Vulnerability - Type 2')&amp;lt;/script&amp;gt;</description>
>>       <link>&amp;lt;script&amp;gt;alert('Channel Link Vulnerability - Type
>> 2')&amp;lt;/script&amp;gt;</link>
>>       <url>javascript:alert('Channel Image URL Vulnerability - Type
>> 1');</url>
>>       <item>
>>          <title>&amp;lt;script&amp;gt;alert('Item Title Vulnerability - Type
>> 2')&amp;lt;/script&amp;gt;</title>
>>          <description>&amp;lt;script&amp;gt;alert('Item Description
>> Vulnerability - Type 2')&amp;lt;/script&amp;gt;</description>
>>          <link>&amp;lt;script&amp;gt;alert('Item Link Vulnerability - Type
>> 2')&amp;lt;/script&amp;gt;</link>
>>       </item>
>>    </channel>
>> </rss>
>>
>>
>> NOTE: it also didn't remove "javascript:alert" from the url element above.
>>
>> As you can see here (http://ha.ckers.org/xss.html), most of the time the
>> malicious html/javascript is going to be encoded using octal/hex/comment
>> tags to bypass regular expression filters of purifiers.
>>
>> an example
>> ------------------------------------------------------------
>> <IMG SRC="jav&#x09;ascript:alert('XSS');">
>> -------------------------------------------------------------
>>
>> So, I'm wondering what's the best way to deal with this type of code?
>>
>> appreciate your help.
>>
>> thanks,
>> Girish
>>
>>
>>
>> Jason Li wrote:
>>
>> That's definitely an issue if encoded HTML gets decoded by the DOM parser...
>>
>> That's something we need to look into and fix.
>>
>> Thanks for pointing that out Eric!
>> --
>> -Jason Li-
>> -jason.li at owasp.org-
>>
>>
>>
>> On Mon, Apr 13, 2009 at 1:28 PM, Eric Kreiser <ekreiser at mzinga.com> wrote:
>>
>>
>> The other problem I have seen with antisamy is that if the value you
>> send to antisamy is escaped... but you use the
>> getCleanXMLDocumentFragment() to get your scrubbed value... it reverses
>> all the escaping... leaving you now with a value that would have
>> otherwise violated the policy file
>>
>>
>>
>> Jason Li wrote:
>>
>>
>> Girish,
>>
>> By default, script tags should be removed by AntiSamy.
>>
>> I think the problem may lie in your statement, "even if they are escaped."
>>
>> If you pass in:
>> <script>alert('Channel Title Description Vulnerability - Type 2')</script>
>>
>> to AntiSamy, you should get nothing back.
>>
>> However, your statement leads me to believe that in fact you're passing in:
>> &lt;script&gt;alert('Channel Title Description Vulnerability - Type
>> 2')&lt;/script&gt;
>>
>> The above is "safe" from AntiSamy's perspective because it assumes
>> that the content is directly rendered in an HTML interpreter.
>>
>> My guess, from the behavior you describe and the examples you give, is
>> that you have encoded HTML embedded in XML - so something that looks
>> like this (here the tainted input is contained in an XML element, item
>> description, and therefore encoded):
>> <rss version="2.0">
>>   <channel>
>>     <title>Example</title>
>>     <link>http://example.com</link>
>>     <description>Example</description>
>>     <item>
>>       <title>Example</title>
>>       <link>http://example.com</link>
>>       <description>This is the text that you're trying to validate
>> &lt;script&gt;alert('Channel Title Description Vulnerability - Type
>> 2')&lt;/script&gt;</description>
>>     </item>
>>   </channel>
>> </rss>
>>
>> AntiSamy can't know the context where your content is coming from -
>> it's expecting HTML content that goes to an HTML interpreter. If the
>> content you provide is encoded HTML that goes to an interpreter
>> that decodes the HTML, AntiSamy won't be able to properly validate it.
>> You'd have to provide an HTML decoded version for AntiSamy to handle
>> properly.
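
As a rough illustration of "provide an HTML decoded version", the sketch below undoes one layer of entity encoding before the text would go to the sanitizer. It is an assumption-laden stand-in: the class and method names are hypothetical, only five common entities are handled, and numeric character references are ignored. A real application would more likely rely on a library such as the Commons Lang StringEscapeUtils class linked elsewhere in this thread.

```java
class EntityDecoder {
    // Decodes a single layer of the five most common HTML entities.
    // "&amp;" is handled last so that "&amp;lt;" becomes "&lt;" and not "<".
    static String decodeOnce(String text) {
        return text
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#39;", "'")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String encoded = "&lt;script&gt;alert('x')&lt;/script&gt;";
        // After decoding, the script tag is visible to an HTML sanitizer.
        System.out.println(decodeOnce(encoded));
    }
}
```

How many layers to decode depends on where the content ends up, which only the application developer knows.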
>>
>> Am I interpreting your use case correctly? And if so, does that
>> explanation make sense?
>> --
>> -Jason Li-
>> -jason.li at owasp.org-
>>
>>
>>
>> On Fri, Apr 10, 2009 at 6:52 PM, Girish <ivgirish at yahoo.com> wrote:
>>
>>
>>
>> I am using version 1.3 and I have tried all 4 policy files. They all
>> give the same result.
>>
>> For example, if my html is this (passing line by line to antisamy):
>>
>>      <script>alert('Channel Title Description Vulnerability -
>> Type 2')</script>
>>      <script>alert('Channel Link Vulnerability - Type
>> 2')</script>
>>      javascript:alert('Channel Image URL Vulnerability - Type 1');
>>
>> the output I am getting is:
>>
>>      &lt;script&gt;alert('Channel Title Description
>> Vulnerability - Type 2')&lt;/script&gt;
>>      &lt;script&gt;alert('Channel Link Vulnerability - Type
>> 2')&lt;/script&gt;
>>      javascript:alert('Channel Image URL Vulnerability - Type 1');
>>
>> Any idea how to remove tags like
>> script/javascript/embed/frame/etc. even if they are escaped?
>>
>>
>> _______________________________________________
>> Owasp-antisamy mailing list
>> Owasp-antisamy at lists.owasp.org
>> https://lists.owasp.org/mailman/listinfo/owasp-antisamy