[Owasp-antisamy] escaped tags go thru without getting removed

Jason Li jason.li at owasp.org
Tue Apr 14 02:17:37 EDT 2009


Girish,

I think RSnake's XSS cheat sheet that you mentioned
(http://ha.ckers.org/xss.html) is probably the best resource for
samples of malicious content.

In terms of expanding it to cleanse XML, it's unlikely we'll do this
because it's impossible for AntiSamy to determine how the XML content
data will be used. XML by its nature is extremely flexible and could
be used in any number of contexts. Someone might be using a SOAP
message to deliver base64-encoded content that is then further HTML
entity encoded for interpretation by a browser at the end. AntiSamy
can't unravel the onion, so to speak.
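
To make the onion concrete, here is a small Java sketch of the SOAP scenario: a payload that is HTML entity encoded and then base64 encoded. The class and method names are hypothetical (this is not AntiSamy API). At each outer layer an HTML sanitizer sees nothing that looks like markup. `<script>` only appears once both layers are peeled, and only the consuming application knows that it should peel them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo of layered ("onion") encoding: HTML entity encoding
// wrapped in base64, as might travel inside a SOAP message.
class OnionDemo {

    // Peels the base64 layer.
    static String peelBase64(String s) {
        return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
    }

    // Peels ONE entity-encoding layer, for the three entities used here.
    // "&amp;" is replaced last so "&amp;lt;" becomes "&lt;", not "<".
    static String peelEntities(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String html = "<script>alert(1)</script>";
        // Build the onion from the inside out.
        String entityEncoded = html.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        String wire = Base64.getEncoder()
                .encodeToString(entityEncoded.getBytes(StandardCharsets.UTF_8));

        System.out.println(wire);                 // base64 text: no '<' anywhere
        String layer1 = peelBase64(wire);
        System.out.println(layer1);               // &lt;script&gt;... : still no '<'
        System.out.println(peelEntities(layer1)); // <script>alert(1)</script>
    }
}
```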

IMHO, it's a task best left to the developer as they're the ones who
ultimately know the intended target.

Thanks for the info about unescaping HTML! We'll have to look into
what Eric said earlier in the thread about AntiSamy's behavior with
encoded HTML.

--
-Jason Li-
-jason.li at owasp.org-



On Mon, Apr 13, 2009 at 7:01 PM, Girish <ivgirish at yahoo.com> wrote:
> Jason,
> quick questions:
> (1) do you have more sample html files with malicious content that I can use
> to test with?
> (2) any plans to expand antisamy to cleanse XML files with html/js malicious
> code in the future?
>
> btw, forgot to mention that
> http://commons.apache.org/lang/api/org/apache/commons/lang/StringEscapeUtils.html
> helps to unescape html/javascript text.
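
For reference, `StringEscapeUtils.unescapeHtml(String)` in Commons Lang 2.x (`unescapeHtml4` in Commons Lang 3) covers the full HTML 4 entity set. For readers without that dependency, here is a minimal JDK-only sketch of the same idea. It is hypothetical and deliberately small: it only knows five named entities plus decimal/hex numeric references, and it decodes exactly one layer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal, hypothetical stand-in for StringEscapeUtils.unescapeHtml():
// decodes five named entities plus decimal/hex numeric references, one layer
// only. Anything it does not recognize is left exactly as it was.
class MiniUnescape {

    // Matches &name;  &#123;  &#x7B;  (this sketch requires the trailing ';')
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);");

    private static String named(String name, String original) {
        switch (name) {
            case "lt":   return "<";
            case "gt":   return ">";
            case "amp":  return "&";
            case "quot": return "\"";
            case "apos": return "'";
            default:     return original; // unknown entity: keep as-is
        }
    }

    static String unescape(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            String decoded;
            try {
                if (body.startsWith("#x") || body.startsWith("#X")) {
                    decoded = new String(Character.toChars(Integer.parseInt(body.substring(2), 16)));
                } else if (body.startsWith("#")) {
                    decoded = new String(Character.toChars(Integer.parseInt(body.substring(1))));
                } else {
                    decoded = named(body, m.group());
                }
            } catch (IllegalArgumentException e) {
                // Out-of-range number or invalid code point: leave it untouched.
                decoded = m.group();
            }
            // quoteReplacement: the decoded text may itself contain '$' or '\'
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The tab hidden in "jav&#x09;ascript" becomes a real tab character.
        System.out.println(unescape("&lt;IMG SRC=\"jav&#x09;ascript:alert('XSS');\"&gt;"));
    }
}
```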
>
> thanks,
> Girish
>
>
> Jason Li wrote:
>
> Girish,
>
> With regard to validating the description and title elements, I'm
> surprised this isn't working straight out of the box.
>
> My tests with the default AntiSamy policy file and the plaintext (unencoded)
> version of your input show that you should get an empty string back from AntiSamy.
>
> A few questions:
> - Are you using the default AntiSamy policy file?
> - When you extract the text content for each node from your DOM
> object, is it returning the XML encoded version of the text content
> (i.e. &lt;script&gt;) or just the unencoded text version (i.e.
> <script>)?
> - Can you also provide the input and output (debugging or
> System.out.println()) of AntiSamy?  I'd like to eliminate the
> possibility that the problem is the DOM implementation you're using
> (perhaps it returns text content different from what you expect or doesn't
> update the DOM tree as expected).
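
On the second question: a standard DOM parser resolves entity references while it builds the tree, so `getTextContent()` returns the unencoded text, not `&lt;script&gt;`. A quick JDK-only check (the class name and the sample snippet are made up for illustration; DOCTYPEs are refused because feeds are untrusted input):

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical check: what does getTextContent() return for an
// entity-encoded RSS <description>?
class TextContentDemo {

    static String firstText(String xml, String tag) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            // Feeds are untrusted input: refuse DOCTYPEs (blocks XXE tricks).
            f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tag).item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<item><description>&lt;script&gt;alert(1)&lt;/script&gt;</description></item>";
        // Prints <script>alert(1)</script>: the parser has already decoded the entities.
        System.out.println(firstText(xml, "description"));
    }
}
```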
>
> As to your question about links, you're absolutely right about URLs
> such as javascript:alert();.  That's why AntiSamy applies URL
> validation to anchor tags and link tags in HTML.
>
> However, AntiSamy only validates HTML. I can't stress this enough:
> AntiSamy assumes the input it receives is meant to be interpreted as
> HTML.
>
> If you pass in text that is going to be interpreted as a URL (like the
> text content of the URL XML node), AntiSamy doesn't have context to
> know that it's going to be interpreted as URL. The best thing I can
> suggest is kind of a kluge-hack which is to take the HTML *attribute*
> encoded version of the text content of your URL XML node and place it
> inside a makeshift anchor tag with an href so that AntiSamy will
> perform proper URL validation on it.
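
A rough sketch of that kludge, assuming the URL text has already been pulled out of the XML node (so XML entities are already decoded). `attrEncode` and `wrapForUrlValidation` are hypothetical helpers. The AntiSamy call appears only in a comment and should be checked against the AntiSamy version you actually use:

```java
// Sketch of the "makeshift anchor" idea: attribute-encode the URL text,
// wrap it in <a href="...">, and let the HTML sanitizer's URL rules judge it.
class AnchorWrap {

    // Escapes the characters that matter inside a double-quoted HTML attribute.
    static String attrEncode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds the throwaway anchor whose href carries the URL under test.
    static String wrapForUrlValidation(String urlText) {
        return "<a href=\"" + attrEncode(urlText) + "\">x</a>";
    }

    public static void main(String[] args) {
        String html = wrapForUrlValidation("javascript:alert('Channel Image URL Vulnerability - Type 1');");
        System.out.println(html);
        // With AntiSamy on the classpath, the next step would be roughly:
        //   CleanResults r = new AntiSamy().scan(html, policy);
        // followed by checking whether the href survived in r.getCleanHTML().
    }
}
```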
>
> Take that suggestion with a grain of salt though - it's off the top of
> my head and I haven't thought through all the security considerations.
>
> --
> -Jason Li-
> -jason.li at owasp.org-
>
>
>
> On Mon, Apr 13, 2009 at 2:52 PM, Girish <ivgirish at yahoo.com> wrote:
>
>
> Jason,
> Thank you very much for the detailed reply. Yes, you are right. The html content
> is inside RSS feeds. But I am not passing the entire xml to antisamy. I am
> actually:
> - walking the xml tree
> - extracting the text content for each node
> - passing only text content to antisamy
> - then, updating the xml/dom tree with filtered content
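
The walk described in the list above can be sketched with the JDK's DOM and Transformer APIs. The `sanitizer` argument is a hypothetical stand-in for the AntiSamy call; the demo uses a toy lambda. Two details are worth noticing. First, the text handed to the sanitizer is already entity decoded. Second, on serialization the DOM re-escapes whatever text comes back, so a sanitizer that returns `&lt;script&gt;` as literal text ends up as `&amp;lt;script&amp;gt;`, which is consistent with the output shown below. Markup the XML parser has already turned into child elements (such as the `<script>` inside `<title>` in the feed) is not a text node and is not sanitized as markup by this sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch of: walk the tree -> take each text node -> sanitize -> write back.
// "sanitizer" is a placeholder for the real HTML sanitizer call.
class FeedWalker {

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("unparseable feed", e);
        }
    }

    // Recursively visits text and CDATA nodes. getNodeValue() is already
    // entity-decoded, so "&lt;b&gt;" in the file arrives here as "<b>".
    // Child elements (e.g. a raw <script> inside <title>) are only recursed into.
    static void sanitizeTextNodes(Node node, UnaryOperator<String> sanitizer) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                child.setNodeValue(sanitizer.apply(child.getNodeValue()));
            } else {
                sanitizeTextNodes(child, sanitizer);
            }
        }
    }

    // Serializing re-escapes text: a literal "&lt;" in a text node is
    // written out as "&amp;lt;".
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<item><title>ok</title>"
                + "<description>&lt;script&gt;x&lt;/script&gt;</description></item>");
        // Toy sanitizer for the demo only: drop any text containing '<'.
        sanitizeTextNodes(doc, s -> s.contains("<") ? "" : s);
        System.out.println(serialize(doc));
    }
}
```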
>
> for example, here is the RSS feed that I am using for testing.
>
> <?xml version="1.0" ?>
> <rss version="2.0">
>     <!-- rsstest.markwoodman.com\malicious_2.rss -->
>     <!-- Entity encoded script insertion -->
>     <channel>
>
>       <pubDate>Tue, 9 Aug 2006 00:00:00 GMT</pubDate>
>
>       <title><script>alert('Channel Title Vulnerability - Type 2')</script></title>
>       <description>&lt;script&gt;alert('Channel Title Description Vulnerability - Type 2')&lt;/script&gt;</description>
>       <link>&lt;script&gt;alert('Channel Link Vulnerability - Type 2')&lt;/script&gt;</link>
>       <url>javascript:alert('Channel Image URL Vulnerability - Type 1');</url>
>
>       <item>
>         <title>&lt;script&gt;alert('Item Title Vulnerability - Type 2')&lt;/script&gt;</title>
>         <description>&lt;script&gt;alert('Item Description Vulnerability - Type 2')&lt;/script&gt;</description>
>         <link>&lt;script&gt;alert('Item Link Vulnerability - Type 2')&lt;/script&gt;</link>
>       </item>
>
>     </channel>
>
> </rss>
>
> ======== the output after running it thru antisamy is=========
> <?xml version="1.0" encoding="UTF-8"?>
> <rss version="2.0">
> <!-- rsstest.markwoodman.com\malicious_2.rss -->    <!-- Entity encoded script insertion -->    <channel>
>       <pubDate>Tue, 9 Aug 2006 00:00:00 GMT</pubDate>
>       <title/>
>       <description>&amp;lt;script&amp;gt;alert('Channel Title Description Vulnerability - Type 2')&amp;lt;/script&amp;gt;</description>
>       <link>&amp;lt;script&amp;gt;alert('Channel Link Vulnerability - Type 2')&amp;lt;/script&amp;gt;</link>
>       <url>javascript:alert('Channel Image URL Vulnerability - Type 1');</url>
>       <item>
>          <title>&amp;lt;script&amp;gt;alert('Item Title Vulnerability - Type 2')&amp;lt;/script&amp;gt;</title>
>          <description>&amp;lt;script&amp;gt;alert('Item Description Vulnerability - Type 2')&amp;lt;/script&amp;gt;</description>
>          <link>&amp;lt;script&amp;gt;alert('Item Link Vulnerability - Type 2')&amp;lt;/script&amp;gt;</link>
>       </item>
>    </channel>
> </rss>
>
>
> NOTE: it also didn't remove "javascript:alert" from the url element above.
>
> As you can see here (http://ha.ckers.org/xss.html), most of the time the
> malicious html/javascript is going to be obfuscated with octal/hex encoding
> or comment tags to bypass the regular expression filters used by purifiers.
>
> an example
> ------------------------------------------------------------
> <IMG SRC="jav&#x09;ascript:alert('XSS');">
> -------------------------------------------------------------
>
> So, I'm wondering what's the best way to deal with this type of code?
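
One common approach for URL-valued fields such as `<url>` (a sketch, not something AntiSamy does for you here): decode entities first, for example with StringEscapeUtils or by reading the text through a DOM parser. Then remove the characters that browsers' URL parsers ignore, and allow only known-safe schemes. Browsers drop embedded tabs and newlines, which is why `jav&#x09;ascript:` still executes. The class name and the http/https allowlist below are assumptions for illustration, and the check only judges the scheme.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch of an allowlist check for URL-valued feed fields such as <url>.
// Expects input that has ALREADY been entity-decoded (e.g. via the DOM).
class UrlSchemeCheck {

    static boolean isAllowed(String decodedUrl) {
        // Browsers' URL parsers drop embedded tabs and newlines, which is why
        // "jav\tascript:" still runs. To be stricter, this removes every ASCII
        // control character and space before looking at the scheme.
        String compact = decodedUrl.replaceAll("[\\x00-\\x20\\x7F]", "");
        try {
            String scheme = new URI(compact).getScheme();
            // Relative URLs (no scheme) are rejected too; adjust if you need them.
            return scheme != null
                    && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
        } catch (URISyntaxException e) {
            return false; // anything java.net.URI cannot parse is rejected
        }
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("http://example.com/feed"));     // true
        System.out.println(isAllowed("javascript:alert('XSS');"));    // false
        System.out.println(isAllowed("jav\tascript:alert('XSS');"));  // false
    }
}
```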
>
> appreciate your help.
>
> thanks,
> Girish
>
>
>
> Jason Li wrote:
>
> That's definitely an issue if encoded HTML gets decoded by the DOM parser...
>
> That's something we need to look into and fix.
>
> Thanks for pointing that out, Eric!
> --
> -Jason Li-
> -jason.li at owasp.org-
>
>
>
> On Mon, Apr 13, 2009 at 1:28 PM, Eric Kreiser <ekreiser at mzinga.com> wrote:
>
>
> The other problem I have seen with antisamy is that if the value you
> send to antisamy is escaped, but you use
> getCleanXMLDocumentFragment() to get your scrubbed value, it reverses
> all the escaping, leaving you with a value that would
> otherwise have violated the policy file.
>
>
>
> Jason Li wrote:
>
>
> Girish,
>
> By default, script tags should be removed by AntiSamy.
>
> I think the problem may lie in your statement, "even if they are escaped."
>
> If you pass in:
> <script>alert('Channel Title Description Vulnerability - Type 2')</script>
>
> to AntiSamy, you should get nothing back.
>
> However, your statement leads me to believe that in fact you're passing in:
> &lt;script&gt;alert('Channel Title Description Vulnerability - Type 2')&lt;/script&gt;
>
> The above is "safe" from AntiSamy's perspective because it assumes
> that the content is directly rendered in an HTML interpreter.
>
> From the behavior you describe and the examples you give, my guess is
> that you have encoded HTML embedded in XML - so something that looks
> like this (here the tainted input is contained in an XML element, item
> description, and therefore encoded):
> <rss version="2.0">
>   <channel>
>     <title>Example</title>
>     <link>http://example.com</link>
>     <description>Example</description>
>     <item>
>       <title>Example</title>
>       <link>http://example.com</link>
>       <description>This is the text that you're trying to validate &lt;script&gt;alert('Channel Title Description Vulnerability - Type 2')&lt;/script&gt;</description>
>     </item>
>   </channel>
> </rss>
>
> AntiSamy can't know the context where your content is coming from -
> it's expecting HTML content that goes to an HTML interpreter. If the
> content you are given is encoded HTML that goes to an interpreter
> that decodes the HTML, AntiSamy won't be able to properly validate it.
> You'd have to provide an HTML decoded version for AntiSamy to handle
> properly.
>
> Am I interpreting your use case correctly? And if so, does that
> explanation make sense?
> --
> -Jason Li-
> -jason.li at owasp.org-
>
>
>
> On Fri, Apr 10, 2009 at 6:52 PM, Girish <ivgirish at yahoo.com> wrote:
>
>
>
> I am using version 1.3 and I have tried all 4 policy files. They all
> give the same result.
>
> For example, if my html is this (passing line by line to antisamy):
>
>      <script>alert('Channel Title Description Vulnerability - Type 2')</script>
>      <script>alert('Channel Link Vulnerability - Type 2')</script>
>      javascript:alert('Channel Image URL Vulnerability - Type 1');
>
> the output I am getting is:
>
>      &lt;script&gt;alert('Channel Title Description Vulnerability - Type 2')&lt;/script&gt;
>      &lt;script&gt;alert('Channel Link Vulnerability - Type 2')&lt;/script&gt;
>      javascript:alert('Channel Image URL Vulnerability - Type 1');
>
> Any idea how to remove tags like
> script/javascript/embed/frame/etc. even if they are escaped?
>
>
> _______________________________________________
> Owasp-antisamy mailing list
> Owasp-antisamy at lists.owasp.org
> https://lists.owasp.org/mailman/listinfo/owasp-antisamy

