[Owasp-antisamy] escaped tags goes thru without getting removed
Girish
ivgirish at yahoo.com
Mon Apr 13 15:44:37 EDT 2009
Jason,
yes, I am using the "antisamy-1.3.xml" policy file. I got your point about
URL. I think I can deal with it by adding anchor/href tags.
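
A minimal sketch of that anchor/href workaround in Java, assuming the URL text has already been pulled out of the <url> node. The class and method names (UrlAnchorWrapper, encodeForHtmlAttribute, wrapInAnchor) are made up for illustration, and the AntiSamy call itself is only described in a comment, so treat this as a starting point rather than a vetted fix:

```java
final class UrlAnchorWrapper {

    /** Encodes the characters that are significant inside a double-quoted HTML attribute. */
    static String encodeForHtmlAttribute(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Places the attribute-encoded URL text in the href of a throwaway anchor tag. */
    static String wrapInAnchor(String urlText) {
        return "<a href=\"" + encodeForHtmlAttribute(urlText) + "\">link</a>";
    }

    public static void main(String[] args) {
        String html = wrapInAnchor("javascript:alert('Channel Image URL Vulnerability - Type 1');");
        System.out.println(html);
        // Assumption: this string is then handed to AntiSamy (for example through
        // AntiSamy.scan(...) and CleanResults.getCleanHTML(); check the names against
        // your AntiSamy version). If the href does not survive the scan, the URL
        // failed the policy's URL validation and should be dropped from the feed.
    }
}
```

The placeholder link text "link" is arbitrary; only the href attribute matters for the URL check.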
--------------- partial xml snippet for reference ----------
<pubDate>Tue, 9 Aug 2006 00:00:00 GMT</pubDate>
<title><script>alert('Channel Title Vulnerability - Type
2')</script></title>
<description><script>alert('Channel Title Description
Vulnerability - Type 2')</script></description>
<link><script>alert('Channel Link Vulnerability - Type
2')</script></link>
<url>javascript:alert('Channel Image URL Vulnerability - Type
1');</url>
------------------------------------
here is the debug output that shows what is being passed in to antisamy
and the output.
**********processing node = pubDate*********
input to Antisamy========>Tue, 9 Aug 2006 00:00:00 GMT
output from Antisamy========>Tue, 9 Aug 2006 00:00:00 GMT
**********processing node = title*********
input to Antisamy========><script>alert('Channel Title Vulnerability -
Type 2')</script>
output from Antisamy========>
**********processing node = description*********
input to Antisamy========><script>alert('Channel Title Description
Vulnerability - Type 2')</script>
output from Antisamy========><script>alert('Channel Title
Description Vulnerability - Type 2')</script>
**********processing node = link*********
input to Antisamy========><script>alert('Channel Link
Vulnerability - Type 2')</script>
output from Antisamy========><script>alert('Channel Link
Vulnerability - Type 2')</script>
**********processing node = url*********
input to Antisamy========>javascript:alert('Channel Image URL
Vulnerability - Type 1');
output from Antisamy========>javascript:alert('Channel Image URL
Vulnerability - Type 1');
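
For reference, the node-by-node processing that produces debug lines like the ones above can be sketched roughly as follows. This is only an illustration using the JDK's built-in DOM parser: FeedTextWalker and its methods are hypothetical names, and the sanitizer argument is a stand-in for the real AntiSamy call. One thing it makes visible is that getTextContent() returns entity-decoded text, so an encoded &lt;script&gt; in the feed reaches the filter as a literal <script> tag.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class FeedTextWalker {

    /** Parses an XML string into a DOM document, refusing DOCTYPE declarations. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Untrusted feeds should not be allowed to declare DTDs or external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse feed", e);
        }
    }

    /** True when the element contains no child elements (text only, or empty). */
    static boolean isLeaf(Element element) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                return false;
            }
        }
        return true;
    }

    /** Walks the tree and replaces the text of every leaf element with the sanitizer's output. */
    static void filterLeaves(Node node, UnaryOperator<String> sanitizer) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip comments, whitespace text, etc.
            }
            if (isLeaf((Element) child)) {
                String input = child.getTextContent();   // already entity-decoded by the parser
                String output = sanitizer.apply(input);   // placeholder for the AntiSamy scan
                System.out.println("node=" + child.getNodeName()
                        + " input=" + input + " output=" + output);
                child.setTextContent(output);             // re-escaped when the DOM is serialized
            } else {
                filterLeaves(child, sanitizer);
            }
        }
    }
}
```

Calling FeedTextWalker.filterLeaves(doc, text -> text) with an identity sanitizer is a quick way to see exactly what text the DOM implementation hands over, before any filtering is involved.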
pls let me know if you need more info.
thanks,
Girish
Jason Li wrote:
> Girish,
>
> With regards to validating the description and title elements, I'm
> surprised this isn't working straight out of the box.
>
> My tests with the default AntiSamy policy file and plaintext version
> show that you should get an empty string back from AntiSamy.
>
> A few questions:
> - Are you using the default AntiSamy policy file?
> - When you extract the text content for each node from your DOM
> object, is it returning the XML encoded version of the text content
> (i.e. &lt;script&gt;) or just the unencoded text version (i.e.
> <script>)?
> - Can you also provide the input and output (debugging or
> System.out.println()) of AntiSamy? I'd like to eliminate the
> possibility that it's the DOM implementation that you're using
> (perhaps returns text content different than expected or doesn't
> update the DOM tree as expected).
>
> As to your question about links, you're absolutely right about URLs
> such as javascript:alert();. That's why AntiSamy applies URL
> validation to anchor tags and link tags in HTML.
>
> However, AntiSamy only validates HTML. I can't stress this enough:
> AntiSamy assumes the input it receives is meant to be interpreted as
> HTML.
>
> If you pass in text that is going to be interpreted as a URL (like the
> text content of the URL XML node), AntiSamy doesn't have context to
> know that it's going to be interpreted as URL. The best thing I can
> suggest is kind of a kluge-hack which is to take the HTML *attribute*
> encoded version of the text content of your URL XML node and place it
> inside a makeshift anchor tag with an href so that AntiSamy will
> perform proper URL validation on it.
>
> Take that suggestion with a grain of salt though - it's off the top of
> my head and I haven't thought through all the security considerations.
>
> --
> -Jason Li-
> -jason.li at owasp.org-
>
>
>
> On Mon, Apr 13, 2009 at 2:52 PM, Girish <ivgirish at yahoo.com> wrote:
>
>> Jason,
>> Thank you very much for the detailed reply. Yes, you are right. The html content
>> is inside RSS feeds. But I am not passing the entire xml to antisamy. I am
>> actually
>> - walking the xml tree
>> - extracting the text content for each node
>> - passing only text content to antisamy
>> - then, updating the xml/dom tree with filtered content
>>
>> for example, here is the RSS feed that I am using for testing.
>>
>> <?xml version="1.0" ?>
>> <rss version="2.0">
>> <!-- rsstest.markwoodman.com\malicious_2.rss -->
>> <!-- Entity encoded script insertion -->
>> <channel>
>>
>> <pubDate>Tue, 9 Aug 2006 00:00:00 GMT</pubDate>
>>
>> <title><script>alert('Channel Title Vulnerability - Type
>> 2')</script></title>
>> <description><script>alert('Channel Title Description
>> Vulnerability - Type 2')</script></description>
>> <link><script>alert('Channel Link Vulnerability - Type
>> 2')</script></link>
>> <url>javascript:alert('Channel Image URL Vulnerability - Type
>> 1');</url>
>>
>> <item>
>> <title><script>alert('Item Title Vulnerability - Type
>> 2')</script></title>
>> <description><script>alert('Item Description Vulnerability -
>> Type 2')</script></description>
>> <link><script>alert('Item Link Vulnerability - Type
>> 2')</script></link>
>> </item>
>>
>> </channel>
>>
>> </rss>
>>
>> ======== the output after running it thru antisamy is=========
>> <?xml version="1.0" encoding="UTF-8"?>
>> <rss version="2.0">
>> <!-- rsstest.markwoodman.com\malicious_2.rss --> <!-- Entity encoded
>> script insertion --> <channel>
>> <pubDate>Tue, 9 Aug 2006 00:00:00 GMT</pubDate>
>> <title/>
>> <description>&lt;script&gt;alert('Channel Title Description
>> Vulnerability - Type 2')&lt;/script&gt;</description>
>> <link>&lt;script&gt;alert('Channel Link Vulnerability - Type
>> 2')&lt;/script&gt;</link>
>> <url>javascript:alert('Channel Image URL Vulnerability - Type
>> 1');</url>
>> <item>
>> <title>&lt;script&gt;alert('Item Title Vulnerability - Type
>> 2')&lt;/script&gt;</title>
>> <description>&lt;script&gt;alert('Item Description
>> Vulnerability - Type 2')&lt;/script&gt;</description>
>> <link>&lt;script&gt;alert('Item Link Vulnerability - Type
>> 2')&lt;/script&gt;</link>
>> </item>
>> </channel>
>> </rss>
>>
>>
>> NOTE: it also didn't remove "javascript:alert" in the url element above.
>>
>> As you can see here (http://ha.ckers.org/xss.html), most of the time the
>> malicious html/javascript is going to be encoded using octal/hex/comment
>> tags to bypass regular expression filters of purifiers.
>>
>> an example
>> ------------------------------------------------------------
>> <IMG SRC="jav&#x09;ascript:alert('XSS');">
>> -------------------------------------------------------------
>>
>> So, wondering what's the best way to deal with this type of code ?
>>
>> appreciate your help.
>>
>> thanks,
>> Girish
>>
>>
>>
>> Jason Li wrote:
>>
>> That's definitely an issue if encoded HTML gets decoded by the DOM parser...
>>
>> That's something we need to look into and fix.
>>
>> Thanks for pointing that out Eric!
>> --
>> -Jason Li-
>> -jason.li at owasp.org-
>>
>>
>>
>> On Mon, Apr 13, 2009 at 1:28 PM, Eric Kreiser <ekreiser at mzinga.com> wrote:
>>
>>
>> The other problem I have seen with antisamy is that if the value you
>> send to antisamy is escaped... but you use the
>> getCleanXMLDocumentFragment() to get your scrubbed value... it reverses
>> all the escaping... leaving you now with a value that would have
>> otherwise violated the policy file
>>
>>
>>
>> Jason Li wrote:
>>
>>
>> Girish,
>>
>> By default, script tags should be removed by AntiSamy.
>>
>> I think the problem may lie in your statement, "even if they are escaped."
>>
>> If you pass in:
>> <script>alert('Channel Title Description Vulnerability - Type 2')</script>
>>
>> to AntiSamy, you should get nothing back.
>>
>> However, your statement leads me to believe that in fact you're passing in:
>> &lt;script&gt;alert('Channel Title Description Vulnerability - Type
>> 2')&lt;/script&gt;
>>
>> The above is "safe" from AntiSamy's perspective because it assumes
>> that the content is directly rendered in an HTML interpreter.
>>
>> My guess from the behavior you describe and examples you give sounds
>> like you have encoded HTML embedded in XML - so something that looks
>> like this (here the tainted input is contained in an XML element, item
>> description, and therefore encoded):
>> <rss version="2.0">
>> <channel>
>> <title>Example</title>
>> <link>http://example.com</link>
>> <description>Example</description>
>> <item>
>> <title>Example</title>
>> <link>http://example.com</link>
>> <description>This is the text that you're trying to validate
>> &lt;script&gt;alert('Channel Title Description Vulnerability - Type
>> 2')&lt;/script&gt;</description>
>> </item>
>> </channel>
>> </rss>
>>
>> AntiSamy can't know the context where your content is coming from -
>> it's expecting HTML content that goes to an HTML interpreter. If the
>> content you are provided is encoded HTML that goes to an interpreter
>> that decodes the HTML, AntiSamy won't be able to properly validate it.
>> You'd have to provide an HTML decoded version for AntiSamy to handle
>> properly.
>>
>> Am I interpreting your use case correctly? And if so, does that
>> explanation make sense?
>> --
>> -Jason Li-
>> -jason.li at owasp.org-
>>
>>
>>
>> On Fri, Apr 10, 2009 at 6:52 PM, Girish <ivgirish at yahoo.com> wrote:
>>
>>
>>
>> I am using 1.3 version and i have tried all the 4 policy files. They all
>> give the same result.
>>
>> For example, if my html is this (passing line by line to antisamy):
>>
>> <script>alert('Channel Title Description Vulnerability -
>> Type 2')</script>
>> <script>alert('Channel Link Vulnerability - Type
>> 2')</script>
>> javascript:alert('Channel Image URL Vulnerability - Type 1');
>>
>> the output I am getting is:
>>
>> <script>alert('Channel Title Description
>> Vulnerability - Type 2')</script>
>> <script>alert('Channel Link Vulnerability - Type
>> 2')</script>
>> javascript:alert('Channel Image URL Vulnerability - Type 1');
>>
>> any idea on how to remove the tags like
>> script/javascript/embed/frame/etc even if they are escaped.
>>
>>
>> _______________________________________________
>> Owasp-antisamy mailing list
>> Owasp-antisamy at lists.owasp.org
>> https://lists.owasp.org/mailman/listinfo/owasp-antisamy
>>
>>
>>
>
>