[Owasp-antisamy] escaped tags goes thru without getting removed
Arshan Dabirsiaghi
arshan.dabirsiaghi at aspectsecurity.com
Tue Apr 14 15:24:21 EDT 2009
It passes all of RSnake's tests and more. You can look at the unit tests here:
http://code.google.com/p/owaspantisamy/source/browse/trunk/Java/current/TestSource/org/owasp/validator/html/test/AntiSamyTest.java
Thanks,
Arshan
________________________________
From: owasp-antisamy-bounces at lists.owasp.org on behalf of Girish
Sent: Tue 4/14/2009 3:23 PM
To: Jason Li
Cc: owasp-antisamy at lists.owasp.org
Subject: Re: [Owasp-antisamy] escaped tags goes thru without getting removed
Jason,
btw, is AntiSamy tested against RSnake's cheat sheet? If so, how are the results?
thanks,
Girish
Jason Li wrote:
Girish,
I think RSnake's cheatsheet that you mentioned
(http://ha.ckers.org/xss.html) is probably the best resource for
malicious content.
In terms of expanding it to cleanse XML, it's unlikely we'll do this
because it's impossible for AntiSamy to determine how the XML content
data will be used. XML by its nature is extremely flexible and could
be used in any number of contexts. Someone might be using a SOAP
message to deliver base64 encoded content that is then further HTML
entity encoded for interpreting by a browser at the end. AntiSamy
can't unravel the onion so to speak.
IMHO, it's a task best left to the developer as they're the ones who
ultimately know the intended target.
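
To make the "onion" concrete, here is a minimal Java sketch (JDK only) of the SOAP scenario described above: markup that is HTML entity encoded and then base64 encoded. The class name OnionDemo and its methods are hypothetical, made up for illustration. The point is only that a filter looking at the outer layer sees base64 text with nothing in it that resembles markup.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class OnionDemo {
    // Innermost layer: what a browser would finally interpret.
    static final String HTML = "<script>alert(1)</script>";

    // Layer 1: HTML entity encode the markup characters (minimal, illustration only).
    static String entityEncode(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Layer 2: base64 encode, as a SOAP message body might carry it.
    static String wrap(String html) {
        byte[] bytes = entityEncode(html).getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Roughly what an HTML-only filter could look for in its input.
    static boolean looksLikeMarkup(String s) {
        return s.contains("<") || s.contains("&lt;");
    }

    public static void main(String[] args) {
        String wire = wrap(HTML);
        // The base64 alphabet contains neither '<' nor '&', so this prints false.
        System.out.println(looksLikeMarkup(wire));
        // Only someone who knows the layering can peel it back.
        String inner = new String(Base64.getDecoder().decode(wire), StandardCharsets.UTF_8);
        System.out.println(looksLikeMarkup(inner)); // prints true
    }
}
```

Only the code that knows how many layers there are, and in which order, can decode down to the HTML that actually needs validating.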
Thanks for the info about unescaping HTML! We'll have to look into
what Eric said earlier in the thread about AntiSamy's behavior with
encoded HTML.
--
-Jason Li-
-jason.li at owasp.org-
On Mon, Apr 13, 2009 at 7:01 PM, Girish <ivgirish at yahoo.com> wrote:
Jason,
Quick questions:
(1) Do you have more sample HTML files with malicious content that I can use
to test?
(2) Any plans to expand AntiSamy to cleanse XML files containing malicious
HTML/JS code in the future?
btw, forgot to mention that
http://commons.apache.org/lang/api/org/apache/commons/lang/StringEscapeUtils.html
helps to unescape html/javascript text.
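
For illustration, a single unescaping pass of the kind StringEscapeUtils.unescapeHtml performs could look roughly like the sketch below. It is a JDK-only stand-in, not the Commons Lang implementation: the class name UnescapeSketch is hypothetical, and it only handles five named entities plus numeric character references.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnescapeSketch {
    // Five named entities plus decimal and hexadecimal character references.
    private static final Pattern ENTITY =
            Pattern.compile("&(lt|gt|amp|quot|apos|#[0-9]{1,7}|#[xX][0-9a-fA-F]{1,6});");

    // Decodes one level of escaping; "&amp;lt;" becomes "&lt;", not "<".
    static String unescapeHtml(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String decoded = decode(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String decode(String name, String original) {
        switch (name) {
            case "lt": return "<";
            case "gt": return ">";
            case "amp": return "&";
            case "quot": return "\"";
            case "apos": return "'";
            default:
                boolean hex = name.charAt(1) == 'x' || name.charAt(1) == 'X';
                int codePoint = Integer.parseInt(name.substring(hex ? 2 : 1), hex ? 16 : 10);
                // Leave out-of-range references untouched.
                return Character.isValidCodePoint(codePoint)
                        ? new String(Character.toChars(codePoint))
                        : original;
        }
    }

    public static void main(String[] args) {
        // prints <script>alert('x')</script>
        System.out.println(unescapeHtml("&lt;script&gt;alert('x')&lt;/script&gt;"));
    }
}
```

The unescaped string is what would then be handed to the sanitizer.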
thanks,
Girish
Jason Li wrote:
Girish,
With regard to validating the description and title elements, I'm
surprised this isn't working straight out of the box.
My tests with the default AntiSamy policy file and plaintext version
show that you should get an empty string back from AntiSamy.
A few questions:
- Are you using the default AntiSamy policy file?
- When you extract the text content for each node from your DOM
object, is it returning the XML encoded version of the text content
(i.e. &lt;script&gt;) or just the unencoded text version (i.e.
<script>)?
- Can you also provide the input and output (debugging or
System.out.println()) of AntiSamy? I'd like to eliminate the
possibility that it's the DOM implementation that you're using
(perhaps returns text content different than expected or doesn't
update the DOM tree as expected).
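
One quick way to answer the second question is to check what the DOM parser hands back for an entity-encoded node. The sketch below uses the JDK's built-in DOM parser; the class name TextContentDemo and the sample input are hypothetical, and other DOM implementations may need to be checked separately.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TextContentDemo {
    // Parses a small XML string and returns the text content of its root element.
    static String textOf(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // The parser resolves the entities: prints <script>alert(1)</script>
        System.out.println(textOf("<title>&lt;script&gt;alert(1)&lt;/script&gt;</title>"));
        // Double-encoded input only loses one level: prints &lt;script&gt;
        System.out.println(textOf("<title>&amp;lt;script&amp;gt;</title>"));
    }
}
```

Printing the value just before it goes into AntiSamy shows which of the two forms is actually being validated.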
As to your question about links, you're absolutely right about URLs
such as javascript:alert();. That's why AntiSamy applies URL
validation to anchor tags and link tags in HTML.
However, AntiSamy only validates HTML. I can't stress this enough:
AntiSamy assumes the input it receives is meant to be interpreted as
HTML.
If you pass in text that is going to be interpreted as a URL (like the
text content of the URL XML node), AntiSamy doesn't have the context to
know that it's going to be interpreted as a URL. The best thing I can
suggest is kind of a kludge: take the HTML *attribute*
encoded version of the text content of your URL XML node and place it
inside a makeshift anchor tag with an href so that AntiSamy will
perform proper URL validation on it.
Take that suggestion with a grain of salt though - it's off the top of
my head and I haven't thought through all the security considerations.
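
A rough sketch of that workaround, with the same caveat that it has not been reviewed for security. The class name AnchorWrapSketch, both method names, and the "link" placeholder text are hypothetical; the AntiSamy call itself is left out, and the encoder covers only the characters that matter inside a double-quoted attribute.

```java
class AnchorWrapSketch {
    // Minimal HTML attribute encoding for a double-quoted attribute value.
    static String attrEncode(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds the makeshift anchor whose href carries the URL text to validate.
    static String wrapInAnchor(String urlText) {
        return "<a href=\"" + attrEncode(urlText) + "\">link</a>";
    }

    public static void main(String[] args) {
        // prints <a href="javascript:alert(&#39;x&#39;);">link</a>
        System.out.println(wrapInAnchor("javascript:alert('x');"));
    }
}
```

The resulting string would be passed to AntiSamy as HTML; if the href does not survive in the cleaned output, the URL was rejected by the policy.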
--
-Jason Li-
-jason.li at owasp.org-
On Mon, Apr 13, 2009 at 2:52 PM, Girish <ivgirish at yahoo.com> wrote:
Jason,
Thank you very much for the detailed reply. Yes, you are right. The HTML content
is inside RSS feeds. But I am not passing the entire XML to AntiSamy. I am
actually
- walking the xml tree
- extracting the text content for each node
- passing only text content to antisamy
- then, updating the xml/dom tree with filtered content
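
The four steps above can be sketched with the JDK's DOM API roughly as follows. The class name FeedWalker, its method names, and the lambda filter are hypothetical; the lambda only stands in for the real AntiSamy call, which is not shown. Note that only plain text nodes are visited, so CDATA sections would need separate handling.

```java
import java.io.StringReader;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FeedWalker {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse feed", e);
        }
    }

    // Recursively visits every text node and replaces its value with the filtered value.
    static void filterText(Node node, UnaryOperator<String> filter) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(filter.apply(node.getNodeValue()));
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            filterText(children.item(i), filter);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<rss><channel><title>&lt;script&gt;alert(1)&lt;/script&gt;</title></channel></rss>");
        // Stand-in filter (assumption): drop any text that contains a script tag.
        filterText(doc, text -> text.toLowerCase().contains("<script") ? "" : text);
        // The title text is now empty.
        System.out.println("[" + doc.getElementsByTagName("title").item(0).getTextContent() + "]");
    }
}
```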
for example, here is the RSS feed that I am using for testing.
<?xml version="1.0" ?>
<rss version="2.0">
<!-- rsstest.markwoodman.com\malicious_2.rss -->
<!-- Entity encoded script insertion -->
<channel>
<pubDate>Tue, 9 Aug 2006 00:00:00 GMT</pubDate>
<title><script>alert('Channel Title Vulnerability - Type
2')</script></title>
<description><script>alert('Channel Title Description
Vulnerability - Type 2')</script></description>
<link><script>alert('Channel Link Vulnerability - Type
2')</script></link>
<url>javascript:alert('Channel Image URL Vulnerability - Type
1');</url>
<item>
<title><script>alert('Item Title Vulnerability - Type
2')</script></title>
<description><script>alert('Item Description Vulnerability -
Type 2')</script></description>
<link><script>alert('Item Link Vulnerability - Type
2')</script></link>
</item>
</channel>
</rss>
======== the output after running it thru antisamy is=========
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<!-- rsstest.markwoodman.com\malicious_2.rss --> <!-- Entity encoded
script insertion --> <channel>
<pubDate>Tue, 9 Aug 2006 00:00:00 GMT</pubDate>
<title/>
<description>&lt;script&gt;alert('Channel Title Description
Vulnerability - Type 2')&lt;/script&gt;</description>
<link>&lt;script&gt;alert('Channel Link Vulnerability - Type
2')&lt;/script&gt;</link>
<url>javascript:alert('Channel Image URL Vulnerability - Type
1');</url>
<item>
<title>&lt;script&gt;alert('Item Title Vulnerability - Type
2')&lt;/script&gt;</title>
<description>&lt;script&gt;alert('Item Description
Vulnerability - Type 2')&lt;/script&gt;</description>
<link>&lt;script&gt;alert('Item Link Vulnerability - Type
2')&lt;/script&gt;</link>
</item>
</channel>
</rss>
NOTE: it also didn't remove "javascript:alert" in the url element above.
As you can see here (http://ha.ckers.org/xss.html), most of the time the
malicious html/javascript is going to be encoded using octal/hex/comment
tags to bypass regular expression filters of purifiers.
an example
------------------------------------------------------------
<IMG SRC="jav&#x09;ascript:alert('XSS');">
-------------------------------------------------------------
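
To show why pattern matching on the raw input is fragile, here is a small Java sketch using the entity-encoded form of the embedded-tab trick. The class and method names are hypothetical. It assumes, as browsers generally do, that tab and newline characters inside a URL are ignored, and it only decodes semicolon-terminated references for tab, LF and CR. It is an illustration of the bypass, not a recommended filter.

```java
import java.util.Locale;

class TabBypassDemo {
    // Naive blacklist check of the kind a regular-expression filter might perform.
    static boolean naiveCheck(String attr) {
        return attr.toLowerCase(Locale.ROOT).contains("javascript:");
    }

    // Closer to what a browser ends up seeing: resolve tab/LF/CR character
    // references, drop raw tab/LF/CR, then look at the scheme.
    static boolean normalizedCheck(String attr) {
        String cleaned = attr
                .replaceAll("(?i)&#x0*[9ad];", "")   // hex references for tab, LF, CR
                .replaceAll("&#0*(9|10|13);", "")    // decimal references for tab, LF, CR
                .replaceAll("[\\t\\n\\r]", "")       // raw tab, LF, CR
                .trim();
        return cleaned.toLowerCase(Locale.ROOT).startsWith("javascript:");
    }

    public static void main(String[] args) {
        String attr = "jav&#x09;ascript:alert('XSS');";
        System.out.println(naiveCheck(attr));      // prints false
        System.out.println(normalizedCheck(attr)); // prints true
    }
}
```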
So, wondering what's the best way to deal with this type of code ?
appreciate your help.
thanks,
Girish
Jason Li wrote:
That's definitely an issue if encoded HTML gets decoded by the DOM parser...
That's something we need to look into and fix.
Thanks for pointing that out Eric!
--
-Jason Li-
-jason.li at owasp.org-
On Mon, Apr 13, 2009 at 1:28 PM, Eric Kreiser <ekreiser at mzinga.com> wrote:
The other problem I have seen with antisamy is that if the value you
send to antisamy is escaped... but you use the
getCleanXMLDocumentFragment() to get your scrubbed value... it reverses
all the escaping... leaving you now with a value that would have
otherwise violated the policy file
Jason Li wrote:
Girish,
By default, script tags should be removed by AntiSamy.
I think the problem may lie in your statement, "even if they are escaped."
If you pass in:
<script>alert('Channel Title Description Vulnerability - Type 2')</script>
to AntiSamy, you should get nothing back.
However, your statement leads me to believe that in fact you're passing in:
&lt;script&gt;alert('Channel Title Description Vulnerability - Type
2')&lt;/script&gt;
The above is "safe" from AntiSamy's perspective because it assumes
that the content is directly rendered in an HTML interpreter.
My guess, from the behavior you describe and the examples you give, is
that you have encoded HTML embedded in XML, so something that looks
like this (here the tainted input is contained in an XML element, item
description, and therefore encoded):
<rss version="2.0">
<channel>
<title>Example</title>
<link>http://example.com</link>
<description>Example</description>
<item>
<title>Example</title>
<link>http://example.com</link>
<description>This is the text that you're trying to validate
&lt;script&gt;alert('Channel Title Description Vulnerability - Type
2')&lt;/script&gt;</description>
</item>
</channel>
</rss>
AntiSamy can't know the context where your content is coming from -
it's expecting HTML content that goes to an HTML interpreter. If the
content you provide is encoded HTML that goes to an interpreter
that decodes the HTML, AntiSamy won't be able to properly validate it.
You'd have to provide an HTML decoded version for AntiSamy to handle
properly.
Am I interpreting your use case correctly? And if so, does that
explanation make sense?
--
-Jason Li-
-jason.li at owasp.org-
On Fri, Apr 10, 2009 at 6:52 PM, Girish <ivgirish at yahoo.com> wrote:
I am using version 1.3 and I have tried all 4 policy files. They all
give the same result.
For example, if my html is this (passing line by line to antisamy):
<script>alert('Channel Title Description Vulnerability -
Type 2')</script>
<script>alert('Channel Link Vulnerability - Type
2')</script>
javascript:alert('Channel Image URL Vulnerability - Type 1');
the output I am getting is:
<script>alert('Channel Title Description
Vulnerability - Type 2')</script>
<script>alert('Channel Link Vulnerability - Type
2')</script>
javascript:alert('Channel Image URL Vulnerability - Type 1');
any idea on how to remove the tags like
script/javascript/embed/frame/etc even if they are escaped.
_______________________________________________
Owasp-antisamy mailing list
Owasp-antisamy at lists.owasp.org
https://lists.owasp.org/mailman/listinfo/owasp-antisamy