[Owasp-antisamy] escaped tags goes thru without getting removed

Arshan Dabirsiaghi arshan.dabirsiaghi at aspectsecurity.com
Tue Apr 14 15:24:21 EDT 2009


It passes all of RSnake's tests and more. You can look at the unit tests here:
 
http://code.google.com/p/owaspantisamy/source/browse/trunk/Java/current/TestSource/org/owasp/validator/html/test/AntiSamyTest.java
 
Thanks,
Arshan

________________________________

From: owasp-antisamy-bounces at lists.owasp.org on behalf of Girish
Sent: Tue 4/14/2009 3:23 PM
To: Jason Li
Cc: owasp-antisamy at lists.owasp.org
Subject: Re: [Owasp-antisamy] escaped tags goes thru without getting removed


Jason,
BTW, is AntiSamy tested against RSnake's cheat sheet? If so, what are the results?

thanks,
Girish


Jason Li wrote: 

	Girish,
	
	I think RSnake's cheatsheet that you mentioned
	(http://ha.ckers.org/xss.html) is probably the best resource for
	malicious content.
	
	In terms of expanding it to cleanse XML, it's unlikely we'll do this
	because it's impossible for AntiSamy to determine how the XML content
	data will be used. XML by its nature is extremely flexible and could
	be used in any number of contexts. Someone might be using a SOAP
	message to deliver base64-encoded content that is then further HTML
	entity encoded for interpreting by a browser at the end. AntiSamy
	can't unravel the onion so to speak.
	
	IMHO, it's a task best left to the developer as they're the ones who
	ultimately know the intended target.
	
	Thanks for the info about unescaping HTML! We'll have to look into
	what Eric said earlier in the thread about AntiSamy's behavior with
	encoded HTML.
	
	--
	-Jason Li-
	-jason.li at owasp.org-
	
	
	
	On Mon, Apr 13, 2009 at 7:01 PM, Girish <ivgirish at yahoo.com> wrote:
	  

		Jason,
		quick questions:
		(1) Do you have more sample HTML files with malicious content that I can use
		to test with?
		(2) Any plans to expand AntiSamy to cleanse XML files containing malicious
		HTML/JS code in the future?
		
		BTW, I forgot to mention that
		http://commons.apache.org/lang/api/org/apache/commons/lang/StringEscapeUtils.html
		helps to unescape HTML/JavaScript text.
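The unescaping Girish mentions amounts to turning character references back into characters before the text reaches a sanitizer. The class below is a minimal, stdlib-only sketch of that idea, not commons-lang itself: `MiniUnescape` is a hypothetical name, and it handles only `&lt;`, `&gt;`, `&amp;`, `&quot;`, `&apos;` and numeric references (commons-lang 2.x's `StringEscapeUtils.unescapeHtml` covers the full HTML 4 entity set). It decodes in a single pass, so one layer of encoding is removed per call.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: decodes a small subset of HTML character references in one pass. */
final class MiniUnescape {
    // Named references handled here, plus decimal (&#106;) and hex (&#x09;) forms.
    private static final Pattern REF =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|lt|gt|amp|quot|apos);");

    static String unescape(String s) {
        Matcher m = REF.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String ref = m.group(1);
            String replacement;
            switch (ref) {
                case "lt":   replacement = "<";  break;
                case "gt":   replacement = ">";  break;
                case "amp":  replacement = "&";  break;
                case "quot": replacement = "\""; break;
                case "apos": replacement = "'";  break;
                default:     replacement = decodeNumeric(ref);
            }
            // quoteReplacement keeps '$' or '\' in the decoded text from being read as group syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // ref is "#123" (decimal) or "#x7B" (hex).
    private static String decodeNumeric(String ref) {
        int cp;
        try {
            cp = (ref.charAt(1) == 'x' || ref.charAt(1) == 'X')
                    ? Integer.parseInt(ref.substring(2), 16)
                    : Integer.parseInt(ref.substring(1));
        } catch (NumberFormatException e) {
            cp = 0xFFFD; // too large for an int
        }
        if (!Character.isValidCodePoint(cp)) {
            cp = 0xFFFD; // outside the Unicode range: use the replacement character
        }
        return new String(Character.toChars(cp));
    }
}
```

The single pass matters: `&amp;lt;` decodes to `&lt;`, not `<`, so a doubly encoded value needs a deliberate second call. Deciding whether to make that call is the "onion" problem Jason says only the developer can resolve.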
		
		thanks,
		Girish
		
		
		Jason Li wrote:
		
		Girish,
		
		With regard to validating the description and title elements, I'm
		surprised this isn't working straight out of the box.
		
		My tests with the default AntiSamy policy file and plaintext version
		show that you should get an empty string back from AntiSamy.
		
		A few questions:
		- Are you using the default AntiSamy policy file?
		- When you extract the text content for each node from your DOM
		object, is it returning the XML encoded version of the text content
		(i.e. &lt;script&gt;) or just the unencoded text version (i.e.
		<script>)?
		- Can you also provide the input and output (debugging or
		System.out.println()) of AntiSamy?  I'd like to eliminate the
		possibility that it's the DOM implementation that you're using
		(perhaps it returns different text content than you expect, or doesn't
		update the DOM tree as expected).
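The encoded-versus-unencoded question can be checked directly with the JDK's own DOM parser. This sketch (`TextContentDemo` and `rootText` are hypothetical names) parses a small element and returns `getTextContent()`. With the default parser, entity references in the file come back decoded, and raw tags inside an element are parsed as child elements rather than as text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical demo: what getTextContent() returns for an element parsed from XML. */
final class TextContentDemo {
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Concatenates all descendant text; entity references are already decoded.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

So text taken from an encoded `<description>` reaches the caller as `<script>...`, while for a literal `<script>` element nested inside `<title>` only its inner text comes back.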
		
		As to your question about links, you're absolutely right about URLs
		such as javascript:alert();.  That's why AntiSamy applies URL
		validation to anchor tags and link tags in HTML.
		
		However, AntiSamy only validates HTML. I can't stress this enough:
		AntiSamy assumes the input it receives is meant to be interpreted as
		HTML.
		
		If you pass in text that is going to be interpreted as a URL (like the
		text content of the URL XML node), AntiSamy doesn't have context to
		know that it's going to be interpreted as URL. The best thing I can
		suggest is kind of a kluge-hack which is to take the HTML *attribute*
		encoded version of the text content of your URL XML node and place it
		inside a makeshift anchor tag with an href so that AntiSamy will
		perform proper URL validation on it.
		
		Take that suggestion with a grain of salt though - it's off the top of
		my head and I haven't thought through all the security considerations.
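Jason's workaround might look roughly like this. It is a minimal sketch in which `AnchorWrap`, `encodeForAttribute` and `wrapAsAnchor` are hypothetical names; it only builds the throwaway `<a href="...">` string. The result would then be passed to AntiSamy's `scan(...)`, and the caller would check whether the `href` survived. As Jason says, the security implications of this hack have not been thought through.

```java
/** Hypothetical helpers for the "makeshift anchor" workaround. */
final class AnchorWrap {
    // Escapes the characters that could end a double-quoted attribute or start markup.
    static String encodeForAttribute(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wraps the URL text in a throwaway anchor so a sanitizer applies its href rules to it.
    static String wrapAsAnchor(String urlText) {
        return "<a href=\"" + encodeForAttribute(urlText) + "\">link</a>";
    }
}
```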
		
		--
		-Jason Li-
		-jason.li at owasp.org-
		
		
		
		On Mon, Apr 13, 2009 at 2:52 PM, Girish <ivgirish at yahoo.com> wrote:
		
		
		Jason,
		Thank you very much for the detailed reply. Yes, you are right. The HTML content
		is inside RSS feeds. But I am not passing the entire XML to AntiSamy. I am
		actually:
		- walking the xml tree
		- extracting the text content for each node
		- passing only text content to antisamy
		- then, updating the xml/dom tree with filtered content
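The four steps above could be sketched with the JDK's built-in DOM and JAXP transformer APIs. This is an illustration under assumptions, not Girish's actual code: `FeedTextWalker` is a hypothetical name, and `cleaner` stands in for whatever sanitizer call is used (with AntiSamy, something like `antiSamy.scan(text, policy).getCleanHTML()`; it is not wired up here). DTDs are refused so an untrusted feed cannot declare external entities.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical walker: applies a cleaner to every text node and re-serializes the feed. */
final class FeedTextWalker {
    static String cleanTextNodes(String xml, UnaryOperator<String> cleaner) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            // Reject any DOCTYPE, so no external entities can be declared.
            f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            walk(doc.getDocumentElement(), cleaner);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private static void walk(Node node, UnaryOperator<String> cleaner) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                // getNodeValue() is already decoded: "&lt;" in the file arrives here as "<".
                child.setNodeValue(cleaner.apply(child.getNodeValue()));
            } else if (type == Node.ELEMENT_NODE) {
                walk(child, cleaner);
            }
        }
    }
}
```

Because the cleaner sees decoded text, an HTML sanitizer in that position would treat an encoded `&lt;script&gt;` from the feed as a real script tag; whatever it returns is written back as plain characters.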
		
		for example, here is the RSS feed that I am using for testing.
		
		<?xml version="1.0" ?>
		<rss version="2.0">
		    <!-- rsstest.markwoodman.com\malicious_2.rss -->
		    <!-- Entity encoded script insertion -->
		    <channel>
		
		      <pubDate>Tue, 9 Aug 2006 00:00:00 GMT</pubDate>
		
		      <title><script>alert('Channel Title Vulnerability - Type 2')</script></title>
		      <description>&lt;script&gt;alert('Channel Title Description Vulnerability - Type 2')&lt;/script&gt;</description>
		      <link>&lt;script&gt;alert('Channel Link Vulnerability - Type 2')&lt;/script&gt;</link>
		        <url>javascript:alert('Channel Image URL Vulnerability - Type 1');</url>
		
		      <item>
		        <title>&lt;script&gt;alert('Item Title Vulnerability - Type 2')&lt;/script&gt;</title>
		        <description>&lt;script&gt;alert('Item Description Vulnerability - Type 2')&lt;/script&gt;</description>
		        <link>&lt;script&gt;alert('Item Link Vulnerability - Type 2')&lt;/script&gt;</link>
		      </item>
		
		    </channel>
		
		</rss>
		
		======== the output after running it thru antisamy is=========
		<?xml version="1.0" encoding="UTF-8"?>
		<rss version="2.0">
		<!-- rsstest.markwoodman.com\malicious_2.rss -->    <!-- Entity encoded script insertion -->    <channel>
		      <pubDate>Tue, 9 Aug 2006 00:00:00 GMT</pubDate>
		      <title/>
		      <description>&amp;lt;script&amp;gt;alert('Channel Title Description Vulnerability - Type 2')&amp;lt;/script&amp;gt;</description>
		      <link>&amp;lt;script&amp;gt;alert('Channel Link Vulnerability - Type 2')&amp;lt;/script&amp;gt;</link>
		      <url>javascript:alert('Channel Image URL Vulnerability - Type 1');</url>
		      <item>
		         <title>&amp;lt;script&amp;gt;alert('Item Title Vulnerability - Type 2')&amp;lt;/script&amp;gt;</title>
		         <description>&amp;lt;script&amp;gt;alert('Item Description Vulnerability - Type 2')&amp;lt;/script&amp;gt;</description>
		         <link>&amp;lt;script&amp;gt;alert('Item Link Vulnerability - Type 2')&amp;lt;/script&amp;gt;</link>
		      </item>
		   </channel>
		</rss>
		
		
		NOTE: it also didn't remove "javascript:alert" in the url link above.
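One plausible reading of the doubled `&amp;lt;` in this output (an assumption, not confirmed in the thread) is that the strings written back into the DOM already contained `&lt;`, and the XML serializer escaped their `&` once more. The JDK's DOM and transformer APIs show that effect in isolation; `EscapeRoundTrip` and `serializeText` are hypothetical names.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical demo: serializing text that already contains character references. */
final class EscapeRoundTrip {
    static String serializeText(String elementName, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element el = doc.createElement(elementName);
            // setTextContent stores plain characters, never markup.
            el.setTextContent(text);
            doc.appendChild(el);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // The serializer escapes '&' and '<' in text, so an existing "&lt;" becomes "&amp;lt;".
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```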
		
		As you can see here (http://ha.ckers.org/xss.html), most of the time the
		malicious HTML/JavaScript is going to be obfuscated with octal/hex encoding or
		comment tricks to bypass the regular-expression filters of purifiers.
		
		an example
		------------------------------------------------------------
		<IMG SRC="jav&#x09;ascript:alert('XSS');">
		-------------------------------------------------------------
		
		So, I'm wondering what's the best way to deal with this type of code?
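As a rough illustration of why decoding and normalization must happen before any scheme check, here is a minimal allowlist sketch. It is not how AntiSamy validates URLs; `UrlSchemeCheck`, `isAllowedUrl` and the allowed-scheme list are assumptions. It decodes numeric character references, drops the tab/newline characters browsers ignore inside URLs, trims leading and trailing control characters and spaces, and then accepts only `http:`, `https:`, `mailto:` or a URL starting with `/`, `?` or `#`. Named references such as `&colon;` are not decoded, so input relying on them fails closed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not AntiSamy code: normalize a URL attribute, then allowlist its scheme. */
final class UrlSchemeCheck {
    // Assumed policy for this sketch.
    private static final List<String> ALLOWED_SCHEMES = Arrays.asList("http", "https", "mailto");
    // Decimal or hex references; the ';' is optional because browsers tolerate its absence.
    private static final Pattern NUMERIC_REF = Pattern.compile("&#([xX][0-9a-fA-F]+|[0-9]+);?");

    static boolean isAllowedUrl(String rawAttributeValue) {
        // 1. Decode numeric references: "jav&#x09;ascript:" becomes "jav<TAB>ascript:".
        String s = decodeNumericRefs(rawAttributeValue);
        // 2. Browsers ignore tab, LF and CR inside URLs, so remove them...
        s = s.replaceAll("[\\t\\n\\r]", "");
        // ...and trim leading/trailing control characters and spaces.
        s = s.replaceAll("^[\\x00-\\x20]+|[\\x00-\\x20]+$", "");
        String lower = s.toLowerCase(Locale.ROOT);
        // 3. Accept an explicitly allowed scheme.
        for (String scheme : ALLOWED_SCHEMES) {
            if (lower.startsWith(scheme + ":")) {
                return true;
            }
        }
        // 4. Fail closed: only scheme-less URLs starting with '/', '?' or '#' pass.
        return lower.startsWith("/") || lower.startsWith("?") || lower.startsWith("#");
    }

    private static String decodeNumericRefs(String s) {
        Matcher m = NUMERIC_REF.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String digits = m.group(1);
            int cp;
            try {
                cp = (digits.charAt(0) == 'x' || digits.charAt(0) == 'X')
                        ? Integer.parseInt(digits.substring(1), 16)
                        : Integer.parseInt(digits);
            } catch (NumberFormatException e) {
                cp = 0xFFFD; // too large for an int
            }
            if (!Character.isValidCodePoint(cp)) {
                cp = 0xFFFD;
            }
            m.appendReplacement(out, Matcher.quoteReplacement(new String(Character.toChars(cp))));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

A parser-based, allowlist-driven sanitizer such as AntiSamy remains preferable to hand-rolled checks like this; the sketch only shows why a filter looking for the literal string `javascript:` is easy to bypass.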
		
		appreciate your help.
		
		thanks,
		Girish
		
		
		
		Jason Li wrote:
		
		That's definitely an issue if encoded HTML gets decoded by the DOM parser...
		
		That's something we need to look into and fix.
		
		Thanks for pointing that out, Eric!
		--
		-Jason Li-
		-jason.li at owasp.org-
		
		
		
		On Mon, Apr 13, 2009 at 1:28 PM, Eric Kreiser <ekreiser at mzinga.com> wrote:
		
		
		The other problem I have seen with AntiSamy is that if the value you
		send to AntiSamy is escaped, but you use
		getCleanXMLDocumentFragment() to get your scrubbed value, it reverses
		all the escaping, leaving you with a value that would
		otherwise have violated the policy file.
		
		
		
		Jason Li wrote:
		
		
		Girish,
		
		By default, script tags should be removed by AntiSamy.
		
		I think the problem may lie in your statement, "even if they are escaped."
		
		If you pass in:
		<script>alert('Channel Title Description Vulnerability - Type 2')</script>
		
		to AntiSamy, you should get nothing back.
		
		However, your statement leads me to believe that in fact you're passing in:
		&lt;script&gt;alert('Channel Title Description Vulnerability - Type
		2')&lt;/script&gt;
		
		The above is "safe" from AntiSamy's perspective because it assumes
		that the content is directly rendered in an HTML interpreter.
		
		My guess, from the behavior you describe and the examples you give, is
		that you have encoded HTML embedded in XML, something that looks
		like this (here the tainted input is contained in an XML element, item
		description, and is therefore encoded):
		<rss version="2.0">
		  <channel>
		    <title>Example</title>
		    <link>http://example.com</link>
		    <description>Example</description>
		    <item>
		      <title>Example</title>
		      <link>http://example.com</link>
		      <description>This is the text that you're trying to validate &lt;script&gt;alert('Channel Title Description Vulnerability - Type 2')&lt;/script&gt;</description>
		    </item>
		  </channel>
		</rss>
		
		AntiSamy can't know the context where your content is coming from -
		it's expecting HTML content that goes to an HTML interpreter. If the
		content you are provided is encoded HTML that goes to an interpreter
		that decodes the HTML, AntiSamy won't be able to properly validate it.
		You'd have to provide an HTML decoded version for AntiSamy to handle
		properly.
		
		Am I interpreting your use case correctly? And if so, does that
		explanation make sense?
		--
		-Jason Li-
		-jason.li at owasp.org-
		
		
		
		On Fri, Apr 10, 2009 at 6:52 PM, Girish <ivgirish at yahoo.com> wrote:
		
		
		
		I am using version 1.3 and I have tried all four policy files. They all
		give the same result.
		
		For example, if my html is this (passing line by line to antisamy):
		
		     <script>alert('Channel Title Description Vulnerability - Type 2')</script>
		     <script>alert('Channel Link Vulnerability - Type 2')</script>
		     javascript:alert('Channel Image URL Vulnerability - Type 1');
		
		the output I am getting is:
		
		     &lt;script&gt;alert('Channel Title Description Vulnerability - Type 2')&lt;/script&gt;
		     &lt;script&gt;alert('Channel Link Vulnerability - Type 2')&lt;/script&gt;
		     javascript:alert('Channel Image URL Vulnerability - Type 1');
		
		Any idea how to remove tags like script/javascript/embed/frame/etc. even
		if they are escaped?
		
		
		_______________________________________________
		Owasp-antisamy mailing list
		Owasp-antisamy at lists.owasp.org
		https://lists.owasp.org/mailman/listinfo/owasp-antisamy
		
		
		
