[Owasp-antisamy] antisamy problems with &nbsp;

Arshan Dabirsiaghi arshan.dabirsiaghi at aspectsecurity.com
Fri Apr 11 12:44:36 EDT 2008


Have you found that the HTMLSerializer has any side effects? For
instance, does it add a "type" attribute to a <script> tag that doesn't
have one, or wrap text in <body> and <html> tags?

Arshan

-----Original Message-----
From: owasp-antisamy-bounces at lists.owasp.org
[mailto:owasp-antisamy-bounces at lists.owasp.org] On Behalf Of Joel
Worrall
Sent: Thursday, April 10, 2008 9:36 AM
To: owasp-antisamy at lists.owasp.org
Subject: Re: [Owasp-antisamy] antisamy problems with &nbsp;

I found a viable workaround to my formatting and output issues with HTML
characters (&nbsp;, &#160;, &amp;, etc.). I had additional issues with
AntiSamy's assumptions regarding the output formatting for line
separators and indents. Both of those issues are resolved as well with
this method.

Instead of using the CleanResults.getCleanHTML() method, use the
following code:

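import java.io.StringWriter;

import org.apache.xml.serialize.HTMLSerializer;
import org.apache.xml.serialize.OutputFormat;
import org.owasp.validator.html.AntiSamy;
import org.owasp.validator.html.CleanResults;
import org.owasp.validator.html.Policy;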

...


OutputFormat format = new OutputFormat();
format.setEncoding("UTF-8");
// Emit a bare fragment: no XML declaration, no DOCTYPE.
format.setOmitXMLDeclaration(true);
format.setOmitDocumentType(true);
// Turn off indentation and inserted line breaks.
format.setIndent(0);
format.setIndenting(false);
format.setLineSeparator("");
format.setPreserveEmptyAttributes(true);
format.setOmitComments(true);
format.setPreserveSpace(true);
format.setStandalone(true);

HTMLSerializer serializer = new HTMLSerializer(format);

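// antisamy is an existing AntiSamy instance; stringToBeCleaned is the raw input.
CleanResults cr = antisamy.scan(stringToBeCleaned, Policy.getInstance());

// Serialize the cleaned DOM fragment ourselves instead of calling getCleanHTML().
// Note: serialize() can throw java.io.IOException.
StringWriter sw = new StringWriter();
serializer.setOutputCharStream(sw);
serializer.serialize(cr.getCleanXMLDocumentFragment());
cleanString = sw.toString();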

In this case, the input:

<p>Does this work?&nbsp;&nbsp;&nbsp;</p><br/>

Becomes the output:

<p>Does this work?&nbsp;&nbsp;&nbsp;</p><br />

For my purposes, that's close enough.

The downside with this approach is that HTMLSerializer is deprecated, so
the code is effective but will end-of-life at some point in the future.
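
Since the deprecation is the main downside, here is a rough sketch of the
same idea using only the standard JAXP identity transform
(javax.xml.transform) instead of HTMLSerializer. This is an assumption-laden
alternative, not something tested against AntiSamy: the class name
FragmentSerializerSketch and the helpers toHtml() and sampleFragment() are
made up for illustration, and the sample fragment is built by hand rather
than taken from CleanResults. How entities such as &nbsp; and empty tags
such as <br> come out depends on the transformer implementation, so check
the output before relying on it.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class; not part of AntiSamy or Xerces.
public class FragmentSerializerSketch {

    // Serialize a DOM node (for example a DocumentFragment) to a string
    // using the JAXP identity transform with the "html" output method.
    public static String toHtml(Node node) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "html");
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.setOutputProperty(OutputKeys.INDENT, "no");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter sw = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(sw));
        return sw.toString();
    }

    // Build a stand-in for the cleaned fragment: <p>Does this work?(3 x nbsp)</p><br/>
    public static DocumentFragment sampleFragment() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        DocumentFragment frag = doc.createDocumentFragment();
        Element p = doc.createElement("p");
        p.setTextContent("Does this work?\u00A0\u00A0\u00A0");
        frag.appendChild(p);
        frag.appendChild(doc.createElement("br"));
        return frag;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHtml(sampleFragment()));
    }
}
```

With AntiSamy, the node passed to toHtml() would presumably be the result of
cr.getCleanXMLDocumentFragment() from the code above.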
