[Owasp-antisamy] antisamy problems with &nbsp;

Arshan Dabirsiaghi arshan.dabirsiaghi at aspectsecurity.com
Mon Mar 31 21:28:30 EDT 2008

Sure. Thanks for your input, and I dearly want to address it. Let me use this opportunity to first summarize, for myself mostly, the character set issues we've run into in 1.0:
1. The policy files don't consider non-Latin Unicode characters (should be fixed in 1.1.1 or 1.2)
2. Special characters like MS Word's extra-long dash were turned into unrelated characters (fixed in 1.1)
3. ASCII entity replacement weirdness, like &nbsp; becoming á
So, the onus was on me to fix #3 for 1.1. I probably should have mentioned on this list that, based on my research, this may be a translation issue.
Consider the following link:
It looked like AntiSamy is correctly generating the data in UTF-8, but whatever mechanism people are using to view the output assumes iso-8859-1 (or another character set that's not UTF-8). That may be wrong: it may instead be that the CyberNeko HTML library I use to parse the data into a DOM takes its default encoding from the JVM (since AntiSamy doesn't specify one), and that default is a non-UTF-8 character set.
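
To make the suspected mismatch concrete, here is a minimal sketch that writes a non-breaking space (U+00A0) as UTF-8 and reads it back as iso-8859-1. The class and method names are made up for illustration and are not part of AntiSamy or CyberNeko. It uses only the JDK's standard charsets. The exact stray character a user sees depends on which character set their viewer actually assumes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NbspMismatchDemo {
    // Hypothetical helper: encode text with one charset, then decode the
    // same bytes with another, the way a mismatched viewer would.
    static String misread(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String nbsp = "\u00A0"; // what &nbsp; decodes to
        String seen = misread(nbsp, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // UTF-8 stores U+00A0 as the two bytes 0xC2 0xA0, so an iso-8859-1
        // reader sees two characters instead of one.
        for (int i = 0; i < seen.length(); i++) {
            System.out.printf("U+%04X%n", (int) seen.charAt(i));
        }
        // prints U+00C2 then U+00A0
    }
}
```

Reading the bytes back with the same character set they were written in returns the original single character, which is why the problem only shows up when producer and viewer disagree.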
Given that, and these facts:
a) I'm no encoding expert and may not realize the implications of global character translations
b) The entity replacement wasn't buying any security (that we know of)
c) I want to be as character set neutral as possible
... I ripped out the entity replacement. It was probably my mistake to be character set neutral, and what may end up happening is the addition of a set of parameters to AntiSamy that specify the input and output character sets.
So I guess in summary - the entity replacement was ripped out because I didn't feel like it was adding much, and the translation of &nbsp; to Unicode &#160; in any non-UTF-8 readers was causing a problem. In the future, however, I may put it back in, but that would enforce UTF-8 on our API users, and I'm not sure if that's a good route to take either.
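
One possible character-set-neutral alternative, sketched here purely as an assumption and not as what AntiSamy does or did in replaceEntityCodes, is to emit every non-ASCII character as a numeric character reference. The output then contains only ASCII bytes, which read the same in UTF-8 and iso-8859-1. The class and method names below are hypothetical.

```java
public class NumericEntityEscaper {
    // Hypothetical helper: replace each non-ASCII code point with a decimal
    // numeric character reference such as &#160; and leave ASCII untouched.
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int codePoint = input.codePointAt(i);
            if (codePoint > 0x7F) {
                out.append("&#").append(codePoint).append(';');
            } else {
                out.appendCodePoint(codePoint);
            }
            // Advance by one or two chars so surrogate pairs stay intact.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("a\u00A0b")); // prints a&#160;b
    }
}
```

The trade-off is that the text becomes longer and less readable in source form, and this helper does no HTML escaping of ASCII characters such as < or &, so it would not replace any of the scanner's security checks.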
Am I making any sense? Let me know what you think.


From: owasp-antisamy-bounces at lists.owasp.org on behalf of Joel Worrall
Sent: Mon 3/31/2008 6:04 PM
To: owasp-antisamy at lists.owasp.org
Subject: [Owasp-antisamy] antisamy problems with &nbsp;


I found a previous thread where people were mentioning trouble with HTML "nbsp" being turned into non-standard characters like "á". I also read your reply where you committed to address that issue in 1.1.

I see in the code that the AntiSamyDOMScanner has the method "replaceEntityCodes" commented out. I read from a reply you posted on February 27 that you planned to replace the HTML 1.0 process that is performing that work, but that code (the code I assume you mean to resolve the issue) is commented out in 1.1.

Can you comment or provide further assistance? Is there a reason the HTML entities are not processed by antisamy's own process?

Thanks for the help and for writing this,
