[owasp-antisamy] French character issue in Antisamy

Jason Li jason.li at owasp.org
Fri Aug 12 15:57:30 EDT 2011


I'm not sure what you mean by "supports internationalization".

As I said, the HTML encoded version of the character should display the
proper international character in an HTML context - which is what AntiSamy
is designed to do.

I understand that you want the exact same text to appear before and after.
But as I mentioned before, my understanding is that past successful XSS
attacks have relied on unencoded international characters being
misinterpreted by browsers. The idea of AntiSamy is to make the input safe.
In the same way that '<' is encoded into '&lt;', an international character
is encoded to the safe HTML entity version of that character. So in that
sense, AntiSamy supports internationalization.
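
To make the analogy concrete, here is a minimal sketch of that kind of
encoding. It is not AntiSamy's or NekoHTML's actual code; the class and the
encodeForHtml helper are hypothetical names, and it emits numeric references
(&#233;) rather than the named entity (&eacute;) that appears in the output
reported in this thread. Both forms render as the same character in a browser.

```java
public class EntityEncodeSketch {
    // Hypothetical helper, not part of AntiSamy: encodes HTML-significant
    // characters and every non-ASCII code point as an entity.
    static String encodeForHtml(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp > 0x7F) {
                        // Non-ASCII: emit a decimal numeric character reference.
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00e9 is the e-acute from the example in this thread.
        System.out.println(encodeForHtml("Pour acc\u00e9der au journal de test"));
    }
}
```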

If the data is being placed into an HTML context, then the international
character should appear. Can you clarify your use case so we can understand
how you're using AntiSamy?
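
If it turns out the cleaned value is going somewhere other than HTML (a
plain-text field, for example), one possible approach is to decode the
entities after AntiSamy has run. The sketch below is an assumption on my part
and not an AntiSamy feature: the class and the decode helper are hypothetical,
and it only understands decimal numeric references plus a handful of named
entities. The decoded string must not be written back into an HTML page
without encoding it again.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDecodeSketch {
    // Deliberately small: decimal references (up to 6 digits) and five names.
    private static final Pattern ENTITY =
            Pattern.compile("&(#\\d{1,6}|eacute|lt|gt|amp|quot);");

    // Hypothetical helper: decodes each recognized entity exactly once.
    static String decode(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String replacement;
            if (name.startsWith("#")) {
                int cp = Integer.parseInt(name.substring(1));
                replacement = new String(Character.toChars(cp));
            } else if (name.equals("eacute")) {
                replacement = "\u00e9";
            } else if (name.equals("lt")) {
                replacement = "<";
            } else if (name.equals("gt")) {
                replacement = ">";
            } else if (name.equals("amp")) {
                replacement = "&";
            } else {
                replacement = "\"";
            }
            // quoteReplacement keeps '$' and '\' in the output literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("Pour acc&eacute;der au journal de test"));
    }
}
```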

Unfortunately, I do not know when the next release of AntiSamy will be made.
That is a question for the project lead.


On Fri, Aug 12, 2011 at 5:07 AM, Jobus <jobuss at gmail.com> wrote:

> Hi Jason,
> I appreciate your immediate reply. I am wondering, then, how AntiSamy works
> well with a website that supports internationalization or multiple languages.
> It would be nice if it supported multilingual characters or allowed disabling
> only the encoding.
> My application supports internationalization. Whatever the user enters in a
> textbox I validate with AntiSamy. If a French user enters a French
> character, after the validation I notice that the character is different.
> I have noticed the thread below:
> http://code.google.com/p/owaspantisamy/issues/detail?id=101
> May I know when version 1.5 will be released?
> Thanks
> Jobu
> On Thu, Aug 11, 2011 at 8:11 PM, Jason Li <jason.li at owasp.org> wrote:
>> Jobu,
>> I believe this encoding is being done by the NekoHTML parser - though
>> someone on the AntiSamy mailing list can correct me if I'm wrong. There may
>> be a way to override this behavior but off the top of my head I'm not sure.
>> AntiSamy is meant to be an HTML validation/sanitizing engine and &eacute;
>> is the properly encoded HTML version of that particular character. Changing
>> this encoding behavior can probably be done - but I believe there have been
>> known XSS attacks in the past that have depended on the fact that some
>> international letters are interpreted differently depending on locale and
>> region. As a result, I believe it's safer to rely on the HTML entity encoded
>> version if possible.
>> Obviously if you're not placing the data directly into an HTML context,
>> that conversion might have side effects...
>> -Jason
>> On Thu, Aug 11, 2011 at 7:12 AM, Jobus <jobuss at gmail.com> wrote:
>>> Hi Jason,
>>> I am facing an issue related to AntiSamy. In my application the user can give
>>> input in French characters, but AntiSamy encodes it and does not give the
>>> input string back.
>>> e.g.:
>>> My input string is
>>> Pour accéder au journal de test
>>> and the output given by getCleanHTML is
>>> Pour acc&eacute;der au journal de test
>>> How can I solve this issue? I need to get back exactly the same input string I
>>> provided. Mine is a multilingual application.
>>> I would really appreciate it if you could help me with this issue.
>>> Thanks
>>> Jobu