[owasp-antisamy] French character issue in Antisamy

Jason Li jason.li at owasp.org
Thu Aug 18 10:45:42 EDT 2011


Jobu,

AntiSamy is meant to validate and sanitize HTML data. The intention is that
any data that has been passed through AntiSamy can be used safely in an HTML
context.

If you put "Pour acc&eacute;der au journal de test" directly into an HTML
context (as in the attached file), it will display properly in any
major/modern web browser.

Based on your description below and your original email, I believe there is
probably some code in your application that is applying additional HTML
entity encoding to the data after it has been through AntiSamy.

If your application is applying additional HTML entity encoding, then the
output of AntiSamy ("Pour acc&eacute;der au journal de test") will be
encoded an extra time, resulting in "Pour acc&amp;eacute;der au journal de
test", which would not display correctly.
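
To illustrate the double encoding described above, here is a minimal Java
sketch. The class name and the escapeHtml helper are hypothetical stand-ins
for whatever encoder your application might be applying after AntiSamy; they
are not part of AntiSamy's API. The input string is the getCleanHTML output
reported earlier in this thread.

```java
public class DoubleEncodingDemo {

    // Hypothetical stand-in for an application-level HTML encoder that runs
    // after AntiSamy. It escapes only the HTML-significant characters.
    public static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Output of getCleanHTML as reported in this thread.
        String fromAntiSamy = "Pour acc&eacute;der au journal de test";

        // Encoding again escapes the '&' that starts the entity.
        String encodedAgain = escapeHtml(fromAntiSamy);

        System.out.println(fromAntiSamy);  // Pour acc&eacute;der au journal de test
        System.out.println(encodedAgain);  // Pour acc&amp;eacute;der au journal de test
    }
}
```

A browser renders the second string as the literal text "&eacute;" instead of
the accented character. If a step like this runs in your application, removing
it for data that has already been through AntiSamy should restore the
expected display.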

-Jason

On Tue, Aug 16, 2011 at 7:42 AM, Jobus <jobuss at gmail.com> wrote:

> Hi Jason,
> I am extremely sorry if I have confused you. Let me explain the
> issue once more, briefly.
>
> My web application supports different languages (en_US, ja_JP, fr_FR, etc.).
> Users can enter English, Japanese, or French strings through the UI. We
> validate the strings the user enters in a textbox/textarea through AntiSamy
> so that it can filter out any HTML/JavaScript content they have entered.
>
> The issue is:
>
> If the user enters any French string, then after the AntiSamy validation we
> get an encoded string that is different from what the user entered.
> Since the encoded string is stored in the DB, when we show it back
> in the UI it is not what the user actually provided, which is wrong behavior.
> Do you have any quick ideas for resolving this issue?
>
>
> thanks in advance
> Jobu.
>
>
>
>
> On Sat, Aug 13, 2011 at 1:27 AM, Jason Li <jason.li at owasp.org> wrote:
>
>> Jobu,
>>
>> I'm not sure what you mean by "supports internationalization".
>>
>> As I said, the HTML encoded version of the character should display the
>> proper international character in an HTML context - which is what AntiSamy
>> is designed to do.
>>
>> I understand that you want the exact same text to appear before and after.
>> But as I mentioned before, my understanding is that past successful XSS
>> attacks have relied on unencoded international characters being
>> misinterpreted by browsers. The idea of AntiSamy is to make the input safe.
>> In the same way that '<' is encoded into '&lt;', an international character
>> is encoded to the safe HTML entity version of that character. So in that
>> sense, AntiSamy supports internationalization.
>>
>> If the data is being placed into an HTML context, then the international
>> character should appear. Can you clarify your use case so we can understand
>> how you're using AntiSamy?
>>
>> Unfortunately, I do not know when the next release of AntiSamy will be
>> made. That is a question for the project lead.
>>
>> -Jason
>>
>> On Fri, Aug 12, 2011 at 5:07 AM, Jobus <jobuss at gmail.com> wrote:
>>
>>> Hi Jason,
>>> I appreciate your immediate reply. I am wondering, then, how AntiSamy works
>>> with websites that support internationalization or multiple languages.
>>>
>>> It would be nice if it supported multilingual characters or allowed
>>> disabling only the encoding.
>>> My application supports internationalization.
>>> Whatever the user enters in a textbox I validate with AntiSamy. If a French
>>> user enters some French characters, after the validation I notice that
>>> the characters are different.
>>> I have noticed the thread below:
>>> http://code.google.com/p/owaspantisamy/issues/detail?id=101
>>>
>>> May I know when version 1.5 will be released?
>>>
>>> thanks
>>> Jobu
>>>
>>>
>>> On Thu, Aug 11, 2011 at 8:11 PM, Jason Li <jason.li at owasp.org> wrote:
>>>
>>>> Jobu,
>>>>
>>>> I believe this encoding is being done by the NekoHTML parser - though
>>>> someone on the AntiSamy mailing list can correct me if I'm wrong. There may
>>>> be a way to override this behavior but off the top of my head I'm not sure.
>>>>
>>>> AntiSamy is meant to be an HTML validation/sanitizing engine and
>>>> &eacute; is the properly encoded HTML version of that particular character.
>>>> Changing this encoding behavior can probably be done - but I believe there
>>>> have been known XSS attacks in the past that have depended on the fact that
>>>> some international letters are interpreted differently depending on locale
>>>> and region. As a result, I believe it's safer to rely on the HTML entity
>>>> encoded version if possible.
>>>>
>>>> Obviously if you're not placing the data directly into an HTML context,
>>>> that conversion might have side effects...
>>>>
>>>> -Jason
>>>>
>>>> On Thu, Aug 11, 2011 at 7:12 AM, Jobus <jobuss at gmail.com> wrote:
>>>>
>>>>> Hi Jason,
>>>>>
>>>>> I am facing an issue related to AntiSamy. In my application, users can
>>>>> enter input containing French characters, but AntiSamy encodes it and does
>>>>> not give the original input string back.
>>>>>
>>>>> e.g.:
>>>>> My input string is
>>>>>
>>>>> Pour accéder au journal de test
>>>>>
>>>>> and the output given by getCleanHTML is
>>>>>
>>>>> Pour acc&eacute;der au journal de test
>>>>>
>>>>> How can I solve this issue? I need to get back exactly the same input
>>>>> string I provided. Mine is a multilingual application.
>>>>>
>>>>> I would really appreciate it if you could help me with this issue.
>>>>>
>>>>> thanks
>>>>> Jobu
>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.owasp.org/pipermail/owasp-antisamy/attachments/20110818/09d4316d/attachment-0002.html 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.owasp.org/pipermail/owasp-antisamy/attachments/20110818/09d4316d/attachment-0003.html 

