[Esapi-user] [Esapi-dev] Localization and InputValidation

Rob Spremulli rob.spremulli+esapi at gmail.com
Thu Jan 28 10:29:26 EST 2010


A follow-up question which might simplify things: is there any known danger
in whitelisting Unicode characters above 0x7F?

2010/1/27 Rob Spremulli
<rob.spremulli+esapi at gmail.com>

> I did a little bit more research on the side; from the Pattern class
> javadocs, interesting parts bolded:
>
>>  Unicode support
>>
>> This class follows Unicode Technical Report #18: Unicode Regular
>> Expression Guidelines, implementing its second level of support though with
>> a slightly different concrete syntax.
>>
>> ...
>>
>> Unicode blocks and categories are written with the \p and \P constructs as
>> in Perl. \p{prop} matches if the input has the property prop, while \P{prop}
>> does not match if the input has that property. *Blocks are specified with
>> the prefix In, as in InMongolian*. Categories may be specified with the
>> optional prefix Is: Both \p{L} and \p{IsL} denote the category of Unicode
>> letters. Blocks and categories can be used both inside and outside of a
>> character class.
>>
>> The supported blocks and categories are those of The Unicode Standard,
>> Version 3.0. The block names are those defined in Chapter 14 and in the file
>> Blocks-3.txt <http://www.unicode.org/Public/3.0-Update/Blocks-3.txt> of
>> the Unicode Character Database except that the spaces are removed; "Basic
>> Latin", for example, becomes "BasicLatin". The category names are those
>> defined in table 4-5 of the Standard (p. 88), both normative and
>> informative.
>>
> I wrote a simple test program, and U+30B0 matches the regex
> \\p{InKatakana}, both with and without ESAPI canonicalization.
> This is an area I know little about, so if anyone with more experience can
> chime in, I'd be grateful.
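
For reference, a test along the lines described above could look like the
sketch below. It is not the original program: the class and method names are
made up for illustration, and it exercises only java.util.regex, not ESAPI
canonicalization.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of ESAPI.
class KatakanaBlockCheck {
    // One or more characters from the Unicode Katakana block.
    private static final Pattern KATAKANA_ONLY = Pattern.compile("\\p{InKatakana}+");

    // True only if the whole input consists of Katakana-block characters.
    static boolean isKatakanaOnly(String input) {
        return KATAKANA_ONLY.matcher(input).matches();
    }

    public static void main(String[] args) {
        // U+30B0, written as an escape so the source file encoding does not matter.
        String gu = "\u30b0";
        System.out.println(isKatakanaOnly(gu));    // prints "true"
        System.out.println(isKatakanaOnly("abc")); // prints "false"
    }
}
```

Whether the same result holds after ESAPI 1.4 canonicalization would still
need to be checked separately against the actual encoder.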
>
> 2010/1/27 Calderon, Juan Carlos (GE, Corporate, consultant) <
> juan.calderon at ge.com>
>
>  This is interesting. AFAIK, there are versions of Java in Japanese; what I
>> mean is that the actual code is written in Japanese, not only the string
>> variable values. So I wonder how regex patterns are built and how they would
>> work on those versions, I guess just as Jim is wondering. Maybe the same is
>> true for Chinese, and in that case we can ask someone at the Hong Kong or
>> China-Mainland chapters to shed some more light on this topic.
>>
>> What do you think?
>>
>> Regards,
>> *Juan C Calderon*
>>
>>
>>  ------------------------------
>> *From:* esapi-dev-bounces at lists.owasp.org [mailto:
>> esapi-dev-bounces at lists.owasp.org] *On Behalf Of *Jim Manico
>> *Sent:* Wednesday, January 27, 2010 12:16 a.m.
>> *To:* rob.spremulli+esapi at gmail.com
>> *Cc:* ESAPI-Developers; esapi-user at lists.owasp.org
>> *Subject:* Re: [Esapi-dev] [Esapi-user] Localization and InputValidation
>>
>>   I cannot answer this easily. Does anyone else on the dev team have
>> experience with i18n and RegEx's inside of ESAPI?
>>
>> - Jim
>>
>>  Hi guys, a question has arisen re: input validation
>>
>> I should prefix this by stating we are on 1.4, not 2.0.
>>
>> Let's say I want to pass "グ" in my input.  For those of you who can't read
>> that, it's a Japanese Katakana character with Unicode code point U+30B0:
>> http://www.fileformat.info/info/unicode/char/30b0/index.htm
>>
>> I want to allow this in my input, so I need to create a regex that will
>> permit it.  What I'm not sure about is:
>> 1) what canonicalize is going to do to that string, and
>> 2) if there's a locale-aware way of identifying characters in a regex.
>>
>> I can see this potentially showing up as
>> \u30b0, where I would need to permit \ characters,
>> \u30b0, where the slash is encoded, though I doubt this.
>>
>> The latter can lead to two possibilities:
>> 1) my regex would need to allow a range of Unicode values
>> 2) a character class (\p{Alpha} and such) would seamlessly match 'letters'
>> of any language.
>>
>> The confusion on my end is due to lack of knowledge on characters outside
>> the typical US character set.  Can anyone shed some light on this issue, as
>> to the expected canonicalization and recommended whitelist regex?
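
On possibility 2, as far as I know the POSIX-style classes such as \p{Alpha}
are US-ASCII only in java.util.regex by default, so they would not match グ,
while the Unicode category \p{L} would. A minimal sketch to check this (the
class and method names are hypothetical, and ESAPI canonicalization is not
involved):

```java
import java.util.regex.Pattern;

// Hypothetical comparison of two candidate whitelist patterns.
class LetterClassCheck {
    // POSIX class: by default covers only the ASCII letters a-z and A-Z.
    private static final Pattern POSIX_ALPHA = Pattern.compile("\\p{Alpha}+");
    // Unicode general category L: letters of any script.
    private static final Pattern UNICODE_LETTER = Pattern.compile("\\p{L}+");

    static boolean isPosixAlpha(String s) {
        return POSIX_ALPHA.matcher(s).matches();
    }

    static boolean isUnicodeLetters(String s) {
        return UNICODE_LETTER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String gu = "\u30b0"; // the Katakana character from the question
        System.out.println(isPosixAlpha(gu));      // prints "false"
        System.out.println(isUnicodeLetters(gu));  // prints "true"
        System.out.println(isPosixAlpha("abc"));   // prints "true"
        System.out.println(isUnicodeLetters("a1")); // prints "false", digits are not letters
    }
}
```

If that holds in your environment, a whitelist built on \p{L} or on specific
blocks such as \p{InKatakana} is narrower than allowing everything above 0x7F,
which may be the safer starting point.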
>>
>>
>>
>> _______________________________________________
>> Esapi-user mailing list
>> Esapi-user at lists.owasp.org
>> https://lists.owasp.org/mailman/listinfo/esapi-user
>>
>>
>>
>> --
>> Jim Manico
>> OWASP Podcast Host/Producer
>> OWASP ESAPI Project Manager
>> http://www.manico.net
>>
>>
>