[Esapi-user] [Esapi-dev] Localization and InputValidation

Rob Spremulli rob.spremulli+esapi at gmail.com
Wed Jan 27 10:48:43 EST 2010

I did a little more research on the side. From the Pattern class
javadocs, interesting parts bolded:

>  Unicode support
> This class follows Unicode Technical Report #18: Unicode Regular Expression
> Guidelines, implementing its second level of support though with a slightly
> different concrete syntax.
> ...
> Unicode blocks and categories are written with the \p and \P constructs as
> in Perl. \p{prop} matches if the input has the property prop, while \P{prop}
> does not match if the input has that property. *Blocks are specified with
> the prefix In, as in InMongolian*. Categories may be specified with the
> optional prefix Is: Both \p{L} and \p{IsL} denote the category of Unicode
> letters. Blocks and categories can be used both inside and outside of a
> character class.
> The supported blocks and categories are those of The Unicode Standard,
> Version 3.0. The block names are those defined in Chapter 14 and in the file
> Blocks-3.txt <http://www.unicode.org/Public/3.0-Update/Blocks-3.txt> of
> the Unicode Character Database except that the spaces are removed; "Basic
> Latin", for example, becomes "BasicLatin". The category names are those
> defined in table 4-5 of the Standard (p. 88), both normative and
> informative.
I wrote a simple little test program, and U+30B0 seems to pass the regex
\\p{InKatakana}, both with and without ESAPI.
It's a low-knowledge area for me, so if anyone with more experience can
chime in, I'd be grateful.
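
A minimal sketch of that kind of check, using only java.util.regex (no
ESAPI). The class name and the matches() helper are illustrative and are not
the original test program. It also compares \p{L} and \p{Alpha}, since the
same Javadoc lists \p{Alpha} under the POSIX classes that are US-ASCII only
by default:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative, not from ESAPI.
final class KatakanaRegexDemo {
    // One or more characters from the Unicode Katakana block.
    static final Pattern KATAKANA = Pattern.compile("\\p{InKatakana}+");
    // One or more characters in the Unicode "letter" category, any script.
    static final Pattern ANY_LETTER = Pattern.compile("\\p{L}+");
    // POSIX class: matches only US-ASCII letters unless other flags are set.
    static final Pattern POSIX_ALPHA = Pattern.compile("\\p{Alpha}+");

    // True only if the whole input matches the pattern (whitelist style).
    static boolean matches(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        String gu = "\u30b0";       // the single character U+30B0
        String escaped = "\\u30b0"; // six literal characters: backslash, u, 3, 0, b, 0

        System.out.println("InKatakana vs character:    " + matches(KATAKANA, gu));
        System.out.println("L vs character:             " + matches(ANY_LETTER, gu));
        System.out.println("Alpha vs character:         " + matches(POSIX_ALPHA, gu));
        System.out.println("InKatakana vs escaped text: " + matches(KATAKANA, escaped));
    }
}
```

If I read the Javadoc correctly, the first two checks accept the character
and the last two do not, so a whitelist for this input would need a Unicode
block or category rather than \p{Alpha}. Note that the Katakana block also
contains a few non-letter characters (for example the middle dot, U+30FB), so
\p{InKatakana} is somewhat broader than "Katakana letters".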

2010/1/27 Calderon, Juan Carlos (GE, Corporate, consultant) <
juan.calderon at ge.com>

>  This is interesting. AFAIK, there are versions of Java in Japanese; what I
> mean is that the actual code is written in Japanese, not only the string
> variable values. So I wonder how regex patterns are built and how they would
> work on those versions, just as Jim is wondering, I guess. Maybe the same is
> true for Chinese, and in that case we can ask someone at the Hong Kong or
> China-Mainland chapters to shed more light on this topic.
> What do you think?
> Regards,
> *Juan C Calderon*
>  ------------------------------
> *From:* esapi-dev-bounces at lists.owasp.org [mailto:
> esapi-dev-bounces at lists.owasp.org] *On Behalf Of *Jim Manico
> *Sent:* Wednesday, January 27, 2010 12:16 a.m.
> *To:* rob.spremulli+esapi at gmail.com
> *Cc:* ESAPI-Developers; esapi-user at lists.owasp.org
> *Subject:* Re: [Esapi-dev] [Esapi-user] Localization and InputValidation
>   I cannot answer this easily. Does anyone else on the dev team have
> experience with i18n and RegEx's inside of ESAPI?
> - Jim
>  Hi guys, a question has arisen re: input validation
> I should prefix this by stating we are on 1.4, not 2.0.
> Let's say I want to pass "グ" in my input.  For those of you who can't read
> that, it's the Japanese Katakana letter GU, Unicode code point U+30B0
>  http://www.fileformat.info/info/unicode/char/30b0/index.htm
> I want to allow this in my input, so I need to create a regex that will
> permit it.  What I'm not sure about is:
> 1) what canonicalize is going to do to that string, and
> 2) if there's a locale-aware way of identifying characters in a regex.
> I can see this potentially showing up as
> \u30b0, where I would need to permit \ characters,
> \u30b0, where the slash is encoded, though I doubt this.
> The latter can lead to two possibilities:
> 1) my regex would need to allow a range of Unicode values
> 2) a character class (\p{Alpha} and such) would seamlessly match 'letters'
> of any language.
> The confusion on my end is due to lack of knowledge on characters outside
> the typical US character set.  Can anyone shed some light on this issue, as
> to the expected canonicalization and recommended whitelist regex?
> _______________________________________________
> Esapi-user mailing list
> Esapi-user at lists.owasp.org
> https://lists.owasp.org/mailman/listinfo/esapi-user
> --
> Jim Manico
> OWASP Podcast Host/Producer
> OWASP ESAPI Project Manager
> http://www.manico.net