[Esapi-user] [Esapi-dev] Localization and InputValidation

chrisisbeef at gmail.com
Wed Jan 27 01:46:11 EST 2010

As for using Unicode in regex, this is straight from the JDK 1.4.2 API docs for Pattern:

\0n      The character with octal value 0n (0 <= n <= 7)
\0nn     The character with octal value 0nn (0 <= n <= 7)
\0mnn    The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh     The character with hexadecimal value 0xhh
\uhhhh   The character with hexadecimal value 0xhhhh
\t       The tab character ('\u0009')
\n       The newline (line feed) character ('\u000A')
\r       The carriage-return character ('\u000D')
\f       The form-feed character ('\u000C')
\a       The alert (bell) character ('\u0007')
\e       The escape character ('\u001B')

So in theory you should be able to create a regex that allows a range of Unicode characters (from whatever language you need) by referencing their code points directly:

For Example
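
A minimal sketch of such a pattern in plain java.util.regex, assuming the range wanted is the Katakana block (U+30A0 through U+30FF). The class and constant names are made up for illustration and are not part of ESAPI. It also compares the explicit range with the \p{InKatakana} block property, the \p{L} letter category, and the POSIX \p{Alpha} class, which by default covers ASCII letters only:

```java
import java.util.regex.Pattern;

// Hypothetical illustration class; nothing here is ESAPI API.
public class KatakanaRegexSketch {

    // Explicit code point range for the Katakana block (U+30A0 to U+30FF).
    // The doubled backslash passes the escape through to the regex engine.
    public static final Pattern KATAKANA_RANGE = Pattern.compile("[\\u30A0-\\u30FF]+");

    // The same block, named through the Unicode block property.
    public static final Pattern KATAKANA_BLOCK = Pattern.compile("\\p{InKatakana}+");

    // Any Unicode letter, whatever the script.
    public static final Pattern ANY_LETTER = Pattern.compile("\\p{L}+");

    // POSIX class: by default this matches ASCII letters only.
    public static final Pattern POSIX_ALPHA = Pattern.compile("\\p{Alpha}+");

    // True only if the whole input matches the pattern.
    public static boolean matches(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        String gu = "\u30B0"; // KATAKANA LETTER GU, the character from the question

        System.out.println("range:       " + matches(KATAKANA_RANGE, gu));
        System.out.println("InKatakana:  " + matches(KATAKANA_BLOCK, gu));
        System.out.println("\\p{L}:       " + matches(ANY_LETTER, gu));
        System.out.println("\\p{Alpha}:   " + matches(POSIX_ALPHA, gu));
    }
}
```

The first three patterns accept the character and \p{Alpha} rejects it, so a whitelist meant to cover non-Latin letters would need a range, a block, or a category such as \p{L} rather than the POSIX classes. What the input looks like after canonicalization is a separate question that this sketch does not address.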


Hope this helps ya out

On Tue, Jan 26, 2010 at 11:16 PM, Jim Manico <jim.manico at owasp.org> wrote:
  I cannot answer this easily. Does anyone else on the dev team have experience with i18n and regexes inside of ESAPI?

- Jim

  Hi guys, a question has arisen re: input validation.

  I should prefix this by stating we are on 1.4, not 2.0.

  Let's say I want to pass "グ" in my input. For those of you who can't read that, it's a Japanese Katakana character with Unicode value 30B0:
  http://www.fileformat.info/info/unicode/char/30b0/index.htm

  I want to allow this in my input, so I need to create a regex that will permit it. What I'm not sure about is:
  1) what canonicalize is going to do to that string, and
  2) if there's a locale-aware way of identifying characters in a regex.

  I can see this potentially showing up as
  \u30b0, where I would need to permit \ characters,
  \u30b0, where the slash is encoded, though I doubt this.
  グ

  The latter can lead to two possibilities:
  1) my regex would need to allow a range of Unicode values
  2) a character class (\p{Alpha} and such) would seamlessly match 'letters' of any language.

  The confusion on my end is due to lack of knowledge of characters outside the typical US character set. Can anyone shed some light on this issue, as to the expected canonicalization and recommended whitelist regex?

  _______________________________________________
  Esapi-user mailing list
  Esapi-user at lists.owasp.org
  https://lists.owasp.org/mailman/listinfo/esapi-user

-- 
Jim Manico
OWASP Podcast Host/Producer
OWASP ESAPI Project Manager
http://www.manico.net

Esapi-dev mailing list
Esapi-dev at lists.owasp.org

Chris Schmidt


Check out OWASP ESAPI for Java

OWASP ESAPI for JavaScript

Yet Another Developers Blog

Bio and Resume
