[Esapi-user] Localization and InputValidation

Rob Spremulli rob.spremulli+esapi at gmail.com
Tue Jan 26 15:18:49 EST 2010


Hi guys, a question has arisen re: input validation

I should preface this by stating that we are on ESAPI 1.4, not 2.0.

Let's say I want to pass "グ" in my input.  For those of you who can't read
that, it's the Japanese Katakana letter GU, Unicode code point U+30B0:
 http://www.fileformat.info/info/unicode/char/30b0/index.htm

I want to allow this in my input, so I need to create a regex that will
permit it.  What I'm not sure about is:
1) what canonicalize is going to do to that string, and
2) if there's a locale-aware way of identifying characters in a regex.

I can see this potentially showing up in one of three forms:
1) \u30b0, where I would need to permit \ characters,
2) \u30b0, where the backslash is itself encoded, though I doubt this, or
3) グ, the raw character.

The last form (the raw character) leads to two possibilities:
1) my regex would need to allow a range of Unicode values, or
2) a character class (\p{Alpha} and such) would seamlessly match 'letters'
of any language.
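
To make the two possibilities concrete, here is a minimal sketch using plain
java.util.regex. It is not ESAPI's own validator configuration, and the class
and field names are made up for illustration. It assumes the raw character
reaches the regex unchanged. Note that in java.util.regex the POSIX class
\p{Alpha} is US-ASCII only by default, so it is compared here against the
Unicode category \p{L} and the block \p{InKatakana}.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; none of these names come from ESAPI.
public class KatakanaWhitelist {
    // Possibility 1: an explicit range covering the Katakana block (U+30A0 to U+30FF).
    static final Pattern RANGE = Pattern.compile("^[\\u30A0-\\u30FF]+$");

    // Possibility 2a: POSIX class; ASCII letters only unless Unicode mode is enabled.
    static final Pattern POSIX_ALPHA = Pattern.compile("^\\p{Alpha}+$");

    // Possibility 2b: Unicode general category "Letter", covering every script.
    static final Pattern UNICODE_LETTER = Pattern.compile("^\\p{L}+$");

    // Possibility 2c: Unicode block name, limited to Katakana.
    static final Pattern KATAKANA_BLOCK = Pattern.compile("^\\p{InKatakana}+$");

    static boolean allowed(Pattern whitelist, String input) {
        return whitelist.matcher(input).matches();
    }

    public static void main(String[] args) {
        String gu = "\u30b0"; // the single character U+30B0
        System.out.println("range:       " + allowed(RANGE, gu));
        System.out.println("\\p{Alpha}:   " + allowed(POSIX_ALPHA, gu));
        System.out.println("\\p{L}:       " + allowed(UNICODE_LETTER, gu));
        System.out.println("InKatakana:  " + allowed(KATAKANA_BLOCK, gu));
    }
}
```

In this sketch only \p{Alpha} rejects the character. \p{L} accepts letters
from every script, which may be broader than a whitelist should be; a block
or an explicit range keeps the allowed set narrower.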

The confusion on my end is due to my lack of knowledge about characters
outside the typical US character set.  Can anyone shed some light on this
issue, as to the expected canonicalization and recommended whitelist regex?
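
One way to find out which form actually reaches the validator is to dump the
UTF-16 code units of the string at that point. The sketch below is plain
Java, says nothing about what ESAPI's canonicalize does, and uses a
hypothetical class name; it only shows how the raw character and the
six-character escape text differ.

```java
// Hypothetical diagnostic helper; not part of ESAPI.
public class InputFormDump {
    // Renders each UTF-16 code unit of the input as U+XXXX, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String rawCharacter = "\u30b0"; // one character, the Katakana letter itself
        String escapeText = "\\u30b0";  // six characters: backslash, u, 3, 0, b, 0

        System.out.println(describe(rawCharacter)); // U+30B0
        System.out.println(describe(escapeText));   // U+005C U+0075 U+0033 U+0030 U+0062 U+0030
    }
}
```

If the dump shows a single U+30B0, the regex only has to deal with the
character itself; if it shows the six-unit form, the input was escaped
somewhere upstream and a whitelist would have to admit the backslash.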