[Esapi-php] getHexForNonAlphanumeric -- redux
mike.boberski at gmail.com
Wed Nov 4 20:39:55 EST 2009
Let's hold off on deviating on that for right now... I'll enter an issue so we don't forget about it...
-----Original Message-----
From: "Linden Darling" <Linden.Darling at jds.net.au>
Date: Thu, 5 Nov 2009 12:28:23
To: ESAPI for PHP development list<esapi-php at lists.owasp.org>
Subject: Re: [Esapi-php] getHexForNonAlphanumeric -- redux
Codec normalizes to UTF-32 so that all string comparisons occur at this normalized level. After decoding/encoding, the string should be converted back to the encoding in which it was originally supplied.
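A minimal sketch of that normalize-then-restore round trip, assuming the mbstring extension is available. The helper names normalizeToUtf32 and restoreEncoding are hypothetical and are not taken from Codec.php; UTF-32BE is used here only to keep the byte order unambiguous.

```php
<?php
// Hypothetical helpers for illustration; the real Codec.php may differ.

/** Convert $input from $fromEncoding to UTF-32 (big-endian, no BOM). */
function normalizeToUtf32(string $input, string $fromEncoding): string
{
    return mb_convert_encoding($input, 'UTF-32BE', $fromEncoding);
}

/** Convert a UTF-32BE string back to the encoding it arrived in. */
function restoreEncoding(string $utf32, string $toEncoding): string
{
    return mb_convert_encoding($utf32, $toEncoding, 'UTF-32BE');
}

$original   = "caf\xC3\xA9";                         // "café" as UTF-8: 5 bytes, 4 characters
$normalized = normalizeToUtf32($original, 'UTF-8');  // 4 bytes per character
$restored   = restoreEncoding($normalized, 'UTF-8'); // back to the caller's encoding
```

At the normalized level every character occupies exactly four bytes, so comparisons can work on fixed-width units regardless of the input encoding.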
I've run into problems along the way with mb_detect_encoding detecting the encoding of single characters, hence the fleshed-out logic around encoding detection.
There's also an issue if we are passed a UTF-32 encoded string in the first place: currently the code won't detect that, and it will double-UTF-32 encode :o I have been considering letting the user (of ESAPI) state the initial encoding to avoid such issues...
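To illustrate the double-encoding problem and the "let the caller state the encoding" idea, here is a rough sketch assuming mbstring. The function toUtf32Declared and its parameter are hypothetical, not part of ESAPI.

```php
<?php
// Hypothetical function: the caller declares the input encoding, so no detection is needed.
function toUtf32Declared(string $input, string $declaredEncoding): string
{
    if (strcasecmp($declaredEncoding, 'UTF-32BE') === 0) {
        return $input; // already normalized, leave untouched
    }
    return mb_convert_encoding($input, 'UTF-32BE', $declaredEncoding);
}

$alreadyUtf32 = "\x00\x00\x00\x41"; // "A" as UTF-32BE (4 bytes)

// Wrongly assuming the input is UTF-8 treats each byte as its own character,
// so the single "A" becomes four UTF-32 code units.
$doubleEncoded = mb_convert_encoding($alreadyUtf32, 'UTF-32BE', 'UTF-8');

// With the encoding declared, the string passes through unchanged.
$declared = toUtf32Declared($alreadyUtf32, 'UTF-32BE');
```

The trade-off is that the caller takes on responsibility for knowing the encoding, in exchange for avoiding unreliable detection on short inputs.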
- Linden
________________________________
From: esapi-php-bounces at lists.owasp.org on behalf of Boberski, Michael [USA]
Sent: Thu 5/11/2009 8:59 AM
To: ESAPI for PHP development list
Subject: Re: [Esapi-php] getHexForNonAlphanumeric -- redux
I've been poking away at this for a good portion of the day... Attempting to retrofit Unicode support really makes the codecs much more complicated than they need to be... I'm not sure the approach taken will translate from the HTML entities codec to the other codecs (I think it is not OK, e.g. in the Windows codec, to pass in a string and receive multibyte back)... Might there be any willingness to compromise and defer supporting Unicode for the initial release? I'm a little concerned we're getting bogged down with the codecs. Thoughts?
Mike B.
________________________________
From: Boberski, Michael [USA]
Sent: Wednesday, November 04, 2009 3:18 PM
To: 'ESAPI for PHP development list'
Subject: RE: [Esapi-php] getHexForNonAlphanumeric -- redux
FYI, I checked in updated versions of the Windows, Unix, and Percent codec encode methods that use the multibyte encode approach from the HTML entities codec, in case it helps. The Percent codec's encode works with this approach; the Windows and Unix ones don't. I'm not sure whether the fix needs to be in Codec.php or in the test harness, to be explicit about Unicode/single-byte...
Mike B.
________________________________
From: Boberski, Michael [USA]
Sent: Wednesday, November 04, 2009 1:48 PM
To: 'ESAPI for PHP development list'
Subject: RE: [Esapi-php] getHexForNonAlphanumeric -- redux
Linden, OK, I borrowed some of your encode entity code for the Windows and Unix encode functions, but now I'm getting some odd results, e.g.:
Fail: C:\Code\ESAPI-PHP-mike.boberski\test/codecs/WindowsCodecTest.php -> WindowsCodecTest -> testEncode -> Equal expectation fails at character 1 with [^"^ ^&^ dir^/s^ c^:] and [^"^ ^&^ dir^/s^ c^:] at [C:\Code\ESAPI-PHP-mike.boberski\test\codecs\WindowsCodecTest.php line 47]
Those strings look the same to me... Help!
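The thread does not settle the cause here. As a general debugging sketch, dumping byte lengths and hex shows whether two strings that print identically actually differ at the byte level, for example when one side is UTF-32 and the other single-byte. The helper describeBytes and the sample values are hypothetical, and the UTF-32 mismatch is only an assumed scenario.

```php
<?php
// Hypothetical helper: show the length and raw bytes of a string.
function describeBytes(string $s): string
{
    return strlen($s) . ' bytes: ' . bin2hex($s);
}

$expectedText = '^"';                                         // single-byte string
$actualText   = mb_convert_encoding('^"', 'UTF-32BE', 'UTF-8'); // same text, UTF-32BE

echo describeBytes($expectedText), "\n"; // 2 bytes: 5e22
echo describeBytes($actualText), "\n";   // 8 bytes: 0000005e00000022
```

Many terminals do not render the NUL padding bytes, which would be one way for a failure message to show two apparently equal strings.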
Mike B.
________________________________
From: esapi-php-bounces at lists.owasp.org [mailto:esapi-php-bounces at lists.owasp.org] On Behalf Of Linden Darling
Sent: Tuesday, November 03, 2009 7:20 PM
To: ESAPI for PHP development list
Subject: Re: [Esapi-php] getHexForNonAlphanumeric -- redux
@All: Sorry, I got hijacked over the weekend and early this week, so I didn't get the ESAPI quality time I had planned and didn't commit the decode methods as I'd promised! I am doing what I can where I can, though, and hopefully will commit that stuff in the next few days.
@Mike: PHP's ord function doesn't handle multibyte strings. My version of getHexForNonAlphanumeric requires that it be passed a UTF-32 encoded string; it then uses forceToSingleChar to ensure that only the first (multibyte) character is used for the conversion. Normalizing the encoding of strings to UTF-32 is the "Ben Methodology" for dealing with Unicode in PHP (until 6.0 comes). Java characters are already Unicode, so the Java equivalent of ord converts Unicode appropriately.
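A short sketch of the difference, assuming mbstring. ord() only ever looks at the first byte, whereas reading four bytes of a UTF-32BE string yields the full code point. The function codePointOfFirstChar is a hypothetical stand-in and is not the forceToSingleChar or getHexForNonAlphanumeric code from the repository.

```php
<?php
$euro = "\xE2\x82\xAC";        // U+20AC (euro sign) as UTF-8, 3 bytes

// ord() returns the value of the first byte only: 0xE2 = 226.
$firstByte = ord($euro);

/** Hypothetical helper: code point of the first character of a UTF-32BE string. */
function codePointOfFirstChar(string $utf32be): ?int
{
    if (strlen($utf32be) < 4) {
        return null;           // not even one complete UTF-32 unit
    }
    $parts = unpack('N', substr($utf32be, 0, 4)); // 'N' = unsigned 32-bit big-endian
    return $parts[1];
}

$euroUtf32 = mb_convert_encoding($euro, 'UTF-32BE', 'UTF-8');
$codePoint = codePointOfFirstChar($euroUtf32);    // 0x20AC = 8364
```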
From: esapi-php-bounces at lists.owasp.org [mailto:esapi-php-bounces at lists.owasp.org] On Behalf Of Boberski, Michael [USA]
Sent: Wednesday, 4 November 2009 6:58 AM
To: ESAPI for PHP development list
Subject: [Esapi-php] getHexForNonAlphanumeric -- redux
Hi codec guys,
Can you all explain to me again (sorry for being dense) why this:
$ordinalValue = ord($c);
if ( $ordinalValue > 255 ) return null;
return self::$hex[$ordinalValue];
Isn't sufficient for our purposes? I'm taking a stab at the percent codec, and the above works, the more involved mb_ approach doesn't, it simply doesn't match on the characters for the percent codec encode test case ('"; ls / > /tmp/foo; # ').
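For reference, a self-contained sketch of the single-byte approach in the snippet above. The lookup table layout (null for alphanumerics, hex for everything else) is an assumption about what self::$hex holds, and the function names are hypothetical. Note that ord() always returns a value from 0 to 255, so a `> 255` check can never fire; for a multibyte character only the first byte is ever examined.

```php
<?php
/** Assumed table layout: null for [A-Za-z0-9], lowercase hex for every other byte. */
function buildHexTable(): array
{
    $table = [];
    for ($i = 0; $i <= 255; $i++) {
        $isAlnum = ($i >= 48 && $i <= 57)    // 0-9
                || ($i >= 65 && $i <= 90)    // A-Z
                || ($i >= 97 && $i <= 122);  // a-z
        $table[$i] = $isAlnum ? null : dechex($i);
    }
    return $table;
}

/** Hypothetical single-byte lookup: only the first byte of $c is considered. */
function hexForSingleByte(string $c): ?string
{
    static $hex = null;
    if ($hex === null) {
        $hex = buildHexTable();
    }
    if ($c === '') {
        return null;
    }
    return $hex[ord($c)];
}
```

With this sketch, `hexForSingleByte(';')` gives `'3b'` and `hexForSingleByte('a')` gives `null`, which is enough for ASCII-only test strings but not for multibyte input.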
Best,
Mike B.