[Esapi-php] getHexForNonAlphanumeric -- redux

Boberski, Michael [USA] boberski_michael at bah.com
Wed Nov 4 16:59:43 EST 2009


I've been poking away at this for a good portion of the day. Attempting to retrofit Unicode support really makes the codecs much more complicated than they need to be, and I'm not sure the approach taken in the HTML entities codec will translate to the other codecs (I don't think it's OK, e.g., for the Windows codec to be passed a string and hand back multibyte output). Might there be any willingness to compromise and defer Unicode support until after the initial release? I'm a little concerned we're getting bogged down with the codecs. Thoughts?

Mike B.


________________________________
From: Boberski, Michael [USA]
Sent: Wednesday, November 04, 2009 3:18 PM
To: 'ESAPI for PHP development list'
Subject: RE: [Esapi-php] getHexForNonAlphanumeric -- redux

FYI, I checked in updated versions of the Windows, Unix, and Percent codec encode methods that use the multibyte encoding approach from the HTML entities codec, in case it helps. The Percent codec's encode works with this approach; the Windows and Unix ones don't. I'm not sure whether the fix needs to be in Codec.php or in the test harness, to be explicit about Unicode vs. single-byte strings...

Mike B.


________________________________
From: Boberski, Michael [USA]
Sent: Wednesday, November 04, 2009 1:48 PM
To: 'ESAPI for PHP development list'
Subject: RE: [Esapi-php] getHexForNonAlphanumeric -- redux

Linden, OK, I borrowed some of your entity encoding code for the Windows and Unix encode functions, but now I'm getting some odd results, e.g.:


Fail: C:\Code\ESAPI-PHP-mike.boberski\test/codecs/WindowsCodecTest.php -> WindowsCodecTest -> testEncode -> Equal expectation fails at character 1 with [^"^ ^&^ dir^/s^ c^:] and [^"^ ^&^ dir^/s^ c^:] at [C:\Code\ESAPI-PHP-mike.boberski\test\codecs\WindowsCodecTest.php line 47]

Those strings look the same to me... Help!

Mike B.
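One common reason two strings print identically yet fail an equality check, and one consistent with the concern elsewhere in this thread about the Windows codec handing back multibyte output, is that one side is UTF-32 encoded: the extra NUL bytes are invisible in most terminals. The sketch below simulates that; it does not use the real WindowsCodec, and the choice of UTF-32LE for the "actual" value is an assumption made purely for illustration (it requires the mbstring extension).

```php
<?php
// Hypothetical illustration, not the real WindowsCodec: the "actual" value
// simulates codec output that came back UTF-32 encoded instead of UTF-8.
// Requires the mbstring extension.
$expected = '^"^ ^&^ dir^/s^ c^:';
$actual   = mb_convert_encoding($expected, 'UTF-32LE', 'UTF-8');

echo "expected: [$expected]\n";
echo "actual:   [$actual]\n";            // NUL bytes are usually not rendered
var_dump($expected === $actual);        // bool(false)
echo strlen($expected), ' vs ', strlen($actual), "\n";
echo bin2hex(substr($actual, 0, 8)), "\n"; // 5e00000022000000 reveals the NULs

// One possible fix: normalize the codec output back to the encoding the
// test expects before comparing (or make the codec return that encoding).
$normalized = mb_convert_encoding($actual, 'UTF-8', 'UTF-32LE');
var_dump($expected === $normalized);    // bool(true)
```

Dumping both sides with bin2hex() in the failing test is a quick way to check whether this is what is actually happening.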


________________________________
From: esapi-php-bounces at lists.owasp.org [mailto:esapi-php-bounces at lists.owasp.org] On Behalf Of Linden Darling
Sent: Tuesday, November 03, 2009 7:20 PM
To: ESAPI for PHP development list
Subject: Re: [Esapi-php] getHexForNonAlphanumeric -- redux

@All: Sorry, I got hijacked over the weekend and early this week, so I didn't get the ESAPI quality time I had planned and didn't commit the decode methods as I'd promised! I'm doing what I can where I can, though, and hopefully will commit that stuff in the next few days.

@Mike: PHP's ord function doesn't handle multibyte strings. My version of getHexForNonAlphanumeric requires being passed a UTF-32 encoded string; it then uses forceToSingleChar to ensure that only the first (possibly multibyte) character is used for the conversion. Normalizing strings to UTF-32 is the "Ben Methodology" for dealing with Unicode in PHP (until PHP 6 comes). Java characters are already Unicode, so the Java version handles Unicode characters appropriately.
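To make the ord() point concrete, here is a minimal sketch, not the actual ESAPI-PHP code, of why ord() is not enough for multibyte input and how normalizing to UTF-32 recovers a character's code point. The helper name codePointOfFirstChar is hypothetical, and the sketch assumes UTF-8 input and the mbstring extension.

```php
<?php
// Hypothetical helper, not from the ESAPI-PHP codebase: returns the Unicode
// code point of the first character of a UTF-8 string by normalizing it to
// UTF-32BE (exactly 4 bytes per character) and reading the first 4-byte unit.
// Requires the mbstring extension.
function codePointOfFirstChar(string $utf8): ?int
{
    if ($utf8 === '') {
        return null;
    }
    $utf32 = mb_convert_encoding($utf8, 'UTF-32BE', 'UTF-8');
    // unpack('N', ...) reads an unsigned 32-bit big-endian integer.
    return unpack('N', substr($utf32, 0, 4))[1];
}

$samples = ['A', "\u{E9}", "\u{20AC}"]; // "A", e-acute, euro sign

foreach ($samples as $c) {
    // ord() only ever looks at the first byte of the UTF-8 sequence.
    printf("%s: ord() = %d, code point = U+%04X\n",
        bin2hex($c), ord($c), codePointOfFirstChar($c));
}
```

For "A" both values agree (65), but for the e-acute ord() returns 195 (0xC3, the first UTF-8 byte) while the UTF-32 route yields U+00E9.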

From: esapi-php-bounces at lists.owasp.org [mailto:esapi-php-bounces at lists.owasp.org] On Behalf Of Boberski, Michael [USA]
Sent: Wednesday, 4 November 2009 6:58 AM
To: ESAPI for PHP development list
Subject: [Esapi-php] getHexForNonAlphanumeric -- redux

Hi codec guys,

Can you all explain to me again (sorry for being dense) why this:


$ordinalValue = ord($c);
if ( $ordinalValue > 255 ) return null;
return self::$hex[$ordinalValue];

isn't sufficient for our purposes? I'm taking a stab at the percent codec, and the above works, whereas the more involved mb_ approach doesn't: it simply doesn't match on the characters in the percent codec encode test case ('"; ls / > /tmp/foo; # ').
Best,

Mike B.
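For ASCII-only input such as the percent codec test case, the byte-at-a-time approach in the question does work, which may explain why it passes. The sketch below assumes that "non-alphanumeric" means anything outside ASCII A-Z, a-z, 0-9 and ignores any immune-character handling the real codec has; percentEncodeBytes is a hypothetical name, not the ESAPI-PHP API. It also shows the limitation Linden describes: ord() always returns a value in 0..255 for a single byte, so the `> 255` guard can never fire, and a multibyte character is encoded byte by byte rather than as one character.

```php
<?php
// Hypothetical sketch of the byte-oriented approach from the question;
// percentEncodeBytes() is an illustrative name, not the ESAPI-PHP API.
// Percent-encodes every byte that is not an ASCII letter or digit.
function percentEncodeBytes(string $input): string
{
    $out = '';
    $len = strlen($input);
    for ($i = 0; $i < $len; $i++) {
        $c = $input[$i];
        // ord() always returns 0..255: it sees one byte, never a code point.
        $ordinalValue = ord($c);
        if (preg_match('/^[A-Za-z0-9]$/', $c)) {
            $out .= $c;                               // alphanumerics pass through
        } else {
            $out .= sprintf('%%%02X', $ordinalValue); // e.g. '"' becomes %22
        }
    }
    return $out;
}

echo percentEncodeBytes('"; ls / > /tmp/foo; # '), "\n";
// %22%3B%20ls%20%2F%20%3E%20%2Ftmp%2Ffoo%3B%20%23%20

// A non-ASCII character is split into its UTF-8 bytes.
echo percentEncodeBytes("\u{E9}"), "\n"; // %C3%A9
```

Whether byte-wise output like %C3%A9 is acceptable depends on what each codec is meant to produce; for the Windows and Unix codecs it may not be.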
