[Owasp-esapi-c++] Codec.{h|cpp} Checkin (if there's a crunching sound....)

Kevin W. Wall kevin.w.wall at gmail.com
Thu Aug 4 23:06:27 EDT 2011


On Thu, Aug 4, 2011 at 3:29 PM, Jeffrey Walton <noloader at gmail.com> wrote:
> On Thu, Aug 4, 2011 at 7:24 AM, David Anderson
> <david.anderson at aspectsecurity.com> wrote:
>> It should be a fixed length vector of strings, based on the Java
>> implementation.  And, in this case, I believe that ASCII strings are
>> appropriate.
>>
>> I'm thinking about general Unicode support in this project though.  Is
>> Unicode really used much in the C++ world?
> Thanks David.

I think that the answer to this is... probably not much in countries
whose alphabets can be represented in 8 bits, but Unicode is probably
used a lot in countries whose alphabets can't be, and there are a lot of
those.

OTOH, as long as it won't require a complete redesign later, I don't have
a problem if we punt and defer Unicode until some later time.

That said, I was under the impression that Microsoft uses Unicode
quite often. I recall running early versions of 'strings(1)' from AT&T
SVR4 Unix on Windows binaries and having it not work correctly. I then
learned that the binaries (DLLs, I think) were using Unicode (or at least
_some_ 16-bit wide character representation), so I had to download and
compile a special version of 'strings' that grokked Windows, or more
specifically, Unicode. (Nowadays, I think all versions do.) Not all of
their strings were like this; it may have been only the error messages
they tried to internationalize. SVR4 was a l-o-n-g time ago. It was
probably '98 or '99 when I tried this, and my memory is a little fuzzy as
to what I ate for lunch today, so I may not have all (or any) of the
details right.

But like I said, we can postpone it unless one of you thinks adding it
later is going to cause a redesign. If so, then I'd prefer to do it now.

> One last question: are the strings padded. For example:
>
>    char ch = 0x0f;
>
>    ostringstream os;
>    os << ch;
>
>    string s(os.str());
>
> Should `s` be 'f' or '0f'?

Well, actually, for the example that you show here, I would expect it
to be interpreted as an ASCII SI (shift in) character, at least if you
did:

      cout << s;

If you are just referring to the bit pattern, are not 'f' and '0f' equivalent???

But the question that you bring up I think is one about canonicalization,
right?

Personally, I think it needs to act 'sane', in the sense that the
following should work:

    #include <cassert>
    #include <sstream>
    #include <string>
    using namespace std;

    int main() {
        char ch = 0x0f;
        ostringstream os;
        os << ch;
        string s(os.str());
        char ch2 = s[0];    // Or   s.at(0)   if you prefer

        assert( ch == ch2 );
    }

I think that follows from the principle of least surprise, so whatever
canonicalization we choose should make this true. If you were truly
expecting 'f' or '0f' to be output (for example, if one did the
cout << s;  thing), then the assertion would end up failing, because
s[0] would be the character 'f' or '0' rather than the byte 0x0f. And as
I alluded to, if you were only referring to the bit pattern in memory,
'f' and '0f' are the same for 8-bit quantities, so in that case I don't
see where you are going with this and you will have to explain further.

-kevin
-- 
Blog: http://off-the-wall-security.blogspot.com/
"The most likely way for the world to be destroyed, most experts agree,
is by accident. That's where we come in; we're computer professionals.
We *cause* accidents."        -- Nathaniel Borenstein
