[Owasp-esapi-c++] Codec.{h|cpp} Checkin (if there's a crunching sound....)
Kevin W. Wall
kevin.w.wall at gmail.com
Thu Aug 4 23:06:27 EDT 2011
On Thu, Aug 4, 2011 at 3:29 PM, Jeffrey Walton <noloader at gmail.com> wrote:
> On Thu, Aug 4, 2011 at 7:24 AM, David Anderson
> <david.anderson at aspectsecurity.com> wrote:
>> It should be a fixed length vector of strings, based on the Java
>> implementation. And, in this case, I believe that ASCII strings are
>> appropriate.
>>
>> I'm thinking about general Unicode support in this project though. Is
>> Unicode really used much in the C++ world?
> Thanks David.
I think that the answer to this is... probably not too much in countries
whose alphabets can be represented in 8 bits, but Unicode is probably
used a lot in countries where they can't be. There are a lot of such places.
OTOH, as long as it won't involve a complete redesign, I don't have a
problem if we punt and defer Unicode until some later time.
That said, I was under the impression that Microsoft uses Unicode
quite often. I recall running early versions of 'strings(1)' from AT&T
SVR4 Unix on Windows binaries and having it not work correctly. I then
learned that the binaries (DLLs, I think) were using Unicode (or at least
_some_ 16-bit wide character representation), and so I had to download and
compile a special version of 'strings' that grokked Windows, or more
specifically Unicode. (Nowadays, I think all versions do.) Not all their
strings were like this... it may have been only the error messages that they
tried to internationalize. SVR4 was a l-o-n-g time ago. It was probably '98
or '99 when I tried this, and my memory is a little fuzzy as to what I ate
for lunch today, so I may not have all (or any) of the details right.
But like I said, we can postpone it unless one of you thinks adding it
later is going to cause a redesign. If so, then I'd prefer to do it now.
> One last question: are the strings padded. For example:
>
> char ch = 0x0f;
>
> ostringstream os;
> os << ch;
>
> string s(os.str());
>
> Should `s` be 'f' or '0f'?
Well, actually for the example that you show here, I would imagine
that it would be interpreted as an ASCII SI (shift in) character, at least
if you did:
    cout << s;
If you are just referring to the bit pattern, are not 'f' and '0f' equivalent???
But the question that you bring up I think is one about canonicalization,
right?
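To make sure we are talking about the same thing, here is a rough sketch of
the two readings I can think of. The helper names (as_raw_char, as_hex_text)
are just ones I made up for illustration; they are not part of ESAPI or the
Java Codec. The first streams the char itself, the second formats its numeric
value as hex text, with or without zero padding:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: streams the char itself. The result is a
// one-byte string whose single byte has the same value as ch.
std::string as_raw_char(char ch) {
    std::ostringstream os;
    os << ch;
    return os.str();
}

// Hypothetical helper: formats the numeric value of ch as hex text,
// optionally zero-padded to two digits.
std::string as_hex_text(char ch, bool pad) {
    std::ostringstream os;
    // Go through unsigned char so bytes >= 0x80 do not sign-extend.
    const unsigned value = static_cast<unsigned char>(ch);
    os << std::hex;
    if (pad) {
        os << std::setw(2) << std::setfill('0');
    }
    os << value;
    return os.str();
}
```

With ch = 0x0f, as_raw_char gives a one-character string holding the SI
control character, while as_hex_text gives the text "f" unpadded and "0f"
padded. If the Codec tables are meant to hold hex text, then padding is a
real choice we have to make; if they hold raw characters, it isn't.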
Personally, I think it needs to act 'sane' in the following sense that
this should work:
    char ch = 0x0f;
    ostringstream os;
    os << ch;
    string s(os.str());
    char ch2 = s[0]; // Or s.at(0) if you prefer
    assert( ch == ch2 );
I think that follows from the principle of least surprise. So whatever
canonicalization we choose should make this true. I think if you were truly
expecting 'f' or '0f' to be output (for example, if one did the cout << s;
thing), then the assertion would end up failing. And like I alluded to, if
you were only referring to the bit pattern internal to memory, 'f' and '0f'
are the same for 8-bit quantities, so in that case, I don't see where you are
going with this and you will have to explain further.
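If we want to keep ourselves honest about that round-trip property, something
along these lines could go into a unit test. This is only a sketch; round_trip
and all_chars_round_trip are names I invented here, not anything in the
current code base. It checks the property for every possible char value rather
than just 0x0f:

```cpp
#include <climits>
#include <sstream>
#include <string>

// Hypothetical helper: writes one char through an ostringstream and
// reads it back out of the resulting string.
char round_trip(char ch) {
    std::ostringstream os;
    os << ch;
    const std::string s(os.str());
    return s.at(0);
}

// Returns true if every possible char value, including NUL and the
// other control characters, survives the trip unchanged.
bool all_chars_round_trip() {
    for (int i = CHAR_MIN; i <= CHAR_MAX; ++i) {
        const char ch = static_cast<char>(i);
        if (round_trip(ch) != ch) {
            return false;
        }
    }
    return true;
}
```

With a plain ostringstream this holds, since operator<< on a char just inserts
that one character. The point of the test would be to catch it if some
canonicalization we add later starts turning the char into hex text instead.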
-kevin
--
Blog: http://off-the-wall-security.blogspot.com/
"The most likely way for the world to be destroyed, most experts agree,
is by accident. That's where we come in; we're computer professionals.
We *cause* accidents." -- Nathaniel Borenstein