[OWASP-ESAPI] Encoder feedback

Jeff Williams jeff.williams at owasp.org
Tue Jul 8 23:44:53 EDT 2008


Hi Ivan,

Thanks for the excellent comments!  We're in the process of revamping the
Encoder a bit, so your timing is perfect.

Besides the extensions you proposed, what do you think of the API itself,
independent of the reference implementation?  Will developers be able to use
it effectively?

> 1. What is the escapeForJavaScript method supposed to do? The name
> implies it will convert my data into a form that can be safely passed
> into JavaScript (e.g. into a string), but in my tests the method
> encodes data using HTML entities. I was expecting my data to be
> encoded using the JavaScript encoding syntax (e.g. \uHHHH).

That's definitely a problem! Stefano Di Paola pointed this out just
yesterday. I believe the following rules are correct. Do you agree?

	Encode characters > 0x7f as \uhhhh
	Encode special characters <= 0x7f as \xhh
	Leave all other characters unencoded
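
A minimal sketch of the three rules above, assuming "special" means any
ASCII character that is not a letter or digit (the rules themselves do not
define it). The class and method names are hypothetical, and this is not the
ESAPI reference implementation.

```java
final class JsEncodeSketch {
    // Hypothetical helper illustrating the three rules; not ESAPI code.
    static String encodeForJavaScript(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c > 0x7f) {
                // Rule 1: anything above 0x7f becomes \uhhhh.
                out.append(String.format("\\u%04x", (int) c));
            } else if (!Character.isLetterOrDigit(c)) {
                // Rule 2 (assumption): "special" = ASCII non-alphanumeric.
                out.append(String.format("\\x%02x", (int) c));
            } else {
                // Rule 3: ASCII letters and digits pass through.
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: \x3c\x2fscript\x3e
        System.out.println(encodeForJavaScript("</script>"));
    }
}
```

Because the loop walks UTF-16 code units, a character outside the Basic
Multilingual Plane comes out as two \uhhhh escapes, which is how JavaScript
string literals represent it anyway.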

> 2. For cases where nested encoding is required, are you expecting
> users to manually chain method invocations, or is the plan to provide
> helper methods where commonly required (e.g. encode for JavaScript
> then encode for HTML attribute).

My inclination is to provide convenience methods for cases like this. I'm
assuming your example is for an event handler like onBlur, so I think we
should add a method like encodeForJavaScriptEventHandler().
Agree?
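
A rough sketch of what such a convenience method could look like, assuming
it simply chains a JavaScript encoder and an HTML attribute encoder in that
order. All three methods below are hypothetical stand-ins written for this
example, not the ESAPI implementation.

```java
final class EventHandlerEncodeSketch {
    static boolean isAsciiAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // Inner layer: JavaScript string escaping (\uhhhh above 0x7f, \xhh for
    // other non-alphanumerics).
    static String encodeForJavaScript(String input) {
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (c > 0x7f) out.append(String.format("\\u%04x", (int) c));
            else if (!isAsciiAlnum(c)) out.append(String.format("\\x%02x", (int) c));
            else out.append(c);
        }
        return out.toString();
    }

    // Outer layer: HTML attribute escaping using hex character references.
    static String encodeForHTMLAttribute(String input) {
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (isAsciiAlnum(c)) out.append(c);
            else out.append(String.format("&#x%x;", (int) c));
        }
        return out.toString();
    }

    // Encode in the reverse of the order in which the browser decodes:
    // the HTML parser runs first, then the JavaScript parser.
    static String encodeForJavaScriptEventHandler(String input) {
        return encodeForHTMLAttribute(encodeForJavaScript(input));
    }
}
```

With this sketch, the input a'b becomes a\x27b after the JavaScript step and
a&#x5c;x27b after the attribute step, so the backslash itself survives the
HTML parser and reaches the script engine intact.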

> 3. I don't understand this business of detecting double encodings:
> firstly because that should not be a concern of an encoding library
> and, secondly, because what you consider doubly encoded is entirely
> legitimate: what is a developer supposed to do when his program gets
> "&amp;nbsp;" in input, but you respond with an exception when asked to
> encode for HTML? How is ESAPI going to handle CMS applications?

First, I think canonicalization definitely belongs in an encoding library.
Second, I don't think that your example *is* legitimate.  Or maybe it's just
a bad idea. Either way, I think what "&amp;nbsp;" means is "&nbsp;" and
that's what the application should validate.  As long as we continue to
tolerate double encoding and multiple-encoding schemes we will never be able
to detect or prevent buried attacks.

> Maybe I just don't understand what the method is doing? For example,
> when I pass "&amp;" to it, I get "&amp;" back. I was expecting to get
> "&amp;amp;".

Do you have a use case for why you would want "&amp;amp;"? The vast majority
of applications do not need this.  Of course, there are a few applications,
like HTML tutorials, that need this kind of thing. In that case, you don't
want the canonicalized version, so I think we need to add a way to
*validate* the canonical version but *use* the raw version.
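
One way the "validate canonical, use raw" idea might look, as a sketch only.
The canonicalizer here is a toy that understands four HTML entities; the
class name, method names, and the regex-based check are all assumptions made
for illustration.

```java
final class CanonicalValidationSketch {
    // Toy canonicalizer: peels layers of a few HTML entities until the
    // string stops changing. A real one would handle many more encodings.
    static String canonicalize(String input) {
        String current = input;
        while (true) {
            String decoded = current
                    .replace("&lt;", "<")
                    .replace("&gt;", ">")
                    .replace("&quot;", "\"")
                    .replace("&amp;", "&"); // decoded last within each pass
            if (decoded.equals(current)) {
                return current; // fixed point reached
            }
            current = decoded;
        }
    }

    // Validate the fully decoded form, but hand back the untouched input.
    static String validateCanonicalUseRaw(String raw, String allowedRegex) {
        String canonical = canonicalize(raw);
        if (!canonical.matches(allowedRegex)) {
            throw new IllegalArgumentException("Input rejected after canonicalization");
        }
        return raw;
    }
}
```

An HTML tutorial could then accept "Tom &amp;amp; Jerry" unchanged under a
rule such as [^<>]*, while "&amp;lt;script&amp;gt;" would be rejected because
its canonical form is "<script>".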

> 4. I think encodeForCSS should be added.

Totally agree. Got a good reference for CSS encoding?

> 5. encodeForSQL looks dangerous in principle: different SQL dialects
> use different meta-characters so it's not possible to handle all
> dialects with only one function. Something like encodeForMySQL would
> be better.

Agree. Jim Manico just suggested this today. Are you guys talking off the
list or something?

	encodeForOracle
	encodeForMySQL
	encodeForMSSQL
	encodeForPostgres
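
To make the dialect problem concrete, here is a deliberately simplified
sketch covering only two of the dialects. SqlDialect is a hypothetical type;
the MySQL branch assumes the default mode in which backslash is an escape
character, and both branches handle only the characters shown.

```java
enum SqlDialect {
    ORACLE, MYSQL;

    // Escapes a value for use inside a single-quoted string literal.
    String encode(String input) {
        switch (this) {
            case ORACLE:
                // Oracle: a quote is escaped by doubling it; backslash is literal.
                return input.replace("'", "''");
            case MYSQL:
                // MySQL (default mode): backslash is a meta-character too,
                // so it must be escaped before the quote is.
                return input.replace("\\", "\\\\").replace("'", "\\'");
            default:
                throw new AssertionError(this);
        }
    }
}
```

Applying the Oracle rule to the two-character input \' yields \'', which
MySQL would read as an escaped quote followed by a closing quote. That is
the kind of mismatch a single encodeForSQL could not avoid.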

--Jeff


