[Esapi-dev] XSS: Filter vs Encode?
Kevin W. Wall
kevin.w.wall at gmail.com
Sun Dec 11 19:26:56 UTC 2011
On Sun, Dec 11, 2011 at 1:33 PM, Dave Wichers <dave.wichers at owasp.org> wrote:
> I think encoding before putting something in a database is almost always a
> BAD IDEA. Once it's in the database, you never know how it's going to be used
> in the future. If you encode it, then it's for a specific context, and it's
> difficult to know that it will always be used in that context. XSS has 4-5
> different contexts, for example. And I might put it into a PDF, send it to the
> TV, send it in an email, and a zillion other things, all of which will be
> broken except for the 1 context it was encoded for.
I wholeheartedly agree, but who said anything about doing that? I think that
what Jeff had in mind was the preference to do white-list data validation
BEFORE inserting anything potentially harmful into the DB. I was merely
trying to point out that, while ideal, that is not always possible, and I gave
a few examples of where such an approach would not work.
And if you got the idea that *I* was proposing this, it was either because of
sloppy wording on my part or misinterpretation on your part, but I certainly
never intended to suggest anything of the sort.
> So, validate as best you can before putting it into the database, and if it's
> not valid, don't include it. And then when you get it out, encode it for the
> proper context you are including it in, in case something evil slipped
> through the validator.
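A minimal sketch of that validate-on-input, encode-on-output split, in Python using only the standard library (ESAPI itself is not used here). The "song title" field, the whitelist pattern TITLE_PATTERN, and the helpers validate_title / render_title_html are all hypothetical, chosen only for illustration; a real whitelist depends on what the field is actually allowed to contain.

```python
import html
import re
from typing import Optional

# Hypothetical whitelist for a "song title" field (an illustrative assumption):
# letters, digits, spaces and a few punctuation characters, 1-100 chars long.
TITLE_PATTERN = re.compile(r"[A-Za-z0-9 '.,!?&-]{1,100}")

def validate_title(raw: str) -> Optional[str]:
    """Input side: return the raw value if it matches the whitelist, else None.

    The value is stored as-is (NOT encoded) so it can later be used in any
    output context.
    """
    return raw if TITLE_PATTERN.fullmatch(raw) else None

def render_title_html(stored: str) -> str:
    """Output side: encode for the HTML element-content context.

    The whitelist still admits ' and &, so encoding at output time remains
    necessary even for values that passed validation.
    """
    return "<td>" + html.escape(stored, quote=True) + "</td>"
```

For example, validate_title("<script>alert(1)</script>") is rejected (returns None) because "<" is not on the whitelist, while render_title_html("Ryan's Song") yields "<td>Ryan&#x27;s Song</td>".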
Yep, that's what I was recommending. In fact, there are a lot of times that
you will be using previously untrusted input data pulled from a DB in
_multiple_ output contexts, which is the major reason why you should NOT
encode it prior to storing it in the DB. It just makes your app too damn brittle.
I often hear the rather lame excuse of developers doing this for performance
reasons, but ultimately it always bites them in the butt. For instance, they
originally output the data in an HTML context and thus stored it HTML-encoded
in the DB, but then later a decision is made to switch to AJAX and the data
ends up in JSON. Oops! Wrong encoding context!
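To make the AJAX example concrete, here is a small sketch (Python stdlib only; the variable and function names are made up for illustration) contrasting a value that was HTML-encoded before storage with one stored raw and encoded per output context.

```python
import html
import json

raw = "Ryan's Song"

# Anti-pattern: HTML-encode once, before storing in the DB.
stored_pre_encoded = html.escape(raw)   # "Ryan&#x27;s Song"

# Recommended: store the raw value and encode only at output time.
stored_raw = raw

def as_json(title: str) -> str:
    """JSON output context: json.dumps applies JSON string escaping."""
    return json.dumps({"title": title})

def as_html(title: str) -> str:
    """HTML output context: HTML entity encoding."""
    return html.escape(title)

# The pre-encoded value drags an HTML entity into the JSON payload.
print(as_json(stored_pre_encoded))  # {"title": "Ryan&#x27;s Song"}
# The raw value is encoded correctly for whichever context it lands in.
print(as_json(stored_raw))          # {"title": "Ryan's Song"}
print(as_html(stored_raw))          # Ryan&#x27;s Song
```

A client that inserts the first payload's title as text will literally show "Ryan&#x27;s Song". Note that json.dumps only covers a JSON response body; embedding JSON inside an HTML <script> block needs additional escaping.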
> I've seen too many encoding screw-ups, like Ryan&#39;s Song on the guide on
> my cable box, which is why I recommend avoiding encoding before storage in a
True, but that wasn't necessarily because a developer stored "Ryan's Song"
as "Ryan&#39;s Song" in the DB. It could well be that the double-encoding
arose from a combination of the developer doing things correctly and then
the output being naively filtered and re-encoded by some additional layer
(e.g., a J2EE Servlet Filter or a WAF) operating at a later stage that the
developer was unaware of.
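A rough illustration of how that can happen even when the application encodes correctly. naive_filter is a hypothetical stand-in for whatever downstream layer re-encodes the response; Python's html.escape emits the hex form &#x27; rather than &#39;, but the effect is the same.

```python
import html

title = "Ryan's Song"

# Layer 1: the application correctly encodes once for the HTML context.
app_output = html.escape(title)          # "Ryan&#x27;s Song"

# Layer 2 (hypothetical): a downstream filter that HTML-encodes the response
# again, unaware that it has already been encoded.
def naive_filter(body: str) -> str:
    return html.escape(body)

final_output = naive_filter(app_output)  # "Ryan&amp;#x27;s Song"

# The browser decodes entities only once, so the user sees a literal entity.
displayed = html.unescape(final_output)  # "Ryan&#x27;s Song"
print(displayed)
```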
"The most likely way for the world to be destroyed, most experts agree,
is by accident. That's where we come in; we're computer professionals.
We *cause* accidents." -- Nathaniel Borenstein