[Owasp Source Flaws Top 10] Education and Culture; Canonicalization

Erlend Oftedal erlend at oftedal.no
Thu Dec 18 10:58:14 EST 2008


I was merely using SQL as an example.

To me the real problem is not separating good and bad input. That is what 
input validation is trying to solve.

To me the real problem is the fact that we are mixing control with data. 
This is true not only for SQL queries, but for XSS, LDAP injection, XPATH 
injection etc.
When we are mixing control and data, data can become control. So we need 
to make sure that data stays data. In SQL this means escaping the data for 
SQL. For HTML this means HTML-encoding the data. For LDAP you need to 
escape the LDAP metacharacters. For web services you need to XML-encode, 
and so on.
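
A minimal Python sketch of "escape the data for the subsystem it is going 
to" might look like the following. The helper names (to_html, to_xml_text, 
escape_ldap_filter) are made up for illustration. The LDAP table follows 
the search-filter escapes in RFC 4515, and the HTML and XML helpers just 
wrap standard library functions.

```python
import html
from xml.sax.saxutils import escape as xml_escape

# Escapes for values placed inside an LDAP search filter (RFC 4515).
_LDAP_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\0": r"\00",
}


def escape_ldap_filter(value: str) -> str:
    """Escape a value so it stays data inside an LDAP search filter."""
    return "".join(_LDAP_FILTER_ESCAPES.get(ch, ch) for ch in value)


def to_html(value: str) -> str:
    """HTML-encode a value for element content or a quoted attribute."""
    return html.escape(value, quote=True)


def to_xml_text(value: str) -> str:
    """XML-encode a value for use as element text."""
    return xml_escape(value)


# The same data, escaped differently depending on where it is sent.
name = "O'Brian <b>*</b>"
print(to_html(name))             # O&#x27;Brian &lt;b&gt;*&lt;/b&gt;
print(to_xml_text(name))         # O'Brian &lt;b&gt;*&lt;/b&gt;
print(escape_ldap_filter(name))  # O'Brian <b>\2a</b>
```

Note that each subsystem has a different set of metacharacters, which is 
why a single escaping pass at input time cannot cover all of them.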

If you've read all my posts in this mailing list, you will see that I'm 
definitely not saying that input validation is unnecessary. I'm saying BOTH 
input validation and output escaping should be used. There are domains 
where it is easy to solve everything using input validation (example: 
numeric), and there are domains where this is almost impossible (example: 
blog comments in different languages). In most cases, a mix is probably 
the best.
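
For the numeric case, input validation alone can do the job. A rough 
Python sketch, borrowing the square root service from Andrew's post quoted 
below, could look like this. The names parse_sqrt_argument, square_root 
and InvalidInput are made up for the example, and the accepted number 
format (plain decimal, optional sign) is an assumption.

```python
import math
import re

# Whitelist: optional sign, ASCII digits, optional fractional part.
_NUMBER = re.compile(r"[+-]?[0-9]+(\.[0-9]+)?")


class InvalidInput(ValueError):
    """Raised when a request parameter fails validation."""


def parse_sqrt_argument(raw: str) -> float:
    # Syntax check: the parameter must look like a decimal number.
    if not _NUMBER.fullmatch(raw):
        raise InvalidInput("not a number")
    value = float(raw)
    # Semantic check: the square root service only accepts nonnegative input.
    if value < 0:
        raise InvalidInput("number must be nonnegative")
    return value


def square_root(raw: str) -> float:
    """Validate the raw parameter, then compute its square root."""
    return math.sqrt(parse_sqrt_argument(raw))


print(square_root("16"))  # 4.0
```

After validation nothing but a float is left, so there is nothing to 
escape. A free-text comment has no comparable whitelist.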

My point was only that input validation may not be enough. If you want to 
allow people to post SQL statements, HTML-snippets, LDAP queries etc. in 
the text of a page (example: a blog comment), this will be impossible to 
solve using input validation, because syntactically and semantically the 
comment is valid even though it contains characters that are dangerous for 
a given subsystem. So we need to escape it for each subsystem we want 
to exchange data with, be it a browser, SQL, Web Services or something 
completely different, and that's not something we can do as a part of the 
input validation. It needs to be performed when we are building the 
queries and adding the user data.
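
As a rough illustration, here is a Python sketch using the standard 
sqlite3 and html modules. The table layout and the save_comment and 
render_comments helpers are assumptions made for the example. The comment 
is stored unchanged through a parameterized query and is HTML-encoded only 
when the page is built.

```python
import html
import sqlite3


def save_comment(conn: sqlite3.Connection, author: str, body: str) -> None:
    # Parameter binding keeps the user data out of the SQL control channel.
    conn.execute(
        "INSERT INTO comments (author, body) VALUES (?, ?)", (author, body)
    )


def render_comments(conn: sqlite3.Connection) -> str:
    # HTML-encode at the point where the data is handed to the browser.
    rows = conn.execute("SELECT author, body FROM comments ORDER BY id")
    return "\n".join(
        "<p><b>{}</b>: {}</p>".format(html.escape(author), html.escape(body))
        for author, body in rows
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, author TEXT, body TEXT)"
)

# A perfectly valid blog comment that is dangerous for two subsystems.
comment = "Try '; DROP TABLE comments; -- or <script>alert(1)</script>"
save_comment(conn, "O'Brian", comment)

stored = conn.execute("SELECT body FROM comments").fetchone()[0]
page = render_comments(conn)
print(stored == comment)  # True: the database holds the original text
print(page)
```

The stored value is the original text, quote and all. Only the rendered 
page contains the HTML-encoded form.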

So no, IMHO I'm not mixing a problem with its solution. Input validation 
and output escaping are both attempts to solve the problem described 
above. Input validation can also help solve other problems like semantic 
validation as you mentioned in your post, but it cannot necessarily solve 
all problems.


Erlend


On Thu, 18 Dec 2008, Andrew Petukhov wrote:

> Well, let us start from the beginning.
> Let us define what we mean by "input validation". From my
> perspective, "input validation" is a process that aims at distinguishing
> bad input from good input for a given service.
> Along with that, two questions arise:
> 1. How to determine good input?
> 2. What to do with bad input?
> 
> There are two major kinds of checks in order to determine good input:
> syntax checking and semantics checking.
> 
> For instance, let us consider Square root service, which takes one input
> HTTP parameter. In order to validate input we should check:
> - its syntax (i.e. a number was supplied)
> - its semantics (i.e. a nonnegative number was supplied)
> 
> How to perform syntax checks is a question for further investigation
> (not a primary one): canonicalization, white listing vs black listing,
> using grammars, etc. Those are solutions to the problem of how to
> separate bad input from good input.
> 
> The second question: what to do with bad input?
> And possible solutions are:
> - remove it (strip carriage returns by Trim function)
> - stop processing and return an error (as in the square root example)
> - make it good if possible (works fine with SQL and javascript)
> 
> And the exact answers to these two questions (how to detect bad vs good
> input and what to do with bad input) should be worked out for each
> service used: SQL, generation of HTML (i.e. using browser interpreter),
> Shell interpreter invocation, usage of mathematical operations and so on.
> 
> And what you, Erlend, are proposing is to mix the problem (separate bad
> and good input) and a particular way to solve it (output escaping).
> Furthermore, your arguments are based only upon common SQL service.
> However there are plenty of other services.
> How would you escape output before a square root function?
> 
> So, the fact is: we should not mix the problem and a solution to it in
> one entity.
> 
> Andrew
> 
> 
> Erlend Oftedal wrote:
>
>  Let us consider the following logic:
> 1. I cut off everything I deem malicious.
> or
> 2. If I encounter malicious input, I present the user with a custom error page
> 
> Definitely, there is no vulnerability here. However, output escaping is
> missing.
> 
>
>  How do you know what is malicious? If you are going to allow users from all 
> over the world to input data, how do you know whether a name is valid or 
> not? And as mentioned before, the name "O'Brian" contains a quote. How do 
> you handle this? Do you cut it out? O'Brian is a valid name, and should be 
> stored as a valid name including the quote. For all I know "Delete", 
> "Select" or "Union" might be names in some country, which makes it hard 
> to filter out SQL keywords.
> Building a proper whitelist for input validation can be hard. It's easy 
> for things like numbers, but it gets harder the more complex the user 
> input is. Comments are probably the worst type to build a proper white 
> list for. You might even want to allow people to enter SQL statements or 
> at least SQL keywords like "select" or "delete" because those words are 
> used in natural language.
> 
> Let's consider a different logic:
> 1. I escape user input whenever I send it to a subsystem, and I escape it 
> for that subsystem (HTML-encoding for the browser, quote escaping etc. or 
> parameterized queries for SQL database connections etc.)
> 
> There are no injection vulnerabilities in this system, however input 
> validation is missing.
> 
> -
> Erlend Oftedal
> 

