[Owasp-natal] Para o pessoal do Voip e nós clientes SIP.

Noilson Caio caiogore em gmail.com
Sábado Fevereiro 16 14:50:35 UTC 2013

Silent Circle and inexperienced protocol parsing
Date:Yesterday 16:57
Author:Robert Graham
A well-known principle of cryptography is that algorithms/implementations
should be done by experts. Those without experience frequently make the
same mistakes over and over. When crypto breaks, it usually because a
non-expert made a well-known mistake.

The same principle needs to be applied to parsing network protocols.
Incoming network packets are exposed raw to hackers on the Internet.
Hackers break into machines because inexperienced programmers make common
mistakes, like buffer overflows.

This is the problem with Silent Circle, a company who makes a product for
encrypting phone calls. Their crypto algorithms are designed by crypto
experts, such as Phil Zimmerman (of PGP fame). Their network parsers,
though, are designed by inexperienced coders.

Silent Circle released their code yesterday. I haven’t found an exploitable
flaw yet, but the first file I looked at demonstrated common parser
mistakes. I thought I’d document them in this post. Specifically, I’m
referring to their SDP parser. SDP is part of SIP, which is used to
establish phone calls (call, hangup, etc.), but doesn’t itself carry the

char vs. unsigned char

Their code refers to the packet as “char *buf”. This is wrong. The packet
contains “bytes” not “characters”. It should be “unsigned char *” or
“uint8_t *” instead.

SDP is a text-based protocol. That implies to programmers that it contains
“characters”. But that’s not a requirement, just a convention. Hackers
don’t have to follow that convention, and can insert any sort of bytes they
want into the packet.

The problem is that “char” is not defined to be “signed” or “unsigned”.
When “char” is signed, it sign extends. This causes 32-bit values to appear
when the code is only expecting 8-bit values. A good example is the call to
“islower(*buf)”  on line 196 of their code. If “*buf” is 0x80, then it sign
extends to 0xffffff80. This can cause weird things to happen.

Some static code analyzers check this. Some dynamic stuff also checks this.
If you were to run this SDP parser under Microsoft Visual Studio and then
fuzz this protocol, it’ll throw an assertion on the call to

A lot of programmers will claim that I’m being pedantic, that this isn’t
really a problem. Compilers can be set to force “char” to be unsigned
anyway, so it’s “good enough”. That’s the same thinking pedantic
cryptographers have been fighting for years. This is the thinking behind
most crypto failures. Likewise, protocol experts have a lot of experience
when “char*” isn’t good enough.

Const correct

The “const” keyword is rarely used in their code. This is wrong. Most
pointer parameters to should have it. Functions should avoid changing their
input. The “const” keyword enforces this.

Non-const inputs lead to code instability. Over time, a programmer may
decide to change input in a function, while caller depends upon the
parameters not changing. Thus as a general principle of good code, more
“const” is needed.

This is especially true for packets. Packet data read from the network
should be immutable. Parsers should not change the data as they parse it.
Again, this is one of those pedantic things: in the past, there are lots of
known problems that arose because packet data was changed unexpectedly, so
we’ve developed the principle that it shouldn’t change.

Thus, the correct type of packet bytes is "const unsigned char *", or
equivalent things like "const uint8_t *". Everything else is incorrect.

Conditional compiles

Protocol parsing is CPU independent. There is never a reason to do
something different depending upon the CPU.

Consider line 96 in their code that says the following:

#if !defined(_WIN32) && !defined(__SYMBIAN32__) //if not arm

The code is trying for an optimization here, to compare 4 bytes at once
rather than 4 separate single byte compares. The problem is that same RISC
processors crash when given unaligned data. Hence the conditional compile,
either the fast way for some CPUs or the slow but safe way for others.

This is monumentally stupid. I don’t want to sound harsh, but this
optimization is so tiny it makes no measurable difference ever. All it does
is destabilize the code for no good reason. It makes testing the code a lot
harder. It allows subtle bugs to creep through testing as things behave
differently on different platforms.

Per-OS, per-compiler, per-CPU conditional compiles are NEVER needed for
protocol parsing. If you use them, then you are doing something very wrong.

Casting and byte-swapping

The mistake of doing an #if conditional above comes from trying to fix an
underlying mistake of casting between “char*” and “int*’. This is a common
technique taught in your “UNIX Network Programming” class. It’s also wrong.
You should never do it when parsing packets.

There are two reasons to avoid this. The first is that (as mentioned above)
some CPUs, such as SPARC and some versions of ARM crash when referencing
unaligned integers. This makes network code unstable on RISC systems,
because most integers are usually aligned anyway, meaning a lot of
alignment issues escape undetected into shipping code. The only way to make
stable code is to stop casting integers in network (or file) parsers.

The second problem is that it causes confusion with byte-order/endianess
that doesn’t happen if you just don’t cast integers. Consider the IP
address “”. There are only two forms for this number, either an
integer with the value of 0x0a010203, or an array of bytes with the value
0a 01 02 03. The problem is that little endian machines are weird. The
integer 0x0a010203 is represented internally as 03 02 01 0a on x86
processors, with the order of bytes “swapped”.

But this is just an internal detail that YOU NEVER NEED TO WORRY ABOUT. As
long as you never cross the streams and cast from a “char*” to an “int*”
(or the reverse), then the byte-order/endianness never matters.

If your first language was anything other than C/C++, then you are probably
scratching your head wondering what the fuss is. Since your language
doesn’t allow you to cast between “char*” and “int*”, you’ve never been
confused with byte-order/endianess.

On line 119 their code calls ‘reverseIP’ to byte swap the IP address.
That’s a clear sign that things aren’t getting coded right.

The only time to byte-swap is right at the moment of using the Sockets
interface. That’s because Sockets itself was written wrong back in the
early 1980s. Thus, the only time you byte-swap IP addresses is when you do
something like “sin.sin_addr.s_addr = htonl(my_ip_address)”.

Follow the spec

Early Internet doctrine was to write code that followed the spec “close
enough”. This was good in developing the protocols themselves, but was bad
for long-term interoperability as popular products proliferated using wrong
techniques, that all products then had to unofficially support.

Being liberal is allows hackers to inject a lot bad packets into the system
that will nonetheless work. This allows them to reach vulnerable code that
would otherwise have been rejected. Thus, critical software (like telephone
encryption) needs to be conservative about what they accept, not liberal.

Consider the line from SDP like the following:

c=IN IP4

The SilentCircle parser checks for the ‘c’, then skips the next character
whatever it is. Thus, the following line is allowed:

crIN IP4

Silent Circle needs to more formally follow the SDP spec at RFC 4566,
rather than coding "whatever works".

Some good stuff

The SilentCircle code isn’t all bad. They do some things right.

One thing they get right is the “concreteness” of their parser. Protocol
specifications are ugly, and code should be written to match the
specification. That means protocol parsing code should be ugly. There’s
nothing more annoying than debugging protocol parsers that have been
prettied up, where I  can't match up things in the spec and the packes with
what I see in the code.

In school, your professors stressed that code should have “no magic
numbers”. They were wrong: protocol parsers should be full of magic
numbers. Consider the line 255 in the SDP code where is references the
letter ‘c’. What does ‘c’ mean? Shouldn’t we instead use a #define
somewhere, like:


Wouldn’t that help make the code more self documenting? No! it wouldn’t!
The spec says “c” so the code should too. Otherwise, I have to hunt all
over the place to match things up.

My point is this: other’s might criticize the code for being ugly, but it’s
not. It’s as pretty as it should be.

The other good thing the code does is correct coupling and cohesion. All
this module does is parse the data, but it doesn’t really act on the data.
That’s the correct feature of protocol parsers. Parsers are the external
attack surface exposed to hackers. You want to keep that bit as independent
from the rest of the system as possible. The SDP file that it has only two
includes. That’s awesome.


The Silent Sircle protocol parsers aren’t all bad. Nor have I found a bug.
It’s just that the first file I looked at, the SDP parser, demonstrates
inexperience on the part of the coder. Good crypto needs experienced
cryptographers, good network parsing code likewise needs experienced coders.

Complete Story

Noilson Caio Teixeira de Araújo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.owasp.org/pipermail/owasp-natal/attachments/20130216/f08e649c/attachment.html>

More information about the Owasp-natal mailing list