[Owasp-antisamy] Questions about Anti-Samy architecture

Jason Li jason.li at owasp.org
Thu Jul 29 11:42:49 EDT 2010


Charles,

* Many AntiSamy users have requested the SAX parser for performance
reasons. Using a DOM parsing model requires more overhead to first
build the DOM object whereas the SAX parsing model can be run
essentially as if it were streaming.

* We try to make AntiSamy as easy to incorporate as possible for as
many people as possible. There's still a segment of the world that is
using Java 1.4 and there's no real pressing need to make AntiSamy
incompatible for them.

* Can you outline some of the changes you envision for an IOC
implementation of AntiSamy?

* I'm not sure off the top of my head, but I think part of this is due
to our dependency on NekoHTML as the parser. One possibility for a
hypothetical next-generation AntiSamy architecture would be to add the
ability to plug in an HTML parser of the caller's choice.

* Sounds like a reasonable suggestion, though we'll probably want to
maintain some kind of default fallback to the existing keys until
there's widespread adoption of a new namespace.
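
A minimal sketch of that fallback idea, assuming a wrapper class around
a caller-supplied ResourceBundle. The class name NamespacedMessages and
the keys shown are hypothetical and are not part of the AntiSamy code
base; only the "antisamy." prefix comes from the suggestion itself.

```java
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

/** Hypothetical helper: prefers "antisamy."-prefixed keys, falls back to legacy keys. */
public class NamespacedMessages {
    public static final String PREFIX = "antisamy.";

    private final ResourceBundle bundle;

    /** The caller supplies the bundle, so any i18n source can back it. */
    public NamespacedMessages(ResourceBundle bundle) {
        this.bundle = bundle;
    }

    /**
     * Looks up PREFIX + legacyKey first, then the bare legacy key.
     * Throws MissingResourceException if neither key is present.
     */
    public String get(String legacyKey) {
        try {
            return bundle.getString(PREFIX + legacyKey);
        } catch (MissingResourceException e) {
            return bundle.getString(legacyKey);
        }
    }

    public static void main(String[] args) {
        // Illustrative bundle: one key already migrated, one still legacy.
        ResourceBundle demo = new ListResourceBundle() {
            protected Object[][] getContents() {
                return new Object[][] {
                    {"antisamy.error.example.migrated", "found via namespaced key"},
                    {"error.example.legacy", "found via legacy key"},
                };
            }
        };
        NamespacedMessages messages = new NamespacedMessages(demo);
        System.out.println(messages.get("error.example.migrated"));
        System.out.println(messages.get("error.example.legacy"));
    }
}
```

With a wrapper like this, existing properties files keep working
unchanged, and a bundle can move to the new namespace one key at a time.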

I think the easiest way to facilitate any proposed architecture
changes is for you to flesh them out a little (describe more precisely
what you want to do, what benefits there are to making the change,
what impact it has on existing AntiSamy users, etc). Even if we can't
incorporate them in the current baseline of AntiSamy, they are things
we could look at when it comes time for the next major design change
for AntiSamy.
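
To make the SAX versus DOM trade-off from the first point concrete,
here is a small sketch using only the JDK's standard XML APIs. It does
not use AntiSamy or NekoHTML; the class name SaxVersusDom and the
sample input are made up for illustration. Both methods count the
elements in the same input, but the SAX version reacts to events as
they stream past while the DOM version must build the whole tree first.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative comparison of the two parsing models (not AntiSamy code). */
public class SaxVersusDom {

    /** SAX: the parser pushes events to a handler; no tree is ever built. */
    public static int countElementsSax(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                count[0]++; // called once per opening tag as it is read
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    /** DOM: the whole document is materialized in memory, then queried. */
    public static int countElementsDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // "*" matches every element in the tree
        return doc.getElementsByTagName("*").getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<p>Hello <b>world</b><i>!</i></p>";
        System.out.println("SAX count: " + countElementsSax(xml));
        System.out.println("DOM count: " + countElementsDom(xml));
    }
}
```

The DOM version is the convenient one when a caller needs to make its
own edits to the tree; the SAX version avoids holding the tree at all,
which is where the performance benefit comes from.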

-Jason

On Thu, Jul 29, 2010 at 11:29 AM, Charles Forsythe <cforsythe at hotels.com> wrote:
> For my project, I have made some modifications to meet some of our
> requirements.  I’d be interested in contributing generally-useful changes to
> the project, but I’m not sure about some of the changes I’d like to see.  I
> had some questions about the code base.
>
>
>
> * What is the utility of having a SAX and DOM version of the scan?  Right
>   now, I’m basically calling into the DOM scanner directly, as I have my own
>   DOM edits I need to do.
> * Is there really that much demand for Java 1.4 compatibility?  I’d prefer a
>   Java 5+ code base.  There are also some code cleanups that would be
>   worthwhile (using spaces instead of hard tabs is one example).
>
>
>
> Note that none of this affects the policy file and its processing.  That is
> great.  I changed the API (slightly) to make the Anti-Samy processing more
> extensible without touching the policy file processing.  Adding everyone’s
> ad-hoc requirements as policy file options will quickly make Anti-Samy
> overcomplicated and difficult to use (a barrier to acceptance).
>
>
>
> There are some other changes that would make it easier to embed:
>
>
>
> * Following IOC (inversion of control) configuration design patterns.
> * Using DOM3 APIs instead of directly referencing Xerces classes.  (Not sure
>   how to do HTML instead of XHTML output with that, though.)
> * Allowing the caller to set the ResourceBundle.  It would also help if the
>   i18n keys had a better namespace, like starting with “antisamy.”  The
>   ResourceBundle could use the provided properties file, or it could use
>   something else (we have our own i18n system that we’d prefer).
>
>
>
> Is anyone interested in these updates/structural changes?  If not, I can
> fork the code (I’d rather not, for obvious reasons.)

