[owasp-antisamy] Help with ignoring invalid attribute name in HTML Tag

Chao Jiang Chao.Jiang at anu.edu.au
Mon Feb 28 00:54:51 EST 2011


Thanks Jim. Our code actually does the same thing.
Here is our code and the fuller exception output.
==================================================
  CleanResults cr = as.scan(contentString, policy);
  contentString  = cr.getCleanHTML();
==================================================
org.owasp.validator.html.ScanException: org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.
	at org.owasp.validator.html.scan.AntiSamyDOMScanner.scan(AntiSamyDOMScanner.java:182)
	at org.owasp.validator.html.AntiSamy.scan(AntiSamy.java:89)
	at org.jasig.portlet.emailpreview.dao.impl.EmailAccountDaoImpl.wrapMessage(EmailAccountDaoImpl.java:387)
	at org.jasig.portlet.emailpreview.dao.impl.EmailAccountDaoImpl.retrieveMessage(EmailAccountDaoImpl.java:307)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
...
Caused by: org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.
	at org.apache.xerces.dom.CoreDocumentImpl.createAttribute(Unknown Source)
	at org.apache.xerces.dom.ElementImpl.setAttribute(Unknown Source)
	at org.cyberneko.html.parsers.DOMFragmentParser.startElement(DOMFragmentParser.java:433)
	at org.cyberneko.html.parsers.DOMFragmentParser.emptyElement(DOMFragmentParser.java:442)
	at org.cyberneko.html.HTMLTagBalancer.startElement(HTMLTagBalancer.java:642)
	at org.cyberneko.html.filters.DefaultFilter.startElement(DefaultFilter.java:136)
	at org.cyberneko.html.filters.NamespaceBinder.startElement(NamespaceBinder.java:278)
	at org.cyberneko.html.HTMLScanner$ContentScanner.scanStartElement(HTMLScanner.java:2680)
	at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2012)
	at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:910)
	at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
	at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
	at org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:166)
	at org.owasp.validator.html.scan.AntiSamyDOMScanner.scan(AntiSamyDOMScanner.java:180)
==================================================

As you can see, the exception is thrown by the scan() call itself, before
getCleanHTML() is ever reached. The parser fails when it delegates to
CoreDocumentImpl.createAttribute(...), because an XML attribute name
cannot start with a digit. This is the tag that triggers it:

<img src="http://www.xxx.com/xxx.gif" 3="" width="10" height="1"
border="0">
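
The same failure can be reproduced without AntiSamy or NekoHTML. The following is a minimal sketch, assuming the DOM implementation bundled with the JDK; the class and method names (NumericAttributeDemo, isRejected) are illustrative only and do not come from AntiSamy or the portlet.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: shows that the DOM layer itself rejects "3" as an attribute name.
class NumericAttributeDemo {

    /** Returns true if setAttribute fails with INVALID_CHARACTER_ERR for the given name. */
    static boolean isRejected(String attributeName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element img = doc.createElement("img");
        try {
            // This is the same kind of call that appears in the "Caused by" part of the trace.
            img.setAttribute(attributeName, "");
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.INVALID_CHARACTER_ERR;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("width rejected: " + isRejected("width"));
        System.out.println("3 rejected:     " + isRejected("3"));
    }
}
```

Because the error is raised while the DOM tree is still being built, it appears to occur before any policy rule for the img tag could be applied.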

Thank you.
Kind regards
Chao
-----Original Message-----
From: Jim Manico [mailto:jim at manico.net] 
Sent: Monday, 28 February 2011 4:43 PM
To: Chao Jiang
Cc: owasp-antisamy at lists.owasp.org
Subject: Re: [owasp-antisamy] Help with ignoring invalid attribute name
in HTML Tag

No, I mean calling AntiSamy in "clean" mode instead of reject mode, like
so:

AntiSamy as = new AntiSamy();
CleanResults test = as.scan(input, antiSamyPolicy);
String antiSamyCleanOutput = test.getCleanHTML();  // <--- Key is here

This should not throw an exception, even if the input is bad. It should
just return "clean" and safe XML, with JS and other markup stripped out
based on your policy.
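
The trace in this thread shows the exception coming from the HTML parser inside scan(), so clean mode alone may not avoid it. One possible workaround, offered here as an assumption and not as an AntiSamy feature, is to drop attributes whose names start with a digit before calling scan(). The sketch below uses only java.util.regex; NumericAttributeFilter and stripNumericAttributes are hypothetical names, and the patterns are a rough approximation of HTML start tags, not a full parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-filter: removes attributes such as 3="" from start tags.
class NumericAttributeFilter {

    // A start tag: "<", a tag name, then attribute text in which quoted values may contain ">".
    private static final Pattern START_TAG = Pattern.compile(
            "<([A-Za-z][A-Za-z0-9]*)((?:\"[^\"]*\"|'[^']*'|[^'\">])*)>");

    // One attribute: leading whitespace, a name, and an optional quoted or unquoted value.
    private static final Pattern ATTRIBUTE = Pattern.compile(
            "\\s+([^\\s=/>\"']+)(?:\\s*=\\s*(?:\"[^\"]*\"|'[^']*'|[^\\s>\"']+))?");

    static String stripNumericAttributes(String html) {
        Matcher tag = START_TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (tag.find()) {
            String rebuilt = "<" + tag.group(1) + filterAttributes(tag.group(2)) + ">";
            tag.appendReplacement(out, Matcher.quoteReplacement(rebuilt));
        }
        tag.appendTail(out);
        return out.toString();
    }

    // Keeps every attribute except those whose name begins with a digit.
    private static String filterAttributes(String attributes) {
        Matcher attr = ATTRIBUTE.matcher(attributes);
        StringBuffer kept = new StringBuffer();
        while (attr.find()) {
            boolean numericName = Character.isDigit(attr.group(1).charAt(0));
            attr.appendReplacement(kept, numericName ? "" : Matcher.quoteReplacement(attr.group()));
        }
        attr.appendTail(kept);
        return kept.toString();
    }
}
```

The filtered string would then be passed to as.scan(...) as before. The regular expressions are not a security boundary; AntiSamy still has to sanitize the result according to the policy.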

-Jim

> Hi Jim
> 
> Do you mean updating the antisamy.xml file like this?
> 
> <tag name="img" action="clean">
> 
> I tried "clean", "remove", and "truncate"; none of them works, and the
> same exception is printed:
> org.owasp.validator.html.ScanException: org.w3c.dom.DOMException:
> INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.
> ...
> 
> Thanks
> Kind regards
> Chao
> 
> 
> -----Original Message-----
> From: Jim Manico [mailto:jim at manico.net] 
> Sent: Monday, 28 February 2011 4:22 PM
> To: Chao Jiang
> Cc: owasp-antisamy at lists.owasp.org
> Subject: Re: [owasp-antisamy] Help with ignoring invalid attribute name
> in HTML Tag
> 
> Have you tried the AntiSamy "clean" function? What output do you get if
> you try to "clean" the HTML (instead of validating it)?
> 
> - Jim
> 
> 
>> Hi All
>>
>> One quick question please.
>>
>> When AntiSamy encounters invalid HTML like the following (a number used
>> as an attribute name), it throws an exception:
>>
>> <img src="http://www.xxx.com/xxx.gif" 3="" width="10" height="1"
>> border="0">
>>
>> How can I update the antisamy.xml file to ignore the error, or even
>> remove the invalid attribute?
>>
>> By the way, I am using version 1.4.
>>
>> Thanks a lot.
>>
>> Kind regards
>> Chao
>>
>> _______________________________________________
>> Owasp-antisamy mailing list
>> Owasp-antisamy at lists.owasp.org
>> https://lists.owasp.org/mailman/listinfo/owasp-antisamy
> 


