From marcin at tssci-security.com Mon Jan 26 09:46:26 2009
From: marcin at tssci-security.com (Marcin Wielgoszewski)
Date: Mon, 26 Jan 2009 09:46:26 -0500
Subject: [Owasp-antisamy-python] Welcome, and some first thoughts
Message-ID: <1232981186.15264.18.camel@thinker>

Hey all, thanks for voicing an interest in working on the Python port of
AntiSamy and welcome to the new OWASP AntiSamy-Python mailing list. I
have set up this mailing list so we can all discuss Python development
outside of the usual AntiSamy mailing list.

Topics I think we should discuss before starting development:

1.) Project plan. I would like us to get moving soon. I understand we
all have commitments including work, clients, life, etc., so I hope
this mailing list will let us communicate our accomplishments and
setbacks on a weekly or bi-weekly basis. I regularly communicate over
Gtalk, so if everyone can share their Gtalk information, we can do
quick chats about whatever we're doing when we're looking for that
instant feedback.

2.) Requirements. I've already listed the requirements in a Google
document I shared with you all below. Basically, the requirement is
that we implement 100% of the functionality of the Java and .NET
versions of AntiSamy in Python.

3.) Design. Both Arshan and Jason designed the architecture behind
AntiSamy and made porting it to .NET that much easier. Nobody had to
reinvent the wheel -- just take Java code and rewrite in .NET. Python
is a bit different from those two languages, and we /have/ to take the
"Pythonic way" of doing things, and it will more likely be integrated
into already existing large Python projects (Django, etc.).

4.) Libraries. The less we depend on external libraries, the better,
as Arshan pointed out in his reply to Mike's original posting. So far,
I've come across lxml, Beautiful Soup and cssutils. Apparently, lxml
can do everything BS can do, plus XML, and is better and faster. It
cannot parse CSS though, only use CSS selectors.
See:
http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/

5.) Test site/code repository. We will need to set up a private web
server similar to www.antisamy.net.

Lastly, I wanted to point you all to a Google document I created the
other day available here:
https://docs.google.com/Doc?docid=dcrbq3jf_25hhgnbzgr&hl=en (If you do
not have access, please email me privately.)

Thank you all for stepping up and helping out.

Regards,
--
Marcin Wielgoszewski
tssci-security.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
Url : https://lists.owasp.org/pipermail/owasp_antisamy_python/attachments/20090126/00eb74e7/attachment.bin

From r at rgaucher.info Mon Jan 26 10:10:22 2009
From: r at rgaucher.info (Romain Gaucher)
Date: Mon, 26 Jan 2009 10:10:22 -0500
Subject: [Owasp-antisamy-python] Welcome, and some first thoughts
In-Reply-To: <1232981186.15264.18.camel@thinker>
References: <1232981186.15264.18.camel@thinker>
Message-ID: <6fef34660901260710w5bd7d03bvbf9b62490943c64b@mail.gmail.com>

Marcin,
Thanks for setting up the mailing lists.

1/ I'm also available on gtalk: romain.gaucher at gmail.com (and if
you're an MSN user: r_gaucher at hotmail.com).

2/ I have a small concern about the architecture of the tests:
- I see that the unit test inputs are in the Java source code; I think
it would be good to externalize them into XML files (or whatever
files) so that we all have exactly the same tests.
Also, is there a benchmark of AntiSamy (Java/.NET) somewhere? I think
it's very important to reduce the overhead as much as possible; web
devs won't use such a library if it's going to make their apps slow...

4/ I'm a big fan of lxml, as Marcin knows. Compared with BS for HTML
handling, it is way faster (BS is actually very slow).
I do not know about cssutils or whether it fits the requirements
(speed, etc.).
5/ It would be great to have http://python.antisamy.net or something.

My 2 cents,

--Romain
http://rgaucher.info


On Mon, Jan 26, 2009 at 9:46 AM, Marcin Wielgoszewski wrote:
> Hey all, thanks for voicing an interest in working on the Python port of
> AntiSamy and welcome to the new OWASP AntiSamy-Python mailing list. I
> have set up this mailing list so we can all discuss Python development
> outside of the usual AntiSamy mailing list.
>
> Topics I think we should discuss before starting development:
>
> 1.) Project plan. I would like us to get moving soon. I understand we
> all have commitments including work, clients, life, etc., so I hope
> this mailing list will let us communicate our accomplishments and
> setbacks on a weekly or bi-weekly basis. I regularly communicate over
> Gtalk, so if everyone can share their Gtalk information, we can do
> quick chats about whatever we're doing when we're looking for that
> instant feedback.
>
> 2.) Requirements. I've already listed the requirements in a Google
> document I shared with you all below. Basically, the requirement is
> that we implement 100% of the functionality of the Java and .NET
> versions of AntiSamy in Python.
>
> 3.) Design. Both Arshan and Jason designed the architecture behind
> AntiSamy and made porting it to .NET that much easier. Nobody had to
> reinvent the wheel -- just take Java code and rewrite in .NET. Python
> is a bit different from those two languages, and we /have/ to take the
> "Pythonic way" of doing things, and it will more likely be integrated
> into already existing large Python projects (Django, etc.).
>
> 4.) Libraries. The less we depend on external libraries, the better,
> as Arshan pointed out in his reply to Mike's original posting. So far,
> I've come across lxml, Beautiful Soup and cssutils. Apparently, lxml
> can do everything BS can do, plus XML, and is better and faster. It
> cannot parse CSS though, only use CSS selectors.
>
> See:
> http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/
>
> 5.) Test site/code repository. We will need to set up a private web
> server similar to www.antisamy.net.
>
> Lastly, I wanted to point you all to a Google document I created the
> other day available here:
> https://docs.google.com/Doc?docid=dcrbq3jf_25hhgnbzgr&hl=en (If you do
> not have access, please email me privately.)
>
>
> Thank you all for stepping up and helping out.
>
> Regards,
> --
> Marcin Wielgoszewski
> tssci-security.com
>
> _______________________________________________
> Owasp_antisamy_python mailing list
> Owasp_antisamy_python at lists.owasp.org
> https://lists.owasp.org/mailman/listinfo/owasp_antisamy_python
>
>

From arshan.dabirsiaghi at gmail.com Mon Jan 26 10:36:12 2009
From: arshan.dabirsiaghi at gmail.com (Arshan Dabirsiaghi)
Date: Mon, 26 Jan 2009 10:36:12 -0500
Subject: [Owasp-antisamy-python] Welcome, and some first thoughts
In-Reply-To: <6fef34660901260710w5bd7d03bvbf9b62490943c64b@mail.gmail.com>
References: <1232981186.15264.18.camel@thinker>
 <6fef34660901260710w5bd7d03bvbf9b62490943c64b@mail.gmail.com>
Message-ID: <4397A9CE-C07D-4C9E-BECA-0070AC31DAFB@gmail.com>

2) Absolutely. I may do this in the upcoming weeks.
3) I have benchmark data, but enterprises haven't asked for it, so I
haven't kept up with it.
5) Also a good idea, I will ping Jerry on this.

On Jan 26, 2009, at 10:10 AM, Romain Gaucher wrote:

> Marcin,
> Thanks for setting up the mailing lists.
>
> 1/ I'm also available on gtalk: romain.gaucher at gmail.com (and if
> you're an MSN user: r_gaucher at hotmail.com).
>
> 2/ I have a small concern about the architecture of the tests:
> - I see that the unit test inputs are in the Java source code; I think
> it would be good to externalize them into XML files (or whatever
> files) so that we all have exactly the same tests.
> Also, is there a benchmark of AntiSamy (Java/.NET) somewhere?
> I think it's very important to reduce the overhead as much as
> possible; web devs won't use such a library if it's going to make
> their apps slow...
>
> 4/ I'm a big fan of lxml, as Marcin knows. Compared with BS for HTML
> handling, it is way faster (BS is actually very slow).
> I do not know about cssutils or whether it fits the requirements
> (speed, etc.).
>
> 5/ It would be great to have http://python.antisamy.net or something.
>
> My 2 cents,
>
> --Romain
> http://rgaucher.info
>
>
> On Mon, Jan 26, 2009 at 9:46 AM, Marcin Wielgoszewski wrote:
>> Hey all, thanks for voicing an interest in working on the Python
>> port of AntiSamy and welcome to the new OWASP AntiSamy-Python
>> mailing list. I have set up this mailing list so we can all discuss
>> Python development outside of the usual AntiSamy mailing list.
>>
>> Topics I think we should discuss before starting development:
>>
>> 1.) Project plan. I would like us to get moving soon. I understand
>> we all have commitments including work, clients, life, etc., so I
>> hope this mailing list will let us communicate our accomplishments
>> and setbacks on a weekly or bi-weekly basis. I regularly communicate
>> over Gtalk, so if everyone can share their Gtalk information, we can
>> do quick chats about whatever we're doing when we're looking for
>> that instant feedback.
>>
>> 2.) Requirements. I've already listed the requirements in a Google
>> document I shared with you all below. Basically, the requirement is
>> that we implement 100% of the functionality of the Java and .NET
>> versions of AntiSamy in Python.
>>
>> 3.) Design. Both Arshan and Jason designed the architecture behind
>> AntiSamy and made porting it to .NET that much easier. Nobody had to
>> reinvent the wheel -- just take Java code and rewrite in .NET.
>> Python is a bit different from those two languages, and we /have/ to
>> take the "Pythonic way" of doing things, and it will more likely be
>> integrated into already existing large Python projects (Django,
>> etc.).
>>
>> 4.) Libraries. The less we depend on external libraries, the better,
>> as Arshan pointed out in his reply to Mike's original posting. So
>> far, I've come across lxml, Beautiful Soup and cssutils. Apparently,
>> lxml can do everything BS can do, plus XML, and is better and faster.
>> It cannot parse CSS though, only use CSS selectors.
>>
>> See:
>> http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/
>>
>> 5.) Test site/code repository. We will need to set up a private web
>> server similar to www.antisamy.net.
>>
>> Lastly, I wanted to point you all to a Google document I created the
>> other day available here:
>> https://docs.google.com/Doc?docid=dcrbq3jf_25hhgnbzgr&hl=en (If you
>> do not have access, please email me privately.)
>>
>>
>> Thank you all for stepping up and helping out.
>>
>> Regards,
>> --
>> Marcin Wielgoszewski
>> tssci-security.com
>>
>> _______________________________________________
>> Owasp_antisamy_python mailing list
>> Owasp_antisamy_python at lists.owasp.org
>> https://lists.owasp.org/mailman/listinfo/owasp_antisamy_python
>>
>>
> _______________________________________________
> Owasp_antisamy_python mailing list
> Owasp_antisamy_python at lists.owasp.org
> https://lists.owasp.org/mailman/listinfo/owasp_antisamy_python
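
Romain's suggestion in 2/ (keeping the unit-test inputs outside the Java
source so that every port runs exactly the same tests) could look roughly
like the sketch below. Everything in it is an assumption made up for
illustration: the XML layout, the case names, load_cases(), run_cases()
and placeholder_sanitize() do not come from the AntiSamy code base, and
the placeholder is only a stand-in, not a real sanitizer. It uses only
the Python standard library.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical shared file format: one <case> per test vector, so that the
# Java, .NET and Python ports could all read the same inputs and outputs.
SAMPLE_VECTORS = """\
<testcases>
  <case name="script-tag">
    <input><![CDATA[<b>hi</b><script>alert(1)</script>]]></input>
    <expected><![CDATA[<b>hi</b>]]></expected>
  </case>
  <case name="plain-text">
    <input><![CDATA[hello]]></input>
    <expected><![CDATA[hello]]></expected>
  </case>
</testcases>
"""


def load_cases(xml_text):
    """Return a list of (name, input, expected) tuples from the XML text."""
    root = ET.fromstring(xml_text)
    return [
        (case.get("name"),
         case.findtext("input", ""),
         case.findtext("expected", ""))
        for case in root.findall("case")
    ]


def run_cases(cases, sanitize):
    """Run every case through sanitize() and return the names that failed."""
    return [name for name, dirty, expected in cases
            if sanitize(dirty) != expected]


def placeholder_sanitize(html):
    """Stand-in for the real port; it only drops <script> elements.

    This is NOT a safe sanitizer, it exists only to exercise the harness.
    """
    return re.sub(r"<script.*?</script>", "", html, flags=re.S | re.I)


if __name__ == "__main__":
    failures = run_cases(load_cases(SAMPLE_VECTORS), placeholder_sanitize)
    print("failed cases:", failures)
```

With a layout like this, each port would only need a small loader and
runner of its own; the vectors themselves would live in one shared file.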