[Owasp-appsensor-project] GSoC 2016 Trend Monitoring Analysis Engine

Timothy Sum Hon Mun timothy22000 at gmail.com
Tue Mar 8 09:33:51 UTC 2016

Hi John,

Thanks for getting back to me. It was good hearing back from you. I've
replied to you inline below.

Besides that, I made a pull request for some minor changes and test that I
added for appsensor as a first contribution:

Thanks again!

Best Regards,

On Mon, Mar 7, 2016 at 4:37 AM, John Melton <jtmelton at gmail.com> wrote:

> Tim,
> Hi, and thanks so much for your email. I've responded with specific
> comments inline below.
> Thanks,
> John
> On Sun, Mar 6, 2016 at 1:58 PM, Timothy Sum Hon Mun <
> timothy22000 at gmail.com> wrote:
>> Hi all,
>> Firstly, congratulations on OWASP being accepted for GSoC 2016!!
>> My name is Timothy Sum and I am from Malaysia. I am currently a final
>> year MSc Computer Science student studying at University of Kent in the UK.
>> I have experience in Java, Javascript, Python, Node.js, MongoDB, AWS,
>> Jenkins, Git workflow, Dropwizard, Logstash, Apache Spark (MSc
>> dissertation) and others, I am always keen to learn new technologies and
>> try things outside my comfort zone!
>> I am currently undergoing my placement (where I gained most of my
>> experience from) which will be concluded on the 31st March 2016. I will be
>> working full time on the weekdays before then. Therefore, I will do my
>> research about the project and prepare my proposal typically at night or
>> during the weekends. After my placement finishes, I will be able to
>> completely commit to GSoC by researching, learning and experimenting about
>> gaps in my knowledge during April even before the community bonding period.
>> I’ll have a written report to write about my placement that is due on June
>> 2016 but I can do that while coding over the summer!
>> I just recently stumbled over GSoC 3 days ago and have been looking
>> through the project list to decide which project I should go for. This will
>> be my first time contributing to an open source project and I am very hyped
>> up about it as I get to learn from a mentor and contribute at the same
>> time. :) I also do not mind having skype/hangout discussion with mentors
>> regularly to discuss about my progress.
> Yes, skype/hangouts is the normal way we communicate. I generally aim for
> meetings 2-3 times a week so we can make sure we're making forward progress
> and then use email in between meetings for specific questions.

>> I am interested in the Trend Monitoring Analysis Engine project for
>> OWASP AppSensor and would be excited if I can work on it. I do not have
>> a background in application security and intrusion detection but am highly
>> interested learning about it. So far, I have:
> Fantastic. Honestly, a background in spark / machine learning will be more
> important.

Cool! I did a module in data mining for my MSc that would come in handy
(learned about machine learning algos like decision trees etc).  I used
Spark for the first time during my dissertation to implement a
classification algorithm. I did not get to use Spark's machine learning
library but my past experience would hopefully make the transition easier.

>> i) Read the Chapter 3 and Chapter 4 of the OWASP guide briefly and
>> understand the approach behind AppSensor, its high level architecture
>> (detection and response unit), its pattern (Event, EventManager,
>> EventAnalysisEngine and so on)
>> ii) Manage to get a demo running locally as per the AppSensor Demo Setup
>> guide (
>> https://github.com/jtmelton/appsensor/blob/master/sample-apps/DemoSetup.md).
>> Had a little bump with a mongo test failing when doing mvn install but got
>> it to work in the end. Went through part of the codebase while doing this.
>> iii) Research on trend monitoring analysis techniques. It seems that
>> trend analysis falls into anomaly detection based on my understanding so
>> far but feel free to correct me (will expand in the section below). It
>> would be great if you recommend me additional papers/books to read to learn
>> more on this topic.
>> Did a first pass on two papers that cover general topics in IDS:
>> http://galaxy.cs.lamar.edu/~bsun/seminar/example_papers/IDS_taxonomy.pdf
>> http://www.ijcset.net/docs/Volumes/volume2issue4/ijcset2012020419.pdf
> There is not much literature specific to application intrusion detection.
> The concept is roughly based on network IDS systems. It is mostly
> transferring those concepts to the application layer, and looking for
> activity that is not possible (or is much harder) to detect at the network
> layer, but is possible (or much easier) at the application layer.

Interesting, I will probably do some reading to get a better overview of
IDS in general.

>> Currently, I have given it some thought and my high level understanding
>> of the expected deliverables are:
>> i)  A trend monitoring analysis engine - Extend the analysis-engines
>> package and add tests. Depending on which implementation strategies to use,
>> it seems that I would have to record the “normal” behaviour pattern of a
>> system and then trigger a response if the application behaves out of the
>> norm which will be defined by the trending rules.
> I think of 2 possible approaches:
> - *simple trending engine* - this would be an implementation that would
> essentially do some simple counting. An example here might be that we have
> seen the occurrence of detection point ABC go up 500% in the last hour over
> the "normal" usage. This would likely be pretty straightforward, and could
> use something like a time series database to track the metadata, and do
> some very fast analysis.

I looked up time series databases to learn more about them, as I have
not worked with one before.


I noticed that we have an implementation that integrates with InfluxDB in
the package appsensor-integration-influxdb.

If I were to do the simple trending engine, I would have to extend the
current implementation so that it can also read back the events written to
it. I could then do the counting and analysis on those events to decide
whether the activity is unusual. This is assuming that I will be using
InfluxDB, of course. What are your opinions?
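
As a rough illustration of the counting check described above, here is a
minimal sketch in plain Python. It is not AppSensor or InfluxDB code: the
function name is_trending, the 24-window baseline and the way "normal" is
computed (the mean count of the preceding windows) are all assumptions made
for the example. The 500% threshold and the one-hour window come from the
example in the thread.

```python
from datetime import datetime, timedelta
from statistics import mean


def is_trending(timestamps, now, window=timedelta(hours=1),
                baseline_windows=24, threshold_pct=500.0):
    """Return True if the count in the latest window rose by at least
    threshold_pct percent over the mean count of the preceding windows.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    def count_between(start, end):
        return sum(1 for t in timestamps if start <= t < end)

    current = count_between(now - window, now)
    # Assumed definition of "normal": mean count over the
    # baseline_windows windows that precede the current one.
    history = [
        count_between(now - (i + 1) * window, now - i * window)
        for i in range(1, baseline_windows + 1)
    ]
    baseline = mean(history)
    if baseline == 0:
        # No recorded normal usage, so a percentage increase is
        # undefined; this sketch chooses not to flag in that case.
        return False
    increase_pct = (current - baseline) / baseline * 100.0
    return increase_pct >= threshold_pct
```

With a time series database the two counts would presumably come from
queries over time ranges rather than from a list held in memory, but the
comparison itself would stay the same.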

> - *machine learning engine* - this is a more complex implementation. This
> would involve creating a ML style engine that would allow for various types
> of analysis. An example might be noticing a shift in the composition of
> HTTP verb usage for a given time period. If you decide to go this route, I
> think you'll want to be very specific with the types of analysis you want
> to provide, and focus on doing great documentation about how to build rules
> based on training data and the algorithm selection process.

This is a really interesting idea! I did some research to get an idea of
what needs to be done using Spark as a base. Ideas and questions:

i) Idea 1: There has been some work on using Spark and Cassandra (as a time
series db, even though it's a k-v store) for data analysis. In relation to
appsensor, I would have to integrate Spark (probably as part of the
analysis engine) for its machine learning library and implement a storage
provider for Cassandra before wiring them together. I will have to design
a schema for the time series data storage inside Cassandra as well. This
seems like quite a lot of work for the duration of the project, but I'll be
able to leverage some existing work.
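
On the schema point: one common way to lay out time series data in Cassandra
is to partition each series by a time bucket and keep the rows inside a
partition ordered by timestamp. The sketch below is an in-memory stand-in for
that layout in plain Python, not Cassandra code. The class InMemorySeries,
the function partition_key and the choice of one bucket per UTC day are all
assumptions made for the example.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(detection_point, timestamp):
    """Bucket one series by UTC day so a single partition stays bounded.

    Expects a timezone-aware datetime. The daily bucket is an assumption;
    a busier series might need hourly buckets.
    """
    day = timestamp.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (detection_point, day)


class InMemorySeries:
    """Hypothetical stand-in for a table keyed by (detection point, day)."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def write(self, detection_point, timestamp):
        # insort keeps each partition sorted by timestamp, which mimics
        # ordering rows within a partition by a timestamp column.
        key = partition_key(detection_point, timestamp)
        insort(self._partitions[key], timestamp)

    def read_day(self, detection_point, day):
        # A read touches exactly one partition.
        return list(self._partitions.get((detection_point, day), []))
```

The point of the bucket is that a query for "detection point ABC on a given
day" touches one partition only, which is the access pattern a trending
check would need.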


ii) Idea 2: Implement a simple trending analysis engine as the main project
work (related to the question above on the simple trending approach) and
finish the 3 deliverables. Then build an ML engine using Spark for machine
learning, wired to the time series db used in the simple trending approach.
This way I don't have to implement a separate store for the ML analysis
engine, but the challenge probably lies in working out how to connect them
together.

iii) Question 1: What do you mean by being specific about the types of
analysis that I provide and about algorithm selection? From what I
understand, it is one of the following:
- Pick one algorithm per use case. For example, if we have 2 cases (a shift
in the composition of HTTP verbs and the number of API calls to an
endpoint), I would use one algorithm for checking the composition of HTTP
verbs and another for the number of API calls. I guess some research needs
to be done to decide which algorithm is suitable for which use case.

- Implement a wide variety of algorithms for the analysis engine and then
let the user decide which algorithm to use for all events or for each event.
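
To make the HTTP verb case concrete, here is a minimal sketch of one possible
check in plain Python. It is only one candidate and not something the thread
settles on: it compares the verb mix of two periods using the total variation
distance. The function names and the 0.2 threshold are assumptions made for
the example, and nothing here uses Spark or AppSensor APIs.

```python
from collections import Counter


def verb_distribution(verbs):
    """Map each HTTP verb to its share of the requests in one period."""
    counts = Counter(verbs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot build a distribution from an empty period")
    return {verb: n / total for verb, n in counts.items()}


def total_variation_distance(p, q):
    """Half the sum of absolute differences between two distributions.

    The result is 0 for identical distributions and 1 for disjoint ones.
    """
    verbs = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in verbs)


def composition_shifted(baseline_verbs, current_verbs, threshold=0.2):
    """Flag the current period if its verb mix moved past the threshold.

    The default threshold is an arbitrary illustrative value.
    """
    p = verb_distribution(baseline_verbs)
    q = verb_distribution(current_verbs)
    return total_variation_distance(p, q) > threshold
```

A count-based case, such as the number of API calls to an endpoint, would
need a different check, which is the reason for asking how the algorithm
should be chosen per use case.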

I am leaning towards the simple trending approach for now, taking the
available time into account, although I would really like to give the
machine learning approach a go. Feedback and answers to the questions above
will help me scope out the amount of work required for the machine learning
approach, especially (ii).

>> ii)  Associated configuration mechanism to specify the trending
>> rules/policy - Extend the configuration mode package, create respective
>> xml and xsd configuration for the Trend Monitoring analysis engine.
>> iii) A small full sample demo application showing usage of the trend
>> monitoring feature. - Built on the existing demo application?
> Yes, these would be the 3 basic outputs for that project, along with the
> associated documentation. Additionally, I would say that we should produce
> a small number of rules. That will be necessary for the demo application
> anyways, but we can use those rules as examples for the community. As for
> the demo application, it's very small and trivial. We actually have a user
> who built a demo application for a talk about appsensor that is likely a
> much better fit (https://github.com/dschadow/ApplicationIntrusionDetection
> )

Agreed about the rules bit. I took a look at the demo application linked
above and it looks great; I will refer to it when working on the demo
application part. I've used Dropwizard to build web apps, but I haven't
worked with Spring before (only a little on DI) and will have to read about
it.

>> It would be great if the mentor/team can give me feedback on my ideas and
>> things to read to expand my knowledge in this domain. If there is any task
>> that you would like me to complete, I am eager to do it and will find time
>> at night or the weekends to complete it.
> I think what I'd be most interested in is if you could let us know which
> approach (simple trending, machine learning) you would prefer to take when
> building the analysis engine. Beyond that, I think your skillset looks well
> suited to the project.
>> I would also like to start preparing my project proposal to be able to
>> share with the mailing list to get feedback as this will be my first time
>> applying for GSoC and I will need all the help I can get!!
> Sounds great. I think your notes in this email are a very solid start. To
> build a good proposal, I think the most important thing to do is scope the
> work. Try to build a detailed plan (ie. what task(s) you will accomplish
> each week). After that, we can review it and make suggestions about whether
> or not we think you should try to do more or less work, and what parts may
> be tricky. It will also help us know which mentor(s) to bring onto the
> project.
I will build up my plan as I scope out the work for the two approaches and
will definitely share it as soon as it is ready.

>> Thanks for your time and look forward to your feedbacks/replies. This
>> young padawan needs guidance. :D
> Thank you!
>> I have also started a topic in the OWASP GSoC group.
>> https://groups.google.com/forum/?fromgroups#!topic/owasp-gsoc/59vAa402jXo
>> Kind Regards,
>> Tim
>> _______________________________________________
>> Owasp-appsensor-project mailing list
>> Owasp-appsensor-project at lists.owasp.org
>> https://lists.owasp.org/mailman/listinfo/owasp-appsensor-project