[Owasp-appsensor-project] GSoC 2016 Trend Monitoring Analysis Engine

John Melton jtmelton at gmail.com
Tue Mar 8 17:27:30 UTC 2016


Responses inline.

On Tue, Mar 8, 2016 at 4:33 AM, Timothy Sum Hon Mun <timothy22000 at gmail.com>
wrote:

> Hi John,
>
> Thanks for getting back to me. It was good hearing back from you. I've
> replied to you inline below.
>
> Besides that, I made a pull request with some minor changes and a test that I
> added for appsensor as a first contribution:
> https://github.com/jtmelton/appsensor/pull/38
>
>
Fantastic. I'll take a look at that later today!


> Thanks again!
>
> Best Regards,
> Tim
>
> On Mon, Mar 7, 2016 at 4:37 AM, John Melton <jtmelton at gmail.com> wrote:
>
>> Tim,
>>
>> Hi, and thanks so much for your email. I've responded with specific
>> comments inline below.
>>
>> Thanks,
>> John
>>
>> On Sun, Mar 6, 2016 at 1:58 PM, Timothy Sum Hon Mun <
>> timothy22000 at gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Firstly, congratulations on OWASP being accepted for GSoC 2016!!
>>>
>>> My name is Timothy Sum and I am from Malaysia. I am currently a final
>>> year MSc Computer Science student studying at University of Kent in the UK.
>>> I have experience in Java, Javascript, Python, Node.js, MongoDB, AWS,
>>> Jenkins, Git workflow, Dropwizard, Logstash, Apache Spark (MSc
>>> dissertation) and others, I am always keen to learn new technologies and
>>> try things outside my comfort zone!
>>>
>>> I am currently undergoing my placement (where I gained most of my
>>> experience from) which will be concluded on the 31st March 2016. I will be
>>> working full time on the weekdays before then. Therefore, I will do my
>>> research about the project and prepare my proposal typically at night or
>>> during the weekends. After my placement finishes, I will be able to
>>> completely commit to GSoC by researching, learning and experimenting about
>>> gaps in my knowledge during April even before the community bonding period.
>>> I also have a written report about my placement that is due in June
>>> 2016, but I can do that while coding over the summer!
>>>
>>> I stumbled upon GSoC 3 days ago and have been looking through the
>>> project list to decide which project I should go for. This will be my
>>> first time contributing to an open source project and I am very hyped
>>> up about it as I get to learn from a mentor and contribute at the same
>>> time. :) I also do not mind having regular skype/hangout discussions with
>>> mentors to discuss my progress.
>>>
>>
>> Yes, skype/hangouts is the normal way we communicate. I generally aim for
>> meetings 2-3 times a week so we can make sure we're making forward progress
>> and then use email in between meetings for specific questions.
>>
>
>>
>>>
>>> I am interested in the Trend Monitoring Analysis Engine project for
>>> OWASP AppSensor and would be excited if I can work on it. I do not have
>>> a background in application security and intrusion detection but am highly
>>> interested in learning about it. So far, I have:
>>>
>>
>> Fantastic. Honestly, a background in spark / machine learning will be
>> more important.
>>
>
> Cool! I did a module in data mining for my MSc that would come in handy
> (learned about machine learning algos like decision trees etc).  I used
> Spark for the first time during my dissertation to implement a
> classification algorithm. I did not get to use Spark's machine learning
> library but my past experience would hopefully make the transition easier.
>
>>
>>
>>>
>>> i) Briefly read Chapter 3 and Chapter 4 of the OWASP guide to
>>> understand the approach behind AppSensor, its high-level architecture
>>> (detection and response unit), and its pattern (Event, EventManager,
>>> EventAnalysisEngine and so on)
>>>
>>> ii) Managed to get a demo running locally as per the AppSensor Demo Setup
>>> guide (
>>> https://github.com/jtmelton/appsensor/blob/master/sample-apps/DemoSetup.md).
>>> Had a little bump with a mongo test failing when doing mvn install but got
>>> it to work in the end. Went through part of the codebase while doing this.
>>>
>>> iii) Researched trend monitoring analysis techniques. Based on my
>>> understanding so far, trend analysis seems to fall under anomaly
>>> detection, but feel free to correct me (will expand in the section below).
>>> It would be great if you could recommend additional papers/books to read
>>> to learn more on this topic.
>>>
>>> Did a first pass on two papers that cover general topics in IDS:
>>>
>>> http://galaxy.cs.lamar.edu/~bsun/seminar/example_papers/IDS_taxonomy.pdf
>>>
>>> http://www.ijcset.net/docs/Volumes/volume2issue4/ijcset2012020419.pdf
>>>
>>>
>> There is not much literature specific to application intrusion detection.
>> The concept is roughly based on network IDS systems. It is mostly
>> transferring those concepts to the application layer, and looking for
>> activity that is not possible (or is much harder) to detect at the network
>> layer, but is possible (or much easier) at the application layer.
>>
>
>  Interesting, I will probably do some reading to get a better overview of
> IDS in general.
>
>>
>>
>>> Currently, I have given it some thought and my high-level understanding
>>> of the expected deliverables is:
>>>
>>> i)  A trend monitoring analysis engine - Extend the analysis-engines
>>> package and add tests. Depending on which implementation strategy is used,
>>> it seems that I would have to record the “normal” behaviour pattern of a
>>> system and then trigger a response if the application behaves outside the
>>> norm, as defined by the trending rules.
>>>
>>
>> I think of 2 possible approaches:
>> - *simple trending engine* - this would be an implementation that would
>> essentially do some simple counting. An example here might be that we have
>> seen the occurrence of detection point ABC go up 500% in the last hour over
>> the "normal" usage. This would likely be pretty straightforward, and could
>> use something like a time series database to track the metadata, and do
>> some very fast analysis.
>>
>
> I looked up time series databases to learn more about them, as I have
> not worked with one before.
>
>
> http://stackoverflow.com/questions/8816429/is-there-a-powerful-database-system-for-time-series-data
>
> I notice that we have an implementation to integrate with influxdb in the
> package appsensor-integration-influxdb.
>
> If I were to do the simple trending engine, I would have to extend the
> current implementation so that I can retrieve the events written to it and
> then do the counting and analysis to decide whether the activity is
> unusual. This is assuming that I will be using influxDB, of course. What
> are your opinions?
>

Yes, that's the basic idea. There are several that you could use. I don't
really care that much about the implementation (tool) to be honest, but
rather the idea. We can provide 1 implementation, then add implementations
for specific tools if people would like one that we don't already cover.
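
As a rough illustration of the "simple counting" idea quoted above, here is
a minimal sketch in Python. It is not AppSensor code: the function names,
the (timestamp, label) tuple format, the 24-window baseline and the
in-memory list standing in for a time series database are all assumptions
made up for this example.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_by_detection_point(events, start, end):
    """Count events per detection point label with start <= timestamp < end."""
    return Counter(label for ts, label in events if start <= ts < end)


def find_trend_spikes(events, now, window=timedelta(hours=1),
                      baseline_windows=24, threshold_pct=500.0):
    """Return {label: percent_increase} for labels whose count in the most
    recent window rose by at least threshold_pct over the average count per
    window across the preceding baseline_windows windows."""
    current = count_by_detection_point(events, now - window, now)
    baseline_start = now - window * (baseline_windows + 1)
    baseline = count_by_detection_point(events, baseline_start, now - window)

    spikes = {}
    for label, count in current.items():
        normal = baseline[label] / baseline_windows
        if normal == 0:
            # No history for this label (hypothetical policy: skip it).
            continue
        increase_pct = (count - normal) / normal * 100.0
        if increase_pct >= threshold_pct:
            spikes[label] = increase_pct
    return spikes


# Usage: two "ABC" events per hour for 24 hours, then twelve in the last hour.
now = datetime(2016, 3, 8, 12, 0)
events = [(now - timedelta(hours=h, minutes=m), "ABC")
          for h in range(1, 25) for m in (10, 40)]
events += [(now - timedelta(minutes=5 * i + 1), "ABC") for i in range(12)]
print(find_trend_spikes(events, now))  # prints {'ABC': 500.0}
```

A real engine would pull the counts from whichever store we pick instead of
scanning a list, and would need a policy for detection points that have no
history yet.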


>
>>
>> - *machine learning engine* - this is a more complex implementation. This
>> would involve creating a ML style engine that would allow for various types
>> of analysis. An example might be noticing a shift in the composition of
>> HTTP verb usage for a given time period. If you decide to go this route, I
>> think you'll want to be very specific with the types of analysis you want
>> to provide, and focus on doing great documentation about how to build rules
>> based on training data and the algorithm selection process.
>>
>
>  This is a really interesting idea! I did some research to get an idea
> of what needs to be done using Spark as a base. Ideas and questions
> below:
>
> i) Idea 1: There has been some work on using spark and cassandra (as a
> time series db, even though it's a k-v store) for data analysis. In relation
> to appsensor, I would have to integrate Spark (probably as part of the
> analysis engine) for its machine learning library and implement a storage
> provider for cassandra before wiring them together. I will have to design
> a schema for the time series data storage inside cassandra as well. This
> seems like quite a lot of work for the duration of the project, but I'll be
> able to leverage some existing work.
>
>
> http://www.slideshare.net/patrickmcfadin/apache-cassandra-apache-spark-for-time-series-data
>
> ii) Idea 2: Implement a simple trending analysis engine as the main
> project work (related to my question on the simple trending approach above)
> and finish the 3 deliverables. Then build an ML engine using Spark for
> machine learning, which will involve wiring it to the time series db used in
> the simple trending approach. This way I don't have to implement a separate
> store for the ML analysis engine, but the challenge probably lies in working
> out how to connect them together.
>
>
Both of these ideas are good, honestly. I'd focus on which one you think
you can accomplish in the 3-month time frame. We don't want you to be able
to finish the project in 2 weeks, but we also don't want it to take a year.


> iii) Question 1: What do you mean by being specific about the types of
> analysis that I provide and about algorithm selection? From what I
> understand, it's either:
> - For example, suppose we have 2 cases: measuring a shift in the
> composition of HTTP verbs, and the number of API calls to an endpoint. I
> would use one algorithm for checking the composition of HTTP verbs and
> another algorithm for the number of API calls. I guess some research needs
> to be done to decide which algorithm would be suitable for which use
> case/scenario/event.
>
> - Implement a wide variety of algorithms for the analysis engine and then
> let the user decide which algorithm to use for all events or for each event.
>
> I am leaning towards the simple trending approach for now, taking time
> into account, although I would really like to give the machine learning
> approach a go. Feedback and answers to the questions above will help me
> scope out the amount of work required for the machine learning approach,
> especially (ii). :D
>

What I meant by the "specific type of analysis" comment is around machine
learning. For machine learning, you have to decide which algorithm (or
family of algorithms) to use to solve a particular problem. We can
certainly use spark-ml or some other library to give us those algorithms,
but in order to make it useful to our users, we'll have to write some code
to integrate those algorithms with the types of problems we want to solve.
If we're trying to solve a problem that requires "k nearest neighbors",
then we'll have to write some code that uses that. My point was that we
don't want to solve _every_ problem. We want to essentially document the
process: 1) decide what problem you want to solve, 2) pick best algorithm,
3) implement algorithm, 4) use training dataset, 5) turn on analysis. In
that workflow, we are not going to implement _all_ the different types of
analysis you could do over the summer of code. I just want us to pick a few
problems to solve, and document the process so that our users can do the
same thing themselves to build new types of analysis.
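
To make that workflow a bit more concrete, here is a minimal sketch in
Python for one problem, the shift in HTTP verb composition mentioned
earlier. Everything in it is an assumption for illustration: the function
names, the fixed verb list, and the choice of total variation distance with
a threshold learned from training windows. It is a simple statistical
stand-in, not spark-ml and not a recommendation of that algorithm.

```python
from collections import Counter

# Step 1: the problem is "has the mix of HTTP verbs changed?".
# Verbs outside this (hypothetical) list are ignored.
VERBS = ("GET", "POST", "PUT", "DELETE")


def verb_distribution(verbs):
    """Return the fraction of requests per verb (all zeros for an empty window)."""
    counts = Counter(verbs)
    total = sum(counts[v] for v in VERBS)
    return {v: (counts[v] / total if total else 0.0) for v in VERBS}


def total_variation(p, q):
    """Steps 2-3: total variation distance between two verb mixes (0 to 1)."""
    return 0.5 * sum(abs(p[v] - q[v]) for v in VERBS)


def train(training_windows, margin=1.5):
    """Step 4: learn a baseline mix and an alert threshold from training data.

    The baseline is the verb mix over all training windows combined. The
    threshold is the largest distance any single training window showed
    from that baseline, scaled by a safety margin (an arbitrary choice).
    """
    all_verbs = [v for window in training_windows for v in window]
    baseline = verb_distribution(all_verbs)
    worst = max(total_variation(verb_distribution(w), baseline)
                for w in training_windows)
    return baseline, worst * margin


def is_shifted(window, baseline, threshold):
    """Step 5: flag a window whose verb mix is farther from baseline than threshold."""
    return total_variation(verb_distribution(window), baseline) > threshold


# Usage with made-up traffic: mostly GET during training, then a DELETE burst.
training = [["GET"] * 90 + ["POST"] * 10,
            ["GET"] * 85 + ["POST"] * 15,
            ["GET"] * 88 + ["POST"] * 12]
baseline, threshold = train(training)
print(is_shifted(["GET"] * 87 + ["POST"] * 13, baseline, threshold))    # prints False
print(is_shifted(["GET"] * 40 + ["DELETE"] * 60, baseline, threshold))  # prints True
```

The documentation we write would walk users through the same five steps so
they can swap in a different problem and algorithm.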


>
>>
>>>
>>> ii)  Associated configuration mechanism to specify the trending
>>> rules/policy - Extend the configuration mode package, create respective
>>> xml and xsd configuration for the Trend Monitoring analysis engine.
>>>
>>> iii) A small full sample demo application showing usage of the trend
>>> monitoring feature. - Build on the existing demo application?
>>>
>>>
>> Yes, these would be the 3 basic outputs for that project, along with the
>> associated documentation. Additionally, I would say that we should produce
>> a small number of rules. That will be necessary for the demo application
>> anyways, but we can use those rules as examples for the community. As for
>> the demo application, it's very small and trivial. We actually have a user
>> who built a demo application for a talk about appsensor that is likely a
>> much better fit (
>> https://github.com/dschadow/ApplicationIntrusionDetection)
>>
>
> Agreed about the rules bit. I took a look at the demo application built
> above and it looks great; I will refer to it when working on the demo
> application part. I've used Dropwizard to build web apps but I haven't
> worked with Spring (only a little on DI) before and will have to read about it.
>

The Spring parts should be pretty straightforward, and I (and others) can
help you there if you need anything. You don't need to know much Spring at
all for this project.


>
>>
>>> It would be great if the mentor/team can give me feedback on my ideas
>>> and things to read to expand my knowledge in this domain. If there is any
>>> task that you would like me to complete, I am eager to do it and will find
>>> time at night or the weekends to complete it.
>>>
>>
>> I think what I'd be most interested in is if you could let us know which
>> approach (simple trending, machine learning) you would prefer to take when
>> building the analysis engine. Beyond that, I think your skillset looks well
>> suited to the project.
>>
>>
>>>
>>> I would also like to start preparing my project proposal to be able to
>>> share with the mailing list to get feedback as this will be my first time
>>> applying for GSoC and I will need all the help I can get!!
>>>
>>
>> Sounds great. I think your notes in this email are a very solid start. To
>> build a good proposal, I think the most important thing to do is scope the
>> work. Try to build a detailed plan (ie. what task(s) you will accomplish
>> each week). After that, we can review it and make suggestions about whether
>> or not we think you should try to do more or less work, and what parts may
>> be tricky. It will also help us know which mentor(s) to bring onto the
>> project.
>>
>>
>  I will build up my plan as I scope out the work for the two approaches
> and will definitely share it as soon as it is ready.
>

Perfect.


>
>
>>
>>> Thanks for your time. I look forward to your feedback/replies. This
>>> young padawan needs guidance. :D
>>>
>>>
>>>
>> Thank you!
>>
>>
>>> I have also started a topic in the OWASP GSoC group.
>>>
>>> https://groups.google.com/forum/?fromgroups#!topic/owasp-gsoc/59vAa402jXo
>>>
>>>
>>> Kind Regards,
>>>
>>> Tim
>>>
>>>
>>> _______________________________________________
>>> Owasp-appsensor-project mailing list
>>> Owasp-appsensor-project at lists.owasp.org
>>> https://lists.owasp.org/mailman/listinfo/owasp-appsensor-project
>>>
>>>
>>
>