[Oisf-devel] Deriving browser activity (user activity) from suricata flow

Vipul Hari vipu.hari at gmail.com
Tue Sep 1 11:07:17 UTC 2015


Hi,

I am using Suricata (2.1beta4) to develop an IDP/IPS solution based on
insights derived from a user's browsing activity.

The key metrics I am looking for are:
1. top websites being visited
2. amount of time spent on each website

The second metric, the amount of time spent on a website, is the one that is
actually causing me problems.
I want to derive this as accurately as possible from the "flow" and "http"
events Suricata logs into eve.json, and by fine-tuning Suricata settings.

[My Setup]
local network user (browsing)---> suricata ----> router -----> internet

An example scenario: if a user with source_ip 192.168.0.1 visits
www.facebook.com and spends 10 minutes browsing through Facebook,
I want to capture this quantitatively under a single unique
"session".

Currently I parse eve.json and build a record similar to the following (some
additional fields are not shown for simplicity):

{"flow_id": "140664679399216", "hostname": "www.facebook.com", "duration":
"600", "timestamp": 1441097587, "state": "established", "tx_id": 0}

I depend on the "flow_id" field of the eve.json logs to identify each "http"
event and bind it to the correct "flow".

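The binding step described above can be sketched roughly as follows. This is
only an illustration, not my production parser: the function name
group_by_flow and the sample records are made up, and it assumes eve.json
holds one JSON object per line with "event_type" and "flow_id" at the top
level.

```python
import json
from collections import defaultdict


def group_by_flow(lines):
    """Group EVE records by flow_id, keeping "flow" and "http" events apart."""
    flows = defaultdict(lambda: {"flow": [], "http": []})
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        kind = event.get("event_type")
        # Only flow and http events matter for this use case.
        if kind in ("flow", "http") and "flow_id" in event:
            flows[event["flow_id"]][kind].append(event)
    return flows


# Hypothetical sample lines; a real eve.json record carries many more fields.
sample = [
    '{"event_type": "http", "flow_id": 1, "tx_id": 0, "http": {"hostname": "www.facebook.com"}}',
    '{"event_type": "flow", "flow_id": 1, "flow": {"state": "closed"}}',
    '{"event_type": "dns", "flow_id": 2}',
]
grouped = group_by_flow(sample)
print(len(grouped[1]["http"]), len(grouped[1]["flow"]))
```

With a real log, the lines argument would simply be the open file object.
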
[Challenges]
# Flow_ID - the flow_id field of eve.json is not unique enough to identify
the above "session" and correlate it with, say, "alert" or "fileinfo" events.
# Flow-Manager (flow timeouts) - the flow manager might time out a flow before
the user actually terminates/closes the HTTP session.
In other words, it is difficult to distinguish a flow timeout from an actual
user termination of an HTTP session.

[Questions]

# Is it possible to make flow_id unique?
The flow_id field that I am getting is a long integer.
This will cause potential problems in the database because it is likely that
the same flow_id will repeat in the future for a different flow event.
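
A possible workaround on the parsing side, shown only as a minimal sketch, is
to derive a composite key from flow_id plus the 5-tuple and the flow start
time. The helper name unique_flow_key is hypothetical, and the field names
(src_ip, src_port, dest_ip, dest_port, proto, flow.start) are my assumption
about what a "flow" record contains; this is not a Suricata feature.

```python
import hashlib


def unique_flow_key(event):
    """Build a database key that should not collide when flow_id is reused."""
    flow = event.get("flow", {})
    parts = [
        event.get("flow_id"),
        event.get("proto"),
        event.get("src_ip"),
        event.get("src_port"),
        event.get("dest_ip"),
        event.get("dest_port"),
        flow.get("start"),  # assumed: start time of the flow as logged
    ]
    raw = "|".join(str(part) for part in parts)
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


# Two hypothetical flow records that reuse the same flow_id on different days.
first = {"flow_id": 140664679399216, "proto": "TCP",
         "src_ip": "192.168.0.1", "src_port": 50000,
         "dest_ip": "203.0.113.10", "dest_port": 80,
         "flow": {"start": "2015-09-01T11:00:00.000000+0000"}}
second = dict(first, flow={"start": "2015-09-02T09:30:00.000000+0000"})
print(unique_flow_key(first) != unique_flow_key(second))
```

The drawback is that "http" events do not carry the flow start time, so they
would still have to be matched to a flow record before the key can be used.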

# What is the role of the Suricata flow manager in timing out TCP flows, and
what is the best flow-timeout setting for my use case?
I observed that there are three TCP states [new, established and closed].
Hence my rudimentary assumption is that, to correctly derive how much time
a user spent on a website, one might have to accumulate the time duration
across all three states.
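
To illustrate the accumulation idea, here is a rough sketch that merges the
time intervals of several flows for one (source IP, hostname) pair into
sessions and sums them. The function names and the max_gap parameter are
hypothetical; max_gap is an idle threshold I would have to pick myself, not a
Suricata setting. Intervals are (start, end) pairs in epoch seconds.

```python
def merge_sessions(intervals, max_gap):
    """Merge intervals that overlap or lie within max_gap seconds of each other."""
    sessions = []
    for start, end in sorted(intervals):
        if sessions and start <= sessions[-1][1] + max_gap:
            # Close enough to the previous session: extend it.
            sessions[-1][1] = max(sessions[-1][1], end)
        else:
            sessions.append([start, end])
    return [tuple(session) for session in sessions]


def total_time(intervals, max_gap):
    """Total seconds covered by the merged sessions."""
    return sum(end - start for start, end in merge_sessions(intervals, max_gap))


# Hypothetical flow intervals for one user and one hostname.
flows = [(0, 100), (50, 200), (230, 300), (1000, 1100)]
print(merge_sessions(flows, max_gap=60))  # [(0, 300), (1000, 1100)]
print(total_time(flows, max_gap=60))      # 400
```

Note that the 30-second gap between the second and third flow is counted as
browsing time here, which may or may not be the right call.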

# What is the best way to identify and correlate an "http" event with the
correct "flow" when more than one flow record exists for a single flow_id?
Currently I am simply using a combination of (flow_id + timestamp) to
guess the correct "flow" for a given "http" event.
I believe there is a better way.
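
For reference, my current (flow_id + timestamp) approach looks roughly like
the sketch below. It assumes each "flow" record carries flow.start and
flow.end timestamps in the same format as the top-level "timestamp" field;
those field names and the helper names parse_ts and match_flow are
assumptions made for illustration.

```python
from datetime import datetime

# Assumed EVE timestamp layout, e.g. 2015-09-01T11:07:17.000000+0000
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_ts(value):
    """Parse an EVE-style timestamp into a timezone-aware datetime."""
    return datetime.strptime(value, TS_FORMAT)


def match_flow(http_event, flow_records):
    """Return the flow record with the same flow_id whose window contains the event."""
    when = parse_ts(http_event["timestamp"])
    for record in flow_records:
        if record["flow_id"] != http_event["flow_id"]:
            continue
        start = parse_ts(record["flow"]["start"])
        end = parse_ts(record["flow"]["end"])
        if start <= when <= end:
            return record
    return None


# Hypothetical records: the same flow_id seen for two different flows.
flow_records = [
    {"flow_id": 7, "flow": {"start": "2015-09-01T10:00:00.000000+0000",
                            "end": "2015-09-01T10:10:00.000000+0000"}},
    {"flow_id": 7, "flow": {"start": "2015-09-01T11:00:00.000000+0000",
                            "end": "2015-09-01T11:10:00.000000+0000"}},
]
http_event = {"flow_id": 7, "tx_id": 0,
              "timestamp": "2015-09-01T11:05:00.000000+0000"}
print(match_flow(http_event, flow_records) is flow_records[1])
```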

# What is the purpose of the tx_id field?
I have noticed the field "tx_id" in every "http" event. Is it
something I can use as an additional identifier?

# A similar discussion I found online:
https://lists.openinfosecfoundation.org/pipermail/oisf-users/2015-March/004650.html


I would be interested to know whether anybody else is facing similar
challenges, and to hear your thoughts and suggestions.

Thanks in advance.
- Vipul
vipu.hari at gmail.com