[Oisf-users] Platform suitability - SPARC vs X86

Victor Julien lists at inliniac.net
Fri Nov 29 07:41:39 UTC 2013


On 11/29/2013 04:59 AM, Mark Ashley wrote:
> I've been working on an IDS solution for a while, hoping to get
> something running on Solaris as it's our platform of choice.
> 
> Currently the system is a T5220 (1.4GHz, 8 cores x 8 threads == 64
> threads) with 64GB RAM.
> It has four PCIe quad GBE cards and dual PCIe SFP 10 GBE cards. It used
> to be running Solaris 10 but I recently upgraded it to Solaris 11 to
> take advantage of the BPF support in that, for libpcap.
> 
> It's using libpcap 1.5.1, tcpdump 4.5.1 and suricata 1.4.6. The libpcap
> and tcpdump are essentially as-shipped, but the suricata is somewhat
> bastardized to enable it to compile and run on the SPARC 64 platform.
> 
> There are seven main interfaces so far which we point suricata at, all
> mirror ports on switches on our network. The most interesting network is
> the DMZ and the packet count is so high we lose 90% of all packets when
> running suricata. If I use tcpdump and tune it with "-B 32gb -n -q -t"
> to not print too much then it'll stop dropping packets on that
> interface. This means it's possible to keep up with the packet flow but
> not do much interesting with it.

I assume Solaris will have some kind of profiling option. Can you see
what Suricata is spending its time on?

Also, the --enable-profiling configure option may be interesting,
especially together with --enable-profiling-locks.
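
Building with both would be something along the lines of (keep whatever
other configure flags you normally use):

./configure --enable-profiling --enable-profiling-locks

With --enable-profiling-locks, the locks section under profiling in your
yaml (quoted further down) also needs enabled: yes to actually produce
output.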

In general, we use three types of thread synchronization methods:
mutexes, spinlocks and atomics. We have various fallbacks in case the
latter two are unavailable, though using those fallbacks will hurt
performance.
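
As a very rough illustration of what such a fallback costs (a sketch of
the general pattern only, not Suricata's actual macros; the
HAVE_ATOMIC_BUILTINS define is made up for the example):

#include <pthread.h>

#ifdef HAVE_ATOMIC_BUILTINS   /* made-up define, for illustration only */
/* Lock-free path: the GCC __sync builtin performs the add atomically. */
#define COUNTER_ADD(var, n) __sync_fetch_and_add(&(var), (n))
#else
/* Fallback path: a mutex guards a plain add, which is far more expensive
 * under contention than the atomic version. */
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned int CounterAdd(unsigned int *var, unsigned int n)
{
    pthread_mutex_lock(&counter_mutex);
    unsigned int prev = *var;
    *var += n;
    pthread_mutex_unlock(&counter_mutex);
    return prev;
}
#define COUNTER_ADD(var, n) CounterAdd(&(var), (n))
#endif

If your SPARC build ends up on a fallback path like the #else branch,
that could account for a fair amount of the overhead, and the lock
profiling should make that visible.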

Finally, Eric identified a case where, on a 32 CPU (core?) x86 box, he
saw massive contention on a single spinlock in our packetpool. He hacked
up a workaround here: https://github.com/inliniac/suricata/pull/658


> Another approach which I'll also look at is to replicate how tcpdump
> gets values out of unaligned packet headers. The extract.h file has
> EXTRACT_[16|32|64]BITS macros which seem to be quite portable, and would
> reduce the double handling being done now to cope with the alignment
> requirements. I'll trial it on a few suricata source files and see how
> it goes.
> 
> https://github.com/the-tcpdump-group/tcpdump/blob/master/extract.h

Looks interesting.
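
For anyone else following along, the core idea behind those macros is
byte-wise access. A minimal sketch of the approach (my own paraphrase,
not the extract.h code itself):

#include <stdint.h>

/* Read 16 and 32 bit network byte order fields one byte at a time, so no
 * aligned load is ever issued and strict-alignment platforms like SPARC
 * don't fault. */
static inline uint16_t extract_be16(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | (uint16_t)p[1]);
}

static inline uint32_t extract_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

The trade-off is a few extra byte operations on every field in exchange
for never tripping over the alignment requirements.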

> 
> # cat suricata.yaml
> %YAML 1.1
> ---
> 
> # Suricata configuration file. In addition to the comments describing all
> # options in this file, full documentation can be found at:
> #
> https://redmine.openinfosecfoundation.org/projects/suricata/wiki/Suricatayaml
> 
> 
> # Number of packets allowed to be processed simultaneously.  Default is a
> # conservative 1024. A higher number will make sure CPU's/CPU cores will be
> # more easily kept busy, but may negatively impact caching.
> #
> # If you are using the CUDA pattern matcher (b2g_cuda below), different rules
> # apply. In that case try something like 4000 or more. This is because the CUDA
> # pattern matcher scans many packets in parallel.
> max-pending-packets: 8192
> 
> # Runmode the engine should use. Please check --list-runmodes to get the available
> # runmodes for each packet acquisition method. Defaults to "autofp" (auto flow
> # pinned load balancing).
> # runmode: autofp
> runmode: workers

The workers runmode means one thread per interface in this case, as
libpcap can't do any clustering. Each of those threads does:
[capture][decode][flow][stream][detect][output]

If you switch to autofp, each thread per interface will only do:
[capture][decode][flow]

It then passes the packets off to a set of (flow balanced) queues, from
which other threads can pick them up.

In the Linux+afpacket/pfring case, we can have multiple worker threads
per interface, as the capture method supports flow balancing the
traffic. libpcap can't do this for us.
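
To illustrate what "flow balanced" means here (a toy sketch only, not
Suricata's actual scheduler code): every packet is mapped to a queue
based on its flow, so all packets of one flow end up with the same
detect thread. Something along these lines:

#include <stdint.h>

/* Toy address-hash balancer: XOR the two addresses so both directions of
 * a flow hash to the same value, then pick one of the autofp queues. */
static inline uint32_t PickAutofpQueue(uint32_t src_ip, uint32_t dst_ip,
                                       uint32_t nqueues)
{
    uint32_t hash = src_ip ^ dst_ip;
    return hash % nqueues;
}

The autofp-scheduler options quoted below only change how that queue is
chosen.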

> # Specifies the kind of flow load balancer used by the flow pinned autofp mode.
> #
> # Supported schedulers are:
> #
> # round-robin       - Flows assigned to threads in a round robin fashion.
> # active-packets    - Flows assigned to threads that have the lowest number of
> #                     unprocessed packets (default).
> # hash              - Flow allotted using the address hash. More of a random
> #                     technique. Was the default in Suricata 1.2.1 and older.
> #
> autofp-scheduler: round-robin

I'd then try active-packets here.

> # Suricata is multi-threaded. Here the threading can be influenced.
> threading:
>   # On some cpu's/architectures it is beneficial to tie individual threads
>   # to specific CPU's/CPU cores. In this case all threads are tied to CPU0,
>   # and each extra CPU/core has one "detect" thread.
>   #
>   # On Intel Core2 and Nehalem CPU's enabling this will degrade performance.
>   #
>   set-cpu-affinity: yes

People have been reporting very mixed results with this, so it may be
good to experiment with it both on and off, and with different settings.

>   # By default Suricata creates one "detect" thread per available CPU/CPU core.
>   # This setting allows controlling this behaviour. A ratio setting of 2 will
>   # create 2 detect threads for each CPU/CPU core. So for a dual core CPU this
>   # will result in 4 detect threads. If values below 1 are used, less threads
>   # are created. So on a dual core CPU a setting of 0.5 results in 1 detect
>   # thread being created. Regardless of the setting at a minimum 1 detect
>   # thread will always be created.
>   #
>   detect-thread-ratio: 2

With the number of CPUs you have, setting this to around 1 is probably
better: at a ratio of 2, a box that shows up as 64 CPUs would get 128
detect threads. (This setting is only used with autofp.)

> pcap:
>   - interface: net16
>     checksum-checks: no
>     threads: 48
>   - interface: net17
>     checksum-checks: no
>     threads: 2
>   - interface: net18
>     checksum-checks: no
>   - interface: net19
>     threads: 16
>     checksum-checks: no
>   - interface: net7
>     checksum-checks: no
>     threads: 1
>   - interface: net8
>     checksum-checks: no
>     threads: 1
>   - interface: net9
>     checksum-checks: no
>     threads: 2
>     # On Linux, pcap will try to use mmaped capture and will use buffer-size
>     # as total of memory used by the ring. So set this to something bigger
>     # than 1% of your bandwidth.
>     buffer-size: 1gb # 2gb

You'll have to set this buffer-size per interface. Also, if the MTU
check doesn't work (Suricata should print an error on startup in that
case), it may be good to manually set a snaplen.

>     #bpf-filter: "tcp and port 25"
>     # Choose checksum verification mode for the interface. At the moment
>     # of the capture, some packets may be with an invalid checksum due to
>     # offloading to the network card of the checksum computation.
>     # Possible values are:
>     #  - yes: checksum validation is forced
>     #  - no: checksum validation is disabled
>     #  - auto: suricata uses a statistical approach to detect when
>     #  checksum off-loading is used. (default)
>     # Warning: 'checksum-validation' must be set to yes to have any validation
>     checksum-checks: no
>     # With some accelerator cards using a modified libpcap (like myricom), you
>     # may want to have the same number of capture threads as the number of capture
>     # rings. In this case, set up the threads variable to N to start N threads
>     # listening on the same interface.
>     #threads: 16
>     # set to no to disable promiscuous mode:
>     #promisc: no
>     # set snaplen, if not set it defaults to MTU if MTU can be known
>     # via ioctl call and to full capture if not.
>     #snaplen: 1518
>   # Put default values here
>   - interface: default
>     checksum-checks: no
> 

The options below can be interesting to inspect.

> # Profiling settings. Only effective if Suricata has been built with
> # the --enable-profiling configure flag.
> #
> profiling:
> 
>   # rule profiling
>   rules:
> 
>     # Profiling can be disabled here, but it will still have a
>     # performance impact if compiled in.
>     enabled: yes
>     filename: rule_perf.log
>     append: yes
> 
>     # Sort options: ticks, avgticks, checks, matches, maxticks
>     sort: avgticks
> 
>     # Limit the number of items printed at exit.
>     limit: 100
> 
>   # packet profiling
>   packets:
> 
>     # Profiling can be disabled here, but it will still have a
>     # performance impact if compiled in.
>     enabled: yes
>     filename: packet_stats.log
>     append: yes
> 
>     # per packet csv output
>     csv:
> 
>       # Output can be disabled here, but it will still have a
>       # performance impact if compiled in.
>       enabled: no
>       filename: packet_stats.csv
> 
>   # profiling of locking. Only available when Suricata was built with
>   # --enable-profiling-locks.
>   locks:
>     enabled: no
>     filename: lock_stats.log
>     append: yes
> 

Cheers,
Victor

-- 
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------



