[Oisf-users] Platform suitability - SPARC vs X86
Victor Julien
lists at inliniac.net
Fri Nov 29 07:41:39 UTC 2013
On 11/29/2013 04:59 AM, Mark Ashley wrote:
> I've been working on an IDS solution for a while, hoping to get
> something running on Solaris as it's our platform of choice.
>
> Currently the system is a T5220 (1.4GHz, 8 cores x 8 threads == 64
> threads) with 64GB RAM.
> It has four PCIe quad GBE cards and dual PCIe SFP 10 GBE cards. It used
> to be running Solaris 10 but I recently upgraded it to Solaris 11 to
> take advantage of the BPF support in that, for libpcap.
>
> It's using libpcap 1.5.1, tcpdump 4.5.1 and suricata 1.4.6. The libpcap
> and tcpdump are essentially as-shipped, but the suricata is somewhat
> bastardized to enable it to compile and run on the SPARC 64 platform.
>
> There are seven main interfaces so far which we point suricata at, all
> mirror ports on switches on our network. The most interesting network is
> the DMZ and the packet count is so high we lose 90% of all packets when
> running suricata. If I use tcpdump and tune it with "-B 32gb -n -q -t"
> to not print too much then it'll stop dropping packets on that
> interface. This means it's possible to keep up with the packet flow but
> not do much interesting with it.
I assume Solaris will have some kind of profiling option. Can you see
what Suricata is spending its time on?
Also, the --enable-profiling configure option may be interesting,
especially together with --enable-profiling-locks.
In general, we use three types of thread synchronization: mutexes,
spinlocks and atomics. We then have all kinds of fallbacks if the
latter two are unavailable, though those fallbacks will hurt performance.
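To give a feel for the difference, here's a minimal sketch (simplified
for illustration, these are not Suricata's actual sync macros) of an
atomic counter with a mutex fallback for platforms where the GCC atomic
builtins are unavailable:

#include <pthread.h>

/* Sketch only, not Suricata's real sync code: atomic add where
 * available, global mutex fallback otherwise. NO_ATOMICS is a
 * hypothetical define standing in for "builtins unavailable". */
#if defined(__GNUC__) && !defined(NO_ATOMICS)
#define COUNTER_ADD(ptr, val) __sync_fetch_and_add((ptr), (val))
#else
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int counter_add(unsigned int *ptr, unsigned int val)
{
    /* Every update serializes on one lock; with 64 hardware threads
     * this is exactly the kind of overhead that shows up in profiles. */
    pthread_mutex_lock(&counter_lock);
    unsigned int old = *ptr;
    *ptr += val;
    pthread_mutex_unlock(&counter_lock);
    return old;
}
#define COUNTER_ADD(ptr, val) counter_add((ptr), (val))
#endif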
Finally, Eric identified a case where, on a 32-CPU (core?) x86 box, he
got massive contention on a single spinlock in our packetpool. He hacked
up a workaround here: https://github.com/inliniac/suricata/pull/658
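For illustration (this is just the general technique, not necessarily
what that patch does): one way to relieve contention on a single locked
pool is a small per-thread cache that is refilled in batches, so the
shared lock is taken once per batch instead of once per packet.
shared_pool_get_batch() is a hypothetical helper here:

#include <stddef.h>

#define LOCAL_CACHE 32

struct pkt;
/* Hypothetical: takes the shared pool lock once, returns up to max pkts. */
int shared_pool_get_batch(struct pkt **out, int max);

static __thread struct pkt *cache[LOCAL_CACHE];
static __thread int cached = 0;

static struct pkt *packet_get(void)
{
    if (cached == 0)
        cached = shared_pool_get_batch(cache, LOCAL_CACHE);
    if (cached == 0)
        return NULL;            /* shared pool is empty too */
    return cache[--cached];     /* lock-free fast path */
}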
> Another approach which I'll also look at is to replicate how tcpdump
> gets values out of unaligned packet headers. The extract.h file has
> EXTRACT_[16|32|64]BITS macros which seem to be quite portable, and would
> reduce the double handling being done now to cope with the alignment
> requirements. I'll trial it on a few suricata source files and see how
> it goes.
>
> https://github.com/the-tcpdump-group/tcpdump/blob/master/extract.h
Looks interesting.
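For what it's worth, the trick in those macros is to assemble the value
byte by byte, so the code never dereferences a wider integer type at an
unaligned address, which is exactly the access pattern that traps on
SPARC. A simplified sketch in that spirit (not the exact tcpdump macros):

#include <stdint.h>

/* Byte-wise big-endian reads, safe on strict-alignment CPUs. */
static inline uint16_t extract_be16(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

static inline uint32_t extract_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}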
>
> # cat suricata.yaml
> %YAML 1.1
> ---
>
> # Suricata configuration file. In addition to the comments describing all
> # options in this file, full documentation can be found at:
> #
> https://redmine.openinfosecfoundation.org/projects/suricata/wiki/Suricatayaml
>
>
> # Number of packets allowed to be processed simultaneously. Default is a
> # conservative 1024. A higher number will make sure CPU's/CPU cores will be
> # more easily kept busy, but may negatively impact caching.
> #
> # If you are using the CUDA pattern matcher (b2g_cuda below), different rules
> # apply. In that case try something like 4000 or more. This is because the CUDA
> # pattern matcher scans many packets in parallel.
> max-pending-packets: 8192
>
> # Runmode the engine should use. Please check --list-runmodes to get the available
> # runmodes for each packet acquisition method. Defaults to "autofp" (auto flow pinned
> # load balancing).
> # runmode: autofp
> runmode: workers
The workers runmode means one thread per interface in this case, as
libpcap can't do any clustering. Each of those threads does:
[capture][decode][flow][stream][detect][output]
If you switch to autofp, each per-interface thread will only do:
[capture][decode][flow]
It then passes the packets off to a set of (flow balanced) queues,
where other threads can pick them up.
In the Linux+afpacket/pfring case, we can have multiple worker threads
per interface, as the capture method supports flow balancing the
traffic. libpcap can't do this for us.
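To make the "flow balanced" part concrete, here's a toy sketch
(illustration only, not Suricata's actual hash) of how a flow hash can
pin all packets of a flow, in both directions, to one queue:

#include <stdint.h>

/* Symmetric 5-tuple hash: src/dst swapped gives the same queue. */
static uint32_t flow_queue(uint32_t src_ip, uint32_t dst_ip,
                           uint16_t src_port, uint16_t dst_port,
                           uint8_t proto, uint32_t nqueues)
{
    uint32_t h = (src_ip ^ dst_ip)
               ^ ((uint32_t)(src_port ^ dst_port) << 16)
               ^ proto;
    return h % nqueues;
}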
> # Specifies the kind of flow load balancer used by the flow pinned autofp mode.
> #
> # Supported schedulers are:
> #
> # round-robin    - Flows assigned to threads in a round robin fashion.
> # active-packets - Flows assigned to threads that have the lowest number of
> #                  unprocessed packets (default).
> # hash           - Flow allotted using the address hash. More of a random
> #                  technique. Was the default in Suricata 1.2.1 and older.
> #
> autofp-scheduler: round-robin
I'd then try active-packets here.
> # Suricata is multi-threaded. Here the threading can be influenced.
> threading:
> # On some cpu's/architectures it is beneficial to tie individual threads
> # to specific CPU's/CPU cores. In this case all threads are tied to CPU0,
> # and each extra CPU/core has one "detect" thread.
> #
> # On Intel Core2 and Nehalem CPU's enabling this will degrade performance.
> #
> set-cpu-affinity: yes
People have been reporting very mixed results with this, so it may be
good to experiment with it on and off, and with different settings.
> # By default Suricata creates one "detect" thread per available CPU/CPU core.
> # This setting allows controlling this behaviour. A ratio setting of 2 will
> # create 2 detect threads for each CPU/CPU core. So for a dual core CPU this
> # will result in 4 detect threads. If values below 1 are used, fewer threads
> # are created. So on a dual core CPU a setting of 0.5 results in 1 detect
> # thread being created. Regardless of the setting, at a minimum 1 detect
> # thread will always be created.
> #
> detect-thread-ratio: 2
With the number of CPUs you have, setting this to around 1 is probably
better. (Only used with autofp.)
> pcap:
> - interface: net16
> checksum-checks: no
> threads: 48
> - interface: net17
> checksum-checks: no
> threads: 2
> - interface: net18
> checksum-checks: no
> - interface: net19
> threads: 16
> checksum-checks: no
> - interface: net7
> checksum-checks: no
> threads: 1
> - interface: net8
> checksum-checks: no
> threads: 1
> - interface: net9
> checksum-checks: no
> threads: 2
> # On Linux, pcap will try to use mmapped capture and will use buffer-size
> # as total of memory used by the ring. So set this to something bigger
> # than 1% of your bandwidth.
> buffer-size: 1gb # 2gb
You'll have to set this buffer-size per interface. Also, if the MTU
check doesn't work (it should print an error on startup), it may be
good to set a snaplen manually.
> #bpf-filter: "tcp and port 25"
> # Choose checksum verification mode for the interface. At the moment
> # of the capture, some packets may be with an invalid checksum due to
> # offloading to the network card of the checksum computation.
> # Possible values are:
> # - yes: checksum validation is forced
> # - no: checksum validation is disabled
> # - auto: suricata uses a statistical approach to detect when
> # checksum off-loading is used. (default)
> # Warning: 'checksum-validation' must be set to yes to have any validation
> checksum-checks: no
> # With some accelerator cards using a modified libpcap (like myricom), you
> # may want to have the same number of capture threads as the number of capture
> # rings. In this case, set up the threads variable to N to start N threads
> # listening on the same interface.
> #threads: 16
> # set to no to disable promiscuous mode:
> #promisc: no
> # set snaplen, if not set it defaults to MTU if MTU can be known
> # via ioctl call and to full capture if not.
> #snaplen: 1518
> # Put default values here
> - interface: default
> checksum-checks: no
>
The options below can be interesting to inspect.
> # Profiling settings. Only effective if Suricata has been built with
> # the --enable-profiling configure flag.
> #
> profiling:
>
> # rule profiling
> rules:
>
> # Profiling can be disabled here, but it will still have a
> # performance impact if compiled in.
> enabled: yes
> filename: rule_perf.log
> append: yes
>
> # Sort options: ticks, avgticks, checks, matches, maxticks
> sort: avgticks
>
> # Limit the number of items printed at exit.
> limit: 100
>
> # packet profiling
> packets:
>
> # Profiling can be disabled here, but it will still have a
> # performance impact if compiled in.
> enabled: yes
> filename: packet_stats.log
> append: yes
>
> # per packet csv output
> csv:
>
> # Output can be disabled here, but it will still have a
> # performance impact if compiled in.
> enabled: no
> filename: packet_stats.csv
>
> # profiling of locking. Only available when Suricata was built with
> # --enable-profiling-locks.
> locks:
> enabled: no
> filename: lock_stats.log
> append: yes
>
Cheers,
Victor
--
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------