<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div><br></div><div><br>On 20 Jan 2017, at 02:08, Maxim <<a href="mailto:hittlle@163.com">hittlle@163.com</a>> wrote:<br><br></div><blockquote type="cite"><div><div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div>Hi Cooper,</div><div>Thanks very much. I could not open <a href="http://marc.info/?l=linux-netdev&m=148181173415107&w=2" _src="http://marc.info/?l=linux-netdev&m=148181173415107&w=2">http://marc.info/?l=linux-netdev&m=148181173415107&w=2</a> to patch my ixgbe driver. I use ixgbe-4.4.6, the latest version downloaded from Intel's official site. Do I need to patch it? Could you please share your experience optimizing Suricata performance? Could you please send me a list? Currently I use multiple queues with RSS, plus RFS, and my setup can process nearly 5 gigabits of traffic per second. I want to try your approach, i.e. a single receive queue + RFS. Another question: what RX queue size should I set? Does this size have something to do with the CPU's L3 cache size? I used perf to record my cache misses; the miss rate is nearly 50%, so maybe I can reduce this. Many thanks.</div></div></div></blockquote><div><br></div><div>You need to take care of the 50% CPU cache miss rate. That is quite a big percentage.</div><div>Make sure the NUMA node location is correct and the NIC descriptor ring size is optimal (that does not necessarily mean maximum). 
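<pre>
A quick sketch of the knobs involved, for reference; the interface name eth2 and all sizes below are illustrative placeholders, not recommendations:

```shell
# Which NUMA node is the NIC attached to? Pin Suricata's worker
# threads to cores on the same node.
cat /sys/class/net/eth2/device/numa_node

# Inspect and set the RX descriptor ring size (bigger is not always
# better; an oversized ring can hurt cache locality).
ethtool -g eth2
ethtool -G eth2 rx 1024

# Inspect and adjust interrupt throttling/coalescing.
ethtool -c eth2
ethtool -C eth2 rx-usecs 100
```
</pre>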
</div><div><br></div><div>Michal spent quite a bit of effort (a repetitive adjust, run, check cycle) trying to find the optimal "golden middle" for our setup (described in the paper), including interrupt throttling and coalescing (which also come into play in that case).</div><div><br></div><br><blockquote type="cite"><div><div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div><br></div><div>Hittlle</div><br><br><br><br><div style="position:relative;zoom:1"></div><div id="divNeteaseMailCard"></div><br><pre><br>At 2017-01-20 06:53:30, "Cooper F. Nelson" <<a href="mailto:cnelson@ucsd.edu">cnelson@ucsd.edu</a>> wrote:
>Hardware RSS has problems because the flow load balancing is often not
>symmetric. This causes problems for Suricata, as different cores handle
>each side of a flow, which creates timing issues.
>
>I'm assuming you are using the ixgbe driver; if so, you probably need
>to patch it.
>
>> <a href="http://marc.info/?l=linux-netdev&m=148181173415107&w=2">http://marc.info/?l=linux-netdev&m=148181173415107&w=2</a>
>
>... and then set a special hash key to force symmetric flows.
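A sketch of how such a key can be applied once the driver accepts a user-supplied RSS key through ethtool (the repeating 0x6d5a key is the standard symmetric choice; eth2 is a placeholder):

```shell
# Program a symmetric 40-byte RSS hash key (0x6d5a repeated) so both
# directions of a flow hash to the same RX queue:
ethtool -X eth2 hkey 6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a

# Verify the key and the indirection table:
ethtool -x eth2
```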
>
>I have a special experimental 3.2 build based around full hardware
>RSS/offloading using the new AF_PACKET tpacket-v3 mode which is showing
>some pretty spectacular performance improvements over the standard
>build. If you are interested I can work with you off-list to get it set
>up on your hardware, but I'll warn you there are lots of moving parts
>to get everything working correctly.
>
>Most important thing first is to make sure you are on a Linux
>distribution with a relatively 'fresh' kernel. I'm on 4.8.7 currently
>and at least 4.7+ is recommended. You also need to be able to install the
>source for the kernel and then patch the ixgbe module, or download the
>driver and then patch it.
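A rough outline of the out-of-tree build, assuming the patch applies to Intel's driver tarball (the patch file name here is hypothetical and paths may differ):

```shell
tar xf ixgbe-4.4.6.tar.gz
cd ixgbe-4.4.6/src
patch -p1 < ~/ixgbe-rss-key.patch        # hypothetical patch file name
make
sudo make install
sudo rmmod ixgbe && sudo modprobe ixgbe  # reload the patched module
```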
>
>-Coop
>
>On 1/18/2017 6:58 PM, Maxim wrote:
>> Thanks all for your guidance. I've read this tutorial. Currently there
>> are two approaches to suricata performance tuning. One is to use
>> multiple queues, and bind each queue IRQ to a separate core; the
>> other, as this tutorial shows, is to use a single queue and let Linux
>> RFS (receive flow steering) do what NIC RSS would do.
>> I've no idea which is better. I prefer the multiple-queue approach
>> because I think hardware is better at this calculation than RFS, since
>> the latter is implemented in software; what do you think? In my case,
>> I used 16 RX queues, and bind them to 16 cores separately, when I
>> tried to simulate 10 gigabit traffic per second, all the 16 cores
>> were fully occupied, but I still have another 8 cores idling. I want to
>> use RFS to distribute the busy softirqs to the 8 idle cores, but it
>> turns out there is no significant improvement. I turned on
>> hyperthreading, and my CPU is 2.1 GHz; does my CPU suck? Many thanks.
>>
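For reference, RFS is enabled through two knobs; a sketch, with eth2 and the sizes as illustrative values (the per-queue counts should sum to at most the global table size):

```shell
# Global flow table shared by all NICs:
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries

# Per-RX-queue flow count; with 16 queues, 32768 / 16 = 2048:
for q in /sys/class/net/eth2/queues/rx-*; do
    echo 2048 > "$q/rps_flow_cnt"
done
```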
>
>
>--
>Cooper Nelson
>Network Security Analyst
>UCSD ITS Security Team
><a href="mailto:cnelson@ucsd.edu">cnelson@ucsd.edu</a> x41042
>
</pre></div><br><br><span title="neteasefooter"><p> </p></span></div></blockquote><blockquote type="cite"><div><span>_______________________________________________</span><br><span>Suricata IDS Users mailing list: <a href="mailto:oisf-users@openinfosecfoundation.org">oisf-users@openinfosecfoundation.org</a></span><br><span>Site: <a href="http://suricata-ids.org">http://suricata-ids.org</a> | Support: <a href="http://suricata-ids.org/support/">http://suricata-ids.org/support/</a></span><br><span>List: <a href="https://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users">https://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users</a></span><br></div></blockquote></body></html>