Posts categorized "Network Performance"

May 20, 2008

10Gb/s Ethernet with commodity hardware (Revisited)

An update to my somewhat opaque post in February.

Please take a look at this independent test report from the Broad-Band Test Labs, with some interesting perspective from Steve Broadhead in Computer Weekly.

Bottom line is that we're reporting exactly what's what for 10G Ethernet performance, for a decent set of classic micro-benchmarks as well as for virtualization and storage deployments. So download and look no further. We hope that the Solarstorm controller available from SMC speaks for itself.

SMC 10GPCIe-10BT (10GBASE-T Server Adaptor Card)

April 04, 2008

Interrupt affinity and Microbenchmarks

One of the purposes of a micro-benchmark such as NetPerf in a development environment is to track performance over time in a very deterministic manner. This is increasingly at odds with the non-determinism caused by interrupt and application scheduling over large numbers of CPU cores. For this reason, we often pin both the interrupt and application when running a single-stream micro-benchmark.

To pin interrupts to a given CPU core using Linux you:

- stop the irqbalance service
- cat /proc/interrupts to determine which MSI-X interrupts (N) are allocated to your device
- echo <mask> > /proc/irq/<N>/smp_affinity (as root), where <mask> is (1 << CPU) written in hexadecimal
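
The mask arithmetic and the /proc/interrupts lookup in these steps can be sketched in Python. This is only a rough illustration: the helper names (`cpu_mask_hex`, `irqs_for_device`), the sample text and the device name `eth2` are made up, the plain hex mask assumes a CPU number below 32, and the real write to the affinity file needs root, so it is left as a comment.

```python
def cpu_mask_hex(cpu):
    """Return (1 << cpu) as the hexadecimal string written to the affinity file."""
    return format(1 << cpu, "x")


def irqs_for_device(interrupts_text, device):
    """Return the IRQ numbers whose /proc/interrupts line mentions `device`."""
    irqs = []
    for line in interrupts_text.splitlines():
        number, sep, rest = line.partition(":")
        # Skip the CPU header row and named rows such as NMI or LOC.
        if sep and number.strip().isdigit() and device in rest:
            irqs.append(int(number))
    return irqs


# Hypothetical excerpt of /proc/interrupts; real output has one count column per CPU.
SAMPLE = """\
           CPU0       CPU1
 16:       1200          0   IO-APIC-fasteoi   ehci_hcd
 58:      90210         12   PCI-MSI-X         eth2-0
 59:         45      88001   PCI-MSI-X         eth2-1
"""

for irq in irqs_for_device(SAMPLE, "eth2"):
    path = f"/proc/irq/{irq}/smp_affinity"
    print(f"echo {cpu_mask_hex(2)} > {path}")
    # To apply for real (as root): open(path, "w").write(cpu_mask_hex(2))
```

On a real machine you would read /proc/interrupts instead of the sample string, and stop irqbalance first so that it does not overwrite the setting.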

To pin an application to a given CPU core using Linux you:

- taskset <mask> <application command line>, where <mask> is (1 << CPU) written in hexadecimal (or use taskset -c <CPU>)
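
For anyone scripting benchmark runs, the same pinning can be sketched from Python. The helper names here are my own, and the netperf host address is a placeholder. `os.sched_setaffinity` is the Linux-only standard-library call that does for the current process what taskset does for a new one.

```python
import os


def taskset_command(cpu, argv):
    """Build the taskset invocation that pins `argv` to a single CPU core."""
    return ["taskset", format(1 << cpu, "x"), *argv]


def pin_current_process(cpu):
    """Pin the calling process to `cpu` (Linux only) and return the new CPU set."""
    os.sched_setaffinity(0, {cpu})  # pid 0 means "this process"
    return os.sched_getaffinity(0)


# Placeholder benchmark command line; substitute your own host and options.
print(taskset_command(3, ["netperf", "-H", "192.168.0.2"]))
```

The command list can be handed to `subprocess.run`, while `pin_current_process` suits a benchmark that is itself written in Python.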

These days, similar facilities are available for all other OSes. Conversely, for multi-stream testing, where we want to measure the effect of receive-side scaling techniques, interrupt and application pinning is generally counterproductive. However, it's always an amusing exercise to see whether you can manually do a better job than the OS/hypervisor for a given workload.

February 29, 2008

10Gb/s Ethernet with commodity hardware (Revisited)

An update for those interested in tracking the Moore's Law curve: bi-directional line-rate is now achievable with very modest CPU utilization using a commodity Lin/Win/Tel server and standard (1500-byte) Ethernet frames.

February 01, 2008

Hot topics in energy efficient switches

A lot has been and is being said about power and going to 10 gigabits. Pundits and conventional wisdom commonly focus on the PHY power as the power per port. Ultimately, when switch designs get to much higher densities than they are today, the PHY power will be limiting. Today, however, recent announcements show that the hardest part is still amortizing the power required to process line-rate packets at speed, but things are changing, both on the PHY and within switch designs.

Last week Cisco announced a doubling in the number of 10-gigabit ports for line cards in their Catalyst 6500 platform ( http://newsroom.cisco.com/dlls/2008/prod_012808.html ). According to the release, the new module “can help reduce power consumption by up to 50 percent per port.” Now, barring a new port type (the release mentions none, and the data sheet description specifies X2 modules), it is fair to assume that the power consumption of these line cards must therefore be primarily the power consumed by the switch fabric. More detailed power calculations will confirm this fact. This is not an unusual situation in enterprise-class switches today.

The fact that Cisco is bringing the port density of Catalyst 6500 line cards up is very good for 10GigE, simply because it allows a better amortization of the power requirements for switching high-performance 10GigE traffic. The simple fact is that, as on TCP Offload and iWARP NICs ( http://www.eetimes.com/news/design/showArticle.jhtml;jsessionid=0MIIR0LRC0MQOQSNDLOSKHSCJUNN2JVN?articleID=205918831 ), in switches the majority of the power consumption per port is still going into the switch fabric rather than the PHY. When you double the port density (and keep the fabric roughly the same), the power per port falls by close to half.

Unfortunately for the switch vendors, getting beyond 16 ports in a standard line card form factor is going to require a shift away from the fairly versatile X2 modules that are currently shipping into the market. Soldered-down solutions such as 10GBASE-T, in addition to yet another optical module form factor shift, will allow densities to look more like gigabit Ethernet does today, physically allowing 48 or more ports in the same front-panel space.

That transition is happening today, and the higher density can be seen even in the companion Cisco announcement (Nexus 7000 series) and in announced designs from SMC ( http://www.smc.com/index.cfm?event=viewProduct&localeCode=EN_USA&cid=8&scid=107&pid=1646 ), Arastra ( http://www.arastra.com/media/2007-11-05/ ) and others using soldered-down 10GBASE-T silicon, new module form factors, and new switch silicon designs.
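
To make the amortization argument concrete, here is a back-of-envelope sketch. The wattages are invented purely for illustration and are not Cisco's figures. The only assumption carried over from the discussion above is that the fabric power stays roughly constant while the port count doubles.

```python
def power_per_port(fabric_watts, phy_watts, ports):
    """Per-port power when the fabric power is shared by every port on the card."""
    return phy_watts + fabric_watts / ports


# Hypothetical figures, chosen only to show the shape of the argument.
FABRIC_W = 160.0  # switching and packet-processing power per line card
PHY_W = 4.0       # power per port module

before = power_per_port(FABRIC_W, PHY_W, ports=8)
after = power_per_port(FABRIC_W, PHY_W, ports=16)
saving = 1 - after / before
print(f"{before:.1f} W -> {after:.1f} W per port ({saving:.0%} lower)")
```

The closer the fabric comes to dominating the budget, the closer the saving gets to the quoted "up to 50 percent". Once the fabric share has been amortized away, the PHY term is what remains.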

These are further enabled because silicon designers, both internal to the OEMs and in merchant silicon teams, have been working on the switch density and power consumption problem, just as PHY vendors have been diligently developing solutions, such as 10GBASE-T, which will supplant the X2 and relieve the front-panel real estate crisis. At the same time as the power budget begins to realign, PHY powers are entering … These designs are showing that this year, densities of 10-gigabit platforms, enabled by 10GBASE-T and new optical module designs, are finally providing the cost and power efficiencies necessary for mass deployment of high-performance 10-gigabit networking.

I plan to be writing in the next few days about a number of topics in energy efficiency, including two trends that on the surface seem to be at odds – increased utilization of resources, including high speed links through virtualization and other means (e.g., http://www.cisco.com/en/US/solutions/ns708/networking_solutions_products_genericcontent0900aecd806fd32a.pdf) and the desire to rapidly turn down underutilized links (“Energy Efficient Ethernet” – e.g., http://www.ieee802.org/3/az/index.html ). So please send thoughts on what might be of interest.

January 30, 2008

QoS and latency

Latency is the time taken for a message to travel from its source to its destination, including all overheads. In networking terms, this would be: sender overhead + time of flight + transmission time + receiver overhead.
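
As a worked example of that sum, here is a small sketch. It assumes the usual reading of transmission time as message size divided by link bandwidth. The overheads and time of flight are invented numbers, and only the 10Gb/s rate and 1500-byte frame size echo figures used elsewhere on this blog.

```python
def message_latency(sender_overhead, time_of_flight, message_bytes,
                    bandwidth_bits_per_s, receiver_overhead):
    """Sum the four latency components; all times are in seconds."""
    transmission_time = message_bytes * 8 / bandwidth_bits_per_s
    return sender_overhead + time_of_flight + transmission_time + receiver_overhead


# Hypothetical overheads and cable delay for a single 1500-byte frame at 10Gb/s.
latency = message_latency(
    sender_overhead=2e-6,
    time_of_flight=0.5e-6,
    message_bytes=1500,
    bandwidth_bits_per_s=10e9,
    receiver_overhead=2e-6,
)
print(f"{latency * 1e6:.2f} microseconds")
```

With these made-up numbers the wire time for the frame is 1.2 microseconds, so the software overheads at either end dominate the total.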

Reported latencies are usually generated using a half round-trip measurement. Here the total elapsed time to send a small message back and forth between two endpoints is measured. The result is averaged over the number of iterations and divided by 2. Half round-trip latency is important where application performance depends on small message exchange, particularly in the parallel compute arena.
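
A minimal sketch of the half round-trip method follows. It is not how netperf or any particular tool implements it. It bounces a message over a local socket pair between two threads, so it measures in-host overheads rather than a NIC, and the function names are my own.

```python
import socket
import threading
import time


def recv_exact(sock, size):
    """Read exactly `size` bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo(sock, iterations, size):
    """Responder: send every received message straight back."""
    for _ in range(iterations):
        sock.sendall(recv_exact(sock, size))


def half_round_trip(iterations=1000, size=1):
    """Return the mean half round-trip time in seconds for `size`-byte messages."""
    a, b = socket.socketpair()
    responder = threading.Thread(target=echo, args=(b, iterations, size))
    responder.start()
    message = b"x" * size
    start = time.perf_counter()
    for _ in range(iterations):
        a.sendall(message)
        recv_exact(a, size)
    elapsed = time.perf_counter() - start
    responder.join()
    a.close()
    b.close()
    return elapsed / iterations / 2


print(f"{half_round_trip() * 1e6:.1f} microseconds")
```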

However, other applications, such as streaming media and the stock-update processing performed within the financial services community, depend rather more critically on the performance of processing streams of messages. For these applications, one-way latency is a rather more useful measurement. Here the total elapsed time to transfer a large number of messages from a sender to a receiver is measured and averaged over the total number of messages.
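
The streaming measurement can be sketched the same way. Again this is an illustration over a local socket pair, with a function name and message sizes of my own choosing, not a description of any real benchmark.

```python
import socket
import threading
import time


def one_way_streaming(messages=10000, size=64):
    """Return the mean per-message time in seconds for a one-way message stream."""
    tx, rx = socket.socketpair()
    payload = b"x" * size

    def sender():
        for _ in range(messages):
            tx.sendall(payload)

    worker = threading.Thread(target=sender)
    remaining = messages * size
    start = time.perf_counter()
    worker.start()
    # Drain the stream until every byte of every message has arrived.
    while remaining:
        chunk = rx.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("sender closed early")
        remaining -= len(chunk)
    elapsed = time.perf_counter() - start
    worker.join()
    tx.close()
    rx.close()
    return elapsed / messages


print(f"{one_way_streaming() * 1e6:.2f} microseconds per message")
```

Because the sender never waits for a reply, successive messages overlap in the pipeline, so the per-message figure reflects streaming overheads and not a single message's trip.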

At a glance, the reader might think that one-way latency and half round-trip latency should be equal. However this is not so because the system overheads will be very different when processing streams of messages compared with a single message. It's therefore a shame that one-way latency is often not reported in discussions of interconnect performance.

Recently, latency has been used synonymously with the other attributes (by my definition) of QoS, namely jitter and bandwidth. For example, as data sets have grown, the performance of some applications has increasingly become bottlenecked by the time taken to transfer this data back and forth. This time is actually dominated by bandwidth, but the problem is often reported as data latency.

Similarly, jitter (which is a measure of the variance of message transfer times) has been termed application latency. A good example of an application where jitter is a critical metric would be the stock ticker feed. Put simply, any human-perceivable jitter in the feed will cause traders to lose confidence in their information source and must be prevented at all costs.
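
To show why jitter is a separate metric from latency, here is a small sketch using two invented sets of per-message transfer times with the same mean. Taking variance and standard deviation as the jitter figure is my simplification. Real feeds are often characterized by percentiles as well.

```python
import statistics


def jitter(transfer_times):
    """Return (variance, standard deviation) of a list of transfer times."""
    return statistics.pvariance(transfer_times), statistics.pstdev(transfer_times)


# Hypothetical per-message transfer times in microseconds; both average 10.
steady = [10.0, 10.1, 9.9, 10.0, 10.0]
bursty = [6.0, 14.0, 7.0, 13.0, 10.0]

for name, samples in (("steady", steady), ("bursty", bursty)):
    variance, stdev = jitter(samples)
    print(f"{name}: mean={statistics.mean(samples):.1f} "
          f"variance={variance:.3f} stdev={stdev:.3f}")
```

A report quoting only the average would rate these two feeds as identical. Only the spread tells them apart.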

The low-jitter property (rather than low latency) of interconnects such as InfiniBand is one of the main reasons that they have gained momentum in the financial services community. But ironically, much of this jitter has been caused by OS interactions, particularly between the scheduler and the network stack. Recent work in this area has done much to reduce these issues, and it is now possible to achieve low-jitter operation using classic Ethernet networks.

January 25, 2008

10Gb/s Ethernet with commodity hardware

I'm still frequently asked whether 10Gb/s Ethernet is possible for everyday applications using commodity hardware. The short answer is that it is. We published a paper in 2006 (which appeared as an editorial in the April 2007 ACM CCR):

Download ACM_CCR_ETHERNET_RETROSPECTIVE.PDF

It showed commodity servers available on the street in mid-2006 hitting 10Gb/s performance with relative ease. Unsurprisingly, the situation has only got better since then, and frankly today's servers are so IO bound at 1Gb/s that a wholesale shift to 10Gb/s is imminent.

Incidentally, for those interested in a bit of history, the paper also presents a retrospective of the evolution of network interface architecture, from programmed IO devices on the early workstations of the '80s to modern devices sprouting vast numbers of (often irrelevant) hardware offloads.