Posts categorized "Web/Tech"

July 18, 2008

Auto-Negotiation Part 3 - Why 100Mbps is Important on a 10G PHY

Why are multi-speed 10G/1G/100BASE-T PHYs important?

In prior generations of Ethernet, this proved to be the path towards rapid provisioning of new speeds.  With multi-speed 10GBASE-T NICs in their servers, an end-user may begin to provision 10GBASE-T capable servers prior to upgrading the switching infrastructure and later install the 10GBASE-T switch and upgrade the entire set of servers’ speeds.  This allows for much easier and more rapid provisioning, without a “forklift” upgrade, where entire infrastructures are changed at once.

A second reason, growing in importance, for multi-speed PHYs is power management.  I’ve already said much about Energy Efficient Ethernet, but that is not what I mean.  Here, I am talking about “wake on LAN” capability.   I will say more about this subject separately, but putting it simply, today’s version of server power management puts idle servers into a low-speed mode (100BASE-TX usually), and waits for a “Magic Packet” to “Wake on LAN”.  This way, idle machines can dramatically lower their power consumption.  With the increasing EPA and Industry focus on making devices consume minimal power when not doing work, this feature is a must for servers.  Eventually, we’ll have to see how this plays out with regard to Energy Efficient Ethernet, but that won’t occur for years.  Today, if you want energy efficiency when a server is idle, you support Wake-on-LAN functionality.
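For readers who have not seen one, the “Magic Packet” itself is simple: six bytes of 0xFF followed by the target's MAC address repeated sixteen times. Below is a minimal Python sketch of building and sending one. The function names, the UDP broadcast transport, and port 9 are my assumptions for illustration (a common convention, not something the PHY requires).

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte Wake-on-LAN payload for a MAC such as '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)
    # Synchronization stream (six 0xFF bytes) followed by the MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the packet over UDP; address and port are assumed defaults."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))
```

The sleeping NIC only has to pattern-match this payload, which is why it can do so on a low-speed, low-power link.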

Simple Auto-Negotiation enables some important benefits that have helped BASE-T copper Ethernet become dominant. It does so because it allows each new generation to build on the prior generations’ technology and installed base of BASE-T networks. Now that 10GBASE-T is offering the right lower speeds of Ethernet, we will begin to see the power of Auto-Negotiation as it allows IT managers to upgrade the speeds of their servers asynchronously AND still support power management (through Wake-on-LAN) for idle systems.

July 15, 2008

Auto-Negotiation Part 2 – Auto-Neg vs. Speed Switching

Would someone use Auto-Negotiation to switch speeds on-the-fly?

No. The switching of speeds via Auto-Negotiation occurs at the initiation of a physical-layer link. It is not something done on-the-fly. Auto-Negotiation is useful for network speed transitions that happen over long durations in computer time (at least minutes long, since the transition takes a few seconds). Unlike the transitions of Energy Efficient Ethernet, which I’ve discussed earlier, the kind of speed transitions that might be facilitated through Auto-Negotiation would not be expected to be buffered up in a network, as this would require billions and billions of bits to be stored.
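To put a rough number on “billions and billions”, here is a back-of-the-envelope calculation in Python. The three-second renegotiation time is an assumed stand-in for “a few seconds”, and the function name is mine.

```python
def bits_to_buffer(line_rate_bps: float, outage_seconds: float) -> float:
    """Bits that arrive at full line rate while the link is down renegotiating."""
    return line_rate_bps * outage_seconds


LINE_RATE_BPS = 10e9          # a 10 Gigabit link
RENEGOTIATION_SECONDS = 3.0   # assumption: "a few seconds" taken as three

bits = bits_to_buffer(LINE_RATE_BPS, RENEGOTIATION_SECONDS)
print(f"{bits:.0f} bits, or about {bits / 8 / 1e9:.2f} gigabytes")
```

Under those assumptions a fully loaded port would need several gigabytes of buffering per renegotiation, which is far more than anyone would expect a network to hold on a port's behalf.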

July 09, 2008

Auto-Negotiation – a simple but powerful idea, often misunderstood

Almost every copper Ethernet device supports “Auto-Negotiation”, or clause 28 of IEEE Std. 802.3. However, based on questions I’ve been asked, it seems that this simple protocol is often misunderstood. This is the first part of a three-part series to answer some of the more common questions.

What exactly is Auto-Neg?

Simply put, support for Auto-Negotiation means that two devices can talk with each other and communicate what Ethernet standards each is capable of supporting. They then agree on the highest common speed, which may be none (they agree to disagree). There are many Ethernet standards consigned to the dustbin of history, not offered in modern devices. The Auto-Negotiation capability is one part of an equation that has allowed twisted-pair copper (BASE-T) devices to dominate the Ethernet market. The other part is the ability to communicate over a common connector (RJ-45) and media type (UTP copper with structured cabling rules). The most misunderstood part of Auto-Negotiation is that supporting it does not imply support for any particular speed.
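The “highest common speed” idea can be sketched in a few lines of Python. This is a deliberate simplification: the real priority resolution in the standard also distinguishes duplex modes and individual technologies, and the function name and speed sets here are illustrative assumptions.

```python
from typing import Optional, Set


def resolve_speed(local: Set[int], partner: Set[int]) -> Optional[int]:
    """Return the highest speed (in Mbps) advertised by both ends, or None."""
    common = local & partner
    return max(common) if common else None


gigabit_switch_port = {10, 100, 1000}
single_speed_10g_phy = {10000}
multi_speed_10g_phy = {100, 1000, 10000}

# Both PHYs "support Auto-Negotiation", but only one finds a common speed.
print(resolve_speed(single_speed_10g_phy, gigabit_switch_port))  # None
print(resolve_speed(multi_speed_10g_phy, gigabit_switch_port))   # 1000
```

The first case is the “agree to disagree” outcome: negotiation works exactly as designed, and the result is no link.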

Does support for Auto-Neg mean Multi-Speed?

The initial versions of 10GBASE-T PHYs were all single speed, largely because the switch interfaces could only support the 10 Gigabit XAUI interface and could not make use of a multi-speed PHY. They supported Auto-Negotiation, but not multiple speeds. That is changing this year. The coming generation of 10GBASE-T NICs will support 100Mbps and 1000Mbps in addition to 10 Gigabit speeds. They will Auto-Negotiate with a switch port and provide the highest possible speed, while still allowing connection at lower speeds where appropriate.

April 21, 2008

Two 10GBASE-T Hopefuls Pass On

Please pause for a moment of silence. Last month, quietly, the 10GBASE-T vendor community lost two early players. Both were banking on short-reach 10GBASE-T technology, figuring that patch-cord length solutions would be good enough. Keyeye Communications closed its doors (http://www.entrepreneur.com/localnews/1610167.html), and Vativ Technologies, which had marketed a proprietary 10 Gigabit Ethernet patch-cord length connection, sold its assets to Entropic Communications for a fraction of the money invested (http://www.primenewswire.com/newsroom/news.html?d=139528).

Both of these teams had star players. I worked with Hiroshi Takatori, founder of Keyeye, a few years back as a partner in developing HDSL2 transceivers, and I worked alongside Mike McConnell, well-known in the Ethernet world, and Albert Vareljian in the IEEE standards. I’ve had some interaction with Sreen Ragavan’s team from ComCore and National, from which he seeded Vativ, and they are not slouches either. It is with some sadness that I see these engineers and technology marketers depart from the 10GBASE-T scene. It goes to show, as I’ve said a few times before, that 10GBASE-T is an extremely hard problem to solve. It just took more careful systems planning, intellectual property development, and patience than most will go for. Both teams made compromises on distance or standards compliance and attempted to go for less than 100 meter reach: in Vativ’s case, a few meters of patch cord, and in Keyeye’s case, a design aligned with a “short reach test mode” which was inserted into the 802.3an-2006 10GBASE-T standard as an optional mode to allow full-100m-capable devices to conserve power when they were connected on shorter links. Relaxing the demands of the technical problem does make 10GBASE-T amenable to analog and other simpler techniques, but it also makes it a lot less useful.

Now that full performance devices have come to market, and it is clear that we have begun dramatically bringing the power down (see http://www.networkworld.com/news/2008/041408-solarflare-halves-10gbase-ts-power.html?code=nlnetarch133353 ), as was done for 1000BASE-T before, I guess that it was hard to generate enough greed in investors to overcome the fear that a short-reach technology would be relegated to a small niche, while traditional, full-100m devices with power management capabilities would once again fill the vast majority of slots.  Fortunately, Solarflare and I began this journey with a team and investors who were in it for the long haul, too.

Still, I wish the teams from Keyeye and Vativ well.  I’m already starting to see them show up in other places, helping contribute to other new technologies.

February 06, 2008

Archive images

I was thinking of photographing some old networking hardware I've got knocking about. Here are some fascinating archive images from:

Cambridge Computer Laboratory
David Greaves (Cambridge Ring, CFR, and ATM) 


More to follow ...

February 01, 2008

Hot topics in energy efficient switches

A lot has been and is being said about power and going to 10 gigabits. Pundits and conventional wisdom commonly focus on the PHY power as the power per port. Ultimately, when switch designs get to much higher densities than they are today, the PHY power will be limiting. Today, however, recent announcements show that the hardest part is still amortizing the power required to process line-rate packets at speed, but things are changing, both on the PHY and within switch designs.

Last week Cisco announced a doubling in the number of 10 gigabit ports for line cards in their Catalyst 6500 platform ( http://newsroom.cisco.com/dlls/2008/prod_012808.html ). According to the release, the new module “can help reduce power consumption by up to 50 percent per port.” Now, barring a new port type (the release mentions none, and the data sheet description specifies X2 modules), it is fair to assume that the power consumption of these line cards must therefore be primarily the power consumed by the switch fabric. More detailed power calculations will confirm this fact. This is not an unusual situation in enterprise class switches today. The fact that Cisco is bringing the port density of Catalyst 6500 line cards up is very good for 10GigE, simply because it allows a better amortization of the power requirements for switching high-performance 10GigE traffic. The simple fact is that like on TCP Offload and iWarp NICs ( http://www.eetimes.com/news/design/showArticle.jhtml;jsessionid=0MIIR0LRC0MQOQSNDLOSKHSCJUNN2JVN?articleID=205918831 ), in switches the majority of the power consumption per port is still going . When you double the port density (and keep the fabric roughly the same). Unfortunately for the switch vendors, getting beyond 16 ports in a standard line card form factor is going to require a shift away from the fairly versatile X2 modules that are currently shipping into the market. Soldered-down solutions such as 10GBASE-T, in addition to yet another optical module form factor shift, will allow densities to look more like gigabit Ethernet does today, physically allowing 48 or more ports in the same front-panel space.
That transition is happening today, and the higher density can be seen even in the companion Cisco announcement (Nexus 7000 series) and in announced designs from SMC ( http://www.smc.com/index.cfm?event=viewProduct&localeCode=EN_USA&cid=8&scid=107&pid=1646 ), Arastra ( http://www.arastra.com/media/2007-11-05/ ) and others using soldered-down 10GBASE-T silicon, new module form factors, and new switch silicon designs.
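The amortization argument can be made concrete with a toy model in Python. Every wattage below is a made-up illustrative number, not a measurement of any Cisco line card, and the function name is mine.

```python
def power_per_port(shared_watts: float, per_port_watts: float, ports: int) -> float:
    """Shared (fabric and packet-processing) power spread over the ports, plus per-port power."""
    return shared_watts / ports + per_port_watts


# Hypothetical line card: 320 W of shared logic and 4 W per port module.
SHARED_WATTS = 320.0
PER_PORT_WATTS = 4.0

before = power_per_port(SHARED_WATTS, PER_PORT_WATTS, ports=8)
after = power_per_port(SHARED_WATTS, PER_PORT_WATTS, ports=16)
reduction = 1 - after / before

print(f"{before:.0f} W/port at 8 ports, {after:.0f} W/port at 16 ports")
print(f"per-port reduction: {reduction:.0%}")
```

With these assumed figures, doubling the port count cuts per-port power by a little under half. The reduction only approaches 50 percent when the shared term dominates, which is the inference drawn from the press release above.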

These are further enabled because silicon designers, both internal to the OEMs and in merchant silicon teams, have been working the switch density and power consumption problem just as PHY vendors have been diligently making solutions, such as 10GBASE-T, which will supplant the X2 and relieve the front-panel real estate crisis. At the same time as the power budget begins to realign, PHY powers are entering  These designs are showing that this year, densities of 10 gigabit platforms, enabled by 10GBASE-T and new optical module designs, are finally providing the cost and power efficiencies necessary for mass deployment of high-performance 10 gigabit networking.

I plan to be writing in the next few days about a number of topics in energy efficiency, including two trends that on the surface seem to be at odds – increased utilization of resources, including high speed links through virtualization and other means (e.g., http://www.cisco.com/en/US/solutions/ns708/networking_solutions_products_genericcontent0900aecd806fd32a.pdf) and the desire to rapidly turn down underutilized links (“Energy Efficient Ethernet” – e.g., http://www.ieee802.org/3/az/index.html ). So please send thoughts on what might be of interest.

January 31, 2008

Virtualization and iSCSI simplify IT

Yesterday Dell completed its acquisition of EqualLogic and marked the occasion with a quote from Michael Dell:

Virtualization and iSCSI are two keys to simplifying IT. Enterprises are creating data and consuming power at an exponential rate, driving up IT cost and complexity.

When you combine this with Dell's Veso announcement at the '07 VMworld, a picture emerges of ubiquitous virtualization combined with simple, easy-to-use (commoditized) iSCSI. This environment is something that Solarflare has been actively supporting: our SFC4000 10GBASE-T controller has been designed specifically to support iSCSI within a virtualized environment. For some more details, see our presentation at the Xen Nov 07 summit.

-- Thanks to Robert Stonehouse

January 30, 2008

QoS and latency

Latency is the time taken for a message to travel from its source to its destination, including all overheads. In  networking terms, this would be: sender overhead + time of flight + transmission time + receiver overhead. 

Reported latencies are usually generated using a half round-trip measurement. Here the total elapsed time to send a small message back and forth between two endpoints is measured. The result is averaged over the number of iterations and divided by 2. Half round-trip latency is important where application performance depends on small message exchange particularly in the parallel compute arena.
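As a rough illustration of the half round-trip method, here is a minimal ping-pong sketch in Python. It runs both endpoints in one process over a local socket pair, so it measures only software overheads and no real wire; the function names and default sizes are my assumptions.

```python
import socket
import threading
import time


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def echo(sock: socket.socket, iterations: int, size: int) -> None:
    """Remote endpoint: send every message straight back."""
    for _ in range(iterations):
        sock.sendall(recv_exact(sock, size))


def half_round_trip(iterations: int = 1000, size: int = 32) -> float:
    """Average half round-trip time in seconds for a small message."""
    near, far = socket.socketpair()
    worker = threading.Thread(target=echo, args=(far, iterations, size))
    worker.start()
    message = b"x" * size
    start = time.perf_counter()
    for _ in range(iterations):
        near.sendall(message)      # ping
        recv_exact(near, size)     # wait for the pong
    elapsed = time.perf_counter() - start
    worker.join()
    near.close()
    far.close()
    # Average over the iterations, then divide by 2 for one direction.
    return elapsed / iterations / 2


if __name__ == "__main__":
    print(f"half round-trip: {half_round_trip() * 1e6:.1f} microseconds")
```

Note that only one message is ever in flight, which is exactly why this figure says little about streaming workloads.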

However other applications such as streaming media and stock-update processing performed within the financial services community depend rather more critically on the performance of processing streams of messages. For these applications, one-way latency is a rather more useful measurement. Here the total elapsed time to transfer a large number of messages from a sender to a receiver is measured and averaged over the total number of messages.
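The streaming measurement can be sketched the same way. In this Python example the sender pushes a large number of messages without waiting for replies, and the elapsed time until the receiver has the last byte is averaged over the message count. As before, it is a single-process illustration over a local socket pair, with names and defaults that are my assumptions.

```python
import socket
import threading
import time


def stream_latency(messages: int = 10000, size: int = 32) -> float:
    """Average per-message time in seconds for a one-way stream of messages."""
    tx, rx = socket.socketpair()
    finished = {}

    def receiver() -> None:
        # A stream socket does not preserve message boundaries, so count bytes.
        remaining = messages * size
        while remaining:
            chunk = rx.recv(min(remaining, 65536))
            if not chunk:
                raise ConnectionError("sender closed the connection early")
            remaining -= len(chunk)
        finished["at"] = time.perf_counter()

    worker = threading.Thread(target=receiver)
    worker.start()
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(messages):
        tx.sendall(payload)        # never waits for a reply
    worker.join()
    tx.close()
    rx.close()
    # Total elapsed time averaged over the number of messages.
    return (finished["at"] - start) / messages


if __name__ == "__main__":
    print(f"one-way (streaming): {stream_latency() * 1e6:.2f} microseconds per message")
```

Because many messages are in flight at once, batching and scheduling effects in the stack show up here in a way they cannot in a ping-pong test.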

At a glance, the reader might think that one-way latency and half round-trip latency should be equal. However, this is not so, because the system overheads will be very different when processing streams of messages compared with a single message. It's therefore a shame that one-way latency is often not reported in discussions of interconnect performance.

Recently latency has been used synonymously with the other attributes (by my definition) of QoS, namely jitter and bandwidth. For example, as data sets have grown, the performance of some applications has increasingly become bottlenecked by the time taken to transfer this data back and forth. This time is actually dominated by bandwidth, but the problem is often reported as data latency.

Similarly, jitter (which is a measure of the variance of message transfer times) has been termed application latency. A good example of an application where jitter is a critical metric would be the stock ticker feed. Put simply, any human-perceivable jitter in the feed will cause traders to lose confidence in their information source and must be prevented at all costs.
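To show why jitter deserves its own name, here is a small Python sketch comparing two sets of transfer times with the same average latency. The sample values are invented for illustration, and reporting jitter as a standard deviation is one reasonable choice among several.

```python
import statistics
from typing import Sequence


def jitter(transfer_times: Sequence[float]) -> float:
    """Jitter reported as the standard deviation of the message transfer times."""
    return statistics.pstdev(transfer_times)


# Hypothetical transfer times in microseconds for two feeds.
steady_feed = [100, 101, 99, 100, 100]
bursty_feed = [50, 150, 60, 140, 100]

for name, samples in (("steady", steady_feed), ("bursty", bursty_feed)):
    print(f"{name}: mean {statistics.mean(samples):.0f} us, jitter {jitter(samples):.1f} us")
```

Both feeds would be quoted with the same latency, yet only one of them would look smooth to a trader watching the ticker.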

The low-jitter property (rather than low latency) of interconnects such as InfiniBand is one of the main reasons that they have gained momentum in the financial services community. Ironically, much of the jitter seen on other networks has been caused by OS interactions, particularly between the scheduler and the network stack. Recent work in this area has done much to reduce these issues, and it is now possible to achieve low-jitter operation using classic Ethernet networks.

January 28, 2008

One small step for functional programming

If you've ever been exposed to functional programming and been struck by the power and simplicity of expression which is possible, then you'll be pleased to know that Microsoft Research has been active in this area for some time now with the F# language (pronounced FSharp).

The real beauty of the language is that the runtime has been integrated with the .NET framework. This means that building GUIs and other features such as distributed communication into an application with a functional core is now very easy.

It's downloadable from here, is clean to install and comes with a nice set of sample programs - including a client/server socket program and a symbolic differentiator with graphical output.

I'd love to see the day that the language is included in Visual Studio.

January 25, 2008

10GBASE-T power tradeoffs

This article was written in early 2007 and published in Jan 2008. It predicted that the constraints of the 10GBASE-T PHY power requirements would mean that 2007 products would be restricted to single port 10GBASE-T NICs based around a low power controller architecture. This prediction has been borne out - to date the only 10GBASE-T NIC available as a product (not marketing hype) is the SMC10GPCIe-10BT. The paper goes on to predict that the only dual port 10GBASE-T NICs available in 2008 will also be based around low power controller architectures. We'll see about that one, but a pattern emerges which isn't going to go away until at least the 45nm process node: significant care needs to be taken to balance the power requirements of both the PHY and the controller so that major product intersections are not missed.

An interesting corollary to these architectural challenges has been the deployment of creative marketing to report the max power consumption of controller silicon. Largely this is reported for a controller that is powered on but not transferring Ethernet frames. Given that modern CPUs employ power scaling features, it should be unsurprising that this measurement is much lower than for a CPU-based network controller under load. My personal measurements show that >50% increases in power consumption are not uncommon once packets start to flow!

Some useful links on this topic include:

Rick Merritt's Blog (Jan 2008)

The Register (Feb 2007)

SMC8724-10BT (24 Port 10GBASE-T switch)

SMC 10GPCIe-10BT (10GBASE-T Server Adaptor Card)