May 20, 2008

Tesla and Computing for the Future of the Planet

I've just finished reading "Tesla: Man Out of Time", a rip-roaring read on the great AC vs DC wars which took place just before the turn of the last century, together with lots of great material for the conspiracy theorists.

Of course, Tesla was a truly great mind of his time, credited with inventing the AC electric motor as well as radio transmission.

One of Tesla's dreams was to transmit energy wirelessly. Although he never confused the transmission of information with the transmission of energy, it is likely that much of the capital and time spent on his dual wireless energy / information transmission experiments would have been more profitably spent had he concentrated purely on what we now know as radio. Tesla watched as his commercial lead was lost to Marconi and others, and he was unable to defend his radio patents during his lifetime (they were upheld only posthumously).

All this resonates (no pun intended) with recent directions taken by Andy Hopper at the Cambridge University Computer Laboratory. Starting from the principle that transmitting information is much more efficient than transmitting energy, his group is investigating new techniques for trading off physical and virtual resources, particularly as this relates to digital enablement for the developing world and classic data-center energy-efficiency applications. Take a look here and at his Google Tech Talk (where Solarflare gets a mention for high-efficiency migration of Citrix-Xen computations using 10G Ethernet).


 

April 21, 2008

Two 10GBASE-T Hopefuls Pass On

Please pause for a moment of silence. Last month, quietly, the 10GBASE-T vendor community lost two early players. Both were banking on short-reach 10GBASE-T technology, figuring that patch-cord length solutions would be good enough. Keyeye Communications closed its doors (http://www.entrepreneur.com/localnews/1610167.html), and Vativ Technologies, which had marketed a proprietary 10 Gigabit Ethernet connection of patch-cord length, sold its assets to Entropic Communications for a fraction of the money invested (http://www.primenewswire.com/newsroom/news.html?d=139528).

Both of these teams had star players. I worked with Hiroshi Takatori, founder of Keyeye, a few years back as a partner in developing HDSL2 transceivers, and I worked alongside Mike McConnell, well known in the Ethernet world, and Albert Vareljian in the IEEE standards. I’ve had some interaction with Sreen Ragavan’s team from ComCore and National, from which he seeded Vativ, and they are no slouches either. It is with some sadness that I see these engineers and technology marketers depart from the 10GBASE-T scene. It goes to show, as I’ve said a few times before, that 10GBASE-T is an extremely hard problem to solve. It took more careful systems planning, intellectual property development, and patience than most will go for. Both teams compromised on distance or standards compliance and aimed for less than 100 meter reach: in Vativ’s case, a few meters of patch cord; in Keyeye’s case, a design aligned with the “short reach test mode”, which was inserted into the 802.3an-2006 10GBASE-T standard as an optional mode to allow full-100m-capable devices to conserve power when connected on shorter links. Relaxing the demands of the technical problem does make 10GBASE-T amenable to analog and other simpler techniques, but it also makes it a lot less useful.

Now that full-performance devices have come to market, and it is clear that we have begun dramatically bringing the power down (see http://www.networkworld.com/news/2008/041408-solarflare-halves-10gbase-ts-power.html?code=nlnetarch133353), as was done for 1000BASE-T before, I guess it was hard to generate enough greed in investors to overcome the fear that a short-reach technology would be relegated to a small niche, while traditional full-100m devices with power management capabilities would once again fill the vast majority of slots. Fortunately, Solarflare and I began this journey with a team and investors who were in it for the long haul, too.

Still, I wish the teams from Keyeye and Vativ well.  I’m already starting to see them show up in other places, helping contribute to other new technologies.

April 18, 2008

FCoE or iSCSI: which way are you converging?

The recent FCoE announcements have clearly stirred the pot, with plenty of debate on whether FCoE is simply a smash-and-grab layer violation or a technology with lasting utility. Thankfully I don’t need to take a public stance on the matter, since Solarflare by definition supports all upper layer protocols over Ethernet and, in particular, supports the Data Center Ethernet (also known as CEE) protocols required for software-based implementations of FCoE within its Solarstorm Ethernet controller products.

See:

http://www.eetimes.com/showArticle.jhtml?articleID=207100764

http://interconnects.blogspot.com/2008/04/moving-at-10g-rates.html

http://blocksandfiles.com/article/4773

http://blogs.cisco.com/datacenter/2008/04/the_antifcoe_sentiment.html

http://en.wikipedia.org/wiki/OSI_model

http://www.open-fcoe.org/

http://www.fcoe.com

It is, however, worth understanding the architectural differences between the current crop of FCoE announcements, and also the difference between network convergence and unified networks. First, a quick look at the two FCoE architectures:

 

  1. Software-based FCoE is the natural model for a commodity LAN vendor who is converging on storage. Here the goal may be to inter-network with legacy FC equipment by running an FC protocol stack on top of a regular Ethernet driver in a commodity host operating system. The Ethernet controller looks just like a regular Ethernet controller, except that it is able to take part in the new Ethernet congestion avoidance protocols currently being defined in the IEEE. The FCoE driver is handed an FCoE frame and must perform the FC protocol processing itself. Performance is likely to be worse than with a hardware-based solution, because software is now doing FC protocol work which would previously have been done by the FC controller, and because FC data integrity checking is now done in software. (As an aside, data-integrity performance is becoming less of an issue with this architecture due to protocol-neutral CRC support, which is increasingly found in hardware.) Also, end-customer acceptance is likely to be slow, because the software FC stack is immature and the FC community is rightly very conservative regarding system reliability and cross-vendor compatibility. However, software-based FCoE holds the promise of highly integrated commodity silicon (for the server at least).

 

  2. Hardware-based FCoE is the natural model for a storage HBA vendor who is converging on LAN. Here the goal may be to cost-optimize a rack solution by running both FC and LAN traffic over a single HBA and wire to a top-of-rack switch which connects out to FC and regular Ethernet networks. (Yes, I know that the IEEE is also defining inter-switch congestion avoidance protocols which would enable a multi-tier converged Ethernet network, but frankly these are a long, long way from any sort of maturity.) At the hardware level the converged HBA looks like two distinct devices, an Ethernet controller and an FC controller, with the advantage that the software running on the server can be the existing mature (and loved) driver stacks of the respective Ethernet and FC controller vendors. (Yes, unexpected alliances between FC HBA vendors and well-known Ethernet vendors are required for this solution, unless customers are to be persuaded that FC vendors have overnight acquired their own mature Ethernet controllers and software.) So the advantage of this model is that it can be available as a deployable product in short order. However, it is not easily integrated within volume silicon. Of course, if you are an end customer paying HBA prices, then this might be acceptable.

As well as the implementation and business differences between the FCoE architectures, FCoE and iSCSI illustrate two different approaches to the unification of networks:

 

  1. iSCSI is an example of network convergence, where different network functions are layered into one protocol stack architecture: Ethernet implements OSI Layers 1 and 2, TCP/IP implements Layers 3 and 4+, and iSCSI implements Layer 5. By converging on a single protocol stack architecture, the features and abstractions of the underlying layers are built upon. For example, iSCSI sessions may be routed over different underlying IP networks.

 

  2. FCoE, by contrast, is an example of a unified network, in that two different network protocol stack architectures are carried by a single Layer 2 network (Ethernet in this case). By maintaining two distinct network protocol stacks, a unified network is able to support applications which require both stacks while transitioning the network at a lower layer.

 
These two different approaches are the same double-edged sword for both iSCSI and FCoE, and will be the cause of significant continuing religious debate. Understanding the difference is key to deciding whether one or the other is the right approach for a given enterprise to take.
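One way to picture the unified-network case is the point where a received Ethernet frame is handed to one of the two protocol stacks. The sketch below is only an illustration: the function name is hypothetical, and it assumes the standard EtherType assignments (0x8906 for FCoE, 0x0800 and 0x86DD for IPv4 and IPv6). iSCSI never appears at this level, since it is identified further up the converged stack, inside TCP.

```shell
# Hypothetical helper: map an EtherType (lowercase hex, no 0x prefix) to the
# protocol stack that would process the frame on a unified Ethernet network.
stack_for_ethertype() {
    case "$1" in
        8906)      echo "FC stack (FCoE)" ;;          # FC frame carried directly over Layer 2
        0800|86dd) echo "TCP/IP stack" ;;             # iSCSI rides here, layered on TCP
        *)         echo "other" ;;
    esac
}

stack_for_ethertype 8906
stack_for_ethertype 0800
```

The two branches are the whole story: FCoE keeps a second stack alive beside TCP/IP, whereas iSCSI adds a layer to the stack that is already there.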

April 15, 2008

10GBASE-T Comes to the Fore

Sometimes I look back and find it amazing how constant the demand for the simplicity and utility of 10GBASE-T has been over the 7 years since Solarflare began. Never has it been more apparent than in the responses to the launch of our recent single-chip, sub-6 watt transceiver. It’s easy to say that it’s just history repeating itself, but a lot of hard work went into it. 10GBASE-T is about engineers putting the complexity into the silicon and firmware so that the user doesn’t have to. Users have realized, and have said to me since 2000, that if they had a 10 Gigabit solution that ran on UTP copper (100 meters), they would adopt it, just as they had 1000BASE-T. Today, the response is the same. Speaking to Rick Merritt in EE Times (4/14), Dante Malagrino, director of product marketing for data center solutions at Nuova Systems, which was acquired last week by Cisco, said that the 10GBASE-T technology “is great in terms of compatibility and simplicity.” Cisco has understood the importance of this technology since the beginning; it was their help in driving the 802.3an (10GBASE-T) standard, particularly on issues of power/performance tradeoffs and latency, that has resulted in the parts available today. Recognizing, as Steve Pope has, that latency is mostly in the system and that the PHY contributes little, Cisco helped set the 2.5 usec latency that is standard for 10GBASE-T (http://www.ieee802.org/3/an/public/jul05/comments_3_0705.pdf).

The question marks for 10GBASE-T have always been around 3 things: (1) can it be done (proven with the SFX7101, shipping since August 2006), (2) will the power come down (with the SFT9001 it has, dramatically), and (3) is there really demand for 10 Gigabit. If you’re reading this blog, you are likely already aware that 10 Gigabit technology is (finally) beginning to take off in the market, and that it is driven by a variety of applications, including storage, unified fabrics, and virtualization. These, in turn, are not simply ends in themselves, but are driven by operational and energy efficiencies, major economic factors that are becoming increasingly important in today’s economy. In relation to 10 Gigabit networking, Renato Recio, a chief engineer for server networking at IBM Corp., said to EE Times (http://www.eetimes.com/news/latest/showArticle.jhtml?articleID=207200193), "Somewhere between 50 and 70 percent of the Fortune 1000 companies are going to be building data centers in the next three years," and "They are looking for technologies to make them more green, and this network convergence group has that value--this rings for customers."

Rick Merritt, writing for EE Times, understands the operational importance of 10GBASE-T to network convergence when he writes, “The work on the 10GBASE-T standard for Ethernet over copper lines only indirectly fuels the network convergence. Its primary aim is to lower the cost of and expand the market for 10-Gbit Ethernet, which has been limited to expensive optical and short-reach copper cables to date.” Lower cost is extremely important, because customers will only buy these more efficient technologies at a reasonable price, and that is where the new generation of 10GBASE-T transceivers comes in. The fully-integrated SFT9001 now brings the promise of low-cost 10 Gigabit Ethernet to fruition. Again from EE Times, Recio added, "In my opinion, 10GBASE-T is a very important piece because it significantly reduces my price point to use copper," and "I'd rather not use fiber in a rack, and it's an even better deal if my end-of-row switch can use copper."

These new parts will take some months for OEMs to integrate into end-user products, so expect to see a lot of conversation in the meantime touting the benefits of optics or short-reach copper solutions. Having watched the constancy of the 10GBASE-T market demand for a number of years, it looks to me like the promise has finally been prepared and the tidal wave of 10 Gigabit copper is coming.

April 04, 2008

OpenOnload 2.2.125

A dev snapshot is available from www.openonload.org.

Interrupt affinity and Microbenchmarks

One of the purposes of a micro-benchmark such as NetPerf in a development environment is to track performance over time in a very deterministic manner. This is increasingly at odds with the non-determinism caused by interrupt and application scheduling over large numbers of CPU cores. For this reason, we often pin both the interrupt and application when running a single-stream micro-benchmark.

To pin interrupts to a given CPU core using Linux you:

- stop the irqbalance service
- cat /proc/interrupts to determine which MSI-X interrupts (N) are allocated to your device
- write the hexadecimal mask (1 << CPU) to /proc/irq/<N>/smp_affinity, e.g. echo 4 > /proc/irq/<N>/smp_affinity for CPU 2

To pin an application to a given CPU core using Linux you:

- taskset <hexadecimal mask (1 << CPU)> <application command line>, or equivalently taskset -c <CPU> <application command line>

These days, similar facilities are available on all other operating systems. Conversely, for multi-stream testing, where we want to measure the effect of receive side scaling techniques, interrupt and application pinning is generally counterproductive. However, it's always an amusing exercise to see whether you can manually do a better job than the OS/hypervisor for a given workload.
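For anyone who wants to script the pinning steps above, here is a minimal sketch. The helper names (cpu_mask, pin_irq, pin_app) are made up for illustration, pin_irq needs root, and the mask format assumes a CPU index below 32 (larger machines write the mask as comma-separated 32-bit groups).

```shell
# Hypothetical helpers; nothing here runs until you call pin_irq or pin_app.

# Print the hexadecimal affinity mask with only bit CPU set.
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# Pin IRQ number $1 to CPU core $2. Stop irqbalance first, otherwise it may
# rewrite the mask behind your back.
pin_irq() {
    cpu_mask "$2" > "/proc/irq/$1/smp_affinity"
}

# Run a command pinned to CPU core $1: pin_app <CPU> <application command line>
pin_app() {
    cpu=$1
    shift
    taskset "0x$(cpu_mask "$cpu")" "$@"
}
```

A typical single-stream run would then be something like pin_irq <N> 2 followed by pin_app 2 <application command line>, so that the interrupt and the application share a core.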

March 20, 2008

Progress in energy efficiency

This past week I've been off meeting with the IEEE task force on Energy Efficient Ethernet (802.3az), something I've mentioned before. The group is working hard to define operating modes for systems during periods of low link utilization. Now, while most of us think of our electric bills on the scale of days, weeks and months, the EEE group is looking to optimize utilization at the time scale of microseconds. A longtime industry veteran in the group pointed out that this is a return to the old days of Ethernet, when the only time transmission occurred was when there was data, and a preamble signalled the receiver and facilitated its wakeup.

One of the most interesting discussions was around congested networks in the data center. Within this group, it was almost universally stated that congestion is no longer an issue for Gigabit networks, because those are scaled up to 10 Gigabit when congestion raises its ugly head. A scant few years ago, I can recall that not being the case. The future looks bright for 10 Gigabit.

In addition, the group is making progress towards a specification. Proposals were baselined for low-power idle modes for 100BASE-TX and 1000BASE-T, and proposals were made for a low-power idle mode for 10GBASE-T, which promises to cut power during low-utilization periods by up to 80%. This has the potential to rapidly accelerate 10 Gigabit to the desktop by allowing quick bursts of speed when required, without the power penalty.
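To get a feel for what an 80% cut during idle periods could mean, here is a back-of-the-envelope sketch of time-averaged PHY power. The 6000 mW active figure and the 10% link utilization are assumptions picked purely for illustration, not numbers from the task force, and sleep and wake transition overheads are ignored.

```shell
# Assumed inputs (hypothetical, for illustration only).
active_mw=6000        # assumed active PHY power, in milliwatts
util_pct=10           # assumed share of time the link is carrying data
idle_saving_pct=80    # the "up to 80%" reduction while in low-power idle

# Power drawn while idling, then the utilization-weighted average.
idle_mw=$(( active_mw * (100 - idle_saving_pct) / 100 ))
avg_mw=$(( (active_mw * util_pct + idle_mw * (100 - util_pct)) / 100 ))

echo "idle: ${idle_mw} mW, average: ${avg_mw} mW"
```

Under those assumptions the link averages well under a third of its active power, which is why the lightly loaded desktop case is the interesting one.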

Link: Up and Down the Network Stack.

March 14, 2008

Smokin' Chimney? Send for the Fire-Crew

Kudos to Microsoft for a strong statement for robustness and quality over religion. They issued a critical update this week to disable the Scalable Networking Pack (RSS and TOE Chimney) in Windows 2003. Details can be found here and here.

February 29, 2008

10Gb/s Ethernet with commodity hardware (Revisited)

An update for those interested in tracking the Moore's Law curve: bi-directional line-rate is now achievable with very modest CPU utilization using a commodity Lin/Win/Tel server and standard (1500-byte) Ethernet frames.
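For a sense of scale, the arithmetic below works out how many standard frames per second line-rate implies in each direction. It assumes ordinary Ethernet framing around a full 1500-byte payload (preamble, header, FCS and inter-frame gap, no VLAN tag); the variable names are just for illustration.

```shell
# Bytes occupied on the wire by one full-size frame:
# 8 preamble/SFD + 14 header + 1500 payload + 4 FCS + 12 inter-frame gap.
wire_bytes=$(( 8 + 14 + 1500 + 4 + 12 ))

line_rate_bps=10000000000   # 10 Gb/s

# Frames per second in one direction at line rate (integer division).
frames_per_sec=$(( line_rate_bps / (wire_bytes * 8) ))

echo "${wire_bytes} bytes on the wire, ${frames_per_sec} frames/s per direction"
```

Bi-directional line-rate means the server is sustaining that frame rate on transmit and on receive at the same time.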

February 26, 2008

It's all about using the right engine for the job ..

Take a look at these videos of a couple of model engines:

1.  Stirling engine

2.  steam engine 

Two very different approaches to solving a similar problem, and a good reminder that achieving energy-efficient designs often requires you to think outside the box.

BTW, go buy both; they're fantastic for big and small kids (1, 2).