The ability to use native RapidIO networks as if they were Ethernet creates a compelling case for using RapidIO in greenfield systems. Its ecosystem supports 20 Gbps ports, and Ethernet encapsulation makes RapidIO the logical choice when redesigning 10 GbE systems for next-generation throughput and performance.


Example of a RapidIO-based hardware/computing server implementation:
High-density, non-blocking switching cards have already been developed using RapidIO Gen2 switches from Integrated Device Technology, Inc. (IDT). These switching cards could be deployed in data center/HPC environments. Figure 1 illustrates a switching card implementation that could be leveraged to support top-of-rack switching.


The IDT Tsi721 bridge device, which converts between PCIe Gen2 and RapidIO Gen2, could provide the interconnect to the server node, with a bridged connection at up to 20 Gbaud. With the Tsi721, designers can develop heterogeneous systems that leverage RapidIO's peer-to-peer networking performance while using clusters of microprocessors that offer only PCIe connectivity. The target market requires large amounts of data to be transferred efficiently without processor intervention, which can be achieved using a high-speed block DMA engine in conjunction with the Tsi721's messaging engines. The advantages of RapidIO Type 9 messaging, such as virtualization support and higher throughput than 10 GbE, allow for reduced cabling. This example system serves as the starting point for the discussion in the rest of this article.
For this example system, communication between a pair of nodes is supported by two separate concepts: logical I/O and messaging. Logical I/O transactions are supported with direct bridge translation, as well as with block DMA engines. DMA and RDMA are supported in this system, as they are in Ethernet-based systems. Messaging is supported by messaging engines, whereas Ethernet messaging is supported by TCP, among other protocols. RapidIO supports a single IOV per channel, which is fewer than available Ethernet solutions offer.


Protocol comparison


Ethernet provides one way to serve peer-to-peer processor networks, whether chip-to-chip, board-to-board, or between chassis. However, Ethernet evolved from LANs and WANs, and architects have had to look for an efficient way to use it in embedded systems. LANs and WANs typically assume a processor at each node to terminate the protocol stack. This is a reasonable assumption for LANs and WANs, but it introduces excessive latency and power consumption into real-time embedded systems (including servers).


The PCI and PCIe standards offer an alternative; however, they were actually designed for monolithic, single-host processor systems with a root complex concept. Scaling to multiple processors on line cards, backplane motherboards, and multiple servers presents a challenge, even with non-transparent bridges. The problem can be solved for a small number of endpoints or compute nodes, but memory mapping becomes difficult very quickly as systems scale in size.
Since RapidIO was built from the ground up for multi-processor P2P networks, it inherently includes the following attributes: reliable communication, end-to-end packet delivery in less than one microsecond, 100 ns cut-through switching latency, no processor overhead for protocol termination, support for any topology (e.g., direct, mesh, star, dual star), high-performance messaging for transmitting large amounts of data, and a push architecture with the option for each processor in the system to have its own memory subsystem.


RapidIO has become the primary embedded interconnect and, with its carrier-grade serial communication specified for backplane connectivity, can natively support intra-board, inter-board, cabled, and inter-chassis connections within a room or between rooms.


Subsequent specifications were developed to improve Ethernet performance in embedded environments, beyond its original wide area network (WAN) and local area network (LAN) settings. For data center environments, these enhancements are collectively known as Data Center Bridging (DCB). Data center and embedded environments require lossless transport, improved flow control, and low latency.


QoS and flow control comparison:

One of the drivers for increased bandwidth in enterprise data centers and the cloud is the need to converge the storage network, typically 8 Gbps Fibre Channel, with the inter-server connectivity network, usually 1 Gbps Ethernet. These networks have different quality of service requirements; in particular, storage networks must not drop packets. RapidIO-based systems already provide reliable delivery with predictable quality of service.


For applications requiring more aggressive and effective QoS, RapidIO offers advanced flow control and data plane features. The RapidIO protocol defines multiple flow control mechanisms at the physical and logical layers. Short-term congestion is handled at the link level by physical layer flow control, which is managed by both the transmitter and the receiver. Longer-term congestion can be controlled at the logical layer using XOFF and XON messages, which allow the receiver to halt a specific flow when it detects congestion in that flow and to resume it afterward.
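
The logical layer XOFF/XON behavior described above can be illustrated with a small simulation. This is a minimal sketch rather than an implementation of the RapidIO flow control specification: the `FlowReceiver` class, the queue thresholds, and the tick-based sender are all assumptions made for illustration.

```python
from collections import deque


class FlowReceiver:
    """Receiver that signals XOFF/XON for one flow based on queue depth."""

    def __init__(self, xoff_threshold=6, xon_threshold=2):
        # Thresholds are illustrative values, not taken from any specification.
        self.queue = deque()
        self.xoff_threshold = xoff_threshold
        self.xon_threshold = xon_threshold
        self.paused = False

    def accept(self, packet):
        """Queue a packet; return "XOFF" if the sender should now stop."""
        self.queue.append(packet)
        if not self.paused and len(self.queue) >= self.xoff_threshold:
            self.paused = True
            return "XOFF"
        return None

    def drain(self, count):
        """Consume up to `count` packets; return "XON" once the backlog clears."""
        for _ in range(min(count, len(self.queue))):
            self.queue.popleft()
        if self.paused and len(self.queue) <= self.xon_threshold:
            self.paused = False
            return "XON"
        return None


def simulate(ticks=12, send_per_tick=3, drain_per_tick=2):
    """Run a fast sender against a slower receiver and log control messages."""
    rx = FlowReceiver()
    sender_enabled = True
    events = []
    for tick in range(ticks):
        if sender_enabled:
            for i in range(send_per_tick):
                if rx.accept((tick, i)) == "XOFF":
                    sender_enabled = False  # sender halts this flow only
                    events.append((tick, "XOFF"))
                    break
        if rx.drain(drain_per_tick) == "XON":
            sender_enabled = True
            events.append((tick, "XON"))
    return events, len(rx.queue)


if __name__ == "__main__":
    for tick, message in simulate()[0]:
        print(tick, message)
```

Because the sender in this sketch outpaces the receiver, the log alternates between XOFF and XON while the queue depth stays bounded by the XOFF threshold, which is the property that lets a congested flow be throttled without dropping packets.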


Virtual channels (VCs) support new QoS features. These features provide reliable or best-effort delivery policies, improved flow control at the data link layer, and end-to-end traffic management. VCs also allow up to 16 million individual virtual flows between two endpoints.
Ethernet has also improved its flow control capabilities through advancements in DCB technology, which allows one end of a link to halt transmission at the other end to prevent buffer overflows and the resulting packet drops. Simplified packet routing, facilitated by VLAN tagging, and packet prioritization carried in the VLAN tag have also contributed significantly to improved Ethernet latency and quality of service.


However, DCB has numerous flow control and QoS limitations compared to RapidIO [Comparison between Ethernet and RapidIO, CompactPCI, AdvancedTCA, and MicroTCA, July 15, 2010]. For example, Ethernet flow control is primarily provided through 802.3x PAUSE. Even with enhanced flow control mechanisms, congestion notification carries high overhead because the notification must propagate from the source to the network edge, whereas in RapidIO congestion notification travels quickly in control symbols. These Ethernet mechanisms have not been widely adopted, and some vendors offer proprietary support for limited topologies. RapidIO's data link layer flow control allows the transmitter to keep the receiver's buffers continuously full, which improves scheduling efficiency and, therefore, overall switching efficiency.


Performance comparison: latency and throughput

The latency of Ethernet switching devices continues to decrease, to the point that the industry's leading Ethernet switches now have a latency of approximately 200 ns. However, the latency of RapidIO switching devices is below 100 ns and is also decreasing. This trend will continue for both technologies as companies move to smaller silicon process geometries and higher physical layer speeds. End-to-end latency, including packet termination, can exceed 10 µs for Ethernet and is less than 1 µs for RapidIO.
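
To see how the per-switch figures above add up along a path, the following sketch sums switch latency over a number of hops. It is a deliberately simplified model: the three-hop path is an assumption, the function name is hypothetical, and endpoint termination, serialization, and cable delay are ignored.

```python
# Per-switch latencies quoted in the text, in nanoseconds.
ETHERNET_SWITCH_NS = 200   # approximate figure for leading Ethernet switches
RAPIDIO_SWITCH_NS = 100    # upper bound quoted for RapidIO switches


def switching_latency_ns(hops, per_switch_ns):
    """Total latency contributed by the switches on a path of `hops` switches."""
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return hops * per_switch_ns


if __name__ == "__main__":
    hops = 3  # assumed path length, for illustration only
    print("Ethernet:", switching_latency_ns(hops, ETHERNET_SWITCH_NS), "ns")
    print("RapidIO :", switching_latency_ns(hops, RAPIDIO_SWITCH_NS), "ns (upper bound)")
```

Even on a multi-hop path, the switch contribution stays well under a microsecond for both technologies, so the end-to-end difference quoted above comes mainly from packet termination at the endpoints.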


RapidIO offers guaranteed delivery through data link layer error recovery. Data link layer control symbols minimize control loop latency. Control symbols can also be embedded within packets, further minimizing latency. Lossless Data Center Ethernet (DCE) still requires offload engines or software stacks, which tend to introduce latency.


RapidIO Gen2 switches offer 20 Gbps per port. The Tsi721 converts PCIe data to RapidIO and vice versa, providing a gateway at the maximum line rate of 16 Gbps for packets as small as 64 bytes. This is more than the 10 GbE generally available, but obviously less than the growing number of 40 GbE solutions that are coming to market.
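
The 20 Gbaud link rate and the 16 Gbps data rate quoted for the Tsi721 are consistent with 8b/10b line coding, in which every 8 data bits are carried by 10 bits on the wire. The sketch below works through that arithmetic. The x4 port at 5 Gbaud per lane is an assumed configuration chosen to match the 20 Gbaud aggregate, and the function name is hypothetical.

```python
def payload_rate_gbps(lanes, gbaud_per_lane, data_bits=8, code_bits=10):
    """Return the data rate that remains after line coding, in Gbps."""
    raw_gbaud = lanes * gbaud_per_lane
    return raw_gbaud * data_bits / code_bits


if __name__ == "__main__":
    # Assumed configuration: four lanes at 5 Gbaud each (20 Gbaud aggregate).
    rate = payload_rate_gbps(lanes=4, gbaud_per_lane=5.0)
    print(f"{rate:.1f} Gbps of data on a 20 Gbaud link")
```

This accounts only for line coding; packet headers and control symbols reduce the usable payload bandwidth further.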


From a raw bandwidth perspective, Ethernet outperforms RapidIO. However, this gap should narrow once the physical layer specifications and roadmap for upcoming RapidIO releases become available. RapidIO's performance and protocol efficiency enable robust protocol encapsulation, and its messaging and data streaming transactions provide native Ethernet encapsulation capabilities.


Security:

The Tsi721's receive-side security functions for RapidIO are implemented in hardware and can determine whether or not to accept packets for a set of destination IDs. Each packet type defined by the RapidIO 2.1 Specification can be accepted or discarded. Transmit-side security must be implemented in software.


Security in the switch fabric is enforced by the system host. No node can communicate with another unless the fabric's routing tables are configured to allow packets to be routed between those nodes. Each switch port also has four filters, each of which can mask and compare up to the first 20 bytes of a packet and discard the packet on a match. This capability can be used to enforce address ranges and destination IDs for DMA read and write transactions, as well as to prevent any node other than the system host from querying or configuring the switch fabric.
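
The per-port mask-and-compare filtering described above can be sketched in software as follows. This is only an illustration of the idea, not the switch's actual register interface: the `PortFilter` class, the `should_discard` helper, and the toy header layout (a one-byte type field followed by a two-byte source ID) are all hypothetical.

```python
FILTER_WINDOW = 20     # bytes inspected at the start of each packet, per the text
FILTERS_PER_PORT = 4   # filters available on each switch port, per the text


class PortFilter:
    """Mask-and-compare rule applied to the leading bytes of a packet."""

    def __init__(self, mask: bytes, match: bytes):
        if len(mask) != len(match):
            raise ValueError("mask and match must have the same length")
        if len(mask) > FILTER_WINDOW:
            raise ValueError("filter cannot look past the first 20 bytes")
        self.mask = mask
        self.match = match

    def hits(self, packet: bytes) -> bool:
        """Return True when every masked byte equals the masked match value."""
        header = packet[:len(self.mask)]
        if len(header) < len(self.mask):
            return False  # packet too short to compare against this rule
        return all((h & m) == (v & m)
                   for h, m, v in zip(header, self.mask, self.match))


def should_discard(packet: bytes, filters) -> bool:
    """Discard the packet if any of the port's filters matches it."""
    if len(filters) > FILTERS_PER_PORT:
        raise ValueError("a port has at most four filters")
    return any(f.hits(packet) for f in filters)


if __name__ == "__main__":
    # Hypothetical layout: byte 0 = packet type, bytes 1-2 = source ID.
    # Block type 0x08 packets arriving from source ID 0x0005.
    rule = PortFilter(mask=bytes([0xFF, 0xFF, 0xFF]),
                      match=bytes([0x08, 0x00, 0x05]))
    print(should_discard(bytes([0x08, 0x00, 0x05, 0xAA]), [rule]))
    print(should_discard(bytes([0x08, 0x00, 0x01, 0xAA]), [rule]))
```

Setting mask bits to zero turns the corresponding packet bits into "don't care" positions, which is how a single rule can cover a range of addresses or IDs.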


Power and price:

RapidIO offers superior power and price at both the device and the system level, and is more energy-efficient than Ethernet. RapidIO's physical layer ensures reliable and accurate message delivery, a job that Ethernet leaves to transport layer protocols. Ethernet's protocol overhead is higher, which also contributes to greater energy consumption per unit of data transferred.


Ethernet vendors charge a premium for 10 Gbps Ethernet ports and even more for 40 Gbps ports. The system cost of a 10 Gbps Ethernet port can reach hundreds of dollars, with 10G switching device costs exceeding $10 per port in volume. However, thanks to less complex packet termination, smaller packet memory, and no sorting requirements, among other factors, RapidIO's system cost per port is approximately $55, with switching device costs below $4 per 10G port in volume.


Ecosystem:

From an ecosystem perspective, Ethernet, at over 30 years old, has a significant advantage over the RapidIO ecosystem, which is roughly ten years old. Ethernet has many vendors of silicon, platforms, tools, and software. Its hardware ecosystem offers converged network cards, switch and router platforms, and server and storage platforms. Its software ecosystem offers network management software, Microsoft Windows®, Linux®, and a wide variety of other offerings. There are also protocol analyzers, packet analyzers, traffic generators, network testers, and a long record of rigorous compliance testing. The RapidIO ecosystem includes robust operating systems, Linux, OFED, protocol analysis, system diagnostics, and several server platforms. It has a strong track record of Gen1 compliance testing, while Gen2 compliance testing is underway. Gaps remain in support for the Windows operating system, VMware, network adapter solutions, NPU devices, and storage platforms.


Conclusion:

The redesign effort to move from 10 GbE to 40 GbE opens up greenfield opportunities for competing interconnect fabrics seeking a foothold in enterprise servers, data centers, cloud computing, and high-performance computing. Among the competitors, RapidIO Gen2, with its 20 Gbps throughput, is a strong contender. System designers who can work with the smaller, less mature RapidIO ecosystem can move to 20 Gbps using switches, PCIe bridges, and any of the currently available embedded CPU solutions. The benefits include a fault-tolerant, reliable, and highly scalable carrier-grade system solution with an extremely low total cost of ownership, along with significantly lower power consumption and latency and superior flow control and QoS.

Author:

Trevor Hiatt, product director at IDT.
