Increasingly, each of these components is handled by a different program running on a different server in the website's data center. This reduces processing time, but it exacerbates another problem: the equitable allocation of network bandwidth among the programs.
Many websites aggregate all the components of a page before sending them to the user. So if just one program is allocated too little bandwidth on the data center network, the rest of the page, and the user, could be left waiting for its component.
At the Usenix Symposium on Networked Systems Design and Implementation, researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) are presenting a new system for allocating bandwidth in data center networks. In tests, the system maintained the same overall data transmission rate, or network "throughput," as those currently in use, but allocated bandwidth much more equitably, completing the download of all page components up to four times faster.
"There are easy ways to maximize performance in a way that divides the resource very unevenly," says Hari Balakrishnan, the Fujitsu professor of electrical engineering and computer science and one of the two lead authors on the paper describing the new system. "What we've shown is a way to converge very quickly toward a good allocation."
Most networks regulate data traffic using some version of the Transmission Control Protocol, or TCP. When traffic becomes too heavy, some data packets fail to reach their destinations. With TCP, when a sender realizes its packets aren't being received, it reduces its transmission rate and then slowly increases it again. Given enough time, this process will reach a point of equilibrium where network bandwidth is optimally allocated among senders.
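The back-off-then-probe behavior described above can be illustrated with a toy simulation. This is a minimal sketch, not real TCP: it assumes every sender sees loss at the same moment, and the function name `simulate_aimd`, the capacity, and the increase and decrease constants are all made up for illustration.

```python
def simulate_aimd(rates, capacity, steps, increase=1.0, decrease=0.5):
    """Toy model of senders sharing one link (hypothetical, not real TCP)."""
    rates = list(rates)
    history = [tuple(rates)]
    for _ in range(steps):
        if sum(rates) > capacity:
            # Link overloaded: packets are lost, so every sender cuts its rate.
            rates = [r * decrease for r in rates]
        else:
            # No loss observed: every sender slowly increases its rate again.
            rates = [r + increase for r in rates]
        history.append(tuple(rates))
    return history


# Two senders start far apart; the gap between them shrinks over time.
history = simulate_aimd([1.0, 60.0], capacity=100.0, steps=200)
start_gap = abs(history[0][0] - history[0][1])
end_gap = abs(history[-1][0] - history[-1][1])
print(f"gap at start: {start_gap:.2f}, gap after 200 steps: {end_gap:.2f}")
```

In this model the gap between the two senders only narrows when the link overloads and both back off, which hints at why reaching equilibrium can take many rounds.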
But in the data center of a large website, there often isn't enough time. "Things change on the network so rapidly that this isn't adequate," Perry says. "It often takes so long that [the transmission rates] never converge, and it's a lost cause."
TCP places the entire responsibility for traffic regulation on end users because it was designed for the public internet, which links thousands of smaller, independently operated networks. Centralizing control of such a vast network seemed impossible, both politically and technically.
But a data center is controlled by a single operator, and with the increases in the speed of both data connections and computer processors over the past decade, centralized regulation has become practical. The CSAIL researchers' system takes this centralized approach.
The system, called Flowtune, essentially adopts a market-based solution for bandwidth allocation. Operators assign different values to increases in the transmission rates of data sent by different programs. For example, doubling the transmission rate of the image in the center of a web page might be worth 50 points, while doubling the transmission rate of analytical data that is reviewed only once or twice a day might be worth only 5 points.
Supply and Demand:
As in any good market, each link in the network sets a "price" according to "demand"—that is, according to the amount of data that senders collectively want to transmit. For each pair of sending and receiving computers, Flowtune calculates the transmission rate that maximizes the "total profit," or the difference between the value of the increased transmission rates (50 points for the image versus 5 for the analytics data) and the price of the bandwidth required across all intermediate links.
Profit maximization, however, changes the demand across the links, so Flowtune continuously recalculates prices and, based on that, recalculates the maximum profits, allocating the resulting transmission rates to the servers that send data across the network.
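The price-and-profit loop described above can be sketched with a textbook iteration. This is not Flowtune's actual update rule, which the article does not spell out. The sketch assumes each flow's value is its weight times the logarithm of its rate, and that each link nudges its price up or down by a small fixed step; `allocate_rates`, the link name `uplink`, the capacity, and the step size are all hypothetical. Only the weights 50 and 5 come from the example in the text.

```python
def allocate_rates(flows, capacity, iterations=200, step=0.001, min_price=1e-6):
    """Iterate link prices and flow rates toward a balanced allocation.

    flows:    {name: (weight, [links on the flow's path])}
    capacity: {link: bandwidth available on that link}
    """
    prices = {link: 1.0 for link in capacity}
    rates = {}
    for _ in range(iterations):
        # Each flow picks the rate maximizing weight*log(rate) - rate*path_price,
        # which works out to weight / path_price (assumed log-shaped value).
        for name, (weight, path) in flows.items():
            path_price = sum(prices[link] for link in path)
            rates[name] = weight / path_price
        # Each link raises its price when demand exceeds capacity, else lowers it.
        for link, cap in capacity.items():
            load = sum(rates[n] for n, (_, path) in flows.items() if link in path)
            prices[link] = max(min_price, prices[link] + step * (load - cap))
    return rates, prices


# Weights follow the article's example: 50 for the image, 5 for analytics data.
flows = {"image": (50.0, ["uplink"]), "analytics": (5.0, ["uplink"])}
capacity = {"uplink": 110.0}
rates, prices = allocate_rates(flows, capacity)
print(rates, prices)
```

Under these assumptions the two flows end up sharing the link roughly in proportion to their weights. The step size has to be tuned to the scale of the weights and capacities; a step that is too large makes the prices oscillate instead of settling.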
The paper also describes a new procedure the researchers developed to allocate Flowtune's calculations across the cores of a multi-core computer to increase efficiency. In the experiments, researchers compared Flowtune to a widely used TCP variant, using data from real data centers.
Depending on the dataset, Flowtune completed the slowest 1 percent of data requests nine to eleven times faster than the existing system.
By Larry Hardesty, MIT News Office
