The Internet has changed the way people perceive computers, communicate and do business. Our mission is research and teaching in two directions (more details on our Projects page):
Our paper “DeepDive: Transparently Identifying and Managing Performance Interference in Virtualized Environments” was accepted at USENIX ATC ’13! Dejan Novakovic will present the work in San Jose in June. Below is the paper abstract:
Cloud computing in general, and Infrastructure-as-a-Service (IaaS) in particular, are becoming ever more popular. Unfortunately, performance interference (and the resulting unpredictability in the delivered performance) across virtual machines (VMs) co-located on the same physical machine (PM) threatens to make cloud computing inadequate for performance-sensitive customers and more expensive than necessary for all customers.
We describe the design and implementation of DeepDive, a system for transparently identifying and managing interference. DeepDive successfully addresses several important challenges, including the lack of performance information from applications and the large overhead of detailed interference analysis. We first show that it is possible to use easily obtainable, low-level metrics to clearly discern when interference is occurring and what resource is causing it. Next, using realistic workloads, we demonstrate that DeepDive quickly learns about interference across co-located VMs. Finally, we show DeepDive’s ability to deal efficiently with interference when it is detected, by using a low-overhead approach to identifying a VM placement that alleviates interference.
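To give a feel for the idea of spotting interference from low-level metrics, here is a minimal sketch. It is not DeepDive's actual method: the metric names, the baseline values, and the fixed 50% threshold are all assumptions made up for illustration. It compares a VM's observed metrics against a baseline recorded without co-located VMs and blames the resource with the largest relative increase.

```python
def diagnose(baseline, observed, threshold=0.5):
    """Return (interfering, culprit) based on relative metric deviations.

    baseline, observed: dicts mapping a (hypothetical) metric name to a value.
    threshold: assumed relative increase above which we report interference.
    """
    deviations = {
        name: (observed[name] - baseline[name]) / baseline[name]
        for name in baseline
    }
    # The metric that grew the most is the suspected source of interference.
    culprit = max(deviations, key=deviations.get)
    if deviations[culprit] > threshold:
        return True, culprit
    return False, None


# Hypothetical numbers: cache misses triple, the other metrics barely move.
baseline = {"cache_misses": 10.0, "disk_wait_ms": 4.0, "net_queue_ms": 2.0}
observed = {"cache_misses": 31.0, "disk_wait_ms": 4.4, "net_queue_ms": 2.1}
verdict = diagnose(baseline, observed)
print(verdict)
```

A real system would have to learn what "normal" looks like per workload instead of relying on a single static baseline, which is exactly the hard part the abstract refers to.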
The increasing adoption of Software Defined Networking, and OpenFlow in particular, brings great hope for increasing extensibility and lowering costs of deploying new network functionality. A key component in these networks is the OpenFlow agent, a piece of software that a switch runs to enable remote programmatic access to its forwarding tables. While testing high-level network functionality, the correct behavior and interoperability of any OpenFlow agent are taken for granted. However, existing tools for testing agents are neither exhaustive nor systematic, and only check that the agent’s basic functionality works. In addition, the rapidly changing and sometimes vague OpenFlow specifications can result in multiple implementations that behave differently.
This paper presents SOFT, an approach for testing the interoperability of OpenFlow switches. Our key insight is in automatically identifying the testing inputs that cause different OpenFlow agent implementations to behave inconsistently. To this end, we first symbolically execute each agent under test in isolation to derive which set of inputs causes which behavior. We then crosscheck all distinct behaviors across different agent implementations and evaluate whether a common input subset causes inconsistent behaviors. Our evaluation shows that our tool identified several inconsistencies between the publicly available Reference OpenFlow switch and Open vSwitch implementations.
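The crosschecking step can be illustrated with a toy sketch. The two agents below are hypothetical stand-ins, not the Reference switch or Open vSwitch, and brute-force enumeration over a tiny input list replaces symbolic execution, which SOFT uses to derive the input sets without trying every value. The sketch groups inputs by the behavior each agent exhibits, then reports every common input subset on which the two agents disagree.

```python
from collections import defaultdict
from itertools import product


def agent_a(port):
    # Hypothetical agent: reports an error for output port 0.
    if port == 0:
        return "error"
    if port == 0xFFFB:  # OFPP_FLOOD in OpenFlow 1.0
        return "flood"
    return "forward"


def agent_b(port):
    # Hypothetical agent: silently drops on port 0 instead of reporting an error.
    if port == 0:
        return "drop"
    if port == 0xFFFB:
        return "flood"
    return "forward"


def partition_by_behavior(agent, inputs):
    """Group inputs by observed behavior (stand-in for symbolic execution)."""
    groups = defaultdict(set)
    for value in inputs:
        groups[agent(value)].add(value)
    return dict(groups)


def crosscheck(groups_a, groups_b):
    """Return {(behavior_a, behavior_b): common inputs} for differing behaviors."""
    inconsistencies = {}
    for (beh_a, in_a), (beh_b, in_b) in product(groups_a.items(), groups_b.items()):
        common = in_a & in_b
        if beh_a != beh_b and common:
            inconsistencies[(beh_a, beh_b)] = common
    return inconsistencies


INPUTS = [0, 1, 2, 0xFFFB]
result = crosscheck(
    partition_by_behavior(agent_a, INPUTS),
    partition_by_behavior(agent_b, INPUTS),
)
print(result)
```

Here the only inconsistency is port 0, where one agent answers with an error and the other drops the packet.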
Please join me in congratulating Dr. Nedeljko Vasic on receiving an Honorable Mention in the 2012 EuroSys Roger Needham PhD Award competition for the best systems PhD in Europe. The committee said that Nedeljko’s Ph.D. thesis narrowly missed out on the top prize after several weeks of intense deliberation. This is the first time EPFL has been mentioned in this competition.
The recorded video of Marco’s presentation of DiCE at USENIX ATC’11 finally surfaced on the Web.
Effective resource management of virtualized environments that form the cloud computing backbone is a challenging task. State-of-the-art management systems either rely on analytical models or test different resource allocations by running actual experiments. Both approaches incur significant overhead whenever the workload changes. We introduce DejaVu, a framework that accelerates resource allocation in virtualized environments. DejaVu achieves more than a factor of 10 speedup in adaptation time for each workload change relative to the state of the art. By enabling quick adaptation, DejaVu saves up to 60% of the service provisioning cost. Dejan will present our work at ASPLOS in London in March 2012.
Power consumption of today’s datacenters is already significant and threatens to shortly hit the power wall: it is getting progressively harder to supply datacenter equipment with sufficient energy for power and cooling. We tackle this problem by proposing REsPoNse, a framework that aims to achieve energy proportionality in both Internet and datacenter networks. The insight in REsPoNse is to identify a few energy-critical paths off-line, install them into network elements, and use a simple online element to redirect the traffic in a way that enables large parts of the network to enter a low-power state. Nedeljko presented this work at CoNEXT in December 2011.
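As a rough illustration of the split between off-line path selection and a simple online element, consider the sketch below. The topology, the two precomputed paths, and the capacity rule are assumptions invented for this example and do not reflect how REsPoNse actually chooses or ranks paths. The online part only decides which precomputed paths to use, and any link not on an active path is free to sleep.

```python
# Hypothetical paths computed off-line for traffic from A to D.
ALWAYS_ON = ("A", "B", "D")   # kept powered at all times
ON_DEMAND = ("A", "C", "D")   # woken up only when the always-on path is full


def links(path):
    """Return the set of undirected links a path traverses."""
    return {tuple(sorted(pair)) for pair in zip(path, path[1:])}


def active_paths(demand, capacity):
    """Online element: spill onto the on-demand path only under high load."""
    paths = [ALWAYS_ON]
    if demand > capacity:
        paths.append(ON_DEMAND)
    return paths


def sleeping_links(demand, capacity, all_links):
    """Links that carry no traffic and may enter a low-power state."""
    awake = set()
    for path in active_paths(demand, capacity):
        awake |= links(path)
    return all_links - awake


ALL_LINKS = links(ALWAYS_ON) | links(ON_DEMAND)
low = sleeping_links(demand=3, capacity=10, all_links=ALL_LINKS)
high = sleeping_links(demand=12, capacity=10, all_links=ALL_LINKS)
print(low, high)
```

Under low demand both links through C can sleep; once demand exceeds the assumed capacity, every link is awake.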
Are the bugs in your OpenFlow application keeping you up all night? Today, despair no more!
As promised in our upcoming NSDI paper, we are releasing the first public version (0.7) of NICE, our tool for testing OpenFlow applications for the popular NOX controller platform.
Our paper A NICE Way to Test OpenFlow Applications has been accepted at NSDI 2012 (joint work with Jennifer Rexford from Princeton University).
The emergence of OpenFlow-capable switches enables exciting new network functionality, at the risk of programming errors that make communication less reliable. The centralized programming model, where a single controller program manages the network, seems to reduce the likelihood of bugs. However, the system is inherently distributed and asynchronous, with events happening at different switches and end hosts, and inevitable delays affecting communication with the controller. In this paper, we present efficient, systematic techniques for testing unmodified controller programs. Our NICE tool applies model checking to explore the state space of the entire system—the controller, the switches, and the hosts. Scalability is the main challenge, given the diversity of data packets, the large system state, and the many possible event orderings. To address this, we propose a novel way to augment model checking with symbolic execution of event handlers (to identify representative packets that exercise code paths on the controller). We also present a simplified OpenFlow switch model (to reduce the state space), and effective strategies for generating event interleavings likely to uncover bugs. Our prototype tests Python applications on the popular NOX platform. In testing three real applications—a MAC-learning switch, in-network server load balancing, and energy-efficient traffic engineering—we uncover eleven bugs.
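For readers new to model checking, the sketch below shows the basic idea of exploring event orderings and checking an invariant in every reachable state. It is a plain depth-first search over a made-up two-event model, not NICE itself: there is no symbolic execution, no OpenFlow switch model, and no search heuristics, and the event names and the "no dropped packets" invariant are assumptions chosen for illustration.

```python
def explore(initial_state, enabled, apply, invariant):
    """Depth-first search over event orderings; returns violating traces."""
    violations = []
    visited = set()
    stack = [(initial_state, ())]
    while stack:
        state, trace = stack.pop()
        if not invariant(state):
            violations.append(trace)
            continue
        if state in visited:
            continue  # already explored from this state via another ordering
        visited.add(state)
        for event in enabled(state):
            stack.append((apply(state, event), trace + (event,)))
    return violations


# Toy model: state = (pending events, rule installed?, dropped packet count).
def enabled_events(state):
    pending, _, _ = state
    return sorted(pending)


def apply_event(state, event):
    pending, rule_installed, dropped = state
    pending = pending - {event}
    if event == "install_rule":
        rule_installed = True
    elif event == "packet_arrives" and not rule_installed:
        dropped += 1  # packet reaches the switch before the rule does
    return (pending, rule_installed, dropped)


def no_drops(state):
    return state[2] == 0


initial = (frozenset({"install_rule", "packet_arrives"}), False, 0)
violations = explore(initial, enabled_events, apply_event, no_drops)
print(violations)
```

The search finds the one bad ordering, in which the packet arrives before the rule is installed. With realistic numbers of packets, switches, and events, the number of orderings grows very quickly, which is the scalability problem the abstract describes.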