VL2: A Scalable and Flexible Data Center Network (Greenberg et al., SIGCOMM '09)
Some refresher

Layer 2 (data link layer)
- Addressing: MAC address
- Learning: flooding
- Switched; minimum spanning tree
- Semantics: unicast, multicast, broadcast

Layer 3 (network layer)
- Addressing: IP address
- Learning: routing protocol (dynamic: BGP, OSPF; or static)
- Routed

ARP
- Discovery protocol that ties an IP address back to a MAC address
- Utilizes layer-2 broadcast semantics
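Since the refresher leans on layer-2 learning by flooding, here is a minimal sketch of a learning switch in Python. This is purely illustrative background (the names here are made up, and a real switch would also age out table entries and run spanning tree to avoid loops):

```python
class LearningSwitch:
    """Toy layer-2 learning switch: flood unknown destinations,
    remember source MACs, switch known destinations directly."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.mac_table = {}  # MAC address -> port it was last seen on

    def handle_frame(self, src_mac, dst_mac, in_port):
        # Learn: the source MAC is reachable via the ingress port.
        self.mac_table[src_mac] = in_port
        # Forward: use the table if the destination is known,
        # otherwise flood to every port except the one it came in on.
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        return [p for p in range(self.num_ports) if p != in_port]

switch = LearningSwitch(num_ports=4)
print(switch.handle_frame("aa:aa", "bb:bb", in_port=0))  # unknown -> flood [1, 2, 3]
print(switch.handle_frame("bb:bb", "aa:aa", in_port=2))  # learned -> [0]
```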
Fat tree (diagram)

Layer 2 Learning (diagram)

Minimum Spanning Tree (diagram)

Layer 3 Routing (diagram)

(Diagrams from "VL2", written by Microsoft Research and presented by Sargun Dhillon: https://www.youtube.com/watch?v=k6dEK0oz0gg)
Architecture of Data Center Networks (DCN)
Conventional DCN Problems
(diagram: conventional tree hierarchy of core routers (CR), access routers (AR), switches (S), and servers (A), with oversubscription growing up the hierarchy, from 1:5 near the servers to 1:80 and 1:240 higher up; one subtree wants more capacity while another has spare capacity it cannot lend)
- Static network assignment; fragmentation of resources
- Poor server-to-server connectivity
- Traffic of one service affects others
- Poor reliability and utilization
Source: Hakim Weatherspoon, High Performance Systems and Networking lecture slides, https://www.cs.cornell.edu/courses/cs5413/2014fa/lectures/09-vl2.pptx
Objectives
- Uniform high capacity: the maximum rate of a server-to-server traffic flow should be limited only by the capacity of the network cards; assigning servers to a service should be independent of network topology
- Performance isolation: traffic of one service should not be affected by traffic of other services
- Layer-2 semantics: easily assign any server to any service; configure a server with whatever IP address the service expects; a VM keeps the same IP address even after migration
Measurements and Implications of DCN (1)
Data-center traffic analysis:
- The ratio of traffic between servers to traffic entering/leaving the data center is 4:1
- Demand for bandwidth between servers is growing faster than demand for external bandwidth
- The network is the bottleneck of computation
Measurements and Implications of DCN (2)
Flow distribution analysis:
- The majority of flows are small; the biggest flows top out around 100 MB
- The distribution of internal flows is simpler and more uniform
- More than 50% of the time a node averages about 10 concurrent flows, but at least 5% of the time it has more than 80 concurrent flows
Measurements and Implications of DCN (3)
Traffic matrix analysis:
- Traffic patterns resist concise summarization
- Traffic patterns are unstable over time
- The lack of predictability stems from the use of randomness to improve the performance of data-center applications
Measurements and Implications of DCN (4)
Failure characteristics:
- Pattern of networking equipment failures: 95% of failures are resolved within 1 min, 98% within 1 hr, and 99.6% within 1 day; 0.09% last more than 10 days
- No obvious way to eliminate all failures from the top of the hierarchy
Virtual Layer 2 Switch (VL2)
Design principles:
- Randomizing to cope with volatility: use Valiant Load Balancing (VLB) to do destination-independent traffic spreading across multiple intermediate nodes
- Building on proven networking technology: use IP routing and forwarding technologies available in commodity switches
- Separating names from locators: use a directory system to maintain the mapping between names and locations
- Embracing end systems: a VL2 agent runs on each server
Topology (diagram)
Addressing and Routing (1)
- Packet forwarding: the VL2 agent traps each packet and encapsulates it with an LA (locator address) header for routing (see the sketch below)
- Address resolution: servers appear to be in the same subnet; ARP is captured by the agent and redirected as a unicast request to the directory server, which replies with routing info that the agent then caches
- Access control is enforced via the directory server
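To make the agent's role concrete, here is a minimal sketch of the trap-resolve-encapsulate path in Python. All names here (VL2Agent, lookup, the dict-based "packet") are hypothetical illustrations, not the actual VL2 implementation:

```python
class VL2Agent:
    """Sketch of a per-server VL2 agent: replaces ARP broadcast with a
    unicast directory lookup and encapsulates AA packets inside LA headers."""

    def __init__(self, directory):
        self.directory = directory  # assumed client with a lookup(aa) -> la method
        self.cache = {}             # AA -> LA, populated on demand

    def resolve(self, dest_aa):
        # The broadcast ARP the server issues is trapped here and turned
        # into a unicast directory query; the reply is cached for later packets.
        if dest_aa not in self.cache:
            self.cache[dest_aa] = self.directory.lookup(dest_aa)
        return self.cache[dest_aa]

    def send(self, dest_aa, payload):
        dest_la = self.resolve(dest_aa)
        # Encapsulate: switches route only on the outer LA header;
        # the inner AA packet is opaque to the network.
        return {"outer_dst": dest_la, "inner_dst": dest_aa, "data": payload}
```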
Addressing and Routing (2)
(diagram: intermediate switches (Int) above ToR switches above servers (Srv))
- Valiant Load Balancing (VLB) for random routes: when going up, choose a random intermediate node (see the sketch below)
- Equal-Cost Multi-Path (ECMP) forwarding for fail-safety: assign the same LA to multiple nodes; when a node fails, IP anycast routes to a backup node
- TCP for congestion control
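Here is a minimal sketch of the per-flow path choice (hypothetical code; the real datapath uses IP-in-IP encapsulation plus the switches' ECMP hashing). Hashing the flow identifier keeps every packet of a flow on the same randomly chosen intermediate switch, spreading different flows uniformly without reordering any single flow:

```python
import hashlib

# Hypothetical pool of intermediate-switch addresses (illustrative names).
INTERMEDIATES = ["int-1", "int-2", "int-3", "int-4"]

def pick_intermediate(flow_id: str) -> str:
    # Deterministic per flow (so TCP sees no reordering), but effectively
    # random across flows, which is what VLB needs to spread load.
    digest = hashlib.sha256(flow_id.encode()).digest()
    return INTERMEDIATES[int.from_bytes(digest[:4], "big") % len(INTERMEDIATES)]

print(pick_intermediate("10.0.0.1:4242->10.0.1.9:80"))  # stable for this flow
```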
Directory System (diagram)
Backwards compatibility
- Interaction with hosts on the internet:
  - External traffic can flow directly without being forced through gateway servers to have headers rewritten
  - Servers that need to be reachable externally are assigned an LA in addition to their AA for direct communication
- Handling broadcast:
  - ARP is replaced by the directory system
  - DHCP is intercepted by the agent and relayed as a unicast transmission directly to the DHCP server
Evaluations (1)
Uniform high capacity:
- All-to-all data shuffle stress test: 75 servers, each delivering 500 MB to every other server
- Maximum achievable goodput is 62.3 Gbps
- VL2 network efficiency is 58.8/62.3, i.e., 94% of the maximum achievable goodput
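For reference, the efficiency figure is just the ratio of achieved to maximum achievable goodput, and the total volume shuffled follows from the all-to-all pattern (assuming each of the 75 servers sends 500 MB to each of the other 74):

$$\text{efficiency} = \frac{58.8\ \text{Gbps}}{62.3\ \text{Gbps}} \approx 0.94, \qquad \text{volume} = 75 \times 74 \times 500\ \text{MB} \approx 2.7\ \text{TB}$$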
Evaluations (2)
Fairness:
- 75 nodes
- Real data center workload
- Plot Jain's fairness index for traffic across the intermediate switches
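Jain's fairness index is the standard definition (not specific to this paper); for per-switch traffic rates $x_1, \dots, x_n$ it is

$$J(x_1,\dots,x_n) = \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n \sum_{i=1}^{n} x_i^2}$$

which equals 1 when all intermediate switches carry equal load and falls toward $1/n$ as load concentrates on one switch.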
Evaluations (3)
Performance isolation, with two types of services:
- Service one: 18 servers do a single TCP transfer all the time
- Service two: a new server starts every 2 seconds, each performing an 8 GB transfer over TCP on start, up to a total of 19 servers
Evaluations (4)
Convergence after link failures:
- 75 servers
- All-to-all data shuffle
- Disconnect links between intermediate and aggregation switches
Evaluations (5)
Directory system performance:
- 40K lookups/s with response time under 1 ms
- Serving the worst-case scenario requires only 0.06% of the total number of servers
- Convergence latency is within 100 ms for 70% of updates and 530 ms for 99%
Evaluations (6)
Compare VLB with:
- Adaptive routing (e.g., TeXCP) as an upper bound
- Best oblivious routing (VLB is itself one oblivious routing scheme)
Result: VLB (and best oblivious routing) is close to the adaptive routing scheme in terms of link utilization
Summary
VL2 consolidates layers 2 and 3 into a single "virtual layer 2". Key design points:
- Layer-2 semantics: flat addressing
- Uniform high capacity: guaranteed bandwidth using VLB
- Performance isolation: enforce the hose traffic model using TCP
Questions?

Q: With the bipartite graph between aggregation and intermediate switches, is there a limit to the network size without oversubscription?
A: The VLB design guarantees bandwidth without oversubscription, and the authors claim the network can be scaled as far as the budget allows. Given D_I ports per intermediate switch and D_A ports per aggregation switch, the maximum number of servers is 20(D_A * D_I / 4); see the worked example below.

Q: How does the implementation of VL2 compare to today's (2018) state-of-the-art data center network designs?
A: Today's network designs are focused on software-defined networking.

Q: Since there is one more layer handling the routing, will it become the new bottleneck of this system if every request is small but the number of requests is very large?
A: Unanswered.
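As a worked instance of that scaling formula (the port counts below are only an assumed example):

$$D_A = D_I = 144 \;\Rightarrow\; 20 \cdot \frac{D_A \cdot D_I}{4} = 20 \cdot \frac{144 \cdot 144}{4} = 103{,}680\ \text{servers}$$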