CS 268: Project Suggestions Scott Shenker and Ion Stoica January
51 Slides351.00 KB
CS 268: Project Suggestions Scott Shenker and Ion Stoica January 24, 2005
Next Two Lectures Wednesday: - Clark: “The Design Philosophy .” - Saltzer, Reed, and Clark: “End-to-end arguments ” Monday: - Cerf and Kahn: “A Protocol for ” Remember to do your summaries! Reading list will be finalized over weekend . 2
Overview Will present 35 short project suggestions Legend: based on how well-defined projects are, not necessary how difficult they are - Well-defined projects - Less-defined project - You need to define project’s goals Need to send us a one page proposal by Feb 7 - Feel free to talk with us beforehand! 3
Outline Traditional networking Slightly nontraditional networking Distributed Hash Tables New Architectures and Paradigms Theory Identity-based Architectures 4
RED: Does it Really Help? Random Early Drop is the first and most widely used “active queue management” algorithm. Its goal is to promote fairness and decrease queue lengths (delays). Does it really help? There have been several contradictory papers on this. What is the real story? 5
Quickstart TCP vs XCP XCP (Katabi et al.) is a recent congestion control proposal (we’ll cover it later) that requires dramatic changes in TCP and routers Quickstart is a quick-and-dirty hack: http://www.icir.org/floyd/quickstart.html Is XCP significantly better? 6
Burst Switching Two main communication models - Datagrams: each packet is individually switched (routed) - Circuits: a circuit is set-up and all packets are forwarded Hybrid model: burst switching - First packet describes how many packets are in a burst - Router decides whether to forward all packets in the burst or none of them Research - Design a burst switching protocol and study its trade-offs 7
Interdomain Traffic Engineering Interdomain traffic engineering is a mess: - Ambiguous goals - Ad hoc techniques The best known paper on this is "Guidelines or Interdomain Traffic Engineering" by Feamster et al. Can one come up with a specification language and a coherent set of mechanisms? 8
Redirection/Selection Strategies Akamai redirection uses domain information to redirect requests Recent work on server selection uses network coordinates (GNP, Vivaldi) Are coordinates significantly better? 9
Slightly Nontraditional Networking
Disjoint Paths vs Waypoints Feedback based routing (Zhu et al.) uses disjoint paths to achieve resiliency Would using waypoints work better? - Easier (no need to find disjoint path) - More choices 11
Resiliency via Incast Send to set of waypoints (in mcast group): Each waypoint forwards data toward rcvr Incast boxes (one or more along path) strip out extra redundancies (configurable parameter) How reliable does that make delivery? What is a coherent architecture for this? - i3, DOA, etc.? 12
Negation Routing Recent proposal for Packet Obituaries (Argyraki et al., Hotnets 2004) gives feedback on which AS is dropping packets Additional proposal (being written up) allows one to do “negation routing” by saying: “avoid this AS” What is the performance of this approach? 13
Sensornet Routing Point-to-point routing is hard in sensornets and other ad hoc networks In a static ad hoc network, one can build up a coordinate system by using recursive pairing Challenge: design such an algorithm and analyze its performance 14
Reconfigurable Directional Antennae Lots of interest in “mesh networking” - Many performance problems because of interference What if we had reconfigurable directional antennae instead of broadcast? Could quickly reconfigure “links” to produce good paths Design such a system and analyze it 15
Anycast as Evolution Mechanism [joint with Sylvia Ratnasamy] How can the Internet evolve? Need to give incentives for individual ISPs to deploy new versions of IP at least partially That requires having packet using IPvX being automatically forwarded to the nearest router supporting IPvX Interesting combination of technical and economic requirements 16
Distributed Hash Tables
DHTs Overview Each data item and machine (node) in the system has associated a unique ID in a large ID space Hash table like interface - put(id, data) - data get(id) ID space is partitioned among nodes Data items are stored at the node responsible for its ID 18
Example: Chord Circular ID space [0.2m-1] Consider two consecutive nodes on the ID circle with IDs N1 and N2, where N2 follows N1 - 2m-1 0 3 Node N2 is responsible for interval (N1, N2] 7 Node 35 inserts (37, data) Chord circle Node 3 queries data with ID 37 41 37 data 20 35 19
Location Control in DHTs Users have no control over where data items are located Advantages: - Users don’t need to know about individual nodes - System can recover in case of failure without user involvement Disadvantages: - Not efficient - Enforces uniform trust model Research: design a system in which users have “some” degree of control on where data items are located - E.g., “I want my items to be located only on nodes in US” 20
Content Routing (DHTs?) Gritter and Cheriton proposed a technique for “Content Routing” This was before the days of DHTs Would DHT technology (put “on-path”) provide a better solution to this problem? 21
Circle Checks for DHTs DHTs provide no correctness guarantees However, there may be ways to check whether the DHT is giving consistent results - Based on a circulating packet Flesh out this design, and analyze Generalize this to other problems: - Don’t ensure correctness, but check it - Can one evade the CAP theorem? 22
DHT as Library vs DHT as Service Traditionally DHTs have been used as libraries - Lots of flexibility - But requires separate DHT for each application Can also use DHT as service - Rigid interface - ReDiR is client-side library that helps make this more effective OpenDHT (www.opendht.org) is a public DHT service 23
OpenDHT Projects!!! Reliable mcast (srm, wb) Top k clients Coral (NYU project) over OpenDHT - Just location-aware store Suspend/Resume (IRP project, but on DHTs) Auxiliary boxes: - NAT traversal (Skype?) - Permanent store 24
Peering DHTs For a DHT service to make sense, it needs to be commercially viable That means that there must be a way to provide a DHT-dialtone Walfish et al. have a proposal for peering DHTs Many details need to be ironed out, and its performance needs to be analyzed. 25
New Architectures and Paradigms
DoS Prevention DoS Resilient Architecture [http://www.cs.ucl.ac.uk/staff/M.Handley/papers/dos-arch.pdf] - Separate clients from servers - Only servers can be directly contacted - Clients can be contacted only if it allows this explicitly Research: - Other alternatives to implement such architecture? - How well can you do in the context of the current Internet? Note: can use DOA, i3 like architectures 27
Checkable Protocols Protocols that check correctness but do not guarantee it, e.g., - ECN-nonce [http://www.cs.ucsd.edu/ savage/papers/ICNP01.pdf] - Listen and Whisper [http://www.cs.berkeley.edu/ lakme/listenwhisper.pdf] - SV-CSFQ [http://citeseer.ist.psu.edu/stoica02selfverifying.html] Develop other applications, e.g., - Differentiated services: make differentiated service more robust to malicious/misconfigured ingress nodes 28
Annotation Layer Many protocols require nodes along path to exchange information: - IP traceback, quality of service, authentication Today’s solutions not good enough - Hijack bits in the packet header (e.g., fragment offset) - IP options: slow to process Proposed solution: - Insert annotation layer between network and transport Produce design and examples 29
Theory
CAP vs CAS The famous CAP theorem (easy to read) states that one cannot achieve: - Consistency - Availability - Ability to function while Partitioned Partitioning is no longer necessary What we really care about is C, A, and Scaling! Can we formulate and prove a CAS theorem? 31
Overlay Routing Assume - A network topology T - A routing algorithm running on top of T - You control a fraction f of nodes in T Question: - How well can you approximate an “arbitrary” routing metric as a function of f and topology T ? Example: - T uses # of hops to implement shortest path - You know delay distributions along links in T - How well can you approximate lowest latency routing metric assuming a power-law topology and f 10%? 32
Geographic Routing Consider a stationary ad hoc network Design a compact routing scheme (small routing tables) Require that this scheme have low incremental costs when nodes and links come/go Is geographic routing the only such scheme? 33
Identity-Based Architectures
“ID-based” Architectures Decouple the identity of an end-host/service from its address At transport level, sender sends packet to an ID, not an address Examples - Delegation Oriented Architecture (DOA) [ http://nms.lcs.mit.edu/doa/] - Host Identity Protocol (HIP) [http://www.ietf.org/html.charters/hip-charter.html] - Internet Indirection Infrastructure (i3) [http://i3.cs.berkeley.edu/] 35
“ID-based” Architectures (cont’d) Both the sender and receiver should be able to explicitly control the service-path Example: - Sender wants its packets to go through S1 before they reach destination - Receiver wants all packets to go through S2 before it receives them Service/Middlebox (S1) Service/Middlebox (S2) Internet Receiver Indirection point Sender sender control receiver control 36
Example realization in i3 and DOA ([idS1,idR], data) S2 S1 i3 ids1 S1 idR [idS2,R] idS2 S2 R Internet i3 Name resolution infrastructure (e.g., OpenDHT) idS1 idS1 S1 S1 S2 S1 DOA idR R idS2 ([idS1,idS2], data) idS2 S2 idR Internet R idS2 idR S2 R 37
Authentication and Encryption Resolution: name IDs - Use DNS to return receiver’s ID and eventually its public key (HIP) Slow update; as secure as DNS - Use an address book (i3) Secure but hard to maintain - OpenDHT Highly scalable; security needs to be addressed Authentication and encryption - Public key cryptography (HIP, i3) Research - Use Identity Based Encryption (IBE); other alternatives? - Study trade-offs between various techniques 38
Signaling Protocol for Middleboxes Design a signaling protocol to accommodate middleboxes/services Research issues: - Authentication of middleboxes - Transparent recovery: when one middlebox fails another equivalent middlebox can take over Challenge: recovery transparent to end-hosts at transport layer 39
ID based Transport Protocols Design a congestion control mechanism (e.g. TCP) such that it is possible to change the receiving machine in the middle of the transfer! Scenario: - A and B open a connection (A receiver; B source) - A changes to A’ - B continues to send data to A’ without creating a new connection Research: refactor transport such that - Congestion control state binds to address - Data transfer state binds to ID 40
Service Differentiation via Middleboxes Problem: flow isolation (UDP can kill TCP) TCP UDP Solution outline: Run RR Application level congestion control TCP UDP 41
Anonymous File Sharing IDs may be chosen such that not to reveal endhost identity - E.g., pick random IDs Sender doesn’t know the IP address of receiver You can simply use web to share files! Questions: - Anonymous search engine - Anonymity vs. performance 42
Event Notification System Users specify events in which they are interested as a conjunction of attributes, e.g., - (stock “msr”) and (share price 60) - (source “Berkeley”) and (destination “North Lake Tahoe”) and (time 3.5 hours) Research: create an efficient delivery tree - Users with the same interest grouped under the same tree - Users in the same geographic region grouped under the same tree 43
Anycast / Service Location 1) Direct a client to a nearby server/proxy Two alternatives: Unmodified client - 2) Make selection based of the DNS request (at the DNS server) similar to Akamai Modified client - Select a “good” server/proxy in the context of OpenDHT or i3 Consider both the proximity and load 44
Efficient Multi-Level Naming The LFN proposal requires several layers of names. Done naively, this requires many lookups for a single transaction Can one devise techniques, such as hints, writethrough, etc. to make this performance adequate? 45
Internet-Scale XML Dissemination Service [Due to Yanlei Diao – next 5 slides] YFilter: Specification of data interests using an XML query language Data Source Data Source Data Source user queries XML streams YFilter query results User queries: Specification of data interests using an XML query language Data sources: Continuously publish XML data items The service: Delivers to each user the XML data items that match her data interests. The results are presented in a format required by the user Applications: News feeds, stock tickers, sports tickers, mobile services, 46 online auction, network monitoring.
ONYX: Large-Scale XML Dissemination Data Source Data Source Data Source Broker Broker - A dedicated network - Peer-to-peer - Collaboration among administrative domains Broker YFilter Broker U1 U2 Broker Broker U3 ONYX: Operator Network using YFilter for XML Dissemination An overlay network of information brokers running YFilter Underlying infrastructures: U4 U5 47
Content-based Routing Data Source data flow Broker 1 Broker 2: Broker 2 Q4: query flow Broker 4 Broker 3 Broker 5 Broker 2: /article/[@edition.area “NY”] Broker 4: /article/[@edition.area “SF”] Broker 5: /article/subject[@type “Stock”] or /article/subject[@matter “fishing”] Broker 6: Broker 6 /nitf/series[@series.name “Tide Forecasts”] Q5: Q1: /article[@edition.area “SF”]/subject[@type “Stock”] /article[@edition.area “SF”] Q1: Q3: /article[@edition.area “SF”] Q3: /article[@edition.area “SF”] /subject[@type “Stock”] /series[@series.name “Tide Forecasts”] /series[@series.name “Tide Forecasts”] Q2: /article[@edition.area ”SF”] Q2: /article[@edition.area ”SF”]/subject[@matter “fishing”] /subject[@matter “fishing”] 48
Content-based Routing with I3? Use i3 as an alternative to content-based routing Basic approach: - On queries: Function Fq: {Qi} - {IDj} - On documents: Function Fd: {Di} - {IDj1, Idj2, , Idjn} Goal: an adaptive algorithm to balance false positives delivered and routing speed. Also, report strengths and weaknesses of using i3 49
Dissemination of Results When every user requires customized results (e.g., my Yahoo! in a push-based fashion), how do you deliver the results? Bandwidth: - Peak load 5000 documents per second, 1KB each - 1 million queries, query selectivity is 10%, result size reduction factor 10%. - Total result size 5000 * (10 6 * 10%) * (1000*8*10% ) 400Gb/s - MSDN RSS is facing a similar problem! Connectivity: consider mobile users Solutions: - Unicast vs. multicast? - Fragmenting and merging results for sharing? - Caching for disconnected results? Let Microsoft know when you solve the problem! 50
Next Step You can either choose one of the projects we discussed during this lecture, or come up with your own Pick your partner, and submit a one page proposal by February 7. The proposal needs to contain: - The problem you are solving - Your plan of attack with milestones and dates - Any special resources you may need 51