About Me
Joshua Silver
4th year CS major – graduating in May
Specialization: Databases
Interests: The business side of computing (and no, not IT)
- How can companies use technology to improve and enable their business?
- Think Enterprise Web 2.0, mobile strategies, viral promotion on the internet, the Netflix recommendation engine, e-commerce, etc.
- Startups!
Sleepers & Workaholics
Caching Strategies in Mobile Computing
Authors: Dr. Daniel Barbará and Dr. Tomasz Imielinski
Presented by: Joshua Silver, Fall 2008
Sleepers & Workaholics
Caching Strategies in Mobile Computing
Dr. Daniel Barbará
- Professor at George Mason University
- Several patents associated with mobile caching
Dr. Tomasz Imielinski
- Professor at Rutgers University
- Senior VP: Search Technology at Ask.com
The Big Picture
Problem:
- Wireless devices have limited bandwidth, limited storage, and limited battery life
- To save power, devices go offline
- Mobile devices appear randomly in new cells
- This makes data caching difficult, since the server can’t track client caches
Then and now
- Paper written in 1994
- Device, bandwidth, and battery limitations are different today
- The essential problem still exists
With an explosion of wireless devices, the problem is even greater
- 24 million in 1994
- 240 million in 2008
- That doesn’t even take into account proprietary handheld units (like UPS driver delivery computers, Amazon Kindles, grocery store handheld scanners, etc.)
Source: CTIA - The Wireless Association. http://www.infoplease.com/ipa/A0933563.html
Why Caching is Important
Conserve:
1. Computational resources
2. Battery life
3. Network bandwidth
Can’t store the entire dataset on a handheld:
- US maps on a GPS unit
- Delivery routes for UPS drivers
- Contact list on a Blackberry
Traditional Strategies Fail
In a traditional client-server model, the server:
- keeps track of client caches
- pushes only the changes / sends cache invalidation messages
BUT... the server lacks knowledge of:
- Which units are in its cell
- Which units are powered ON
Quintessential problem: Client caches in a mobile environment cannot be tracked by a server
The Solution
Purpose: "to propose a taxonomy of different cache invalidation strategies and study the impact of clients' disconnection times on their performance."
Sleepers & Workaholics proposes a few solutions and evaluates their effectiveness with mathematical rigor
Evaluation Criteria
Complicated math! The paper’s appendices have details.
Essentially:
- Define two types of Mobile Units
  - Sleepers (offline/off all the time)
  - Workaholics (never go offline)
- Almost all real world devices fall in between
How do you compare?
- Normalize by defining “hit ratio”, since it affects overall throughput
- Hit ratio (H) = valid cache hits / total data size
Strategies to Evaluate
Proposed Strategies:
- Timestamps (TS)
- Amnesic Terminals (AT) (only remembering part, like amnesia)
- Signatures (SIG)
Control Strategy:
- No Cache (NC)
Timestamps
- Each cache entry has a timestamp
- Synchronous, history based, uncompressed in nature
SERVER:
- Communicates with clients every n seconds (and retries until successfully connected)
- Sends a list of changed items and their associated timestamps (to account for potential delay in transmission)
CLIENT:
- For each item in cache:
  - If the entry is in the report received from the server, purge it from the cache
  - If NOT in the report, simply update its timestamp to the current time
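The client rule on this slide can be sketched in a few lines of Python. This is only an illustration of the slide's simplified description, not the paper's exact algorithm; the names `CacheEntry`, `TimestampClient`, and `apply_report`, and the shape of the report (a dict from item id to update timestamp), are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CacheEntry:
    value: str
    timestamp: float  # last time the client knew this value to be valid


@dataclass
class TimestampClient:
    # Hypothetical client-side cache: item id -> cached entry.
    cache: Dict[str, CacheEntry] = field(default_factory=dict)

    def apply_report(self, report: Dict[str, float], report_time: float) -> None:
        """Apply one invalidation report as described on the slide.

        `report` maps item id -> timestamp of its last update on the server
        (assumed format). `report_time` is when the report was received.
        """
        for item_id in list(self.cache):
            if item_id in report:
                # The server listed this item as changed: purge it.
                del self.cache[item_id]
            else:
                # Not listed: the cached copy is still valid as of this report.
                self.cache[item_id].timestamp = report_time


client = TimestampClient()
client.cache["route-17"] = CacheEntry("Main St", timestamp=0.0)
client.cache["route-42"] = CacheEntry("Oak Ave", timestamp=0.0)
client.apply_report({"route-17": 5.0}, report_time=10.0)
print(sorted(client.cache))  # only the unreported item survives
```

The timestamps carried in the report are not used by this simplified rule; a fuller version would presumably compare them against the cached timestamps before purging.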
Amnesic Terminals
- Each cache entry has an identifier
- ALSO synchronous, history based, uncompressed in nature
SERVER:
- Notifies clients of the identifiers of items changed since the last invalidation report
CLIENT:
- For each item in cache:
  - If in report, purge from cache
  - If NOT in report, do nothing
- ALSO, if enough time has elapsed, drop the WHOLE cache and rebuild completely
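A minimal Python sketch of the Amnesic Terminals client follows. The slide does not say how much elapsed time is "enough", so `max_gap` is a hypothetical threshold, and the class and method names are likewise assumptions.

```python
from typing import Dict, Iterable, Optional


class AmnesicClient:
    """Illustrative client for the Amnesic Terminals (AT) strategy."""

    def __init__(self, max_gap: float) -> None:
        self.cache: Dict[str, str] = {}
        # Hypothetical threshold: a longer gap between reports means the
        # client may have missed a report and can no longer trust its cache.
        self.max_gap = max_gap
        self.last_report_time: Optional[float] = None

    def apply_report(self, changed_ids: Iterable[str], report_time: float) -> None:
        too_long = (
            self.last_report_time is not None
            and report_time - self.last_report_time > self.max_gap
        )
        if too_long:
            # Enough time has elapsed: drop the WHOLE cache and rebuild later.
            self.cache.clear()
        else:
            for item_id in changed_ids:
                # In report: purge. Items not in the report are left alone.
                self.cache.pop(item_id, None)
        self.last_report_time = report_time


client = AmnesicClient(max_gap=10.0)
client.cache = {"a": "1", "b": "2", "c": "3"}
client.apply_report({"a"}, report_time=0.0)   # purges only "a"
client.apply_report({"b"}, report_time=30.0)  # gap too long: cache dropped
print(client.cache)
```

Because the report carries only identifiers, the client keeps no per-item history, which is the sense in which the terminal is "amnesic".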
Signatures
- Checksums calculated over the value of data to form a signature
- Since the mobile unit does not have the entire database, an algorithm is needed to compute a partial checksum (see the paper's appendix)
- Signatures combined using XOR
- Synchronous, state based, compressed reports
SERVER:
- Broadcasts the set of combined signatures
CLIENT:
- An item in cache is declared invalid if it belongs to “too many” unmatching signatures (suspected of being out of date)
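The idea of XOR-combined signatures can be illustrated with a rough Python sketch. It is not the paper's algorithm: the subsets are given explicitly rather than generated as in the appendix, SHA-256 stands in for whatever checksum the authors use, and the 0.75 threshold for "too many" is an arbitrary assumption. The client here compares the combined signatures it remembered from the previous broadcast with the new ones.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def item_signature(value: str) -> int:
    """Per-item checksum (SHA-256 truncated to 64 bits, an assumed choice)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def combined_signatures(db: Dict[str, str], subsets: List[Set[str]]) -> List[int]:
    """Server side: XOR the item signatures of each subset together."""
    result = []
    for subset in subsets:
        acc = 0
        for item_id in subset:
            acc ^= item_signature(db[item_id])
        result.append(acc)
    return result


def suspected_items(cached_ids: Iterable[str], subsets: List[Set[str]],
                    old_sigs: List[int], new_sigs: List[int],
                    threshold: float) -> Set[str]:
    """Client side: flag items that sit in "too many" unmatching signatures."""
    mismatched = {k for k, (old, new) in enumerate(zip(old_sigs, new_sigs))
                  if old != new}
    suspects = set()
    for item_id in cached_ids:
        member_of = [k for k, s in enumerate(subsets) if item_id in s]
        if not member_of:
            continue  # item is covered by no subset, so nothing can be inferred
        bad = sum(1 for k in member_of if k in mismatched)
        if bad / len(member_of) > threshold:
            suspects.add(item_id)
    return suspects


subsets = [{"a", "b"}, {"a", "c"}, {"b", "d"}, {"c", "d"}]
old_db = {"a": "1", "b": "2", "c": "3", "d": "4"}
new_db = dict(old_db, a="9")  # only item "a" changes on the server
old = combined_signatures(old_db, subsets)
new = combined_signatures(new_db, subsets)
print(suspected_items(["a", "b", "c"], subsets, old, new, threshold=0.75))
```

With a lower threshold, items "b" and "c" would also be flagged, because each shares one subset with the changed item. That is the price of compressed reports: unchanged items can be invalidated by mistake.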
No Cache
- There is no cache
SERVER:
- Responds to direct queries from the client with the appropriate information
CLIENT:
- Queries the database directly any time an item is needed
Conclusions on Effectiveness
Strategy depends on circumstances:
- Signatures is best for long sleepers, when the disconnection period is long and difficult to predict
- Timestamps is best for query-intensive scenarios, when the rate of queries is greater than the rate of updates, provided that units are not workaholics
- Amnesic Terminals is best for workaholics, units that are awake most of the time
Still not satisfied... how can we improve effectiveness?
Only 2 options:
1. Update less often
or
2. Send less info
Relax the Consistency of the Cache
- Depending on data type, data may not need to be exact (e.g. stocks, weather, etc.)
- Allow cached values to vary within a set tolerance (like .05% for stock prices, weather reports outdated by up to 2 hours, etc.)
- Makes shorter invalidation reports possible
How Do We Decide to Update?
- Consider cached copies to be quasi-copies
- Each quasi-copy has a coherency condition attached to it
Coherency Conditions:
- Delay Condition: updated based on time
- Arithmetic Condition: updated based on the difference between the data and the quasi-copy
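The two coherency conditions can be sketched as simple predicates. This is an illustrative Python example only: the `QuasiCopy` type and function names are assumptions, the arithmetic condition is interpreted here as a relative difference, and the tolerances reuse the .05% and 2-hour figures from the previous slide.

```python
from dataclasses import dataclass


@dataclass
class QuasiCopy:
    value: float      # cached value
    cached_at: float  # time the copy was taken, in seconds


def violates_delay(copy: QuasiCopy, now: float, max_age: float) -> bool:
    """Delay condition: the copy must be refreshed once it is too old."""
    return now - copy.cached_at > max_age


def violates_arithmetic(copy: QuasiCopy, current: float, max_rel_diff: float) -> bool:
    """Arithmetic condition: refresh once the real value drifts too far."""
    if copy.value == 0:
        return current != 0
    return abs(current - copy.value) / abs(copy.value) > max_rel_diff


stock = QuasiCopy(value=100.00, cached_at=0.0)
print(violates_arithmetic(stock, current=100.04, max_rel_diff=0.0005))  # within .05%
print(violates_arithmetic(stock, current=100.10, max_rel_diff=0.0005))  # beyond .05%

weather = QuasiCopy(value=21.5, cached_at=0.0)
print(violates_delay(weather, now=3 * 3600, max_age=2 * 3600))  # older than 2 hours
```

A server applying these checks would only need to report an item when its condition is violated, which is how relaxed consistency shortens invalidation reports.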
Criticism
- The paper's view of which resources are most scarce is no longer really accurate (e.g. bandwidth is better than predicted, battery life is longer)
- Units are rarely powered down
  - Battery life is better than predicted
  - Battery life alone does not dictate use patterns; reception does too
- Units still lose reception frequently
  - This is today’s most common “sleeper” condition, yet it is explicitly excluded from the definition in S&W