Testing Collections of Properties Reut Levi Dana Ron Ronitt Rubinfeld
14 Slides463.00 KB
Testing Collections of Properties Reut Levi Dana Ron Ronitt Rubinfeld ICS 2011
Shopping distribution What properties do your distributions have?
Testing closeness of two distributions: Transactions in California Transactions in New York trend change?
Testing Independence: Shopping patterns: Independent of zip code?
This work: Many distributions
One distribution: D is arbitrary black-box D distribution over [n], generates iid samples. samples Sample complexity in terms Test Pass/Fail? of n? (can it be sublinear?)
Some answers Uniformity (n1/2) [Goldreich, Ron 00] [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01] [Paninski 08] Identity (n1/2) [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01] Closeness (n2/3) [Batu, Fortnow, Rubinfeld, Smith, White], [Valiant 08] Independence O(n12/3 n21/3), (n12/3 n21/3) [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01] , this work Entropy n1/β 2 o(1) [Batu, Dasgupta, Kumar, Rubinfeld 05], [Valiant 08] Support Size (n/logn) [Raskhodnikova, Ron, Shpilka, Smith 09], [Valiant, Valiant 10] Monotonicity on total order (n1/2) [Batu, Kumar, Rubinfeld 04] Monotonicity on poset n1-o(1) [Bhattacharyya, Fischer, Rubinfeld, Valiant 10]
Collection of distributions: Further refinement: Known or unknown distribution on i’s? D1 D2 Dm samples Two models: Sampling model: Get (i,x) for random i, x Di Query model: Get (i,x) for query i and x Di Test Pass/Fail? Sample complexity in terms of n,m?
Properties considered: Equivalence All distributions are equal Clusterability’’ Distributions can be clustered into k clusters such that within a cluster, all distributions are close
Equivalence vs. independence Process of drawing pairs: Draw i [m], x Di output (i,x) Easy fact: (i,x) independent iff Di‘s are equal
Results Also yields “tight” lower bound for independence testing Def: (D1, Dm) has the Equivalence property if Di Di' for all 1 i, i’ m. Lower Bound Upper Bound n m (n2/3m1/3) Unknown Weights Õ(n2/3m1/3) m n (n1/2m1/2) Õ(n1/2m1/2) Known Weights
Clusterability Can we cluster distributions s.t. in each cluster, distributions (very) close? Sample complexity of test is O(kn2/3) for n domain size, k number of clusters No dependence on number of distributions Closeness requirement is very stringent
Open Questions Clusterability in the sampling model, less stringent notion of close Other properties of collections? E.g., all distributions are shifts of each other?
Thank you