Date: Monday, March 29, 4:30pm
Speaker: Beg Gum
Topic(s): Estimating the Maximum Value of a Large Dataset by Sampling

Abstract: Sampling has long been used in statistics to estimate properties of large datasets, such as the mean and variance. Recently, sampling has emerged as a powerful technique in computer science as well. Sampling-based randomized algorithms have yielded faster, simpler algorithms for well-known problems in P, such as Minimum Spanning Tree, and randomization has been used to obtain PTASs (Polynomial Time Approximation Schemes) for NP-hard problems. In this work, we apply sampling to give a non-trivial estimate of the maximum value of a large dataset. Instead of assuming a normal or other standard distribution, as is common in statistical work, we allow an adversary to select a worst-case dataset. We give a simple algorithm that produces an estimate of the maximum from the sample, and we call the estimate a success if it does not exceed the actual maximum of the dataset. We show that our algorithm succeeds with probability at least 1 - (1/e + 1/k^2) on any dataset, where k is the size of the sample. We also show that this bound is essentially tight by exhibiting a dataset on which our algorithm fails with probability 1/e.
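
The abstract does not specify the estimation rule, so the following is a purely illustrative sketch, not the speaker's algorithm: it draws a uniform sample of size k, extrapolates past the sample maximum by the gap between the two largest sampled values (an assumed rule), and empirically measures the success probability in the abstract's sense (estimate <= true maximum). The names estimate_max and success_rate are hypothetical.

    import random

    def estimate_max(data, k, rng):
        """One plausible sampling-based estimator (illustrative only; the
        abstract does not state the actual rule): extrapolate past the
        sample maximum by the gap between the two largest sampled values."""
        sample = sorted(rng.sample(data, k))
        m1, m2 = sample[-1], sample[-2]  # two largest sampled values
        return m1 + (m1 - m2)            # a non-trivial estimate: exceeds the sample max

    def success_rate(data, k, trials=10_000, seed=0):
        """Empirical probability that the estimate does not exceed the true
        maximum -- the notion of 'success' used in the abstract."""
        rng = random.Random(seed)
        true_max = max(data)
        wins = sum(estimate_max(data, k, rng) <= true_max for _ in range(trials))
        return wins / trials

    if __name__ == "__main__":
        # A skewed, adversarial-style dataset where extrapolation can overshoot.
        data = [2 ** i for i in range(30)]
        print(f"empirical success rate: {success_rate(data, k=10):.3f}")

The sketch exhibits the core tension the talk analyzes: to be non-trivial the estimate must exceed the sample maximum, but any such extrapolation risks overshooting the true maximum on a worst-case dataset, so the success probability must be argued for against an adversarial choice of data.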