Date: Monday March 29, 4:30pm Speaker: Beg Gum Topic(s): Estimating the Maximum Value of a Large Dataset by Sampling. Abstract: Sampling has long been used in the field of Statistics as a method for estimating properties of large data sets, such as mean and variance. Recently sampling has emerged as a powerful technique in computer science. Randomized algorithms using sampling have been used to find faster, simpler algorithms for well-known problems in P, such as Minimal Spanning Tree. In addition, PTASs (Polynomial Time Approximation Schemes) have been found using randomization for NP-hard problems. In this result, we apply the technique of sampling to give a non-trivial estimate of the maximum value of a large dataset. Instead of assuming a Normal Distribution or other common distribution as is common in statistical work, we allow the adversary to select a worst-case dataset. We give a simple algorithm which gives an estimate of the maximum based on the sample. We consider our estimate a success, if our estimate does not exceed the actual maximum of the dataset. We show that our algorithm will give a successful estimate with probability at least 1/e + 1/k^2 for any dataset, where k is the size of our sample. We also show that this bound is tight by exhibiting a dataset for which our algorithm fails with probability 1/e.