Web Search Using Automated Classification

by Chandra Chekuri, Michael H. Goldwasser, Prabhakar Raghavan, and Eli Upfal

Abstract: We study the automatic classification of Web documents into pre-specified categories, with the objective of increasing the precision of Web search. We describe experiments in which we classify documents into high-level categories of the Yahoo! taxonomy, and we present a simple search architecture and implementation that uses this classification. Validating our classification experiments yields insights into both the power of such automatic classification and the nature of Web content. Our research indicates that Web classification and search tools must compensate for artifices, such as Web spamming, that have arisen from the very existence of such tools.
Keywords: Automatic classification, Web search tools, Web spamming, Yahoo! categories.
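The abstract does not say which classifier the authors use, so the following is only a hedged illustration of the general idea: once each document carries a category label, a search can be restricted to one category and thereby drop off-topic matches. The sketch uses a swapped-in technique (cosine similarity between bag-of-words vectors and per-category centroids) on tiny hypothetical data; the function names, category names, and documents are all hypothetical and are not the authors' system.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_vector(text):
    """Bag-of-words term-frequency vector for a document."""
    return Counter(tokenize(text))

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_centroids(training):
    """Sum the term vectors of each category's labeled example documents."""
    centroids = {}
    for category, docs in training.items():
        total = Counter()
        for doc in docs:
            total.update(term_vector(doc))
        centroids[category] = total
    return centroids

def classify(text, centroids):
    """Assign the category whose centroid is most similar to the document."""
    vec = term_vector(text)
    return max(centroids, key=lambda cat: cosine(vec, centroids[cat]))

def search(query, index, category=None):
    """Return ids of documents containing every query term,
    optionally restricted to a single category."""
    terms = set(tokenize(query))
    hits = []
    for doc_id, (text, doc_category) in index.items():
        if category is not None and doc_category != category:
            continue
        if terms <= set(tokenize(text)):
            hits.append(doc_id)
    return hits

# Hypothetical labeled examples, loosely named after top-level Yahoo! categories.
training = {
    "Computers": ["java programming language compiler software",
                  "operating system kernel software"],
    "Recreation": ["coffee tasting beans roast",
                   "island travel beach vacation"],
}
centroids = build_centroids(training)

# Hypothetical crawled documents; each is classified once at indexing time.
docs = {
    "d1": "java software development kit",
    "d2": "java island travel guide",
}
index = {doc_id: (text, classify(text, centroids)) for doc_id, text in docs.items()}

print(search("java", index))               # ['d1', 'd2']
print(search("java", index, "Computers"))  # ['d1']
```

The ambiguous query "java" matches both the software page and the travel page; restricting it to the hypothetical "Computers" category keeps only the software page, which is the kind of precision gain the abstract targets. A real system would need a far stronger classifier, and, as the abstract notes, one robust to Web spamming.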
