Machine Learning

Forms of Machine Learning


Supervised Learning

Methodology: Training set vs. Test set
We do not wish to simply memorize the data set (or "overfit" it). The presumption is that there is a large domain of inputs, that a small, representative subset of that domain is given as training data, and that evaluation will be performed on a different test set drawn from the same distribution.

Input
The input can be any type of data. One common form is to assume that it is made of many distinct features, and treat it as a point in a high-dimensional metric space.

Output
When the output is continuous, the learning problem is typically called regression. Classic "curve fitting" is an example: we have a discrete set of input points mapping real x to real y, and we want to find a function y = f(x) that closely approximates the data.
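As a small illustration of regression as curve fitting, the following sketch fits a line to a handful of points using NumPy's least-squares polynomial fit (the data points here are made up for demonstration):

```python
import numpy as np

# Made-up training data: points lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

coeffs = np.polyfit(x, y, deg=1)   # least-squares fit of a degree-1 polynomial
f = np.poly1d(coeffs)              # the learned hypothesis y = f(x)
```

The same call with deg=3 would search the hypothesis space of cubic functions instead.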

When the output is discrete, the learning problem is called classification; character recognition is a typical example. A special case is Boolean classification (e.g., spam or not spam).

Hypothesis
The desired "function" is called a hypothesis. Sometimes we will have a very specific hypothesis space that we are considering (e.g., finding a cubic function that matches data).

Sometimes we are simply looking for a "consistent" hypothesis, that is, one that matches the given data. If there are multiple consistent hypotheses, the principle of Ockham's razor says that we should prefer the simplest of them.

If we assume that the data is probabilistic, then we often want to choose a hypothesis h* that is most probable given the data. That is

h* = argmax_{h ∈ H} P(h | data).
By Bayes' rule (and because P(data) does not depend on h), this is equivalent to
h* = argmax_{h ∈ H} P(data | h) · P(h).
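As a toy illustration of choosing the most probable hypothesis, consider two hypothetical hypotheses about a coin (fair vs. biased toward heads) with made-up priors, and score each by P(data | h) · P(h):

```python
# Hypothetical hypothesis space: coin is fair (P(heads)=0.5) or biased (P(heads)=0.9).
priors = {"fair": 0.8, "biased": 0.2}      # made-up prior P(h)
heads_prob = {"fair": 0.5, "biased": 0.9}  # P(heads | h)

data = "HHHH"  # observed flips: four heads

def posterior_score(h):
    # P(data | h) * P(h), which is proportional to P(h | data)
    likelihood = 1.0
    for flip in data:
        likelihood *= heads_prob[h] if flip == "H" else 1 - heads_prob[h]
    return likelihood * priors[h]

h_star = max(priors, key=posterior_score)
```

Despite the fair coin's higher prior, four heads in a row makes "biased" the more probable hypothesis under these numbers.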

Linear Classifiers
Given a Boolean classification problem with multidimensional data, one of the simplest hypothesis spaces is that of linear separators.

Perceptron Algorithm (Rosenblatt, 1959)
PerceptronLearning[M+, M-]
w = arbitrary vector of real numbers
Repeat
     For all x ∈ M+
         If w·x ≤ 0 Then w = w + x
     For all x ∈ M-
         If w·x > 0 Then w = w - x
Until all x ∈ M+ ∪ M- are correctly classified

Theorem (Novikoff, 1962)
If the data set is linearly separable, the algorithm is guaranteed to converge in a finite number of steps.
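The pseudocode above can be sketched in Python with NumPy; this is a rough sketch for the lab, where the function name and the cap on the number of epochs (to guard against non-separable data) are illustrative choices, not part of the original algorithm:

```python
import numpy as np

def perceptron_learning(M_plus, M_minus, max_epochs=1000):
    """Return a weight vector w with w·x > 0 for x in M_plus and
    w·x <= 0 for x in M_minus, when the data are linearly separable."""
    w = np.zeros(len(M_plus[0]))   # an arbitrary starting vector
    for _ in range(max_epochs):
        converged = True
        for x in M_plus:           # positive examples: want w·x > 0
            if np.dot(w, x) <= 0:
                w = w + x
                converged = False
        for x in M_minus:          # negative examples: want w·x <= 0
            if np.dot(w, x) > 0:
                w = w - x
                converged = False
        if converged:              # a full pass with no mistakes
            return w
    return w  # may not separate the data if no separator exists
```

For example, with M+ = {(2, 1), (1, 2)} and M- = {(-1, -1), (-2, 0)} the loop terminates after a few updates, and by Novikoff's theorem it always terminates when a linear separator exists.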

Hands-on lab: implementation of the algorithm, and experimentation

Experiment


Michael Goldwasser
Last modified: Thursday, 31 October 2013