Machine Learning

Forms of Machine Learning


Supervised Learning

Methodology: Training set vs. Test set
We do not wish to simply memorize the data set (or "overfit" it). The presumption is that there is a large domain of inputs, that a small, representative subset of that domain is given as training data, and that evaluation will be performed on a different test set drawn from the same distribution.

Input
The input can be any type of data. One common form is to assume that it is made of many distinct features, and treat it as a point in a high-dimensional metric space.

Output
When the output is continuous, the learning problem is typically called regression. Classic "curve fitting" is an example: we have a discrete set of input points mapping real x to real y, and we want to find a function y = f(x) that closely approximates the data.
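As a small illustration of regression as curve fitting, the following sketch fits a line to a handful of points using NumPy's least-squares polynomial fit (the data points here are made up for demonstration):

```python
import numpy as np

# Made-up training data: points lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

coeffs = np.polyfit(x, y, deg=1)   # least-squares fit of a degree-1 polynomial
f = np.poly1d(coeffs)              # the learned hypothesis y = f(x)
```

The same call with deg=3 would search the hypothesis space of cubic functions instead.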

When the output is discrete, the learning problem is called classification; character recognition is a typical example. A special case is Boolean classification (e.g., spam or not spam).

Hypothesis
The desired "function" is called a hypothesis. Sometimes we will have a very specific hypothesis space that we are considering (e.g., finding a cubic function that matches data).

Sometimes we are simply looking for a "consistent" hypothesis, that is, one that matches the given data. If there are multiple consistent hypotheses, the principle of Ockham's razor says that we should prefer the simplest of them.

If we assume that the data is probabilistic, then we often want to choose a hypothesis h* that is most probable given the data. That is

h* = argmax_{h ∈ H} P(h | data).
By Bayes' rule (and because P(data) does not depend on h), this is equivalent to
h* = argmax_{h ∈ H} P(data | h) · P(h).
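As a toy illustration of choosing the most probable hypothesis, consider two hypothetical hypotheses about a coin (fair vs. biased toward heads) with made-up priors, and score each by P(data | h) · P(h):

```python
# Hypothetical hypothesis space: coin is fair (P(heads)=0.5) or biased (P(heads)=0.9).
priors = {"fair": 0.8, "biased": 0.2}      # made-up prior P(h)
heads_prob = {"fair": 0.5, "biased": 0.9}  # P(heads | h)

data = "HHHH"  # observed flips: four heads

def posterior_score(h):
    # P(data | h) * P(h), which is proportional to P(h | data)
    likelihood = 1.0
    for flip in data:
        likelihood *= heads_prob[h] if flip == "H" else 1 - heads_prob[h]
    return likelihood * priors[h]

h_star = max(priors, key=posterior_score)
```

Despite the fair coin's higher prior, four heads in a row makes "biased" the more probable hypothesis under these numbers.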

Linear Classifiers
Given a Boolean classification problem with multidimensional data, one of the simplest hypothesis spaces is that of linear separators.

Perceptron Algorithm (Rosenblatt, 1959)
PerceptronLearning[M+, M-]
w = arbitrary vector of real numbers
Repeat
     For all x ∈ M+
         If w·x ≤ 0 Then w = w + x
     For all x ∈ M-
         If w·x > 0 Then w = w - x
Until all x ∈ M+ ∪ M- are correctly classified

Theorem (Novikoff, 1962)
If the data set is linearly separable, the algorithm is guaranteed to converge in a finite number of steps.
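The pseudocode above can be sketched in Python with NumPy; this is a rough sketch for the lab, where the function name and the cap on the number of epochs (to guard against non-separable data) are illustrative choices, not part of the original algorithm:

```python
import numpy as np

def perceptron_learning(M_plus, M_minus, max_epochs=1000):
    """Return a weight vector w with w·x > 0 for x in M_plus and
    w·x <= 0 for x in M_minus, when the data are linearly separable."""
    w = np.zeros(len(M_plus[0]))   # an arbitrary starting vector
    for _ in range(max_epochs):
        converged = True
        for x in M_plus:           # positive examples: want w·x > 0
            if np.dot(w, x) <= 0:
                w = w + x
                converged = False
        for x in M_minus:          # negative examples: want w·x <= 0
            if np.dot(w, x) > 0:
                w = w - x
                converged = False
        if converged:              # a full pass with no mistakes
            return w
    return w  # may not separate the data if no separator exists
```

For example, with M+ = {(2, 1), (1, 2)} and M- = {(-1, -1), (-2, 0)} the loop terminates after a few updates, and by Novikoff's theorem it always terminates when a linear separator exists.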

Hands-on lab: implementation of the algorithm, and experimentation

Experiment


Michael Goldwasser
Last modified: Thursday, 31 October 2013