DETERMINISTIC OPTIMIZATION

			  Conjugate Gradient

The Conjugate Gradient algorithm is probably the best deterministic
algorithm for high-dimensional problems.

The following is very popular book.  *Never* use any code from it!!
***NOT EVEN ONCE***!!!!  However, the text is not too bad, although it
does contain the occasional blooper and a great deal of omitted
material.  The chapter on ordinary differential equations is
particularly outdated, wedged in the '60s.

@Book		{PRESS-ETAL88A,
author	=	"William H. Press and Brian P. Flannery and Saul A. Teukolsky
		  and William T. Verrerling",
title	=	"Numerical Recipes in {C}",
publisher=	CAMBRIDGE,
year	=	1988
}

Here is a seminal reference.

@book		{JEA82,
author	=	"Kang Chang Jea",
title	=	"Generalized conjugate gradient acceleration of
		 iterative methods",
publisher=	Uof#"Texas at Austin, Center for Numerical Analysis",
year	=	1982,
note	=	"PhD thesis",
}

And here is a *great* paper, which really explains everything, and is
accessible even if you didn't finish an advanced numeric analysis
course just last week.

@TechReport	{SHEWCHUK94A,
author	=	"Jonathan Richard Shewchuk",
title	=	"An Introduction to the Conjugate Gradient Method Without the
		  Agonizing Pain",
institution=	CMU#" School of Computer Science",
year	=	1994,
note	=	"http://www.cs.cmu.edu/afs/cs/project/quake/public/papers/painless-conjugate-gradient.ps
		 or
		 ftp://warp.cs.cmu.edu/quake-papers/painless-conjugate-gradient.ps",
annote	=	"For figures suitable for transparencies and lectures, see
		  ftp://warp.cs.cmu.edu/quake-papers/painless-conjugate-gradient-pics.ps"
}


			   Newton's Method

For full second-order methods, variants of Levenberg-Marquat remain
state of the art.

Here are seminal refs.

@Article	{LEVENBERG44A,
author	=	"K. Levenberg",
title	=	"A method for the solution of certain problems in least
		  squares",
journal	=	"Quart. Appl. Math",
year	=	1944,
volume	=	2,
pages	=	"164--168"
}

@Article	{MARQUARDT63A,
author	=	"D. Marquardt",
title	=	"An algorithm for least-squares estimation of nonlinear
		  parameters",
journal	=	"SIAM J. Appl. Math",
year	=	1963,
volume	=	11,
pages	=	"431--441"
}

You wouldn't actually want to read those.  Instead look it up in any
numerical analysis textbook.  Eg.

@InBook		{MORE77A,
author	=	"J. J. More'",
title	=	"The {L}evenberg-{M}arquardt algorithm: implementation and
		  theory",
pages	=	"105--116",
booktitle=	"Numerical Analysis",
editor	=	"G. A. Watson",
series	=	"Lecture Notes in Mathematics 630",
year	=	1977,
publisher=	SPRINGER,
}

And here is a paper that gives an efficient algorithm for calculating
the diagonal of the Hessian of a determinist gradient system in O(n)
time, and proposes an optimization algorithm that makes use of this
information.

@inproceedings	{BECKER-LECUN88,
crossref=	"CMSS88",
author	=	"Sue Becker and Yann LeCun",
title	=	"Improving the Convergence of Back-Propagation Learning with
		  Second Order Methods",
pages	=	"29--37",
note	=	"Also published as Technical Report CRG-TR-88-5, "#Dof#CS#
		 ", "#Uof#"Toronto",
}


		       STOCHASTIC OPTIMIZATION

Fundamental results.

@article	{ROBINS-MONRO51,
author	=	"H. Robbins and S. Monro",
title	=	"A Stochastic Approximation Method",
journal	=	"Annals of Mathematical Statistics",
volume	=	22,
pages	=	"400--407",
year	=	1951,
}

(Russian mathematicians refer to some Russian guy who published the
same thing twenty years earlier in Russian.  I don't have the ref
handy.  They say the same thing about the FFT though.  Who knows...)

Theoretical demolishing of momentum.

@inproceedings	{SHYNK-ROY88,
author	=	"J. J. Shynk and S. Roy",
title	=	"The {LMS} algorithm with momentum updating",
booktitle=	"Proceedings of the IEEE International Symposium on
		 Circuits and Systems",
location=	"Helsinki, Finland",
month	=	jun#"~6--9",
year	=	1988,
pages	=	"2651--2654",
}

@article	{TUGAY-TANIK89,
author	=	"Mehmet Ali Tu\v{g}ay and Yal\c{c}in Tanik",
title	=	"Properties of the momentum {LMS} algorithm",
journal	=	"Signal Processing",
publisher=	"Elsevier Science Publishers B. V.",
volume	=	18,
number	=	2,
year	=	1989,
month	=	oct,
pages	=	"117--127",
}

@article	{SHYNK-ROY90,
author	=	"John J. Shynk and Sumit Roy",
title	=	"Convergence properties and stationary points of a Perceptron
		 learning algorithm",
journal	=	"Proceedings of the IEEE",
volume	=	78,
number	=	10,
pages	=	"1599--1604",
year	=	1990,
month	=	oct,
abstract= "If a Gaussian random vector is input into a Perceptron, the
           algorithm's stationary points are not unique and the step size mu
           and the momentum constant alpha determine the algorithm's
           behavior near convergence. A Perceptron is an adaptive linear
           neuron that produces one of two discrete values and is used as
           the basis for multilayer, feedforward neural networks. The
           least-mean-square adaptive algorithm is used to train a
           single-layer Perceptron by adjusting internal weights.",
}

A new analytical approach which simplifies the math amazingly, and
immediately extends many of the traditional results.

@Article	{SHARMA-ETAL96A,
author	=	"Rajesh Sharma and William A. Sethares and James A. Bucklew",
title	=	"Asymptotic analysis of stochastic gradient-based adaptive
		  filtering algorithms with general cost functions",
journal	=	ieeetrSP,
year	=	1996,
volume	=	44,
number	=	9,
pages	=	"2186--2194",
}

Here is some modern stochastic gradient stuff.  These resurect
momentum, using a more fine-grained analysis.

@InProceedings	{LEEN-ORR94A,
author	=	"Todd K. Leen and Genevieve B. Orr",
title	=	"Momentum and Optimal Stochastic Search",
crossref=	"NIPS6",
url	=	"ftp://speech.cse.ogi.edu/pub/neural/papers/leenOrr93.AdaptMom.ps.Z"
}

@InProceedings	{ORR-LEEN93A,
author	=	"G. B. Orr and T. K. Leen",
title	=	"Weight Space Probability Densities in Stochastic Learning:
		  II. Transients and Basin Hopping Times",
crossref=	"NIPS5",
url	=	"ftp://speech.cse.ogi.edu/pub/neural/papers/leenOrr92.stochastic.ps.Z"
}

@PhdThesis	{ORR95A,
author	=	"Genevieve B. Orr",
title	=	"Dynamics and Algorithms for Stochastic Search",
school	=	"Oregon Graduate Institute",
year	=	1995,
address	=	"Portland, OR",
url	=	"ftp://neural.cse.ogi.edu/pub/neural/papers/orrPhDch1-5.ps.Z,
		 ftp://neural.cse.ogi.edu/pub/neural/papers/orrPhDch6-9.ps.Z"
}

And here is a paper that gives an efficient algorithm for quickly
finding the product of the Hessian with an arbitrary vector.  It is
used a bit in the above.

@Article	{PEARLMUTTER94A,
author	=	"Barak A. Pearlmutter",
title	=	"Fast Exact Multiplication by the {H}essian",
journal	=	NeuComp,
year	=	1994,
volume	=	6,
number	=	1,
pages	=	"147--160",
FTPnote	=	"Ftp archive.cis.ohio-state.edu:
		  /pub/neuroprose/pearlmutter.hessian.ps.Z"
}

And here are some more papers that uses the above algorithm for
optimization.  Sorry for the overkill...

@Article	{MOLLER93A,
author	=	"Martin M{\o}ller",
title	=	"A Scaled Conjugate Gradient Algorithm for Fast Supervised
		  Learning",
journal	=	"Neural Networks",
year	=	1993,
month	=	jun,
volume	=	6,
number	=	4,
pages	=	"525--533",
}

@Article	{MOLLER93B,
author	=	"Martin M{\o}ller",
title	=	"Supervised Learning on Large Redundant Training Sets",
journal	=	"International Journal of Neural Systems",
year	=	1993,
volume	=	4,
number	=	1,
pages	=	"15--25",
}

@InBook		{MOLLER94B,
crossref=	"MOLLER94A",
title	=	"Adaptive preconditioning of the {H}essian matrix",
pages	=	"127--149"
}

@PhdThesis	{MOLLER94A,
author	=	"Martin M{\o}ller",
title	=	"Efficient Training of Feed-Forward Neural Networks",
school	=	"Comp. Sci. Dept., Aarhus University",
year	=	1994,
address	=	"Denmark",
note	=	"DAIMI PB-464"
}

@PhdThesis	{LEHR96A,
author	=	"Michael A. Lehr",
title	=	"Scaled Stochastic Methods for Training Neural Networks",
school	=	"Department of Electrical Engineering, Stanford University",
year	=	1996,
note	=	"Available from http://www-isl.stanford.edu/people/lehr/ or
		 ftp://simoon.stanford.edu/pub/lehr/thesis/.",
annote	=	"contains different derivation of R{backprop} Hessian stuff"
}

This paper proposed adapting the learning parameters.  Unfortunately,
in a batch context, ah well.  It's the basis for the more advanced
papers that follow it.

@article	{JACOBS88,
author	=	"Robert A. Jacobs",
title	=	"Increased rates of convergence through learning
		 rate adaptation",
journal	=	NNets,
publisher=	"Pergamon Press",
year	=	1988,
volume	=	1,
number	=	4,
pages	=	"295--307"
}

This paper takes a control-theoretic approach to stochastic
optimization.

@article	{DABIS-MOIR91,
author	=	"H. S. Dabis and T. J. Moir",
title	=	"Least mean squares as a control system",
journal	=	IJC,
publisher=	"Taylor and Francis Ltd",
year	=	1991,
volume	=	54,
number	=	2,
pages	=	"321--335",
}

I've saved the best for last.  Here is the current best linear
complexity stochastic optimization algorithm.  It is apparently
competitive with recursive least squares, the best second-order
quadratic complexity stochstic optimization algorithm!

@InProceedings	{SUTTON92B,
crossref=	"YWALS7",
author	=	"Richard S. Sutton",
title	=	"Gain Adaptation Beats Least Squares?",
pages	=	"161--166",
keywords=	"delta-bar-delta",
url	=	"ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/sutton-92b.ps.gz"
}

@InProceedings	{SUTTON92C,
crossref=	"AAAI92",
author	=	"Richard S. Sutton",
title	=	"Adapting Bias by Gradient Descent: An Incremental Version of
		  Delta-Bar-Delta",
url	=	"ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/sutton-92a.ps.gz"
}

@InProceedings	{GLUCK92A,
crossref=	"COGSCI92",
author	=	"Mark A. Gluck and Paul T. Glauthier and  Richard S. Sutton",
title	=	"Adaptation of Cue-Specific Learning Rates in Network Models of
		  Human Category Learning",
keywords=	"delta-bar-delta"
}


				  EM


Seminal ref.

@article	{DEMPSTER76,
author	=	"A. P. Dempster and N. M. Laird and D. B. Rubin",
title	=	"Maximum likelihood from incomplete data via the {EM}
		 algorithm",
journal	=	"Proceedings of the Royal Statistical Society",
year	=	1976,
pages	=	"1--38"
}

For more modern stuff, look up some papers from Mike Jordan's group
via http://www.ai.mit.edu/projects/jordan.html, their intros and refs
are excellent hooks into the literature.

		     Hidden Markov Models (HMMs)

@Article	{LEVINSON-ETAL83A,
author	=	"S. E. Levinson and L. R. Rabiner and M. M. Sondhi", 
title	=	"An Introduction to the Application of the Theory of
		  Probabilistic Functions of a Markov Process to Automatic
		  Speech Recognition",
journal	=	"Bell System Technical Journal", 
volume	=	62, 
number	=	4, 
month	=	apr, 
year	=	1983, 
pages	=	"1035--1074", 
comment	=	"Analysis of implementation of Hidden Markov models and the
		  Baum-Welch algorithm",
}

See also Peter Brown's PhD thesis (CMU) and Kai-Fu Lee's PhD thesis
(also CMU).  The latter has a particularly good intro to HMMs chapter,
and a lot of stuff on tricks that allowed his system to beat similar
efforts by much much larger much much more experienced groups.

(Sorry, I don't have url's or ISBNs.  I do have copies on my shelves...)


For fun, nothing beats this very provocative paper which views French
as English passed through a strange communication channel, uses a
large corpus (and the EM algorithm) to build a model of the channel,
and then does translation by finding the input to the channel most
likely to have produced the observed output.  It even mostly works!

@article	{BROWN-COCKE-ET-AL90,
author	=	"Peter Brown and John Cocke and Stephen Della Pietra
		 and Vincent J. Della Pietra and Fredrick Jelinek and
		 John D. Lafferty and Robert L. Mercer and Paul S. Roossin",
title	=	"A Statistical Approach to Machine Translation",
journal	=	"Computational Linguistics",
pages	=	"79--85",
volume	=	16,
number	=	2,
month	=	jun,
year	=	1990,
}


			  the Kalman Filter


@Article	{KALMAN60A,
author	=	"R. E. Kalman",
title	=	"A New Approach to Linear Filtering and Prediction Problems",
journal	=	"Trans. ASME "#Jof#"Basic Engineering",
year	=	1960,
month	=	mar,
volume	=	82,
number	=	1,
pages	=	"35--45"
}

The Extended Kalman Filter

@Article	{MEHRA70A,
author	=	"R. K. Mahra",
title	=	"On the Identification of Variances and Adaptive {K}alman
		  Filtering",
journal	=	ieeetrAC,
year	=	1970,
volume	=	"AC-15",
number	=	2,
pages	=	"175--184",
month	=	apr,
}

And an application thereof.

@Article	{PUSKORIUS-FELDKAMP94A,
author	=	"G. V. Puskorius and L. A. Feldkamp", 
title	=	"Neurocontrol of Nonlinear Dynamical Systems with {K}alman
		  Filter-Trained Recurrent Networks",
journal	=	IEEEtrNN,
volume	=	5,
number	=	2,
pages	=	"279--297",
year	=	1994
}


		       MODEL BASED RECOGNITION
				  vs
		   DISCRIMINATION BASED RECOGNITION

Motor theory of speech.  A good row in the journals exposes the seamy
underbelly of psychology!

@article	{LANE65,
author	=	"Lane, H. L.",
year	=	1965,
title	=	"The motor theory of speech perception: A critical review",
journal	=	"Psychology Review",
volume	=	72,
pages	=	"275--309"
}

@article	{STUDDERT-KENNEDY-LIBERMAN-HARRIS-COOPER70,
author	=	"Studdert-Kennedy, M. and Liberman, A. M and Harris, K. S. and
		  Cooper, F. S.",
year	=	1970,
title	=	"Motor Theory of Speech Perception: A Reply to Lane's Critical
		  Review",
journal	=	PsychRev,
volume	=	77,
pages	=	"234--249"
}

@Proceedings	{MOTORHEAR91,
title	=	"Modularity and the Motor Theory of Speech Perception:
		  Proceedings of a Conference to Honor Alvin M. Liberman",
booktitle=	"Modularity and the Motor Theory of Speech Perception:
		  Proceedings of a Conference to Honor Alvin M. Liberman",
year	=	1991,
editor	=	"Ignatius G. Mattingly and Michael Studdert-Kennedy",
publisher=	ERLBAUM
}


			   N-ARMED BANDITS

These are excellent texts on bandits.

@Book		{BERRY-FRISTEDT85A,
author	=	"Donald A. Berry and Bert Fristedt",
title	=	"Bandit Problems: Sequential Allocation of Experiments",
publisher=	CHAPMAN-HALL,
year	=	1985,
series	=	"Monographs on statistics and applied probability"
}

@book		{NARENDRA-THATHACHAR89A,
author	=	"K. S. Narendra and M. A. L. Thathachar",
title	=	"Learning Automata: an Introduction",
year	=	1989,
publisher=	PRENTICE,
}
		  
I've also heard that this is good, but have not seen it myself.

@Book		{PRESMAN-SONIN90A,
author	=	"Ernst Lvovich Presman and Isaak Mikhailovich Sonin",
title	=	"Sequential control with incomplete information: the Bayesian
		  approach to multi-armed bandit problems",
publisher=	ACADEMIC,
year	=	1990,
annote	=	"translated from the Russian"
}


This is a little dated, but has a slant of interest to people working
on genetic algorithms.

@TechReport	{SIVAPALAN-ETAL87A,
author	=	"T. Sivapalan and David E. Goldberg",
title	=	"The Two-Armed Bandit Problem: A Bibliography, 1952-Present",
institution=	"University of Alabama, Dept. of Engineering Mechanics,
		  Clearinghouse for Genetic Algorithms",
year	=	1987,
number	=	"TCGA 87002",
address	=	"Tuscaloosa, Alabama",
month	=	nov
}

This is off topic, but one of the funniest books I've ever read.

@Book		{WEISBECKER86A,
author	=	"Alan C. Weisbecker",
title	=	"Cosmic Banditos: a Contrabandista's Quest for the Meaning of
		  Life",
publisher=	"Vintage Books",
year	=	1986,
address	=	"New York",
quote	=	"Mr Quark: Please leave me and my family alone."
}

These view neurons as little bandit automata.  Considered classics.

@article	{BARTO83,
author	=	"Barto, A. G. and Sutton, R. S. and Anderson, C. W.",
title	=	"Neuron-like Adaptive Elements that Can Solve Difficult
		  Learning Control Problems",
journal	=	ieeetrSMC,
volume	=	13,
pages	=	"835--846",
year	=	1983,
annote	=	"Reprinted in {\it Neurocomputing: Foundations of Research},
		 J. A. Anderson and E. Rosenfeld (Eds.), Cambridge, MA: The MIT
		 Press, 1988, pp. 537--549"
}

@article	{BARTO85,
author	=	"Barto, A. G. and P. Anandan",
title	=	"Pattern Recognizing Stochastic Learning Automata",
pages	=	"360--375",
journal	=	ieeetrSMC,
volume	=	15,
year	=	1985
}


		    REINFORCEMENT LEARNING SYSTEMS


The classic ad-hoc bucket-brigade self-learning checkers player.

@Article	{SAMUEL60A,
author	=	"A. L. Samuel",
title	=	"Programming Computers to Play Games",
journal	=	"Advances in Computers",
year	=	1960,
volume	=	1,
pages	=	"165--192"
}

@article	{SAMUEL67,
author	=	"Samuel, A. L.",
title	=	"Some studies in machine learning using the game of
		 checkers {II}---Recent progress",
journal	=	ibmjrd,
year	=	1967,
volume	=	11
}

More bucket-brigades.  This book has a lot in it, but the lack of a
systematic and rigorous approach makes it difficult to separate what
worked from what he thought was cool.

@book		{HOLLAND75,
author	=	"Holland, J. H.",
title	=	"Adaptation in Natural and Artificial Systems",
publisher=	Uof#"Michigan Press",
year	=	1975
}

There are a couple important modern algorithms.  Aside from Bandit
algorithms, the method of tempral differences is probably most
important.  It is not stricly a reinforcement learning algorithm, as
it only estimates the expected return of a state, but I'm putting it
here anyway.  (It is related to the bucket brigade, but has had some
serious practical impact.)

@Article	{SUTTON88A,
author	=	"R. S. Sutton",
title	=	"Learning To Predict by the Methods of Temporal Differences",
journal	=	"Machine Learning",
year	=	1988,
volume	=	3,
pages	=	"9--44",
url	=	"ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/sutton-88.ps.gz"
}

Here is a fun paper, which embodies a lot of different ideas in a
single system called DYNA.

@InBook		{SUTTON90A,
crossref=	"NNfC",
author	=	"Richard S. Sutton",
title	=	"First Results with {DYNA}, an Integrated Architecture for
		  Learning, Planning and Reacting",
chapter	=	8,
pages	=	"179--189",
annote	=	"see also ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/sutton-91b.ps.gz"
}

@InProceedings	{SUTTON90B,
author	=	"Richard S. Sutton",
title	=	"Integrated Architectures for Learning, Planning, and Reacting
		  Based on Approximating Dynamic Programming",
crossref=	"ICML7",
url	=	"ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/sutton-90.ps.gz"
}

Some more TD stuff.

@InProceedings	{TESAURO92A,
author	=	"G. J. Tesauro",
title	=	"Practical Issues in Temporal Difference Learning",
crossref=	"NIPS4",
pages	=	"260--266"
}

@InProceedings	{SUTTON-SINGH94A,
author	=	"Richard S. Sutton and Satinder P. Singh",
title	=	"On Bias and Step-Size in Temporal-Difference Learning",
crossref=	"YWALS9",
pages	=	"91--96",
url	=	"ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/sutton-singh-94.ps.gz"
}

@InBook		{SUTTON-BARTO90B,
author	=	"Richard. S. Sutton and Andrew. G. Barto",
title	=	"A Time-Derivative Theory of Pavlovian Conditioning",
crossref=	"LCNS",
pages	=	"497--537"
}

The seminal Q-learning reference

@PhdThesis	{WATKINS89,
author	=	"C. J. C. H. Watkins",
title	=	"Learning from Delayed Rewards",
school	=	"Cambridge University",
year	=	"1989",
address	=	"Cambridge, England",
}

Something a little easier to get ahold of

@Article	{WATKINS-DAYAN92A,
author	=	"Watkins, J. C. H. and Dayan, P.",
title	=	"Technical Note: Q-Learning",
journal	=	"Machine Learning",
year	=	1992,
volume	=	8,
number	=	"279-292"
}


A sort of manifesto attempting to unify two disciplines.

@InProceedings	{SUTTON-ETAL91,
crossref=	"ACC91",
author	=	"R. S. Sutton and A. G. Barto and R. J. Williams",
title	=	"Reinforcement Learning is Direct Adaptive Optimal Control",
pages	=	"2143--2146"
}

A nice collection.

@Book		{SUTTON92A,
title	=	"Reinforcement Learning",
booktitle=	"Reinforcement Learning",
publisher=	KLUWER,
year	=	1992,
editor	=	"Richard S. Sutton",
series	=	"SECS",
volume	=	173,
note	=	"Reprinted from volume 8(3--4) (1992) of Machine Learning"
}

More Q-learning

(boltzmann-distribution paper which shows convergence)

On the convergence of stochastic iterative dynamic programming algorithms 
Tomi Jaakola, Mike Jordan, Satinder Singh.
http://web.mit.edu/~tommi/Public/conv.html
ftp://psyche.mit.edu/pub/tommi/jaak-convergence.ps.gz


POMDP (partially observable markov decision process) policy learning

Reinforcement learning algorithm for partially observable decision problems
Jaakkola, Singh, and Jordan / NIPS-7
http://web.mit.edu/~tommi/Public/nips7.html
ftp://psyche.mit.edu/pub/tommi/jaak-nips7.ps.gz


	   ******* to add (not in my bib file, urk) *******

Gerry Tesauro's world champion TD-gammon.  Neural Computation.
http://www.research.ibm.com/massdist/tdl.html

Hans Berliner's BKG 9.2 article, scientific american.  blemishes by
clever Hans.

peter brown thesis

kai-fu lee thesis

BILL, the othello program.  (eval function tuned w/ reinforcement
learning, sort of.)


Heirarchical reinforcement learning

heirarchical EM stuff

the bad lieutenant's stuff (Harmon):
 sutton's extended delta-bar-delta -> nonlinear
 advantage learning
 reinforcement learning tutorial
 http://ace.aa.wpafb.af.mil:80/~harmonme/pubs.html

Kenji Doya, continuous-time TD, NIPS*9x

that bandit H-something races paper from NIPS a few years ago.
Nice formalism.

the recent COLT community work on adaptive approximations to optimal
balanced portfolios.

add textbooks.

backpropagation & automatic differentiation
helmholtz machines
boltzmann machines

Dayan & Hinton NIPS paper on hierarchical reinforcement learning


section on generalization

PAC learning
An Introduction to Computational Learning Theory
Michael J. Kearns and Umesh V. Vazirani 
1994 
MIT Press
ISBN 0-262-11193-4 
http://www-mitpress.mit.edu/mitp/recent-books/comp/kearns.html

combining learners
Boosting, especially the AdaBoost algorithm
http://www.research.att.com/~schapire/boost.html

Exponentiated Gradient Descent