HW3: Rosenbrock's Racers !!!
Implement naive deterministic gradient descent on Rosenbrock's function
R(x,y) = 100 (y-x^2)^2 + (1-x)^2
The race course starts at (-1, 1.5) and ends at an epsilon ball around
the minimum at (1, 1), with epsilon = 0.01. Time is measured in
function (gradient) evaluations.
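A minimal racer might look like the sketch below. The eta and alpha defaults are placeholder starting values to tune by hand, not recommendations, and the 200,000-evaluation cap is an arbitrary safety limit:

```python
import numpy as np

def rosenbrock_grad(p, b=100.0):
    """Analytic gradient of R(x,y) = b*(y - x^2)^2 + (1 - x)^2."""
    x, y = p
    dx = -4.0 * b * x * (y - x * x) - 2.0 * (1.0 - x)
    dy = 2.0 * b * (y - x * x)
    return np.array([dx, dy])

def race(eta=1e-4, alpha=0.9, eps=0.01, max_evals=200_000):
    """Momentum gradient descent from the start line to the epsilon ball.

    Returns (final point, number of gradient evaluations used)."""
    p = np.array([-1.0, 1.5])        # start of the course
    goal = np.array([1.0, 1.0])      # the minimum
    v = np.zeros(2)                  # momentum ("velocity") term
    for t in range(1, max_evals + 1):
        g = rosenbrock_grad(p)       # each evaluation is one clock tick
        v = alpha * v - eta * g
        p = p + v
        if np.linalg.norm(p - goal) < eps:
            return p, t
    return p, max_evals
```

With fixed eta and alpha this crawls; the point of the assignment is to do better by steering those parameters.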
Represent R(x,y) as a feedforward network with the simplest nonlinear
elements you can find. Use "backprop" to derive the gradient on this
structure, rather than via the usual calculus. (Show your work.)
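As one possible decomposition (not the only one, and no substitute for showing the derivation by hand), R can be built from square, subtract, scale, and add units, with the backward pass applying the chain rule through each unit in reverse:

```python
import numpy as np

def rosenbrock_backprop(x, y, b=100.0):
    """Value and gradient of R = b*(y - x^2)^2 + (1 - x)^2 via backprop
    on a tiny feedforward graph of square/subtract/scale/add units."""
    # --- forward pass ---
    u = x * x                     # square unit
    s = y - u                     # subtract unit
    e = 1.0 - x                   # subtract unit
    R = b * s * s + e * e         # scale-square units summed at the output
    # --- backward pass (chain rule, units in reverse order) ---
    dR_ds = 2.0 * b * s           # through b*s^2
    dR_de = 2.0 * e               # through e^2
    dR_du = -dR_ds                # s = y - u
    dR_dy = dR_ds                 # s = y - u
    dR_dx = 2.0 * x * dR_du - dR_de   # u = x^2 ; e = 1 - x
    return R, np.array([dR_dx, dR_dy])
```

Reading off the backward pass reproduces the calculus answer: dR/dx = -4bx(y - x^2) - 2(1 - x) and dR/dy = 2b(y - x^2).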
Get some practice "hand piloting" the system by manually adjusting the
learning parameters (learning rate eta and momentum parameter alpha)
as the run progresses.
Hint: feedback parameters I like are gradient magnitude, gradient
magnitude ratio (this step vs. last), cos of the gradient angle (this
vs. last), cos of the step angle (this vs. last), and cos of the step
vs. gradient angle.
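One way to wire up that instrument panel, where "step" means the displacement actually taken (the step-vs-gradient cosine below is measured against the downhill direction -g; that sign convention is a choice, not something the hint fixes):

```python
import numpy as np

def cos_angle(a, b):
    """Cosine of the angle between two vectors (0 if either is ~zero)."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na < 1e-12 or nb < 1e-12:
        return 0.0
    return float(np.dot(a, b) / (na * nb))

def feedback(g, g_prev, step, step_prev):
    """Feedback parameters from this/last gradient and this/last step."""
    return {
        "grad_mag":       float(np.linalg.norm(g)),
        "grad_mag_ratio": float(np.linalg.norm(g)
                                / max(np.linalg.norm(g_prev), 1e-12)),
        "cos_grad_angle": cos_angle(g, g_prev),        # this vs. last gradient
        "cos_step_angle": cos_angle(step, step_prev),  # this vs. last step
        "cos_step_grad":  cos_angle(step, -g),         # step vs. downhill dir
    }
```

A negative gradient-angle or step-angle cosine is a classic sign of overshooting across the valley.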
Which feedback parameters were useful?
Try to encode the skills you developed in an "autopilot".
How fast does your autopilot run the course?
Is it robust, e.g., to changing the "100" in the definition of R(x,y)?
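One autopilot design you might end up with, sketched here as an assumption rather than the intended solution, is a "bold driver" rule: grow eta while successive gradients agree, and cut eta and dump momentum when they oppose. Taking the gradient as a callable keeps it testable against a modified "100":

```python
import numpy as np

def autopilot(grad, p0=(-1.0, 1.5), goal=(1.0, 1.0), eps=0.01,
              eta=1e-4, alpha=0.9, eta_max=1e-3, max_evals=200_000):
    """"Bold driver"-style autopilot (one possible design): adapt eta
    from the cosine between successive gradients. The growth/cut factors
    and eta_max cap are hand-picked placeholders to tune."""
    p, goal = np.array(p0), np.array(goal)
    v = np.zeros_like(p)
    g_prev = None
    for t in range(1, max_evals + 1):
        g = grad(p)
        if g_prev is not None:
            # cos of the angle between this gradient and the last one
            c = np.dot(g, g_prev) / (np.linalg.norm(g)
                                     * np.linalg.norm(g_prev) + 1e-12)
            if c > 0.0:
                eta = min(eta * 1.05, eta_max)  # smooth sailing: speed up
            else:
                eta *= 0.5                      # overshot: brake hard...
                v[:] = 0.0                      # ...and dump momentum
        v = alpha * v - eta * g
        p = p + v
        g_prev = g
        if np.linalg.norm(p - goal) < eps:
            return p, t
    return p, max_evals
```

Robustness to the "100" can then be probed by passing in a gradient oracle built with a different b.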
Extra Credit: add a little noise to the gradient your system
measures. Does it mess things up? How does it affect speed?
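The assignment doesn't fix a noise model; an isotropic Gaussian wrapper around the gradient oracle is one simple assumption, with sigma as the knob to sweep:

```python
import numpy as np

def noisy(grad, sigma=1.0, rng=None):
    """Wrap a gradient oracle so every measurement carries additive
    Gaussian noise of standard deviation sigma (an assumed noise model)."""
    rng = np.random.default_rng(0) if rng is None else rng
    return lambda p: grad(p) + rng.normal(0.0, sigma, size=np.shape(p))
```

Feeding the wrapped oracle to the hand-piloted or autopiloted racer lets you compare course times with and without noise.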