HW3: Rosenbrock's Racers !!!
Implement naive deterministic gradient descent on Rosenbrock's function
R(x,y) = 100 (y-x^2)^2 + (1-x)^2
The race course starts at (-1, 1.5) and ends at an epsilon ball around
the minimum at (1, 1), with epsilon = 0.01. Time is measured in
function (gradient) evaluations.
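A minimal racer might look like the sketch below. The eta and alpha defaults are placeholder starting values to tune by hand, not recommendations, and the 200,000-evaluation cap is an arbitrary safety limit:

```python
import numpy as np

def rosenbrock_grad(p, b=100.0):
    """Analytic gradient of R(x,y) = b*(y - x^2)^2 + (1 - x)^2."""
    x, y = p
    dx = -4.0 * b * x * (y - x * x) - 2.0 * (1.0 - x)
    dy = 2.0 * b * (y - x * x)
    return np.array([dx, dy])

def race(eta=1e-4, alpha=0.9, eps=0.01, max_evals=200_000):
    """Momentum gradient descent from the start line to the epsilon ball.

    Returns (final point, number of gradient evaluations used)."""
    p = np.array([-1.0, 1.5])        # start of the course
    goal = np.array([1.0, 1.0])      # the minimum
    v = np.zeros(2)                  # momentum ("velocity") term
    for t in range(1, max_evals + 1):
        g = rosenbrock_grad(p)       # each evaluation is one clock tick
        v = alpha * v - eta * g
        p = p + v
        if np.linalg.norm(p - goal) < eps:
            return p, t
    return p, max_evals
```

With fixed eta and alpha this crawls; the point of the assignment is to do better by steering those parameters.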
Represent R(x,y) as a feedforward network with the simplest nonlinear
elements you can find. Use "backprop" to derive the gradient on this
structure, rather than via the usual calculus. (Show your work.)
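As one possible decomposition (not the only one, and no substitute for showing the derivation by hand), R can be built from square, subtract, scale, and add units, with the backward pass applying the chain rule through each unit in reverse:

```python
import numpy as np

def rosenbrock_backprop(x, y, b=100.0):
    """Value and gradient of R = b*(y - x^2)^2 + (1 - x)^2 via backprop
    on a tiny feedforward graph of square/subtract/scale/add units."""
    # --- forward pass ---
    u = x * x                     # square unit
    s = y - u                     # subtract unit
    e = 1.0 - x                   # subtract unit
    R = b * s * s + e * e         # scale-square units summed at the output
    # --- backward pass (chain rule, units in reverse order) ---
    dR_ds = 2.0 * b * s           # through b*s^2
    dR_de = 2.0 * e               # through e^2
    dR_du = -dR_ds                # s = y - u
    dR_dy = dR_ds                 # s = y - u
    dR_dx = 2.0 * x * dR_du - dR_de   # u = x^2 ; e = 1 - x
    return R, np.array([dR_dx, dR_dy])
```

Reading off the backward pass reproduces the calculus answer: dR/dx = -4bx(y - x^2) - 2(1 - x) and dR/dy = 2b(y - x^2).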
Get some practice "hand piloting" the system by manually adjusting the
learning parameters (learning rate eta and momentum parameter alpha)
as the run progresses.
Hint: feedback parameters I like are gradient magnitude, gradient
magnitude ratio (this step vs. last), cos of the gradient angle (this
vs. last), cos of the step angle (this vs. last), and cos of the step
vs. gradient angle.
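One way to wire up that instrument panel, where "step" means the displacement actually taken (the step-vs-gradient cosine below is measured against the downhill direction -g; that sign convention is a choice, not something the hint fixes):

```python
import numpy as np

def cos_angle(a, b):
    """Cosine of the angle between two vectors (0 if either is ~zero)."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na < 1e-12 or nb < 1e-12:
        return 0.0
    return float(np.dot(a, b) / (na * nb))

def feedback(g, g_prev, step, step_prev):
    """Feedback parameters from this/last gradient and this/last step."""
    return {
        "grad_mag":       float(np.linalg.norm(g)),
        "grad_mag_ratio": float(np.linalg.norm(g)
                                / max(np.linalg.norm(g_prev), 1e-12)),
        "cos_grad_angle": cos_angle(g, g_prev),        # this vs. last gradient
        "cos_step_angle": cos_angle(step, step_prev),  # this vs. last step
        "cos_step_grad":  cos_angle(step, -g),         # step vs. downhill dir
    }
```

A negative gradient-angle or step-angle cosine is a classic sign of overshooting across the valley.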
Which feedback parameters were useful?
Try to encode the skills you developed in an "autopilot".
How fast does your autopilot run the course?
Is it robust, e.g., to changing the "100" in the definition of R(x,y)?
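One autopilot design you might end up with, sketched here as an assumption rather than the intended solution, is a "bold driver" rule: grow eta while successive gradients agree, and cut eta and dump momentum when they oppose. Taking the gradient as a callable keeps it testable against a modified "100":

```python
import numpy as np

def autopilot(grad, p0=(-1.0, 1.5), goal=(1.0, 1.0), eps=0.01,
              eta=1e-4, alpha=0.9, eta_max=1e-3, max_evals=200_000):
    """"Bold driver"-style autopilot (one possible design): adapt eta
    from the cosine between successive gradients. The growth/cut factors
    and eta_max cap are hand-picked placeholders to tune."""
    p, goal = np.array(p0), np.array(goal)
    v = np.zeros_like(p)
    g_prev = None
    for t in range(1, max_evals + 1):
        g = grad(p)
        if g_prev is not None:
            # cos of the angle between this gradient and the last one
            c = np.dot(g, g_prev) / (np.linalg.norm(g)
                                     * np.linalg.norm(g_prev) + 1e-12)
            if c > 0.0:
                eta = min(eta * 1.05, eta_max)  # smooth sailing: speed up
            else:
                eta *= 0.5                      # overshot: brake hard...
                v[:] = 0.0                      # ...and dump momentum
        v = alpha * v - eta * g
        p = p + v
        g_prev = g
        if np.linalg.norm(p - goal) < eps:
            return p, t
    return p, max_evals
```

Robustness to the "100" can then be probed by passing in a gradient oracle built with a different b.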
Extra Credit: add a little noise to the gradient your system
measures. Does it mess things up? How does it affect speed?
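The assignment doesn't fix a noise model; an isotropic Gaussian wrapper around the gradient oracle is one simple assumption, with sigma as the knob to sweep:

```python
import numpy as np

def noisy(grad, sigma=1.0, rng=None):
    """Wrap a gradient oracle so every measurement carries additive
    Gaussian noise of standard deviation sigma (an assumed noise model)."""
    rng = np.random.default_rng(0) if rng is None else rng
    return lambda p: grad(p) + rng.normal(0.0, sigma, size=np.shape(p))
```

Feeding the wrapped oracle to the hand-piloted or autopiloted racer lets you compare course times with and without noise.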