Stanford CS 229 Homework Answers
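CS229 Problem Set #0

CS 229, Autumn 2016
Problem Set #0 Solutions: Linear Algebra and Multivariable Calculus

Notes: (1) These questions require thought, but do not require long answers. Please be as concise as possible. (2) If you have a question about this homework, we encourage you to post your question on our Piazza forum, at https://piazza.com/stanford/autumn2016/cs229. (3) If you missed the first lecture or are unfamiliar with the collaboration or honor code policy, please read the policy on Handout #1 (available from the course website) before starting work. (4) This specific homework is not graded, but we encourage you to solve each of the problems to brush up on your linear algebra. Some of them may even be useful for subsequent problem sets. It also serves as your introduction to using Gradescope for submissions.

If you are scanning your document by cellphone, please check the Piazza forum for recommended cellphone scanning apps and best practices.

1. [0 points] Gradients and Hessians

Recall that a matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if $A^T = A$, that is, $A_{ij} = A_{ji}$ for all $i, j$. Also recall the gradient $\nabla f(x)$ of a function $f : \mathbb{R}^n \to \mathbb{R}$, which is the $n$-vector of partial derivatives

\[
\nabla f(x) =
\begin{bmatrix}
\frac{\partial}{\partial x_1} f(x) \\
\vdots \\
\frac{\partial}{\partial x_n} f(x)
\end{bmatrix}
\quad \text{where} \quad
x =
\begin{bmatrix}
x_1 \\
\vdots \\
x_n
\end{bmatrix}.
\]

The Hessian $\nabla^2 f(x)$ of a function $f : \mathbb{R}^n \to \mathbb{R}$ is the $n \times n$ symmetric matrix of twice partial derivatives,

\[
\nabla^2 f(x) =
\begin{bmatrix}
\frac{\partial^2}{\partial x_1^2} f(x) & \frac{\partial^2}{\partial x_1 \partial x_2} f(x) & \cdots & \frac{\partial^2}{\partial x_1 \partial x_n} f(x) \\
\frac{\partial^2}{\partial x_2 \partial x_1} f(x) & \frac{\partial^2}{\partial x_2^2} f(x) & \cdots & \frac{\partial^2}{\partial x_2 \partial x_n} f(x) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2}{\partial x_n \partial x_1} f(x) & \frac{\partial^2}{\partial x_n \partial x_2} f(x) & \cdots & \frac{\partial^2}{\partial x_n^2} f(x)
\end{bmatrix}.
\]

(a) Let $f(x) = \frac{1}{2} x^T A x + b^T x$, where $A$ is a symmetric matrix and $b \in \mathbb{R}^n$ is a vector. What is $\nabla f(x)$?

Answer: In short, we know that $\nabla\left(\frac{1}{2} x^T A x\right) = Ax$ for a symmetric matrix $A$, while $\nabla(b^T x) = b$. Then $\nabla f(x) = Ax + b$ when $A$ is symmetric. In more detail, we have

\[
\frac{1}{2} x^T A x = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} x_i x_j,
\]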
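The result stated in part (a), $\nabla f(x) = Ax + b$ for symmetric $A$, can be sanity-checked numerically. The following is a minimal sketch, not part of the original solutions: it assumes NumPy is available, and the helper names (`f`, `grad_f`, `numerical_grad`), the random test matrix, and the step size `eps` are all illustrative choices.

```python
import numpy as np


def f(x, A, b):
    """Quadratic f(x) = 1/2 x^T A x + b^T x."""
    return 0.5 * x @ A @ x + b @ x


def grad_f(x, A, b):
    """Closed-form gradient A x + b (valid when A is symmetric)."""
    return A @ x + b


def numerical_grad(func, x, eps=1e-6):
    """Central finite-difference approximation of the gradient of func at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        g[i] = (func(x + step) - func(x - step)) / (2.0 * eps)
    return g


# Illustrative random problem instance (assumption: any symmetric A works).
rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2.0  # symmetrize so that A^T = A
b = rng.standard_normal(n)
x = rng.standard_normal(n)

analytic = grad_f(x, A, b)
numeric = numerical_grad(lambda v: f(v, A, b), x)
max_abs_diff = float(np.max(np.abs(analytic - numeric)))
print("max |analytic - numeric| =", max_abs_diff)
```

If `A` is not symmetrized first, the two gradients generally disagree, because the gradient of $\frac{1}{2} x^T A x$ is then $\frac{1}{2}(A + A^T)x$ rather than $Ax$.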
CS229 Problem Set #1 Solutions

The $-\frac{\lambda}{2} \theta^T \theta$ here is what is known as a regularization term, which will be discussed in a future lecture, but which we include here because it is needed for Newton's method to perform well on this task. For the entirety of this problem you can use the value $\lambda = 0.0001$.

Using this definition, the gradient of $\ell(\theta)$ is given by

\[
\nabla_\theta \ell(\theta) = X^T z - \lambda \theta
\]

where $z \in \mathbb{R}^m$ is defined by

\[
z_i = w^{(i)} \left( y^{(i)} - h_\theta(x^{(i)}) \right)
\]

and the Hessian is given by

\[
H = X^T D X - \lambda I
\]

where $D \in \mathbb{R}^{m \times m}$ is a diagonal matrix with

\[
D_{ii} = -w^{(i)} h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right).
\]

For the sake of this problem you can just use the above formulas, but you should try to derive these results for yourself as well.

Given a query point $x$, we compute the weights

\[
w^{(i)} = \exp\left( -\frac{\| x - x^{(i)} \|^2}{2 \tau^2} \right).
\]

Much like the locally weighted linear regression that was discussed in class, this weighting scheme gives more weight to the "nearby" points when predicting the class of a new example.

(a) Implement the Newton-Raphson algorithm for optimizing $\ell(\theta)$ for a new query point $x$, and use this to predict the class of $x$. The q2/ directory contains data and code for this problem. You should implement the y = lwlr(X_train, y_train, x, tau) function in the lwlr.m file. This function takes as input the training set (the X_train and y_train matrices, in the form described in the class notes), a new query point x and the weight bandwidth tau.
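The formulas above translate fairly directly into code. What follows is a hedged Python/NumPy sketch rather than the MATLAB lwlr.m file the assignment asks for, and it is not the course's reference solution. It assumes labels in {0, 1}, that $h_\theta$ is the logistic sigmoid, that no intercept column is added beyond whatever X_train already contains, and that class 1 is predicted when $\theta^T x > 0$. The parameters `lam`, `n_iter`, and `tol`, the `sigmoid` helper, and the toy data are hypothetical additions for illustration. Each iteration applies the Newton update $\theta := \theta - H^{-1} \nabla_\theta \ell(\theta)$.

```python
import numpy as np


def sigmoid(a):
    """Logistic function, assumed here to be h_theta(x) = sigmoid(theta^T x)."""
    return 1.0 / (1.0 + np.exp(-a))


def lwlr(X_train, y_train, x, tau, lam=1e-4, n_iter=20, tol=1e-8):
    """Predict the class of query point x with locally weighted logistic regression.

    X_train: (m, n) array, y_train: (m,) array of 0/1 labels (assumed),
    x: (n,) query point, tau: weight bandwidth.
    lam, n_iter, tol are illustrative defaults, not taken from the handout
    (except lam = 0.0001, which the problem statement suggests).
    """
    m, n = X_train.shape

    # w_i = exp(-||x - x_i||^2 / (2 tau^2))
    w = np.exp(-np.sum((X_train - x) ** 2, axis=1) / (2.0 * tau ** 2))

    theta = np.zeros(n)
    for _ in range(n_iter):
        h = sigmoid(X_train @ theta)

        # Gradient: X^T z - lam * theta, with z_i = w_i (y_i - h_i)
        z = w * (y_train - h)
        grad = X_train.T @ z - lam * theta

        # Hessian: X^T D X - lam * I, with D_ii = -w_i h_i (1 - h_i)
        d = -w * h * (1.0 - h)
        H = X_train.T @ (d[:, None] * X_train) - lam * np.eye(n)

        # Newton step: theta <- theta - H^{-1} grad (solve instead of inverting)
        step = np.linalg.solve(H, grad)
        theta = theta - step
        if np.linalg.norm(step) < tol:
            break

    return int(x @ theta > 0)


# Made-up toy data for illustration only (not the q2/ data set).
X_toy = np.array([
    [-2.0, -1.0], [-1.0, -2.0], [-1.0, -1.0], [0.5, 0.5],   # label 0
    [1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [-0.5, -0.5],       # label 1
])
y_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

pred_pos = lwlr(X_toy, y_toy, np.array([1.5, 1.5]), tau=1.0)
pred_neg = lwlr(X_toy, y_toy, np.array([-1.5, -1.5]), tau=1.0)
print(pred_pos, pred_neg)
```

Because $D$ has non-positive diagonal entries, $X^T D X$ is negative semidefinite, and subtracting $\lambda I$ makes $H$ negative definite, so the linear solve is well posed. This is one way to see why the regularization matters for Newton's method here.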