A few issues cropped up repeatedly. One is that code copied-and-pasted from a
Jupyter notebook often doesn't work right off the bat (for instance, it may
refer to other code that you forgot to copy and paste). It's important when
moving code around to test that the result works. In this case, since the code
needed to end up in .py
files anyway, it would have been easier to write and
test the code there first, and load that code into a Jupyter notebook with an
import statement.
Another common mistake was to write a loop looking like this for
gradient_descent
or one of the Exercise 3 functions:
while (norm(jacobi_step(D, L, U, b, xk) - xk) > epsilon):
xk = jacobi_step(D, L, U, b, xk)
return xk
This kind of loop has three problems:
-
Every time the loop executes,
jacobi_step
is called twice (once in thewhile
statement, and once to assignxk
). This is wasteful, since you can just store the value. -
The last time the loop executes,
jacobi_step
is called onxk
, but then that value is never stored inxk
. So effectively, the very last iteration is thrown away, and the second-last iteration is returned instead. Why not go ahead and use the very best value you've calculated? -
Making long function calls in a condition makes it harder to understand the flow of code.
It's generally better to make a simple change like this:
# No harm in doing one step immediately.
xkp1 = jacobi_step(D, L, U, b, xk)
# Now loop until the next iteration is less than epsilon from xk.
while (norm(xkp1 - xk) > epsilon):
xk = xkp1
xkp1 = jacobi_step(D, L, U, b, xk)
return xkp1
By storing the next value of xk
in xkp1
, we get rid of the extra call to
jacobi_step
in each iteration, and make it possible to return this value.
One last common mistake was to use different convergence criteria from what the assignment requested (e.g. using the 1-norm or max-norm rather than the 2-norm for Exercise 3).
Most people were able to implement the Exercise 1 code perfectly. One common issue in the reports was that many people suggested that the numerical results proved the Collatz conjecture (for example, because of the graph shape, or because there were no sequences much above 200 in length). For theorems like this, however, numerics can only find a counterexample. It's not possible to prove a conjecture about all integers this way (at least, not unless your experiment is part of a larger inductive argument).
Indeed, some conjectures in number theory have turned out to be false, but with counterexamples that are much larger than any number representable on a conventional computer. (For instance, see Skewes' number.)
Many people forgot to make their routines work in the case where the starting value has a derivative of zero, but is not a minimum (for example, if the starting point is a maximum).
One simple way of dealing with this is to use the function f
that's passed in
to test values at a distance of epsilon
on either side of the starting
point. If one of those values is smaller than the starting point, then the
iteration should "move" in that direction to find a true minimum. This is the
reason that f
is an argument in the first place (as many of you noticed, we
don't use it for the gradient descent itself). This can also be done during the
iteration to avoid stopping at an inflection point (though we didn't test for
this).
Another common mistake was to move in the direction where f
was smaller, but
then stop iteration. To actually find the minimum, it's not enough to avoid a
"bad" spot. You then have to continue to loop to find the minimum.
A common mistake was to simply forget to take the absolute value of the matrix entries when checking whether a matrix is SDD.
A few people seem to have simply forgotten to use gauss_seidel_step
in
gauss_iteration
at all (due to copying-and-pasting code from jacobi_step
;
some people had two calls to jacobi_step
, but only changed one of them).
For the report, there was a bit of a hint that the behavior of the Jacobi and
Gauss-Seidel methods might change when epsilon
was somewhere near 10^-16 (near
machine precision). If you look closely at what happens to the residual with
small values of epsilon
, you might notice two things:
-
If
epsilon
is small enough, the number of operations that either method takes to converge stays constant. E.g., no matter how smallepsilon
is, the Jacobi method will stay well under 200 iterations for the problem given.This is because of limited precision of floating-point arithmetic. Eventually the difference between two iterations will be rounded to zero. So no matter what (positive) value of
epsilon
is used, the iteration will not continue past that point. (It is theoretically possible that the method could instead oscillate between two values, so that termination never happens ifepsilon
is too small. However, that does not happen in this case; if the method is stable, it is not common to enter an infinite loop with the kind of criterion we're using here.) -
At the same time, as the number of iterations increases, the value of the residual eventually levels off, rather than decreasing to zero. This is because there's a small amount of truncation error that affects the calculated residual, and so the residual can't shrink below the order of magnitude of that error. This is easier to see if you plot the residual against the iteration count for both methods using a log scale (i.e. using the
semilogy
function frommatplotlib
).(This means that if our convergence criterion was based on the residual, we would enter an infinite loop if we tried to make the norm of the residual on the order of machine precision or smaller.)