Hello,
First of all, I would like to express my admiration for the neural-tangents repository. It has been incredibly useful for my bachelor thesis experiments. Currently, I'm trying to gain a deeper understanding of the concepts of parameter space and function space.
In the "weight_space_linearization" notebook, I noticed that the network is linearized, and it is stated that training the linearized network using gradient descent is equivalent to continuous gradient descent in an infinite width network. This understanding aligns with theorem 2.1 of the paper Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent, which suggests that a neural network model can be well approximated by a linearized model. However, in order to linearize the network, it seems that the empirical NTK and the outputs of the neural network model at initialization need to be computed (as described in equations 10 and 11 of the same paper). Could you please clarify whether the empirical NTK is actually computed to linearize the model? If not, I'm curious to understand the process of linearization.
Additionally, there is a "function_space_linearization" notebook available in the repository, which approaches the problem from the function space perspective. In this notebook, the empirical NTK of the finite network is computed using its parameters at initialization. According to the documentation of the
gradient_descent_mse
method, it predicts the outcome of function space gradient descent training on Mean Squared Error (MSE). I would like to understand the process behind this. How is gradient descent performed in the function space? Is it equivalent to what was done in the "parameter_space_linearization" notebook?I would greatly appreciate any clarification or insights you can provide regarding the differences and underlying mechanisms of parameter space and function space linearization in neural-tangents. Thank you very much!
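Again for concreteness, this is roughly how I understand the function-space pieces fit together; the toy network, data, and training time are my own assumptions rather than the notebook's exact setup:

```python
import jax
import neural_tangents as nt
from neural_tangents import stax

# Same toy setup as above; architecture, data and training time are assumptions.
init_fn, apply_fn, _ = stax.serial(
    stax.Dense(512), stax.Relu(), stax.Dense(1))

key = jax.random.PRNGKey(0)
key, x_key, y_key, t_key = jax.random.split(key, 4)
x_train = jax.random.normal(x_key, (8, 16))
y_train = jax.random.normal(y_key, (8, 1))
x_test = jax.random.normal(t_key, (4, 16))
_, params = init_fn(key, x_train.shape)

# Empirical NTK of the finite network, evaluated at the initial parameters.
ntk_fn = nt.empirical_ntk_fn(apply_fn)
k_train_train = ntk_fn(x_train, None, params)   # x2=None means x2 = x1
k_test_train = ntk_fn(x_test, x_train, params)

# Predictor for (continuous-time) gradient descent on MSE in function space.
predict_fn = nt.predict.gradient_descent_mse(k_train_train, y_train)

# Outputs of the network at initialization, which the predictor evolves in time.
fx_train_0 = apply_fn(params, x_train)
fx_test_0 = apply_fn(params, x_test)

# Predicted train/test outputs after training time t (t=None would give t -> inf).
fx_train_t, fx_test_t = predict_fn(
    t=1.0, fx_train_0=fx_train_0, fx_test_0=fx_test_0, k_test_train=k_test_train)
```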
I would greatly appreciate any clarification or insights you can provide regarding the differences and underlying mechanisms of parameter-space and function-space linearization in neural-tangents. Thank you very much!

Best regards,
Juan Da Silva