Update 07.3-concepts.Rmd #396

Merged
merged 1 commit into from
May 22, 2024
2 changes: 1 addition & 1 deletion manuscript/07.3-concepts.Rmd
@@ -48,7 +48,7 @@ $$S_{C,k,l}(x)=\nabla h_{l,k}(\hat{f}_l(x))\cdot v_l^C$$
where $\hat{f}_l$ maps the input $x$ to the activation vector of the layer $l$ and $h_{l,k}$ maps the activation vector to the logit output of class $k$.

Mathematically, the sign of $S_{C,k,l}(x)$ only depends on the angle between the gradient of $h_{l,k}(\hat{f}_l(x))$ and $v_l^C$.
-If the angle is greater than 90 degrees, $S_{C,k,l}(x)$ will be positive, and if the angle is less than 90 degrees, $S_{C,k,l}(x)$ will be negative.
+If the angle is less than 90 degrees, $S_{C,k,l}(x)$ will be positive, and if the angle is greater than 90 degrees, $S_{C,k,l}(x)$ will be negative.
Since the gradient $\nabla h_{l,k}$ points to the direction that maximizes the output the most rapidly, conceptual sensitivity $S_{C,k,l}$, intuitively, indicates whether $v_l^C$ points to the similar direction that maximizes $h_{l,k}$.
Thus, $S_{C,k,l}(x)>0$ can be interpreted as concept $C$ encouraging the model to classify $x$ into class $k$.
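
The sign rule in the changed sentence follows from the identity $a\cdot b=\|a\|\,\|b\|\cos\theta$: the dot product is positive when $\theta<90$ degrees and negative when $\theta>90$ degrees. A minimal sketch to check this numerically is given here; the vectors `grad`, `cav_acute`, and `cav_obtuse` are hypothetical two-dimensional stand-ins for $\nabla h_{l,k}(\hat{f}_l(x))$ and $v_l^C$, not values from any real model, and the helper names are illustrative only.

```python
import math


def sensitivity(grad, cav):
    """Dot product of a gradient and a concept vector (both plain lists)."""
    return sum(g * v for g, v in zip(grad, cav))


def angle_degrees(grad, cav):
    """Angle between the two vectors, in degrees."""
    norm = math.hypot(*grad) * math.hypot(*cav)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, sensitivity(grad, cav) / norm))
    return math.degrees(math.acos(cos_theta))


# Hypothetical toy vectors, chosen only to give an acute and an obtuse angle.
grad = [1.0, 0.0]
cav_acute = [1.0, 1.0]    # roughly 45 degrees from grad
cav_obtuse = [-1.0, 1.0]  # roughly 135 degrees from grad

for name, cav in (("acute", cav_acute), ("obtuse", cav_obtuse)):
    s = sensitivity(grad, cav)
    theta = angle_degrees(grad, cav)
    print(f"{name}: angle={theta:.1f} deg, sensitivity sign={'+' if s > 0 else '-'}")
```

With these toy inputs the acute case yields a positive sensitivity and the obtuse case a negative one, which matches the added line of the diff rather than the removed one.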
