Update loss explanation #70

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
`README.md` (7 additions & 3 deletions):

To compute the loss, we start with three key concepts:

1. **Optimal AUC**: This represents the area under an optimal recall curve, where
all relevant records are found as early as possible in the screening process. It
is computed as $Nx \times Ny - \frac{Ny \times (Ny - 1)}{2}$, where $Nx$ is the
total number of records, and $Ny$ is the number of relevant records.

   The optimal AUC is the entire area minus the triangle of impossible
   performance. Because recall follows a stepwise curve on a grid, the cells on
   the diagonal of this triangle still lie under the curve and are excluded
   from the subtracted triangle (hence $Ny - 1$).

2. **Worst AUC**: This represents the area under a worst-case recall curve,
where all relevant records appear at the end of the screening process. It is
calculated as $\frac{Ny \times (Ny + 1)}{2}$. As before, we must account for the
cells on the diagonal of the triangle; in this case they are part of the area
and are added (hence $Ny + 1$).

3. **Actual AUC**: This is the area under the recall curve produced by the model
during the screening process. It can be obtained by summing up the cumulative
recall values for the labeled records (see the sketch after this list).

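As an illustration only (not the package's own code), the following minimal sketch computes these three quantities for a hypothetical vector of screening labels. The helper `auc_components` and the example label vector are invented for this sketch and simply mirror the formulas above.

```python
import numpy as np

def auc_components(labels):
    """Return Nx, Ny, optimal AUC, worst AUC, and actual AUC.

    `labels` is a hypothetical 0/1 relevance vector in screening order
    (1 = relevant record).
    """
    labels = np.asarray(labels)
    Nx = len(labels)            # total number of records
    Ny = int(labels.sum())      # number of relevant records

    # Optimal AUC: whole grid minus the triangle of impossible performance,
    # whose diagonal cells stay under the curve (hence Ny - 1).
    optimal = Nx * Ny - Ny * (Ny - 1) // 2

    # Worst AUC: triangle at the end of screening, diagonal cells included
    # (hence Ny + 1).
    worst = Ny * (Ny + 1) // 2

    # Actual AUC: sum of the cumulative (absolute) recall values.
    actual = int(np.cumsum(labels).sum())

    return Nx, Ny, optimal, worst, actual

# Example: 4 relevant records among 8 screened, found at ranks 1, 3, 4, and 7.
print(auc_components([1, 0, 1, 1, 0, 0, 1, 0]))
# -> (8, 4, 26, 10, 21): optimal = 26, worst = 10, actual = 21
```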
The loss is obtained by normalizing the difference between the optimal AUC and
the actual AUC by the difference between the optimal AUC and the worst AUC.
$$\text{Normalized Loss} = \frac{Ny \times \left(Nx - \frac{Ny - 1}{2}\right) -
\sum \text{Cumulative Recall}}{Ny \times (Nx - Ny)}$$

> Note: This formula uses the absolute cumulative recall values, not the normalized recall shown in the graph below.

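Continuing the same hypothetical sketch, the loss can be computed directly from those components; `normalized_loss` below is again an invented helper, not part of the package, and simply restates the formula above.

```python
def normalized_loss(labels):
    """Normalized loss for a hypothetical 0/1 relevance vector."""
    Nx, Ny, optimal, worst, actual = auc_components(labels)
    # Numerator: optimal AUC minus actual AUC.
    # Denominator: Ny * (Nx - Ny), which equals optimal AUC minus worst AUC.
    return (Ny * (Nx - (Ny - 1) / 2) - actual) / (Ny * (Nx - Ny))

# Same example as above: optimal = 26, worst = 10, actual = 21.
print(normalized_loss([1, 0, 1, 1, 0, 0, 1, 0]))  # (26 - 21) / (26 - 10) = 0.3125
```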
The lower the loss, the closer the model is to the optimal recall curve,
indicating higher performance.
