Add multi hot sparse categorical crossentropy #163

Merged
roywei merged 6 commits into awslabs:dev on Sep 20, 2018

Conversation

@roywei commented on Aug 30, 2018

Summary

This PR adds a new feature to calculate categorical cross-entropy on multi-hot sparse labels.
Inputs are softmax predictions and true labels.
It should return the same loss as categorical cross-entropy.

Example:

Input:
Input data are random images of size (32, 32) in channels-first data format.
Labels:
Traditional labels have the shape (num_samples, num_classes), for example:
labels = [[0, 1, 1, ..., 0],
          [1, 1, 0, ..., 0],
          ...
          [0, 0, 0, ..., 1]]
where len(labels) = num_samples and len(labels[0]) = num_classes.
However, when num_classes is very large and the labels are very sparse,
we can represent them differently, for example:
There are 1000 classes in total, so label indices range from 0 to 999.
Each image can belong to at most 5 labels at the same time.
labels = [[1, 2],
          [0, 1],
          ...
          [999]]
where labels is a list of lists.
Special Note:
To deal with the varying lengths of the sparse labels, we pad them with negative values
so we can differentiate padding values from normal labels. They become:
padded_labels = pad_sequences(labels, value=-1)
padded_labels = [[-1, -1, -1, 1, 2],
                 [-1, -1, -1, 0, 1],
                 ...
                 [-1, -1, -1, -1, 999]]
The padded labels have shape (num_samples, 5), which still saves space compared to dense labels.
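
A small sketch of this padding step, assuming keras.preprocessing.sequence.pad_sequences and toy label values (the real example pads to length 5):

```python
from keras.preprocessing.sequence import pad_sequences

# Variable-length sparse labels: each row lists the classes present in one sample.
labels = [[1, 2],
          [0, 1],
          [999]]

# Pad with -1 so padding can never be confused with a real (non-negative) class index.
padded_labels = pad_sequences(labels, value=-1)
print(padded_labels)
# [[  1   2]
#  [  0   1]
#  [ -1 999]]
```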

Changes

  • Add implementation for loss and metrics
  • Add examples at examples/multi_hot_sparse_categorical_crossentropy.py
  • Add unit tests for loss and metric
  • Performance:
    ~3x faster when calculating the loss for 3000-class multi-label sparse labels (at most 5 labels per sample):
('categorical crossentropy loss time per epoch:', 0.7335219383239746)
('multi hot sparse categorical crossentropy loss time per epoch:', 0.23781204223632812)

PR Overview

  • This PR requires new unit tests [y/n] (make sure tests are included)
  • This PR requires to update the documentation [y/n] (make sure the docs are up-to-date)
  • This PR is backwards compatible [y/n]
  • This PR changes the current API [y/n]

@roywei changed the title from "add multi hot sparse categorical crossentropy" to "[WIP]add multi hot sparse categorical crossentropy" on Aug 30, 2018

@sandeep-krishnamurthy left a comment

Awesome work!

keepdims=True))
mx_output = mx.sym.clip(mx_output, a_min=epsilon(), a_max=1.0 - epsilon())
mx_output = mx.sym.concat(mx.sym.full((target.shape[0],1), 0.5), mx_output)
from mxnet.symbol.contrib import foreach

nit: move it.

mx_output = mx.sym.broadcast_div(mx_output, mx.sym.sum(mx_output,
axis=axis,
keepdims=True))
mx_output = mx.sym.clip(mx_output, a_min=epsilon(), a_max=1.0 - epsilon())

Nice work. Please document these steps so they will be easier to maintain later.
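
For reference, a hedged numpy analogue of the two quoted steps (eps stands in for Keras' epsilon(); the values are illustrative):

```python
import numpy as np

# Renormalize the scores along the class axis, then clip so a downstream log never
# sees exact 0 or 1. This mirrors the broadcast_div(output, sum(output)) + clip above.
eps = 1e-7
output = np.array([[2.0, 1.0, 1.0],
                   [1.0, 0.0, 3.0]])
output = output / output.sum(axis=-1, keepdims=True)
output = np.clip(output, eps, 1.0 - eps)
```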

@roywei changed the title from "[WIP]add multi hot sparse categorical crossentropy" to "Add multi hot sparse categorical crossentropy" on Sep 13, 2018

@kalyc left a comment

Looks good to me, added a few comments inline

break
sparse_label_j = np.where(dense_labels[i] == 1)[0]
sparse_labels.append(sparse_label_j)

[minor] remove blank line

@roywei (Author)

removed
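
For context, a hedged sketch of the dense-to-sparse conversion the quoted lines perform (toy data; the variable names are illustrative):

```python
import numpy as np

# Toy dense multi-hot labels: one row per sample, 1 marks a class that is present.
dense_labels = np.array([[0, 1, 1],
                         [1, 0, 0]])

# Collect the indices of the set classes per row, as in the quoted example script.
sparse_labels = [np.where(row == 1)[0] for row in dense_labels]
# -> [array([1, 2]), array([0])]
```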

metrics=['accuracy'])

# pad sparse labels to a fixed length with value -1 to differentiate padding from normal labels
y_train_pad = pad_sequences(sparse_labels, value=-1)

Why not add this padding logic inside the multi_hot_sparse_categorical_crossentropy calculation method?

@roywei (Author)

It has to be done at the numpy array level, before feeding to the model. Keras can't take an input (y_true) in the form of a list of lists where the list lengths vary (e.g. [[1,2],[0],[0,1,2]]).
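
To illustrate the point, a hedged sketch with toy values: the ragged lists are padded into a rectangular array before they are ever handed to model.fit:

```python
from keras.preprocessing.sequence import pad_sequences

# Ragged labels of varying length cannot be stacked into a single dense y_true tensor...
y_true = [[1, 2], [0], [0, 1, 2]]

# ...so they are padded to a rectangular numpy array first, with -1 marking padding.
y_true_pad = pad_sequences(y_true, value=-1)
# array([[-1,  1,  2],
#        [-1, -1,  0],
#        [ 0,  1,  2]], dtype=int32)
```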

# using control flow ops to iterate output and take target (true label)
_step = lambda data, _: (mx.sym.take(data[0], data[1]), [])
data = [mx_output, target.symbol]
outputs, _ = mx.symbol.contrib.foreach(_step, data, [])

This is an MXNet contrib API. Do we have an issue to track that this needs to be updated once a stable version of foreach is released?

@roywei (Author)

Currently we have CI to test for MXNet API changes (deprecations or breaking changes). It's the same for contrib ops as for other normal ops: if foreach is moved to mx.sym, the contrib version will give a deprecation warning, so we should be able to catch it. Created #173.
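
For reference, a rough NDArray-level sketch of what the contrib foreach in the quoted code does (assumes MXNet >= 1.3; the toy values are made up):

```python
import mxnet as mx

# foreach slices its inputs along axis 0 and applies the body to each slice, so for
# every sample we gather the predicted probabilities at that sample's label indices.
preds = mx.nd.array([[0.1, 0.7, 0.2],
                     [0.5, 0.3, 0.2]])   # (num_samples, num_classes) softmax outputs
labels = mx.nd.array([[1, 2],
                      [0, 1]])           # sparse labels, (num_samples, max_labels)

step = lambda data, states: (mx.nd.take(data[0], data[1]), states)
picked, _ = mx.nd.contrib.foreach(step, [preds, labels], [])
print(picked.asnumpy())                  # probabilities of each sample's true classes
```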

@@ -109,6 +109,22 @@ def test_sparse_categorical_crossentropy_4d():
assert np.isclose(expected_loss, np.mean(loss))


def test_multi_hot_sparse_categorical_crossentropy():

please add test for multi_hot_sparse_categorical_accuracy

@roywei (Author)

added, see changes at tests/keras/metrics_test.py

@sandeep-krishnamurthy left a comment

Thanks for your contributions. LGTM

@kalyc left a comment

LGTM, thanks for your contribution @roywei

@@ -3017,6 +3017,11 @@ def multi_hot_sparse_categorical_crossentropy(target, output, from_logits=False,
>>> y_true_np2 = np.array([[1, 2], [0, 2], [0]])
```
"""
# TODO: remove version check after mxnet 1.3.1 stable release

please create a tracking issue to remove this check

@roywei (Author)

tracked at #175
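
For context, a hypothetical sketch of the kind of version gate the TODO and #175 refer to (the exact check and message in the PR may differ):

```python
from distutils.version import LooseVersion
import mxnet as mx

# Hypothetical gate: only allow the contrib control-flow path on MXNet builds
# that ship it; the threshold here is an assumption based on the TODO.
if LooseVersion(mx.__version__) < LooseVersion('1.3.1'):
    raise NotImplementedError(
        'multi_hot_sparse_categorical_crossentropy requires MXNet >= 1.3.1')
```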

@roywei merged commit ae719f1 into awslabs:dev on Sep 20, 2018