Refactor `CompileLoss` to support nested `y_true` and `y_pred` #19879

Conversation
Very cool! Is there any way we can integrate the additional required step to call
I am wondering if we really need to tinker with the
So, 4.ii. is the only case we have to figure out how to make unambiguous. One easy solution would be to always require the
What do you think? Did I miss any scenarios?
Thanks for the PR!
```python
self.y_true_spec = self.SPEC_TENSOR
self.y_pred_spec = self.SPEC_TENSOR

def set_specs(self, y_true, y_pred):
```
The notion of having specs for `y_true` and `y_pred` feels overengineered to me.
It is difficult, or even impossible, to avoid this function if `y_true` or `y_pred` is nested instead of being a plain tensor. We have no information about the structure of `y_true` and `y_pred` when compiling the losses. This was previously resolved through complicated analysis and a very strong assumption that each loss takes single tensors as inputs. Obviously, the single-tensor input format is insufficient for many cases. (Personally, in an object detection task, I had to stack multiple `y_true` values into a single tensor to make `CompileLoss` work. It took me some time to work around it.)

This solution for nested input structures is the only one I could come up with. It feels overengineered, but it works. Additionally, the `build` part is much cleaner than before.
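The stacking workaround mentioned above can be sketched like this (a minimal NumPy illustration with hypothetical target names, not code from this PR):

```python
import numpy as np

# Hypothetical targets standing in for, e.g., class ids and objectness scores.
y1 = np.full((8, 1), 0.5)
y2 = np.full((8, 1), 0.25)

# Workaround: stack both targets into one tensor so CompileLoss
# only ever sees a single y_true ...
y_true = np.concatenate([y1, y2], axis=-1)  # shape (8, 2)

def my_stacked_loss(y_true, y_pred):
    # ... and split them apart again inside the custom loss.
    t1, t2 = y_true[..., 0:1], y_true[..., 1:2]
    return np.mean(np.abs(t1 - y_pred)) + np.mean(np.abs(t2 - y_pred))

loss = my_stacked_loss(y_true, np.zeros((8, 1)))  # 0.5 + 0.25 = 0.75
```

This works, but it forces every target to share one shape and pushes the packing logic into user code, which is exactly the friction the spec-based approach tries to remove.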
keras/src/trainers/compile_utils.py (Outdated)

```python
    f"Received: loss_weights length={len(flat_loss_weights)}, "
    f"loss length={len(flat_losses)}"
)
flat_y_true_specs, flat_y_pred_specs = self._get_specs(flat_losses)
```
I don't quite follow how this is used. Could it be simplified?
I have added some comments to clarify the entire procedure. Essentially, we need `y_true_spec` and `y_pred_spec` in a flat form because we have flattened the losses in that manner. In `call`, these specs are necessary to correctly pack `y_true` and `y_pred` before iterating through them with `self.flat_losses`.
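The packing described here can be illustrated with a small pure-Python sketch (the helpers `flatten` and `pack_per_spec` and the leaf-count specs are hypothetical stand-ins, not the actual `CompileLoss` internals):

```python
def flatten(structure):
    """Flatten an arbitrarily nested list/dict into a flat list of leaves."""
    if isinstance(structure, dict):
        return [x for k in sorted(structure) for x in flatten(structure[k])]
    if isinstance(structure, (list, tuple)):
        return [x for item in structure for x in flatten(item)]
    return [structure]

def pack_per_spec(flat_leaves, specs):
    """Group flat leaves so each loss receives the number of tensors
    its spec declares (1 for a plain tensor, more for nested inputs)."""
    packed, i = [], 0
    for n_leaves in specs:
        packed.append(flat_leaves[i : i + n_leaves])
        i += n_leaves
    return packed

# Two losses: the first expects a single tensor, the second a pair.
flat_losses = ["mse", "my_nested_loss"]
specs = [1, 2]  # hypothetical leaf counts per loss
y_true = {"a": "t0", "b": ["t1", "t2"]}
packed = pack_per_spec(flatten(y_true), specs)
# packed[0] -> ["t0"] for "mse"; packed[1] -> ["t1", "t2"] for the nested loss
```

Because the losses are stored flat, the specs are the only record of how many leaves of the nested `y_true`/`y_pred` belong to each loss.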
Hey @Darkdragon84, thanks for your input.
It is quite difficult for me. The root cause is that we have no information about the structure of
Yeah, this will make
Yeah, I understand, especially since the latest major release was very recent... Would it be possible to enable arbitrarily nested tensor inputs only if the losses and inputs are given as lists (i.e. conforming to the format that the metrics also use)? Perhaps we can also find an easier way to resolve the ambiguity without having to require list inputs.
If we found a way to disambiguate 4.ii., we could defer the call to
I tried to build this testing table, and we should pass all of them with the algorithm. All possible valid scenarios in tests:
We should be careful that both
There are also some cases that should be considered invalid, such as missing keys, but I haven't listed them above to simplify the table.
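One way to picture the kind of structure validation the table implies is a sketch like the following (a hypothetical helper; the real matching rules in Keras are more involved):

```python
def structures_match(loss, outputs):
    """Check that a nested `loss` structure is compatible with the
    nested structure of model outputs; unknown keys are rejected."""
    if isinstance(outputs, dict):
        return (
            isinstance(loss, dict)
            and set(loss) <= set(outputs)  # no keys absent from the outputs
            and all(structures_match(loss[k], outputs[k]) for k in loss)
        )
    if isinstance(outputs, (list, tuple)):
        # A single loss may apply to every output in a list.
        if not isinstance(loss, (list, tuple)):
            return True
        return len(loss) == len(outputs) and all(
            structures_match(l, o) for l, o in zip(loss, outputs)
        )
    return not isinstance(loss, (dict, list, tuple))

outputs = {"a": "out_a", "b": ["out_b1", "out_b2"]}
assert structures_match({"a": "mse"}, outputs)          # subset of keys: ok
assert structures_match({"b": ["mse", "mae"]}, outputs)  # matching list: ok
assert not structures_match({"c": "mse"}, outputs)       # unknown key: invalid
```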
I don't think we should make changes to the base
If we're going to enable nested structures for
This would be the right approach -- if we need to keep track of input specs, let's base it on the structure of the inputs received at the first call to the loss.
But how can we do this? (I'm going to separate the unrelated fix from this PR to concentrate the discussion here.)
So could the logic live entirely in
I have not thought about how to do it, but if the only way to make the feature work is to add a user-facing
@james77777778
I think this is not feasible. We should retain the original behavior.
Hey @Darkdragon84. I'm unsure what you mean by
I'm thinking of another way to support nested
A simple usage like this:

```python
import keras

def my_loss(y_true, y_pred):
    y1, y2 = y_true
    pred_sum = keras.ops.sum(y_pred)
    return keras.ops.abs(keras.ops.sum(y1) - pred_sum) + keras.ops.abs(
        keras.ops.sum(y2) - pred_sum
    )

x = keras.Input(shape=(3,), name="input_a")
output_a = keras.layers.Dense(1, name="output_a")(x)
output_b = keras.layers.Dense(1, name="output_b", activation="sigmoid")(x)
model = keras.Model(x, {"output": [output_a, output_b]})  # <- dict
model.compile(loss={"output": my_loss})  # <- dict

x = keras.random.uniform([8, 3])
y1 = keras.random.uniform([8, 1])
y2 = keras.random.randint((8, 1), 0, 2)
model.fit(x, {"output": [y1, y2]}, batch_size=2, epochs=1)  # <- dict
"""
Currently, this causes an error:
ValueError: In the dict argument `loss`, key 'output' does not correspond
to any model output. Received:
loss={'output': <function my_loss at 0x73e7325fe2a0>}
"""
```
Hi @james77777778. Can you please resolve the conflicts? Thank you!
This PR should be closed now. I don't have a concrete idea for
Fix #19855

I think the current implementation of `CompileLoss` restricts many possibilities for custom losses:

- It requires `y_true` and `y_pred` to be single tensors.
- It relies on the outputs of the `Functional` model instead of the actual number of pairs of (`y_true`, `y_pred`).

Additionally, there are many pitfalls that can cause `CompileLoss` to malfunction. #19855

So, I have refactored `CompileLoss` by introducing the concept of `y_true_spec` and `y_pred_spec`. The idea behind this is to utilize `tree` and the specs to automatically pack `y_true` and `y_pred` for `CompileLoss.build` and `CompileLoss.call`.

For backward compatibility, they default to a plain `"tensor"` structure. If someone needs a nested structure for `y_true` or `y_pred`, they can use the newly introduced `Loss.set_specs` to set `y_true_spec` and `y_pred_spec`.

Tests have been added to verify this new behavior:

- `test_custom_loss_with_list_y_true`: `y_true` as a list
- `test_custom_loss_with_nested_dict`: `y_true` as a dict with a list, and `y_pred` as a dict of dicts

Additionally, the missing `dtype` in all losses has been corrected.
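The backward-compatible defaulting described in the PR can be mimicked with a minimal pure-Python sketch (a hypothetical stand-in class, not the real `keras.losses.Loss`):

```python
class Loss:
    """Hypothetical stand-in illustrating the proposed spec defaulting."""

    SPEC_TENSOR = "tensor"

    def __init__(self):
        # Backward-compatible default: each input is a single plain tensor.
        self.y_true_spec = self.SPEC_TENSOR
        self.y_pred_spec = self.SPEC_TENSOR

    def set_specs(self, y_true, y_pred):
        # Record the nested structure this loss expects, e.g. a list or
        # dict template mirroring the shapes of y_true and y_pred.
        self.y_true_spec = y_true
        self.y_pred_spec = y_pred

loss = Loss()
assert loss.y_true_spec == "tensor"  # default: plain tensor

# Opt in to a nested y_true (a pair of tensors) while keeping y_pred flat.
loss.set_specs(["tensor", "tensor"], "tensor")
assert loss.y_true_spec == ["tensor", "tensor"]
```

Since the specs default to plain tensors, existing user code never has to call `set_specs`, which is how the PR preserves the original behavior.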