Use power-of-two scaling in autoscale scaled translation ops rules. #65
As shown in issue #60, propagating non power-of-two scaling factors can decrease training accuracy in low precision (typically FP16): the additional rescaling operations introduce non-negligible accumulated floating-point errors.
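For a quick illustration of the issue (my example, not from the PR): in FP16, rescaling by a power of two only shifts the exponent bits and is exact, while an arbitrary scale rounds the mantissa on every rescale:

```python
import numpy as np

x = np.float16(0.1)
# Power-of-two rescaling round-trips exactly.
print((x * np.float16(4.0)) / np.float16(4.0) == x)  # True: 4 == 2**2
# Non power-of-two rescaling rounds twice and loses bits.
print((x * np.float16(3.0)) / np.float16(3.0) == x)  # False
```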
This PR adds the option to round the scale to a power of two in scaled translations. Only rounding up and down are supported at the moment; the rounding mode can be modified in the `AutoScaleConfig` config dataclass. The scaled translations updated are: `dot_general`, `add`, `sub` and `reduce_sum`.
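As a rough sketch of what power-of-two rounding of a scale can look like (standalone NumPy; function names are hypothetical, not necessarily the PR's implementation):

```python
import numpy as np

def pow2_round_down(scale: np.ndarray) -> np.ndarray:
    # Largest power of two <= scale (assumes scale > 0).
    # frexp returns scale = mant * 2**exp with mant in [0.5, 1),
    # so 2**(exp - 1) rounds the scale down to a power of two.
    _, exp = np.frexp(scale)
    return np.ldexp(np.ones_like(scale), exp - 1)

def pow2_round_up(scale: np.ndarray) -> np.ndarray:
    # Smallest power of two >= scale (exact powers of two map to themselves).
    mant, exp = np.frexp(scale)
    exp = np.where(mant == 0.5, exp - 1, exp)
    return np.ldexp(np.ones_like(scale), exp)

s = np.float32(3.0)
print(pow2_round_down(s), pow2_round_up(s))  # 2.0 4.0
```

The rounding mode stored in `AutoScaleConfig` then selects which of the two directions is applied to the propagated scale.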
Finally, when implicitly converting scalars to scaled arrays, the method `make_scaled_scaled` now splits the input mantissa and exponent between data and scale.
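For intuition, a standalone sketch of such a mantissa/exponent split (assumed logic based on the description above, using `np.frexp`; the real method builds the library's scaled-array type from the pair):

```python
import numpy as np

def split_scalar(val: np.ndarray):
    # val = mantissa * 2**exponent, with mantissa in [0.5, 1).
    # The mantissa becomes the data, and the exponent becomes the scale,
    # which is therefore an exact power of two by construction.
    mantissa, exponent = np.frexp(val)
    scale = np.ldexp(np.ones_like(val), exponent)
    return mantissa, scale

data, scale = split_scalar(np.float16(3.0))
print(data, scale)  # 0.75 4.0  (0.75 * 4.0 == 3.0)
```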