
Use power-of-two scaling in autoscale scaled translation ops rules. #65

Merged

Conversation

@balancap (Contributor) commented on Jan 2, 2024

As shown in issue #60, propagating non-power-of-two scaling factors can decrease training accuracy in low precision (typically FP16): the additional rescaling operations introduce non-negligible accumulated floating-point errors.

This PR adds the option to round the scale to a power of two in scaled translations. Only rounding up and down are supported at the moment; the rounding mode can be modified in the config dataclass `AutoScaleConfig`.
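Because multiplying by an exact power of two only changes a float's exponent bits (no mantissa rounding, barring overflow or underflow), snapping scales to powers of two makes the rescaling operations exact. Below is a minimal sketch of the two supported rounding modes, assuming NumPy semantics; the helper names `pow2_round_down` / `pow2_round_up` are illustrative, not necessarily the library's own API:

```python
import numpy as np

def pow2_round_down(scale):
    # Largest power of two <= scale: 2**floor(log2(scale)).
    return np.exp2(np.floor(np.log2(scale)))

def pow2_round_up(scale):
    # Smallest power of two >= scale: 2**ceil(log2(scale)).
    return np.exp2(np.ceil(np.log2(scale)))

print(pow2_round_down(3.0))  # 2.0
print(pow2_round_up(3.0))    # 4.0
```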

The scaled translations updated are `dot_general`, `add`, `sub` and `reduce_sum`.
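To illustrate where the rounding slots into a scaled translation, here is a hypothetical `add` rule; the `ScaledArray` container and the max-based scale combination are assumptions made for this sketch, not the library's actual implementation:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ScaledArray:
    data: np.ndarray   # values, ideally kept near unit scale
    scale: np.ndarray  # scalar scale factor; true value = data * scale

def scaled_add(lhs: ScaledArray, rhs: ScaledArray) -> ScaledArray:
    # Combine the operand scales (max is one plausible choice), then snap
    # the output scale to a power of two so both rescalings are exact.
    out_scale = np.exp2(np.floor(np.log2(np.maximum(lhs.scale, rhs.scale))))
    data = lhs.data * (lhs.scale / out_scale) + rhs.data * (rhs.scale / out_scale)
    return ScaledArray(data, out_scale)

x = ScaledArray(np.array([1.0, -0.5]), np.float64(3.0))
y = ScaledArray(np.array([0.25, 1.0]), np.float64(1.0))
z = scaled_add(x, y)  # z.scale == 2.0, an exact power of two
```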

Finally, when implicitly converting scalars to scaled arrays, the method `make_scaled_scaled` now splits the input into its mantissa and exponent, assigning them to the data and the scale respectively.
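A sketch of such a mantissa/exponent split, assuming NumPy semantics; the function name below is hypothetical, and `np.frexp` guarantees the resulting scale is an exact power of two by construction:

```python
import numpy as np

def scalar_as_data_and_scale(value):
    # np.frexp returns (m, e) with value == m * 2**e and |m| in [0.5, 1),
    # so the scale 2**e is an exact power of two.
    mantissa, exponent = np.frexp(np.float32(value))
    return np.asarray(mantissa), np.exp2(exponent.astype(np.float32))

data, scale = scalar_as_data_and_scale(3.0)
print(data, scale)  # 0.75 4.0  (0.75 * 4.0 == 3.0)
```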

@balancap marked this pull request as draft on January 2, 2024 12:38
@balancap force-pushed the use-power-of-two-scaling-factors-in-unit-scaling-rules branch from a8d2bdd to da22330 on January 2, 2024 17:53
@balancap changed the title from "Use power of two scaling factors in autoscale ops rules" to "Use power-of-two scaling in autoscale scaled translation ops rules." on Jan 2, 2024
@balancap added the enhancement (New feature or request) and ops (Ops coverage) labels on Jan 2, 2024
@balancap self-assigned this on Jan 2, 2024
@balancap linked an issue on Jan 2, 2024 that may be closed by this pull request
@balancap force-pushed the use-power-of-two-scaling-factors-in-unit-scaling-rules branch from da22330 to eede38a on January 2, 2024 18:07
@balancap marked this pull request as ready for review on January 2, 2024 18:09
@balancap force-pushed the use-power-of-two-scaling-factors-in-unit-scaling-rules branch 4 times, most recently from df52131 to 8971b50, on January 3, 2024 12:08
@balancap force-pushed the use-power-of-two-scaling-factors-in-unit-scaling-rules branch from 8971b50 to 81b876c on January 3, 2024 12:18
@balancap merged commit 9f7391b into main on Jan 3, 2024 (2 checks passed)
@balancap deleted the use-power-of-two-scaling-factors-in-unit-scaling-rules branch on January 3, 2024 12:26
Labels: enhancement (New feature or request), ops (Ops coverage)
Projects: none yet
Development

Successfully merging this pull request may close the following issue: Implement power of 2 scaling in AutoScale
1 participant