Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: implement type-based alias analysis #70488

Open
andreybokhanko opened this issue Nov 21, 2024 · 5 comments
Open

cmd/compile: implement type-based alias analysis #70488

andreybokhanko opened this issue Nov 21, 2024 · 5 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@andreybokhanko
Copy link
Contributor

Go version

go version devel go1.24-3ca78afb3b Mon Nov 18 04:56:52 2024 +0000 linux/arm64

Output of go env in your module/workspace:

AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOARCH='arm64'
GOARM64='v8.0'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/home/abokhanko/.cache/go-build'
GODEBUG=''
GOENV='/home/abokhanko/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOGCCFLAGS='-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build625133068=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='arm64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/dev/null'
GOMODCACHE='/home/abokhanko/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/abokhanko/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/abokhanko/goroot'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/abokhanko/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/abokhanko/goroot/pkg/tool/linux_arm64'
GOVCS=''
GOVERSION='devel go1.24-3ca78afb3b Mon Nov 18 04:56:52 2024 +0000'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

Test case:

package test

func Foo(x **int, y *int) int {
  *y = 10
  *x = nil
  return *y + 20
}

Compiled with go build -a -gcflags='-d ssa/all/dump=Foo' test.go.

What did you see happen?

Read of *y memory location and thus, the subsequent addition of a constant are not eliminated:

$ cat Foo_51__trim.dump
...
    (+6) v17 = MOVDload <int> v8 v15 : R1
    (6) v19 = ADDconst <int> [20] v17 : R0
    (6) v20 = MakeResult <int,mem> v19 v15 : <>

What did you expect to see?

Read of *y eliminated; subsequent addition of two constant values (10, which is the value of *y and 20) folded.

@andreybokhanko andreybokhanko changed the title cmd/go: implement type-based alias analysis cmd/compile: implement type-based alias analysis Nov 21, 2024
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 21, 2024
@andreybokhanko
Copy link
Contributor Author

andreybokhanko commented Nov 21, 2024

The reason CSE or other SSA optimizations didn’t eliminated the read of *y is due to presence of a write to *x, that ssa.disjoint can’t disambiguate with *y.

However, this can be done based on type aliasing rules. My reading of the rules (https://pkg.go.dev/unsafe#Pointer) is that no non-pointer type (like int in case of *y) can alias with a pointer type (*int in case of *x).

@andreybokhanko
Copy link
Contributor Author

I implemented a patch that adds type-based alias analysis. Here it goes:

diff --git a/src/cmd/compile/internal/ssa/rewrite.go b/src/cmd/compile/internal/ssa/rewrite.go
index 5630bfd729..65f1b85172 100644
--- a/src/cmd/compile/internal/ssa/rewrite.go
+++ b/src/cmd/compile/internal/ssa/rewrite.go
@@ -863,11 +863,18 @@ func disjoint(p1 *Value, n1 int64, p2 *Value, n2 int64) bool {
                }
                return base, offset
        }
+
+        // Run types-based analysis
+        if disjointTypes(p1.Type, p2.Type) {
+          return true
+        }
+
        p1, off1 := baseAndOffset(p1)
        p2, off2 := baseAndOffset(p2)
        if isSamePtr(p1, p2) {
                return !overlap(off1, n1, off2, n2)
        }
+
        // p1 and p2 are not the same, so if they are both OpAddrs then
        // they point to different variables.
        // If one pointer is on the stack and the other is an argument
@@ -888,6 +895,48 @@ func disjoint(p1 *Value, n1 int64, p2 *Value, n2 int64) bool {
        return false
 }

+// disjointTypes reports whether a memory region pointed to by a pointer of type
+// t1 does not overlap with a memory region pointed to by a pointer of type t2 --
+// based on type aliasing rules.
+func disjointTypes(t1 *types.Type, t2 *types.Type) bool {
+  if t1 == t2 {
+    return false
+  }
+
+  // Unsafe pointer and uintptr can alias with anything.
+  if t1.IsUnsafePtr() || t2.IsUnsafePtr() || t1.IsUintptr() || t2.IsUintptr() {
+    return false
+  }
+
+  // Pointers and non-pointers are disjoint (https://pkg.go.dev/unsafe#Pointer).
+  // IsPtrShaped doesn't include interfaces -- even though they are represented as
+  // pointers. Include them.
+  if t1.IsPtrShaped() != t2.IsPtrShaped() && !t1.IsInterface() && !t2.IsInterface() {
+    return true
+  }
+
+  // For pointers, recursively go through pointer elements.
+  if t1.IsPtr() && t2.IsPtr() {
+    return disjointTypes(t1.Elem(), t2.Elem())
+  }
+
+  // For arrays and slices, get element type
+  if t1.IsArray() || t1.IsSlice() {
+    t1 = t1.Elem()
+  }
+
+  if t2.IsArray() || t2.IsSlice() {
+    t2 = t2.Elem()
+  }
+
+  // Either t1 or t2 can only alias with itself.
+  if (t1.IsNonaliasable() || t2.IsNonaliasable()) && t1.Kind() != t2.Kind() {
+    return true
+  }
+
+  return false
+}
+
 // moveSize returns the number of bytes an aligned MOV instruction moves.
 func moveSize(align int64, c *Config) int64 {
        switch {
diff --git a/src/cmd/compile/internal/types/type.go b/src/cmd/compile/internal/types/type.go
index 9d3dde8c13..d1ad2709d2 100644
--- a/src/cmd/compile/internal/types/type.go
+++ b/src/cmd/compile/internal/types/type.go
@@ -1400,6 +1400,16 @@ func (t *Type) HasNil() bool {
        return false
 }

+// IsNonaliasable reports whether a value of type t can only alias of other
+// values of the same type (exluding TUNSAFEPTR, that can alias with anything).
+func (t *Type) IsNonaliasable() bool {
+       switch t.kind {
+       case TNIL, TSTRING, TCHAN, TMAP:
+               return true
+       }
+       return false
+}
+
 func (t *Type) IsString() bool {
        return t.kind == TSTRING
 }

It kicks in and enables additional optimizations in a few spots in the standard library and a real-world program I tried (etcd). For example, it enables removal of c.p load from https://cs.opensource.google/go/go/+/refs/tags/go1.23.3:src/regexp/syntax/compile.go;l=164, as it is dominated by a store to c.p at https://cs.opensource.google/go/go/+/refs/tags/go1.23.3:src/regexp/syntax/compile.go;l=81 (compiler.inst gets inlined to compiler.init). Without type-based analysis, we can't understand that the store to c.p.NumCap at https://cs.opensource.google/go/go/+/refs/tags/go1.23.3:src/regexp/syntax/compile.go;l=82 doesn't alias with c.p. There are several other places like this both in the standard library and in etcd code.

However, this doesn't bring any tangible benefits to sweet benchmark:

                       │ base.results │         improved_1.results          │
                       │    sec/op    │    sec/op     vs base               │
BleveIndexBatch100-4      11.19 ±  1%    11.14 ±  3%       ~ (p=0.853 n=10)
ESBuildThreeJS-4          1.289 ±  1%    1.288 ±  0%       ~ (p=0.971 n=10)
ESBuildRomeTS-4          631.0m ±  1%   633.6m ±  1%       ~ (p=0.105 n=10)
EtcdPut-4                50.18m ±  1%   50.60m ±  2%       ~ (p=0.631 n=10)
EtcdSTM-4                290.4m ±  1%   291.1m ±  1%       ~ (p=0.481 n=10)
GoBuildKubelet-4          186.6 ±  3%    189.4 ±  3%       ~ (p=0.218 n=10)
GoBuildKubeletLink-4      19.39 ±  4%    19.40 ±  3%       ~ (p=0.853 n=10)
GoBuildIstioctl-4         140.0 ±  5%    142.0 ±  4%       ~ (p=0.280 n=10)
GoBuildIstioctlLink-4     12.62 ± 10%    12.62 ±  7%       ~ (p=0.853 n=10)
GoBuildFrontend-4         54.13 ±  9%    54.83 ±  7%       ~ (p=0.247 n=10)
GoBuildFrontendLink-4     2.387 ± 29%    2.348 ± 23%       ~ (p=0.684 n=10)
GopherLuaKNucleotide-4    38.48 ±  0%    38.28 ±  1%       ~ (p=0.247 n=10)
MarkdownRenderXHTML-4    284.7m ±  0%   285.1m ±  0%       ~ (p=0.143 n=10)
Tile38QueryLoad-4        1.052m ±  1%   1.040m ±  1%       ~ (p=0.052 n=10)
geomean                   2.728          2.732        +0.15%

@mundaym , @randall77 , it seems you are the primary authors of disjoint; may I ask you to kindly take a look at my patch and let me know if it makes sense to submit a CL with these results -- that is, there are cases of better code generated in the standard library / real-world applications, but no impact on sweet?

Andrey

@dmitshur dmitshur added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Nov 21, 2024
@dmitshur dmitshur added this to the Backlog milestone Nov 21, 2024
@randall77
Copy link
Contributor

You'll want to read through https://go-review.googlesource.com/c/go/+/551381 first.
TL;DR type-based alias analysis isn't safe in Go.

I think possibly your ptr vs nonptr rule might work, but not the rest.

@andreybokhanko
Copy link
Contributor Author

You'll want to read through https://go-review.googlesource.com/c/go/+/551381 first. TL;DR type-based alias analysis isn't safe in Go.

I think possibly your ptr vs nonptr rule might work, but not the rest.

@randall77 thanks for pointing at this CL; I wasn't aware of it!

I read through your comments; I believe they are already handled in my code. Essentially, what my patch does is what you proposed in your comment to CL551381:

Also, TBAA isn't completely pointless. Pointers like **byte and *byte are known not to alias based on their type, as there is no way to alias them using rule 1 described in the unsafe package (which admittedly depends on defining "equivalent memory layout" which is mentioned there but not really defined, but in any reasonable definition would imply that byte and *byte have different layouts). I'm not sure if those cases would help much in practice.

The only "extra" is IsNonaliasable function, that claims that types TNIL, TSTRING, TCHAN and TMAP can only alias with themselves.

I tested this patch on go build itself and on sweet benchmark -- it triggers several thousand times (unfortunately, with very little optimization effect -- perhaps due to lack of more advanced optimizations that require more advanced analysis to give a tangible effect -- a kind of chicken and egg problem) and there are no fails.

Andrey

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

5 participants