From 9922d3093eea9545aea04d6ba06443e37e57c1ed Mon Sep 17 00:00:00 2001
From: Alexey Bader <alexey.bader@intel.com>
Date: Thu, 19 Sep 2024 16:48:55 -0700
Subject: [PATCH] [NFC] Fix typos, markdown linter issues

Among other changes:
- removed trailing spaces
- fixed 80-char line limitations
---
 sycl/doc/design/CompilerAndRuntimeDesign.md |   4 +-
 sycl/doc/design/ThinLTO.md                  | 190 ++++++++++----------
 2 files changed, 98 insertions(+), 96 deletions(-)

diff --git a/sycl/doc/design/CompilerAndRuntimeDesign.md b/sycl/doc/design/CompilerAndRuntimeDesign.md
index e22492970d99e..7843b5d3b88e6 100644
--- a/sycl/doc/design/CompilerAndRuntimeDesign.md
+++ b/sycl/doc/design/CompilerAndRuntimeDesign.md
@@ -550,8 +550,8 @@ unit)
 - `off` - disables device code split. If `-fno-sycl-rdc` is specified, the behavior is
    the same as `per_source`
 
-If ThinLTO is enabled, device code splitting is run during the compilation stage.
-See [here](ThinLTO.md) for more information.
+If ThinLTO is enabled, device code splitting is run during the compilation
+stage. See the [ThinLTO design document](ThinLTO.md) for more information.
 
 ##### Symbol table generation
 
diff --git a/sycl/doc/design/ThinLTO.md b/sycl/doc/design/ThinLTO.md
index b1cf4d4082698..41a0a00786bdf 100644
--- a/sycl/doc/design/ThinLTO.md
+++ b/sycl/doc/design/ThinLTO.md
@@ -6,142 +6,144 @@ This document describes the purpose and design of ThinLTO for SYCL.
 
 ## Background
 
-With traditional SYCL device code linking, all user code is linked together 
-along with device libraries into a single huge module and then split and 
-processed by `sycl-post-link`. This requires sequential processing, has a large 
+With traditional SYCL device code linking, all user code is linked together
+along with device libraries into a single huge module and then split and
+processed by `sycl-post-link`. This requires sequential processing, has a large
 memory footprint, and differs from the linking flow for AMD and NVIDIA devices.
 
 ## Summary
-SYCL ThinLTO will hook into the existing community mechanism to run LTO as part 
-of device linking inside `clang-linker-wrapper`. We split the device images 
-early at compilation time, and at link time we use ThinLTO's function importing 
-feature
-to bring in the defintions for referenced functions. Only the new offload model
-is supported.
+
+SYCL ThinLTO will hook into the existing community mechanism to run LTO as part
+of device linking inside `clang-linker-wrapper`. We split the device images
+early at compilation time, and at link time we use ThinLTO's function importing
+feature to bring in the definitions for referenced functions. Only the new
+offload model is supported.
 
 ## Device code compilation time changes
-Most of the changes for ThinLTO occur during device link time, however there is 
-one major change during compilation (-c) time: we now run device code split 
-during compilaton instead of linking.
-The main reason for doing this is increased parallelization. Many compilation 
-jobs can be run at the same time, but linking happens once total for the 
-application. Device code split is currently a common source of performance 
-issues.
-
-Splitting early means that the resulting IR after splitting is not complete, it 
-still may contain calls to functions (user code and/or the SYCL device 
+
+Most of the changes for ThinLTO occur during device link time, but there is
+one major change during compilation (`-c`) time: we now run device code split
+during compilation instead of linking. The main reason for doing this is
+increased parallelization. Many compilation jobs can be run at the same time,
+but linking happens once total for the application. Device code split is
+currently a common source of performance issues.
+
+Splitting early means that the resulting IR after splitting is not complete; it
+may still contain calls to functions (user code and/or the SYCL device
 libraries) from other object files.
 
-We rely on the assumption that all function defintions matching a declaration 
+We rely on the assumption that all function definitions matching a declaration
 will be the same and we can let ThinLTO pull in any one.
 
-For example, let's start with user device code that defines a `SYCL_EXTERNAL` 
-function `foo` in translation unit `tu_foo`. There is also another translation 
-unit `tu_bar` that references `foo`.
-During the early device code splitting run of `tu_foo`, we may find that more 
-than one of the resultant device images contain a defintion for `foo`.
+For example, let's start with user device code that defines a `SYCL_EXTERNAL`
+function `foo` in translation unit `tu_foo`. There is also another translation
+unit `tu_bar` that references `foo`. During the early device code splitting run
+of `tu_foo`, we may find that more than one of the resultant device images
+contain a definition for `foo`.
 
-We assert that any function defintion for `foo` that is deemed a match by the 
-ThinLTO infrastruction during the processing of `tu_bar` is valid.
+We assert that any function definition for `foo` that is deemed a match by the
+ThinLTO infrastructure during the processing of `tu_bar` is valid.
 
-As a result of running early device code split, the fat object file generated 
-as part of device compilation may contain multiple device code images.
+As a result of running early device code split, the fat object file generated as
+part of device compilation may contain multiple device code images.
 
-# Device code link time changes
+## Device code link time changes
 
-Before we go into the link time changes for SYCL, let's understand the device 
+Before we go into the link time changes for SYCL, let's understand the device
 linking flow for community devices (AMD/NVIDIA):
 
 ![Community linking flow](images/ThinLTOCommunityFlow.svg)
 
-SYCL has two differenting requirements:
+SYCL has two differentiating requirements:
+
 1) The SPIR-V backend is not production ready and the SPIR-V translator is used.
-2) The SYCL runtime requires metadata (module properties and module symbol 
-table) computed from device images that will be stored along the device images 
+2) The SYCL runtime requires metadata (module properties and module symbol
+table) computed from device images and stored alongside the device images
 in the fat executable.
 
-The effect of requirement 1) is that instead of letting ThinLTO call the SPIR-V 
-backend, we add a callback that runs right before codegen would run.
-In that callback, we call the SPIR-V translator and store the resultant file 
-path for use later, and we instruct the ThinLTO framework to not
-perform codegen.
-
-An interesting additional fact about requirement 2) is that we actually need to 
-process fully linked module to accurate compute the module properties. One 
-example where we need the full module is to [compute the required devicelib mask](https://github.com/intel/llvm/blob/sycl/llvm/lib/SYCLLowerIR/SYCLDeviceLibReqMask.cpp).
-If we only process the device code that was included in the 
-original fat object input to `clang-linker-wrapper`, we will miss devicelib 
-calls in referenced `SYCL_EXTERNAL` functions.
-
-The effect of requirement 2) is that we store the fully linked device image for 
-metadata computation in the SYCL-specific handing code after the ThinLTO 
-framework has completed. Another option would be to try to compute the metadata 
-inside the ThinLTO framework callbacks, but this would require SYCL-specific 
+The effect of requirement 1) is that instead of letting ThinLTO call the SPIR-V
+backend, we add a callback that runs right before CodeGen would run. In that
+callback, we call the SPIR-V translator and store the resultant file path for
+use later, and we instruct the ThinLTO framework to not perform CodeGen.
+
+An interesting additional fact about requirement 2) is that we actually need to
+process a fully linked module to accurately compute the module properties. One
+example where we need the full module is to [compute the required devicelib
+mask](https://github.com/intel/llvm/blob/sycl/llvm/lib/SYCLLowerIR/SYCLDeviceLibReqMask.cpp).
+If we only process the device code that was included in the original fat object
+input to `clang-linker-wrapper`, we will miss devicelib calls in referenced
+`SYCL_EXTERNAL` functions.
+
+The effect of requirement 2) is that we store the fully linked device image for
+metadata computation in the SYCL-specific handling code after the ThinLTO
+framework has completed. Another option would be to try to compute the metadata
+inside the ThinLTO framework callbacks, but this would require SYCL-specific
 arguments to many caller functions in the stack and pollute community code.
 
 Here is the current ThinLTO flow for SYCL:
 
 ![SYCL linking flow](images/ThinLTOSYCLFlow.svg)
 
-We add a `PreCodeGenModuleHook` function to the `LTOConfig` object so that we 
+We add a `PreCodeGenModuleHook` function to the `LTOConfig` object so that we
 can process the fully linked module without running the backend.
 
 However, the flow is not ideal for many reasons:
-1) We are relying on the external `llvm-spirv` tool instead of the SPIR-V 
-backend. We could slightly improve this issue by using a library call to the 
-SPIR-V translator instead of the tool, however the library API requires setting 
-up an object to represent the arguments while we only have strings, and it's 
-non-trivial to parse the strings to figure out how to create the argument 
-object. Since we plan to use the SPIR-V backend in the long term, this does not 
+
+1) We are relying on the external `llvm-spirv` tool instead of the SPIR-V
+backend. We could slightly improve this issue by using a library call to the
+SPIR-V translator instead of the tool, but the library API requires setting
+up an object to represent the arguments while we only have strings, and it's
+non-trivial to parse the strings to figure out how to create the argument
+object. Since we plan to use the SPIR-V backend in the long term, this does not
 seem to be worth the effort.
 
-2) We manually run passes inside `PreCodeGenModuleHook`. This is because we 
-don't run codegen, so we can't take advantage of the `PreCodeGenPassesHook` 
-field of `LTOConfig` to run some custom passes, as those passes are only run 
-when we actually are going to run codegen.
+2) We manually run passes inside `PreCodeGenModuleHook`. This is because we
+don't run CodeGen, so we can't take advantage of the `PreCodeGenPassesHook`
+field of `LTOConfig` to run some custom passes, as those passes are only run
+when we actually are going to run CodeGen.
 
-3) We have to store the fully linked module. This is needed because we need a 
-fully linked module to accurately compute metadata, see the above explanation 
-of SYCL requirement 2). We could get around storing the module by computing the 
-metadata inside the LTO framework and storing it for late use by the SYCL 
-bundling code, but doing this would require even more SYCL-only customizations including 
-even more new function arguments and modifications of the `OffloadFile` class. 
-There are also compliations because the LTO framework is multithreaded, and not all 
-LLVM data structures are thread safe.
+3) We have to store the fully linked module. This is needed because we need a
+fully linked module to accurately compute metadata; see the above explanation of
+SYCL requirement 2). We could get around storing the module by computing the
+metadata inside the LTO framework and storing it for later use by the SYCL
+bundling code, but doing this would require even more SYCL-only customizations
+including even more new function arguments and modifications of the
+`OffloadFile` class. There are also complications because the LTO framework is
+multithreaded, and not all LLVM data structures are thread safe.
 
 The proposed long-term SYCL ThinLTO flow is as follows:
 
 ![SYCL SPIR-V backend linking flow](images/ThinLTOSYCLSPIRVBackendFlow.svg)
 
-The biggest difference here is that we are running codegen using the SPIR-V 
+The biggest difference here is that we are running CodeGen using the SPIR-V
 backend.
 
-Also, instead of using a lambda function in the `PreCodeGenModuleHook` 
-callback to run SYCL finalization passes, we can take advantage of the `PreCodeGenPassesHook` field to add 
-passes to the pass manager that the LTO framework will run.
-
-It is possible that the number of device images in the fat executable
-and which device image contains which kernel is different with ThinLTO
-enabled, but we do expect this to have any impact on correctness or
-performance, nor we do expect users to care.
+Also, instead of using a lambda function in the `PreCodeGenModuleHook` callback
+to run SYCL finalization passes, we can take advantage of the
+`PreCodeGenPassesHook` field to add passes to the pass manager that the LTO
+framework will run.
 
+It is possible that the number of device images in the fat executable and which
+device image contains which kernel is different with ThinLTO enabled, but we do
+not expect this to have any impact on correctness or performance, nor do we
+expect users to care.
 
-# Current limitations
+## Current limitations
 
-`-O0`: Compiling with `-O0` prevent clang from generating ThinLTO metadata 
-during the compilation phase. In the current implementation, this is an error. 
-In the final version, we could either silently fall back to full LTO or 
-generate ThinLTO metadata even for `-O0`.
+`-O0`: Compiling with `-O0` prevents clang from generating ThinLTO metadata
+during the compilation phase. In the current implementation, this is an error.
+In the final version, we could either silently fall back to full LTO or generate
+ThinLTO metadata even for `-O0`.
 
-SYCL libdevice: Current all `libdevice` functions are explicitly marked to be 
-weak symbols. The ThinLTO framework does not consider a defintion of function 
-with weak linkage as it cannot be sure that this definiton is the correct one. 
+SYCL libdevice: Currently all `libdevice` functions are explicitly marked to be
+weak symbols. The ThinLTO framework does not consider a definition of a function
+with weak linkage as it cannot be sure that this definition is the correct one.
 Ideally we could remove the weak symbol annotation.
 
-No binary linkage: The SPIR-V target does not currently have a production 
-quality binary linker. This means that we must generate a fully linked image as 
-part of device linkage. At least for AMD devices, this is not a requirement as 
-`lld` is used for the final link which can resolve any unresolved symbols. 
-`-fno-gpu-rdc` is default for AMD, so in that case it can call `lld` during 
-compile, but if `-fno-gpu-rdc` is passed, the lld call happens as part of 
-`clang-linker-wrapper` to resolve any symbols not resolved by ThinLTO.
\ No newline at end of file
+No binary linkage: The SPIR-V target does not currently have a production
+quality binary linker. This means that we must generate a fully linked image as
+part of device linkage. At least for AMD devices, this is not a requirement as
+`lld` is used for the final link which can resolve any unresolved symbols.
+`-fno-gpu-rdc` is the default for AMD, so in that case it can call `lld` during
+compile, but if `-fgpu-rdc` is passed, the `lld` call happens as part of
+`clang-linker-wrapper` to resolve any symbols not resolved by ThinLTO.