Java sources should be formatted according to the Google style guide, which can be imported into IntelliJ and Eclipse. C++ code should follow Google's C++ style guide.
For dependencies, we can use anything compliant with this list, but we want to keep the core libraries as dependency-free as possible.
To build all the artifacts locally, simply invoke the command mvn install at the root of this repository (or the Maven command of your choice).
If you're using JDK 16+, you need to add some exports for the formatter plugin:
--add-exports jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED
--add-exports jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED
--add-exports jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED
--add-exports jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED
--add-exports jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED
This can be done in .mvn/jvm.config or MAVEN_OPTS.
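As a minimal sketch of the first option, the following snippet appends the five exports listed above to .mvn/jvm.config. It assumes that it is run from the repository root and that appending to an existing file is acceptable in your setup.

```shell
# Assumption: the current directory is the root of the repository.
mkdir -p .mvn

# Append one export per line; the quoted 'EOF' prevents any shell expansion.
cat >> .mvn/jvm.config <<'EOF'
--add-exports jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED
--add-exports jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED
--add-exports jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED
--add-exports jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED
--add-exports jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED
EOF
```

For a one-off build, the same flags can instead be placed in the MAVEN_OPTS environment variable, separated by spaces.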
By default, the build will attempt to download the existing TensorFlow binaries from the web for the platform it is running on (so you need to have an active internet connection).
If such binaries are not available for your platform, you will need to build the TensorFlow runtime library from sources by appending the -Pnative-build argument to your Maven command. This requires a valid environment for building TensorFlow, including the Bazel build tool and a few Python dependencies (please read the TensorFlow documentation for more details). Note that building from sources can take multiple hours on a regular laptop.
To build for GPU, pass -Djavacpp.platform.extension=-gpu to Maven. If you want to use TensorFlow Java with unsupported GPUs, set the environment variable TF_CUDA_COMPUTE_CAPABILITIES, or configure it in a Bazel rc file (e.g. build --action_env TF_CUDA_COMPUTE_CAPABILITIES="6.1").
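The build variants described above can be combined on a single Maven command line. The sketch below assembles the arguments with a small helper; the function name tfjava_mvn_args and the NATIVE_BUILD and GPU switches are hypothetical and exist only for this illustration. Only the two Maven flags and the TF_CUDA_COMPUTE_CAPABILITIES variable come from the text.

```shell
# Hypothetical helper: print the Maven arguments for the requested build variant.
# NATIVE_BUILD=1 adds the profile that builds the TensorFlow runtime from sources.
# GPU=1 selects the GPU platform extension.
tfjava_mvn_args() {
  args="install"
  if [ "${NATIVE_BUILD:-0}" = "1" ]; then
    args="$args -Pnative-build"
  fi
  if [ "${GPU:-0}" = "1" ]; then
    args="$args -Djavacpp.platform.extension=-gpu"
  fi
  echo "$args"
}

# Example invocation (not executed here); the value 6.1 is the one used in the text.
# Assumption: the variable is only relevant when the native library is built from sources.
#   export TF_CUDA_COMPUTE_CAPABILITIES="6.1"
#   mvn $(NATIVE_BUILD=1 GPU=1 tfjava_mvn_args)
```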
Occasionally tests will fail with a message like:
Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.0:test(default-test)on project tensorflow-core-api:There are test failures.
Please refer to C:\mpicbg\workspace\tensorflow\java\tensorflow-core\tensorflow-core-api\target\surefire-reports for the individual test results.
Please refer to dump files(if any exist)[date]-jvmRun[N].dump,[date].dumpstream and[date]-jvmRun[N].dumpstream.
The forked VM terminated without properly saying goodbye.VM crash or System.exit called?
Command was cmd.exe/X/C"C:\Users\me\.jdks\adopt-openj9-1.8.0_275\jre\bin\java -jar C:\Users\me\AppData\Local\Temp\surefire236563113746082396\surefirebooter5751859365434514212.jar C:\Users\me\AppData\Local\Temp\surefire236563113746082396 2020-12-18T13-57-26_766-jvmRun1 surefire2445852067572510918tmp surefire_05950149004635894208tmp"
Error occurred in starting fork,check output in log
Process Exit Code:-1
Crashed tests:
org.tensorflow.TensorFlowTest
org.apache.maven.surefire.booter.SurefireBooterForkException:The forked VM terminated without properly saying goodbye.VM crash or System.exit called?
Command was cmd.exe/X/C"C:\Users\me\.jdks\adopt-openj9-1.8.0_275\jre\bin\java -jar C:\Users\me\AppData\Local\Temp\surefire236563113746082396\surefirebooter5751859365434514212.jar C:\Users\me\AppData\Local\Temp\surefire236563113746082396 2020-12-18T13-57-26_766-jvmRun1 surefire2445852067572510918tmp surefire_05950149004635894208tmp"
Error occurred in starting fork,check output in log
Process Exit Code:-1
Crashed tests:
org.tensorflow.TensorFlowTest
at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:671)
at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:533)
at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:278)
at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:244)
This happens when the native code crashes (e.g. because of a segfault). The crash should have created a dump file somewhere in the project, which you can use to find out what caused the issue.
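One way to locate such files is to search the project tree by name. The helper below is a sketch, and find_crash_dumps is a hypothetical name. The .dump and .dumpstream patterns come from the Surefire message above, while hs_err_pid*.log is the fatal error log that HotSpot JVMs write by default. Other JVMs use different file names, so treat the pattern list as an assumption to adjust.

```shell
# Hypothetical helper: list candidate crash dumps below a directory
# (defaults to the current directory).
find_crash_dumps() {
  find "${1:-.}" -type f \( \
    -name '*.dump' -o \
    -name '*.dumpstream' -o \
    -name 'hs_err_pid*.log' \
  \)
}
```

Running find_crash_dumps from the root of the repository prints one matching path per line.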
To upgrade the version of TensorFlow that is embedded within TensorFlow Java, please follow these steps carefully.
- Download the archive of the TensorFlow release locally from https://github.com/tensorflow/tensorflow/archive/refs/tags/vX.X.X.tar.gz
- Compute the SHA sum using the shell command shasum -a 256 <tensorflow-x.x.x.tar.gz>
- Update the urls, sha256 and strip_prefix fields of the org_tensorflow archive rule in the Bazel workspace
- Extract the archive in a temporary folder
- Copy the content of the tensorflow-x.x.x/.bazelrc file to tensorflow-core/tensorflow-core-native/tensorflow.bazelrc under the TensorFlow Java source tree
- Copy the content of tensorflow-x.x.x/WORKSPACE after the "###### Copy content of..." notice in tensorflow-core/tensorflow-core-native/WORKSPACE; read the notice for more details
- Copy the content of the tensorflow-x.x.x/.bazelversion file to tensorflow-core/tensorflow-core-native/.bazelversion
- Validate that the options in tensorflow-core/tensorflow-core-native/.bazelrc are still accurate, or update them accordingly
- Update the URLs of existing TensorFlow binaries in the tensorflow-core/tensorflow-core-native/scripts/dist_download script
- Update the URLs of TensorFlow-Text binaries used for testing in the tensorflow-core/tensorflow-core-api/scripts/test_download script
In order to build the TensorFlow native library to work with TensorFlow Java, we sometimes need to apply some patches to the TensorFlow sources. These patches are found in tensorflow-core/tensorflow-core-native/external.
- If you have an error like "Error in fail: Error applying patch //external:xxx.patch:", verify why the patch is failing by looking at the TensorFlow source code. Chances are that this code has changed and the patch needs to be updated.
- To create a new patch or update one, you can make a copy of the TensorFlow source file to change, make your change and generate a patch using git diff <file> <file-updated>
- If more than one file needs to be added to the patch, it's easier to clone the TensorFlow repository, apply the changes and use git diff at the root of the tree
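For the single-file case, the comparison can be wrapped in a small helper. This is a sketch only: make_patch is a hypothetical name, and the --no-index option is used here as an assumption so that git compares the two files directly, even when they are not tracked in a repository.

```shell
# Hypothetical helper: print a patch that turns file $1 into file $2.
make_patch() {
  status=0
  git diff --no-index -- "$1" "$2" || status=$?
  # With --no-index, git diff exits with 1 when the files differ,
  # which is the expected case here; anything above 1 is a real error.
  [ "$status" -le 1 ]
}
```

The paths in the header of a patch produced this way reflect the two files that were compared, so they may need to be edited to match the location of the file in the TensorFlow tree.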
After upgrading the TensorFlow library, you need to regenerate all Java bindings that depend on the native code. That includes Java protos, C API bindings (JavaCPP) and operator classes. You can trigger the regeneration of these bindings with the Maven command mvn clean install -Pgenerating.
This will also trigger a small Bazel build of the TensorFlow sources to regenerate the Java protos, so make sure your environment is set up properly.
When generating the operator classes, the build process might prompt you to provide information about the new operations found in the targeted TensorFlow version. This will generate a new API definition under the tensorflow-core/tensorflow-core-api/api folder. The required information is:
- The visibility for this op
  - VISIBLE to force the creation of a Java class that will also be exposed by the *Ops API classes.
  - HIDDEN for creating a Java class that won't be exposed by the *Ops API classes.
  - SKIP for not creating a Java class for this operation
  - DEFAULT to rely on the visibility settings set in TensorFlow sources
- The name group for this operator
  - This name is used to place this operator under the right subpackage and *Ops API.
  - For example, the group nn will place the operator Conv under the org.tensorflow.op.nn package and in the NnOps API class.
  - When no group is specified, the operator will go under the org.tensorflow.op.core package and in the Ops API class.
- The name for this op
  - By default, this is the name found in the TensorFlow registry, but it can be useful to rename it when it clashes with Java keywords (e.g. Switch -> SwitchCond)
  - Can also be used to remove the suffix of an operation that has multiple versions (e.g. RestoreV2 -> Restore)
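The effect of the group name can be illustrated with a small helper that prints the package and API class an operator would land in. This is only an illustration of the rule stated above for single-word groups such as nn; op_location is a hypothetical name, and the actual generator may treat other group names differently.

```shell
# Hypothetical helper: print "<package> <API class>" for a single-word group name.
# An empty group falls back to the core package and the Ops class.
op_location() {
  group="${1:-}"
  if [ -z "$group" ]; then
    echo "org.tensorflow.op.core Ops"
    return
  fi
  # Capitalize the first letter of the group to build the API class name.
  first=$(printf '%s' "$group" | cut -c1 | tr '[:lower:]' '[:upper:]')
  rest=$(printf '%s' "$group" | cut -c2-)
  echo "org.tensorflow.op.$group ${first}${rest}Ops"
}
```

For the group nn, the helper prints org.tensorflow.op.nn followed by NnOps, matching the example in the list.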
The actual classification process is a bit arbitrary and based on the good judgement of the developer. The reason is that most ops in Python are being wrapped by a higher-level API and therefore are left unclassified, while in Java they are exposed and can be used directly by the users.
Please review the location of the newly generated operators after the build is complete, and manually adjust the API definition protos if some of them seem to be in the "wrong" place. Repeat this process until you are satisfied with the result.
Some operations might be just an upgrade of another existing operation. For instance, there are many versions of the BatchMatMul kernel (V1, V2, V3...).
When you see that a new op is just an upgrade of another one, make sure that the latest version has a valid endpoint and that all previous versions of this operation are marked as VISIBILITY: SKIP.
TensorFlow Java distributes a large number of proto definitions found in the TensorFlow native library as Java classes. Again, new protos might not be classified properly since they may be lacking the option java_* statements at the beginning of their definition. The build script will attempt to mitigate this omission by generating the proto bindings under the same package as the package statement (if also present), and under the root package org.tensorflow.proto.
Code generation for Ops and related classes is done during tensorflow-core-api's compile phase, using the annotation processor in tensorflow-core-generator. If you change or add any operator classes (annotated with org.tensorflow.op.annotation.Operator), endpoint methods (annotated with org.tensorflow.op.annotation.Endpoint), or change the annotation processor, be sure to re-run mvn clean install -Pgenerating in tensorflow-core-api.
In some cases, an op supported by TensorFlow Java will not have a gradient defined, resulting in errors like this:
org.tensorflow.exceptions.TensorFlowException: No gradient defined for op: ReadVariableOp. Please see https://www.tensorflow.org/code/tensorflow/cc/gradients/README.md for instructions on how to add C++ gradients.
at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:101)
at org.tensorflow.Graph.addGradients(Graph.java:708)
at org.tensorflow.Graph.addGradients(Graph.java:291)
The instructions in the linked file are accurate for adding C++ Graph gradients, which are used by our Graph. Examples of doing that are tensorflow/tensorflow#46115 and tensorflow/tensorflow#47774.
You can also code and register the missing gradients in Java, using the TensorFlow Java custom gradient registration capabilities. Check the JavaDoc of tensorflow-core-api for more details.