Thanks for considering to contribute! To get you started, here is a bit of information about the structure of this implementation.
You will need to sign the Oracle Contributor Agreement for us to be able to merge your work.
Please also take some time to review our code of conduct for contributors.
The first thing you want to do is to set up
mx. This is the build tool we use to
develop GraalVM languages. You also need LLVM 6, including the opt
tool -- the
latter is not included on macOS by default, here you can install the homebrew
version of LLVM, which includes this tool. Note that you can use any JDK, and do
not necessarily need GraalVM for development. In that case you'll only be able
to run without the just-in-time compiler, but that can be fine for making and
testing changes that are not performance sensitive.
Once you have mx
on your PATH
, you can run mx build
in this
repository. This will initially download the required dependencies next to the
repository and build Python. If it succeeds without errors, you should already
be able to run mx python
and get a REPL.
If you just want to copy and paste some commands, these should get you started:
$ git clone https://github.com/graalvm/mx.git
$ git clone https://github.com/graalvm/graalpython.git
$ cd graalpython
$ ../mx/mx build
$ ../mx/mx python -c "print(42)"
For development, we recommend running mx ideinit
next. This will generate
configurations for Eclipse, IntelliJ, and Netbeans so that you can open the
projects in these IDEs. If you use another editor with support for the Eclipse
language server we also had reports
of useable development setups with that, but it's not something we support.
Besides the source code of the Python interpreter, we have some useful mx
functions defined under the mx.graalpython
directory. As you make changes, you
can test always test them with mx build && mx python
. Additionally, there are
various "gates" that we use on our CI systems to check any code that goes
in. You can run all of these using mx python-gate
or just some by using mx python-gate --tags [TAG]
. Two interesting gates to run that cover most things
are:
python-unittest
- Run the unittests written in Python, including those for the C extension APIpython-license
- Check that all files have the correct copyright headers applied to them
For the most part, built-in modules and classes are implemented in the
com.oracle.graal.python.builtins
package. For each module or class, there's
Java class annoted with @CoreFunctions
. Each function in a module or a class
is implemented in a Node annotated with @Builtin
. Take a look at the existing
implementations to get a feel for how this is done. For now, when adding new
classes or modules, they need to be added to the list in
com.oracle.graal.python.builtins.Python3Core
.
Some built-in functions, modules, and classes are implemented in pure Python. The
files for this are in graalpython/lib-graalpython
. These files are listed in
the Java com.oracle.graal.python.builtins.Python3Core
class. Take a look at
these files to see what they do. If a file is called exactly as a built-in
module is, it is executed in the context of that module during startup, so some
of our modules are implemented both in Java and Python. If the name matches no
existing module, the file is executed just for the side-effects.
When implementing a new (or fixing an existing) built-in, take a look at the
CPython source. The layout and naming of modules and types is kept similar to
the CPython source so it should be relatively easy to find the right piece of
code. For some special dunder methods (__add__
, __getitem__
,
__getattribute__
, ...) you may have to figure out the C API slot names for
them to find the right piece of code (nb_add
, sq_item
, tp_getattr
, ...).
You will find that often there are specific C API methods that are called to
convert or coerce arguments, to look up methods either starting on the object or
only on the class, to call a callable object or invoke a method, and more. In
general, most of these methods should have equivalents in our
PythonObjectLibrary
. See the
IMPLEMENTATION_DETAILS.md file for details on
that library. If something is missing that is commonly used, we probably have
some Node for it, but it may be a good idea to add it to the
PythonObjectLibrary
for easier discovery.
GraalPython has its own variant of the argument clinic preprocessor. It is
activated by: extending PythonXXXClinicBuiltinNode
(e.g.,
PythonBinaryClinicBuiltinNode
), using @ArgumentClinic
annotations
on the builtin node class, and overriding the getArgumentClinic
method
to return the class that will be generated from the annotations (it will be
named the same as the node class plus ClinicProviderGen
suffix).
Sometimes, you will not easily find what exactly happens for a given piece of
code when that involves more than just a simple built-in call. The dis
module
on CPython can often help get an angle on what a particular piece of code is
doing. You can call dis.dis
on any Python function and it will print you
details of the bytecode and associated data, which can be a good starting point
to browse through the CPython source.
The C implementation and headers for our C API are in
graalpython/com.oracle.graal.python.cext
. The naming is analogous to C
Python's source names. This folder also includes a modules
folder for built-in
modules that we have adapted from C Python.
The GraalVM implementation of Python provides proper debug options. It is possible to either debug the Python code, using Chrome debugger,
or the java code, using your preferred IDE.
The following commands should be executed in a virtualenv environment, which provides a graalpython executable.
For debug Python side code call this:
graalpython --inspect your_script.py
This will open a debug server, which can be accessed in Chrome Browser under URL chrome://inspect
.
For debugging java implemented code execute:
graalpython --experimental-options -debug-java your_script.py
The command will also start a debug server, which can be used in an IDE. If the IDE was initialized properly
by using the command mentioned above, the existing GraalDebug
run configuration can be used to debug.
Here are some advanced commands to debug test failures and fix issues.
First, we have three sets of unittests in the base repository:
- Our own Python-bases unittests
- JUnit tests
- Python's standard library tests
To run the first, you can use this command:
mx python-gate --tags python-unittest
If some of the tests fail, you can re-run just a single test like this,
substituting TEST-PATTERN (and possibly the file glob on the third line) with
the test you want to run. Note that you can insert -d
to debug on the Java
level or use --inspect
to debug in the Chrome debugger.
mx [-d] graalpytest [--inspect] test_*.py \
-k TEST-PATTERN
To run the JUnit tests, you can use this command:
mx python-gate --tags python-junit
To run a subset of the tests, you can use the following. Again, you can use -d
to attach with a Java debugger.
mx [-d] punittest JAVA-TEST-CLASSNAME
To run the Python standard library tests, you can use the following:
mx python-gate --tags python-tagged-unittest
Note that we use "tag files", small .txt
files that select which tests to run,
so we only run tests that we know should pass. To run a subset of those tests,
use the following command. However, the way we run those tests is by spawning a
sub-process for every stdlib tests, to avoid interference while our
implementation isn't quite ready, so you have to put the flags somewhere else to
debug. You can see -debug-java
and --inspect
below, to debug in Java
debugger or Chromium, respectively.
mx python-run-cpython-unittest [-debug-java] [--inspect] NAME-OF-CPYTHON-UNITTEST
A tag file can be regenerated with
mx python-retag-unittests NAME-OF-CPYTHON-UNITTEST
There's also multiple other gates that may fail with changes. One of these is
our style gate, which checks formatting rules and copyrights. To auto-fix most
issues, run the following commands. Anything that's reported as error after this
command you have to fix manually. Note that to really match what's in the gate,
you have to set the JDT
environment variable to the path to an Eclipse
compiler Jar file, and the ECLIPSE_EXE
environment variable to the path of an
eclipse executable.
mx python-style --fix
mx python-gate --tags style
Another important gate is the gate that checks if you broke the native image
building. To test if building a native image still works, you can use the
following command. This will create a native executable called graalpython
and
print its path as the last output, if successful.
mx python-svm
If you made changes to the parser, you may have to regenerate the golden files like so:
find graalpython -name *.scope -delete
find graalpython -name *.tast -delete
mx punittest com.oracle.graal.python.test.parser
We use the mx
facilities for benchmarking. Use this to list the available
Python suites and VM configurations:
mx benchmark --list
If you just want to run a single benchmark from, for example, the meso
suite,
you can use this:
mx benchmark meso --list
Then if you want to run something, use (for example):
mx benchmark meso:nbody3
To select which Python VM you want to use, you can pass the arguments separated
by --
:
mx benchmark meso:nbody3 -- --python-vm=cpython
For additional arguments to the Python launcher, you can separate them by another double-dash:
mx benchmark meso:nbody3 -- --python-vm=graalpython -- --python.EmulateJython -Dgraal.Dump= -Dgraal.MethodFilter=*measure*
Note that there may be a little confusion about the configuration names of benchmarks.
We have benchmarks for GraalVM Community and Enterprise. For historical reasons, these are sometimes referred to in some config files as CE and EE; core and enterprise; graalvm_ce and graalvm_ee; or graalpython_core and graalpython_enterprise, respectively.
There are also different options for how the Python interpreter is run, passed
via the --python-vm-config
parameter:
default
- run using the standard optionsdefault-multi
- run using a shared engine, which is the mode that is recommended to embedders that want to spawn multiple isolated Python contextsnative
- same asdefault
, its name is due to the fact that it runs C extensions using a mixture of LLVM bitcode interpreted and compiled via GraalVM and real native librariessandboxed
- this name is historical - this configuration requires a GraalVM Enterprise and runs all C extensions purely as LLVM bitcode on the GraalVM, without any access to the native OS libraries, i.e., using the--llvm.managed
option for GraalVM.
Finally, there are the --jvm
and --jvm-config
configuration options for mx benchmark
. By default, the commands presented above will run on the JVM in
server mode, using the Graal compiler in what we call hosted mode. This is
almost the same but not quite the --jvm
mode you will get when running the
graalpython
executable from a full GraalVM, and usually good enough if you
want to look at the compiler graphs or peak performance numbers. In our CI,
however, we always build a full GraalVM and benchmark using that, since that is
what we ship. There, we have two different configurations corresponding to the
launcher flags available for the GraalVM graalpython
executable: jvm and
native. The first runs on top of Hotspot using the Graal compiler, the second
runs the AOT compiled GraalVM native image of Python.
Building a GraalVM Python configuration can be achieved for the CE version like so:
mx --env ../../graal/vm/mx.vm/ce --exclude-components=slgm --dynamicimports /vm graalvm-show
mx --env ../../graal/vm/mx.vm/ce --exclude-components=slgm --dynamicimports /vm build
The first command will print some information about the GraalVM configuration
that is about to be built, and the second will build it. IMPORTANT: The
first command should tell you that the Config name
is ce_python
, otherwise
the next commands will not work.
To run the JVM configuration:
mx --env ../../graal/vm/mx.vm/ce --exclude-components=slgm --dynamicimports /vm benchmark meso:nbody3 -- --python-vm=graalpython --jvm=graalvm-ce-python --jvm-config=jvm --python-vm-config=default --
To run the Native Image configuration:
mx --env ../../graal/vm/mx.vm/ce --exclude-components=slgm --dynamicimports /vm benchmark meso:nbody3 -- --python-vm=graalpython --jvm=graalvm-ce-python --jvm-config=native --python-vm-config=default --
For best performance we keep references to long-lived user objects (mostly functions, classes, and modules) directly in the AST nodes when using the default configuration of a single Python context (as is used when running the launcher). For better sharing of warmup and where absolutely best peak performance is not needed, contexts can be configured with a shared engine and the ASTs will be shared across contexts. However, that implies we must not store any user objects strongly in the ASTs. We test that we have no PythonObjects alive after a Context is closed that are run as part of our JUnit tests. These can be run by themselves, for example, like so:
$ mx python-leak-test --lang python --shared-engine --code 'import site, json' --forbidden-class com.oracle.graal.python.builtins.objects.object.PythonObject --keep-dump
The --keep-dump
option will print the heapdump location and leave the file
there rather than deleting it. It can then be opened for example with VisualVM
to check for the paths of any leaked object, if there are any.