This is a list of notable new features, or any changes which could potentially break or change the behavior of existing setups.
This is intentionally kept short. For a full change log, just see the Git log.
2021-03-18: Subnetwork sub layer can be independent (#473)
This has an effect on recurrent subnetworks: in the optimization phase, individual sub layers can now be moved out of the loop. This is crucial to allow for an easy use of nested subnetworks. Nested subnetworks are important to allow for generic building blocks such as in the returnn_common recipes. This was a larger internal change in RETURNN, which may also simplify other code in RETURNN, such as losses in subnetworks.
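For illustration, a minimal sketch of a nested subnetwork inside a recurrent loop (the layer names and sizes here are just examples, not from the PR):

```python
# Sketch: a nested subnetwork as a generic building block inside a rec loop.
# If "block" does not depend on recurrent state ("prev:..."), its sub layers
# can now be optimized out of the loop individually.
network = {
    "loop": {
        "class": "rec", "from": "data",
        "unit": {
            "block": {
                "class": "subnetwork", "from": "data:source",
                "subnetwork": {
                    "ff": {"class": "linear", "activation": "relu", "n_out": 128, "from": "data"},
                    "output": {"class": "copy", "from": "ff"},
                },
            },
            "output": {"class": "copy", "from": "block"},
        },
    },
    "output": {"class": "softmax", "from": "loop", "loss": "ce"},
}
```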
The extended batch information (BatchInfo, attached to Data) contains information about merged or packed dimensions in the batch dimension, such as a beam (from beam search), fixed dimensions or variable-length dimensions. This affects how beam-search information is kept, and it affects FlattenBatchLayer, SplitBatchTimeLayer, MergeDimsLayer (on the batch dim) and related layers.
We use literal Python format as serialization format in many places, e.g. in OggZipDataset. The idea was that Python should be very fast at parsing Python code (e.g. via eval or ast.literal_eval). Unfortunately, it turned out that Python is not very fast at this (specifically at parsing literal Python, a subset of Python), and e.g. JSON parsing is much faster. We now have native code to parse literal Python, which is much faster than before, and it is already used in OggZipDataset. Everything should work as before, just faster. Note that for the future, it is probably a better idea to use JSON or some binary format for serialization.
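The speed difference can be seen with a small stand-alone comparison (this is not RETURNN code, just an illustrative sketch using the standard library):

```python
import ast
import json
import timeit

# Parse the same nested structure once as literal Python and once as JSON.
data = [[i, float(i), "word-%i" % i] for i in range(10000)]
py_text = repr(data)
json_text = json.dumps(data)

print("ast.literal_eval:", timeit.timeit(lambda: ast.literal_eval(py_text), number=10))
print("json.loads:      ", timeit.timeit(lambda: json.loads(json_text), number=10))
```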
2021-03-03: Simplified logging usage
2021-03-01: External module import with import_ (#436)
Together with this mechanism, some common recipes are being developed in rwth-i6/returnn_common.
2021-02-27: SentencePieces vocabulary class for SentencePiece
This can use BPE, but also potentially better alternatives such as unigram-language-model-based subword units. It can also do stochastic subword sampling for training.
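A hedged sketch of how this might look as a target vocabulary in a dataset config; apart from "class" and "model_file", the option names (enable_sampling, alpha) follow the sentencepiece Python API and are assumptions here, so check the SentencePieces class for the actual interface:

```python
# Sketch only: the sampling option names are assumptions following the
# sentencepiece Python API; check the RETURNN SentencePieces class.
train = {
    "class": "OggZipDataset",
    "path": "train.zip",
    "audio": {"features": "mfcc"},
    "targets": {
        "class": "SentencePieces",
        "model_file": "spm.model",   # trained SentencePiece model (BPE or unigram LM)
        "enable_sampling": True,     # stochastic subword sampling for training (assumed option)
        "alpha": 0.1,                # sampling smoothing parameter (assumed option)
    },
}
```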
We did not change the defaults. However, we observed that the defaults don't make sense. So if you have used batch_norm with the defaults before, you likely want to redo any such experiments. See the batch norm documentation for reasonable defaults. Especially, you want to set momentum to a small number, like 0.1, and you probably want update_sample_only_in_training=True and delay_sample_update=True.
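A minimal sketch of such settings, assuming they are passed directly to a batch norm layer (the surrounding network is just an example):

```python
# Sketch: non-default batch norm settings as suggested above.
network = {
    "bn": {
        "class": "batch_norm", "from": "data",
        "momentum": 0.1,
        "update_sample_only_in_training": True,
        "delay_sample_update": True,
    },
    "output": {"class": "softmax", "from": "bn", "loss": "ce"},
}
```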
2020-11-06: PyTorch-to-RETURNN project
2020-08-03: New code structure (discussion)
TFEngine (or returnn.TFEngine) becomes returnn.tf.engine, etc.
2020-06-30: New generic training pipeline / extended custom pretraining (discussion)
Define def get_network(epoch: int, **kwargs): ... in your config, as an alternative to pretrain with custom construction_algo and network. Otherwise this is pretty similar in behavior (with all similar features, such as #config overwrites, dataset overwrites, etc.), but it is not treated as "pretraining" and is used always.
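A minimal sketch of this interface, with a hypothetical scheme that grows the network with the epoch:

```python
def get_network(epoch: int, **kwargs):
    """Return the network dict to be used for the given epoch (sketch only)."""
    num_layers = min(6, 2 + epoch // 5)  # hypothetical growing schedule
    net = {}
    src = "data"
    for i in range(num_layers):
        net["layer%i" % i] = {"class": "linear", "activation": "relu", "n_out": 512, "from": src}
        src = "layer%i" % i
    net["output"] = {"class": "softmax", "from": src, "loss": "ce"}
    return net
```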
2020-06-12: TensorFlow 2 support (discussion)
Configs basically should "just work". We recommend that everyone use TF2 now.
2020-06-10: Distributed TensorFlow support (discussion, wiki)
2020-06-05: New TF dataset pipeline via tf.dataset (discussion)
Define def dataset_pipeline(context: InputContext) -> tf.data.Dataset in your config. See returnn.tf.data_pipeline.
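A rough sketch of such a function; the InputContext method used here (get_returnn_dataset) is an assumption, so check returnn.tf.data_pipeline for the actual interface:

```python
def dataset_pipeline(context):
    """
    :param InputContext context:
    :rtype: tensorflow.data.Dataset
    """
    # Assumed helper wrapping the RETURNN dataset as a tf.data.Dataset;
    # see returnn.tf.data_pipeline for the real InputContext interface.
    dataset = context.get_returnn_dataset()
    dataset = dataset.prefetch(2)
    return dataset
```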
This will show the same information as before, but much more compactly, and in addition the dimension tags (DimensionTag), which also got improved in many further cases.
You can have e.g. multiple additional networks which redefine existing layers (they automatically share params), and which can use different flags (e.g. enable the search flag).
It was designed to support this from the very beginning, but the implementation was never fully finished for this. Now examples like hard attention work.
pip install returnn, and then import returnn.
Currently, pylint and PyCharm inspection checks run automatically in Travis. Both have some false positives, but so far the PyCharm inspections seem much more sane. A lot of code cleanup is being done now. This is not complete yet, and thus the failing tests are ignored.
Based on DotLayer now. This is more generic if the attention weights have multiple time axes (e.g. in Transformer training). It checks whether the base time axis and the weights time axis match, and it should automatically select the right one from weights if there are multiple (before, it always used the first weights time axis). The output format (order of axes) might be different than it was before in some cases.
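For reference, a sketch of the attention part of a decoder rec unit using this layer; the layer names ("energy", "enc_value", ...) are just examples, and the rest of the decoder is omitted:

```python
# Sketch: "energy" is assumed to be attention energies of shape [B, enc-time, 1],
# and "base:enc_value" the encoder values from outside of the loop.
attention_part = {
    "att_weights": {"class": "softmax_over_spatial", "from": "energy"},
    # GenericAttentionLayer: weighted sum over the encoder values.
    "att": {"class": "generic_attention", "weights": "att_weights", "base": "base:enc_value"},
}
```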
E.g. the default feature dim axis (if unspecified) is the last non-dynamic axis. Also, in some cases the time axis will be automatically re-selected if the original one was removed and there are multiple dynamic axes. DimensionTag support was extended. When copying compatible to some other data with multiple dynamic axes, it will more correctly match the dynamic axes via the dimension tags (see the test cases for examples). I.e. the output format (order of axes) might be different than it was before in some cases.
2019-02-27: CombineLayer / EvalLayer / any which concatenate multiple sources, extended automatic broadcasting
See e.g. concat_sources.
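As an illustration of the broadcasting, a sketch where a source without batch/time axes is combined with a [B,T,D] source (layer names are examples):

```python
# Sketch: "bias" is a scalar constant and gets broadcast over the batch,
# time and feature axes of "hidden" when combined.
network = {
    "hidden": {"class": "linear", "activation": "relu", "n_out": 128, "from": "data"},  # [B,T,128]
    "bias": {"class": "constant", "value": 1.0},                                        # scalar
    "combined": {"class": "combine", "kind": "add", "from": ["hidden", "bias"]},        # [B,T,128]
    "output": {"class": "softmax", "from": "combined", "loss": "ce"},
}
```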
If your whole dataset does not fit into memory (or you don't want to consume so much memory), for TensorFlow you should always use cache_size = 0 (or "0") in the config. This case got a huge speedup.
If you used MergeDimsLayer with "axes": "BT" on some time-major input, and then later SplitBatchTimeLayer to get the time axis back, it was likely incorrect.
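The pattern in question looks roughly like this sketch (the fix concerns time-major input; layer names are examples):

```python
# Sketch: merge batch and time, apply a layer framewise, then split them again.
network = {
    "merged": {"class": "merge_dims", "axes": "BT", "from": "data"},
    "ff": {"class": "linear", "activation": "relu", "n_out": 128, "from": "merged"},
    "split": {"class": "split_batch_time", "base": "data", "from": "ff"},
    "output": {"class": "softmax", "from": "split", "loss": "ce"},
}
```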
2018-08: multi-GPU support via Horovod
2016-12: start on TensorFlow support (Albert Zeyer)
Initial working support was already finished within that month, based on TF 0.12.0.