FAQ

简体中文

Pruning

How do I modify the pruning rate of a single operator?

Q: In the example, the pruning rate is set for the whole network. How can I adjust the pruning rate of a specific layer?

# example config
sparsity: 0.25
metrics: l2_norm # The available metrics are listed in `tinynn/graph/modifier.py`

A: After calling pruner.prune(), a new configuration file containing the sparsity of each operator is generated in place. You can use this file as the configuration for the pruner, or generate a new configuration file based on it (e.g. line 42 in examples/oneshot/oneshot_prune.py).

# new yaml generated
sparsity:
  default: 0.25
  model_0_0: 0.25
  model_1_3: 0.25
  model_2_3: 0.25
  model_3_3: 0.25
  model_4_3: 0.25
  model_5_3: 0.25
  model_6_3: 0.25
  model_7_3: 0.25
  model_8_3: 0.25
  model_9_3: 0.25
  model_10_3: 0.25
  model_11_3: 0.25
  model_12_3: 0.25
  model_13_3: 0.25
metrics: l2_norm # Other supported values: random, l1_norm, l2_norm, fpgm
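
For example, to prune a single layer harder while keeping the default rate for the rest, edit its entry in the generated file and pass the result back to the pruner. A minimal sketch, assuming the OneShotChannelPruner API used in the oneshot example and a hypothetical path for the generated file:

# Minimal sketch: reuse the generated config with a per-layer override
# ('out/oneshot_prune.yml' is a hypothetical path; `model` and `dummy_input`
# are the same objects passed to the original pruner)
import yaml
from tinynn.prune.oneshot_pruner import OneShotChannelPruner

with open('out/oneshot_prune.yml') as f:
    config = yaml.safe_load(f)
config['sparsity']['model_5_3'] = 0.5  # override this layer; the others keep the default 0.25

pruner = OneShotChannelPruner(model, dummy_input, config)
pruner.prune()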

How to speed up training?

Training in TinyNeuralNetwork is based on PyTorch. The bottleneck is usually in data loading and preprocessing, which you can accelerate with LMDB or other in-memory databases, as in the sketch below.
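
As an illustration, a minimal sketch of an LMDB-backed Dataset; the database path, key scheme, and pickle serialization are assumptions, not part of TinyNeuralNetwork:

import pickle

import lmdb
from torch.utils.data import Dataset

class LMDBDataset(Dataset):
    """Serves preprocessed (image, label) pairs that were written to LMDB beforehand."""

    def __init__(self, db_path):
        # readonly/lock/readahead settings are the usual choice for multi-worker reads
        self.env = lmdb.open(db_path, readonly=True, lock=False, readahead=False)
        with self.env.begin(write=False) as txn:
            self.length = txn.stat()['entries']

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        with self.env.begin(write=False) as txn:
            buf = txn.get(str(index).encode())  # assumes samples were stored under keys "0", "1", ...
        image, label = pickle.loads(buf)        # assumes each value is a pickled (tensor, label) pair
        return image, label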

Quantization

How to deal with errors related to quantization?

Q: Some operators, such as max_pool2d_with_indices, fail during quantization.

A: The quantization-aware training in TinyNeuralNetwork is based on that of PyTorch; it only reduces the complexity of operator fusion and computational graph rewriting. TinyNeuralNetwork does not support operators that PyTorch does not natively support in quantized form, such as LeakyReLU. Please wrap such modules with torch.quantization.QuantWrapper. (More operators are supported in newer versions of PyTorch, so please consult us first or try a newer version if you run into failures.)
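
A minimal sketch of the wrapping; the model layout and channel sizes are made up, only the QuantWrapper call matters:

import torch.nn as nn
import torch.quantization

class MyFloatModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3)
        # LeakyReLU has no quantized kernel in older PyTorch versions,
        # so wrap the module as suggested above
        self.act = torch.quantization.QuantWrapper(nn.LeakyReLU())

    def forward(self, x):
        return self.act(self.conv(x))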

How to perform mixed precision quantization?

Q: By default, quantization is performed on the whole graph. How can I quantize only part of it?

# Quantization with the whole graph
with model_tracer():
    quantizer = QATQuantizer(model, dummy_input, work_dir='out')
    qat_model = quantizer.quantize()

A: First, perform quantization on the whole graph. Then, manually modify the positions of QuantStub and DeQuantStub in the generated model code. After that, use the code below to reload the model.

# Reload the model with modification
with model_tracer():
    quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'force_overwrite': False})
    qat_model = quantizer.quantize()
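
For reference, a hypothetical excerpt of the edited generated model. The attribute names below are illustrative, not the exact codegen output; the point is that everything between self.quant and self.dequant stays quantized, while operators moved after self.dequant run in float:

# Hypothetical forward() of the generated model after manual editing
def forward(self, x):
    x = self.quant(x)    # QuantStub: start of the quantized region
    x = self.conv0(x)
    x = self.conv1(x)
    x = self.dequant(x)  # DeQuantStub moved up: the layers below stay in float
    x = self.conv2(x)
    return x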

How to handle the case of inconsistent training and inference computation graphs?

Q: A model may contain extra logic in the training phase that is not needed for inference, as in the model below (a common scenario in real-world OCR and face recognition). As a result, the quantized model code generated by codegen during training cannot be used for inference.

import torch.nn as nn

class FloatModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3)    # illustrative channel/kernel sizes
        self.conv1 = nn.Conv2d(16, 16, 3)

    def forward(self, x):
        x = self.conv(x)
        if self.training:
            x = self.conv1(x)
        return x

A: There are generally two ways to tackle this problem.

  • Use the code generator in TinyNeuralNetwork to create qat_train_model.py and qat_eval_model.py under model.train() and model.eval(), respectively. Use qat_train_model.py for training, and then use qat_eval_model.py to load the trained weights when inference is needed. (Since there is no self.conv1 in qat_eval_model.py, set strict=False when calling load_state_dict; see the sketch after this list.)
  • As in the first approach, generate the two copies of the model in training mode and evaluation mode. Then make a copy of qat_train_model.py and manually replace its forward function with the one from qat_eval_model.py. Finally, use the modified script for evaluation.
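
A minimal sketch of the first approach; the class name QFloatModel, the module path, and the checkpoint path are hypothetical:

import torch

from out.qat_eval_model import QFloatModel  # hypothetical module path / class name from codegen

eval_model = QFloatModel()
state_dict = torch.load('qat_train.pth')  # weights saved after training with qat_train_model.py
# self.conv1 exists only in the training graph, so tolerate the extra keys
eval_model.load_state_dict(state_dict, strict=False)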