Tai Chi ☯️, known as a Chinese martial art, emphasizes practicing "smart strength", such as using the leverage of the joints to generate great power with little effort. This philosophy fits few-shot learning (FSL) research surprisingly well: with "smart tricks", people try to train well-performing models with small amounts of data. We therefore named our FSL library Taichi, in the hope that it will help your model training in low-data scenarios.
Over the last few years, we have seen great progress in FSL research thanks to work on pre-training, meta-learning, data augmentation, and public benchmark datasets. Since data collection and labeling are often expensive and time-consuming, breakthroughs in FSL research have huge potential for use cases in the ML/DL industry. The Salesforce Research team has also done many FSL-related projects for research and application purposes; please feel free to check out our publications in FSL and other areas here.
The Taichi library serves as an API hub for effective methods proposed by the Salesforce Research team. We are currently releasing Taichi 1.0, which contains two main FSL methods, DNNC and USLP, both aimed at few-shot intent classification. We are working on adding more useful FSL methods to Taichi, so stay tuned for the next release!
- Pythonic API, “from taichi import few_shot_learning_method”
- Based on PyTorch and the Hugging Face Transformers library
- Includes two recently published few-shot methods: DNNC and USLP
- Data sampling and error analysis API
- Examples on CLINC150 dataset for quick start
- Pre-trained English and multilingual transformer models and a pre-processed CLINC150 dataset, available here
The following figure provides a quick comparison of standard intent classification, DNNC, and USLP. In short, both DNNC and USLP are based on NLI-style classification: DNNC reframes classification as entailment prediction between the query and the utterances in the training set, while USLP predicts the entailment relationship between the utterance and the semantic labels. Please refer to our DNNC and USLP papers for more details.
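To make the difference concrete, the sketch below shows conceptually (this is not the Taichi API) how a single query would be turned into NLI premise/hypothesis pairs under each method; the utterances and labels are made up for illustration.

```python
# Conceptual sketch of NLI-style pair construction (not the Taichi API).
# A toy training set of (utterance, label) pairs, purely for illustration.
train_set = [
    ("book a ticket from San Francisco to New York", "Book a Flight"),
    ("what is my checking account balance", "Check Balance"),
]
query = "can you reserve a flight to Boston for me"

# DNNC: pair the query with every training utterance; the predicted class is
# the label of the most strongly entailed (nearest-neighbor) training utterance.
dnnc_pairs = [(query, utterance) for utterance, _ in train_set]

# USLP: pair the query with every semantic label; the predicted class is the
# most strongly entailed label, with a threshold to flag out-of-domain queries.
labels = sorted({label for _, label in train_set})
uslp_pairs = [(query, label) for label in labels]

print(dnnc_pairs)
print(uslp_pairs)
```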
We are also sharing the backbone models for DNNC and USLP. The models are based on public pre-trained models from Hugging Face and further fine-tuned on NLI datasets to adapt them to NLI-style classification.
- nli-pretrained-roberta-base, an English-only model
- nli-pretrained-xlm-roberta-base, based on the XLM-RoBERTa model, which supports 100 languages and can be used for multi-/cross-lingual projects
Please refer to the NLI pre-training pipeline here if you would like to pre-train a new model.
We use the CLINC150 dataset for benchmarks and tutorials. The original data_small.json is sub-sampled and further processed. Users can download the processed dataset from here.
1. Data Sampling
- The following step imports the DataPipeline object for quick data sampling:
  `from taichi.data_pipeline import DataPipeline`
- The following step sets up the data pipeline object with the dataset name, path, and language:
  `dp = DataPipeline(name="clinc150", data_path="full path to data file in csv or json, edit accordingly")`
  - Expects a json data file in the following format: `{split: list(list containing utterance and label)}`
    - Example: `{'train': [[utterance1, label1], [utterance2, label2], ...], 'test': [[...]]}`
    - This is the data format found in the CLINC150 dataset
  - Expects a csv data file in the following format (no header and no index): `utterance, label`
    - Example: `book a ticket from San Francisco to New York, Book a Flight`
- Based on the data file and format received (csv/json), we can subsample the input data file and save it as csv or json to a path (`save_dir`) of our choice.
  - To save to csv, use the following command:
    `dp.save_subsampled_data_to_csv(save_dir="./data/CLINC150/1-shot", split='train', n_shot=1, is_json=True, random_state=42, save_filename="train.csv")`
  - Here, the default split `train` (the code checks for a valid split name and throws an exception for an incorrect one; the split does not matter if the data source is a csv) in the CLINC150 dataset json file (`is_json=True`; set it to False if the data source is a csv) gets subsampled to `n_shot` samples per class (the code checks whether this is possible) and saved to `os.path.join(save_dir, save_filename)`, creating the path if it doesn't exist, as a csv file in the format mentioned above.
  - We can save our file as json in much the same way with the following command (a combined sketch follows this list):
    `dp.save_subsampled_data_to_json(save_dir="./data/CLINC150/1-shot", split='train', n_shot=1, is_json=True, random_state=42, orient='records', save_filename="1-shot-train.json")`
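Putting the steps together, here is a minimal end-to-end sampling sketch. It only uses the calls documented above, but the data path is an example and should point at wherever you placed the processed CLINC150 json.

```python
from taichi.data_pipeline import DataPipeline

# Path to the processed CLINC150 json (example path; edit to your download location).
dp = DataPipeline(name="clinc150", data_path="./data/CLINC150/data_small.json")

# 1-shot subsample of the train split, saved as csv ...
dp.save_subsampled_data_to_csv(
    save_dir="./data/CLINC150/1-shot", split="train", n_shot=1,
    is_json=True, random_state=42, save_filename="train.csv",
)

# ... and the same subsample saved as json.
dp.save_subsampled_data_to_json(
    save_dir="./data/CLINC150/1-shot", split="train", n_shot=1,
    is_json=True, random_state=42, orient="records",
    save_filename="1-shot-train.json",
)
```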
2. Modifying Config Parameters
- We have individual config files containing hyperparameters for USLP and DNNC models. Please find below an example of the config file for USLP (the DNNC config file also has the same parameters):
{ "model": "roberta-base", "checkpoint_dir": "./model/nli-pretrained-roberta-base/uslp", "train_data_path": "./data/CLINC150/5-shot/train.csv", "test_data_path": "./data/CLINC150/5-shot/test.csv", "ood_train_data_path": "./data/CLINC150/5-shot/ood_train.csv", "ood_test_data_path": "./data/CLINC150/5-shot/ood_test.csv", "gradient_accumulation_steps": 1, "learning_rate": 5e-05, "no_cuda": false, "num_train_epochs": 200, "pretrained_model_path": "./model/nli-pretrained-roberta-base", "save_result_fp": "./data/CLINC150/5-shot/uslp_inference.json", "seed": 42, "max_seq_length": 64, "test_batch_size": 128, "train_batch_size": 128, "transform_labels": false, "warmup_proportion": 0.1, "weight_decay": 0.0001, "threshold": 0.01 }
- Let us dive deeper into some of the individual parameters and groups of parameters to understand why they are needed:
  - `model` defines the model name, e.g. roberta-base; TaiChi uses this information to load the pretrained tokenizer from Hugging Face
  - `checkpoint_dir` is the user-defined directory for saving models after fine-tuning
  - `train_data_path`, `test_data_path`, `ood_train_data_path`, and `ood_test_data_path` are user-defined paths that tell the model where to take the data from
  - `pretrained_model_path` specifies the path to the model pretrained on general NLI datasets
  - `save_result_fp` is the path where the inference results (threshold, in-domain accuracy, precision, recall, and macro F1, along with OOD recall) are stored in json format
  - The other configuration parameters are mostly training hyperparameters
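Since the config is plain json, one convenient way to tweak it is with Python's standard json module. The snippet below is only a sketch: the config path matches the default mentioned in the quick-start section, and the edited values are arbitrary examples.

```python
import json

# Load the USLP config (default path from the quick-start section; adjust if needed).
config_path = "./taichi/uslp_config.json"
with open(config_path) as f:
    config = json.load(f)

# Example edits: point at 1-shot data and use a smaller training batch size.
config["train_data_path"] = "./data/CLINC150/1-shot/train.csv"
config["train_batch_size"] = 32

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```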
3. Run Code End-to-End
- Please find below a quick snapshot of how the USLP model can be trained and evaluated:
```python
from taichi import uslp      # import algorithm

uslp_model = uslp.USLP()     # instantiate algorithm (default config path set to ./taichi/uslp_config.json)
uslp_model.init()            # initialize the data and model
uslp_model.train()           # model training
uslp_model.eval()            # model evaluation
```
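DNNC is expected to follow the same pattern. The module and class names below (`dnnc`, `DNNC`) simply mirror the USLP example and are assumptions rather than confirmed API; check the DNNC config and examples in the repository for the exact names.

```python
from taichi import dnnc      # assumed module name, mirroring the USLP example

dnnc_model = dnnc.DNNC()     # assumed class name; default config analogous to ./taichi/dnnc_config.json
dnnc_model.init()            # initialize the data and model
dnnc_model.train()           # model training
dnnc_model.eval()            # model evaluation
```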
Results From Paper (Focus on DNNC and USLP-T)
Benchmark results on CLINC150
- Computing environment: torch==1.7.1, transformers==4.5.1, A100 GPU (users should expect results to vary with different software versions/hardware)
- Hyper-parameters
- threshold: 0.01
- training batch size: 128
- epochs: 200
- learning rate: 5e-5
Taichi's USLP results are comparable to those presented in the paper (USLP-T) for in-domain F1, OOD-recall, and OOD-precision. DNNC results are higher than those reported in the paper (DNNC) for in-domain F1 and OOD-recall, with comparable OOD-precision.
| model | samples per class | in-domain F1 | OOD-recall | OOD-precision |
|---|---|---|---|---|
| USLP | full | 0.9459 | 0.637 | 0.947 |
| USLP | 10 | 0.892 | 0.734 | 0.854 |
| USLP | 5 | 0.8354 | 0.679 | 0.857 |
| USLP | 1 | 0.6667 | 0.629 | 0.664 |
| DNNC | full | 0.9489 | 0.25 | 0.996 |
| DNNC | 10 | 0.9203 | 0.603 | 0.933 |
| DNNC | 5 | 0.902 | 0.789 | 0.858 |
| DNNC | 1 | NA | NA | NA |
We also compare this with using an off-the-shelf (not NLI-pretrained) BERT model (`bert-base-uncased`) and get the following results:
| model | samples per class | in-domain F1 | OOD-recall | OOD-precision |
|---|---|---|---|---|
| USLP | full | 0.9446 | 0.722 | 0.914 |
| USLP | 10 | 0.8838 | 0.738 | 0.836 |
| USLP | 5 | 0.8289 | 0.772 | 0.721 |
| USLP | 1 | 0.6526 | 0.66 | 0.584 |
| DNNC | full | 0.9258 | 0.329 | 0.968 |
| DNNC | 10 | 0.9055 | 0.58 | 0.898 |
| DNNC | 5 | 0.8732 | 0.737 | 0.791 |
| DNNC | 1 | NA | NA | NA |
Notes on Full-Shot DNNC Experiments
- We faced OOM issues when running the DNNC code as is for these experiments. We tried the following as workarounds:
- We reduced the number of negative NLI pairs by random subsampling, using the ratio of negative to positive pairs (50 in our experiments) as a variable
- We processed the data in batches during training and inference
- We ran these experiments for 10 epochs, and it took ~35 hours to train on an A100 GPU for both the `roberta-base` and `bert-base-uncased` models
- The OOD-recall results are worse (lower) most likely because these experiments were run for a reduced number of epochs (10, as opposed to 200 for the other experiments)
- The training time naturally blows up due to the algorithm's design of generating negative and positive NLI pairs (see the sketch after this list):
  - In the CLINC150 full-shot experiment, the training data has m = 50 examples per class and n = 150 classes, i.e. m * n = 7500 examples
  - If we take one example and pair it with all the others, labeling each pair positive or negative depending on whether the two examples belong to the same class, we get (m - 1) = 49 positive pairs and (m * n - m) = 7450 negative pairs; the ratio between them, (m * n - m) / (m - 1), is approximately n, i.e. about 150 (152.04 in this case)
  - Adding up all the pairs, the sheer number of examples makes it prohibitive to train the model and get results quickly
- The tricks we implemented are NOT part of the DNNC code we share, since TaiChi is designed for the few-shot learning use case.
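To make the pair-count arithmetic and the subsampling workaround concrete, here is a small standalone sketch (this is not the Taichi or DNNC code): it counts the positive/negative pairs for a single anchor example and then randomly subsamples the negatives at the 50:1 ratio mentioned above; the toy label layout is purely illustrative.

```python
import random

m, n = 50, 150                # examples per class, number of classes (CLINC150 full-shot)
total = m * n                 # 7500 training examples

pos_per_anchor = m - 1        # 49 positive pairs for one anchor example
neg_per_anchor = m * n - m    # 7450 negative pairs for one anchor example
print(pos_per_anchor, neg_per_anchor, neg_per_anchor / pos_per_anchor)  # 49 7450 ~152.04

# Toy labels: example i belongs to class i // m.
labels = [i // m for i in range(total)]

# All pairs for a single anchor example, split by whether the classes match.
anchor = 0
positives = [(anchor, j) for j in range(total) if j != anchor and labels[j] == labels[anchor]]
negatives = [(anchor, j) for j in range(total) if labels[j] != labels[anchor]]

# Workaround sketch: keep all positives, randomly subsample negatives so the
# negative-to-positive ratio is at most 50 (the ratio used in our experiments).
ratio = 50
rng = random.Random(42)
kept_negatives = rng.sample(negatives, min(len(negatives), ratio * len(positives)))
print(len(positives), len(kept_negatives))  # 49 2450
```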
Testing
To test if the models work as expected, please run `test_uslp.py` and `test_dnnc.py`, which can be found in the `tests` directory.
Please note that the config files (`test_uslp_config.json` and `test_dnnc_config.json`) have to be edited accordingly to point to the model and data used to evaluate the tests. For USLP, we run a 1-shot experiment on CLINC150, and for DNNC, we run a 5-shot experiment on CLINC150.
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
- XLM-RoBERTa: Unsupervised Cross-lingual Representation Learning at Scale
- MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
- USLP: Few-Shot Intent Classification by Gauging Entailment Relationship Between Utterance and Semantic Label
- DNNC: Discriminative Nearest Neighbor Few-Shot Intent Detection by Transferring Natural Language Inference
- CLINC150 Dataset
Please feel free to reach out to jqu@salesforce.com for questions or feedback.