Intent Classification Service

Challenge Task

Your task is to implement a neural network-based intent classifier that can be used to provide an inference service via HTTP. The boilerplate for the service is implemented in server.py, and you'll have to implement the API function for inference as per the API documentation provided below. The neural network interface has been defined in intent_classifer.py. You can add any methods and functionality to this class you deem necessary.
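A minimal sketch of what that endpoint could look like (the actual server.py boilerplate and the IntentClassifier interface may differ; predict() is an assumed method here):

```python
# Hypothetical sketch of the /intent endpoint; the real server.py boilerplate
# and IntentClassifier interface may differ, and predict() is an assumed method.
from flask import Flask, jsonify, request
from intent_classifer import IntentClassifier

app = Flask(__name__)
model = IntentClassifier()

@app.route("/intent", methods=["POST"])
def intent():
    body = request.get_json(silent=True)
    if not body or "text" not in body:
        return jsonify({"label": "BODY_MISSING",
                        "message": "Request doesn't have a body."}), 400
    return jsonify({"intents": model.predict(body["text"])}), 200

if __name__ == "__main__":
    app.run(port=8080)
```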

You may use any deep learning library (TensorFlow, Keras, PyTorch, ...) you wish, and you can also use pre-existing components for building the network architecture if these would be useful in real-life production systems. Provide tooling and instructions for training the network from scratch.

Also provide a Jupyter notebook for model development which trains and tests the model. The final output of this notebook should be the trained models together with their test results. Evaluation metrics should include the following:

  1. Accuracy
  2. Precision
  3. Recall
  4. F1
  5. Any other metric that you think is suitable for the comparison

In addition, the same notebook should provide a section for evaluating models in production, assuming the following scenario:

You have both models in production and no labeled data is available to you. How would you compare them? Which metrics would you use for this kind of comparison? For example, you can use metrics based on confidence values or related ones.
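For instance, with no labels one can still compare the distribution of model confidence and the two models' agreement rate. A minimal sketch, assuming each model returns per-class sigmoid probabilities for the same batch of production queries:

```python
import numpy as np

def confidence_report(probs: np.ndarray, low_conf: float = 0.5) -> dict:
    """probs: (n_samples, n_classes) sigmoid outputs from one model."""
    max_conf = probs.max(axis=1)  # confidence of the top class per query
    eps = 1e-9                    # numerical floor for the log terms
    # Per-label binary entropy, summed over classes, as a rough uncertainty score
    entropy = -(probs * np.log(probs + eps)
                + (1 - probs) * np.log(1 - probs + eps)).sum(axis=1)
    return {
        "mean_max_confidence": float(max_conf.mean()),
        "frac_low_confidence": float((max_conf < low_conf).mean()),
        "mean_entropy": float(entropy.mean()),
    }

# Compare two models on the same unlabeled traffic, plus their agreement rate:
# report_a, report_b = confidence_report(probs_a), confidence_report(probs_b)
# agreement = (probs_a.argmax(axis=1) == probs_b.argmax(axis=1)).mean()
```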

Approach

In this notebook we train a multilingual intent classifier; as a proof of concept (POC), I selected the following languages:

  1. English
  2. Hindi
  3. Spanish

Data Preparation

The given ATIS dataset is provided in English; I created a parallel dataset for Hindi and Spanish using Google Translate.
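The exact translation tooling is not shown here; one way to build such a parallel set, sketched with the deep-translator package (an assumption, the actual data may have been produced differently):

```python
# Assumed tooling: deep-translator's GoogleTranslator wrapper. The actual
# parallel dataset may have been produced with a different tool or pipeline.
from deep_translator import GoogleTranslator

def translate_queries(queries: list[str], target_lang: str) -> list[str]:
    translator = GoogleTranslator(source="en", target=target_lang)
    return [translator.translate(q) for q in queries]

sample = ["show me flights from boston to denver"]
hindi = translate_queries(sample, "hi")
spanish = translate_queries(sample, "es")
```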

In the training dataset, we have the following distribution of data:

 flight                        3426
 airfare                        403
 ground_service                 235
 airline                        148
 abbreviation                   108
 aircraft                        78
 flight_time                     52
 quantity                        49
 distance                        20
 city                            18
 airport                         18
 ground_fare                     17
 flight+airfare                  17
 capacity                        16
 flight_no                       12
 meal                             6
 restriction                      5
 airline+flight_no                2
 ground_service+ground_fare       1
 airfare+flight_time              1
 cheapest                         1
 aircraft+flight+flight_no        1

From what I can observe, the data is quite unbalanced, and some of the classes seem to be combinations of others, which suggests we can model this as a multi-label classification problem. It is also possible that a user query carries multiple intents, and a multi-label model handles that scenario well.

For this reason, I have transformed the data into a multi-label dataset (see the sketch after the class list) with the following 17 classes:

'ground_service', 'abbreviation', 'ground_fare', 'airline', 'city',
'aircraft', 'flight_no', 'airport', 'flight', 'quantity', 'meal',
'capacity', 'restriction', 'airfare', 'distance', 'flight_time', 'cheapest'
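A minimal sketch of that transformation, splitting the combined ATIS labels on "+" and binarizing them (the repository's ml_binarizer.pkl suggests scikit-learn's MultiLabelBinarizer, but this is an assumption):

```python
from sklearn.preprocessing import MultiLabelBinarizer

raw_labels = ["flight", "flight+airfare", "ground_service+ground_fare"]
label_sets = [set(lbl.split("+")) for lbl in raw_labels]  # atomic intents

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(label_sets)  # (n_samples, n_classes) binary matrix
print(mlb.classes_)                # yields the 17 atomic classes on the full data
```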

Data

  • data - The provided ATIS dataset
    • download link: https://drive.google.com/drive/folders/1I2cALZXOIaz9WnmdtubpVavflkUvFRHG?usp=share_link
  • data_mlabel - The multilabel version of provided ATIS dataset
    • download link: https://drive.google.com/drive/folders/1-0VzuUa16j3nEcqywHVKlJONMPg0F40S?usp=share_link
  • multilingual_data - Parallel multilingual translation of the provided ATIS dataset
    • download link: https://drive.google.com/drive/folders/1A-t73esVP27KbC9eAEMBv6klu8jpQSd-?usp=share_link

Modeling

The model architecture is simple, but I believe it is a strong baseline for the task and can handle multilingual queries.

  1. Encoder

    As I am trying to train a multilingual model, the first step in the NLP pipeline is to encode queries with bert-base-multilingual-cased from huggingface-transformers. This BERT variant is pretrained on the 104 languages with the largest Wikipedias using a masked language modeling (MLM) objective. We could therefore extend our approach to queries in any of those 104 languages, though performance may differ between languages depending on how much data each contributed to multilingual BERT pretraining. (A minimal sketch of the full architecture follows this list.)

  2. Decoder (classifier)

    The decoder is a single linear layer mapping the encoder output to our 17-output classes.

  3. Loss Function

    Binary cross-entropy (BCEWithLogitsLoss) is a suitable loss for this multi-label setting, since it treats each of the 17 labels as an independent binary decision.
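A minimal sketch of the architecture described by points 1-3 (the actual IntentClassifier implementation may differ in details such as pooling or dropout):

```python
import torch
from torch import nn
from transformers import AutoModel

class IntentClassifier(nn.Module):
    """Multilingual BERT encoder + single linear head for 17 labels."""

    def __init__(self, n_classes: int = 17):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-multilingual-cased")
        self.classifier = nn.Linear(self.bert.config.hidden_size, n_classes)
        self.criterion = nn.BCEWithLogitsLoss()  # independent binary CE per label

    def forward(self, input_ids, attention_mask, labels=None):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        logits = self.classifier(out.pooler_output)  # pooled [CLS] representation
        loss = self.criterion(logits, labels.float()) if labels is not None else None
        return loss, torch.sigmoid(logits)  # per-label probabilities
```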

Model



IntentClassifier(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(119547, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
            (intermediate_act_fn): GELUActivation()
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
    (pooler): BertPooler(
      (dense): Linear(in_features=768, out_features=768, bias=True)
      (activation): Tanh()
    )
  )
  (classifier): Linear(in_features=768, out_features=17, bias=True)
  (criterion): BCEWithLogitsLoss()
)


Parameters

  | Name       | Type              | Params
-------------------------------------------------
0 | bert       | BertModel         | 177 M 
1 | classifier | Linear            | 13.1 K
2 | criterion  | BCEWithLogitsLoss | 0     
-------------------------------------------------
177 M     Trainable params
0         Non-trainable params
177 M     Total params
711.466   Total estimated model params size (MB)

Instructions

Setup

  • Create a conda environment using:

    • conda create -n "intent-clf-env" python=3.10.11
  • Activate the conda environment:

    • conda activate intent-clf-env
  • Pylint Setup:

    • conda install -c conda-forge pre_commit
    • pre-commit install
    • pre-commit autoupdate
    • Note: these dependencies are not kept in requirements.txt as they are only required for development purposes.
  • Install dependencies

    • pip install -r requirements.txt
  • In the repository you will find a .env.bkp file. Create a copy of it:

    • cp .env.bkp .env
    • Set up the environment variables:
      •   PORT=8080
          CHECKPOINT_PATH="./model/best-checkpoint-v1.ckpt"
          ML_BINARIZER_PATH="./model/ml_binarizer.pkl"
        
  • Download the model files into the model directory:

    • best-checkpoint-v1.ckpt - https://drive.google.com/file/d/1-P-mIf9ChF04LzZ63EZiitUiGthVfYzm/view?usp=share_link
    • ml_binarizer.pkl - https://drive.google.com/file/d/1-Q5671xZmR54XSChXq9yv1yFN41-tZVJ/view?usp=share_link
  • Running the Flask server

    • python server.py

Building or downloading the Docker image

Building the Docker image

  • The repository includes a pre-configured Dockerfile
  • In the repository you will find a .env.bkp file. Create a copy of it:
    • cp .env.bkp .env
    • Set up the environment variables:
      •   PORT=8080
          CHECKPOINT_PATH="./model/best-checkpoint-v1.ckpt"
          ML_BINARIZER_PATH="./model/ml_binarizer.pkl"
        
  • Download the model files into the model directory:
    • best-checkpoint-v1.ckpt - https://drive.google.com/file/d/1-P-mIf9ChF04LzZ63EZiitUiGthVfYzm/view?usp=share_link
    • ml_binarizer.pkl - https://drive.google.com/file/d/1-Q5671xZmR54XSChXq9yv1yFN41-tZVJ/view?usp=share_link
  • Build the Docker image:
    • sudo docker build . -t lordzuko/intent-clf-service:v1.0.0
  • Run the Docker container:
    • sudo docker run -p 8080:8080 lordzuko/intent-clf-service:v1.0.0

Download the pre-built Docker image

  • DockerHub
  • docker pull lordzuko/intent-clf-service:v1.0.0

Limitation

  • The Dockerfile currently does not support GPU usage even though the inference code works on GPUs, because the chosen base image does not include the NVIDIA CUDA drivers.

API Documentation

The documentation shows how to use the API with Python, cURL, etc.

  • Postman documentation for the service can be found here.
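For example, a typical request (assuming the endpoint accepts a JSON body with a text field; see the Postman documentation for the authoritative request/response schema):

```python
import requests

resp = requests.post(
    "http://localhost:8080/intent",
    json={"text": "find me a flight from delhi to mumbai"},  # assumed body shape
    timeout=10,
)
print(resp.status_code, resp.json())
```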

Data Preparation and Model Training

  • The process for model training and evaluation is described in notebook: notebooks/multi_lingual_multilabel_intent_clf.ipynb
  • The process for model evaluation in a production scenario is described in notebook: notebooks/Production_Evaluation.ipynb
    • For running the Production_Evaluation.ipynb notebook, you can use the pre-computed predictions on train, test and valid split here
    • The link to the Colab notebook for Production_Evaluation.ipynb is here. It shows certain plots which are not rendered in the committed notebook.

Evaluation

Accuracy

Best Threshold: 0.30
Train Accuracy: 0.999
Val Accuracy: 0.998
Test Accuracy: 0.995
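The reported threshold is presumably picked by sweeping candidate values on the validation split; a sketch of such a sweep (array shapes are assumptions):

```python
import numpy as np
from sklearn.metrics import accuracy_score

def best_threshold(val_probs: np.ndarray, val_labels: np.ndarray,
                   grid=np.arange(0.05, 0.95, 0.05)):
    """val_probs, val_labels: (n_samples, n_classes); returns (threshold, acc)."""
    scores = [(t, accuracy_score(val_labels, (val_probs >= t).astype(int)))
              for t in grid]
    return max(scores, key=lambda ts: ts[1])
```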

AU-ROC

Overall

Best Threshold: 0.20
Train AUROC: 0.874
Val AUROC: 0.833
Test AUROC: 0.805

Per Intent Class

Validation and test AUROC per intent class:

| Label | Validation AUROC | Test AUROC |
| --- | --- | --- |
| abbreviation | 1.000000 | 0.999959 |
| aircraft | 0.998834 | 0.997295 |
| airfare | 0.999978 | 0.989687 |
| airline | 0.999845 | 0.992285 |
| airport | 1.000000 | 0.999959 |
| capacity | 1.000000 | 0.999260 |
| cheapest | 0.000000 | 0.000000 |
| city | 1.000000 | 0.936331 |
| distance | 1.000000 | 1.000000 |
| flight | 0.999852 | 0.989004 |
| flight_no | 0.992795 | 1.000000 |
| flight_time | 0.992006 | 1.000000 |
| ground_fare | 1.000000 | 0.997722 |
| ground_service | 0.998485 | 0.999989 |
| meal | 0.992806 | 0.926189 |
| quantity | 0.999711 | 0.996064 |
| restriction | 0.000000 | 0.000000 |

Classification Report

| Label | Train Support | Validation Support | Test Support | Validation Precision | Test Precision | Validation Recall | Test Recall | Validation F1-Score | Test F1-Score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| abbreviation | 309.0 | 15.0 | 78.0 | 1.00 | 0.97 | 1.00 | 0.99 | 1.00 | 0.98 |
| aircraft | 227.0 | 10.0 | 24.0 | 0.83 | 0.88 | 1.00 | 0.88 | 0.91 | 0.88 |
| airfare | 1190.0 | 73.0 | 183.0 | 0.97 | 0.97 | 1.00 | 0.93 | 0.99 | 0.95 |
| airline | 421.0 | 29.0 | 90.0 | 1.00 | 0.96 | 0.97 | 0.96 | 0.98 | 0.96 |
| airport | 53.0 | 1.0 | 39.0 | 1.00 | 0.95 | 1.00 | 0.97 | 1.00 | 0.96 |
| capacity | 46.0 | 2.0 | 63.0 | 1.00 | 1.00 | 1.00 | 0.94 | 1.00 | 0.97 |
| cheapest | 3.0 | 0.0 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| city | 51.0 | 3.0 | 15.0 | 1.00 | 1.00 | 1.00 | 0.40 | 1.00 | 0.57 |
| distance | 58.0 | 2.0 | 30.0 | 1.00 | 0.39 | 1.00 | 1.00 | 1.00 | 0.57 |
| flight | 9822.0 | 510.0 | 1881.0 | 1.00 | 0.99 | 0.99 | 0.98 | 1.00 | 0.99 |
| flight_no | 43.0 | 2.0 | 27.0 | 1.00 | 1.00 | 0.50 | 1.00 | 0.67 | 1.00 |
| flight_time | 151.0 | 8.0 | 3.0 | 0.78 | 0.27 | 0.88 | 1.00 | 0.82 | 0.43 |
| ground_fare | 51.0 | 3.0 | 21.0 | 1.00 | 0.29 | 1.00 | 0.95 | 1.00 | 0.44 |
| ground_service | 672.0 | 36.0 | 108.0 | 0.97 | 0.92 | 1.00 | 1.00 | 0.99 | 0.96 |
| meal | 17.0 | 1.0 | 18.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| quantity | 142.0 | 5.0 | 9.0 | 0.83 | 0.21 | 1.00 | 1.00 | 0.91 | 0.35 |
| restriction | 15.0 | 0.0 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| micro avg | 13271.0 | 700.0 | 2589.0 | 0.99 | 0.93 | 0.99 | 0.97 | 0.99 | 0.95 |
| macro avg | 13271.0 | 700.0 | 2589.0 | 0.79 | 0.64 | 0.78 | 0.76 | 0.78 | 0.65 |
| weighted avg | 13271.0 | 700.0 | 2589.0 | 0.99 | 0.96 | 0.99 | 0.97 | 0.99 | 0.96 |
| samples avg | 13271.0 | 700.0 | 2589.0 | 0.99 | 0.95 | 0.99 | 0.97 | 0.99 | 0.96 |
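A report like the one above can be produced with scikit-learn (a sketch; inputs are assumed to be binary indicator matrices, with class names from the fitted binarizer):

```python
import numpy as np
from sklearn.metrics import classification_report

def multilabel_report(y_true: np.ndarray, probs: np.ndarray,
                      class_names, threshold: float = 0.30) -> str:
    """y_true, probs: (n_samples, n_classes); threshold chosen on validation."""
    y_pred = (probs >= threshold).astype(int)
    return classification_report(y_true, y_pred, target_names=class_names,
                                 zero_division=0)
```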

Future Improvements

  1. The model is currently loaded from a full training checkpoint, which carries optimizer and gradient state and inflates the file size. A better approach would be to save only the model state dict and load from that; this reduces the model file size and ultimately the size of the resulting Docker image (see the sketch after this list).
  2. Evaluation is not done at the language level, which would be important for making language-specific improvements and updates.
  3. The /intent API endpoint does not provide an api_version field, which would be useful for tracking which version of the intent classification service produced an output. Downstream applications could use it to gate business logic or adapt to API changes.
    1. It can also be important in A/B testing.
    2. Data management could also take advantage of this field.
  4. The /intent API does not check the query language. This could be problematic, as our model currently supports only 3 languages whereas the tokenizer we are using supports 104. Not detecting unsupported languages could lead to unforeseen model performance issues.
  5. Data imbalance among intent classes is currently a bottleneck, and upsampling needs to be done for low-data classes. We can use translation and paraphrasing to tackle the imbalance.
  6. Update the Dockerfile to support GPU usage.
  7. NannyML's installation has a NumPy version mismatch with the rest of the code, so it might create issues when building the Docker image and has not been tested there yet. Given that it is not required during inference, I suggest installing it in a separate environment for now.
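A sketch for improvement 1, stripping optimizer state from the Lightning checkpoint so only the weights ship with the image (paths and keys assume a standard PyTorch Lightning checkpoint):

```python
import torch

# Lightning checkpoints bundle optimizer state, LR schedulers, etc.; keep only
# the model weights stored under the standard "state_dict" key.
ckpt = torch.load("model/best-checkpoint-v1.ckpt", map_location="cpu")
torch.save(ckpt["state_dict"], "model/intent_clf_weights.pt")

# At inference time, rebuild the module and load the slimmed weights:
# model = IntentClassifier(n_classes=17)
# model.load_state_dict(torch.load("model/intent_clf_weights.pt"))
```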
