Introduction to NLP with TensorFlow
In this workshop, we will cover how text is processed using TensorFlow, a popular platform for machine learning.
Goal | Description |
---|---|
What will you learn | Analyze text using TensorFlow |
What you'll need | |
Duration | 1 hour |
Slides | PowerPoint |
Video coming soon!
You can activate the sandbox environment in each module, or alternatively set up a local environment.
If you run the examples in a local environment, install the required libraries and download the dataset into the same directory as the notebooks:
$ pip install --quiet tensorflow_datasets==4.4.0
$ wget -q -O - https://mslearntensorflowlp.blob.core.windows.net/data/tfds-ag-news.tgz | tar xz
TensorFlow is a popular machine learning platform that supports many machine learning workflows and operations in Python. In this workshop, you'll learn how to process and analyze text so that you can generate text or answer questions based on context.
Complete the sandboxed Jupyter Notebook which will go through the following:
- Perform a text classification task using the AG News dataset
- Convert text into numbers that can be represented as tensors
- Create a Bag-of-Words text representation
- Automatically calculate Bag-of-Words vectors with TensorFlow
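To make the Bag-of-Words idea concrete before opening the notebook, here is a minimal sketch in plain Python (no TensorFlow). The whitespace tokenizer, the function names `build_vocab` and `bag_of_words`, and the two toy titles are assumptions for illustration; the notebook itself works with the AG News dataset and TensorFlow.

```python
from collections import Counter

def build_vocab(texts, max_size=None):
    """Map each token to an integer index, most frequent tokens first."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    # Sort by descending frequency, then alphabetically, so the result is deterministic.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: i for i, tok in enumerate(ordered[:max_size])}

def bag_of_words(text, vocab):
    """Return a count vector of length len(vocab); unknown tokens are ignored."""
    vec = [0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec

# Made-up toy titles, not taken from the AG News dataset.
titles = ["stocks rally as oil prices fall", "oil prices climb again"]
vocab = build_vocab(titles)
vectors = [bag_of_words(t, vocab) for t in titles]
```

Each text becomes a fixed-length vector of word counts, which is what lets it be fed to a model as a tensor. In TensorFlow, the `tf.keras.layers.TextVectorization` layer with `output_mode="count"` should produce comparable count vectors, although its tokenization and vocabulary ordering differ from this sketch.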
Go through the sandboxed Jupyter Notebook to work with the AG News dataset and try to represent the semantic meaning of words.
In this section you will:
- Create and train a Classifier Neural Network
- Work with semantic embeddings
- Use pre-trained embeddings available in the Keras framework
- Find out about potential pitfalls and limitations of traditional pre-trained embedding representations like Word2Vec
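As a rough illustration of what a semantic embedding gives you, the sketch below looks up word vectors in a small table, compares them with cosine similarity, and averages them into a single text vector. The three-dimensional vectors are hand-picked, hypothetical values, and `cosine` and `embed_text` are illustrative names, not part of the notebook or of any library.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration only.
# Real pre-trained tables such as Word2Vec use far more dimensions.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def embed_text(tokens, table):
    """Average the vectors of known tokens; word order is lost in the process."""
    known = [table[t] for t in tokens if t in table]
    if not known:
        raise ValueError("no known tokens")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

sim_royal = cosine(embeddings["king"], embeddings["queen"])
sim_fruit = cosine(embeddings["king"], embeddings["apple"])
text_vector = embed_text(["king", "queen", "unknown"], embeddings)
```

With these toy values, `sim_royal` is larger than `sim_fruit`, which is the behavior you hope for from related words. The table also hints at one commonly cited limitation of static embeddings: each word maps to exactly one vector, whatever sentence it appears in.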
Work through the sandboxed Jupyter Notebook to capture not only the aggregated meaning of words but also their order. To capture the meaning of a text sequence, you'll use a recurrent neural network.
The notebook will go through the following items:
- Load the AG News dataset and train a model on it with TensorFlow
- Use masking to minimize the impact of padding
- Use LSTM to learn relationships between distant tokens
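The padding and masking step in the list above can be sketched in plain Python. It assumes token id 0 is reserved for padding; the helper names `pad_batch` and `masked_mean` and the token ids are made up for illustration. The sketch covers only padding and masking, not the LSTM itself.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad token-id sequences to the longest length in the batch.

    Returns the padded batch and a mask with 1 for real tokens, 0 for padding.
    """
    max_len = max(len(s) for s in sequences)
    padded = [s + [pad_id] * (max_len - len(s)) for s in sequences]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return padded, mask

def masked_mean(values, mask):
    """Average only the positions where the mask is 1."""
    kept = [v for v, m in zip(values, mask) if m]
    return sum(kept) / len(kept)

batch = [[5, 2, 9], [7], [3, 4]]  # made-up token ids; 0 is reserved for padding
padded, mask = pad_batch(batch)
```

Without the mask, the zeros added to the short sequences would be treated as real input. In Keras, setting `mask_zero=True` on an `Embedding` layer is one way to pass such a mask on to recurrent layers like `LSTM`, so that padded steps are skipped.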
Complete the sandboxed Jupyter Notebook to discover how to generate text using RNNs (Recurrent Neural Networks). In this final section of the workshop, you'll cover the following:
- Build character vocabulary with tokenization
- Train an RNN to generate titles
- Produce output with a custom decoding function
- Sample generated strings during training to check for correctness
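The sketch below shows, in plain Python, the two pieces that surround the RNN in this section: a character vocabulary and a decoding function that picks the next character from the network's output scores. The logits here are made-up numbers standing in for a model's output, and `build_char_vocab`, `softmax`, `sample_id`, and `greedy_id` are illustrative names rather than the notebook's own code.

```python
import math
import random

def build_char_vocab(texts):
    """Assign an integer id to every distinct character; id 0 is kept for padding."""
    chars = sorted(set("".join(texts)))
    char_to_id = {c: i + 1 for i, c in enumerate(chars)}
    id_to_char = {i: c for c, i in char_to_id.items()}
    return char_to_id, id_to_char

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_id(logits, temperature=1.0, rng=random):
    """Draw one index at random from the softmax distribution over logits."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def greedy_id(logits):
    """Always pick the index with the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])

char_to_id, id_to_char = build_char_vocab(["abc"])
encoded = [char_to_id[c] for c in "cab"]
decoded = "".join(id_to_char[i] for i in encoded)
```

Greedy decoding tends to produce repetitive text, while sampling with a temperature trades some correctness for variety. Printing a few sampled strings during training is a quick way to check whether the model is improving.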
Verify your knowledge with a short quiz.
There are other Learn modules for TensorFlow, grouped in the TensorFlow fundamentals learning path.
In this workshop, you used pre-trained models, which may yield limited results. Try using other data sources to train your own model. What can you discover?
Be sure to give feedback about this workshop!