The Extract Text Features custom step enables SAS Studio users to extract additional features from text fields. The step creates a varying number of features depending on the options selected: at a minimum it generates eight features, but it can generate hundreds. This custom step covers topics around NLP (Natural Language Processing).
The generated features range from simple ones, such as the number of exclamation points in the text, to extracted sentiment and text topics.
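To make the idea of a "simple feature" concrete, here is a minimal Python sketch that counts a few surface properties of a text value. It is an illustration only, not the step's actual implementation: the function name `simple_text_features` and the feature names in the returned dictionary are hypothetical and do not correspond to the columns the step generates.

```python
def simple_text_features(text: str) -> dict:
    """Compute a few illustrative surface features for one text value.

    The feature names here are hypothetical examples, not the
    column names produced by the custom step.
    """
    words = text.split()  # split on any whitespace
    return {
        "char_count": len(text),
        "word_count": len(words),
        "exclamation_count": text.count("!"),
        "question_count": text.count("?"),
    }


features = simple_text_features("Great product! Would buy again!")
print(features)
```

Features like sentiment or text topics cannot be computed this way; they rely on the Advanced Text Analytics functionality described below.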
For more information about the Advanced Text Analytics features, please refer to the SAS Documentation.
A full walkthrough of this custom step can be found in this YouTube video.
SAS Viya 2022.10 or later.
To gather Link Metadata, the Compute Context must have access to a Python runtime with at least the following three packages: pandas, requests, and bs4 (installed from PyPI as beautifulsoup4).
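One way to confirm that the Python runtime meets this requirement is to check whether the three modules can be found before running the step. The sketch below uses only the standard library; the helper name `find_missing_modules` is made up for this example, and it assumes you run the check in the same Python environment that the Compute Context uses.

```python
import importlib.util

# Import names of the packages the Link Metadata option needs.
# Note: bs4 is the import name, not the PyPI package name.
REQUIRED_MODULES = ["pandas", "requests", "bs4"]


def find_missing_modules(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing Python packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

The check only looks the modules up without importing them, so it is quick and has no side effects.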
The Advanced Text Analytics features require a SAS Visual Text Analytics license.
For a demonstration of how to use this step, please check out this YouTube video. The example flow is included here.
The video uses the train.csv data from this Kaggle Challenge. Downloading the data requires a Kaggle account; for a quick demonstration of how to do this, check this part of the YouTube video.
- Version 1.0 (27NOV2022)
- Initial version