End-to-End Platform Use-Case Application Demos

In This Document

  • Overview
  • Smart Stock Trading
  • Predictive Infrastructure Monitoring
  • Image Recognition
  • Natural Language Processing (NLP)
  • Stream Enrichment

Overview

The demos tutorials directory contains full end-to-end use-case applications that demonstrate how to use the Iguazio Data Science Platform ("the platform") and related tools to address data science requirements for different industries and implementations.

Smart Stock Trading

The stocks demo demonstrates a smart stock-trading application: the application reads stock-exchange data from an internet service into a time-series database (TSDB); uses Twitter to analyze the market sentiment on specific stocks, in real time; and saves the data to a platform NoSQL table, which is used to generate reports and to analyze and visualize the data on a Grafana dashboard.

  • Tweets about the tracked stocks are read from Twitter by using the TwythonStreamer Python wrapper for the Twitter Streaming API, and the data is saved to TSDB and NoSQL tables in the platform.
  • Sentiment analysis is done by using the TextBlob Python library for natural language processing (NLP).
  • The analyzed data is visualized as graphs on a Grafana dashboard, which is created from the Jupyter notebook code. The data is read from both the TSDB and NoSQL stock tables.
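
The per-stock sentiment aggregation described above can be sketched in a few lines. This is an illustrative stand-in for the TextBlob step: the tiny word lexicon, the tweets, and the stock symbols below are made up for the example, not taken from the demo.

```python
# Toy polarity lexicon standing in for TextBlob's sentiment model.
POLARITY = {"great": 1.0, "surge": 0.5, "drop": -0.5, "terrible": -1.0}

def tweet_polarity(text):
    """Return the mean polarity of the known words in a tweet (0.0 if none match)."""
    scores = [POLARITY[w] for w in text.lower().split() if w in POLARITY]
    return sum(scores) / len(scores) if scores else 0.0

def sentiment_by_symbol(tweets):
    """Aggregate tweet polarity per stock symbol, as the demo stores per-stock sentiment."""
    totals = {}
    for symbol, text in tweets:
        totals.setdefault(symbol, []).append(tweet_polarity(text))
    return {s: sum(v) / len(v) for s, v in totals.items()}

tweets = [
    ("AAPL", "great earnings and a surge in demand"),
    ("AAPL", "terrible supply news"),
    ("GOOG", "shares drop after report"),
]
print(sentiment_by_symbol(tweets))
```

In the demo, the aggregated sentiment values would then be written to the TSDB and NoSQL tables that feed the Grafana dashboard.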

Predictive Infrastructure Monitoring

The netops demo demonstrates predictive infrastructure monitoring: the application builds, trains, and deploys a machine-learning model for analyzing and predicting failure in network devices as part of a network operations (NetOps) flow. The goal is to identify anomalies in device metrics — such as CPU utilization, memory consumption, or temperature — that can signify an upcoming issue or failure.

  • The model training is sped up by using the Dask Python library for parallel computing, which provides a parallel DataFrame that mirrors the pandas DataFrame API.
  • The model prediction is done by using open-source Python libraries — including scikit-learn (a.k.a. sklearn), SciPy, and NumPy — and the gradient boosting ML technique.
  • The data is generated by using an open-source generator tool that was written by Iguazio. This generator enables users to customize the metrics, data range, and many other parameters, and prepare a data set that's suitable for other similar use cases.
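
The shape of the anomaly-detection task can be illustrated with a simple z-score rule over one metric. This is a simplified stand-in for the demo's gradient-boosting model, and the CPU readings below are hypothetical:

```python
# Flag device-metric samples that deviate strongly from the series mean —
# a simplified stand-in for the trained gradient-boosting predictor.
from statistics import mean, stdev

def anomalies(samples, threshold=2.0):
    """Return the indices of samples whose z-score exceeds the threshold."""
    mu, sigma = mean(samples), stdev(samples)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(samples) if abs(x - mu) / sigma > threshold]

# Hypothetical CPU-utilization readings for one device (percent).
cpu = [31, 29, 30, 32, 28, 30, 31, 95, 29, 30]
print(anomalies(cpu))  # the spike at index 7 is flagged
```

A trained model generalizes this idea across many metrics and devices at once, which is where the generated data set and Dask-based parallelism come in.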

Image Recognition

The image-classification demo demonstrates image recognition: the application builds and trains an ML model that identifies (recognizes) and classifies images.

  • The data is collected by downloading images of dogs and cats from the Iguazio sample data-set AWS bucket.
  • The training data for the ML model is prepared by using pandas DataFrames to build a prediction map. The data is visualized by using the Matplotlib Python library.
  • An image recognition and classification ML model that identifies the animal type is built and trained by using Keras, TensorFlow, and scikit-learn (a.k.a. sklearn).
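
The label-preparation step can be sketched as a map from image file names to one-hot class vectors, following the common dogs-vs-cats layout ("dog.1.jpg", "cat.1.jpg"). The file names below are hypothetical, and the demo itself uses pandas DataFrames rather than plain dicts:

```python
# One-hot class vectors for the two animal types in the data set.
LABELS = {"dog": [1, 0], "cat": [0, 1]}

def label_for(filename):
    """Map an image file name such as 'dog.1.jpg' to its one-hot class vector."""
    animal = filename.split(".")[0].lower()
    return LABELS[animal]

def prediction_map(filenames):
    """Build the file-name -> one-hot-label map used to train the classifier."""
    return {name: label_for(name) for name in filenames}

files = ["dog.1.jpg", "cat.1.jpg", "dog.2.jpg"]
print(prediction_map(files))
```

The resulting labels, paired with the decoded image tensors, are what the Keras/TensorFlow model is trained on.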

Natural Language Processing (NLP)

The nlp demo demonstrates natural language processing (NLP): the application processes natural-language textual data — including spelling correction and sentiment analysis — and generates a Nuclio serverless function that translates any given text string to another (configurable) language.

  • The textual data is collected and processed by using the TextBlob Python NLP library. The processing includes spelling correction and sentiment analysis.
  • The Nuclio framework is used to generate a serverless function that translates text to a target language, which is set in an environment variable.
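
A minimal sketch of such a function is shown below. The handler(context, event) signature is Nuclio's standard Python handler shape; the TARGET_LANG variable name and the toy word table are assumptions for illustration — the demo performs real translation rather than a dictionary lookup.

```python
import os

# Toy word table standing in for a real translation service (assumption).
TOY_TRANSLATIONS = {"es": {"hello": "hola", "world": "mundo"}}

def handler(context, event):
    """Translate the text in the event body to the language set in TARGET_LANG."""
    target = os.getenv("TARGET_LANG", "es")
    table = TOY_TRANSLATIONS.get(target, {})
    words = event.body.decode("utf-8").lower().split()
    return " ".join(table.get(w, w) for w in words)
```

When deployed, Nuclio invokes the handler once per trigger event, so configuring the target language via an environment variable keeps the function itself stateless.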

Stream Enrichment

The stream-enrich demo demonstrates a typical stream-based data-engineering pipeline, which is required in many real-world scenarios: data is streamed from an event streaming engine; the data is enriched, in real time, using data from a NoSQL table; the enriched data is saved to an output data stream and then consumed from this stream.

  • Car-owner data is streamed into the platform from a simulated streaming engine by using an event-triggered Nuclio serverless function.
  • The data is written (ingested) into an input platform stream by using the platform's Streaming Web API.
  • The input stream data is enriched with additional data, such as the car's color and vendor, and the enriched data is saved to a NoSQL table by using the platform's NoSQL Web API.
  • The Nuclio function writes the enriched data to an output platform data stream by using the platform's Streaming Web API.
  • The enriched data is read (consumed) from the output stream by using the platform's Streaming Web API.
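
The enrichment step at the heart of this pipeline is a lookup-and-merge per streamed record. The sketch below uses an in-memory dict to stand in for the platform's NoSQL table; the record fields and table contents are made up for the example:

```python
# In-memory stand-in for the NoSQL table keyed by car ID (assumption).
CAR_TABLE = {
    "car-1": {"color": "red", "vendor": "Acme"},
    "car-2": {"color": "blue", "vendor": "Bolt"},
}

def enrich(record, table=CAR_TABLE):
    """Merge a streamed car-owner record with the static attributes for its car ID."""
    extra = table.get(record["car_id"], {})
    return {**record, **extra}

# Hypothetical records as they might arrive on the input stream.
stream = [
    {"owner": "Dana", "car_id": "car-1"},
    {"owner": "Lee", "car_id": "car-2"},
]
print([enrich(r) for r in stream])
```

In the demo, the same merge happens inside the event-triggered Nuclio function, with the lookup served by the NoSQL Web API and the merged records written back out via the Streaming Web API.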