🤖 Real time inference with GPMA #65
What is real-time inference?

First, a deep neural network (DNN) model designed for the problem domain and the available data is trained, usually on a GPU or a high-performance CPU cluster, for anywhere from tens of hours to a few weeks. It is then deployed into a production environment, where it takes in a continuous stream of input data and runs inference in real time, yielding output that is either used directly as the end result or fed into downstream systems. Either way, applications with ever stricter latency requirements, driverless cars and search engines for instance, demand lightning-fast deep learning inference, usually within tens of milliseconds per sample. Thus, beyond academia's typical focus on faster training, industry is often more concerned with faster inference, which has put inference acceleration at the core of many hardware and software solutions, including the emerging class of cloud services known as MLaaS (Machine Learning as a Service). (See: "Difference between Deep Learning Training and Inference".)
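As a rough illustration of this pattern, here is a minimal Python sketch of a streaming inference loop with a per-sample latency check. The PyTorch toy model, the 128-dimensional input, and the 10 ms budget are illustrative assumptions, not part of the discussion above:

```python
import time
import torch
import torch.nn as nn

# Toy stand-in for a trained DNN (illustrative; any trained model fits here).
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

LATENCY_BUDGET_MS = 10.0  # assumed per-sample budget ("tens of milliseconds")

def stream_of_inputs(n=100):
    """Simulate a continuous stream of input samples."""
    for _ in range(n):
        yield torch.randn(1, 128)

with torch.no_grad():  # inference only: skip autograd bookkeeping
    for sample in stream_of_inputs():
        start = time.perf_counter()
        output = model(sample)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if elapsed_ms > LATENCY_BUDGET_MS:
            print(f"latency budget exceeded: {elapsed_ms:.2f} ms")
        # `output` is used directly or fed into downstream systems
```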
We are exploring ways to integrate GPMA with Seastar, which could make it very useful: the goal is fast real-time inference of GNN models on dynamic graphs, with minimal latency and fast update times. A rough sketch of the intended update-then-infer loop is below.
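This is only a sketch under stated assumptions: `DynamicGraphStub` and `gnn_layer` are hypothetical stand-ins (a real GPMA keeps edges in a packed memory array on the GPU for cheap batched updates, and Seastar has its own vertex-centric API), used here just to show the control flow of applying streamed edge updates and then running inference on the fresh graph:

```python
import torch

class DynamicGraphStub:
    """Hypothetical stand-in for a GPMA-backed dynamic graph.

    A real GPMA would apply batched edge insertions in parallel on the GPU;
    here a dense adjacency matrix keeps the sketch runnable.
    """

    def __init__(self, num_nodes):
        self.adj = torch.zeros(num_nodes, num_nodes)

    def insert_edges(self, edges):
        # Batched insert: GPMA's fast update path.
        src, dst = edges[:, 0], edges[:, 1]
        self.adj[src, dst] = 1.0


def gnn_layer(adj, features, weight):
    """One mean-aggregation GNN layer (illustrative, not Seastar's API)."""
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
    return torch.relu((adj @ features / deg) @ weight)


num_nodes, feat_dim = 1000, 64
graph = DynamicGraphStub(num_nodes)
features = torch.randn(num_nodes, feat_dim)
weight = torch.randn(feat_dim, feat_dim)

# The envisioned loop: apply a batch of streamed edge updates, then run
# GNN inference on the freshly updated graph with minimal latency.
for step in range(5):
    update_batch = torch.randint(0, num_nodes, (256, 2))  # streamed edges
    graph.insert_edges(update_batch)                      # fast update (GPMA)
    embeddings = gnn_layer(graph.adj, features, weight)   # inference (GNN)
```

The point of the design is that updates and inference share one GPU-resident graph structure, so there is no host-device copy between an edge batch arriving and the next inference pass.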