A high-throughput and memory-efficient inference and serving engine for LLMs
Large-scale LLM inference engine
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
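A minimal sketch of launching a job through SkyPilot's Python API, assuming `pip install skypilot` and configured cloud credentials; the accelerator string "tpu-v2-8" and the cluster name are illustrative.

```python
# Sketch: provision a cluster and run a task with SkyPilot (illustrative values).
import sky

task = sky.Task(
    name="hello-tpu",
    setup="pip install torch",  # runs once when the cluster is provisioned
    run="python -c 'print(\"hello from the cloud\")'",
)
# Accelerator string is an assumption; adjust to what your account/region offers.
task.set_resources(sky.Resources(accelerators="tpu-v2-8"))

# Provision (or reuse) a cluster and run the task on it.
sky.launch(task, cluster_name="demo-cluster")
```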
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (GPUs in the future; PRs welcome).
A native Mac App for Troplo's TPU made with SwiftUI
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
DECIMER Image Transformer is a deep-learning-based tool designed for automated recognition of chemical structure images. Leveraging transformer architectures, the model converts chemical images into SMILES strings, enabling the digitization of chemical data from scanned documents, literature, and patents.
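A hedged sketch of converting a structure image to a SMILES string with DECIMER, assuming `pip install decimer` and that the package exposes a `predict_SMILES` function taking an image path (as shown in the project README); the input filename is hypothetical.

```python
# Sketch: chemical structure image -> SMILES string with DECIMER (assumed API).
from DECIMER import predict_SMILES

smiles = predict_SMILES("caffeine_structure.png")  # hypothetical input image
print(smiles)  # e.g. "CN1C=NC2=C1C(=O)N(C(=O)N2C)C" for caffeine
```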
cBLUE is a tool to calculate the total propagated uncertainty of bathymetric lidar data.
Pre-training GPT-2 model from scratch using GPUs and TPUs.
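A minimal single-device PyTorch/XLA sketch of one GPT-2 pre-training step on a TPU, assuming `torch`, `torch_xla`, and `transformers` are installed on a TPU VM; the random token batch stands in for a real tokenized corpus.

```python
# Sketch: one GPT-2 training step on a single TPU core via PyTorch/XLA.
import torch
import torch_xla.core.xla_model as xm
from transformers import GPT2Config, GPT2LMHeadModel

device = xm.xla_device()  # the TPU core visible to this process
model = GPT2LMHeadModel(GPT2Config()).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Dummy batch: 8 sequences of 128 token ids (replace with real data loading).
input_ids = torch.randint(0, 50257, (8, 128), device=device)

outputs = model(input_ids=input_ids, labels=input_ids)  # LM loss via shifted labels
outputs.loss.backward()
xm.optimizer_step(optimizer, barrier=True)  # step optimizer and flush the XLA graph
```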
Everything we actually know about the Apple Neural Engine (ANE)
Differentiable Fluid Dynamics Package
Testing framework for deep learning models (TensorFlow and PyTorch) on Google Cloud hardware accelerators (TPU and GPU)
TPU accelerated traffic lane segmentation engine for your Raspberry Pi
A simple and efficient TPU (Transaction Processing Unit) client for Solana, utilizing the QUIC protocol for data transmission.
Solana TpuClient TypeScript implementation
Everything you want to know about Google Cloud TPU
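A quick, hedged check that a Cloud TPU VM's chips are visible from JAX, assuming `jax[tpu]` is installed on the TPU VM; device counts vary by TPU type.

```python
# Sketch: verify TPU visibility from JAX on a Cloud TPU VM.
import jax

print(jax.devices())       # e.g. a list of TpuDevice objects
print(jax.device_count())  # number of TPU cores visible to this host
```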