Video Summarization | Lip Reading | Object Detection | Semantic Segmentation | Pose Estimation

TVR28/Computer-Vision

Computer Vision

This repository collects computer vision and multimodal AI projects. The projects cover the following areas:

  • Vision Language Models and Multimodal AI
  • Stable Diffusion (Text-To-Image Generation)
  • Video Summarization
  • Object Tracking (Face Tracking & Lip Tracking)
  • Object Detection
  • Semantic Segmentation and Segment Anything
  • Real-Time Pose Estimation

Tools I gained proficiency in while working on these projects:

  • PyTorch
  • TensorFlow
  • OpenCV
  • OpenVINO
  • Transformers
  • OpenAI
  • Diffusers

Multimodal AI Assistant

VidSummAPI: Video Summarization API Using DSNet

Gym Workout Tracker

Background Removal Using Segment Anything

Semantic Segmentation

YOLOv8 Object Detection

Lip Reading: Deep Learning Based Spoken Text Generation from Lip Movement
