Nomono
- Trondheim, Norway
- @iver56
Starred repositories
Course for doing databricks dataops, based on a data mesh monorepo structure
LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check out Label Studio, the open source …
TrOMR: Transformer-based Polyphonic Optical Music Recognition
BreezeWhite / oemer
Forked from meteo-team/oemer
End-to-end Optical Music Recognition (OMR) system. Transcribes a phone-taken music sheet image into MusicXML, which can be edited and converted to MIDI.
This repository is the official implementation of "DisPose: Disentangling Pose Guidance for Controllable Human Image Animation"
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
Improving Source Extraction with Diffusion and Consistency Models
Easy to use stem (e.g. instrumental/vocals) separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)
When it comes to optimizers, it's always better to be safe than sorry
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Magic Maestro 🪄✨🎹 is a DIY gesture-based MIDI controller that uses hand tracking to bring instruments to life in real time. Transform your gestures into dynamic expression and volume control wi…
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate"
Official implementation of "AEROMamba: An efficient architecture for audio super-resolution using generative adversarial networks and state space models", presented at the LAMIR 2024 Workshop
Audio production style transfer with inference-time optimization
Tiny, no-nonsense, self-contained TensorFlow and ONNX inference
Official Implementation of LOTUS: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
A suite of image and video neural tokenizers
Code to accompany "A Method for Animating Children's Drawings of the Human Figure"
First base model for full-duplex conversational audio
The official repo: "McNet: Fuse Multiple Cues for Multichannel Speech Enhancement", ICASSP 2023
This is the repository for the speech enhancement model SyncFormer
PyTorch native quantization and sparsity for training and inference
VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration