Papers

Group Normalization -Kaiming He, et al, arxiv2018
Graph Convolutional Network -Xiaolong Wang, Yufei Ye, Abhinav Gupta, CVPR2018
DetNAS: Backbone Search for Object Detection
Mixup

network

CabViT: Cross Attention among Blocks for Vision Transformer -Intellifusion, arxiv2022, code
EfficientFormerV2: Rethinking Vision Transformers for MobileNet Size and Speed -snap, arxiv2022, code
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning -ICLR2022, code
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP -sensetime, ECCV2022, code
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications -arxiv2022, code
EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers -ECCV2022, code
Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios -bytedance, arxiv2022, code
TRT-ViT: TensorRT-oriented Vision Transformer -bytedance, arxiv2022
EfficientFormer: Vision Transformers at MobileNet Speed -snap, arxiv2022, code
UNeXt: MLP-based Rapid Medical Image Segmentation Network -arxiv2022, code
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation -tencent, CVPR2022, code
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer -apple, ICLR2022, code
PP-LCNet: A Lightweight CPU Convolutional Neural Network -baidu, arxiv2022
Metaformer is actually what you need for vision -Yanshuicheng, CVPR2022
TinyNet: Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets -huawei, NeurIPS2020
GhostNet: More Features from Cheap Operations -huawei, CVPR2020
EfficientNet
SqueezeNet
Mobilenets -google, arxiv2017
MobileNet-V2 -google, CVPR2018 caffe-code
MobileNetV3
NASNet-A: Learning Transferable Architectures for Scalable Image Recognition -google brain, CoRR2017
ShuffleNet -megvii, CoRR2017
ShuffleNetV2
ThunderNet
DarkNet/Tiny YOLOv3/Tiny YOLOv2/Yolo-Nano/SlimYOLO/YOLO-LITE/Gaussian YOLOv3
LightweightNet: Toward fast and lightweight convolutional neural networks via architecture distillation -XuTingbin, PR2019
Mobilefacenets
EXTD: Extremely Tiny Face Detector via Iterative Filter Reuse
Drop an octave: Reducing spatial redundancy in convolutional neural networks with octave convolution
HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs
Joint Architecture and Knowledge Distillation in Convolutional Neural Network for Offline Handwritten Chinese Text Recognition -dujun, arxiv2019
Compressing CNN-DBLSTM models for OCR with teacher-student learning and Tucker decomposition -huoqiang, PR2019
VoVNet: An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection -CVPRW2019, http://openaccess.thecvf.com/content_CVPRW_2019/papers/CEFRL/Lee_An_Energy_and_GPU-Computation_Efficient_Backbone_Network_for_Real-Time_Object_CVPRW_2019_paper.pdf

model compression

teacher-student/mutual-learning/Self-Distillation
low-rank/SVD-decomposition/Tucker-decomposition/CP-decomposition

InformationExtraction

database

[EPHOIE] - visual information extraction (VIE) in educational documents
[PubLayNet] - pretrain
[RVL-CDIP][IIT-CDIP] - document classification
[FUNSD]
[CORD] - receipt semantic entity extraction
[DocVQA]

knowledge distillation

Decoupled Knowledge Distillation -megvii, CVPR2022, code
Efficient knowledge distillation for rnn-transducer models -google/facebook, ICASSP2021
Investigation of Sequence-level Knowledge Distillation Methods for CTC Acoustic Models -NICT japan, ICASSP2019
Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation -IBM, Interspeech2019
Explaining sequence-level knowledge distillation as data-augmentation for neural machine translation -arxiv2019
Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion -microsoft, Interspeech2019
Knowledge Distillation for Sequence Model -AISpeech, Interspeech2018
Improved knowledge distillation from bi-directional to uni-directional LSTM CTC for end-to-end speech recognition -IBM, SLT2018
An Investigation of a Knowledge Distillation Method for CTC Acoustic Models -NICT japan, ICASSP2018
Sequence-Level Knowledge Distillation -Yoon Kim, EMNLP2016

Document Enhancement

Document Rectification

MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary -baidu, arxiv2023
Deep Unrestricted Document Image Rectification -arxiv2023, code
End-to-End Piece-Wise Unwarping of Document Images -amazon, ICCV2021, code
Geometric Representation Learning for Document Image Rectification -ECCV2022, code
Marior: Margin Removal and Iterative Content Rectification for Document Dewarping in the Wild -MM2022, jinlianwen, code
Fourier Document Restoration for Robust Document Dewarping and Recognition -CVPR2022, bai song database
Revisiting document image dewarping by grid regularization -alibaba, CVPR2022, code
Learning From Documents in the Wild to Improve Document Unwarping -snap, SIGGRAPH2022, code
DocScanner: Robust Document Image Rectification with Progressive Learning -arxiv2021
DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction -MM2021, code
Document Dewarping with Control Points -ICDAR2021, code&dataset
Document Rectification and Illumination Correction using a Patch-based CNN -SIGGRAPH2019, code
Learning to Calibrate Straight Lines for Fisheye Image Rectification -CVPR2019

image alignment/registration

DocAligner: Annotating Real-world Photographic Document Images by Simply Taking Pictures -jinlianwen, arxiv2023, code
Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping -IJDAR2023, code

Inpainting

Inpaint Anything: Segment Anything Meets Image Inpainting -arxiv2023, code
LaMa: Resolution-Robust Large Mask Inpainting With Fourier Convolutions -samsung, WACV2022, code
RePaint: Inpainting Using Denoising Diffusion Probabilistic Models -CVPR2022, code
MAT: Mask-Aware Transformer for Large Hole Image Inpainting -adobe, CVPR2022, code
Incremental Transformer Structure Enhanced Image Inpainting With Masking Positional Encoding -CVPR2022, code
Aggregated Contextual Transformations for High-Resolution Image Inpainting -arxiv2021, code
Free-Form Image Inpainting with Gated Convolution -bytedance, ICCV2019, code

Graph

Joint stroke classification and text line grouping in online handwritten documents with edge pooling attention networks -PR2021
A Comprehensive Survey on Graph Neural Networks -TNN2020
Contextual Stroke Classification in Online Handwritten Documents with Edge Graph Attention Networks -SNCS2020
DeepGCNs: Can GCNs Go as Deep as CNNs? -ICCV2019
Heterogeneous graph attention network -WWW2019
Contextual Stroke Classification in Online Handwritten Documents with Graph Attention Networks -ICDAR2019
Graph Convolutional Networks for Text Classification -AAAI2019
Graph Attention Networks -ICLR2018
Semi-Supervised Classification with Graph Convolutional Networks -ICLR2017
