Papers

depth estimation

AdaBins: Depth Estimation using Adaptive Bins -kaust, CVPR2021, code

layout

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception -shanghaiAILab, arxiv2024, code
M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis -jinlianwen, CVPR2023, dataset
Page segmentation using convolutional neural network and graphical model -Lixiaohui, DAS2020
Printed/Handwritten Texts and Graphics Separation in Complex Documents Using Conditional Random Fields -LiXiaohui, DAS2018

asr

Conformer: Convolution-augmented Transformer for Speech Recognition -google, Interspeech2020, code1,code2
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context -google, Interspeech2020, code
CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency -tsinghua, INTERSPEECH2020, code
Investigation of modeling units for mandarin speech recognition using DFSMN-CTC-sMBR -alibaba, ICASSP2019
Sequence discriminative distributed training of long short-term memory recurrent neural networks -google
Sequence-discriminative training of deep neural networks -INTERSPEECH2013

Contextual Biasing

Improved recognition of contact names in voice commands -google, ICASSP2015
Bringing contextual information to google speech recognition -google, INTERSPEECH2015
Shallow-Fusion End-to-End Contextual Biasing -google, INTERSPEECH2019
Streaming End-to-end Speech Recognition for Mobile Devices -google, ICASSP2019

table detection & recognition

Robust Table Detection and Structure Recognition from Heterogeneous Document Images -huoqiang, arxiv2022
Deep Structured Feature Networks for Table Detection and Tabular Data Extraction from Scanned Financial Document Images -arxiv2021
Guided Table Structure Recognition through Anchor Optimization -arxiv2021
TabAug: Data Driven Augmentation for Enhanced Table Structure Recognition -ICDAR2021
PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Table Image Recognition to Latex -pingan, arxiv2021
LGPMA: Complicated Table Structure Recognition with Local and Global Pyramid Mask Alignment -ICDAR2021
ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX -arxiv2021
TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition -JD, ICCV2021, code
Parsing Table Structures in the Wild -alibaba, ICCV2021, dataset
TNCR: Table Net Detection and Classification Dataset -arxiv2021, dataset
Form2Seq : A Framework for Higher-Order Form Structure Extraction -EMNLP2020
Table Structure Recognition using Top-Down and Bottom-Up Cues -ECCV2020
Tablenet: Deep learning model for end-to-end table detection and tabular data extraction from scanned document images -ICDAR2019
Image-based table recognition: data, model, and evaluation -arxiv2019
Deep Splitting and Merging for Table Structure Decomposition -ICDAR2019
Deepdesrt: Deep learning for detection and structure recognition of tables in document images -ICDAR2017

mathematical expression recognition

MathWriting: A Dataset For Handwritten Mathematical Expression Recognition -google, arxiv2024
CCLSL: Combination of Contrastive Learning and Supervised Learning for Handwritten Mathematical Expression Recognition -ACCV2022
Primitive Contrastive Learning for Handwritten Mathematical Expression Recognition -liuchenglin, ICPR2022
Syntactic data generation for handwritten mathematical expression recognition -nakagawa, PRL2022, code
TDv2: A Novel Tree-Structured Decoder for Offline Mathematical Expression Recognition -dujun, AAAI2022, code
CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition -peking, ECCV2022, code
Syntax-Aware Network for Handwritten Mathematical Expression Recognition -baixiang, CVPR2022, code
When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition -baixiang, ECCV2022, code
Handwritten Mathematical Expression Recognition via Attention Aggregation based Bi-directional Mutual Learning -tencent, AAAI2022, code
Tree-based data augmentation and mutual learning for offline handwritten mathematical expression recognition -Dujun, PR2022
Offline Handwritten Mathematical Expression Recognition via Graph Reasoning Network -yinfei, ACPR2022
Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer -huawei, ICDAR2021, code
An Encoder-Decoder Approach to Handwritten Mathematical Expression Recognition with Multi-head Attention and Stacked Decoder -huoqiang, ICDAR2021
End-to-End Detection and Recognition of Arithmetic Expressions -yinfei, PRCV2021
Accurate Structured-Text Spotting for Arithmetical Exercise Correction -tencent, AAAI2020
Graph-to-graph: towards accurate and interpretable online handwritten mathematical expression recognition -wujinwen, AAAI2021
MRD: A Memory Relation Decoder for Online Handwritten Mathematical Expression Recognition -dujun, ICDAR2021
ICFHR 2020 Competition on Offline Recognition and Spotting of Handwritten Mathematical Expressions - OffRaSHME -Wangdahan, ICFHR2020
EDSL: An Encoder-Decoder Architecture with Symbol-Level Features for Printed Mathematical Expression Recognition -arxiv2020, code,dataset
Improvement of End-to-End Offline Handwritten Mathematical Expression Recognition by Weakly Supervised Learning -ICFHR2020
Stroke Based Posterior Attention for Online Handwritten Mathematical Expression Recognition -dujun, ICPR2022
Improving Attention-Based Handwritten Mathematical Expression Recognition with Scale Augmentation and Drop Attention -Jinlianwen, ICFHR2020
Handwritten mathematical expression recognition via paired adversarial learning -WuJinwen, IJCV2020
Stroke Constrained Attention Network for Online Handwritten Mathematical Expression Recognition -DuJun, arxiv2020, code
SRD: A Tree Structure Based Decoder for Online Handwritten Mathematical Expression Recognition -DuJun, TMM2020, code
A Tree-Structured Decoder for Image-to-Markup Generation -Dujun, ICML2020, code
Multi-modal Attention Network for Handwritten Mathematical Expression Recognition -DuJun, ICDAR2019
Robust Encoder-Decoder Learning Framework towards Offline Handwritten Mathematical Expression Recognition Based on Multi-Scale Deep Neural Network -arxiv2019
Pattern generation strategies for improving recognition of Handwritten Mathematical Expressions -nakagawa, PRL2019
Track, attend, and parse (tap): An end-to-end framework for online handwritten mathematical expression recognition -DuJun, TMM2018, code
Multi-scale attention with dense encoder for handwritten mathematical expression recognition -DuJun, ICPR2018, code
Watch, attend and parse: An end-to-end neural network based approach to handwritten mathematical expression recognition -J Zhang, J Du, S Zhang, D Liu, Y Hu, J Hu, S Wei, PR2017, code, code2
A GRU-Based Encoder-Decoder Approach with Attention for Online Handwritten Mathematical Expression Recognition -Dujun, ICDAR2017, code
Image-to-markup generation with coarse-to-fine attention -Dengyuntian, ICML2017, code
What you get is what you see: A visual markup decompiler -Dengyuntian, arxiv2016, code
Context-aware mathematical expression recognition: An end-to-end framework and a benchmark -Hewenhao, ICPR2016
ICFHR2016 CROHME: Competition on Recognition of Online Handwritten Mathematical Expressions -ICFHR2016
An integrated grammar-based approach for mathematical expression recognition -PR2016

word_vector

Enriching Word Vectors with Subword Information -FAIR, arxiv2016
fastText: Bag of Tricks for Efficient Text Classification -FAIR, arxiv2016
An empirical evaluation of doc2vec with practical insights into document embedding generation -Jey Han Lau, Timothy Baldwin, arxiv2016
TagSpace: Semantic Embeddings from Hashtags -FAIR, EMNLP2014
doc2vec: Distributed Representations of Sentences and Documents -google, ICML2014
word2vec: Efficient estimation of word representations in vector space -google, ICLR2013

Chemical Structure

MolMiner: You only look once for chemical structure recognition - arxiv2022
Robust Molecular Image Recognition: A Graph Generation Approach -MIT, arxiv2022
Image-to-Graph Transformers for Chemical Structure Recognition -Samsung, arxiv2022
DECIMER—hand-drawn molecule images dataset -Germany, 2022
SwinOCSR: end-to-end optical chemical structure recognition using a Swin Transformer -2022
Performance of chemical structure string representations for chemical image recognition using transformers -Germany, 2022
Review of techniques and models used in optical chemical structure recognition in images and scanned documents -2022
Image2SMILES: Transformer-Based Molecular Optical Recognition Engine -2022
A Large-Scale Database for Chemical Structure Recognition and Preliminary Evaluation -liuchenglin, ICPR2022
DECIMER 1.0: deep learning for chemical image recognition using transformers -Germany, 2021
Img2Mol–accurate SMILES recognition from molecular graphical depictions -2021
ChemPix: automated recognition of hand-drawn hydrocarbon structures using deep learning -Stanford, 2021
Open data and algorithms for open science in AI-driven molecular informatics

Seq2Seq

Convolutional Sequence to Sequence Learning -FAIR, arxiv2017
A Convolutional Encoder Model for Neural Machine Translation -FAIR, arxiv2016
Sequence level training with recurrent neural networks -FAIR, ICLR2016

ReID

Alignedreid: Surpassing human-level performance in person re-identification -Face++, arxiv2017

PoseEstimation

Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields -CMU, CVPR2017
AlphaPose

EdgeDetection

Pixel Difference Networks for Efficient Edge Detection -ICCV2021, code
Richer Convolutional Features for Edge Detection -YunLiu, ..., Baixiang, et al., PAMI2019
Deepedge: A multi-scale bifurcated deep network for top-down contour detection -Gedas, CVPR2015

line segmentation

M-LSD: Towards Light-weight and Real-time Line Segment Detection -NAVER, AAAI2022, code
Deep Hough Transform for Semantic Line Detection -PAMI2021, code
Holistically-Attracted Wireframe Parsing -CVPR2020, code
EDlines

video_classification

Learnable pooling with Context Gating for video classification -A. Miech, et al, TPAMI2018, Youtube8M-Competition-Top1
Truly Multi-modal YouTube-8M Video Classification with Video, Audio, and Text -Zhe wang, et al, arxiv2017

dnn_base

Group Normalization -Kaiming He, et al, arxiv2018
Graph Convolutional Network -Xiaolong Wang, Yufei Ye, Abhinav Gupta, CVPR2018
DetNAS: Backbone Search for Object Detection
Mixup

network

CabViT: Cross Attention among Blocks for Vision Transformer -Intellifusion, arxiv2022, code
EfficientFormerV2: Rethinking Vision Transformers for MobileNet Size and Speed -Snap, arxiv2022, code
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning -ICLR2022, code
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP -sensetime, ECCV2022, code
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications -arxiv2022, code
EdgeViTs: Competing light-weight CNNs on mobile devices with vision transformers -ECCV2022, code
Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios -bytedance, arxiv2022, code
TRT-ViT: TensorRT-oriented Vision Transformer -bytedance, arxiv2022
EfficientFormer: Vision Transformers at MobileNet Speed -snap, arxiv2022, code
UNeXt: MLP-based Rapid Medical Image Segmentation Network -arxiv2022, code
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation -tencent, CVPR2022, code
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer -apple, ICLR2022, code
PP-LCNet: A Lightweight CPU Convolutional Neural Network -baidu, arxiv2022
Metaformer is actually what you need for vision -Yanshuicheng, CVPR2022
TinyNet: Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets -huawei, NeurIPS2020
GhostNet: More Features from Cheap Operations -huawei, CVPR2020
EfficientNet
SqueezeNet
Mobilenets -google, arxiv2017
MobileNet-V2 -google, CVPR2018, caffe-code
MobileNetV3
NasNet-A: Learning transferable architectures for scalable image recognition -google brain, CoRR2017
ShuffleNet -megvii, CoRR2017
ShuffleNetV2
ThunderNet
DarkNet/Tiny YOLOv3/Tiny YOLOv2/Yolo-Nano/SlimYOLO/YOLO-LITE/Gaussian YOLOv3
LightweightNet: Toward fast and lightweight convolutional neural networks via architecture distillation -XuTingbin, PR2019
Mobilefacenets
EXTD: Extremely Tiny Face Detector via Iterative Filter Reuse
Drop an octave: Reducing spatial redundancy in convolutional neural networks with octave convolution
HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs
Joint Architecture and Knowledge Distillation in Convolutional Neural Network for Offline Handwritten Chinese Text Recognition -dujun, arxiv2019
Compressing CNN-DBLSTM models for OCR with teacher-student learning and Tucker decomposition -huoqiang, PR2019
VoVNet: An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection -CVPRW2019, http://openaccess.thecvf.com/content_CVPRW_2019/papers/CEFRL/Lee_An_Energy_and_GPU-Computation_Efficient_Backbone_Network_for_Real-Time_Object_CVPRW_2019_paper.pdf

model compression

teacher-student/mutual-learning/Self-Distillation
low-rank/SVD-decomposition/Tucker-decomposition/CP-decomposition

InformationExtraction

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition -alibaba, cvpr2024, code
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding -alibaba, code
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding -bytedance
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding -alibaba, code
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions -AAAI2024, code
Unifying Vision, Text, and Layout for Universal Document Processing -microsoft, cvpr2023, code
Diffusion-based Document Layout Generation -2023
Cogvlm: Visual expert for pretrained language models -tsinghua, code
Unidoc: A universal large multimodal model for simultaneous text detection, recognition, spotting and understanding -bytedance, arxiv2023
Llavar: Enhanced visual instruction tuning for text-rich image understanding -adobe, arxiv2023, code
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models -code
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models -megvii, arxiv2023, code
CogAgent: A Visual Language Model for GUI Agents -tsinghua, arxiv2023, code
Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models -Salesforce, ICML2023, code
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation -Salesforce, arxiv2022, code
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking -microsoft, arxiv2022, code
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding -jinlianwen, ACL2022, code
XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding -ant group, CVPR2022
BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents -naver, AAAI2022, code
OCR-free Document Understanding Transformer -NAVER, ECCV2022, code
Synthetic document generator for annotation-free layout recognition -2021
StructuralLM: Structural Pre-training for Form Understanding -alibaba, ACL2021
Graph-based Deep Generative Modelling for Document Layout Generation -ICDAR2021
DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis -ICDAR2021, code
UniDoc: Unified Pretraining Framework for Document Understanding -adobe, NeurIPS2021
DocFormer: End-to-End Transformer for Document Understanding -amazon, ICCV2021
LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis -ICDAR2021
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer -ICDAR2021
Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution -jinlianwen, AAAI2021
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding -microsoft, arxiv2021
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding -microsoft, ACL2021, code
SelfDoc: Self-Supervised Document Representation Learning -adobe, CVPR2021
READ: Recursive Autoencoders for Document Layout Generation -Amazon, CVPRW2020
LayoutLM: Pre-training of Text and Layout for Document Image Understanding -microsoft, KDD2020, code
TRIE: End-to-End Text Reading and Information Extraction for Document Understanding -hikvision, arxiv2020
Image Generation from Layout -CVPR2019, code

database

EPHOIE - visual information extraction (VIE) in educational documents
[PubLayNet] - pretrain
[RVL-CDIP][IIT-CDIP] - document classification
[FUNSD]
[CORD] - receipt semantic entity extraction
[DocVQA]

knowledge distillation

Decoupled Knowledge Distillation -megvii, CVPR2022, code
Efficient knowledge distillation for rnn-transducer models -google/facebook, ICASSP2021
Investigation of Sequence-level Knowledge Distillation Methods for CTC Acoustic Models -NICT japan, ICASSP2019
Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation -IBM, Interspeech2019
Explaining sequence-level knowledge distillation as data-augmentation for neural machine translation -arxiv2019
Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion -microsoft, Interspeech2019
Knowledge Distillation for Sequence Model -AISpeech, Interspeech2018
Improved knowledge distillation from bi-directional to uni-directional LSTM CTC for end-to-end speech recognition -IBM, SLT2018
An Investigation of a Knowledge Distillation Method for CTC Acoustic Models -NICT japan, ICASSP2018
Sequence-Level Knowledge Distillation -Yoon Kim, EMNLP2016

Document Enhancement

Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild -xpixel, arxiv2024, project, code
APISR: Anime Production Inspired Real-World Anime Super-Resolution -cvpr2024, code
Focal Network for Image Restoration -ICCV2023, code
PromptIR: Prompting for All-in-One Blind Image Restoration -code
HQ-50K: A Large-scale, High-quality Dataset for Image Restoration -code&dataset
Learning single image defocus deblurring with misaligned training pairs -pengcheng, AAAI2023, dataset
DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior -arxiv2023, code
A Comprehensive Survey on Deep Neural Image Deblurring -arxiv2023
Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement -tsinghua, ICCV2023, code
High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net -ICCV2023, code
Efficient and Explicit Modelling of Image Hierarchies for Image Restoration -CVPR2023, code
LSDIR: A Large Scale Dataset for Image Restoration -dataset
Learning Generative Structure Prior for Blind Text Image Super-Resolution -CVPR2023, code
Appearance Enhancement for Camera-captured Document Images in the Wild -jinlianwen, TAI2023, code&dataset
Pyramid Ensemble Structure for High Resolution Image Shadow Removal -meituan, CVPRW2023
NTIRE 2023 Image Shadow Removal Challenge Report
Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and Transformer-Based Method -AAAI2023, code
NTIRE 2023 Challenge on Image Denoising: Methods and Results
AsConvSR: Fast and Lightweight Super-Resolution Network With Assembled Convolutions -huawei, CVPRW2023, code
Towards Real-Time 4K Image Super-Resolution -cvprw2023
Lightweight Real-Time Image Super-Resolution Network for 4K Images -cvprw2023, code
Bicubic++: Slim, Slimmer, Slimmest - Designing an Industry-Grade Super-Resolution Network -CVPRW2023
Efficient Deep Models for Real-Time 4K Image Super-Resolution. NTIRE 2023 Benchmark and Report
NTIRE 2023 Challenge on Efficient Super-Resolution: Methods and Results -CVPR2023
U-Shape Transformer for Underwater Image Enhancement -TIP2023, code
DocDiff: Document Enhancement via Residual Diffusion Models -arxiv2023, code
Pyramid Attention Network for Image Restoration -IJCV2023, code
Perceptual Image Enhancement for Smartphone Real-Time Applications -WACV2023, code
FCL-GAN: A Lightweight and Real-Time Baseline for Unsupervised Blind Image Deblurring -MM2022
Practical Blind Image Denoising via Swin-Conv-UNet and Data Synthesis -arxiv2022, code
Realistic blur synthesis for learning image deblurring -ECCV2022, code
Deep Image Deblurring: A Survey -IJCV2022
Beyond Monocular Deraining: Parallel Stereo Deraining Network Via Semantic Prior -tencent, IJCV2022, code
Learning Degradation Representations for Image Deblurring -ECCV2022, code
Stripformer: Strip transformer for fast image deblurring -ECCV2022, tsinghua-tw, code
Image De-raining Transformer -PAMI2022
C3-stisr: Scene text image super-resolution with triple clues -arxiv2022, bytedance, code
NTIRE 2022 challenge on perceptual image quality assessment
Learning Enriched Features for Fast Image Restoration and Enhancement -PAMI2022, code
Efficient Long-Range Attention Network for Image Super-Resolution -code
Improving Image Restoration by Revisiting Global Information Aggregation -megvii, ECCV2022, code
NAFSSR: Stereo Image Super-Resolution Using NAFNet -megvii, CVPR2022, code
AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos -tencent, NeurIPS2022, code
Simple baselines for image restoration -megvii, ECCV2022, code
Restormer: Efficient Transformer for High-Resolution Image Restoration -google, CVPR2022, code
MAXIM: Multi-Axis MLP for Image Processing -google, CVPR2022, code
Towards Efficient and Scale-Robust Ultra-High-Definition Image Demoireing -TCL, ECCV2022, database/code
Global-Local Stepwise Generative Network for Ultra High-Resolution Image Restoration -arxiv2022
Breaking down Polyblur: Fast Blind Correction of Small Anisotropic Blurs -google, 2022, code
Real-esrgan: Training real-world blind super-resolution with pure synthetic data -tencent, ICCV2021, code
Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices -alibaba, ACMMM2021, code
SplitSR: An End-to-End Approach to Super-Resolution on Mobile Devices -arxiv2021, code
Extremely Lightweight Quantization Robust Real-Time Single-Image Super Resolution for Mobile Devices -CVPR2021, code
Rethinking Coarse-to-Fine Approach in Single Image Deblurring -korea, ICCV2021, code
A Survey on Deep learning based Document Image Enhancement -arxiv2021
Iterative filter adaptive network for single image defocus deblurring -POSTECH, CVPR2021, RealDOF dataset
NTIRE 2021 challenge for defocus deblurring using dual-pixel images: Methods and results -CVPR2021, code
Contrastive Learning for Compact Single Image Dehazing -tencent, CVPR2021, code
Towards Flexible Blind JPEG Artifacts Removal -ICCV2021, code
Hinet: Half instance normalization network for image restoration -megvii, code
Multi-Stage Progressive Image Restoration -google, CVPR2021, code
Learning frequency domain priors for image demoireing -PAMI2021, code
Moiré Attack (MA): A New Potential Risk of Screen Photos -NeurIPS2021, code
High-resolution photorealistic image translation in real-time: A laplacian pyramid translation network -alibaba, CVPR2021, code
WDNet: Watermark-Decomposition Network for Visible Watermark Removal -baixiang, WACV2021, database/code
Image demoireing with learnable bandpass filters -CVPR2020, code
High Resolution Demoire Network -ICIP2020, code
BEDSR-Net: A Deep Shadow Removal Network From a Single Document Image -CVPR2020, code
DE-GAN: A Conditional Generative Adversarial Network for Document Enhancement -PAMI2020, code
Deblurring by Realistic Blurring -tencent, CVPR2020, code
DeblurGAN-v2: Deblurring (Orders-of-Magnitude) Faster and Better -softserve, ICCV2019, code
Deep Bilateral Learning for Real-Time Image Enhancement -google, Siggraph2017, code

Document Rectification

MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary -baidu, arxiv2023
Deep Unrestricted Document Image Rectification -arxiv2023, code
End-to-End Piece-Wise Unwarping of Document Images -amazon, ICCV2021, code
Geometric Representation Learning for Document Image Rectification -ECCV2022, code
Marior: Margin Removal and Iterative Content Rectification for Document Dewarping in the Wild -MM2022, jinlianwen, code
Fourier Document Restoration for Robust Document Dewarping and Recognition -CVPR2022, bai song database
Revisiting document image dewarping by grid regularization -alibaba, CVPR2022, code
Learning From Documents in the Wild to Improve Document Unwarping -snap, SIGGRAPH2022, code
DocScanner: Robust Document Image Rectification with Progressive Learning -arxiv2021
Doctr: Document image transformer for geometric unwarping and illumination correction -MM2021, code
Document Dewarping with Control Points -ICDAR2021, code&dataset
Document Rectification and Illumination Correction using a Patch-based CNN -SIGGRAPH2019, code
Learning to Calibrate Straight Lines for Fisheye Image Rectification -CVPR2019

image alignment/registration

DocAligner: Annotating Real-world Photographic Document Images by Simply Taking Pictures -jinlianwen, arxiv2023, code
Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping -IJDAR2023, code

Inpainting

Inpaint Anything: Segment Anything Meets Image Inpainting -arxiv2023, code
LAMA: Resolution-Robust Large Mask Inpainting With Fourier Convolutions -samsung, WACV2022, code
RePaint: Inpainting Using Denoising Diffusion Probabilistic Models -CVPR2022, code
MAT: Mask-Aware Transformer for Large Hole Image Inpainting -adobe, CVPR2022, code
Incremental Transformer Structure Enhanced Image Inpainting With Masking Positional Encoding -CVPR2022, code
Aggregated Contextual Transformations for High-Resolution Image Inpainting -arxiv2021, code
Free-Form Image Inpainting with Gated Convolution -bytedance, ICCV2019, code

Graph

Joint stroke classification and text line grouping in online handwritten documents with edge pooling attention networks -PR2021
A Comprehensive Survey on Graph Neural Networks -TNN2020
Contextual Stroke Classification in Online Handwritten Documents with Edge Graph Attention Networks -SNCS2020
Deepgcns: Can gcns go as deep as cnns? -ICCV2019
Heterogeneous graph attention network -WWW2019
Contextual Stroke Classification in Online Handwritten Documents with Graph Attention Networks -ICDAR2019
Graph Convolutional Networks for Text Classification -AAAI2019
Graph Attention Networks -ICLR2018
Semi-Supervised Classification with Graph Convolutional Networks -ICLR2017