Kevin Qinghong Lin, Linjie Li, Difei Gao, Qinchen Wu, Mingyi Yan, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou
- [2024.6] We released the arXiv paper.
- [2024.9] Accepted by NeurIPS 2024 D&B.
- [2024.10] We released the data as a Hugging Face dataset (a minimal loading sketch is shown below). Please stay tuned for further updates.
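For quick reference, the snippet below sketches how the released data can be pulled locally with the Hugging Face tooling. The repo id `showlab/VideoGUI` is an assumption; please check the dataset page for the exact name.

```python
# Minimal sketch for downloading the released data with the Hugging Face Hub client.
# The repo id below is an assumption -- substitute the exact dataset name from our
# Hugging Face page.
from huggingface_hub import snapshot_download

# Download the full dataset repo (annotations, media files, etc.) to a local folder.
local_dir = snapshot_download(repo_id="showlab/VideoGUI", repo_type="dataset")
print(local_dir)

# If the repo exposes structured splits, `datasets.load_dataset("showlab/VideoGUI")`
# may also work as an alternative.
```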
TL;DR: A Multi-modal Benchmark for Visual-centric GUI Automation from Instructional Videos.
Visual-centric software and tasks: VideoGUI focuses on professional and novel software such as Adobe Premiere Pro (PR) and After Effects (AE) for video editing, and Stable Diffusion and Runway for visual creation. Moreover, the task query emphasizes a visual preview of the target effect rather than textual instructions.
Instructional videos with human demonstrations: We source novel tasks from high-quality instructional videos and have annotators replicate them to reproduce the demonstrated effects.
Hierarchical planning and actions: We provide detailed annotations of planning procedures and recorded actions to enable hierarchical evaluation (an illustrative sketch of this structure follows below).
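To give a sense of what "hierarchical" means here, the sketch below shows one hypothetical annotation record with a task query, a mid-level plan, and low-level recorded actions. All field names and values are illustrative assumptions, not the released schema.

```python
# Illustrative sketch (NOT the actual schema) of a hierarchical annotation:
# a high-level task query, mid-level planning procedures, and low-level
# recorded GUI actions. Every field name and value here is hypothetical.
example_annotation = {
    "task_query": "Reproduce the visual effect shown in the preview image",
    "software": "Adobe Premiere Pro",
    "plan": [  # mid-level planning procedures
        {
            "subtask": "Import the source clip onto the timeline",
            "actions": [  # low-level recorded actions
                {"type": "click", "element": "Import button", "position": [120, 45]},
                {"type": "drag", "from": [300, 500], "to": [640, 820]},
            ],
        },
        {
            "subtask": "Apply the blur effect to the clip",
            "actions": [
                {"type": "click", "element": "Effects panel", "position": [60, 300]},
                {"type": "type", "text": "Gaussian Blur"},
            ],
        },
    ],
}

# Hierarchical evaluation can then score each level separately, e.g. plan
# quality at the subtask level and accuracy at the click/type action level.
print(len(example_annotation["plan"]), "subtasks in this record")
```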