HubLensLLMmeituan/EvoCUA
// archived 2026-04-29
meituan

EvoCUA

AI#LLM#Multimodal#Agent#Computer Use#vLLM
View on GitHub
314

// summary

EvoCUA is a high-performance open-source multimodal model designed for end-to-end computer automation across various desktop applications. It currently holds the top ranking on the OSWorld benchmark and demonstrates superior cross-OS generalization capabilities. Additionally, the model is recognized for its robust safety profile, exhibiting the lowest unintended-behavior rate among leading computer-use agents.

// technical analysis

EvoCUA is a general-purpose multimodal agent designed for computer use, utilizing a novel data synthesis and training methodology to enhance performance across various desktop applications. By achieving state-of-the-art results on the OSWorld benchmark, it addresses the challenge of creating robust, open-source agents capable of executing complex, multi-turn tasks via natural language instructions. The project prioritizes both performance and safety, demonstrating superior robustness against unintended behaviors compared to other leading computer-use agents.

// key highlights

01
Ranks as the #1 open-source model on the OSWorld benchmark with a 56.7% task completion rate.
02
Demonstrates strong zero-shot cross-OS generalization, significantly outperforming base models on the WindowsAgentArena.
03
Features a novel training and data synthesis approach that improves computer-use capabilities without sacrificing general model performance.
04
Provides end-to-end multi-turn automation for common desktop software including Chrome, Excel, PowerPoint, and VSCode.
05
Validated as the safest computer-use agent in an independent study, exhibiting the lowest rate of unintended behaviors.
06
Offers high efficiency by achieving competitive performance with fewer parameters and fewer execution steps than larger models.

// use cases

01
End-to-end multi-turn automation for applications like Chrome, Excel, and VSCode
02
Zero-shot cross-OS control for diverse desktop environments
03
Scalable synthetic experience training for improved computer-use capabilities

// getting started

To begin, clone the repository and install the required dependencies using Python 3.12. Download the model weights from HuggingFace and deploy them using vLLM as an OpenAI-compatible inference server. Finally, configure your environment variables and use the provided evaluation scripts to run tasks within the OSWorld environment.