https://github.com/frdel/agent-zero
https://digirl-agent.github.io/
https://github.com/opendilab/awesome-ui-agents
https://github.com/hyp1231/awesome-llm-powered-agent
https://github.com/skyvern-ai/skyvern
https://github.com/Envedity/DAIA
https://github.com/mem0ai/mem0
https://qinghonglin.github.io/
https://github.com/showlab/Awesome-GUI-Agent
https://github.com/waterhorse1/LLM_Tree_Search
https://github.com/evilsocket/nerve
https://github.com/test-time-training/ttt-lm-pytorch
https://github.com/stanfordnlp/dspy
https://www.builder.io/blog/micro-agent
Search for site:github.com <computer agent benchmark name> agent to find a bunch of new computer agent frameworks.
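The search trick above can be scripted so that each benchmark name becomes a ready-to-open query. This is a minimal sketch: the benchmark names are taken from links in these notes, and the DuckDuckGo URL prefix is an assumption (any engine that understands the site: operator works).

```python
from urllib.parse import quote_plus

# Benchmark names picked from the links in these notes; extend as needed.
BENCHMARKS = ["Mind2Web", "GUI-Odyssey", "OpenEQA"]

def github_agent_query(benchmark: str) -> str:
    """Build the 'site:github.com <benchmark> agent' query string."""
    return f"site:github.com {benchmark} agent"

def search_url(query: str, engine: str = "https://duckduckgo.com/?q=") -> str:
    """URL-encode the query for a web search engine (engine prefix is an assumption)."""
    return engine + quote_plus(query)

if __name__ == "__main__":
    for name in BENCHMARKS:
        print(search_url(github_agent_query(name)))
```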
https://github.com/posgnu/rci-agent
https://github.com/stanfordnlp/wge
https://github.com/ServiceNow/BrowserGym
https://lmql.ai LMQL, a query language for LLMs
neural network generation/neural developmental programs
https://arxiv.org/abs/2406.09787
niuzaisheng/ScreenAgent: ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model
tmgthb/Autonomous-Agents: Autonomous Agents (LLMs) research papers. Updated Daily.
ltzheng/Synapse: [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control
SkyworkAI/agent-studio: Benchmarks, environments, and toolkits for general computer agents
landing-ai/vision-agent: Vision agent
smartcomputer-ai/agent-os: Build autonomous AI agents! 🌞
idosal/AgentLLM: AgentLLM is a PoC for browser-native autonomous agents
posgnu/rci-agent: A codebase for “Language Models can Solve Computer Tasks”
khulnasoft/gpt-computer-agent: GPT4 for windows, macos and ubuntu
TheDuckAI/DuckTrack: Multimodal computer agent data collection program
X-PLUG/MobileAgent: Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
OSU-NLP-Group/Mind2Web: [NeurIPS’23 Spotlight] “Mind2Web: Towards a Generalist Agent for the Web”
microsoft/autogen: A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
richardyc/Chrome-GPT: An AutoGPT agent that controls Chrome on your desktop
Mass Copy URLs: copy all URLs on all tabs
openeqa: embodied agent question answering benchmark
https://github.com/facebookresearch/open-eqa
GUI automation agents and datasets:
https://github.com/OpenGVLab/GUI-Odyssey (cross-app tasks)
https://github.com/TransformerOptimus/AutoNode
https://huggingface.co/datasets/SuperAGI/GUIDE/
https://huggingface.co/SuperAGI/SAM
AI pipeline orchestration:
https://github.com/instill-ai/instill-core
https://github.com/ComposioHQ/composio/ (with GUI agent)
GUI dataset annotation can be done manually or with a multimodal LLM.
For questions whose answers require code execution, validate the answer by actually running the code.
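The validation note above can be sketched as: run the snippet that produces the answer in a fresh interpreter and compare its printed output with the annotated answer. This assumes each question ships with a Python snippet whose stdout is the answer, and that an exact string match after stripping whitespace is acceptable; both are assumptions. Note that a subprocess is not a security sandbox, so untrusted code needs real isolation.

```python
import subprocess
import sys

def run_snippet(code: str, timeout: float = 10.0) -> str:
    """Run a Python snippet in a fresh interpreter and return its stripped stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.strip()

def validate_answer(code: str, claimed: str) -> bool:
    """True if the snippet's printed output matches the claimed answer exactly."""
    try:
        return run_snippet(code) == claimed.strip()
    except (RuntimeError, subprocess.TimeoutExpired):
        # Crashing or hanging code cannot confirm the answer.
        return False
```

For example, validate_answer("print(sum(range(10)))", "45") holds, while a mislabeled answer such as "1000" for print(2**10) is rejected.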
YOLO GUI element identification:
https://github.com/rahulkundelwalll/YOLOv8-Web-Element-Recognition-Model
https://huggingface.co/foduucom/web-form-ui-field-detection
https://github.com/js0nwu/webui
https://huggingface.co/docs/transformers/model_doc/pix2struct
https://github.com/google-research/pix2struct
https://github.com/M3SOulu/WinGUICrawler
https://huggingface.co/datasets/yiye2023/GUIEnv
https://huggingface.co/datasets/yiye2023/GUIAct
https://huggingface.co/SiyuanH/GUIAgent
https://huggingface.co/datasets/SiyuanH/GUIAgent
https://huggingface.co/SiyuanH/GUIAgent-InternLM7B
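The YOLO-style GUI element detectors listed above typically emit overlapping candidate boxes, and an agent needs one box per element plus a point to click. Below is a minimal pure-Python sketch of that post-processing, using confidence filtering, greedy per-class non-maximum suppression and the box center as the click target. It assumes detections have already been parsed into (x1, y1, x2, y2, score, label) tuples in pixels; the 0.25 and 0.5 thresholds are illustrative, not values from these repos. Many YOLO runtimes already apply NMS internally, so this mainly matters for raw model outputs.

```python
from typing import List, Tuple

# (x1, y1, x2, y2, score, label) in pixel coordinates; this layout is an assumption.
Box = Tuple[float, float, float, float, float, str]

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes: List[Box], conf_thresh: float = 0.25, iou_thresh: float = 0.5) -> List[Box]:
    """Greedy per-class non-maximum suppression over confident detections."""
    candidates = sorted(
        (b for b in boxes if b[4] >= conf_thresh), key=lambda b: b[4], reverse=True
    )
    kept: List[Box] = []
    for box in candidates:
        # Keep a box unless a higher-scoring box of the same class overlaps it too much.
        if all(k[5] != box[5] or iou(k, box) < iou_thresh for k in kept):
            kept.append(box)
    return kept

def click_point(box: Box) -> Tuple[int, int]:
    """Center of a detected element, rounded to integer pixels for clicking."""
    return (round((box[0] + box[2]) / 2), round((box[1] + box[3]) / 2))
```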
Not every repo has official documentation.
git clone https://github.com/opendilab/LightZero