2023-12-25
Cybergod Related Projects

https://github.com/frdel/agent-zero


https://digirl-agent.github.io/

https://github.com/opendilab/awesome-ui-agents

https://github.com/hyp1231/awesome-llm-powered-agent

https://github.com/skyvern-ai/skyvern


https://github.com/Envedity/DAIA


https://github.com/mem0ai/mem0


https://qinghonglin.github.io/

https://github.com/showlab/Awesome-GUI-Agent

https://github.com/waterhorse1/LLM_Tree_Search

https://github.com/evilsocket/nerve

https://www.superjoin.ai

https://github.com/test-time-training/ttt-lm-pytorch

https://github.com/stanfordnlp/dspy

https://www.builder.io/blog/micro-agent


Search for "site:github.com <computer agent benchmark name> agent" to find more computer agent frameworks.
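The search tip above can be scripted when there are several benchmarks to check. The following is a minimal sketch, not a definitive tool: the function names `build_query` and `build_query_params` are hypothetical, and the benchmark names are only examples taken from projects listed on this page. It builds the query strings and a URL-encoded `q=` parameter that can be appended to whichever search engine URL you use.

```python
from urllib.parse import urlencode


def build_query(benchmark_name: str) -> str:
    """Build the search string 'site:github.com <benchmark> agent'."""
    return f"site:github.com {benchmark_name.strip()} agent"


def build_query_params(benchmark_name: str) -> str:
    """URL-encode the query as a 'q=...' parameter (search engine host is left to the user)."""
    return urlencode({"q": build_query(benchmark_name)})


if __name__ == "__main__":
    # Example benchmark/environment names; replace with the ones you care about.
    for name in ["Mind2Web", "BrowserGym", "GUI-Odyssey"]:
        print(build_query(name))
        print(build_query_params(name))
```

Each benchmark name tends to surface a different set of repositories, so looping over a list is a cheap way to widen the search.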

https://github.com/posgnu/rci-agent

https://github.com/stanfordnlp/wge

https://github.com/ServiceNow/BrowserGym


https://lmql.ai (LLM query language)


neural network generation/neural developmental programs

https://arxiv.org/abs/2406.09787

James4Ever0/agi_computer_control: Autonomous computer program that can do anything without human operators.

niuzaisheng/ScreenAgent: ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model

tmgthb/Autonomous-Agents: Autonomous Agents (LLMs) research papers. Updated Daily.

ltzheng/Synapse: [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control

SkyworkAI/agent-studio: Benchmarks, environments, and toolkits for general computer agents

OS-Copilot/OS-Copilot: A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks.

landing-ai/vision-agent: Vision agent

smartcomputer-ai/agent-os: Build autonomous AI agents! 🌞

idosal/AgentLLM: AgentLLM is a PoC for browser-native autonomous agents

posgnu/rci-agent: A codebase for “Language Models can Solve Computer Tasks”

khulnasoft/gpt-computer-agent: GPT4 for windows, macos and ubuntu

TheDuckAI/DuckTrack: Multimodal computer agent data collection program

X-PLUG/MobileAgent: Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

mnotgod96/AppAgent: AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.

stableagents/stableagents: Stable, Semi-Autonomous, Reliable and Steerable LLM Agents for production use cases.

OSU-NLP-Group/Mind2Web: [NeurIPS’23 Spotlight] “Mind2Web: Towards a Generalist Agent for the Web”

microsoft/autogen: A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap

geekan/MetaGPT: 🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

richardyc/Chrome-GPT: An AutoGPT agent that controls Chrome on your desktop

Mass Copy URLs − copy all URLs on all tabs


openeqa: embodied agent question answering benchmark

https://github.com/facebookresearch/open-eqa


GUI Automation agent and dataset:

https://github.com/OpenGVLab/GUI-Odyssey (cross-app tasks)

https://github.com/TransformerOptimus/AutoNode

https://superagi.com/

https://huggingface.co/datasets/SuperAGI/GUIDE/

https://huggingface.co/SuperAGI/SAM


AI pipeline orchestration:

https://github.com/instill-ai/instill-core

https://github.com/ComposioHQ/composio/ (with GUI agent)


GUI dataset annotation can be done manually or with a multimodal LLM.

For questions whose answers depend on code execution, validate the answer by actually running the code.
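One way to check an answer by execution is sketched below. It is an illustration under assumptions, not a method taken from any of the linked projects: the helper name `validate_answer` is hypothetical, the reference code is assumed to be a Python snippet that prints its result, and comparison is a plain string match on stripped stdout. Running untrusted code this way is unsafe outside a sandbox.

```python
import subprocess
import sys


def validate_answer(code: str, claimed_answer: str, timeout_s: float = 10.0) -> bool:
    """Run `code` in a fresh Python process and compare its stdout to the claimed answer.

    Returns False if the code fails, times out, or prints something different.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    if result.returncode != 0:
        return False
    return result.stdout.strip() == claimed_answer.strip()


if __name__ == "__main__":
    snippet = "print(sum(range(10)))"
    print(validate_answer(snippet, "45"))  # claimed answer matches the executed result
    print(validate_answer(snippet, "44"))  # claimed answer does not match
```

A separate process keeps the checked code from touching the validator's own state, and the timeout guards against snippets that never finish.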


YOLO GUI element identification:

https://github.com/rahulkundelwalll/YOLOv8-Web-Element-Recognition-Model

https://huggingface.co/foduucom/web-form-ui-field-detection

https://github.com/js0nwu/webui

https://huggingface.co/docs/transformers/model_doc/pix2struct

https://github.com/google-research/pix2struct

https://github.com/M3SOulu/WinGUICrawler

https://huggingface.co/datasets/yiye2023/GUIEnv

https://huggingface.co/datasets/yiye2023/GUIAct

https://huggingface.co/SiyuanH/GUIAgent

https://huggingface.co/datasets/SiyuanH/GUIAgent

https://huggingface.co/SiyuanH/GUIAgent-InternLM7B


Not every repo has official documentation.

git clone https://github.com/opendilab/LightZero
git clone https://github.com/ruvnet/q-star
git clone https://github.com/tairov/QStarLearning.mojo
git clone https://github.com/estill01/open_qstar
git clone https://github.com/openai/Video-Pre-Training
git clone https://github.com/abhiprojectz/SingularGPT
git clone https://github.com/ddupont808/GPT-4V-Act
# preload-view.js:markPage is the html-to-boundingbox tool.
# the author wants to create a coco dataset
# specialized in UIED-like functionality
git clone https://github.com/Charmve/gpt-eyes
git clone https://github.com/OthersideAI/self-operating-computer
git clone https://github.com/unconv/gpt4v-browsing
git clone https://github.com/THUDM/CogVLM
git clone https://github.com/mnotgod96/AppAgent
