Cybergod Related Projects

These notes collect projects that use Large Language Models (LLMs) and neural networks to build AI and autonomous agents for smartphone apps, web tasks, and general computer control, along with related research papers. Notable projects include AgentLLM, rci-agent, and gpt-computer-agent; some of them lack official documentation. A list of repositories to clone appears at the end.

Search for "site:github.com <computer agent benchmark name> agent" to find many new computer agent frameworks.
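The search pattern above can be scripted. This is a minimal sketch, assuming Google's public search URL format; the helper names `build_query` and `search_url` are hypothetical, and the example names (Mind2Web, GUI-Odyssey) are projects listed on this page, used here as stand-ins for benchmark names.

```python
from urllib.parse import quote_plus


def build_query(benchmark_name: str) -> str:
    """Fill the benchmark name into the site-restricted search pattern."""
    return f"site:github.com {benchmark_name} agent"


def search_url(benchmark_name: str) -> str:
    """Return a search URL for the query (assumes Google's ?q= parameter)."""
    return "https://www.google.com/search?q=" + quote_plus(build_query(benchmark_name))


if __name__ == "__main__":
    # Stand-in benchmark names; replace with the benchmarks you care about.
    for name in ["Mind2Web", "GUI-Odyssey"]:
        print(search_url(name))
```

The script only prints URLs to open in a browser; it does not fetch or scrape results.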

https://github.com/posgnu/rci-agent

https://github.com/stanfordnlp/wge

https://github.com/ServiceNow/BrowserGym

https://lmql.ai (LLM query language)

Neural network generation / neural developmental programs:

https://arxiv.org/abs/2406.09787

James4Ever0/agi_computer_control: Autonomous computer program that can do anything without human operators.

niuzaisheng/ScreenAgent: ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model

tmgthb/Autonomous-Agents: Autonomous Agents (LLMs) research papers. Updated Daily.

ltzheng/Synapse: [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control

SkyworkAI/agent-studio: Benchmarks, environments, and toolkits for general computer agents

OS-Copilot/OS-Copilot: A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks.

landing-ai/vision-agent: Vision agent

smartcomputer-ai/agent-os: Build autonomous AI agents! 🌞

idosal/AgentLLM: AgentLLM is a PoC for browser-native autonomous agents

posgnu/rci-agent: A codebase for “Language Models can Solve Computer Tasks”

khulnasoft/gpt-computer-agent: GPT4 for Windows, macOS and Ubuntu

TheDuckAI/DuckTrack: Multimodal computer agent data collection program

X-PLUG/MobileAgent: Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

mnotgod96/AppAgent: AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.

stableagents/stableagents: Stable, Semi-Autonomous, Reliable and Steerable LLM Agents for production use cases.

OSU-NLP-Group/Mind2Web: [NeurIPS’23 Spotlight] “Mind2Web: Towards a Generalist Agent for the Web”

microsoft/autogen: A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap

geekan/MetaGPT: 🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

richardyc/Chrome-GPT: An AutoGPT agent that controls Chrome on your desktop

Mass Copy URLs: copy all URLs on all tabs

OpenEQA: embodied agent question answering benchmark

https://github.com/facebookresearch/open-eqa

GUI automation agents and datasets:

https://github.com/OpenGVLab/GUI-Odyssey (cross-app tasks)

https://github.com/TransformerOptimus/AutoNode

https://superagi.com/

https://huggingface.co/datasets/SuperAGI/GUIDE/

https://huggingface.co/SuperAGI/SAM

AI pipeline orchestration:

https://github.com/instill-ai/instill-core

https://github.com/ComposioHQ/composio/ (with GUI agent)

GUI datasets can be annotated manually or with a multimodal LLM.

For questions whose answers require code execution, it is important to validate the answer by actually running code.
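One way to do this kind of validation is sketched below. It is a minimal illustration, not a method taken from any project listed here: the function names `run_snippet` and `answer_is_valid` are hypothetical, the snippet is assumed to be Python that prints its result, and the code runs in a plain subprocess with no sandboxing, so it should only be used on trusted input.

```python
import subprocess
import sys


def run_snippet(code: str, timeout: float = 10.0) -> str:
    """Run a Python snippet in a fresh interpreter and return its stripped stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # a non-zero exit code raises CalledProcessError
    )
    return result.stdout.strip()


def answer_is_valid(code: str, claimed_answer: str) -> bool:
    """Check a claimed answer against what the code actually prints."""
    try:
        return run_snippet(code) == claimed_answer.strip()
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        # Code that crashes or hangs cannot confirm any answer.
        return False
```

For example, `answer_is_valid("print(sum(range(10)))", "45")` compares the claimed answer with the real output of the snippet instead of trusting the model's own arithmetic.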

YOLO GUI element identification:

https://github.com/rahulkundelwalll/YOLOv8-Web-Element-Recognition-Model

https://huggingface.co/foduucom/web-form-ui-field-detection

https://github.com/js0nwu/webui

https://huggingface.co/docs/transformers/model_doc/pix2struct

https://github.com/google-research/pix2struct

https://github.com/M3SOulu/WinGUICrawler

https://huggingface.co/datasets/yiye2023/GUIEnv

https://huggingface.co/datasets/yiye2023/GUIAct

https://huggingface.co/SiyuanH/GUIAgent

https://huggingface.co/datasets/SiyuanH/GUIAgent

https://huggingface.co/SiyuanH/GUIAgent-InternLM7B

Not every repo has official documentation.

git clone https://github.com/opendilab/LightZero
git clone https://github.com/ruvnet/q-star
git clone https://github.com/tairov/QStarLearning.mojo
git clone https://github.com/estill01/open_qstar
git clone https://github.com/openai/Video-Pre-Training
git clone https://github.com/abhiprojectz/SingularGPT
git clone https://github.com/ddupont808/GPT-4V-Act
# preload-view.js:markPage is the HTML-to-bounding-box tool.
# The author wants to create a COCO dataset
# specialized for UIED-like functionality.
git clone https://github.com/Charmve/gpt-eyes
git clone https://github.com/OthersideAI/self-operating-computer
git clone https://github.com/unconv/gpt4v-browsing
git clone https://github.com/THUDM/CogVLM
git clone https://github.com/mnotgod96/AppAgent
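
The clone commands above can also be driven from a script that skips repositories already present. This is a rough sketch under stated assumptions: the helper names (`repo_dir_name`, `pending_clones`, `clone_all`) are hypothetical, only three of the URLs are repeated for brevity, and the `--depth 1` shallow clone is a choice made here to save disk space, not something the original commands use.

```python
import subprocess
from pathlib import Path

# A few of the repositories from the list above; extend as needed.
REPOS = [
    "https://github.com/opendilab/LightZero",
    "https://github.com/THUDM/CogVLM",
    "https://github.com/mnotgod96/AppAgent",
]


def repo_dir_name(url: str) -> str:
    """Derive the local directory name from a repository URL."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def pending_clones(urls, dest_root):
    """Build git clone commands for repositories not yet present under dest_root."""
    root = Path(dest_root)
    return [
        ["git", "clone", "--depth", "1", url, str(root / repo_dir_name(url))]
        for url in urls
        if not (root / repo_dir_name(url)).exists()
    ]


def clone_all(urls, dest_root="."):
    """Run each pending clone in turn; stops on the first failure."""
    for command in pending_clones(urls, dest_root):
        subprocess.run(command, check=True)
```

Calling `clone_all(REPOS, "repos")` would clone each missing repository into the `repos` directory; nothing is cloned at import time.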
