Cybergod-Like Agents, General Computer Control

AI models
Computer control tools
Cybergod-like Agents
llm
OpenGPT-4o
aider
Kyutai
Firefunction v2
Codegeex4-all-9b
LaVague
Tiger
GUI agent model
autocoder
AutoCoder
GUI detection algorithm
This article collects AI models and tools for general computer control and Cybergod-like agents, including OpenGPT-4o, aider, Kyutai Moshi, Firefunction v2, CodeGeeX4-ALL-9B, LaVague, Tiger, GUI agent models, AutoCoder, and GUI detection algorithms. These tools assist with coding tasks, analyze GUIs on digital devices, and automate various processes.
Published

March 14, 2024


MatMul-free LLM:

https://arxiv.org/abs/2406.02528
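The key idea behind MatMul-free language models is that dense layers use ternary weights in {-1, 0, +1}, so a matrix-vector product reduces to additions and subtractions. The sketch below illustrates only that idea in plain Python. The absmean quantizer (`ternarize`) is borrowed from BitNet b1.58 as an assumption, and neither function is the paper's implementation.

```python
from typing import List, Tuple


def ternarize(weights: List[List[float]]) -> Tuple[List[List[int]], float]:
    """Absmean quantization (BitNet b1.58 style, assumed here for illustration).

    Returns ternary weights W_t and a scale gamma such that W is roughly gamma * W_t.
    """
    flat = [abs(w) for row in weights for w in row]
    gamma = sum(flat) / len(flat) + 1e-8  # small epsilon avoids division by zero
    w_t = [[max(-1, min(1, round(w / gamma))) for w in row] for row in weights]
    return w_t, gamma


def ternary_matvec(weights: List[List[int]], x: List[float]) -> List[float]:
    """Compute W @ x for a ternary matrix W using only additions and subtractions."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi  # +1 weight: add the input
            elif w == -1:
                acc -= xi  # -1 weight: subtract the input
            elif w != 0:
                raise ValueError(f"weight {w} is not ternary")
            # 0 weight: skip the input entirely
        out.append(acc)
    return out
```

For example, `ternary_matvec([[1, 0, -1], [-1, 1, 1]], [2.0, 3.0, 5.0])` returns `[-3.0, 6.0]` without a single multiplication; in a full layer the result would then be rescaled once by `gamma`.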


https://github.com/KingNishHF/OpenGPT-4o

aider: AI coding assistant, a Devin alternative

Kyutai Moshi: a GPT-4o alternative

Firefunction v2: an LLM specialized for function calling
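Function-calling LLMs answer with a structured tool call (a function name plus arguments) instead of plain text, and the host program executes it. The sketch below shows a generic dispatcher on the host side. The JSON shape `{"name": ..., "arguments": ...}` and both tools are assumptions for illustration, not Firefunction v2's documented output format.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical local tools the model is allowed to call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub result for illustration


def add(a: float, b: float) -> float:
    return a + b


TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather, "add": add}


def dispatch_tool_call(raw: str) -> Any:
    """Parse a JSON tool call of the assumed form {"name": ..., "arguments": {...}}
    and run the matching local function."""
    call = json.loads(raw)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"model requested unknown tool: {name}")
    args = call.get("arguments", {})
    if isinstance(args, str):  # some APIs encode the arguments as a JSON string
        args = json.loads(args)
    return TOOLS[name](**args)
```

Rejecting unknown tool names matters in computer-control agents, where the model should only trigger actions the host has explicitly exposed.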

CodeGeeX4-ALL-9B: a code generation model


https://github.com/lavague-ai/LaVague

https://github.com/Upsonic/Tiger

Computer agents:

https://github.com/slavakurilyak/awesome-ai-agents


GUI agent model trained on GUI-World

GUI agent datasets on Hugging Face

AutoCoder, pretrained code models with access to a terminal:

https://github.com/bin123apple/AutoCoder


You can label the GUI manually: annotate each UI element with a comment and write down the exact steps needed to carry out each operation.
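One way to store such manual labels is a small data structure that links each annotated element to the execution steps that use it. The sketch below is a hypothetical schema: the field names, the `(left, top, right, bottom)` box convention, and the `click`/`type` action vocabulary are all assumptions, not a format used by any project listed here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class UIElement:
    """One manually labeled element on a screenshot."""
    element_id: str
    bbox: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
    comment: str                     # human-written description of the element


@dataclass
class Step:
    """One exact execution step that refers to a labeled element."""
    action: str      # e.g. "click" or "type" (hypothetical action vocabulary)
    element_id: str
    text: str = ""   # text to type, if any


@dataclass
class LabeledScreen:
    elements: List[UIElement] = field(default_factory=list)
    steps: List[Step] = field(default_factory=list)

    def center_of(self, element_id: str) -> Tuple[int, int]:
        """Pixel coordinates of the center of a labeled element."""
        for el in self.elements:
            if el.element_id == element_id:
                left, top, right, bottom = el.bbox
                return ((left + right) // 2, (top + bottom) // 2)
        raise KeyError(element_id)

    def to_instructions(self) -> List[str]:
        """Render the steps as concrete, coordinate-level instructions."""
        lines = []
        for s in self.steps:
            x, y = self.center_of(s.element_id)
            suffix = f' "{s.text}"' if s.text else ""
            lines.append(f"{s.action} {s.element_id} at ({x}, {y}){suffix}")
        return lines
```

A screen with a search box at `(100, 40, 500, 70)` and the steps "type `cybergod` into the search box, then click the search button" renders to instructions such as `type search_box at (300, 55) "cybergod"`, which can serve as a demonstration for a GUI agent or as a training sample.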


GUI detection algorithm:

https://github.com/MulongXie/UIED
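A basic building block of classical GUI element detection is turning a binarized screenshot into bounding boxes of connected regions. The sketch below shows only that step, using breadth-first flood fill over a 0/1 grid. It assumes binarization has already happened and is a simplification, not UIED's pipeline, which also handles text and element classification.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive pixel coordinates


def detect_regions(mask: List[List[int]], min_area: int = 1) -> List[Box]:
    """Return bounding boxes of 4-connected foreground regions (pixels equal to 1)."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill starting from this unvisited foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            left, top, right, bottom = x, y, x, y
            area = 0
            while queue:
                cx, cy = queue.popleft()
                area += 1
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Drop tiny regions, which are usually noise rather than UI elements.
            if area >= min_area:
                boxes.append((left, top, right, bottom))
    return boxes
```

The `min_area` threshold is the simplest way to filter noise. Real detectors also merge nearby boxes and classify each one, for example as a button or an icon.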


Lightweight Segment Anything Model (TinySAM):

https://github.com/xinghaochen/TinySAM


https://github.com/graylan0/gptcomputer

https://github.com/patterns-complexity/gpt-pc-control

https://github.com/b5marwan/gpt-vision-agent

https://github.com/rogeriochaves/driver

https://github.com/s-a-ng/control-pc-with-gpt4-vision


GPT-related:

https://github.com/szczyglis-dev/py-gpt

https://github.com/EwingYangs/awesome-open-gpt


GPT-4o is increasingly used for computer control:

https://github.com/CK92149/GPTComputerAutomation

https://github.com/onuratakan/gpt-computer-assistant

https://github.com/kyegomez/GPT4o


Terminal-controlling agent:

https://github.com/greshake/Alice


Simulated computer control environments:

https://github.com/xlang-ai/OSWorld


Multi-agent framework with routing:

https://python.langchain.com/v0.1/docs/langgraph
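LangGraph builds multi-agent applications as graphs in which routing logic decides which agent handles the next step. The plain-Python sketch below shows only that routing idea and does not use LangGraph's API. The three agents and the keyword rules are hypothetical stand-ins; a production router would more likely ask an LLM to choose the route.

```python
from typing import Callable, Dict, List, Tuple

Agent = Callable[[str], str]


# Hypothetical stub agents; real ones would call an LLM with tool access.
def terminal_agent(task: str) -> str:
    return f"[terminal] would run a shell command for: {task}"


def browser_agent(task: str) -> str:
    return f"[browser] would browse the web for: {task}"


def coder_agent(task: str) -> str:
    return f"[coder] would edit code for: {task}"


AGENTS: Dict[str, Agent] = {
    "terminal": terminal_agent,
    "browser": browser_agent,
    "coder": coder_agent,
}

# Hypothetical keyword rules, checked in order; the first match wins.
ROUTES: List[Tuple[Tuple[str, ...], str]] = [
    (("install", "run", "shell", "command"), "terminal"),
    (("search", "website", "browse", "url"), "browser"),
    (("bug", "function", "refactor", "code"), "coder"),
]


def route(task: str, default: str = "coder") -> str:
    """Pick an agent name for a task (naive whitespace tokenization)."""
    words = set(task.lower().split())
    for keywords, agent_name in ROUTES:
        if words.intersection(keywords):
            return agent_name
    return default


def handle(task: str) -> str:
    """Route the task, then hand it to the chosen agent."""
    return AGENTS[route(task)](task)
```

Keeping the routing decision in its own function makes it easy to swap the keyword rules for an LLM classifier later without touching the agents.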


Open-source Devin alternatives:

https://github.com/entropy-research/Devon

https://github.com/stitionai/devika

https://github.com/semanser/codel


Web browsing agent:

https://github.com/THUDM/AutoWebGLM


Agent-Eval-Refine provides models for GUI captioning, an iOS-finetuned CogAgent, and several GUI agent datasets.


ScreenAgent lists many related computer control papers and projects, and provides its own trained model on Hugging Face.

Similar projects:

https://github.com/TobiasNorlund/UI-Act

Listed projects:

https://github.com/x-plug/mobileagent

https://github.com/google-research/google-research/tree/master/screen2words

https://github.com/rainyugg/blip-adapter

https://github.com/imnearth/coat

https://github.com/xbmxb/aagent

https://github.com/princeton-nlp/ptp

https://github.com/njucckevin/seeclick

https://github.com/thudm/autowebglm

https://github.com/OS-Copilot/OS-Copilot

Environments:

https://github.com/google-deepmind/android_env

https://github.com/x-lance/mobile-env

Datasets:

https://github.com/google-research-datasets/screen_qa


Open-Interface uses GPT-4V to control the computer interface.
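Open-Interface and several of the GPT-4V/GPT-4o projects above work roughly as a loop: capture the screen, ask the model for the next action, execute it, and repeat. The sketch below shows that loop with the screenshot capture, model call, and action executor passed in as functions. All three, and the `click`/`type`/`done` action set, are hypothetical placeholders, not Open-Interface's code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str       # "click", "type", or "done" (hypothetical action set)
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    goal: str,
    take_screenshot: Callable[[], bytes],
    ask_model: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Observe, decide, act, until the model says "done" or the step budget runs out."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                 # observe the current screen
        action = ask_model(goal, screenshot, history)  # decide on the next action
        if action.kind == "done":
            break
        execute(action)                                # act, e.g. simulate mouse/keyboard
        history.append(action)
    return history
```

Sending a fresh screenshot on every iteration is what lets such agents course-correct after an action does not have the expected effect. The `max_steps` budget keeps a confused model from looping forever.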


Devin is an AI agent that can resolve many real-world GitHub issues, with access to a browser, a terminal, and a code editor.

Cradle is a general computer control agent that was developed to play Red Dead Redemption II.

Pythagora, also known as GPT Pilot, bills itself as a true AI developer: it writes code, debugs it, and talks to you when it needs to.


Open-source Devin counterparts:


GPA-LM: a list of game-playing agents