Cybergod-Like Agents, General Computer Control
matmul-free llm
https://arxiv.org/abs/2406.02528
https://github.com/KingNishHF/OpenGPT-4o
aider coding assistant, Devin alternative
kyutai moshi gpt4o alternative
firefunction v2 function calling llm
codegeex4-all-9b
https://github.com/lavague-ai/LaVague
https://github.com/Upsonic/Tiger
computer agents:
https://github.com/slavakurilyak/awesome-ai-agents
gui agent model trained on gui-world
gui agent datasets on huggingface
autocoder with pretrained models, has access to terminal:
https://github.com/bin123apple/AutoCoder
you can label the GUI manually: write a comment for each UI element, then write out the exact steps needed to carry out each operation.
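A minimal sketch of what such a manual label file could look like, assuming each UI element is stored with a pixel bounding box and a comment, and each operation is a list of steps that refer to elements by name. The class names, field names, and the `login.png` example are all hypothetical and are not taken from any of the projects listed here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class UIElement:
    """One manually labeled element on a screenshot (hypothetical schema)."""
    name: str
    bbox: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
    comment: str                     # human-written description of the element

    def center(self) -> Tuple[int, int]:
        left, top, right, bottom = self.bbox
        return ((left + right) // 2, (top + bottom) // 2)


@dataclass
class Step:
    """One exact operation step, referring to an element by name."""
    action: str      # e.g. "click" or "type"
    target: str      # name of a UIElement on the same screen
    text: str = ""   # text to enter, only used by "type"


@dataclass
class LabeledScreen:
    screenshot: str
    elements: List[UIElement] = field(default_factory=list)
    steps: List[Step] = field(default_factory=list)

    def resolve(self) -> List[Dict]:
        """Turn named steps into concrete actions at pixel coordinates."""
        by_name = {e.name: e for e in self.elements}
        actions = []
        for step in self.steps:
            x, y = by_name[step.target].center()
            actions.append(
                {"action": step.action, "x": x, "y": y, "text": step.text}
            )
        return actions


screen = LabeledScreen(
    screenshot="login.png",
    elements=[
        UIElement("username_field", (100, 200, 300, 230), "Text box for the account name"),
        UIElement("submit_button", (100, 300, 200, 340), "Submits the login form"),
    ],
    steps=[
        Step("click", "username_field"),
        Step("type", "username_field", "alice"),
        Step("click", "submit_button"),
    ],
)

for action in screen.resolve():
    print(action)
```

Keeping the steps tied to element names rather than raw coordinates means the labels stay readable, and only the bounding boxes need updating if the layout changes.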
GUI detection algorithm:
https://github.com/MulongXie/UIED
minified segment anything model:
https://github.com/xinghaochen/TinySAM
https://github.com/graylan0/gptcomputer
https://github.com/patterns-complexity/gpt-pc-control
https://github.com/b5marwan/gpt-vision-agent
https://github.com/rogeriochaves/driver
https://github.com/s-a-ng/control-pc-with-gpt4-vision
gpt related:
https://github.com/szczyglis-dev/py-gpt
https://github.com/EwingYangs/awesome-open-gpt
GPT-4o is increasingly used for computer control.
https://github.com/CK92149/GPTComputerAutomation
https://github.com/onuratakan/gpt-computer-assistant
https://github.com/kyegomez/GPT4o
terminal controlling agent:
https://github.com/greshake/Alice
Simulated computer control environments:
https://github.com/xlang-ai/OSWorld
Multi-agent framework, routing:
https://python.langchain.com/v0.1/docs/langgraph
Devin open source alternative:
https://github.com/entropy-research/Devon
https://github.com/stitionai/devika
https://github.com/semanser/codel
Web browsing agent:
https://github.com/THUDM/AutoWebGLM
Agent-Eval-Refine contains models for GUI captioning, an iOS-finetuned CogAgent, and several GUI agent datasets.
ScreenAgent lists many related computer control papers and projects, along with a self-trained model on Hugging Face.
Similar projects:
https://github.com/TobiasNorlund/UI-Act
Listed projects:
https://github.com/x-plug/mobileagent
https://github.com/google-research/google-research/tree/master/screen2words
https://github.com/rainyugg/blip-adapter
https://github.com/imnearth/coat
https://github.com/xbmxb/aagent
https://github.com/princeton-nlp/ptp
https://github.com/njucckevin/seeclick
https://github.com/thudm/autowebglm
https://github.com/OS-Copilot/OS-Copilot
Environments:
https://github.com/google-deepmind/android_env
https://github.com/x-lance/mobile-env
Datasets:
https://github.com/google-research-datasets/screen_qa
Open-Interface uses GPT-4V to control the computer interface.
Devin is an AI agent that can solve many real-world GitHub issues, with access to a browser, terminal, and code editor.
Cradle is a general computer control agent developed to play Red Dead Redemption II.
Pythagora (aka GPT Pilot) is an AI developer that writes code, debugs it, and talks to you when it needs to.
Devin open source counterparts:
GPA-LM: a list of game playing agents