import jinja2
from jinja2 import Environment

class NeverUndefined(jinja2.StrictUndefined):
    # assumed body (the original block is truncated): fail immediately on any undefined variable
    def __init__(self, *args, **kwargs):
        raise Exception(f"undefined template variable: {args} {kwargs}")

env = Environment(undefined=NeverUndefined)
from collections import defaultdict, Counter
mydict = defaultdict(Counter)  # the argument to defaultdict must be a callable or None
mydict = defaultdict(lambda: defaultdict(lambda: 0))  # alternative with the same behavior
mydict['k1']['k2'] += 1  # two levels of keys work; one or three levels would fail with these factories
https://github.com/frdel/agent-zero
https://digirl-agent.github.io/
https://github.com/opendilab/awesome-ui-agents
https://github.com/hyp1231/awesome-llm-powered-agent
https://github.com/skyvern-ai/skyvern
https://github.com/Envedity/DAIA
https://github.com/mem0ai/mem0
https://qinghonglin.github.io/
https://github.com/showlab/Awesome-GUI-Agent
https://github.com/waterhorse1/LLM_Tree_Search
https://github.com/evilsocket/nerve
https://github.com/test-time-training/ttt-lm-pytorch
https://github.com/stanfordnlp/dspy
https://www.builder.io/blog/micro-agent
Search for site:github.com <computer agent benchmark name> agent to get a bunch of new computer-agent frameworks:
https://github.com/posgnu/rci-agent
https://github.com/stanfordnlp/wge
https://github.com/ServiceNow/BrowserGym
https://lmql.ai: LLM query language
neural network generation/neural developmental programs
https://arxiv.org/abs/2406.09787
niuzaisheng/ScreenAgent: ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model
tmgthb/Autonomous-Agents: Autonomous Agents (LLMs) research papers. Updated Daily.
ltzheng/Synapse: [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control
SkyworkAI/agent-studio: Benchmarks, environments, and toolkits for general computer agents
landing-ai/vision-agent: Vision agent
smartcomputer-ai/agent-os: Build autonomous AI agents! 🌞
idosal/AgentLLM: AgentLLM is a PoC for browser-native autonomous agents
posgnu/rci-agent: A codebase for “Language Models can Solve Computer Tasks”
khulnasoft/gpt-computer-agent: GPT4 for windows, macos and ubuntu
TheDuckAI/DuckTrack: Multimodal computer agent data collection program
X-PLUG/MobileAgent: Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
OSU-NLP-Group/Mind2Web: [NeurIPS’23 Spotlight] “Mind2Web: Towards a Generalist Agent for the Web”
microsoft/autogen: A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
richardyc/Chrome-GPT: An AutoGPT agent that controls Chrome on your desktop
Mass Copy URLs: copy all URLs on all tabs
OpenEQA: embodied agent question answering benchmark
https://github.com/facebookresearch/open-eqa
GUI Automation agent and dataset:
https://github.com/OpenGVLab/GUI-Odyssey (cross-app tasks)
https://github.com/TransformerOptimus/AutoNode
https://huggingface.co/datasets/SuperAGI/GUIDE/
https://huggingface.co/SuperAGI/SAM
AI pipeline orchestration:
https://github.com/instill-ai/instill-core
https://github.com/ComposioHQ/composio/ (with GUI agent)
GUI dataset annotation can be done manually or with a multimodal LLM.
Answers to questions that require code execution should be validated by actually executing code.
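A minimal sketch of that validation step, assuming each question ships with a small reference program whose output is the ground truth (all names below are hypothetical):

import subprocess

def validate_answer(llm_answer: str, reference_code: str) -> bool:
    # execute the reference program and treat its stdout as the ground truth
    result = subprocess.run(
        ["python3", "-c", reference_code],
        capture_output=True, text=True, timeout=30,
    )
    return llm_answer.strip() == result.stdout.strip()

# usage: validate_answer(model_output, "print(sum(range(10)))")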
YOLO GUI element identification:
https://github.com/rahulkundelwalll/YOLOv8-Web-Element-Recognition-Model
https://huggingface.co/foduucom/web-form-ui-field-detection
https://github.com/js0nwu/webui
https://huggingface.co/docs/transformers/model_doc/pix2struct
https://github.com/google-research/pix2struct
https://github.com/M3SOulu/WinGUICrawler
https://huggingface.co/datasets/yiye2023/GUIEnv
https://huggingface.co/datasets/yiye2023/GUIAct
https://huggingface.co/SiyuanH/GUIAgent
https://huggingface.co/datasets/SiyuanH/GUIAgent
https://huggingface.co/SiyuanH/GUIAgent-InternLM7B
Not every repo has official documentation.
git clone https://github.com/opendilab/LightZero
Cloning is cursed by the wall (the GFW), so use a proxy.
Visit and save https://repo.waydro.id/ as waydroid_init_repo.sh, and https://repo.waydro.id/waydroid.gpg as waydroid.gpg (using a proxy).
Comment out the download part in waydroid_init_repo.sh:
# curl --progress-bar --proto '=https' --tlsv1.2 -Sf https://repo.waydro.id/waydroid.gpg --output /usr/share/keyrings/waydroid.gpg
Move waydroid.gpg to /usr/share/keyrings/waydroid.gpg, then execute sudo bash waydroid_init_repo.sh to set up the Waydroid repository.
On Ubuntu you need to use a proxy during apt mirror syncing.
To set up the proxy (relay the local proxy to the host):
proxy --port <port> --host <host> \
To use the proxy:
sudo env https_proxy=http://<host>:<port> http_proxy=http://<host>:<port> all_proxy=http://<host>:<port> apt update
After installation you should comment out the mirror at /etc/apt/sources.list.d/waydroid.list.
start waydroid service: sudo systemctl enable --now waydroid-container
The download speed of SourceForge is very slow unless you use a mirror like liquidtelecom.
Modify the file /usr/lib/waydroid/tools/helpers/http.py:
...
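A hypothetical sketch of the kind of edit, assuming the goal is to redirect SourceForge downloads inside http.py to the liquidtelecom mirror (the actual patch may look different):

# rewrite the download URL before fetching (hypothetical helper)
def use_mirror(url: str) -> str:
    # liquidtelecom.dl.sourceforge.net is one of SourceForge's public mirror hosts
    return url.replace("downloads.sourceforge.net", "liquidtelecom.dl.sourceforge.net")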
restart service: sudo systemctl restart waydroid-container
run command sudo waydroid init
install weston: apt install weston
configure weston at ~/.config/weston.ini
[core]
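The snippet above is truncated; a minimal example of what such a config might contain (assumed settings, not the original file):

[core]
# let X11 clients run inside the weston session
xwayland=true

[shell]
# hide the default panel so waydroid gets the whole screen
panel-position=none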
Run weston, launch the terminal at the top-left corner, and run waydroid.
In addition to the official guide, you also need to enable firewalld or ufw to make it work.
The Wi-Fi switch is irrelevant to networking; it won't be turned on.
rankrag
Stanford STORM 2.0 writer
togethercomputer moa
starrag
hipporag
https://github.com/microsoft/graphrag
https://github.com/danielmiessler/fabric
https://github.com/infiniflow/ragflow
https://github.com/Jenqyang/LLM-Powered-RAG-System
https://github.com/lamini-ai/Lamini-Memory-Tuning
llm generate images for content
llm generate tags & categories for content
llm generate embedding for content
llm generate query words
llm generate query image/audio
system perform full text search
system perform vector search
llm generate relevance or preference
llm generate potential query for content
system update relevance based on llm preference
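A rough sketch of how the steps above might fit together in one retrieval loop; every class, function, and score formula here is a hypothetical stand-in, not an existing API:

from dataclasses import dataclass, field

@dataclass
class Item:
    text: str
    tags: list[str] = field(default_factory=list)
    embedding: list[float] = field(default_factory=list)
    relevance: float = 0.0  # nudged by LLM preference feedback over time

def index(item: Item, llm) -> None:
    # llm.* calls are placeholders for whatever model API is actually used
    item.tags = llm.generate_tags(item.text)
    item.embedding = llm.embed(item.text)

def search(query: str, items: list[Item], llm) -> list[Item]:
    q_emb = llm.embed(query)
    def score(it: Item) -> float:
        text_hit = float(query.lower() in it.text.lower())        # crude full-text match
        vec_sim = sum(a * b for a, b in zip(q_emb, it.embedding)) # dot-product similarity
        return text_hit + vec_sim + it.relevance
    return sorted(items, key=score, reverse=True)

def feedback(query: str, ranked: list[Item], llm) -> None:
    # the LLM judges which of the top results it prefers; relevance is updated accordingly
    best = llm.pick_best(query, [it.text for it in ranked[:5]])
    ranked[best].relevance += 0.1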
pylint --enable=unspecified-exception your_python_file.py
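For context, this is the kind of pattern such a check is presumably meant to flag: handling or raising a generic Exception with no specific type (a toy example, not from the original notes):

try:
    risky_operation()   # hypothetical function
except Exception:       # overly broad: the failure type is unspecified
    pass                # silently swallowing errors hides real bugs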
However, it is recommended to build microservices and log failures instead.
Intel uses SYCL and oneAPI for acceleration. These also target NVIDIA GPUs and AMD GPUs.
Unlike AMD iGPUs, Intel iGPUs do not carve out separate VRAM from system RAM, which means the iGPU can make full use of it.
Still cheaper than a Mac Studio, though the overall memory is smaller.
Disable the power feature if a multi-GPU program does not work as expected. More info here.
sudo vim /etc/default/grub   # edit the kernel command line as described in the linked guide
sudo update-grub             # regenerate the GRUB config so the change takes effect
The codename: gfx803
You may have to build it yourself
Set up environment variables and install drivers:
echo ROC_ENABLE_PRE_VEGA=1 | sudo tee -a /etc/environment  # 'sudo echo ... >> file' would fail: the redirect runs without root
Build torch:
git clone https://github.com/pytorch/pytorch.git -b v1.13.1
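The block above is truncated; the remaining steps usually look roughly like this (a sketch based on the standard PyTorch ROCm build flow, so treat the exact flags as assumptions):

cd pytorch
git submodule update --init --recursive
python3 tools/amd_build/build_amd.py                    # hipify the CUDA sources for ROCm
PYTORCH_ROCM_ARCH=gfx803 USE_ROCM=1 python3 setup.py install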
If you want to use this beefy GPU for computation, then either prepare a suitably ventilated desktop frame or use an external GPU connected via OCuLink, which can be found on the latest mini-PCs and laptops.
Your integrated GPU gfx90c can be used for AI. To run it without a container, you build it with codename gfx900. Either way, you need to specify export HSA_OVERRIDE_GFX_VERSION=9.0.0.
Run a container:
sudo docker run --rm -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest
If you want to run ollama on AMD GPUs, you must install ROCm 6. Additionally, if the card is gfx90c, you need to run export HSA_ENABLE_SDMA=0.
You can get the current ROCm version with dpkg -l | grep -i rocm. You can disable the GPU with export HSA_OVERRIDE_GFX_VERSION=1.
Since the latest ollama accesses ROCm, run it with the root account.
In order to circumvent the BIOS VRAM limitation for APUs, you can follow the instructions here.
Related repos:
force-host-alloction-APU (by hooking VRAM allocators)
To run llama.cpp on a OnePlus Ace 2V, you need an extra step:
export LD_LIBRARY_PATH=/vendor/lib64:/vendor/lib64/mt6983:/vendor/lib64/egl/mt6983
https://www.novaspivack.com/business/the-four-levels-of-ai-steps-to-self-evolution
I want to create some sort of agent that learns autoregressively on historical tokens (not necessarily present in the history, but close). However, when the agent is given some previous tokens, it is expected to send actions to the environment in order to actually observe the given tokens and get the reward. The agent is not allowed to generate the token directly to the environment, which prevents cheating. The agent is rewarded for successfully rebuilding the past, or for predicting and building the future. Predicting the future is the case where the target token is generated by the agent itself instead of by some automatic history-replay bot; the rest of the reward system works the same way as the history-replay reward system. This kind of system might have some sort of consciousness and therefore AGI.
the main objective of AGI is to create another version of itself.
The verification system can be built upon internal hidden tokens (feeling-based: you feel like you made it) or upon similarity (time-series similarity or semantic similarity). There can also be external verification signals such as lifespan, disk usage, view count, popularity, total capital, etc.
The main problem in making this work is how to train it in parallel. The real world can be replaced by some world model (say, a neural network) so that the agent can go back in time, or by some really fast real-world evaluators, or by special world evaluators that support time traversal, like virtual machine snapshots or web browsers (tab traversal). AlphaGo has this advantage because the game of Go is a very simple world model, while the real world is not.
This could also build a hierarchy like: real world -> world model -> agent -> superagent -> …
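A toy sketch of the replay-reconstruction loop described above; every class, method, and reward choice here is hypothetical and only meant to make the structure concrete:

import difflib

class ReplayableWorld:
    """Hypothetical environment that can be rewound, e.g. a VM snapshot or a browser tab."""
    def __init__(self, snapshot):
        self.snapshot = snapshot
        self.observed = ""
    def reset(self):
        self.observed = ""
    def step(self, action) -> str:
        # actions change the world; the agent never writes the target tokens directly
        self.observed += self.snapshot.render(action)
        return self.observed

def rebuild_reward(target: str, observed: str) -> float:
    # similarity-based verification: how closely did the agent's actions rebuild the target history?
    return difflib.SequenceMatcher(None, target, observed).ratio()

def train_step(agent, world: ReplayableWorld, history_segment: str) -> float:
    world.reset()
    observed = ""
    for _ in range(len(history_segment)):
        action = agent.act(history_segment, observed)  # hypothetical policy call
        observed = world.step(action)
    reward = rebuild_reward(history_segment, observed)
    agent.update(reward)  # hypothetical RL update (e.g. policy gradient)
    return reward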