Operating the computer while standing requires a foot-pedal style treadmill
Circulating-water mattress
Circulating-water vest
Seat cushion and backrest with built-in fans
Floor tiles
Humidifier fan
Evaporative (water) air conditioner, industrial grade, with automatic water refill
Dehumidifier with automatic drainage
Potassium permanganate
Soft-plus-hard air-conditioner outlet deflector, heavy-duty tape, waterproof
Portable dual-duct air conditioner
Make a specialized (RPA-oriented) tokenizer and embedding for this new model. Add new words to the tokenizer.
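A minimal sketch of extending an existing tokenizer with RPA-specific tokens, assuming Hugging Face transformers; the base model and token names are illustrative:

# a minimal sketch, assuming Hugging Face transformers; base model and token names are illustrative
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# add RPA-specific vocabulary (keys, mouse actions) as new tokens
new_tokens = ["<KEY_ENTER>", "<MOUSE_MOVE>", "<MOUSE_CLICK>", "<SCROLL_UP>"]
num_added = tokenizer.add_tokens(new_tokens)

# grow the embedding matrix so the new token ids have rows
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens, vocab size is now {len(tokenizer)}")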
You can just boot an Ubuntu/Kali/Parrot ISO without installing it.
But that would be embarrassing for us; we need to check for that option.
use ChatGPT-derived projects for localized propaganda on CyberGod and The Frozen Forest.
Using obs-websocket, you can do real scripting in Python, but spin up OBS first (with the websocket-related command-line arguments).
Launch OBS minimized: obs --minimize-to-tray
Or just use xvfb.
You can also write and load scripts inside OBS itself, running them on custom intervals and conditions.
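A rough sketch of driving OBS from Python over obs-websocket, assuming the obs-websocket-py package (4.x protocol); host, port and password are placeholders:

# a rough sketch, assuming obs-websocket-py (protocol 4.x); host/port/password are placeholders
import time
from obswebsocket import obsws, requests

ws = obsws("localhost", 4444, "password")
ws.connect()

ws.call(requests.StartRecording())   # begin recording the current scene
time.sleep(10)                       # custom interval / condition goes here
ws.call(requests.StopRecording())

ws.disconnect()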
Your OS may go silent if you want to record audio from the “speakers”.
With pyaudio on macOS, you need BlackHole to route all audio into a virtual device, so it can be recorded.
On Linux, you need an audio loopback device.
Run: sudo modprobe snd-aloop
Then use hw:1,0 (or “Analog Device Output”) as the silent output/speaker, and hw:1,1 (or “Analog Device Input”) for recording.
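A sketch of recording from the loopback device with pyaudio; the device-name match and durations below are assumptions that need adjusting per machine:

# a sketch, assuming pyaudio and an existing loopback device; the name match is an assumption
import pyaudio, wave

pa = pyaudio.PyAudio()
loopback_index = None
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if "Loopback" in info["name"] or "BlackHole" in info["name"]:
        loopback_index = i
        break

stream = pa.open(format=pyaudio.paInt16, channels=2, rate=44100,
                 input=True, input_device_index=loopback_index,
                 frames_per_buffer=1024)
frames = [stream.read(1024) for _ in range(int(44100 / 1024 * 5))]  # ~5 seconds

with wave.open("speakers_dump.wav", "wb") as f:
    f.setnchannels(2)
    f.setsampwidth(pa.get_sample_size(pyaudio.paInt16))
    f.setframerate(44100)
    f.writeframes(b"".join(frames))
stream.stop_stream(); stream.close(); pa.terminate()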
It is always a mystery for us to develop the right ML model. However, we can set up guidelines for good performance on a specific task.
Automate the benchmark and set up metrics; that leaves more room for trials and imagination.
Use hfft/rfft to transform multipart inputs (special bits, different parts of the mouse coords (x, y, dx, dy)).
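A small sketch of packing those parts into one real-valued vector and taking rfft, assuming numpy; the field layout and values are made up for illustration:

# a small sketch, assuming numpy; field layout and values are made up for illustration
import numpy as np

special_bits = np.array([1, 0, 1, 1], dtype=np.float32)    # e.g. button/modifier flags
mouse = np.array([640, 360, -3, 5], dtype=np.float32)      # x, y, dx, dy
frame = np.concatenate([special_bits, mouse])               # one multipart input frame

spectrum = np.fft.rfft(frame)                  # complex coefficients, length n//2 + 1
restored = np.fft.irfft(spectrum, n=frame.shape[0])          # round-trips back to the frame
print(spectrum.dtype, np.allclose(restored, frame))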
If you want to use complex numbers as RNN input, you may need to swap ViT for ComplexConv2D, but maybe you only need a few complex layers.
libraries that handle complex neural networks:
Does our model have to output multimodal data?
If you combine some “special” bits with the token embedding via ihfft, you may have to retrain the entire damn network; also, to make way for the special bits, you may have to introduce an extra linear layer.
Some may prefer “LoRA”, only introducing a few tunable params to change the overall output.
We may not annotate anything in our dataset. Instead, we will set goals and make multiple interfaces for our model to explore.
You can add a task-specific embedding before passing to the main model, then subtract that task-specific embedding again before passing to the classification model.
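A toy sketch of that add-then-subtract trick with PyTorch; dimensions and module choices are illustrative:

# a toy sketch, assuming PyTorch; dimensions and module choices are illustrative
import torch
import torch.nn as nn

d_model, num_tasks, num_classes = 256, 4, 10
task_embedding = nn.Embedding(num_tasks, d_model)
main_model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)
classifier = nn.Linear(d_model, num_classes)

x = torch.randn(2, 16, d_model)                 # (batch, seq, dim) token features
task_ids = torch.tensor([0, 3])                 # which task each sample belongs to
bias = task_embedding(task_ids).unsqueeze(1)    # (batch, 1, dim)

h = main_model(x + bias)                        # add task-specific embedding before the main model
logits = classifier(h - bias)                   # subtract it again before classification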
make sure you don’t share important files as read/write on VM.
You may host some “execution server” on UTM VMs. You may expose your very large hard disk via a WebDAV server. I think x11vnc and other VNC servers may suffice for Linux, but we always want to capture the real operational data, including human operation/intervention, not just what is in the VNC protocol.
WebDAV servers:
wsgidav (python)
wsgidav --host=192.168.64.1 --port=8081 --root="/Volumes/Toshiba XG3/works/agi_computer_control" --auth=anonymous
webdav-cli (nodejs)
webdav-cli --host=192.168.64.1 --port=8081 --username=root --password=root --path="/Volumes/Toshiba XG3/works/agi_computer_control"
For the Ubuntu ARM VM, mss failed on Wayland but pyautogui works in both cases. Write one Python script that pipes raw images to ffmpeg via shell for a better compression ratio. The final video is not “time-accurate”; it is frame by frame, matched with timestamps.
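A sketch of that pipe, assuming pyautogui for grabbing and an ffmpeg binary on PATH; output names and frame rate are illustrative, and timestamps go to a sidecar file so frames can be matched later:

# a sketch, assuming pyautogui and ffmpeg on PATH; output names and rate are illustrative
import subprocess, time
import pyautogui

w, h = pyautogui.size()   # note: HiDPI screens may report a different size than the screenshot
ff = subprocess.Popen(
    ["ffmpeg", "-y", "-f", "rawvideo", "-pix_fmt", "rgb24",
     "-s", f"{w}x{h}", "-r", "5", "-i", "-",
     "-c:v", "libx264", "-preset", "veryfast", "capture.mp4"],
    stdin=subprocess.PIPE)

with open("timestamps.txt", "w") as ts:
    for _ in range(50):                      # ~10 seconds at 5 fps
        frame = pyautogui.screenshot()       # PIL Image in RGB
        ff.stdin.write(frame.tobytes())
        ts.write(f"{time.time()}\n")
        time.sleep(0.2)

ff.stdin.close()
ff.wait()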
Force Ubuntu to use Xorg via sudo vim /etc/gdm3/custom.conf (uncomment WaylandEnable=false).
you need to first resize the virtio disk in utm setting, then resize partition by using gparted, then update the device mapper
I said taking notes manually is always a bad idea if you can just do it with an LLM, since web browsing is just like playing video games.
To get the latest personal news (like https://github.com/James4Ever0.private.atom?token=<token>) and trends from GitHub, use some RSS feeds (an RSS sketch follows this list).
To monitor Discord channels, QQ chats and WeChat messages, and filter links out of them.
To dump pages and clipboards from devices, which is what we are trying to do.
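For the RSS part above, a minimal sketch with feedparser; the token in the URL stays a placeholder:

# a minimal sketch, assuming feedparser; the token in the URL stays a placeholder
import feedparser

feed_url = "https://github.com/James4Ever0.private.atom?token=<token>"
parsed = feedparser.parse(feed_url)
for entry in parsed.entries[:10]:
    print(entry.get("title"), entry.get("link"))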
Better keep it tight. When doing so, you want to make sure it is absolutely private; nothing is public.
You can use a filter or something, or just keep it discreet.
use “bookmark all tabs” or “copy all tab urls” browser extension, or your OG lazero extension
check for “Comprehensive Research” series
you can only bookmark all tabs and tab groups within a single window.
for example, if you have two windows open, you need to do it twice to save all links to your computer.
check “Healthcare and Watch Cases” under firefox bookmarks.
To install a plugin on Kiwi Browser, you (maybe) need to unpack and install it manually.
You need a Chrome extension v3 or below. Pack the extension manually if needed; it must be installed from a .crx file.
query: site:github.com chrome copy all tabs
if you want markdown-style, use this or this or this (you need to modify this as it is the only hope now) or this. query: site:github.com chrome copy all tabs markdown
On Android you have to click every tab manually in order to copy them. Goddamn Android!
for firefox, you search in the official addon store.
clipboard data and tab records dumped at ~/works/modifier_clipboard_and_browser_tabs_dumps
remember to clean up the damn phone. it is running way too slow.
The latest Python releases have better performance.
PyPy is fast.
Codon uses Python syntax and compiles it into static executables.
do not mix the water with the watch. you have been warned.
Prefer WearOS watches like the LG W100. Found a 3D-printable case on Shapeways, but it is not downloadable; Shapeways provides a printing service. Shapeways builds ShapeJS, which can construct 3D models with code.
Use Pixel2Mesh++ to recover 3D meshes from multiple images of different viewpoints (dumped at ~/Desktop/works/shapeways_reconstruct_from_image_lg_w100). Determine the size of the mesh after measurement or from learned specs.
If you want an Apple Watch instead, remember to buy some Apple gift cards for buying watchOS apps. Remember to ask about battery life, since older watches tend to die halfway through a day. Buy an iPhone 6s or newer with iOS 14 or a newer OS to manage and install apps on the watch. No need for 3D modeling since there are plenty of tough, tested cases around.
if you want to do it on your own, you have to know how to send notifications on different operating systems.
on macOS:
osascript -e 'display notification "This message should be showing on the notification" with title "Coding Tips"'
terminal-notifier (brew installable)
on linux:
notify-send "Dinner ready!"
using remind:
remind "I'm still here" now
on windows:
msg /SERVER:DestinationPC * /TIME:60 "This is the message to be sent to a PC named DestinationPC and closes in 60 seconds."
notify-send-for-Windows (needs AHK)
RSIBreak is for linux, and it does not work well.
stretchly has an online version. on macOS make sure your browser is allowed to post notifications.
“Drink.” on mac app store is a water drinking reminder for macOS.
BreakTimer has windows, macOS and linux version. on linux you better use snap or appimage version.
Somebody hates Clash and proxy.py; now we proxy websites directly inside another website:
Ultraviolet by Titanium Network
visit Holy Unblocker for demonstration
github topic: webproxy
Gist on using python3 and PhantomJS
npm package: markdown-pdf
how do i design an ai model using gpt, rnn, lstm, vit to accept computer display screenshots and human instructions as input, keyboard and mouse events as output, trained on random keystrokes and mouse events, also human instructions paired with human demonstrations? show me python code
That would be an extremely complex AI system to build. Some high level steps would be:
Designing an AI model that combines GPT, RNN, LSTM, and ViT for the task you described is complex and requires deep understanding of these architectures. However, I’ll provide you with a high-level overview and a basic structure of the code to help you get started.
For computer display screenshots, use a Vision Transformer (ViT) model to extract features.
For human instructions, use a GPT model to encode the natural language instructions.
Combine the features from both the ViT and GPT models.
Use an LSTM or RNN to process the combined features and maintain the temporal context.
Finally, decode the output of the LSTM/RNN into keyboard and mouse events.
Here’s a basic structure of the code:
import torch
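# a hedged sketch of such a combined model, assuming Hugging Face ViT/GPT-2 backbones;
# event vocabulary sizes and hidden dims are illustrative, not from a real dataset
import torch.nn as nn
from transformers import ViTModel, GPT2Model

class ScreenAgent(nn.Module):
    def __init__(self, hidden=512, num_key_events=256, mouse_dims=4):
        super().__init__()
        self.vit = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
        self.gpt = GPT2Model.from_pretrained("gpt2")
        fused = self.vit.config.hidden_size + self.gpt.config.hidden_size
        self.rnn = nn.LSTM(fused, hidden, batch_first=True)
        self.key_head = nn.Linear(hidden, num_key_events)   # logits over keyboard events
        self.mouse_head = nn.Linear(hidden, mouse_dims)     # x, y, dx, dy regression

    def forward(self, pixel_values, input_ids, attention_mask=None):
        img = self.vit(pixel_values=pixel_values).last_hidden_state.mean(dim=1)
        txt = self.gpt(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state.mean(dim=1)
        fused = torch.cat([img, txt], dim=-1).unsqueeze(1)   # one step per screenshot
        out, _ = self.rnn(fused)
        h = out[:, -1]
        return self.key_head(h), self.mouse_head(h)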
You’ll need to collect a dataset with computer display screenshots, human instructions, and paired keyboard and mouse events. Then, preprocess the data, split it into training and validation sets, and train the model using an appropriate loss function and optimizer.
Keep in mind that this is just a starting point, and you may need to experiment with different architectures, loss functions, and hyperparameters to achieve the desired performance.
Run some community-contributed ChatGPT-like models on commodity PCs.
Below are some models we are about to use:
ChatRWKV, or RWKV-based models; some are fine-tuned on the Alpaca dataset.
ChatGLM-6B, open-sourced by Tsinghua KEG, with an INT4 quantized version.
OpenAssistant by LAION-AI, trained on their own OIG dataset. There are also a few models contributed by their Discord community.
Alpaca, trained on the Alpaca dataset (synthetic, generated by ChatGPT) by Stanford University. Model weights are community-provided.
ChatYuan by ClueAI.
There are quite a few more models to be listed. You can check this curated open-sourced ChatGPT-like model list for updates. But for now, these models shall be sufficient.
Floating-point values in model weights are stored as 32-bit. Quantization can reduce storage space and computation by switching to 16-bit, 8-bit or 4-bit values. However, most quantized models cannot be trained or fine-tuned, and some 16-bit models can only be trained on certain GPU architectures, such as Ada and Turing.
To make LLM (Large Language Model) inference feasible on common hardware, a GPU is usually mandatory. However, most commodity GPUs have less VRAM than the system has RAM, limiting the size of the LLM that can be run, and thus its capability; a typical machine has 12GB of VRAM and 32GB of RAM. GGML is a project aiming to make LLM inference on CPU as fast as on GPU, using the larger RAM to run larger LLMs. Some popular LLMs have already been ported to GGML, like LLaMA and Alpaca.
In deep learning, people tend to tune all parameters during training, which requires a lot of VRAM and time. To train GPT-3.5 aka ChatGPT, OpenAI spent millions renting interconnected A100 GPUs; an individual cannot afford that.
With technologies like LoRA, by freezing most of the model and introducing a small fraction of tunable parameters, training requirements can be greatly reduced. One can easily tune a 7B LLaMA or 14B RWKV with LoRA on a PC (usually rented in the cloud, such as on AutoDL) with a single 80GB A100 card and 200GB of RAM.
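A minimal sketch of what that looks like with the peft library; the base model name and LoRA hyperparameters are illustrative:

# a minimal sketch, assuming peft + transformers; base model and hyperparameters are illustrative
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["q_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(base, config)   # base weights stay frozen; only LoRA adapters train
model.print_trainable_parameters()     # typically well under 1% of all parameters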
LLMs are general problem solvers given enough external storage and access to search engines. Text is the only interface to language models (not so for multimodal LLMs, like GPT-4, OFA or UniLM).
To enhance the capability of LLMs, you have to maintain their memory, define action keywords that trigger external actions during the conversation, and connect them to semantic search engines powered by other AI models like sentence transformers.
One such library is LangChain.
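A tiny sketch of the semantic-search piece with sentence-transformers; the model name and memory entries are placeholders:

# a tiny sketch, assuming sentence-transformers; model name and memory entries are placeholders
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")
memory = ["user asked to open the browser", "screenshot taken at 14:02", "recording stopped"]
memory_emb = encoder.encode(memory, convert_to_tensor=True)

query_emb = encoder.encode("what did the user want to launch?", convert_to_tensor=True)
scores = util.cos_sim(query_emb, memory_emb)[0]
best = scores.argmax().item()
print(memory[best], float(scores[best]))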
The generation process of LLMs is sequential. The server needs to expose a streaming API to match this behavior: tokens are fetched one by one from the server at a roughly constant speed and revealed in the frontend.
One can check third-party frontend-only or self-hosted projects for conversational LLMs for reference.
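A bare-bones sketch of such a streaming endpoint with FastAPI; the token generator is faked here:

# a bare-bones sketch, assuming FastAPI; the token generator is faked
import time
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def fake_llm_tokens(prompt: str):
    for tok in ["Hello", ",", " world", "!"]:   # stand-in for sequential LLM generation
        time.sleep(0.1)                          # roughly constant per-token latency
        yield tok

@app.get("/chat")
def chat(prompt: str):
    return StreamingResponse(fake_llm_tokens(prompt), media_type="text/plain")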
Enable the render options trim_blocks and lstrip_blocks with jinja2 to avoid the whitespace and indentation nightmare.
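A minimal example of those two options:

# a minimal example of the two jinja2 options
from jinja2 import Environment

env = Environment(trim_blocks=True, lstrip_blocks=True)
template = env.from_string(
    "  {% for item in items %}\n"
    "{{ item }}\n"
    "  {% endfor %}\n")
print(template.render(items=["a", "b"]))   # -> "a\nb\n": indented block tags leave no whitespace or blank lines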
Always remember to import uvicorn if you want to run without the uvicorn executable.
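That is, running the app programmatically instead of via `uvicorn main:app`; module and app names are illustrative:

# running programmatically instead of the uvicorn executable; app name is illustrative
import uvicorn
from fastapi import FastAPI

app = FastAPI()

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)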
generate nodejs client from openapi.json
fastapi-code-generator to generate python code
Create docs inside the code by adding metadata.
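A small sketch of that metadata; titles, descriptions and the endpoint are placeholders:

# a small sketch of FastAPI metadata; titles, descriptions and the endpoint are placeholders
from fastapi import FastAPI

app = FastAPI(
    title="Agent Control API",
    description="Endpoints for recording and replaying keyboard/mouse events.",
    version="0.1.0")

@app.get("/health", summary="Liveness probe", tags=["ops"])
def health():
    """Returns ok when the server is up; shown in the generated doc page."""
    return {"status": "ok"}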
To share a lock across processes, use a Redis lock or filelock.
To share a lock across forked processes in the same worker, use multiprocessing.Lock().
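Sketches of both lock styles, assuming the filelock package for the cross-process case; the lock file path is a placeholder:

# sketches of both lock styles; the lock file path is a placeholder
from filelock import FileLock          # works across unrelated processes
import multiprocessing                  # works across forked workers in one parent

def critical_section():
    with FileLock("/tmp/agent.lock"):   # blocks until the file lock is acquired
        pass                            # touch the shared resource here

fork_lock = multiprocessing.Lock()      # pass this to forked children
def worker(lock):
    with lock:
        pass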
fastapi can generate openapi json and doc page
websockets are async. will it block the server?
Happened to find Akismet (proprietary WordPress spam protection). OSS alternatives are:
Forget Spam Comment (a JS plugin for WordPress)