2024-02-28
Multimodal Autoregressive Unsupervised Learning

It is so good that I can log in to Kaggle on my smartphone.


Instruction-following image editing can be done either via prompt engineering (like Mini DALLE3) or via a multimodal model (like CogCoM).


Recently, search-engine- and browser-augmented generation has become popular. It is a great tool for creating video scripts.


Google has released a new architecture called Genie, which learns a latent action space using only unsupervised training on videos.


Do not ever use NTFS in Linux. If you unfortunately run into a disk-inaccessible problem when accessing NTFS disks, first run chkdsk /f on the disk, then reboot into Windows twice. The /f parameter is important!

After the fix, copy all files to another location, format the disk as ext4 or xfs, then restore the files.


GPT-2 models from Hugging Face accept inputs_embeds as a parameter of both the instance call and the generate method.

Typically, to adapt a ViT into an LLM you need a LayerNorm and a linear projection layer.
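
A minimal sketch of such an adapter in PyTorch; the class name is made up, and the feature sizes (768 for the ViT, 4096 for the LLM) are hypothetical:

import torch
import torch.nn as nn

class VisionAdapter(nn.Module):
    """Maps ViT patch features into the LLM embedding space."""
    def __init__(self, vit_dim=768, llm_dim=4096):
        super().__init__()
        self.norm = nn.LayerNorm(vit_dim)        # normalize the ViT outputs
        self.proj = nn.Linear(vit_dim, llm_dim)  # project to the LLM hidden size

    def forward(self, vit_features):
        # vit_features: (batch, num_patches, vit_dim)
        return self.proj(self.norm(vit_features))

The projected features can then be concatenated with text token embeddings and passed to the LLM as inputs_embeds.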

You cannot pass custom embeddings into the text-generation pipeline.


To encode tokens manually in GPT-2:

inputs_embeds = model.transformer.wte(input_ids)
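
A minimal sketch of feeding such embeddings to both the model call and generate, using the stock gpt2 checkpoint (the prompt is arbitrary):

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer("A cat sat on", return_tensors="pt").input_ids
inputs_embeds = model.transformer.wte(input_ids)  # (1, seq_len, 768) token embeddings

outputs = model(inputs_embeds=inputs_embeds)      # forward pass without input_ids
generated = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=20)
print(tokenizer.decode(generated[0], skip_special_tokens=True))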

In LLaMA:

def embed_tokens(self, token_ids):
    # With a LoRA wrapper (PEFT), the language model sits one level deeper
    if hasattr(self.llama_model.base_model, "model"):
        embeds = self.llama_model.base_model.model.model.embed_tokens(token_ids)
    else:
        embeds = self.llama_model.base_model.embed_tokens(token_ids)
    return embeds


Install the GPU build of PyTorch with conda:

conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
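
A quick sanity check after installation (a minimal sketch):

import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # should be True if the CUDA build works
print(torch.version.cuda)         # CUDA version the build was compiled with, e.g. 12.1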


Set HF_HUB_OFFLINE=1 when loading local models to prevent network access.
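
A minimal sketch, assuming the model has already been saved to a hypothetical local directory:

import os
os.environ["HF_HUB_OFFLINE"] = "1"  # set before importing transformers / huggingface_hub

from transformers import AutoModelForCausalLM, AutoTokenizer

local_path = "/data/models/gpt2"    # hypothetical path to a locally saved model
tokenizer = AutoTokenizer.from_pretrained(local_path)
model = AutoModelForCausalLM.from_pretrained(local_path)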


Set Environment="OLLAMA_MODELS=<model_storage_path>" in the Ollama systemd service file. Remember to change the user name and user group too, and set appropriate permissions on the model storage path.
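
A sketch of the relevant section of the unit file (usually /etc/systemd/system/ollama.service on Linux installs); the placeholders are to be replaced with your own values:

[Service]
Environment="OLLAMA_MODELS=<model_storage_path>"
User=<username>
Group=<usergroup>

After editing, reload systemd and restart the service (systemctl daemon-reload, then systemctl restart ollama).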

On Windows, set it in the system environment variables, or create a directory symlink:

mklink /D C:\Users\<User>\.ollama\models E:\AI\Ollama\Models
