2024-02-07
Recent Discoveries, Ideas And Breakthroughs Around 2024

programming ai agent:

https://useadrenaline.com/


blockchain application:

https://playdoge.io/en

coin mixer:

https://usdtmixer.com/index.html

https://github.com/sh0tn1k/simple-usdt-mixer

payment provider:

https://www.usdtcnpay.com/

https://github.com/CoinPayVIP/dujiaoka_usdt_payment

trc-20 related:

https://github.com/topics/trc20

https://tron.network/index

blockchain inspectors:

https://bscscan.com/

https://tronscan.org

trading platform:

https://www.okx.com/zh-hans

query: buy things with usdt

https://www.coinsbee.com

https://cryptonews.com/news/who-accepts-tether.htm


new proxy client:

https://github.com/clash-next/clash-meta

https://github.com/MetaCubeX/mihomo (supports proxy chaining)

https://wiki.metacubex.one/en

https://github.com/clash-verge-rev/clash-verge-rev


web3 security and poc:

https://github.com/SunWeb3Sec/DeFiHackLabs

https://github.com/0xCO2/Web3_Hack_PoC

https://github.com/Anugrahsr/Awesome-web3-Security

https://github.com/ManasHarsh/Awesome-Web3-security

https://github.com/descartes100/Web3Hack


Subscribe to publishing sources like blogs, news sites, RSS feeds and recommendation engines to create video generation agents.

GitHub blog


A smart contract on the EVM might be overkill, but it comes with the advantage of multi-user security.

A centralized solution might be Beanie, an ODM (Object Document Mapper) for MongoDB, or pyneo4j-ogm, an OGM (Object Graph Mapper) for Neo4j.


PlayStation emulators can emulate consoles up to roughly the PS4 generation; PS5 emulation does not really exist yet.


There are two Xbox emulators around. One is xemu, an emulator for the original Xbox, which runs on almost any platform; games for it can be downloaded at the vault. The other is xenia, an Xbox 360 emulator, which only runs on Windows 10+.

The Xbox 360 is hackable, and games for it can be downloaded in GOD (Games on Demand) format.


Greenlight is an Electron app for local Xbox streaming and XGPU cloud gaming; it supports multiple platforms.


To train a gaming AI and get potential financial rewards:

Buy a gaming console like the Xbox Series S and use an Arduino Leonardo or ESP32 to send gamepad signals. However, since you are not authorized to mimic an official gamepad, this might fail.

You may also use HID-to-gamepad converter hardware.

Make sure you do not delete the game accidentally; you may design a mechanism to prevent the agent from exiting the game or misbehaving. You can create child accounts to prevent unwanted purchases.

Use an Xbox 360 emulator instead of running a real console.


Blockchain or Web3 is not a silver bullet. Anything on Web3 can be done in Web2 in a centralized way.

Still, smart contract compilers do not have a Python API generator.

Local node: Geth, Hardhat, Ganache

Development environment: Ape, Brownie

Smart contract language: Solidity, Vyper


You have posted a lot of ideas to your QQ, WeChat, bookmarks and Bilibili favourites. Consider using tools to retrieve them.


Beat-synced videos can be made with MIDI generation models like Magenta Studio, or other models with timed output, to hit the time points of the cuts.
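Once a tempo is known (from a MIDI/tempo model or manual tapping), the cut points are pure arithmetic. A minimal sketch, with an illustrative function name:

```python
def beat_cut_points(bpm: float, duration_s: float, beats_per_cut: int = 4) -> list:
    """Return the timestamps (in seconds) where a beat-synced video should cut,
    placing one cut every `beats_per_cut` beats."""
    step = 60.0 / bpm * beats_per_cut  # seconds between cuts
    cuts, t = [], step
    while t < duration_s:
        cuts.append(round(t, 3))
        t += step
    return cuts

# At 120 BPM, a cut every 4 beats means a cut every 2 seconds.
cuts = beat_cut_points(bpm=120, duration_s=10)
```

Feed these timestamps to your editor or an ffmpeg script to align transitions with the music.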


I hate LinkedIn and Twitter. They both ban my accounts for no goddamn reason. Newly registered accounts shall perform no operations for a long time before using features like browsing and following.

Consider finding alternative platforms for job hunting (remote work: V2EX, crypto careers, web3.career) and social networking.


Use a smart contract (e.g. Sui Move) for auto re-execution of automated (agentic) video generation scripts and hacking scripts: reward these scripts for successful task fulfillment (view count passing 1000; SSH login success; cryptocurrency obtained), and punish or ban them on failure.

Human executions will be recorded for automated replay, and they are free. Automated initial execution will only be granted once.

Note that you do not want overflow errors in the smart contract.
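Solidity 0.8+ already reverts on overflow by default; the idea behind that check, sketched in Python with uint256 semantics:

```python
UINT256_MAX = 2**256 - 1  # word size of the EVM

def checked_add(a: int, b: int) -> int:
    """Add two uint256 values, reverting (raising) instead of silently wrapping
    on overflow, which is the behavior Solidity >= 0.8 gives you by default."""
    total = a + b
    if total > UINT256_MAX:
        raise OverflowError("uint256 overflow")
    return total
```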


Use a retrieval system with chat history while building the chatbot. Mind the different index modes in LlamaIndex, in order to better capture consecutive context ranges with a dynamic sliding window size.
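LlamaIndex's node parsers handle the windowing for you; the underlying idea is just overlapping chunks, so text near a boundary still appears intact in some chunk. A hand-rolled sketch (function name and defaults are illustrative; a dynamic variant would adjust `size` per query):

```python
def sliding_windows(tokens: list, size: int = 4, overlap: int = 2) -> list:
    """Split a token sequence into overlapping windows so that consecutive
    context near a chunk boundary is still captured whole by some chunk."""
    step = size - overlap
    return [tokens[i:i + size]
            for i in range(0, max(len(tokens) - overlap, 1), step)]

chunks = sliding_windows(list("abcdefgh"), size=4, overlap=2)
```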


Tachiyomi is for free manga reading and manga narrative videos. Its extension repository has been removed, so you need to search for workarounds.


Deep recursive thoughts and subconscious operators like REM and DELEGATE might help make the agent think for itself, becoming conscious.


Tornado Cash is for crypto mixing, and it is now a public archive. Alternatives are:

  • Aztec zk.money

  • Monero 0xTIP

  • Messier 87

  • Cyclone Protocol

  • RenBridge


Use VimGolf to collect Vim usage data to train cybergod; however, the CLI sucks, and you need to learn the request headers from the browser's JavaScript.


A comprehensive guide from the LlamaIndex hackathon


vanna.ai can work as a SQL agent. What can generate Cypher queries for Neo4j?


Coroutines and tasks (asyncio.create_task) can save time. You can also make blocking calls non-blocking with asyncio.to_thread.
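A minimal sketch of both tricks together: asyncio.create_task to overlap coroutines, and asyncio.to_thread to push a blocking call off the event loop.

```python
import asyncio
import time

def blocking_io(n: int) -> int:
    # Stand-in for a blocking call (disk, a requests call without async support, ...).
    time.sleep(0.1)
    return n * n

async def main() -> list:
    # create_task schedules each wrapped call immediately, so the three
    # 0.1 s sleeps overlap instead of running back to back (~0.1 s vs ~0.3 s).
    tasks = [asyncio.create_task(asyncio.to_thread(blocking_io, n)) for n in range(3)]
    return await asyncio.gather(*tasks)

results = asyncio.run(main())
```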


InternLM-XComposer seems to be better than CogVLM, and can generate articles with interleaved text and images.

The platform OpenXLab is a Hugging Face clone, and maybe it can provide some model execution services.


A better alternative to SadTalker: DreamTalk (still single-image generation).

Audio2Gesture (pose generation by audio) can be exported to Metahuman.

Wav2Lip, Linly-Talker, VirtualWife, OpenVTuber


Prompt engineering: Pezzo for prompt version management, LangGPT for prompt template generation

Fine-tuning: Prompt2Model, GPT-LLM-Trainer

Video generation: MagicVideo-V2 by ByteDance, Lumiere by Google, VideoCrafter2 by Tencent, I2VGen by Alibaba

Animate Anyone with OpenPose, MotionShop

DragonDiffusion for image detail editing

DeepSeekMoE 16B: reportedly the first MoE model in China

Auto Labeling: AutoLabel, AnyLabeling

RAG: QAnything by NetEase, StreamRAG for video search, Danswer

Document understanding & OCR: Nougat, CnOCR

TTS: Bert-VITS2, GPT-SoVITS

Agent: ProAgent, crewAI, Dify, AutoGen, Agenta, AgentVerse, ChatDev, Agently

YOLOv8 AimBot

Make-A-Character

AnyDoor, AnyText

PhotoMaker

CoTracker

MagicAnimate

ImageDream

VideoPoet

Boximator

InpaintAnything


2023-12-08
How To Create Cybergod

https://www.novaspivack.com/business/the-four-levels-of-ai-steps-to-self-evolution

https://www.novaspivack.com/business/exploring-higher-order-ai-training-on-first-order-ai-networks-for-optimal-path-outcomes


I want to create some sort of agent that learns autoregressively on historical tokens (not necessarily present in history, but close). However, when the agent is given some previous tokens, it is expected to send actions to the environment in order to actually observe the given tokens and get a reward. The agent is not allowed to emit the target token to the environment directly, to prevent cheating. The agent is rewarded for successfully rebuilding the past, or for predicting and building the future. Predicting the future means the target token is generated by the agent itself instead of by an automatic history-replay bot; the rest of the reward system works the same way as the history-replay reward system. This kind of system might have some sort of consciousness, and therefore be AGI.

The main objective of AGI is to create another version of itself.

The verification system can be built upon internal hidden tokens (feeling-based: you feel like you made it) or upon similarity (time-series similarity or semantic similarity). There can also be external verification signals such as lifespan, disk usage, view count, popularity, total capital, etc.
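As a toy version of the similarity-based verifier, using stdlib sequence similarity as a stand-in for real time-series or semantic similarity (names and threshold are illustrative):

```python
from difflib import SequenceMatcher

def verify(target: str, observed: str, threshold: float = 0.8):
    """Reward the agent only if what the environment actually produced
    is close enough to the target token sequence."""
    score = SequenceMatcher(None, target, observed).ratio()
    return score, score >= threshold

score, rewarded = verify("open the browser", "open the browsers")
```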


The main problem with making this work is how to train it in parallel. The real world can be replaced by a world model (say, a neural network) so that the agent can go back in time, or by very fast real-world evaluators, or by special world evaluators that support time traversal, like virtual machine snapshots or web browsers (tab traversal). AlphaGo has this advantage because the game of Go is a very simple world model, while the real world is not.

Also, this could build a hierarchy like: real world -> world model -> agent -> superagent -> …


2023-09-04
Understanding Captchas: Assessing Capabilities, Not Behaviors

captcha for turing test

captchas are used as Completely Automated Public Turing tests

however, they are not behavior-based but rather capability/skill-based.


2023-04-03
Ask Chatgpt How To Create An Ai Model For Controlling Computer Under Human Instruction

question

how do i design an ai model using gpt, rnn, lstm, vit to accept computer display screenshots and human instructions as input, keyboard and mouse events as output, trained on random keystrokes and mouse events, also human instructions paired with human demonstrations? show me python code

answers

Claude+

That would be an extremely complex AI system to build. Some high-level steps would be:

  1. Collect a large dataset of screenshots, human instructions, and corresponding keyboard/mouse interactions. This dataset would be crucial for training.

  2. Use a CNN to extract visual features from the screenshots. Something like VGG16 or ResNet would work well here.

  3. Use an RNN/LSTM to encode the sequence of instructions. BERT or RoBERTa would also be good options here.

  4. Combine the visual and textual features. Options include concatenation, addition, or a more complex fusion method.

  5. Use a Transformer like GPT-3 or Visual Transformers to generate the keyboard/mouse interactions from the combined visual+textual features.

  6. Train the entire model end-to-end to minimize the difference between the generated interactions and the ground-truth interactions from your dataset.

This would likely take a team of ML engineers and researchers to develop, and would require lots of time, data, and compute.

Some sample Python code would be:
import torch
import torch.nn as nn
import torchvision.models as models
from transformers import RobertaModel, GPT2LMHeadModel

# Feature extractors
cnn = models.resnet50(pretrained=True)
cnn.fc = nn.Identity()  # expose 2048-dim pooled features instead of class logits
roberta = RobertaModel.from_pretrained("roberta-base")

# Fusion: project image + text features into GPT-2's 768-dim embedding space
fusion = nn.Sequential(nn.Linear(2048 + 768, 1024), nn.ReLU(), nn.Linear(1024, 768))

# Generation model
generator = GPT2LMHeadModel.from_pretrained("gpt2")

# Forward pass
img_features = cnn(img)                                   # (batch, 2048)
instruction_features = roberta(instructions)[0][:, 0, :]  # (batch, 768) first-token representation
combined_features = fusion(torch.cat((img_features, instruction_features), dim=1))
preds = generator(inputs_embeds=combined_features.unsqueeze(1))[0]
This is just a brief sample, but hopefully gives you an idea of the components and flow. Let me know if you have any other questions!

GPT4

Designing an AI model that combines GPT, RNN, LSTM, and ViT for the task you described is complex and requires deep understanding of these architectures. However, I’ll provide you with a high-level overview and a basic structure of the code to help you get started.

  1. First, process the input data:

  • For computer display screenshots, use a Vision Transformer (ViT) model to extract features.

  • For human instructions, use a GPT model to encode the natural language instructions.

  2. Combine the features from both the ViT and GPT models.

  3. Use an LSTM or RNN to process the combined features and maintain the temporal context.

  4. Finally, decode the output of the LSTM/RNN into keyboard and mouse events.

Here’s a basic structure of the code:

import torch
import torchvision
from transformers import GPT2Tokenizer, GPT2Model

# Load pretrained models
gpt_model = GPT2Model.from_pretrained('gpt2')
vit_model = torchvision.models.vit_b_16(pretrained=True)
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

class CustomModel(torch.nn.Module):
    def __init__(self, gpt_model, vit_model, hidden_size, output_size):
        super().__init__()
        self.gpt_model = gpt_model
        self.vit_model = vit_model
        self.vit_model.heads = torch.nn.Identity()  # expose 768-dim features instead of class logits
        self.fuse = torch.nn.Linear(768 + 768, hidden_size)  # GPT-2 base and ViT-B/16 are both 768-dim
        self.rnn = torch.nn.LSTM(input_size=hidden_size, hidden_size=hidden_size, batch_first=True)
        self.fc = torch.nn.Linear(hidden_size, output_size)

    def forward(self, instructions, screenshots):
        # Encode instructions using GPT-2
        inputs = tokenizer(instructions, return_tensors="pt", padding=True, truncation=True)
        gpt_output = self.gpt_model(**inputs)
        gpt_features = gpt_output.last_hidden_state[:, 0, :]
        # Encode screenshots using ViT
        vit_features = self.vit_model(screenshots)
        # Combine features
        combined_features = self.fuse(torch.cat((gpt_features, vit_features), dim=1))
        # Process combined features through an RNN
        rnn_output, _ = self.rnn(combined_features.unsqueeze(1))
        # Decode the output into keyboard and mouse events
        output = self.fc(rnn_output.squeeze(1))
        return output

# Instantiate the custom model
model = CustomModel(gpt_model, vit_model, hidden_size=1024, output_size=256)
# Train and evaluate the model using your dataset

You’ll need to collect a dataset with computer display screenshots, human instructions, and paired keyboard and mouse events. Then, preprocess the data, split it into training and validation sets, and train the model using an appropriate loss function and optimizer.

Keep in mind that this is just a starting point, and you may need to experiment with different architectures, loss functions, and hyperparameters to achieve the desired performance.


2023-04-02
Chatgpt Local Version

Run some community-contributed ChatGPT-like models on commodity PCs.

Model Selection

Below are some models we are about to use:

There are quite a few more models to be listed. You can check this curated open-sourced ChatGPT-like model list for updates. But for now, these models shall be sufficient.

Quantization and Optimization

Floating-point values in model weights are stored as 32-bit. Quantization can reduce storage space and computation by switching to 16-bit, 8-bit or 4-bit values. However, most quantized models cannot be trained or fine-tuned, and some 16-bit models can only be trained on certain GPU architectures, such as Ada and Turing.
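The arithmetic behind quantization is small. A toy sketch of symmetric 8-bit quantization on a flat list of weights (real implementations quantize whole tensors, often per channel, and in groups for 4-bit schemes):

```python
def quantize_int8(weights: list):
    """Map float weights onto the int8 range [-127, 127] with one shared
    scale factor, storing about 4x less than 32-bit floats."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid zero scale for all-zero weights
    return [round(w / scale) for w in weights], scale

def dequantize(q: list, scale: float) -> list:
    return [v * scale for v in q]

weights = [0.5, -1.0, 0.25]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)  # close to the originals, off by at most scale / 2
```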

To make LLM (Large Language Model) inference feasible on common hardware, a GPU is usually mandatory. However, most commodity GPUs have less VRAM than the system has RAM, limiting the size of the LLM that can be run, and thus its capability. A typical machine has 12GB of VRAM and 32GB of RAM. GGML is a project aiming to make LLM inference on CPU as fast as on GPU, using the larger RAM to run larger LLMs. Some popular LLMs, like LLaMA and Alpaca, have already been ported to GGML.

Training and Fine-tuning

In deep learning, people tend to tune all parameters during training, which requires a lot of VRAM and time. To train GPT-3.5, a.k.a. ChatGPT, OpenAI spent millions renting interconnected A100 GPUs. It is impossible for an individual to afford that.

With technologies like LoRA, which freezes most of the model and introduces a small fraction of tunable parameters, training requirements can be greatly reduced. One can easily tune a 7B LLaMA or a 14B RWKV with LoRA on a PC (usually rented in the cloud, such as on AutoDL) with a single 80GB A100 card and 200GB of RAM.
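The LoRA trick itself is small: keep the pretrained weight W frozen and learn a low-rank update B·A on top. A dependency-free sketch of the idea (not the PEFT library's API; the fixed 0.5 init is for illustration, real LoRA uses small random A and zero B):

```python
def matmul(X, Y):
    # Naive matrix multiply on nested lists (rows of X times columns of Y).
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

class LoRALinear:
    """Effective weight W + B @ A, where W (d_out x d_in) stays frozen and only
    A (r x d_in) and B (d_out x r) are trained: r * (d_in + d_out) tunable
    parameters instead of d_in * d_out."""

    def __init__(self, W, r: int):
        d_out, d_in = len(W), len(W[0])
        self.W = W                                  # frozen pretrained weight
        self.A = [[0.5] * d_in for _ in range(r)]   # trainable (illustrative init)
        self.B = [[0.0] * r for _ in range(d_out)]  # trainable; zeros => no change before training

    def effective_weight(self):
        delta = matmul(self.B, self.A)
        return [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(self.W, delta)]

layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=1)
```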

Prompting and Chaining

LLMs are general problem solvers, given enough external storage and access to search engines. Text is the only interface to language models (though not for multimodal LLMs like GPT-4, OFA or UniLM).

To enhance the capability of LLMs, you have to maintain their memory, define action keywords that trigger external actions during the conversation, and connect them to semantic search engines powered by other AI models such as sentence transformers.

One such library is LangChain.
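Stripped down, the action-keyword part is just pattern dispatch over the model's reply; LangChain wraps this pattern up with much more machinery. The SEARCH[...] syntax below is made up for illustration:

```python
import re

def build_agent(actions: dict):
    """Return a function that scans a model reply for ACTION[argument]
    keywords and runs the matching external tool."""
    pattern = re.compile(r"([A-Z]+)\[(.*?)\]")

    def run(model_reply: str) -> list:
        return [actions[name](arg)
                for name, arg in pattern.findall(model_reply)
                if name in actions]
    return run

# Register one external action and trigger it from a (fake) model reply.
run = build_agent({"SEARCH": lambda q: f"search results for: {q}"})
out = run("I don't know this. SEARCH[who accepts tether] and summarize.")
```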

Serving as API

Generation in LLMs is sequential, so the server needs to expose a streaming API to match this behavior: tokens are fetched from the server one by one at a constant speed and revealed in the frontend.
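A minimal sketch of the server side, with a fake model standing in for the real decode loop and server-sent-events framing as the assumed transport:

```python
import time

def fake_decode(prompt: str):
    # Stand-in for an LLM's sequential decode loop: one token at a time.
    for token in ["Hello", ",", " world", "!"]:
        yield token

def stream_response(prompt: str, delay: float = 0.0) -> str:
    """Frame each token as a server-sent event the moment it exists,
    instead of buffering the whole completion."""
    chunks = []
    for token in fake_decode(prompt):
        chunks.append(f"data: {token}\n\n")  # one SSE event per token
        time.sleep(delay)                    # pacing: constant reveal speed in the frontend
    return "".join(chunks)

body = stream_response("hi")
```

In a real server each chunk would be flushed to the socket as it is produced rather than joined at the end.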

One can check third-party frontend-only or self-hosted projects for conversational LLMs for reference.


2023-01-02
Autonomous Lazero Bot, Controlling Computer Using Natural Language Instructions

robotics

RT-1 robotics transformer and SayCan

VIMA General Robot Manipulation with Multimodal Prompts

multimodal model

magma a GPT-style multimodal model that can understand any combination of images and language

Versatile-Diffusion Text, Images and Variations All in One Diffusion Model

AI based reinforcement GUI testing

glider tasklet crawler

GUI based bug detection using RL


2022-12-13
Turing-Project And His Works On Ai And Nlp

He recently interacted with racketeers on WeChat; find out how to add new friends (and groups, if any) on WeChat.

the bilibili user and his repo

video transfer based on DCT-Net 视频洗稿 伪原创

AntiFraudChatBot is a Wechaty bot using a Megatron-based super-large model called Yuan 1.0, which is freely available for only three months (30k API calls), applied to chatting with racketeers. Another application: AI murder-mystery scripts (AI剧本杀).

Megatron-DeepSpeed enables training large models on cheap hardware

EssayKillerBrain is another project he has been involved in, which contains EssayKiller_V2, EssayKiller_V1, EssayTopicPredict and WrittenBrainBase

AlphaFold in MindSpore


2022-12-10
Awesome Transformer & Transfer Learning In Nlp

machine learning guide: lots of links, broad topics

This repository contains a hand-curated list of great machine (deep) learning resources for Natural Language Processing (NLP), with a focus on Bidirectional Encoder Representations from Transformers (BERT), the attention mechanism, Transformer architectures/networks, and transfer learning in NLP.

Transformer

Transformer (BERT) (Source)


Papers

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.

  2. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context by Zihang Dai, Zhilin Yang, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le and Ruslan Salakhutdinov.

  • Uses smart caching to improve the learning of long-term dependency in Transformer. Key results: state-of-the-art on 5 language modeling benchmarks, including ppl of 21.8 on One Billion Word (LM1B) and 0.99 on enwiki8. The authors claim that the method is more flexible, faster during evaluation (1874 times speedup), generalizes well on small datasets, and is effective at modeling short and long sequences.

  3. Conditional BERT Contextual Augmentation by Xing Wu, Shangwen Lv, Liangjun Zang, Jizhong Han and Songlin Hu.

  4. SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering by Chenguang Zhu, Michael Zeng and Xuedong Huang.

  5. Language Models are Unsupervised Multitask Learners by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever.

  6. The Evolved Transformer by David R. So, Chen Liang and Quoc V. Le.

  • They used architecture search to improve the Transformer architecture. The key is to use evolution and seed the initial population with the Transformer itself. The resulting architecture is better and more efficient, especially for small models.

  7. XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.

  • A new pretraining method for NLP that significantly improves upon BERT on 20 tasks (e.g., SQuAD, GLUE, RACE).

  • “Transformer-XL is a shifted model (each hyper-column ends with next token) while XLNet is a direct model (each hyper-column ends with contextual representation of same token).” — Thomas Wolf.

  • Comments from HN: “A clever dual masking-and-caching algorithm.”

  • This is NOT “just throwing more compute” at the problem.

  • The authors have devised a clever dual-masking-plus-caching mechanism to induce an attention-based model to learn to predict tokens from all possible permutations of the factorization order of all other tokens in the same input sequence.

  • In expectation, the model learns to gather information from all positions on both sides of each token in order to predict the token.

  • For example, if the input sequence has four tokens, [“The”, “cat”, “is”, “furry”], in one training step the model will try to predict “is” after seeing “The”, then “cat”, then “furry”.

  • In another training step, the model might see “furry” first, then “The”, then “cat”.

  • Note that the original sequence order is always retained, e.g., the model always knows that “furry” is the fourth token.

  • The masking-and-caching algorithm that accomplishes this does not seem trivial to me.

  • The improvements to SOTA performance in a range of tasks are significant – see tables 2, 3, 4, 5, and 6 in the paper.

  8. CTRL: Conditional Transformer Language Model for Controllable Generation by Nitish Shirish Keskar, Richard Socher et al. [Code].

  9. PLMpapers - BERT (Transformer, transfer learning) has catalyzed research in pretrained language models (PLMs) and has sparked many extensions. This repo contains a list of papers on PLMs.

  10. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Google Brain.

  • The group performs a systematic study of transfer learning for NLP using a unified Text-to-Text Transfer Transformer (T5) model and pushes the limits to achieve SoTA on SuperGLUE (approaching human baseline), SQuAD, and the CNN/DM benchmark. [Code].

  11. Reformer: The Efficient Transformer by Nikita Kitaev, Lukasz Kaiser, and Anselm Levskaya.

  • “They present techniques to reduce the time and memory complexity of Transformer, allowing batches of very long sequences (64K) to fit on one GPU. Should pave way for Transformer to be really impactful beyond NLP domain.” — @hardmaru

  12. Supervised Multimodal Bitransformers for Classifying Images and Text (MMBT) by Facebook AI.

  13. A Primer in BERTology: What we know about how BERT works by Anna Rogers et al.

  • “Have you been drowning in BERT papers?” The group surveys over 40 papers on BERT’s linguistic knowledge, architecture tweaks, compression, multilinguality, and so on.

  14. tomohideshibata/BERT-related papers

  15. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity by Google Brain. [Code] | [Blog post (unofficial)]

  • Key idea: the architecture uses a subset of parameters on every training step and for each example. Upside: the model trains much faster. Downside: a super-large model that won’t fit in a lot of environments.

  16. An Attention Free Transformer by Apple.

  17. A Survey of Transformers by Tianyang Lin et al.

  18. Evaluating Large Language Models Trained on Code by OpenAI.

  • Codex, a GPT language model that powers GitHub Copilot.

  • They investigate their model’s limitations (and strengths).

  • They discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.

  19. Training language models to follow instructions with human feedback by OpenAI. They call the resulting models InstructGPT. ChatGPT is a sibling model to InstructGPT.

  20. LaMDA: Language Models for Dialog Applications by Google.

  21. Training Compute-Optimal Large Language Models by Hoffmann et al. at DeepMind. TLDR: introduces a new 70B LM called “Chinchilla” that outperforms much bigger LMs (GPT-3, Gopher). DeepMind has found the secret to cheaply scaling large language models — to be compute-optimal, model size and training data must be scaled equally. It shows that most LLMs are severely starved of data and under-trained. Given the new scaling law, even if you pump a quadrillion parameters into a model (GPT-4 urban myth), the gains will not compensate for 4x more training tokens.

Articles

BERT and Transformer

  1. Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing from Google AI.

  2. The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning).

  3. Dissecting BERT by Miguel Romero and Francisco Ingham - Understand BERT in depth with an intuitive, straightforward explanation of the relevant concepts.

  4. A Light Introduction to Transformer-XL.

  5. Generalized Language Models by Lilian Weng, Research Scientist at OpenAI.

  6. What is XLNet and why it outperforms BERT

  • Permutation Language Modeling objective is the core of XLNet.
  7. DistilBERT (from HuggingFace), released together with the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT.

  8. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations paper from Google Research and Toyota Technological Institute. — Improvements for more efficient parameter usage: factorized embedding parameterization, cross-layer parameter sharing, and Sentence Order Prediction (SOP) loss to model inter-sentence coherence. [Blog post | Code]

  9. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning - A BERT variant like ALBERT that costs less to train. They trained a model that outperforms GPT using only one GPU, and matched the performance of RoBERTa using 1/4 of the computation. It uses a new pre-training approach, called replaced token detection (RTD), that trains a bidirectional model while learning from all input positions. [Blog post | Code]

  10. Visual Paper Summary: ALBERT (A Lite BERT)

Attention Concept

  1. The Annotated Transformer by Harvard NLP Group - Further reading to understand the “Attention is all you need” paper.

  2. Attention? Attention! - Attention guide by Lilian Weng from OpenAI.

  3. Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention) by Jay Alammar, an Instructor from Udacity ML Engineer Nanodegree.

  4. Making Transformer networks simpler and more efficient - FAIR released an all-attention layer to simplify the Transformer model and an adaptive attention span method to make it more efficient (reduce computation time and memory footprint).

  5. What Does BERT Look At? An Analysis of BERT’s Attention paper by Stanford NLP Group.

Transformer Architecture

  1. The Transformer blog post.

  2. The Illustrated Transformer by Jay Alammar, an Instructor from Udacity ML Engineer Nanodegree.

  3. Watch Łukasz Kaiser’s talk walking through the model and its details.

  4. Transformer-XL: Unleashing the Potential of Attention Models by Google Brain.

  5. Generative Modeling with Sparse Transformers by OpenAI - an algorithmic improvement of the attention mechanism to extract patterns from sequences 30x longer than possible previously.

  6. Stabilizing Transformers for Reinforcement Learning paper by DeepMind and CMU - they propose architectural modifications to the original Transformer and its XL variant; moving the layer norm and adding gating creates the Gated Transformer-XL (GTrXL), which substantially improves stability and learning speed (integrating experience through time) in RL.

  7. The Transformer Family by Lilian Weng - since the paper “Attention Is All You Need”, many new things have happened to improve the Transformer model. This post is about that.

  8. DETR (DEtection TRansformer): End-to-End Object Detection with Transformers by FAIR - :fire: Computer vision has not yet been swept up by the Transformer revolution. DETR completely changes the architecture compared with previous object detection systems. (PyTorch Code and pretrained models). “A solid swing at (non-autoregressive) end-to-end detection. Anchor boxes + Non-Max Suppression (NMS) is a mess. I was hoping detection would go end-to-end back in ~2013)” — Andrej Karpathy

  9. Transformers for software engineers - This post will be helpful to software engineers who are interested in learning ML models, especially anyone interested in Transformer interpretability. The post walks through a (mostly) complete implementation of a GPT-style Transformer, but the goal is not running code; instead, it uses the language of software engineering and programming to explain how these models work and articulates some of the perspectives the authors bring to interpretability work.

  10. Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance - PaLM is a dense decoder-only Transformer model trained with the Pathways system, which enabled Google to efficiently train a single model across multiple TPU v4 Pods. The example explaining a joke is remarkable. This shows that it can generate explicit explanations for scenarios that require a complex combination of multi-step logical inference, world knowledge, and deep language understanding.

Generative Pre-Training Transformer (GPT)

  1. Better Language Models and Their Implications.

  2. Improving Language Understanding with Unsupervised Learning - this is an overview of the original OpenAI GPT model.

  3. 🦄 How to build a State-of-the-Art Conversational AI with Transfer Learning by Hugging Face.

  4. The Illustrated GPT-2 (Visualizing Transformer Language Models) by Jay Alammar.

  5. MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism by NVIDIA ADLR.

  6. OpenGPT-2: We Replicated GPT-2 Because You Can Too - the authors trained a 1.5 billion parameter GPT-2 model on a similar sized text dataset and they reported results that can be compared with the original model.

  7. MSBuild demo of an OpenAI generative text model generating Python code [video] - The model that was trained on GitHub OSS repos. The model uses English-language code comments or simply function signatures to generate entire Python functions. Cool!

  8. GPT-3: Language Models are Few-Shot Learners (paper) by Tom B. Brown (OpenAI) et al. - “We train GPT-3, an autoregressive language model with 175 billion parameters :scream:, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting.”

  9. elyase/awesome-gpt3 - A collection of demos and articles about the OpenAI GPT-3 API.

  10. How GPT3 Works - Visualizations and Animations by Jay Alammar.

  11. GPT-Neo - Replicate a GPT-3 sized model and open source it for free. GPT-Neo is “an implementation of model parallel GPT2 & GPT3-like models, with the ability to scale up to full GPT3 sizes (and possibly more!), using the mesh-tensorflow library.” [Code].

  12. GitHub Copilot, powered by OpenAI Codex - Codex is a descendant of GPT-3. Codex translates natural language into code.

  13. GPT-4 Rumors From Silicon Valley - GPT-4 is almost ready. GPT-4 would be multimodal, accepting text, audio, image, and possibly video inputs. Release window: Dec - Feb. #hype

  14. New GPT-3 model: text-Davinci-003 - Improvements:

  • Handle more complex intents — you can get even more creative with how you make use of its capabilities now.

  • Higher quality writing — clearer, more engaging, and more compelling content.

  • Better at longer form content generation.

  15. ChatGPT blog post and link to the conversational interface.
  • ChatGPT is OpenAI’s newest language model fine-tuned from a model in the GPT-3.5 series (which finished training in early 2022), optimized for dialogue. It is trained using Reinforcement Learning from Human Feedback; human AI trainers provide supervised fine-tuning by playing both sides of the conversation.

  • Is it evidently better than GPT-3 at following user instructions and context? People have noticed that ChatGPT’s output quality seems to represent a notable improvement over previous GPT-3 models.

Large Language Model (LLM)

  1. GPT-J-6B - Can’t access GPT-3? Here’s GPT-J — its open-source cousin.

  2. Fun and Dystopia With AI-Based Code Generation Using GPT-J-6B - Prior to GitHub Copilot tech preview launch, Max Woolf, a data scientist tested GPT-J-6B’s code “writing” abilities.

  3. GPT-Code-Clippy (GPT-CC) - An open source version of GitHub Copilot. The GPT-CC models are fine-tuned versions of GPT-2 and GPT-Neo.

  4. GPT-NeoX-20B - A 20 billion parameter model trained using EleutherAI’s GPT-NeoX framework. They expect it to perform well on many tasks. You can try out the model on GooseAI playground.

  5. Metaseq - A codebase for working with Open Pre-trained Transformers (OPT).

  6. YaLM 100B by Yandex is a GPT-like pretrained language model with 100B parameters for generating and processing text. It can be used freely by developers and researchers from all over the world.

  7. BigScience’s BLOOM-176B from the Hugging Face repository [paper, blog post] - BLOOM is a 176-billion-parameter model for language processing, able to generate text much like GPT-3 and OPT-175B. It was developed to be multilingual, being deliberately trained on datasets containing 46 natural languages and 13 programming languages.

  8. bitsandbytes-Int8 inference for Hugging Face models - You can run BLOOM-176B/OPT-175B easily on a single machine, without performance degradation. If true, this could be a game changer in enabling people outside of big tech companies to use these LLMs.
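The core idea behind this kind of Int8 inference can be illustrated with a toy absmax quantization sketch. This is a simplification for intuition only, not the bitsandbytes implementation (the real LLM.int8() kernels also route outlier features through fp16):

```python
import numpy as np

def absmax_quantize(x: np.ndarray):
    """Quantize a float vector to int8 by scaling with its absolute maximum."""
    scale = 127.0 / np.max(np.abs(x))
    q = np.round(x * scale).astype(np.int8)
    return q, scale

def absmax_dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float vector from the int8 codes."""
    return q.astype(np.float32) / scale

weights = np.array([-0.9, 0.1, 0.3, 0.05, 0.82], dtype=np.float32)
q, scale = absmax_quantize(weights)
recovered = absmax_dequantize(q, scale)
print(np.max(np.abs(weights - recovered)))  # small reconstruction error
```

Storing int8 codes plus one scale per block roughly quarters the memory footprint versus fp32, which is what makes 175B-scale models fit on far less hardware.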

Additional Reading

  1. How to Build OpenAI’s GPT-2: “The AI That’s Too Dangerous to Release”.

  2. OpenAI’s GPT2 - Food to Media hype or Wake Up Call?

  3. How the Transformers broke NLP leaderboards by Anna Rogers. :fire::fire::fire:

  • A well put summary post on problems with large models that dominate NLP these days.

  • Larger models + more data = progress in Machine Learning research :question:

  4. Transformers From Scratch tutorial by Peter Bloem.

  5. Real-time Natural Language Understanding with BERT using NVIDIA TensorRT on Google Cloud T4 GPUs achieves 2.2 ms latency for inference. Optimizations are open source on GitHub.

  6. NLP’s Clever Hans Moment has Arrived by The Gradient.

  7. Language, trees, and geometry in neural networks - a series of expository notes accompanying the paper, “Visualizing and Measuring the Geometry of BERT” by Google’s People + AI Research (PAIR) team.

  8. Benchmarking Transformers: PyTorch and TensorFlow by Hugging Face - a comparison of inference time (on CPU and GPU) and memory usage for a wide range of transformer architectures.

  9. Evolution of representations in the Transformer - An accessible article that presents the insights of their EMNLP 2019 paper. They look at how the representations of individual tokens in Transformers trained with different objectives change.

  10. The dark secrets of BERT - This post probes fine-tuned BERT models for linguistic knowledge. In particular, the authors analyse how many self-attention patterns with some linguistic interpretation are actually used to solve downstream tasks. TL;DR: They are unable to find evidence that linguistically interpretable self-attention maps are crucial for downstream performance.

  11. A Visual Guide to Using BERT for the First Time - Tutorial on using BERT in practice, such as for sentiment analysis on movie reviews, by Jay Alammar.

  12. Turing-NLG: A 17-billion-parameter language model by Microsoft that outperforms the state of the art on many downstream NLP tasks. This work would not be possible without breakthroughs produced by the DeepSpeed library (compatible with PyTorch) and the ZeRO optimizer, which can be explored more in this accompanying blog post.

  13. MUM (Multitask Unified Model): A new AI milestone for understanding information by Google.

  • Based on transformer architecture but more powerful.

  • Multitask means: supports text and images, knowledge transfer between 75 languages, understand context and go deeper in a topic, and generate content.

  14. GPT-3 is No Longer the Only Game in Town - GPT-3 was by far the largest AI model of its kind last year (2020). Now? Not so much.

  15. OpenAI’s API Now Available with No Waitlist - GPT-3 access without the wait. However, apps must be approved before going live. This release also allows OpenAI to review applications, monitor for misuse, and better understand the effects of this tech.

  16. The Inherent Limitations of GPT-3 - One thing missing from the article, if you’ve read Gwern’s GPT-3 Creative Fiction article before, is the mystery known as “Repetition/Divergence Sampling”:

when you generate free-form completions, they have a tendency to eventually fall into repetitive loops of gibberish.

If you use Copilot, you may have experienced this weirdness yourself, where it generates the same line or block of code over and over again.
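A common mitigation for these repetition loops is a repetition penalty applied to the logits before sampling, down-weighting tokens that have already been generated. A minimal sketch of the idea (the function name here is illustrative, not any particular library’s API; the divide-positive/multiply-negative convention follows the CTRL paper):

```python
import numpy as np

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Down-weight the logits of tokens that already appeared in the output."""
    logits = logits.copy()
    for tok in set(generated_ids):
        # Dividing a positive logit (or multiplying a negative one) by the
        # penalty both reduce that token's probability after softmax.
        if logits[tok] > 0:
            logits[tok] /= penalty
        else:
            logits[tok] *= penalty
    return logits

logits = np.array([2.0, 0.5, -1.0, 3.0])
penalized = apply_repetition_penalty(logits, generated_ids=[0, 2])
print(penalized)  # tokens 0 and 2 are pushed down; 1 and 3 are untouched
```

A penalty of 1.0 disables the effect; values much above ~1.5 tend to hurt fluency, so samplers usually keep it small.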

  17. Language Modelling at Scale: Gopher, Ethical considerations, and Retrieval by DeepMind - The paper presents an analysis of Transformer-based language model performance across a wide range of model scales — from models with tens of millions of parameters up to a 280 billion parameter model called Gopher.

  18. Competitive programming with AlphaCode by DeepMind - AlphaCode uses transformer-based language models to generate code that can create novel solutions to programming problems which require an understanding of algorithms.

  19. Building games and apps entirely through natural language using OpenAI’s code-davinci model - The author built several small games and apps without touching a single line of code, simply by telling the model what they want.

  20. OpenAI gets GPT-3 to work by hiring an army of humans to fix GPT’s bad answers.

  21. GPT-3 can run code - You provide an input text and a command, and GPT-3 will transform them into an expected output. It works well for tasks like changing coding style, translating between programming languages, refactoring, and adding documentation. For example, it converts JSON into YAML, translates Python code to JavaScript, and improves the runtime complexity of a function.

  22. Using GPT-3 to explain how code works by Simon Willison.

  23. Character AI announces they’re building a full stack AGI company so you can create your own AI to help you with anything, using conversational AI research. The co-founders are Noam Shazeer (who co-invented Transformers, scaled them to supercomputers for the first time, and pioneered large-scale pretraining) and Daniel de Freitas (who led the development of LaMDA), work that is foundational to recent AI progress.

  24. How Much Better is OpenAI’s Newest GPT-3 Model? - In addition to ChatGPT, OpenAI releases text-davinci-003, a Reinforcement Learning-tuned model that performs better at long-form writing. For example, it can explain code in the style of Eminem. 😀

Educational

  • minGPT by Andrej Karpathy - A PyTorch re-implementation of GPT, both training and inference. minGPT tries to be small, clean, interpretable and educational, as most of the currently available GPT model implementations can be a bit sprawling. GPT is not a complicated model and this implementation is appropriately about 300 lines of code.
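The heart of what minGPT implements is causal self-attention: each position may attend only to itself and earlier positions. A single-head NumPy sketch of that core computation (this is my own illustration with random weights, not minGPT’s code):

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over a (T, d) sequence."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = (q @ k.T) / np.sqrt(d)          # scaled dot-product scores
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                   # block attention to the future
    # softmax over the allowed (past and present) positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, d = 5, 8
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

A full GPT block wraps this in multiple heads, a residual connection, layer norm, and an MLP, then stacks the block N times, which is why minGPT fits in roughly 300 lines.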

Tutorials

  1. How to train a new language model from scratch using Transformers and Tokenizers tutorial by Hugging Face. :fire:

Videos

BERTology

  1. XLNet Explained by NLP Breakfasts.
  • Clear explanation. Also covers the two-stream self-attention idea.
  2. The Future of NLP by 🤗
  • Dense overview of what is going on in transfer learning in NLP currently, limits, and future directions.
  3. The Transformer neural network architecture explained by AI Coffee Break with Letitia Parcalabescu.
  • High-level explanation, best suited when unfamiliar with Transformers.

Attention and Transformer Networks

  1. Sequence to Sequence Learning Animated (Inside Transformer Neural Networks and Attention Mechanisms) by learningcurve.

Official Implementations

  1. google-research/bert - TensorFlow code and pre-trained models for BERT.

Other Implementations

PyTorch and TensorFlow

  1. 🤗 Hugging Face Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides state-of-the-art general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet, CTRL…) for Natural Language Understanding (NLU) and Natural Language Generation (NLG) with 32+ pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch. [Paper]

  2. spacy-transformers - a library that wraps Hugging Face’s Transformers in order to extract features to power NLP pipelines. It also calculates an alignment so the Transformer features can be related back to actual words instead of just wordpieces.
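The wordpiece-to-word alignment problem that spacy-transformers solves can be seen in a toy example. Here is a hypothetical greedy longest-match-first splitter with BERT-style `##` continuations plus a word-to-piece index map; the vocabulary and code are illustrative, not the library’s actual algorithm:

```python
def wordpiece_split(word, vocab):
    """Greedy longest-match-first subword split, BERT-style ('##' continuation)."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        if end == start:          # no vocab entry matched this character
            return ["[UNK]"]
        start = end
    return pieces

vocab = {"trans", "##form", "##ers", "rock"}
words = ["transformers", "rock"]
alignment = {}                    # word index -> list of wordpiece indices
pieces = []
for i, w in enumerate(words):
    sub = wordpiece_split(w, vocab)
    alignment[i] = list(range(len(pieces), len(pieces) + len(sub)))
    pieces.extend(sub)
print(pieces)     # ['trans', '##form', '##ers', 'rock']
print(alignment)  # {0: [0, 1, 2], 1: [3]}
```

With such a map, per-wordpiece transformer features can be pooled back to per-word features, which is what lets downstream pipeline components keep working at the word level.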

PyTorch

  1. codertimo/BERT-pytorch - Google AI 2018 BERT pytorch implementation.

  2. innodatalabs/tbert - PyTorch port of BERT ML model.

  3. kimiyoung/transformer-xl - Code repository associated with the Transformer-XL paper.

  4. dreamgonfly/BERT-pytorch - A PyTorch implementation of BERT in “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”.

  5. dhlee347/pytorchic-bert - A Pytorch implementation of Google BERT.

  6. pingpong-ai/xlnet-pytorch - A Pytorch implementation of Google Brain XLNet.

  7. facebook/fairseq - RoBERTa: A Robustly Optimized BERT Pretraining Approach by Facebook AI Research. SoTA results on GLUE, SQuAD and RACE.

  8. NVIDIA/Megatron-LM - Ongoing research training transformer language models at scale, including: BERT.

  9. deepset-ai/FARM - Simple & flexible transfer learning for the industry.

  10. NervanaSystems/nlp-architect - NLP Architect by Intel AI. Among other libraries, it provides a quantized version of Transformer models and efficient training method.

  11. kaushaltrivedi/fast-bert - Super easy library for BERT based NLP models. Built based on 🤗 Transformers and is inspired by fast.ai.

  12. NVIDIA/NeMo - Neural Modules is a toolkit for conversational AI by NVIDIA. They are trying to improve speech recognition with BERT post-processing.

  13. facebook/MMBT from Facebook AI - Multimodal transformers model that can accept a transformer model and a computer vision model for classifying image and text.

  14. dbiir/UER-py from Tencent and RUC - Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo (with more focus on Chinese).

Keras

  1. Separius/BERT-keras - Keras implementation of BERT with pre-trained weights.

  2. CyberZHG/keras-bert - Implementation of BERT that could load official pre-trained models for feature extraction and prediction.

  3. bojone/bert4keras - Light reimplement of BERT for Keras.

TensorFlow

  1. guotong1988/BERT-tensorflow - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.

  2. kimiyoung/transformer-xl - Code repository associated with the Transformer-XL paper.

  3. zihangdai/xlnet - Code repository associated with the XLNet paper.

Chainer

  1. soskek/bert-chainer - Chainer implementation of “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”.

Transfer Learning in NLP

As Jay Alammar put it:

The year 2018 has been an inflection point for machine learning models handling text (or more accurately, Natural Language Processing or NLP for short). Our conceptual understanding of how best to represent words and sentences in a way that best captures underlying meanings and relationships is rapidly evolving. Moreover, the NLP community has been putting forward incredibly powerful components that you can freely download and use in your own models and pipelines (It’s been referred to as NLP’s ImageNet moment, referencing how years ago similar developments accelerated the development of machine learning in Computer Vision tasks).

One of the latest milestones in this development is the release of BERT, an event described as marking the beginning of a new era in NLP. BERT is a model that broke several records for how well models can handle language-based tasks. Soon after the release of the paper describing the model, the team also open-sourced the code of the model, and made available for download versions of the model that were already pre-trained on massive datasets. This is a momentous development since it enables anyone building a machine learning model involving language processing to use this powerhouse as a readily-available component – saving the time, energy, knowledge, and resources that would have gone to training a language-processing model from scratch.

BERT builds on top of a number of clever ideas that have been bubbling up in the NLP community recently – including but not limited to Semi-supervised Sequence Learning (by Andrew Dai and Quoc Le), ELMo (by Matthew Peters and researchers from AI2 and UW CSE), ULMFiT (by fast.ai founder Jeremy Howard and Sebastian Ruder), the OpenAI transformer (by OpenAI researchers Radford, Narasimhan, Salimans, and Sutskever), and the Transformer (Vaswani et al).

ULMFiT: Nailing down Transfer Learning in NLP

ULMFiT introduced methods to effectively utilize a lot of what the model learns during pre-training – more than just embeddings, and more than contextualized embeddings. ULMFiT introduced a language model and a process to effectively fine-tune that language model for various tasks.

NLP finally had a way to do transfer learning probably as well as Computer Vision could.
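One of the fine-tuning ingredients ULMFiT introduced is the slanted triangular learning rate: warm up linearly for a short fraction of training, then decay linearly. A sketch of the schedule following the formula in the ULMFiT paper, with illustrative hyperparameter values:

```python
def slanted_triangular_lr(t, T, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Slanted triangular LR: short linear warm-up to lr_max at step cut,
    then a long linear decay toward lr_max / ratio over the remaining steps."""
    cut = int(T * cut_frac)
    if t < cut:
        p = t / cut                                   # warm-up phase
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # decay phase
    return lr_max * (1 + p * (ratio - 1)) / ratio

T = 100
schedule = [slanted_triangular_lr(t, T) for t in range(T)]
peak_step = max(range(T), key=lambda t: schedule[t])
print(peak_step)  # peak lands at the end of the short warm-up phase
```

Combined with discriminative learning rates per layer group and gradual unfreezing, this schedule is what lets the pre-trained language model adapt to a target task without catastrophically forgetting what it learned.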

MultiFiT: Efficient Multi-lingual Language Model Fine-tuning by Sebastian Ruder et al. MultiFiT extends ULMFiT to make it more efficient and more suitable for language modelling beyond English. (EMNLP 2019 paper)

Books

  1. Transfer Learning for Natural Language Processing - A book that is a practical primer to transfer learning techniques capable of delivering huge improvements to your NLP models.

Other Resources

Expand Other Resources
  1. hanxiao/bert-as-service - Mapping a variable-length sentence to a fixed-length vector using a pretrained BERT model.

  2. brightmart/bert_language_understanding - Pre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN.

  3. algteam/bert-examples - BERT examples.

  4. JayYip/bert-multiple-gpu - A multiple GPU support version of BERT.

  5. HighCWu/keras-bert-tpu - Implementation of BERT that could load official pre-trained models for feature extraction and prediction on TPU.

  6. whqwill/seq2seq-keyphrase-bert - Add BERT to encoder part for https://github.com/memray/seq2seq-keyphrase-pytorch

  7. xu-song/bert_as_language_model - BERT as language model, a fork from Google official BERT implementation.

  8. Y1ran/NLP-BERT - Chinese version.

  9. yuanxiaosc/Deep_dynamic_word_representation - TensorFlow code and pre-trained models for deep dynamic word representation (DDWR). It combines the BERT model and ELMo’s deep context word representation.

  10. yangbisheng2009/cn-bert

  11. Willyoung2017/Bert_Attempt

  12. Pydataman/bert_examples - Some examples of BERT. run_classifier.py based on Google BERT for Kaggle Quora Insincere Questions Classification challenge. run_ner.py is based on the first season of the Ruijin Hospital AI contest and a NER written by BERT.

  13. guotong1988/BERT-chinese - Pre-training of deep bidirectional transformers for Chinese language understanding.

  14. zhongyunuestc/bert_multitask - Multi-task.

  15. Microsoft/AzureML-BERT - End-to-end walk through for fine-tuning BERT using Azure Machine Learning.

  16. bigboNed3/bert_serving - Export BERT model for serving.

  17. yoheikikuta/bert-japanese - BERT with SentencePiece for Japanese text.

  18. nickwalton/AIDungeon - AI Dungeon 2 is a completely AI-generated text adventure built with OpenAI’s largest 1.5B-parameter GPT-2 model. It’s a first-of-its-kind game that lets you enter, and will react to, any action you can imagine.

  19. turtlesoupy/this-word-does-not-exist - “This Word Does Not Exist” is a project that allows people to train a variant of GPT-2 that makes up words, definitions and examples from scratch. We’ve never seen fake text so real.

Tools

  1. jessevig/bertviz - Tool for visualizing attention in the Transformer model.

  2. FastBert - A simple deep learning library that allows developers and data scientists to train and deploy BERT based models for NLP tasks beginning with text classification. The work on FastBert is inspired by fast.ai.

  3. gpt2tc - A small program using the GPT-2 LM to complete and compress texts. It has no external dependency, requires no GPU and is quite fast. The smallest model (117M parameters) is provided. Larger models can be downloaded as well. (no waitlist, no sign up required).

Tasks

Named-Entity Recognition (NER)

Expand NER
  1. kyzhouhzau/BERT-NER - Use google BERT to do CoNLL-2003 NER.

  2. zhpmatrix/bert-sequence-tagging - Chinese sequence labeling.

  3. JamesGu14/BERT-NER-CLI - Bert NER command line tester with step by step setup guide.

  4. sberbank-ai/ner-bert

  5. mhcao916/NER_Based_on_BERT - This project is based on Google BERT model, which is a Chinese NER.

  6. macanv/BERT-BiLSMT-CRF-NER - TensorFlow solution of NER task using Bi-LSTM-CRF model with Google BERT fine-tuning.

  7. ProHiryu/bert-chinese-ner - Use the pre-trained language model BERT to do Chinese NER.

  8. FuYanzhe2/Name-Entity-Recognition - Lstm-CRF, Lattice-CRF, recent NER related papers.

  9. king-menin/ner-bert - NER task solution (BERT-Bi-LSTM-CRF) with Google BERT https://github.com/google-research.

Classification

Expand Classification
  1. brightmart/sentiment_analysis_fine_grain - Multi-label classification with BERT; Fine Grained Sentiment Analysis from AI challenger.

  2. zhpmatrix/Kaggle-Quora-Insincere-Questions-Classification - Kaggle baseline—fine-tuning BERT and tensor2tensor based Transformer encoder solution.

  3. maksna/bert-fine-tuning-for-chinese-multiclass-classification - Use Google pre-training model BERT to fine-tune for the Chinese multiclass classification.

  4. NLPScott/bert-Chinese-classification-task - BERT Chinese classification practice.

  5. fooSynaptic/BERT_classifer_trial - BERT trial for Chinese corpus classification.

  6. xiaopingzhong/bert-finetune-for-classfier - Fine-tuning the BERT model while building your own dataset for classification.

  7. Socialbird-AILab/BERT-Classification-Tutorial - Tutorial.

  8. malteos/pytorch-bert-document-classification - Enriching BERT with Knowledge Graph Embedding for Document Classification (PyTorch)

Text Generation

Expand Text Generation
  1. asyml/texar - Toolkit for Text Generation and Beyond. Texar is a general-purpose text generation toolkit that also implements BERT for classification, and supports text generation applications by combining it with Texar’s other modules.

  2. Plug and Play Language Models: a Simple Approach to Controlled Text Generation (PPLM) paper by Uber AI.

Question Answering (QA)

Expand QA
  1. matthew-z/R-net - R-net in PyTorch, with BERT and ELMo.

  2. vliu15/BERT - TensorFlow implementation of BERT for QA.

  3. benywon/ChineseBert - This is a Chinese BERT model specific for question answering.

  4. xzp27/BERT-for-Chinese-Question-Answering

  5. facebookresearch/SpanBERT - Question Answering on SQuAD; improving pre-training by representing and predicting spans.

Knowledge Graph

Expand Knowledge Graph
  1. sakuranew/BERT-AttributeExtraction - Using BERT for attribute extraction in knowledge graph. Fine-tuning and feature extraction. The BERT-based fine-tuning and feature extraction methods are used to extract knowledge attributes of Baidu Encyclopedia characters.

  2. lvjianxin/Knowledge-extraction - Chinese knowledge-based extraction. Baseline: bi-LSTM + CRF; upgrade: BERT pre-training.

License

Expand License

This repository contains a variety of content; some developed by Cedric Chee, and some from third-parties. The third-party content is distributed under the license provided by those parties.

I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer.

The content developed by Cedric Chee is distributed under the following license:

Code

The code in this repository, including all code samples in the notebooks listed above, is released under the MIT license. Read more at the Open Source Initiative.

Text

The text content of the book is released under the CC-BY-NC-ND license. Read more at Creative Commons.

Read More

2022-09-25
Introducing Lavis: A Library For Language-Vision Intelligence With Cross-Modal Implementations

LAVIS is a multimodal library for language-vision intelligence and information retrieval; it can be used together with Jina.

LAVIS ships multiple cross-modal implementations.

LAVIS - A Library for Language-Vision Intelligence

Read More

2022-09-17
Agi (Artificial General Intelligence) Related Projects

According to HTM theory, an AGI knows what it did to the world (self-awareness) and also receives signals from its sensors.

google research

Gwern wrote a piece of fiction. He thinks AGI starts from AutoML-Zero, which is similar to lazero and metalazero in name and perspective.

By design, lazero can be deeply aligned, inspecting and studying the user’s actions. It also has its own exploration space. However, these expectations can never be fully satisfied at the same time: if you want more power, you have to let go.

lucidrains repositories

This account hosts lots of state-of-the-art implementations of closed-source papers, and also repos for AGI. Stunning.

JEPA-pytorch (WIP): Yann LeCun’s version of how AGI will be built

PaLM: scaling language models with Pathways

side projects

make-a-video: text-to-video generation

nuwa: text-to-video generation

opencog

MOSES (supervised), for evolutionary program synthesis

repos on github

he4o

AIXIjs: general reinforcement learning in the browser (repo)

opennars

Brain Simulator II, on the Windows platform

DQfD: Learning from Demonstrations for Real World Reinforcement Learning (paper)

mit class on AGI

jiaxiaogang’s god-knows-what theory and training logs

awesome deep reinforcement learning (deep-rl)

awesome agicocosci: an exhaustive list of papers and repos for cognitive science and AGI

introduction and links on AGI

Read More