chatgpt

generative_question_answering
reinforcement_learning_language_models
openai
carperai
rallio67s_dataset
koboldai
opt
gpt_neo
sentence_transformers
chatgpt_deployment
marketing_customer_service
This article delves into the integration of generative question-answering and reinforcement learning in language models. It highlights projects such as OpenAI, CarperAI, Rallio67’s dataset, KoboldAI, OPT, GPT-Neo, Sentence Transformers, ChatGPT deployment, and various marketing/customer service applications.
Published

December 6, 2022


GPT-4 is out.


three mirror sites in mainland China:

https://chat.forchange.cn

https://aigcfun.com

https://ai.askai.top


besides decent processors, RAM and an optimized runtime, storing the model weights on SSDs helps load LLMs fast.


colossalai now supports chatgpt-style training on a single gpu, with open-source code

check humata for paper QA and information extraction/language understanding from PDF files


chatgpt’s responses are obviously formatted in markdown.

if chatgpt blocks us just because we are using the static ip of the corp’s wifi, we can get unblocked by connecting through our phone’s hotspot.

Microsoft’s EdgeGPT requires opening it in the Edge browser and joining the waitlist of the new Bing; a 3rd-party API is available here

Merlin is an extension based on ChatGPT which is available for free in all countries, with 11 free queries each day. Pro subscriptions are incoming.

Rallio67 builds datasets for RLHF and has released multiple chatgpt-like models on huggingface, namely joi, chip and rosey, all based on pythia or neox-20b. laion people tend to offload part of the weights to CPU in order to run these huge models properly.
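
a hedged sketch of that offloading trick, using huggingface accelerate’s big-model loading (the model id is real; the dtype and offload folder are my choices):

```python
# a minimal sketch, assuming transformers>=4.20 with accelerate installed.
# device_map="auto" spreads layers across GPU, CPU RAM and, if needed, disk.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    device_map="auto",          # put what fits on GPU, offload the rest to CPU
    offload_folder="offload",   # spill anything left over to disk
    torch_dtype=torch.float16,  # halve the memory footprint
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```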

KoboldAI treats OPT and GPT-Neo as generic LMs; special models like the NSFW ones may serve some purposes better.

many alternatives exist, but most are specialized in marketing and content generation; some are chatgpt replicas, like chatsonic (with google knowledge) and youchat (from you.com (awesome!))

open assistant now has a data collection website, where you can only perform the tasks given and earn points (working for free? nah?)

it is advised to run this chatgpt program via libraries instead of manually, to prevent issues.

my account has been banned for experimenting with chatgpt. though it is not going to be free forever, you need to moderate your input (multi-language: not only english but also chinese) with some api to prevent similar incidents. also, some topics outside the blacklist are banned intentionally, so you need to check whether the model is really producing an answer; if not, avoid the topic or change the way you ask.

moderation can be done via the official openai api, the perspective api (free), or via projects like content moderation deeplearning, bert text moderation, bert-base-uncased-hatexplain, toxic-bert, copilot-toxicity and multilingual-hate-speech-robacofi; train on datasets like hate_speech_offensive, toxicity (by surge-ai, a dataset-labelling workforce) and multilingual-hate-speech
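
for instance, a minimal pre-moderation gate with the official openai endpoint might look like this (openai-python 0.x API; the prompt is made up):

```python
# a minimal sketch using the official moderation endpoint; swap in
# perspective api or toxic-bert for a free or local alternative.
import openai

openai.api_key = "sk-..."  # your key here

def passes_moderation(text: str) -> bool:
    # the endpoint flags hate, violence, sexual content, self-harm, etc.
    result = openai.Moderation.create(input=text)
    return not result["results"][0]["flagged"]

prompt = "some multilingual user input, 中文也行"
if passes_moderation(prompt):
    print("safe to forward to the chat model")
```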

from my point of view, this is a service you cannot replicate at home: it either requires smaller models with a different architecture, or crowd-sourced computational power.

it is said that chatgpt is powered by ray, to increase parallelism.

bigscience petals colab and petals repo

discord chatroom for reproducing chatgpt

since many different models are derived from the same pretrained language model, opendelta can save disk space by freezing the main parameters and tuning only a few of them.
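
a sketch of that pattern with opendelta (class and method names as I recall them from its README; versions may differ):

```python
# a minimal sketch: inject LoRA modules into a backbone and freeze the rest,
# so only the small delta parameters get tuned and saved per task.
from transformers import AutoModelForCausalLM
from opendelta import LoraModel

backbone = AutoModelForCausalLM.from_pretrained("gpt2")
delta = LoraModel(backbone_model=backbone)
delta.freeze_module(exclude=["deltas"])  # main parameters stay frozen
delta.log()                              # report trainable vs frozen parameters
```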

this gpt seems really good. currently only api access.

but it is provided by openai which is no longer so “open” in the sense of “open-source”.

stability.ai provides alternative open-source implementations of SOTA AI algorithms; its ecosystem includes carper.ai, eleuther.ai, dreamstudio, harmonai (audio) and laion.ai (datasets and projects)

viable approaches to chatgpt

in my view, chatgpt is just specialized on chat, or socialized in other words.

the elo rating system is key to the facebook social network and to many zero-sum games. basically it is a relative rating system. to build such a rating system effectively, one should use it along with classifiers and embeddings.
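
the core of elo fits in a few lines; here is the standard update rule (the K-factor of 32 is just a common choice):

```python
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """score_a: 1.0 if A wins, 0.5 for a draw, 0.0 if A loses."""
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

print(elo_update(1500, 1600, 1.0))  # the underdog wins and gains more points
```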

from the training processes of instructgpt and webgpt, we know that gpt learns more by interacting with people (multi-turn QA), doing self-examination (learning a reward model) and performing actions (searching and quoting on the web).

RLHF

chainer, prompt engineering

awesome chatgpt prompts

langchain extends llms with advanced prompts, llm wrappers, actions, databases and memories
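
a tiny example of the langchain pattern (API as of late 2022; the prompt is mine):

```python
# a minimal sketch: wrap an llm with a prompt template and chain them together.
from langchain.llms import OpenAI
from langchain import PromptTemplate, LLMChain

prompt = PromptTemplate(
    input_variables=["question"],
    template="Answer in one sentence: {question}",
)
chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
print(chain.run("What does RLHF stand for?"))
```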

RL algorithms, tools for providing feedback

Awesome-RLHF paper and code about RLHF

openai baselines

stable-baselines 3

SetFit

Efficient few-shot learning with Sentence Transformers, used by FewShotRLGPT (no updates so far?)
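
roughly the pattern from the setfit README (the dataset and column mapping follow its examples):

```python
# a minimal sketch of few-shot classification with setfit: contrastive
# fine-tuning of a sentence-transformer plus a lightweight classification head.
from datasets import load_dataset
from setfit import SetFitModel, SetFitTrainer

train_ds = load_dataset("sst2", split="train").shuffle(seed=42).select(range(16))

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_ds,
    column_mapping={"sentence": "text", "label": "label"},
)
trainer.train()
print(model(["a gripping, beautifully made film"]))
```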

RLHF models

non-language models

image_to_text_rlhf

algorithm-distillation-rlhf

language models

chatrwkv pure rnn language model, with chinese support

lamda-rlhf-chatgpt

blenderbot2 is a bot which can search the internet (blenderbot3 is US-only). install ParlAI, then clone ParlAI_SearchEngine. tutorial

promptCLUE based on T5, created by clueai, trained on pCLUE

openassistant

openchatgpt-neox-125m trained on chatgpt prompts, fine-tuned from pythia; can be tested here

copycat, a chatgpt replica

medicine-chatgpt shit sick of COVID-19

baby-rlhf covers both cartpole and a language model

rlhf-shakespeare

textrl (100+ stars)

PaLM-RLHF claims RETRO will be integrated soon?

RL4LMs with multiple rl methods

minRLHF

webgpt-cli interfaces the openai api to browse the web and answer questions

lm-human-preferences by openai

rlhf-magic uses trlx (which supports GPT3-like models and offers PPO and ILQL as trainable methods)

trl only has PPO on GPT2
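
the early trl loop looks roughly like this (0.x API, later versions differ; the constant reward is a stand-in for a real reward model):

```python
# a minimal sketch of trl's PPO loop on gpt2.
import torch
from transformers import GPT2Tokenizer
from trl.gpt2 import GPT2HeadWithValueModel, respond_to_batch
from trl.ppo import PPOTrainer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2HeadWithValueModel.from_pretrained("gpt2")
ref_model = GPT2HeadWithValueModel.from_pretrained("gpt2")  # frozen KL anchor

ppo_trainer = PPOTrainer(model, ref_model, tokenizer, batch_size=1, forward_batch_size=1)

query = tokenizer.encode("This morning I went to the ", return_tensors="pt")
response = respond_to_batch(model, query)
reward = torch.tensor([1.0])  # a reward model would score the response here
stats = ppo_trainer.step([query[0]], [response[0]], reward)
```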

Tk-Instruct is T5 trained on the natural instructions dataset. is it trained with RLHF?

datasets

whisperhub collection of chatgpt prompts by plugin

hh-rlhf

instructgpt samples

natural instructions

dataset building tools

open-chatgpt-prompt-collective

crowd-kit purifies noisy crowd-sourced data

promptsource

reward models

rankgen scores model generations given a prefix (or prompt)

electra-webgpt-rm and electra-large-reward-model are based on the electra discriminator

GPT3-like models

galactica is opt trained on scientific data

bloomz and mt0 trained on xP3 (multilingual prompts and code)

T0PP is T0 optimized for zero-shot prompts, despite being much smaller than GPT-3

RETRO another model with GPT-3 capabilities with fewer parameters?

gpt3 is gpt2 with sparse attention, which enables it to generate long sequences
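
for intuition, one of those sparse patterns is a locally banded causal mask, sketched below (a toy mask, not openai’s kernels):

```python
# a toy "locally banded" causal mask: each token attends only to the previous
# `window` tokens; gpt-3 alternates patterns like this with dense layers.
import torch

def local_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    return (j <= i) & (i - j < window)      # causal and within the band

print(local_causal_mask(6, 3).int())
```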

Diffusion-LM

PaLM

metaseq provides OPT, which is basically GPT3

GPT-JT is altered in many ways and trained on natural instructions; huggingface space

GPT-Neo

GPT-J

GPT-NeoX

Bloom large language model by bigscience

autonomous learning

autonomous-learning-library doc and repo

Gu-X doing god-knows-what experiments

analysis of how to make such a model

gpt3 is capable of imitation (because it is trained unsupervised.)

but! if you want to get things done (when you really need it!), you’d better want some aligned AI.

two similar models by openai: webgpt and instructgpt

about instructgpt

it is first fine-tuned on supervised datasets, then a reward model is trained, and then the reward model scores responses to prompts during reinforcement learning with PPO.
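
in pseudocode, the three stages look like this (every helper here is a stub I made up, just to fix the shape of the pipeline):

```python
# a schematic of the three instructgpt stages; not a real training loop.

def finetune(model, dataset, objective="lm"):
    ...  # supervised training, omitted

def ppo_update(policy, prompt, answer, reward):
    ...  # one PPO step against a frozen reference policy, omitted

def train_instructgpt(pretrained_lm, demos, comparisons, prompts):
    sft = finetune(pretrained_lm, demos)                         # 1) SFT on human demonstrations
    rm = finetune(pretrained_lm, comparisons, objective="rank")  # 2) reward model on human rankings
    policy = sft                                                 # 3) PPO from the SFT policy
    for prompt in prompts:
        answer = policy.generate(prompt)
        reward = rm.score(prompt, answer)
        policy = ppo_update(policy, prompt, answer, reward)
    return policy
```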

details on webgpt environment

my guess: create states by performing actions, then generate templates that the model fills in.

Our text-based web-browsing environment is written mostly in Python with some JavaScript. For a high-level overview, see Section 2. Further details are as follows:

• When a search is performed, we send the query to the Microsoft Bing Web Search API, and convert this to a simplified web page of results.

• When a link to a new page is clicked, we call a Node.js script that fetches the HTML of the web page and simplifies it using Mozilla’s Readability.js.

• We remove any search results or links to reddit.com or quora.com, to prevent the model copying answers from those sites.

• We take the simplified HTML and convert links to the special format 【<link ID>†<link text>†<destination domain>】, or 【<link ID>†<link text>】 if the destination and source domains are the same. Here, the link ID is the index of the link on the page, which is also used for the link-clicking command. We use special characters such as 【 and 】 because they are rare and encoded in the same few ways by the tokenizer, and if they appear in the page text then we replace them by similar alternatives.

• We convert superscripts and subscripts to text using ^ and _, and convert images to the special format [Image: <alt text>], or [Image] if there is no alt text.

• We convert the remaining HTML to text using html2text.

• For text-based content types other than HTML, we use the raw text. For PDFs, we convert them to text using pdfminer.six. For all other content types, and for errors and timeouts, we use an error message.

• We censor any pages that contain a 10-gram overlap with the question (or reference answer, if provided) to prevent the model from cheating, and use an error message instead.

• We convert the title of the page to text using the format <page title> (<page domain>). For search results pages, we use Search results for: <query>.

• When a find in page or quote action is performed, we compare the text from the command against the page text with any links stripped (i.e., including only the text from each link). We also ignore case. For quoting, we also ignore whitespace, and allow the abbreviated format <start text>━<end text> to save tokens.

• During browsing, the state of the browser is converted to text as shown in Figure 1(b). For the answering phase (the last step of the episode), we convert the question to text using the format <question>■, and follow this by each of the collected quotes in the format [<quote number>] <quote page title> (<quote page domain>) <double new line><quote extract>■.
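
a toy reconstruction of that link format (my own helper, not openai’s code):

```python
# format links as 【<link ID>†<link text>†<destination domain>】, dropping the
# domain when destination and source match, per the appendix quoted above.
def format_link(link_id: int, text: str, dest: str, src: str) -> str:
    if dest == src:
        return f"【{link_id}†{text}】"
    return f"【{link_id}†{text}†{dest}】"

print(format_link(3, "Elo rating system", "en.wikipedia.org", "www.bing.com"))
```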

access via api

https://github.com/altryne/chatGPT-telegram-bot

https://github.com/taranjeet/chatgpt-api

https://github.com/acheong08/ChatGPT

https://github.com/vincelwt/chatgpt-mac

https://github.com/transitive-bullshit/chatgpt-api

https://github.com/rawandahmad698/PyChatGPT

models like chatgpt

lfqa retrieval-based generative QA

lm-human-preferences by openai

trl Train transformer language models with reinforcement learning based on gpt2

trlx A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) by CarperAI

RL4LMs A modular RL library to fine-tune language models to human preferences

PaLM-rlhf-pytorch, said to be basically chatgpt with palm

gpt-gmlp: this design reportedly integrates gpt with gMLPs, so it uses less ram and can be trained on a single gpu

WebGPT

tk-instruct (with all models by allenai) can be multilingual; trained on natural instructions

there’s a ghosted repo named instructgpt-pytorch findable on bing but with no cache preserved; also an empty repo called InstructFNet, wtf?

AidMe Code and experiments for the article AidMe: User-in-the-loop Adaptative Intent Detection for Instructable Digital Assistant

cheese Used for adaptive human in the loop evaluation of language and embedding models.

Kelpie Explainable AI framework for interpreting Link Predictions on Knowledge Graphs

GrIPS Gradient-free, Edit-based Instruction Search for Prompting Large Language Models

squeakily nlp datasets cleaner

gpt-j

super big bilingual model GLM-130B

multi-modal deeplearning paper collections

bloom a huge model like gpt-3

note that gpt-2 is inferior to gpt-3 since it has fewer parameters

dialogue-generation Generating responses with pretrained XLNet and GPT-2 in PyTorch.

personaGPT Implementation of PersonaGPT Dialog Model

DialoGPT Large-scale pretraining for dialogue