Autonomous Machines & Society.

2022-12-07
human-in-the-loop AI training and models

github topic includes dalle-flow, argilla, refinery and more

human-in-the-loop-learning

mariusmcl, who has a ghosted (removed) repo called instructgpt-pytorch, seems to work on instruction-following AI, including decision transformer

CarperAI has many related repos, including trlx (distributed training of language models with Reinforcement Learning via Human Feedback)

Read More

2022-12-07
RL, Trajectory Prediction, Model Predictive Control

this reminds me of ddpg-usv-asmc and Deep-Reinforcement-Learning-Algorithms-with-PyTorch (or is it? no, it is stable-baselines3, which contains PPO, the algorithm preferred by OpenAI when training InstructGPT) or Deep-reinforcement-learning-with-pytorch

mpc.torch

awesome-deep-rl For deep RL and the future of AI.

Read More

2022-12-07
Useful Sources On Cyber Attack

learning resource and bug bounty

https://www.hacker101.com

https://www.hackerone.com

https://www.hacker101.com/resources

open source virus/malware in your arsenal

advanced powershell obfuscator, claims to bypass any av

post-exploit framework, evasion

https://github.com/PowerShellMafia/PowerSploit

https://github.com/cobbr/SharpSploit

https://github.com/EmpireProject/Empire

thefatrat is an exploiting tool which compiles malware with a well-known payload; the compiled malware can then be executed on Linux, Windows, Mac and Android. TheFatRat provides an easy way to create backdoors and payloads which can bypass most anti-virus software. the author has some other tools to share.

pupy is an opensource, cross-platform (Windows, Linux, OSX, Android) C2 and post-exploitation framework written in python and C

venom - C2 shellcode generator/compiler/handler

virus samples

the malware repo

open source virus

thezoo A repository of LIVE malwares for your own joy and pleasure. theZoo is a project created to make the possibility of malware analysis open and available to the public.

malwares codebase, botnet

open source malware on github, repo list

virus for win10

kafan virus samples

vbgood

debugman reverse engineering


official blackhat arsenal under toolswatch category arsenal

massive hacking tools collection

burpa burp suite automation tool

twitter token generator register twitter in batch, has a large proxy list

i0gan some hacker with automated tools like awd_script

ichunqiu ctf educational resources

cyberchef online ctf interactive tools suite

bugku tools

ctftools curated online tool list

ctf online tools

kanxue home page, articles

52pojie hack tools

kanxue knowledge base

ctfshow

ctfhub tools

渗透师导航 (a navigation site for penetration testers)

resources recommended by ctfwiki

shellcode storm database can be queried via api

exploitdb find exploits, poc code, google hacking database for finding juicy information/urls, shellcodes with an advanced search interface

cracking.org

OSINT: open source (public source) intelligence is the practice of collecting information from published or otherwise publicly available sources

osint tools:

Maltego
Google dorks
Mitaka
SpiderFoot
Spyse
BuiltWith
Intelligence X
DarkSearch.io
Grep.app
Recon-ng
theHarvester
Shodan
Metagoofil
Searchcode
Babel X

Read More

2022-12-07
Some Notes Just Like Me: Today I Learned (TIL)

repo here: til

Read More

2022-12-07
Cyber Grand Challenge DARPA Machine Automated Cyber Attack

ctfwiki’s intro on CGC

analyze source code first, then plan attack or fix code

cgc’s github repo and website

search for darpa cgc on github

cyber-challenge Some toy examples to demonstrate ideas that could be used in DARPA’s Cyber Grand Challenge, including modifying java bytecode and filtering out html requests on the fly

EVIL (Exploiting software VIa natural Language) is an approach to automatically generate software exploits in assembly/Python language from descriptions in natural language. The approach leverages Neural Machine Translation (NMT) techniques and a dataset that we developed for this work.

Topics: linux, exploit, encoder, assembly, decoder, dataset, seq2seq, shellcode, nmt, software-exploitation, codebert

License: GPL-3.0

Contributors: Pietro Liguori (@piliguori), Erfan Al-Hossami (@taisazero)

Languages: Python 97.6%, Shell 2.0%, Other 0.4%

Read More

2022-12-07
Calling Android methods directly: using Frida to call methods inside a binary

Read More

2022-12-07
Tools From Breachforums

  1. Invicti

Invicti is a web application security scanner hacking tool to find SQL Injection, XSS, and vulnerabilities in web applications or services automatically.

  2. Fortify WebInspect

It is used to identify security vulnerabilities by allowing it to test the dynamic behavior of running web applications.

  3. Cain & Abel

It is used to recover MS Access passwords

  4. Nmap (Network Mapper)

Used for port scanning, one of the phases of ethical hacking; described as the finest hacking software ever.

  5. Nessus

Nessus is the world’s most well-known vulnerability scanner, designed by Tenable Network Security. It is free and is chiefly recommended for non-enterprise usage.

  6. Nikto

Checks web servers and identifies over 6400 CGIs or files that are potentially dangerous

  7. Kismet

Kismet is basically a sniffer and wireless-network detector that works with other wireless cards and supports raw-monitoring mode.

  8. NetStumbler

Identifying AP (Access Point) network configuration

  9. Acunetix

Integration of scanner results into other platforms and tools

  10. Netsparker

Uniquely verifies identified vulnerabilities, showing that they are genuine, not false positives

  11. Intruder

Integrates with Slack, Jira, and major cloud providers

  12. Nmap

Contains a data transfer, redirection, and debugging tool

  13. Metasploit

Ideal for finding security vulnerabilities

  14. Aircrack-Ng

It can crack WEP keys and WPA2-PSK, and check Wi-Fi cards

  15. Wireshark

Allows coloring rules to be applied to packet lists to facilitate analysis

  16. OpenVAS

OpenVAS has the capabilities of various high and low-level Internet and industrial protocols, backed up by a robust internal programming language.

  17. SQLMap

Supports executing arbitrary commands

  18. Ettercap

Live connections sniffer

  19. Maltego

Performs real-time information gathering and data mining

  20. Burp Suite

Uses out-of-band techniques

  21. John the Ripper

Tests different encrypted passwords

  22. Angry IP Scanner

This is a free tool for scanning IP addresses and ports

  23. SolarWinds Security Event Manager

Recognized as one of the best SIEM tools, helping you easily manage memory stick storage

  24. Traceroute NG

Detects path changes and alerts you about them

  25. LiveAction

Its packet intelligence provides deep analyses

  26. QualysGuard

Responds to real-time threats

  27. WebInspect

Tests dynamic behavior of web applications for the purpose of spotting security vulnerabilities

  28. Hashcat

Supports distributed cracking networks

  29. L0phtCrack

Fixes weak password issues by forcing a password reset or locking out accounts

  30. Rainbow Crack

  31. IKECrack

IKECrack is an authentication cracking tool with the bonus of being open source.

  32. Sboxr

Checks for over two dozen types of web vulnerabilities

  33. Medusa

One of the best tools for thread-based parallel testing and brute-force testing

  34. Cain and Abel

Uncovers password fields, sniffs networks, recovers MS Access passwords, and cracks encrypted passwords using brute-force, dictionary, and cryptanalysis attacks.

  35. Zenmap

Administrators can track new hosts or services that appear on their networks and track existing downed services

Read More

2022-12-06
Mirror Sites Change

if a mirror only blocks a range of IPs, you can use a proxy to get around the restriction.

some mirror sites serve us poorly and block our access. we point them out, list alternatives and provide quick fixes.

these actions are done intentionally against a specific group of people, and they do block a whole range of IPs.

actors:

https://mirrors.aliyun.com
https://mirrors.tuna.tsinghua.edu.cn/

fixes:

currently we use some tunnel accounts we picked up earlier, provided by topsap. this may fix the problem.

python pip:

pip3 config set global.index-url https://mirrors.ustc.edu.cn/pypi/web/simple

taobao npm mirror:

http://npm.taobao.org => http://npmmirror.com
http://registry.npm.taobao.org => http://registry.npmmirror.com
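the host mapping above can be applied mechanically, for example to urls found in an old `.npmrc` or lockfile. a minimal sketch, assuming only the two hosts listed above need rewriting; the function name `migrate_npm_url` is made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit

# Old host -> new host, taken from the mapping in the note above.
TAOBAO_TO_NPMMIRROR = {
    "npm.taobao.org": "npmmirror.com",
    "registry.npm.taobao.org": "registry.npmmirror.com",
}


def migrate_npm_url(url: str) -> str:
    """Rewrite a taobao npm mirror URL to its npmmirror equivalent.

    URLs on any other host are returned unchanged.
    """
    parts = urlsplit(url)
    new_host = TAOBAO_TO_NPMMIRROR.get(parts.netloc)
    if new_host is None:
        return url
    # Keep scheme, path, query and fragment; swap only the host.
    return urlunsplit(parts._replace(netloc=new_host))
```

scheme, path and query are left alone, so a package tarball url keeps pointing at the same file on the new host.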

Read More

2022-12-06
SEO Search Engine Optimization Semrush Alternative

semrush contains multiple services, and it is paid. many online tools are paid as well. to find open source alternatives (usually this can’t be achieved with a single tool alone, since the work spans from scraping to analyzing), let’s figure out what this tool does, along with a few tech terms.

semrush does SEO, SEM, and SMM.

put social media buttons on webpages to let users share the content, usually by passing parameters in url, which is part of SMM.

tools

keyword mining (by search engine or more): 2 words -> 3 words -> 4 words -> 5 words (recursive)

keyword-suggest-tool is a simple tool that provides keyword suggestions from multiple search engines like google, bing, yahoo, ebay and amazon, deployed on sutlej.net/seo-tools

ULTRA Unbiased Learning To Rank Algorithms, sorting things out, find what users like the most

serpbear check rankings on google

curated seo tools huge tools/website collection on seo category

awesome-keyword-finder-tools A curated list of amazingly awesome seo keyword finder tools

Keyword-Research-tool-python Build a Keyword research tool with google autocomplete suggestions in python

keyword tool The Keyword Manager is a tool to support SEAs and SEOs finding new keywords from a website.

keyword_tool Web app to extract keywords from pasted text. Built with NLTK and Streamlit.

keywordshitter2 A website to find long-tail keywords using search suggestions, still works on here

PURR (PUppeteer RunneR) is a devops-friendly tool for browser testing and monitoring by semrush

awesome-local-seo A curated list of amazingly awesome local seo resources.

seo-audits-toolkit SEO & Security Audit for Websites. Lighthouse & Security Headers crawler, Sitemap/Keywords/Images Extractor, Summarizer, etc …

seo_keyword_research_tools The Keyword Volume Tool uses the Google Adwords API Targeting Ideas Service to return the search volume and competition of a massive list of keywords. The Keyword Expansion Tool uses the Google Adwords API Targeting Ideas Service to expand an input keyword into up to 500 related keywords with search volume.
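the recursive keyword mining idea noted at the top of this list (2 words -> 3 words -> 4 words -> 5 words) can be sketched as a breadth-first expansion. this is only a sketch: `suggest` is a hypothetical callable standing in for whatever suggestion source you use (a search engine autocomplete, one of the tools above), and `mine_keywords` is a name made up for illustration.

```python
from typing import Callable, Iterable, List


def mine_keywords(
    seed: str,
    suggest: Callable[[str], Iterable[str]],
    max_words: int = 5,
) -> List[str]:
    """Expand a seed phrase level by level.

    At each level only suggestions exactly one word longer than their
    parent phrase are kept, and expansion stops at `max_words` words.
    """
    found: List[str] = []
    seen = {seed}
    frontier = [seed]
    while frontier:
        next_frontier = []
        for phrase in frontier:
            n_words = len(phrase.split())
            if n_words >= max_words:
                continue  # this phrase is already at the length limit
            for candidate in suggest(phrase):
                if candidate in seen or len(candidate.split()) != n_words + 1:
                    continue
                seen.add(candidate)
                found.append(candidate)
                next_frontier.append(candidate)
        frontier = next_frontier
    return found
```

passing the suggestion source in as a function keeps the recursion logic independent of any particular search engine or api.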

Resources

functionalities

Competitive analysis

Keyword research

Backlink research

Content research/Content optimization/Content planning

Rank Tracker

Site audit tool/Site explorer

Link analysis/Link profile

Domain comparison

competitor research

SEO Metrics

Google Data studio

glossaries

Local SEO: the practice of optimizing your website for a specific local area

SERP: search engine results page; tracking it means scraping search engines to get rankings.

SEO: search engine optimization, means to cheat the search engine to get higher rankings.

SEM: search engine marketing, pay ads to search engine, or advertise on your own search engine?

SMM: social media marketing, play nice with the public

SMO: social media optimization, attract users on platform

Read More

2022-12-06
chatgpt

GPT4 is out.


three mirror sites in China

https://chat.forchange.cn

https://aigcfun.com

https://ai.askai.top


besides decent processors, RAM and an optimized runtime, in order to load LLMs fast, one would store the model weights on SSDs.


now colossalai supports chatgpt training with a single gpu, using open-source code

check humata for paper QA and information extraction/language understanding from PDF files


the syntax of chatgpt’s response is obviously markdown.

chatgpt may block us just because we are using the static ip of the corp’s wifi; to get unblocked, we can connect through our phone’s hotspot.

Microsoft’s EdgeGPT needs you to open in Edge browser and join the waitlist of new Bing, having 3rd party API here

Merlin is an extension based on ChatGPT which is available for free in all countries, with 11 free queries each day. Pro subscriptions incoming.

Rallio67 builds dataset for RLHF and has released multiple chatgpt-like models on huggingface. namely, joi, chip and rosey, all based on pythia or neox-20b. laion people tend to share loads to CPU in order to run these huge models properly.

KoboldAI considered OPT and GPT-Neo as generic LMs. special models like NSFW shits may serve some purposes better.

many alternatives, but many are specialized in marketing and content generation, some are chatgpt replica, like chatsonic (with google knowledge) and youchat (from you.com (awesome!))

open assistant now has a data collection website, in which you can only perform tasks given and earn points (working for free? nah?)

it is advised to run this chatgpt program with libraries instead of manually, to prevent issues.

my account has been banned from trying chatgpt. though it is not going to be free forever, you need to moderate your input (multi-language support, not only english but chinese) using some api to prevent similar incidents. also some topics outside of blacklist are banned intentionally so you need to check if the model is really producing the answer. if not you should avoid or change the way of asking it.

moderation can be done via the official openai api, perspective api (free), or via projects like content moderation deeplearning, bert text moderation, bert-base-uncased-hatexplain, toxic-bert, copilot-toxicity and multilingual-hate-speech-robacofi, trained on datasets like hate_speech_offensive, toxicity (by surge-ai, a dataset labelling workforce) and multilingual-hate-speech

from my point of view, this is a service you cannot replicate at home, either requires smaller models with different architecture, or requires crowd-sourced computational power.

reportedly chatgpt is powered by ray, which increases parallelism.

bigscience petals colab and petals repo

discord chatroom for reproducing chatgpt

since many different models are derived from the original pretrained language model, opendelta can save disk space by freezing main parameters, only tuning few of them.

this gpt seems really good. currently only api access.

but it is provided by openai which is no longer so “open” in the sense of “open-source”.

stability.ai is providing alternative open-source implementations of SOTA AI algorithms, which includes carper.ai, eleuther.ai, dreamstudio, harmonai (audio), laion.ai (datasets and projects)

viable approaches to chatgpt

from my point of view, chatgpt is just specialized in chat, or socialized in other words.

the elo rating system is key to the facebook social network and to many zero-sum games. basically it is some revolution rating system. to build such a rating system effectively, one should use it along with classifiers and embeddings.
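for reference, the standard elo update for one pairwise comparison can be sketched as below. the scale of 400 and the default `k=32` are conventional chess values, used here as assumptions; nothing in these notes fixes them, and the function names are illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability-like expected score of A against B under Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a draw.
    The update is zero-sum: whatever A gains, B loses.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta
```

the same update works for any pair of items compared by a person (two players, two posts, two model responses), which is why it fits preference-style feedback.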

according to the training process of instructgpt and webgpt, we know that gpt has learned more by interacting with people (multiple QA), doing self-examination (learning a reward model) and performing actions (searching and quoting on web).

RLHF

chainer, prompt engineering

awesome chatgpt prompts

langchain extending llm by advanced prompts, llm wrappers actions, databases and memories

RL algorithms, tools for providing feedback

Awesome-RLHF paper and code about RLHF

openai baselines

stable-baselines 3

SetFit

Efficient few-shot learning with Sentence Transformers, used by FewShotRLGPT (no updates till now?)

RLHF models

non-language models

image_to_text_rlhf

algorithm-distillation-rlhf

language models

chatrwkv pure rnn language model, with chinese support

lamda-rlhf-chatgpt

blenderbot2 a bot which can search internet, blenderbot3 is US only. install ParlAI then clone ParlAI_SearchEngine. tutorial

promptCLUE based on T5, created by clueai, trained on pCLUE

openassistant

openchatgpt-neox-125m trained on chatgpt prompts, can be tested here, trained from pythia

copycat chatgpt replicate

medicine-chatgpt shit sick of COVID-19

baby-rlhf both cartpole and language model

rlhf-shapespeare

textrl 100+stars

PaLM-RLHF claims RETRO will be integrated soon?

RL4LMs with multiple rl methods

minRLHF

webgpt-cli interface openai api to browse web and answer questions

lm-human-preferences by openai

rlhf-magic using trlx (supports GPT3-like models) which has PPO and ILQL (as trainable model)

trl only has PPO on GPT2

Tk-Instruct T5 trained on natural instruct dataset. is it trained on RLHF systems?

datasets

whisperhub collection of chatgpt prompts by plugin

hh-rlhf

instructgpt samples

natural instructions

dataset building tools

open-chatgpt-prompt-collective

crowd-kit purify noisy data

promptsource

reward models

rankgen scores model generations given a prefix (or prompt)

electra-webgpt-rm and electra-large-reward-model is based on electra discriminator

GPT3-like models

galactica is opt trained on scientific data

bloomz and mt0 trained on xP3 (multilingual prompts and code)

T0PP T0 optimized for zero-shot prompts, despite being much smaller than GPT-3

RETRO another model with GPT-3 capabilities with fewer parameters?

gpt3 is gpt2 with sparse attention, which enables it to generate longer sequences

Diffusion-LM

PaLM

metaseq provides OPT, which is basically GPT3

GPT-JT altered in many ways, trained on natural instructions huggingface space

GPT-Neo

GPT-J

GPT-NeoX

Bloom large language model by bigscience

autonomous learning

autonomous-learning-library doc and repo

Gu-X doing god-knows-what experiments

analysis about how to make such model

gpt3 is capable of imitation (because it is trained without supervision).

but if you want to get things done (when you really need it!), you had better use an aligned AI.

two similar models by openai: webgpt and instructgpt

about instructgpt

it is first fine-tuned on supervised datasets; then a reward model is trained; then the reward model scores the responses to prompts, and that score drives reinforcement learning with PPO.
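the reward model step is the least obvious of the three. as far as i understand the instructgpt setup, the reward model is trained on pairs of responses ranked by people, with a pairwise loss of the form -log(sigmoid(r_chosen - r_rejected)). a minimal sketch on plain floats; in practice the two rewards would come from a neural network, and the function name here is made up:

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)), computed in a stable way.

    The loss is small when the reward model scores the human-preferred
    response higher than the rejected one, and large otherwise.
    """
    d = r_chosen - r_rejected
    if d >= 0:
        # -log(sigmoid(d)) == log(1 + exp(-d))
        return math.log1p(math.exp(-d))
    # For negative d, rewrite as -d + log(1 + exp(d)) to avoid overflow.
    return -d + math.log1p(math.exp(d))
```

only the difference between the two rewards matters, so the reward scale is free to drift; the ppo stage then uses the learned reward as its training signal.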

details on webgpt environment

guess: create states by performing actions, then generate templates to allow model filling blanks.

Our text-based web-browsing environment is written mostly in Python with some JavaScript. For a
high-level overview, see Section 2. Further details are as follows:
• When a search is performed, we send the query to the Microsoft Bing Web Search API, and
convert this to a simplified web page of results.
• When a link to a new page is clicked, we call a Node.js script that fetches the HTML of the
web page and simplifies it using Mozilla’s Readability.js.
• We remove any search results or links to reddit.com or quora.com, to prevent the model
copying answers from those sites.
• We take the simplified HTML and convert links to the special format
【<link ID>†<link text>†<destination domain>】, or
【<link ID>†<link text>】 if the destination and source domains are the same. Here,
the link ID is the index of the link on the page, which is also used for the link-clicking
command. We use special characters such as 【 and 】 because they are rare and encoded
in the same few ways by the tokenizer, and if they appear in the page text then we replace
them by similar alternatives.
• We convert superscripts and subscripts to text using ^ and _, and convert images to the
special format [Image: <alt text>], or [Image] if there is no alt text.
• We convert the remaining HTML to text using html2text.
• For text-based content types other than HTML, we use the raw text. For PDFs, we convert
them to text using pdfminer.six. For all other content types, and for errors and timeouts, we
use an error message.
• We censor any pages that contain a 10-gram overlap with the question (or reference answer,
if provided) to prevent the model from cheating, and use an error message instead.
• We convert the title of the page to text using the format <page title> (<page domain>).
For search results pages, we use Search results for: <query>.
• When a find in page or quote action is performed, we compare the text from the command
against the page text with any links stripped (i.e., including only the text from each link).
We also ignore case. For quoting, we also ignore whitespace, and allow the abbreviated
format <start text>━<end text> to save tokens.
• During browsing, the state of the browser is converted to text as shown in Figure 1(b).
For the answering phase (the last step of the episode), we convert the question to
text using the format <question>■, and follow this by each of the collected quotes
in the format [<quote number>] <quote page title> (<quote page domain>)
<double new line><quote extract>■.
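the link format and the site filter from the quoted description can be sketched in a few lines. this is not the webgpt code, just an illustration under assumptions: the replacement characters `[` and `]` are my choice (the text only says "similar alternatives"), the domain is taken as the url's netloc, and the helper names are made up.

```python
from urllib.parse import urlsplit

# Sites the quoted description says are removed from results and links.
BLOCKED_DOMAINS = ("reddit.com", "quora.com")


def is_blocked(url: str) -> bool:
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = urlsplit(url).netloc.lower()
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def format_link(link_id: int, text: str, href: str, page_url: str) -> str:
    """Render a link in the special bracket format described above.

    The destination domain is included only when it differs from the
    domain of the page the link appears on.
    """
    # Assumption: swap the special brackets for plain ones if they
    # appear inside the link text.
    text = text.replace("【", "[").replace("】", "]")
    dest = urlsplit(href).netloc
    source = urlsplit(page_url).netloc
    if dest and dest != source:
        return f"【{link_id}†{text}†{dest}】"
    # Relative links have no netloc and count as same-domain.
    return f"【{link_id}†{text}】"
```

the link id doubles as the argument of the link-clicking command, so the model can refer to a link by a short number instead of repeating its url.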

voice assistants

voice assistant in cpp

ChatWaifu with anime voice, ChatWaifu with live2d

hacking

give longterm memory and external resources to gpt3

write backend logic with gpt

hackgpt exploit vulnerabilities

vulchatgpt ida plugin for reverse engineering

chatgpt-universe things related to chatgpt

galgame using chatgpt

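notes

12.27: updated with a more streamlined app

deploying to a server is strongly recommended

huggingface reference: https://huggingface.co/spaces/Mahiruoshi/Lovelive-Nijigasaku-Chat-iSTFT-GPT3

GitHub: https://github.com/Paraworks/vits_with_chatgpt-gpt3

download address: https://drive.google.com/drive/folders/1vtootVMQ7wTOQwd15nJe6akzJUYNOw4d?usp=share_link

you can first try deploying on a server; after that, you can extract the archive into a folder and run the exe directly (on mac and android you need to compile it yourself with renpy)

get an api-key at https://beta.openai.com/account/api-keys

just type in the parameters as shown

character ids are usually numbers starting from 0; my model goes up to 12

api deployment: put inference_api.py into your vits directory, then edit the paths to config and checkpoint.pth inside the file. it is much simpler than the app, and you can design your own. warning: this is messy code written by someone with three months of coding experience

——————————————————————————————————————————————————

the chatgpt deployment method was updated on 12.26 (see the latter part of the video)

vits reference: https://github.com/CjangCjengh/vits

for the server side, ISTFT VITS is recommended: https://github.com/innnky/MB-iSTFT-VITS

model library: https://github.com/CjangCjengh/TTSModels

you can also use mine: https://huggingface.co/spaces/Mahiruoshi/MIT-VITS-Nijigaku

chatgpt reference: https://github.com/rawandahmad698/PyChatGPT

demo video (pure server api, gpt3): https://www.bilibili.com/video/BV1hP4y1B7wH/?spm_id_from=333.999.0.0&vd_source=7e8cf9f5c840ec4789ccb5657b2f0512

Honoka's voice comes from the μ's model by @Freeze_Phoenix

gpt3 loading reference: @ぶらぶら散策中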

chatgpt use cases curated list

DAILA use chatgpt

to identify function calls in decompiler

awesome transformer language models a huge collection on transformer based LMs, huge models by megacorps, with some introduction and analogy on chatgpt

huggingface blog on RLHF containing similar projects and source code

bilibili sends me lots of videos (and articles) on hacking and ai (including chatgpt) via its android app. recommend you to scrape this source and collect transcription and screenshots for searching and content generation.

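bilibili has videos on antivirus evasion (bypassing antivirus software)

chatgpt principles explained

connecting chatgpt to search engines

download links:

github: https://github.com/josStorer/chat-gpt-search-engine-extension/releases/

baidu netdisk: https://pan.baidu.com/s/1MnFJTDIatyIIPr5kUMWsAw?pwd=1111

extraction code: 1111

original project: https://github.com/wong2/chat-gpt-google-extension

my fork, a version that adds support for multiple search engines: https://github.com/josStorer/chat-gpt-search-engine-extension

PR: https://github.com/wong2/chat-gpt-google-extension/pull/31

fixed the earlier issue where baidu required a manual refresh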

access via api

https://github.com/altryne/chatGPT-telegram-bot

https://github.com/taranjeet/chatgpt-api

https://github.com/acheong08/ChatGPT

https://github.com/vincelwt/chatgpt-mac

https://github.com/transitive-bullshit/chatgpt-api

https://github.com/rawandahmad698/PyChatGPT

models like chatgpt

lfqa retrieval based generative QA

lm-human-preferences by openai

trl Train transformer language models with reinforcement learning based on gpt2

trlx A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) by CarperAI

RL4LMs A modular RL library to fine-tune language models to human preferences

PaLM-rlhf-pytorch claims to be basically chatgpt with palm

gpt-gmlp claims that integrating gpt with gmlps uses less ram, so it can be trained on a single gpu

WebGPT

tk-instruct with all models by allenai can be multilingual, trained on natural instructions

there’s a ghosted repo named instructgpt-pytorch found in bing but no cache preserved, also an empty repo called InstructFNet wtf?

AidMe Code and experiments for the article AidMe: User-in-the-loop Adaptative Intent Detection for Instructable Digital Assistant

cheese Used for adaptive human in the loop evaluation of language and embedding models.

Kelpie Explainable AI framework for interpreting Link Predictions on Knowledge Graphs

GrIPS Gradient-free, Edit-based Instruction Search for Prompting Large Language Models

squeakily nlp datasets cleaner

gpt-j

super big bilingual model GLM-130B

multi-modal deeplearning paper collections

bloom a huge model like gpt-3

note that gpt-2 is somewhat inferior to gpt-3, since it has far fewer parameters

dialogue-generation Generating responses with pretrained XLNet and GPT-2 in PyTorch.

personaGPT Implementation of PersonaGPT Dialog Model

DialoGPT Large-scale pretraining for dialogue

Read More