schedule

offline backup

schedule a backup of pyjom on alpharetta to disk every 12 hours
set a reminder to run a Time Machine backup once a week (next scheduled backup: Thu Aug 18)

online backup

send a system-wide notification when the aliyun disk token expires, including instructions for reacquiring it
schedule a backup of pyjom on alpharetta to cloud disks every 12 hours

pyjom dev schedules

meme stunts (整活)

"emergency scam food" (how to mix Paimon with Rick Astley?)

recommendation

use txtai to do NLU and recommend things to people

topic discovery/acquiring

personal/customized topics

tencent qq customized (can associate with mail)
wechat customized
bilibili per user customized

dog/cat video generation

make render engine runnable

issues:

video length too long (10 mins)

cause: a speed calculation error.

bgm somehow not in sync (bpm/clip ranges too broad?)
fix: analyze the peaks (abrupt changes) in the bgm and pick the louder ones, using pyloudnorm to measure audio loudness

pip3 install pyloudnorm

import soundfile as sf
import pyloudnorm as pyln
data, rate = sf.read("0055014.wav") # load audio (with shape (samples, channels))
print(data.shape)
meter = pyln.Meter(rate) # create BS.1770 meter
loudness = meter.integrated_loudness(data) # measure loudness
print(loudness)

place video cuts on the loudest points; detect abrupt changes with talib, or simply take the derivative and apply a gaussian average
video too repetitive (corpus too small?)
subtitles are not removed and the active region is not cropped (reviewer's resources not used? I would rather do it directly in the producer, since that needs less computation)
no minimum motion threshold (reviewer's fault? this should also be done in the producer)
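
A minimal sketch of the "derivative plus gaussian average" idea for the bgm sync issue. It is not the pyjom implementation: it swaps in a plain windowed RMS level in dB for the BS.1770 loudness that pyloudnorm reports, and the names `rms_envelope_db`, `gaussian_smooth`, `abrupt_rises`, the window length and the 6 dB threshold are all assumptions made for illustration. Input is a mono signal as a list of floats.

```python
import math

def rms_envelope_db(mono, rate, window_s=0.4):
    """Level in dB for each non-overlapping window (RMS, not BS.1770)."""
    hop = int(rate * window_s)
    levels = []
    for start in range(0, len(mono) - hop + 1, hop):
        frame = mono[start:start + hop]
        rms = math.sqrt(sum(s * s for s in frame) / hop)
        levels.append(20 * math.log10(rms + 1e-9))  # epsilon avoids log(0)
    return levels

def gaussian_smooth(values, sigma=1.0):
    """Gaussian-weighted moving average with edge clamping."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    total = sum(kernel)
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in zip(offsets, kernel):
            j = min(max(i + k, 0), last)  # clamp index to the valid range
            acc += w * values[j]
        out.append(acc / total)
    return out

def abrupt_rises(levels_db, sigma=1.0, min_jump_db=6.0):
    """Window indices where the smoothed level rises fastest.

    A window is reported when the first difference of the smoothed curve
    is a local maximum and at least min_jump_db.
    """
    smooth = gaussian_smooth(levels_db, sigma)
    diff = [b - a for a, b in zip(smooth, smooth[1:])]
    rises = []
    for i, d in enumerate(diff):
        left = diff[i - 1] if i > 0 else float("-inf")
        right = diff[i + 1] if i + 1 < len(diff) else float("-inf")
        if d >= min_jump_db and d > left and d >= right:
            rises.append(i + 1)
    return rises
```

Multiplying a returned index by `window_s` gives the time in seconds at which a clip boundary could be placed. With audio loaded through soundfile, something like `data.mean(axis=1).tolist()` would produce the mono list this sketch expects.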

remove all watermarks, subtitles and crop video boundaries accordingly
source video and audio (unlimited supply; the basic test is to find 500 sources at once without duplicates, the second test is to do that twice without duplicates across both runs), improve the highlight algorithm
find 500 songs without duplicate at once
find 500 songs no duplicate twice
find 500 animal videos without duplicate
find 500 animal videos no duplicate twice
generate appropriate title, cover, info and tags
collect feedback after the post
find some shocking fonts for cover and subtitle, english and chinese
make that karaoke effect
make ass with karaoke effect with lrc files
make the lyrics sync logic fluent, based on what was learned from the karaoke effects
make selected video clips fluent, with no abrupt cuts; maybe we need pyscenedetect?

text to video, template-based video generator (this is perhaps the most complex video generator of all; do it with caution; it might also include the flipcard, narrator and slideshow-based generators)

generator models subarchitecture (subcategories of template based generators)

flipcard

slideshow (video and audio, might also include the dog&cat video!)

narrator

summarized video

policy evasion, NSFW filters

remove all hints from image, video, audio and script that may lead to copyright issues

analyze the media content and metadata, relationships

analyze danmaku
paraphrase the script
cut the crap and understand each clip’s meaning

process the video clips, e.g. changing the human figure, changing faces, stylizing the video, adding 2d-to-3d effects

process the audio clips, like changing voice, adding sound effects, separating audio/music tracks, ducking

index, retrieve and align video and audio content according to our collected database

retrieve and align video and audio according to our smart search agent (keyword extractor, related words) and do live compilation

qq managing

mitm chats in friends
mitm chats in groups
source and send pictures to qzone
source and send pictures to chat
reduce posting frequency by group size and feedback
post video links relevant to the group topic

personal info collecting and email/sms bulk sending

avoid mail being trashed or turned into junk
collect and make mail templates for mail posting

voice changer

vst based voice changer
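train or find a decent voice generator (mature-female voice corpus, soft-male voice corpus)

look for these on Bilibili, in QQ groups, or in any other relevant place

live streaming

source the video

if the source is the same site, prefer videos at least one month old and audio at least half a month old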

prepare some space for storing live streaming data
source the audio
automatic interactions
handle the vtuber model’s actions

review previous notes and fill in the blanks; list the blanks below, preferably with direct links to them

tag all notes, and especially mark stubs and incomplete ones
review and complete bilibili courses; reorder, rename and split them if necessary
finish the notes for Introduction to Communication Studies

review your history

visit all previously visited links and store briefs generated by Readability.js and elinks

finance and quantitative trading

design a basic algorithm and complete a backtest on joinquant
design and complete one backtest offline

artificial general intelligence

design a program that automatically executes commands in a shell
design a program that automatically clicks through a GUI

study nars

run python version of nars

study opencog

study he4o

indexing necessary tools, blogs, procedures, manuals, snippets and search them

indexing kali tools
use hacker’s search engines

github coin (xmrig) mining

automatic captcha resolve by clash redirection to force the captcha not being too complex
create better labeling interface for spiral picture detection
automate the whole process unsupervised
find more ways to mine coins other than cirrus
hide our intention of coin mining
create or reassure our monero wallet

bilibili seo and hacking

study bilibili source code

study and learn some parameters/apis for faking data

general hacking and tool learning

study popular tools

frida

cutter

radare2

study popular kali tools

nmap script to find nearby hosts in same router
metasploit script to scan vulnerable hosts

solve ctf challenges

study hacking posts

study hacking tutorial on darkweb

find a free hacking tutorial on darkweb

2022-07-31

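Bilibili account recovery

recovered it manually by contacting customer service

apparently Bilibili expects me to remember every single detail? like hell I can

remind me to check the phone balance periodically, for both numbers

remind once a week: starting Aug 19, check the phone balance every Friday

the newly registered number can switch plans after 6 months, to the 8 yuan per month plan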

2022-07-14

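Repeater (复读机) Chatbot

extract weather-related content from timestamped QQ messages and infer group members' locations from historical weather forecasts

infer group members' locations from chat history

the way to test a chatbot is to let bots chat with each other; the self-media pipeline can be tested with automatically generated fake data

for images containing QR codes, keep the QR code's contrast low so QQ Guanjia does not recall the message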

classify bilibili video which users recommend, then train the model against recent chats and topics with video tags

for our potential viewers, you may send them popular/hot things, place trackers (short-link click statistics) on those links, and guess their favourites.

you may fool bilibili trackers, official parameter-based trackers.

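for people who flirt on qq, adult videos can be sent to them

get bilibili users' email addresses by asking them in chat. if they give the address, send setu as a gift (regularly?)

of course the uid has to be passed to us, either by parameters or by asking.

build user profiles: cache enough information to derive sufficiently precise topic tags

post something attractive or provide some kind of service, otherwise you get kicked

do not talk late at night; everyone is asleep, and talking then easily gets you kicked

in general, an image that gets quoted, or that draws heated replies right after being posted, is probably of good quality; classify according to the image's actual content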

note: to install libraries and dependencies, you need ubuntu inside termux. create shortcuts and aliases to launch ubuntu. link files and directories into the ubuntu proot filesystem. also never attempt to update kali, since that will break things, especially by bumping python versions.

time-series databases

tdengine stream processing

influxdb python client

intelligent question answering

intelligent question answering and deep learning (with code)

synonyms

use wordnet to find hyponyms and antonyms

find antonyms for chinese with wordnet

chinese synonyms and how to expand the lexicon

topic modeling, sentence vectors

10 nlp libraries

gensim word2vec

word embedding using word2vec

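gensim word2vec complete guide

go-cqhttp: custom merged-forward messages, generating merged-forward messages that never existed

progressive red-envelope grabbing: for a given group, grab the first one after about two minutes; if nothing is received, halve the delay for the next one; if something is received, do not halve; the fastest allowed delay is 6 seconds and must not go lower, to guard against groups that send red envelopes to detect bots
do not process messages as a pipeline; put them in a messagepool ordered by importance and relevance
QQ drift-bottle bot, drift-bottle pickup API
restore my group nickname: some idiots like to give me random names; check once a day; imitate other members' group nicknames to see whether any can serve as an alias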
mitm Chatbot

chatbot frameworks:

convai-bot 1337 the best hybrid convai bot

omeglemiddleman

chatterbot able to learn while having conversation

qary: nlpia-bot a hybrid framework for developing chatbot by mannings

mitm-omegle watch strangers talk

ai chatbot framework

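use sentence bert for search-based dialog in place of levenshtein, preferably asymmetric semantic search
some people test for red-envelope bots; such red envelopes may contain words like “test”, “测试” (test), “别抢” (don't grab) or “不要” (don't); do not grab these, since grabbing them gets you kicked
the next sentence in a group chat is not necessarily an answer to the previous one; train a model to find sentence relatedness, computing relevance and sentence order
integrate with Xiaoice (小冰)
do not speak up while an admin or the group owner is present, or in groups where admins show up often; otherwise it is easy to get banned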

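a forwarded image must not have been sent within the previous hour (or longer), and one message must not contain duplicate images; otherwise do not send the message (it is very likely an ad)

do not send anything with a QR code; do not send anything with a URL

images whose embedded text contains ads are also rejected

text messages must be free of ads; use a simple classifier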
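
A minimal sketch of the forwarding rules above that can be checked without any image model: reject messages with a URL, reject messages that repeat an image, and reject images already forwarded within the last hour. The class name `ForwardFilter`, the URL pattern and the use of SHA-256 over raw image bytes are assumptions for illustration; QR-code detection, OCR of text inside images and the ad classifier are left out, and exact hashing only catches byte-identical copies.

```python
import hashlib
import re
import time

# Rough URL detector (an assumption; tune for the actual message format).
URL_RE = re.compile(r"https?://|www\.", re.IGNORECASE)

class ForwardFilter:
    def __init__(self, window_s=3600):
        self.window_s = window_s
        self.last_sent = {}  # image digest -> time it was last forwarded

    def allow(self, text, images, now=None):
        """Return True if the message (text plus image bytes) may be forwarded."""
        now = time.time() if now is None else now
        if URL_RE.search(text):
            return False
        digests = [hashlib.sha256(img).hexdigest() for img in images]
        if len(set(digests)) != len(digests):
            return False  # the same image appears twice in one message
        for digest in digests:
            sent = self.last_sent.get(digest)
            if sent is not None and now - sent < self.window_s:
                return False  # forwarded too recently
        for digest in digests:
            self.last_sent[digest] = now  # record only messages that pass
        return True
```

A perceptual hash would be a more robust replacement for SHA-256 if re-encoded copies of the same picture need to be caught.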

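personalized search and recommendation: elasticsearch

following Mao's approach: spread a rumor while refuting it, admit while denying; the same words can be repeated in jumbled order any number of times, or not said at all; this way they can hybridize with many similar stories

handling private messages: after replying to someone, clear all of their message history; process one person at a time at intervals so they do not crowd each other out; only handle private messages when not chatting in groups; certain people are excluded from private chat
chat and collect data during the day, train offline at night (this logic generalizes to any machine-learning-driven platform)
add more context to the training data: not just one question and one answer; increase the total number of sentences
when training occupies the system GPU, acquire a dedicated filelock to signal that heavy resources are in use and the system is busy
select good-quality, emotional chat samples of moderate length: no ads, no extremism; strip banned words, stickers and links when cleaning the data; when the model is used for dialog, do not feed in or output banned words either; topic modeling can further subdivide and categorize the relationships between dialog samples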
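
A minimal sketch of the "system busy" lock for GPU training, using only the standard library. The notes may well have the `filelock` package in mind; this version instead relies on atomic exclusive file creation, and the name `busy_lock`, the lock path and the timeout handling are assumptions for illustration. A crashed process would leave a stale lock file behind, which this sketch does not handle.

```python
import os
import time
from contextlib import contextmanager

@contextmanager
def busy_lock(path, timeout_s=0.0, poll_s=0.5):
    """Hold an exclusive lock file for the duration of the with-block."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{path} is held by another job")
            time.sleep(poll_s)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield
    finally:
        os.close(fd)
        os.remove(path)
```

The training job would wrap its work in `with busy_lock("/tmp/gpu_training.lock"):` (a hypothetical path), and other jobs could try the same lock with a short timeout to find out whether the GPU is busy.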

schedule the training on a per-minute basis first for a complete test, then schedule it at a fixed time each day.

for qq client: dump 500 consecutive sentences when adding a new one while holding the filelock; do not block or stop running if GPT is not responding

for gpt2 server: (where to train? how to prevent the machine from overheating? for how long?)

rename the dataset while holding the filelock

always keep the latest 2 models; remove unreadable ones first, then delete older ones.

if training on CPU, still limit the training time and sleep periodically. on GPU we definitely need to sleep during training, and must not use VRAM for other applications.
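
A minimal sketch of the "keep the latest 2 models" rule: unreadable checkpoints go first, then everything older than the newest two. The function name `prune_checkpoints` and the `is_readable` callback are hypothetical; what counts as readable (for example, whether the file loads in the training framework) is left to the caller, and checkpoints are assumed to be single files ordered by modification time.

```python
import os

def prune_checkpoints(model_dir, is_readable, keep=2):
    """Delete unreadable checkpoints, then all but the `keep` newest ones.

    Returns (kept_paths_newest_first, removed_paths).
    """
    paths = [os.path.join(model_dir, name) for name in os.listdir(model_dir)]
    paths = [p for p in paths if os.path.isfile(p)]
    removed, readable = [], []
    for path in paths:
        if is_readable(path):
            readable.append(path)
        else:
            os.remove(path)  # broken checkpoints are removed first
            removed.append(path)
    readable.sort(key=os.path.getmtime, reverse=True)  # newest first
    for path in readable[keep:]:
        os.remove(path)
        removed.append(path)
    return readable[:keep], removed
```

Running this while holding the same lock that guards training would avoid deleting a checkpoint that is still being written.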

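translate “汪汪” (woof) into stickers, optionally adding other random emoji
train gpt2 on live group chat data
train gpt2 on offline group chat data

automatic insults

https://github.com/liuke-wuhan/ZuAnBot

add a FileLock in the gpt2 main server so that multiple conversations are not processed at the same time
talk less in crowded groups, i.e. longer intervals between messages and fewer messages; also avoid idle chatter in groups whose conversation is serious or professional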

dialogpt documentation

chitchat dialog bot training framework by facebook:

https://github.com/facebookresearch/ParlAI

debug the consecutive group reply thresholding protocol

reply according to individual description and group description

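promote both my own and other people's videos or content, collect feedback on the recommendations, and gradually reduce how often other people's content is recommended
when promoting a video, top-liked comments from other people's videos, animated GIFs, audio or short clips can be added before sending the xml
add image repeating; add image results to chatlocal
add feedback: judge whether a message was useful from the replies that follow it in the group
use txtai or another information retrieval (semantic search, smart search) tool to replace the levenshtein history-based reply logic; include context when searching
a repeater cannot revive a dead group, but active pushing can: push long, monologue-like conversations into the group; they must not come from the same group, and the topic must be related; filter out too negative ones
pull people into other groups, preferably people seen from multiple accounts that share no groups but whose topics overlap
add involution option, allow to append unqualified replies to input message, separated by space.
add extend conversation option, allow replying with more than one sentence at a time (proper delay needed) -> could be achieved with a GPT2-like generation model
can give group members likes
can send voice messages

the context fed in for each turn must not be too small, otherwise the replies look fake

add repeating of the original sentence, triggered by sentiment

when posting bilibili video ads to a group, they should relate to the group's topic and to recently received messages, and the frequency must not be too high. set a global counter: every message sent in the group triggers the counter once, and when counter mod period == 0 the ad command runs. consider decoupling the rendering task from the ad-posting logic, with both accessing one shared data store such as redis. make and upload videos based on what was discussed recently. the same video must not be sent to the same group too frequently, and must wait a while before being sent to other groups. preferably implement this with the schedule library; the method itself must implement delaying or abandoning the scheduled job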

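if kicked from a group, consider changing the avatar, nickname and profile, then reapplying. likewise consider changing the bilibili profile, filling it with information from overseas influencers. both the profile-change frequency and the application frequency need to be controlled, each with its own daily quota saved in a file. the application message is best generated by ai, paraphrased, or built from related content collected online, after some training first. avatars can be crawled from anywhere: anime avatars (anime head detection) with high contrast are an option, as are look-alike avatars; never the system default avatar, which is too boring; they can relate to the group topic. profile details can be copied from others: from other groups or from this group's members, but never from an admin

generate the next sentence from templates rather than generating it directly; the material can be group announcements, the group topic, or received messages

template generation should be combined with new-word discovery

template generation: the paraphraser can be combined with chatlocal or repeater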

>>> import re
>>> re.split(r"(abc|acd)", "aaabcaaacdaaa")
['aa', 'abc', 'aa', 'acd', 'aaa']
>>> word = "aaabcaaacdaaa"
>>> re.escape("abc")
'abc'
>>> re.escape("efgh")
'efgh'

this splits a sentence into a list, keeping the matched keywords as separate items
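
The interactive session above can be wrapped into a small helper for template generation. This is a sketch under assumptions: the function name `split_on_keywords` is hypothetical, and sorting keywords longest-first is a design choice added here so that a longer keyword wins over a shorter one it contains.

```python
import re

def split_on_keywords(sentence, keywords):
    """Split a sentence into a list, keeping each keyword as its own item."""
    if not keywords:
        return [sentence]
    # Longest first, so "abcd" is tried before "abc" in the alternation.
    ordered = sorted(keywords, key=len, reverse=True)
    # re.escape makes keywords with regex metacharacters match literally.
    pattern = "(" + "|".join(re.escape(k) for k in ordered) + ")"
    # The capturing group makes re.split keep the separators; drop empties.
    return [part for part in re.split(pattern, sentence) if part]
```

The keyword items in the result can then be treated as slots to be swapped for newly discovered words, while the remaining pieces form the fixed part of the template.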

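remove frequently generated phrases such as 你好 (hello)

pick earlier sentences whose levenshtein distance to the input is greater than 0 (not the input itself), sort them and take the closest 10, rank those by emotional intensity (positive or negative both count, but drop overly negative ones), take the top one, use the sentence that followed it as the reply, then record this reply in the bot's reply history

if the sentence comes from the same group it must not be too recent; it should be at least 50 sentences back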
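
A minimal sketch of the history-based reply selection described above. Several details are assumptions: the sentiment scorer `intensity` is a hypothetical callback returning a score in [-1, 1], intensity is measured on the matched earlier sentence (the notes do not say which sentence is scored), `too_negative` is an arbitrary cutoff, and `used_pairs` stands in for the bot's reply history.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def pick_reply(message, history, intensity, used_pairs,
               top_k=10, min_gap=50, too_negative=-0.8):
    """Choose a reply from `history`, an ordered list of earlier sentences."""
    candidates = []
    # Skip the most recent min_gap sentences; each prompt needs a successor.
    for i in range(len(history) - max(min_gap, 1)):
        prompt, reply = history[i], history[i + 1]
        dist = levenshtein(message, prompt)
        if dist == 0 or (prompt, reply) in used_pairs:
            continue  # not the message itself, and no reused pairs
        candidates.append((dist, prompt, reply))
    candidates.sort(key=lambda c: c[0])  # closest first
    closest = [c for c in candidates[:top_k] if intensity(c[1]) > too_negative]
    if not closest:
        return None
    # Strongest emotion first, positive or negative.
    closest.sort(key=lambda c: abs(intensity(c[1])), reverse=True)
    _, prompt, reply = closest[0]
    used_pairs.add((prompt, reply))  # remember the pair so it is not reused
    return reply
```

Keeping the used pairs in a set means a repeated input falls through to the next-best candidate instead of producing the same reply twice.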

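text, images and videos can all be searched through the baidu and sogou chinese search apis; rank results by relevance and sentiment (same language) and reply with text or multimedia

split long sentences into short ones and feed them in one by one; filter out ads, which are usually longer and contain links?

the input must not contain banned words, otherwise do not reply
the output must not contain banned words either; stored text may contain them, or they can be converted with pinyin or character splitting so the context stays consistent (text moderation)

bad chinese -> letter (pinyin initials) -> leetspeak

on the next selection, automatically filter out sentence pairs whose next sentence is already in the reply history

the lock must guard the bot's own read/delete operations as well as the append of new messages

on emotional intensity, and how to raise the generator's intensity: build a discriminator, then either skip backpropagation for generations that are not intense enough, or use the discriminator to filter the input corpus directly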
