2022-12-13
Turing-Project And His Works On AI And NLP

He has recently been chatting with scammers on WeChat; find out how to add new friends (and groups, if any) on WeChat.

The Bilibili user and his repo

Video style transfer based on DCT-Net, used for "video laundering" (视频洗稿) and pseudo-original content (伪原创)

AntiFraudChatBot is a Wechaty bot that chats with scammers using Yuan 1.0, a very large Megatron-based model whose free access lasts only three months (30k API calls). Another application: an AI murder-mystery game (AI剧本杀).

Megatron-DeepSpeed enables training large models on cheap hardware.

EssayKillerBrain is another project he has been involved in; it contains EssayKiller_V2, EssayKiller_V1, EssayTopicPredict and WrittenBrainBase.

AlphaFold in MindSpore


2022-12-08
Talk To OpenAI ChatGPT To Learn A Bit About Paraphrasing And Title Generation

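It's like a huge search engine that can talk.

GitHub topic on paraphrasing

小发猫 (Xiaofamao, a Chinese article-rewriting tool)

A pseudo-original article tool (文章伪原创工具) and a backup copy; it mentions a method of scanning original articles with a scanner.

An online paraphrasing tool based on translation; it can process Chinese.

ChatGPT says creativity matters, e.g. for hackers, video producers, and your evil sex shit.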

you said LDA and LSA are helpful for paraphrasing

Yes, LDA and LSA are two methods that can be used to generate paraphrases of a given text. LDA (Latent Dirichlet Allocation) is a probabilistic model that can be used to identify topics in a text and then generate new sentences that express the same topics in a different way. LSA (Latent Semantic Analysis) is a method that uses singular value decomposition to identify the underlying meaning of words in a text and then generate paraphrases by replacing words with their closest semantic equivalents. Both LDA and LSA can be helpful for generating paraphrases, but they each have their own strengths and limitations.

can you demonstrate how to use LSA for paraphrasing in python

I’m sorry, but I am not currently able to browse the internet or access any external resources. As a large language model trained by OpenAI, my knowledge is based solely on the text that I have been trained on, and I don’t have the ability to run code or demonstrate specific examples. However, I can provide a general overview of how LSA could be used for paraphrasing in Python.

To use LSA for paraphrasing in Python, you would first need to create a matrix of word vectors for the words in the text you want to paraphrase. This matrix would represent the semantic relationships between the words in the text, and would be used to identify the underlying meaning of each word.

Next, you would use singular value decomposition (SVD) to decompose the matrix into three matrices, which represent the different factors that contribute to the meaning of the words in the text.

Once you have the three matrices, you can then use them to identify the most closely related words for each word in the text, and generate paraphrases by replacing words with their closest semantic equivalents.

This is just a brief overview of how LSA could be used for paraphrasing in Python. There are many different ways to implement LSA, and the exact details of the implementation would depend on the specific requirements of your project.

continue

I’m sorry, but I’m not sure what you would like me to continue with. If you have a specific question or topic you would like to discuss, I would be happy to help. Otherwise, I can provide more information about LSA and how it can be used for paraphrasing if that would be helpful. Let me know what you would like to know.

how to use LDA for paraphrasing

To use LDA for paraphrasing, you would first need to train an LDA model on a large corpus of text. This would involve using a machine learning algorithm to identify the topics that are present in the corpus, and then generating a set of "topic vectors" that represent the distribution of those topics across the text.

Once you have trained the LDA model, you can then use it to generate paraphrases of a given text by identifying the topics present in the text and then replacing words with synonyms or related words that express the same topics in a different way. This process can be repeated multiple times to generate multiple paraphrases of the same text.

Here is a brief example of how LDA could be used for paraphrasing in Python:

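from gensim import corpora, models
# Load the text you want to paraphrase
text = "The quick brown fox jumps over the lazy dog."
# Tokenize the text (lowercase, strip the trailing period) and create a dictionary of words
tokens = [word.strip(".").lower() for word in text.split()]
dictionary = corpora.Dictionary([tokens])
# Train an LDA model on the text; id2word makes topics report words instead of integer ids
lda_model = models.LdaModel([dictionary.doc2bow(tokens)], id2word=dictionary, num_topics=1)
# Note: show_topics only lists the topic's words ranked by probability, so each
# "paraphrase" is a reordered bag of words (identical on every iteration), not new text
for i in range(3):
    paraphrase = lda_model.show_topics(num_topics=1, num_words=len(dictionary), formatted=False)
    print(" ".join(word for word, _ in paraphrase[0][1]))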

This example trains a simple LDA model on a single sentence and then generates three paraphrases of the sentence by replacing words with their closest semantic equivalents. Of course, this is just a simple example to illustrate the basic concept of using LDA for paraphrasing. In practice, you would need to use a much larger corpus of text to train the LDA model and generate more accurate paraphrases.

demonstrate using LSA to paraphrase in python, using gensim if needed

As I mentioned earlier, I am not currently able to browse the internet or access external resources, so I cannot demonstrate using LSA to paraphrase in Python using gensim or any other specific libraries. However, I can provide a general overview of how LSA could be used for paraphrasing in Python.

To use LSA for paraphrasing in Python, you would first need to create a matrix of word vectors for the words in the text you want to paraphrase. This matrix would represent the semantic relationships between the words in the text, and would be used to identify the underlying meaning of each word.

Once you have the matrix of word vectors, you can use singular value decomposition (SVD) to decompose the matrix into three matrices, which represent the different factors that contribute to the meaning of the words in the text. The gensim library includes an implementation of SVD that you can use for this purpose.

Once you have the three matrices, you can then use them to identify the most closely related words for each word in the text, and generate paraphrases by replacing words with their closest semantic equivalents. Here is an example of how this could be done in Python:

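import numpy as np
from gensim import corpora, matutils
# A tiny corpus: LSA needs several documents to find co-occurrence structure
documents = [
    "the quick brown fox jumps over the lazy dog",
    "a fast brown fox leaps over a sleepy dog",
    "the lazy dog sleeps all day",
    "a quick fox runs fast",
]
docs = [doc.split() for doc in documents]
dictionary = corpora.Dictionary(docs)
bow_corpus = [dictionary.doc2bow(doc) for doc in docs]
# Term-document count matrix (num_terms x num_docs); gensim has no matutils.svd,
# so the decomposition below uses NumPy
term_doc = matutils.corpus2dense(bow_corpus, num_terms=len(dictionary))
# SVD, then keep only the top k latent dimensions (the truncation is what makes it LSA)
U, S, VT = np.linalg.svd(term_doc, full_matrices=False)
k = 2
word_vectors = U[:, :k] * S[:k]
# Normalise rows so dot products are cosine similarities
norms = np.linalg.norm(word_vectors, axis=1, keepdims=True)
word_vectors = word_vectors / np.where(norms == 0, 1, norms)
# Generate three variants by swapping each word for its 1st, 2nd, 3rd nearest
# neighbour in the latent space (word-level substitution, not grammatical rewriting)
tokens = documents[0].split()
for rank in range(3):
    paraphrase = []
    for word in tokens:
        idx = dictionary.token2id[word]
        sims = word_vectors @ word_vectors[idx]
        sims[idx] = -np.inf  # never pick the word itself
        paraphrase.append(dictionary[int(np.argsort(-sims)[rank])])
    print(" ".join(paraphrase))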

This example trains an LSA model on a single sentence and then generates three paraphrases of the sentence by replacing words with their closest semantic equivalents. Again, this is just a simple example to illustrate the basic concept of using LSA for paraphrasing. In practice, you would need to use a much larger corpus of text to train the LSA model and generate more accurate paraphrases.

how to use transformer for paraphrasing

The transformer is a type of neural network architecture that has been used for a wide range of natural language processing tasks, including paraphrasing. To use a transformer for paraphrasing, you would need to train the transformer on a large corpus of text using a machine learning algorithm.

During training, the transformer would learn to identify the underlying meaning of words and phrases in the text and then generate paraphrases by replacing words with synonyms or related words that express the same meaning in a different way.

Here is a brief example of how a transformer could be used for paraphrasing in Python:

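from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# Load the text you want to paraphrase
text = "The quick brown fox jumps over the lazy dog."
# distilbert-base-cased is encoder-only and cannot be loaded as a seq2seq generator,
# so use a checkpoint fine-tuned for paraphrasing instead (assumption: this community
# model is still available on the Hugging Face Hub; it needs sentencepiece installed)
model_name = "tuner007/pegasus_paraphrase"
tokenizer = AutoTokenizer.from_pretrained(model_name)
transformer = AutoModelForSeq2SeqLM.from_pretrained(model_name)
# Tokenize into PyTorch tensors, as generate() expects
inputs = tokenizer(text, return_tensors="pt")
# Beam search, returning the 3 best candidates
outputs = transformer.generate(**inputs, num_beams=5, num_return_sequences=3, max_length=60)
for paraphrase in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(paraphrase)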

This example uses a pre-trained transformer model to generate three paraphrases of a single sentence. Of course, in practice you would need to train the transformer on a much larger corpus of text to generate more accurate paraphrases. The exact details of the implementation would depend on the specific requirements of your project.

how to generate catchy title from keywords using language model

To generate a catchy title from a set of keywords using a language model, you would need to train the language model on a large corpus of text that includes a variety of titles. During training, the language model would learn to identify the underlying meaning of words and phrases and the characteristics of catchy titles.

Once the language model has been trained, you can then use it to generate a catchy title from a set of keywords by providing the keywords as input to the model and then selecting the generated title that is the most catchy and relevant to the keywords.

Here is a brief example of how this could be done in Python:

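from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the keywords you want to use to generate the title
keywords = ["quick", "brown", "fox"]
# Turn the keywords into a text prompt and tokenize it for GPT-2
tokenizer = AutoTokenizer.from_pretrained("gpt2")
prompt = "Keywords: " + ", ".join(keywords) + "\nTitle:"
inputs = tokenizer(prompt, return_tensors="pt")
# Load a pre-trained language model
language_model = AutoModelForCausalLM.from_pretrained("gpt2")
# Sample 3 candidate titles (sampling makes the candidates differ)
outputs = language_model.generate(
    **inputs, do_sample=True, top_p=0.9, max_new_tokens=12,
    num_return_sequences=3, pad_token_id=tokenizer.eos_token_id,
)
# Keep only the newly generated tokens, up to the first line break
prompt_len = inputs["input_ids"].shape[1]
candidate_titles = [
    tokenizer.decode(o[prompt_len:], skip_special_tokens=True).strip().split("\n")[0]
    for o in outputs
]

# Hypothetical placeholder heuristics; replace them with a real scorer
def is_catchy(title):
    return 3 <= len(title.split()) <= 10

def is_relevant(title, keywords):
    return any(k in title.lower() for k in keywords)

# Select the first catchy and relevant title, falling back to the first candidate
selected_title = candidate_titles[0]
for title in candidate_titles:
    if is_catchy(title) and is_relevant(title, keywords):
        selected_title = title
        break
print(selected_title)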

This example generates three candidate titles from a set of keywords using a pre-trained language model, and then selects the most catchy and relevant title as the final title. Of course, this is just a simple example to illustrate the basic concept of using a language model to generate a catchy title from keywords. In practice, you would need to use a much larger corpus of text to train the language model and generate more accurate and diverse titles.


2022-11-28
Exploring Popular AI Libraries And Tools For Various Tasks


2022-09-17
AGI (Artificial General Intelligence) Related Projects

As HTM (Hierarchical Temporal Memory) puts it, an AGI knows what it has done to the world (self-awareness) and also receives signals from sensors.

Google Research

Gwern wrote a piece of fiction. He thinks AGI starts from AutoML-Zero, which resembles Lazero and MetaLazero in both name and perspective.

By design, Lazero can be deeply aligned, inspecting and studying the user's actions. It also has its own exploration space. However, these expectations can never all be fully satisfied at the same time: if you want more power, you have to let go.

lucidrains repositories

This one has lots of state-of-the-art implementations of closed-source papers, as well as repos for AGI. Stunning.

JEPA-pytorch (WIP): Yann LeCun's version of how AGI will be built

PaLM: Scaling Language Modeling with Pathways

side projects

Make-A-Video: text-to-video generation

NUWA: text-to-video generation

OpenCog

MOSES (supervised) for evolutionary program synthesis

Repos on GitHub

he4o

AIXIjs: general reinforcement learning in the browser (repo)

OpenNARS

Brain Simulator 2 on the Windows platform

DQfD: Learning from Demonstrations for Real World Reinforcement Learning (paper)

MIT class on AGI

jiaxiaogang's god-knows-what theory and training logs

awesome deep reinforcement learning (deep-rl)

awesome-agicocosci: an exhaustive list of papers and repos for cognitive science and AGI

introduction and links on AGI


2022-08-26
Unlocking The Potential Of AI-Assisted Live Streaming And Audience Data

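Ideas about live streaming

Long audio or long video can be used in place of a live stream source.

Yukio 23:00:49

I'm still looking into this stuff.

卑劣的写作者 23:01:13

[image]

Yukio 23:01:19

Especially how to take someone else's avatar (皮套, a VTuber's virtual "skin") and pass it off as my own.

Yukio 23:01:44

Track a virtual VTuber's movements and then put them onto my avatar.

Yukio 23:02:50

Isn't media all about copying anyway?

Yukio 23:03:38

If you can grab someone else's livestream from a month ago and remix the video and audio separately so nobody can tell it's copied,

卑劣的写作者 23:03:48

Isn't that just "media with Seres characteristics" (Seres: a nickname for China)?

Yukio 23:03:50

you'd be making money without lifting a finger.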

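gjz010 23:03:52

If you steal a big streamer's avatar, I think you'd get hit with copyright strikes.

gjz010 23:04:13

Look, even a phantom thief wouldn't dare steal an avatar to use as their own.

gjz010 23:04:48

Then you'd be better off using Bilibili's public avatars.

Yukio 23:04:49

Just grab any free avatar provided by Bilibili.

Yukio 23:05:00

Or a Genshin Impact one.

Yukio 23:05:32

Switch to a different one every day; someone will definitely watch.

gjz010 23:05:44

Not necessarily.

gjz010 23:05:58

An avatar works kind of like a trademark.

Yukio 23:06:03

Track the motions of someone else's avatar, then bind them to a free avatar.

gjz010 23:06:14

What? Doesn't the avatar just follow your own movements?

gjz010 23:06:20

What's the point of stealing someone else's movements?

Yukio 23:06:20

Clip other people's Chinese voice audio and play it back at random.

gjz010 23:06:30

You'd be better off having an AI read it aloud.

Yukio 23:06:39

Why would I bind my own movements?

gjz010 23:06:54

I mean the kind that just sways randomly without tracking,

gjz010 23:07:01

used when motion capture breaks.

Yukio 23:07:08

Mine isn't random swaying.

Yukio 23:07:18

Mine is a replay.

Yukio 23:07:36

It broadcasts someone else's movements all over again.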

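Yukio 23:07:49

So as long as your memory doesn't go back a month

Yukio 23:07:59

and you can't watch every livestream on the entire internet,

Yukio 23:08:14

there's no way you'll know exactly who I copied for this episode.

Yukio 23:08:41

Not only do the movements and the voice come from different people, the visuals come from yet another person.

卑劣的写作者 23:08:58

Yukio 23:09:12

I'll also automatically strip out everything related to the original creators,

Yukio 23:09:24

like any QQ numbers or any contact information,

卑劣的写作者 23:09:26

You can't be friends with this guy.

Yukio 23:09:37

and any author credits.

Yukio 23:10:32

I'll run the voice through a voice changer.

Yukio 23:11:51

Whenever I get the chance, I'll just download Twitch streams from outside China and put domestic Chinese voice audio over them, matching the same kind of game.

Yukio 23:14:06

I use Google Translate on popular game names and search for them on foreign sites.

Yukio 23:16:17

I also have a plugin that reads comments aloud automatically,

Yukio 23:16:36

reading them every few minutes so you all think it's a real person.

Yukio 23:17:03

I find out the game's name through reverse image search on screenshots.

Yukio 23:17:45

From similar images I get keywords, then generate the title, topic, tags and category.

Yukio 23:20:47

The avatar's movements have an automatic transition system,

Yukio 23:20:57

so there are no abrupt jumps.

Yukio 23:22:47

Using smart matching, it picks the most suitable topic, movements and voice, and automatically generates continuous content.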

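小晴清风揽月 23:24:01

Seeing avatar streamers makes me sick.

Yukio 23:24:19

Avatar streamers are capital-harvesting machines.

Yukio 23:24:38

They turn virgins' jizz into RMB (软妹币, a pun on renminbi).

Yukio 23:24:58

Very energy-saving, eco-friendly and efficient.

重庆人快融化啦 23:26:20

[image]

Yukio 23:26:40

If I had enough compute, I could break out of this copying logic entirely and do fully "original" livestreams.

Yukio 23:27:10

But with just one laptop, copying livestreams is the most cost-effective option.

Yukio 23:28:01

It also lays the groundwork for building more advanced original models later.

Yukio 23:30:30

I can use the audience's danmaku (bullet comment) data for search and classification, and to measure how emotionally intense things get.

Yukio 23:30:56

Same goes for voice data.

小晴清风揽月 23:30:56

You're talking incoherently; go see a doctor first.

Yukio 23:31:06

No need.

Yukio 23:31:28

If you think I'm incoherent, you just don't get it yet,

Yukio 23:31:42

meaning you haven't thought it through.

小晴清风揽月 23:32:01

I was joking.

小晴清风揽月 23:32:08

Sorry.

小晴清风揽月 23:32:16

I was just imitating the way 仰山杨爱民 talks.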
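Yukio's idea of using danmaku (bullet comments) to gauge how emotionally intense a stream gets can be sketched with a very crude heuristic. This is an illustrative assumption, not anything described in the chat: it scores fixed time windows by the number of comments plus a count of hypothetical "hype" markers (`HYPE_TOKENS`), and the function names are made up for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical "hype" markers; a real system might use a trained sentiment/emotion model
HYPE_TOKENS = ("!", "！", "666", "哈哈")

def message_intensity(text: str) -> int:
    """Crude per-message score: number of hype markers in the text."""
    return sum(text.count(token) for token in HYPE_TOKENS)

def intensity_per_window(danmaku: List[Tuple[float, str]],
                         window: float = 10.0) -> Dict[int, int]:
    """Score each fixed time window (in seconds) as message count plus hype markers."""
    scores: Counter = Counter()
    for timestamp, text in danmaku:
        bucket = int(timestamp // window)  # index of the window this comment falls in
        scores[bucket] += 1 + message_intensity(text)
    return dict(scores)

# Toy danmaku stream: (seconds since stream start, comment text)
stream = [(1.0, "666"), (2.5, "哈哈哈哈!"), (12.0, "ok")]
print(intensity_per_window(stream))  # {0: 6, 1: 1}
```

Windows with unusually high scores could serve as candidate highlights; replacing the marker list with a real sentiment classifier would be the obvious next step.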


2022-08-09
Awesome-Data-Labeling

A curated list of awesome data labeling tools

Images

  • labelImg - LabelImg is a graphical image annotation tool for labeling object bounding boxes in images

  • CVAT - Powerful and efficient Computer Vision Annotation Tool

  • labelme - Image Polygonal Annotation with Python

  • VoTT - An open source annotation and labeling tool for image and video assets

  • imglab - A web based tool to label images for objects that can be used to train dlib or other object detectors

  • Yolo_mark - GUI for marking bounded boxes of objects in images for training neural network Yolo v3 and v2

  • PixelAnnotationTool - Software that allows you to manually and quickly annotate images in directories

  • OpenLabeling - Label images and video for Computer Vision applications

  • imagetagger - An open source online platform for collaborative image labeling

  • Alturos.ImageAnnotation - A collaborative tool for labeling image data

  • deeplabel - A cross-platform image annotation tool for machine learning

  • MedTagger - A collaborative framework for annotating medical datasets using crowdsourcing.

  • Labelbox - Labelbox is the fastest way to annotate data to build and ship computer vision applications

  • turktool - A modern React app for scalable bounding box annotation of images

  • Pixie - Pixie is a GUI annotation tool which provides the bounding box, polygon, free drawing and semantic segmentation object labelling

  • OpenLabeler - OpenLabeler is an open source desktop application for annotating objects for AI applications

  • Anno-Mage - A Semi Automatic Image Annotation Tool which helps you in annotating images by suggesting you annotations for 80 object classes using a pre-trained model

  • CATMAID - Collaborative Annotation Toolkit for Massive Amounts of Image Data

  • make-sense - makesense.ai is a free to use online tool for labelling photos

  • LOST - Design your own smart Image Annotation process in a web-based environment

  • Annotorious - A JavaScript library for image annotation.

  • Sloth - Tool for labeling image and video data for computer vision research.

Text

  • YEDDA - A Lightweight Collaborative Text Span Annotation Tool (Chunking, NER, etc.). ACL best demo nomination.

  • ML-Annotate - Label text data for machine learning purposes. ML-Annotate supports binary, multi-label and multi-class labeling.

  • TagEditor - Annotation tool for spaCy

  • SMART - Smarter Manual Annotation for Resource-constrained collection of Training data

  • PIAF - A Question-Answering annotation tool

Audio

  • EchoML - Play, visualize, and annotate your audio files

  • audio-annotator - A JavaScript interface for annotating and labeling audio files.

  • audio-labeler - An in-browser app for labeling audio clips at random, using Docker and Flask.

  • wavesurfer.js - Simple annotations tool, check the example.

  • peaks.js - Browser-based audio waveform visualisation and UI component for interacting with audio waveforms, developed by BBC UK.

  • Praat - Doing Phonetics By Computer

  • Aubio - Tool designed for the extraction of annotations from audio signals.

Video

  • UltimateLabeling - A multi-purpose Video Labeling GUI in Python with integrated SOTA detector and tracker

  • VATIC - VATIC is an online video annotation tool for computer vision research that crowdsources work to Amazon’s Mechanical Turk.

Time Series

  • Curve - Curve is an open-source tool to help label anomalies on time-series data

  • TagAnomaly - Anomaly detection analysis and labeling tool, specifically for multiple time series (one time series per category)

  • time-series-annotator - The CrowdCurio Time Series Annotation Library implements classification tasks for time series.

  • WDK - The Wearables Development Toolkit (WDK) is a set of tools to facilitate the development of activity recognition applications with wearable devices.

3D

  • webKnossos - webKnossos is an open-source web-based tool for visualizing, annotating, and sharing large 3D image datasets. It features fast 3D data browsing, skeleton (line-segment) annotations, segmentation and proof-reading tools, mesh visualization, and collaboration features. The public instance webknossos.org hosts a collection of published datasets and can be used without a local setup.

  • KNOSSOS - KNOSSOS is a software tool for the visualization and annotation of 3D image data and was developed for the rapid reconstruction of neural morphology and connectivity.

Lidar

MultiDomain

  • Label Studio - Label Studio is a configurable data annotation tool that works with different data types

  • Dataturks - Dataturks supports E2E tagging of data items like video, images (classification, segmentation and labelling) and text (full-length document annotations for PDF, Doc, Text etc.) for ML projects.


2022-06-01
Time Series Analysis


2022-05-31
Sketch Based Applications

Magenta Studio sketch completion

Awesome Sketch-Based Applications (papers and code), section on sketch synthesis and inbetweening:

https://github.com/MarkMoHR/Awesome-Sketch-Based-Applications#17-sketch-animationinbetweening

Deep sketch-based cartoon inbetweening:

https://github.com/xiaoyu258/Inbetweening


2022-05-29
Jina: Neural Search Engine For Images, Videos, Audios

OpenCLIP

Haystack

Tutorial: build a QA pipeline with no dependencies using Haystack

Towhee

Milvus

Visit Jina Hub to get various embedding models and workflows.

Jina: import video/image/text

Finetuner: text-to-image search via CLIP

Datawhale provides machine learning tutorials as well as book materials; topics include NumPy, Matplotlib and pandas.

VCED: a holy gift from Datawhale that can edit video by text (an automatic video editor/cutter).

VCED can automatically find the segments of a video that match your text description and cut the video accordingly. The project is built on cross-modal search and vector retrieval, with a separated frontend and backend, to help you quickly get hands-on with next-generation search technology.
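The core of what VCED describes, matching a text description against video frames in a shared embedding space, can be sketched with plain cosine similarity. This is a minimal sketch, not VCED's code: it assumes the query text and each frame have already been encoded into the same vector space by a cross-modal model such as CLIP, and the toy 3-dimensional vectors, the `threshold` value and the helper names `cosine` and `find_segments` are all hypothetical. Consecutive frames that clear the threshold are merged into (start, end) time spans that could then be cut from the video.

```python
import math
from typing import List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def find_segments(query: List[float], frames: List[List[float]],
                  fps: float, threshold: float = 0.8) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans of consecutive frames matching the query."""
    segments = []
    start = None
    for i, frame in enumerate(frames):
        hit = cosine(query, frame) >= threshold
        if hit and start is None:
            start = i  # a matching run begins at frame i
        elif not hit and start is not None:
            segments.append((start / fps, i / fps))  # run ended just before frame i
            start = None
    if start is not None:  # run continues through the last frame
        segments.append((start / fps, len(frames) / fps))
    return segments

# Toy 3-d "embeddings" (hypothetical; a real system would use CLIP-style vectors)
query = [1.0, 0.0, 0.0]
frames = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [1.0, 0.0, 0.1],
          [0.0, 0.0, 1.0], [0.95, 0.05, 0.0]]
print(find_segments(query, frames, fps=1.0))  # [(1.0, 3.0), (4.0, 5.0)]
```

A real system would sample frames, store their embeddings in a vector index such as Milvus instead of scanning linearly, and tune the threshold per model.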

Jina:

https://github.com/jina-ai/jina/

documentation:

https://docs.jina.ai

Quick demos:

Fashion image search: jina hello fashion

QA chatbot: pip install "jina[demo]" && jina hello chatbot

Multimodal search: pip install "jina[demo]" && jina hello multimodal

Fork the source of a demo to your folder: jina hello fork fashion ../my-proj/

Create a new Jina project: jina new hello-jina

AI video metadata generation:


2022-05-29
DALL-E Text To Image

Open-source text-to-image:

https://github.com/lucidrains/DALLE-pytorch

dalle_mini:

https://github.com/borisdayma/dalle-mini

Jina AI's human-in-the-loop, multi-prompt text-to-image workflow, dalle-flow:

https://github.com/jina-ai/dalle-flow

DALL-E Playground:

https://github.com/saharmor/dalle-playground
