2022-08-10

Python Suggest Binary File Name Extension

Detect media file corruption, Python suggest binary file name extension

The goal is to rule out corrupted or unplayable media files. Simply parsing these files may not be enough; a dedicated file corruption detector may be needed.

To test candidate detectors, truncate known-good files and observe the errors that media readers produce. Also feed them a text file renamed with a media file extension.
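A minimal sketch of both ideas, using only the Python standard library: guess an extension from the leading bytes of a file, and flag PNG or JPEG data whose end marker is missing. The signature table and the helper names `suggest_extension` and `looks_truncated` are assumptions for illustration, not an existing tool, and the end-marker checks are heuristics that say nothing about damage in the middle of a file.

```python
from typing import Optional

# Hypothetical signature table: (offset, magic bytes, suggested extension).
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", ".png"),
    (0, b"\xff\xd8\xff", ".jpg"),
    (0, b"GIF87a", ".gif"),
    (0, b"GIF89a", ".gif"),
    (0, b"ID3", ".mp3"),               # MP3 with an ID3v2 tag only
    (0, b"\x1a\x45\xdf\xa3", ".mkv"),  # EBML header (Matroska/WebM)
    (4, b"ftyp", ".mp4"),              # ISO base media file
]

# RIFF containers carry their form type at bytes 8..12.
RIFF_FORMS = {b"WAVE": ".wav", b"AVI ": ".avi", b"WEBP": ".webp"}


def suggest_extension(data: bytes) -> Optional[str]:
    """Suggest a file name extension from the leading bytes, or None."""
    for offset, magic, ext in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return ext
    if data[:4] == b"RIFF":
        return RIFF_FORMS.get(data[8:12])
    return None


def looks_truncated(data: bytes) -> Optional[bool]:
    """Heuristic end-marker check; None means no check for this type."""
    ext = suggest_extension(data)
    if ext == ".png":
        # A complete PNG ends with the IEND chunk type and its fixed CRC.
        return not data.endswith(b"IEND\xae\x42\x60\x82")
    if ext == ".jpg":
        # A complete JPEG normally ends with the EOI marker FF D9.
        return not data.endswith(b"\xff\xd9")
    return None
```

A text file renamed to `clip.mp4` makes `suggest_extension` return `None`, which is one way to catch the renamed-text test case described above.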

2022-07-15

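Template-Based Creation Mode, Self-Media, Article Rewriting

When writing Douyin copy, use Doubao to rewrite the text with this prompt: "改写下面的文章查重率不超过30%" (rewrite the following article so that its duplication rate does not exceed 30%).

The purpose of media is similar to that of AI: do not post what others already know, since it may be wrong; post what others do not know, since it may be useful. Explore around your circle of interests to broaden your horizons.

Web page to article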

readability.js

pagescraper in php

elinks -dump <url>
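As a rough stand-in for the tools listed above, the following sketch turns HTML into plain text with Python's standard `html.parser`. The class name `TextExtractor`, the function `html_to_text`, and the choice of block-level tags are assumptions for illustration; unlike readability.js, it does not try to locate the main article body.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style content."""

    SKIP = {"script", "style", "noscript"}
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")  # block tags start a new line

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        lines = (line.strip() for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```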

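A piece of text, or any other type of content, can serve as a template, with other text, videos, and images as material. Collect material according to the template to form new content. Note that the material must not be the template itself and must not come from a single source; otherwise the result will be judged as plagiarism.

Article rewriting: generate paragraphs from a title and context: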

https://github.com/yangjianxin1/CPM

bert mask:

https://huggingface.co/fnlp/bart-base-chinese

https://huggingface.co/hfl/chinese-macbert-base

Chinese paraphrase:

https://github.com/ZhuiyiTechnology/roformer

https://github.com/ZhuiyiTechnology/simbert

Possibly not paraphrase models

https://huggingface.co/lijingxin/mt5-for-zh-paraphrase/tree/main

https://huggingface.co/facebook/m2m100_418M

https://github.com/jiangnanboy/chinese_sentence_paraphrase

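Chinese summary generators:

Extractive summarization generally needs a GPT-style generator to insert connecting sentences in between.

HanLP ships with its own text extraction solution.

Extractive text summarization

BART, T5, and Pegasus for Chinese text summarization; training datasets and training tutorials are available.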

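I have been thinking about how to correctly and efficiently handle scraped articles in SEO, that is, how to pseudo-originalize and rewrite them. Doing it by hand is far too much trouble, but publishing scraped articles without pseudo-originalizing them risks being hit by the Hurricane Algorithm (飓风算法).

1. Use the TextRank algorithm to extract a summary, then manually reassemble it into a new article.

I happened to discover the Python library textrank4zh today. It depends on jieba, numpy, and networkx, and can extract an article summary with the TextRank algorithm. The summary can then be rewritten by hand and consolidated into an entirely new article.

As a test, take a Q&A page on Mafengwo (蚂蜂窝), where one question has content from many answerers. Scrape all of the content with Python, extract a summary with the TextRank algorithm, and reassemble the summary into a new article. This basically manages to avoid the Hurricane Algorithm.

Install the dependencies first, then use textrank4zh to extract the summary.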

import re

from textrank4zh import TextRank4Sentence

content = ""  # HTML content scraped with Python goes here
text = re.sub(r'<.*?>', '', content)  # strip HTML tags
text = re.sub(r'\s', '', text)  # remove all whitespace
zy = ''  # the accumulated summary
tr4s = TextRank4Sentence()
tr4s.analyze(text=text, lower=True, source='all_filters')
# Change num to set the number of sentences in the summary.
for item in tr4s.get_key_sentences(num=10):
    zy = zy + item.sentence
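For intuition about what the ranking step does, here is a dependency-free sketch of TextRank sentence extraction. It is not how textrank4zh is implemented: as a simplifying assumption it compares sentences by shared character bigrams instead of jieba word segmentation, and the function names are hypothetical. The similarity is the overlap measure from the original TextRank paper, and the scores come from a plain PageRank-style iteration.

```python
import math
import re


def split_sentences(text):
    """Split on Chinese and Western sentence-ending punctuation."""
    parts = re.findall(r"[^。！？!?]+[。！？!?]?", text)
    return [p.strip() for p in parts if p.strip()]


def bigrams(sentence):
    """Character bigrams of the sentence with punctuation removed."""
    chars = re.sub(r"[\W_]+", "", sentence)
    return {chars[i:i + 2] for i in range(len(chars) - 1)}


def similarity(a, b):
    """Overlap similarity: |a & b| / (log|a| + log|b|)."""
    if len(a) < 2 or len(b) < 2:
        return 0.0
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank_sentences(text, num=3, damping=0.85, iterations=50):
    """Return the num highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    grams = [bigrams(s) for s in sentences]
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weights[i][j] = weights[j][i] = similarity(grams[i], grams[j])
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num]
    return [sentences[i] for i in sorted(top)]
```

Sentences that share nothing with the rest of the text keep only the base score, so they drop out of the summary first.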

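2. Rewrite by round-trip translation with Google Translate

I previously came across Xiaofamao (小发猫), a website offering so-called AI article rewriting that claims to use NLP algorithms. Until then I had assumed that synonym replacement was the only way to rewrite an article.

After studying Xiaofamao, my first impression was that it is definitely not using any so-called NLP algorithm. It is more likely using Google Translate for round-trip translation: translate the Chinese into English, then translate that English back into Chinese.

I built a pseudo-original tool like this myself and found that it does not actually work well. Read casually, the round-trip translated article is still readable. Read carefully, however, the grammar and word choice are inaccurate, and in some cases the original meaning of the sentence is changed.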
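The round trip itself is only two translation calls. The sketch below keeps the translation backend abstract: `translate` is a hypothetical callable taking `(text, source_lang, target_lang)`, not a real Google Translate client, and `similarity` uses `difflib` from the standard library as one possible way to measure how much of the original wording survives.

```python
import difflib
from typing import Callable

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def round_trip(text: str, translate: Translator,
               source: str = "zh", pivot: str = "en") -> str:
    """Translate source -> pivot -> source and return the result."""
    pivot_text = translate(text, source, pivot)
    return translate(pivot_text, pivot, source)


def similarity(original: str, rewritten: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means identical text."""
    return difflib.SequenceMatcher(None, original, rewritten).ratio()
```

A low similarity score only shows that the wording changed. It does not show that the meaning was preserved, which is exactly the weakness described above.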

2022-06-11

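Modern, Postmodern, VTuber, Popularity Trends

Use content producers' content as training data, treating it as a template for generating similar content.

Use the public's comments, interactions, and danmaku as training data for a discriminator. Cluster the viewers and map audience groups to content.

Modern means commonplace content that everyone accepts.

Postmodern means content that some people accept and others do not, and that shows a trend toward becoming popular.

Use modern content (content that has already been popular) as material and postmodern content (content that is about to become popular, or templates generated from past experience) as the template. The synthesized content can meet the needs of the postmodern audience and has the potential to trend.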

2022-05-31

YouTube Monetization

8 ways to monetize

https://www.xiaohongshu.com/web-login/canvas?redirectPath=http%3A%2F%2Fwww.xiaohongshu.com%2Fdiscovery%2Fitem%2F61fdc910000000000102be7b

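2022-05-24

How to Permanently Influence the World

How to permanently influence the world: script

Nuclear bombs

Power

Large amounts of money

Genetic modification

Revolutionary technology

Mass media

Large-scale war

Immortality

Time travel

Communicating with extraterrestrial civilizations

Programming

Religion

Language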

feedback:

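Not marrying and not having children

Mathematics

Drinking

Managing people

Loyalty

Leadership

Wisdom

Human thought

Human nature

Changing the world with one sentence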

2022-05-16

The Interesting Life

The Interesting Life: Increase Video/Essay/Post Views

"Interesting" is defined here as the amount of change: the more the content changes, the more interesting it is.

Find and summarize essays by their conjunctions: describe the relationship between segments, then arrange the segments according to conjunction templates.
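One way to read the idea above, as a small sketch: split a passage at its conjunctions, label each segment with the relationship the conjunction signals, then reorder the segments to follow a template of relationships. The conjunction table, the relation labels, and the function names are all assumptions for illustration.

```python
import re

# Hypothetical table mapping conjunctions to the relationship they signal.
CONJUNCTIONS = {
    "because": "cause",
    "therefore": "result",
    "so": "result",
    "however": "contrast",
    "but": "contrast",
    "although": "concession",
    "then": "sequence",
}

_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(CONJUNCTIONS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def segment_by_conjunctions(text):
    """Return (relation, segment) pairs; the first segment is 'start'."""
    pieces = _PATTERN.split(text)
    segments = []
    relation = "start"
    for index, piece in enumerate(pieces):
        if index % 2 == 1:
            # Odd positions hold the captured conjunctions.
            relation = CONJUNCTIONS[piece.lower()]
            continue
        cleaned = piece.strip(" ,.;")
        if cleaned:
            segments.append((relation, cleaned))
    return segments


def arrange(segments, template):
    """Reorder segments to follow a template, a list of relation labels."""
    pool = list(segments)
    ordered = []
    for relation in template:
        for item in pool:
            if item[0] == relation:
                ordered.append(item)
                pool.remove(item)
                break
    return ordered
```

A template such as `["contrast", "start", "cause"]` would put the contrasting segment first, which is one way to vary the structure of a passage while reusing its parts.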

Copy generator, self-media scripts

2022-05-14

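Getting Anime Clips Past Review

Keep each clip under 4 minutes. Spleeter can be used to separate out the vocals so that you can add your own background music.

This falls under anti-NSFW and anti-censorship: countermeasures against content review and video review. Search GitHub for these terms.

Derivative works are, in a sense, also a form of anti-censorship.

WeChat mini programs that perform NSFW review can be unpacked so that their APIs can be called directly, though this may be unstable.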

https://github.com/superdashu/frida_with_wechat_applet

https://github.com/superdashu/pc_wxapkg_decrypt_python

2022-05-13

Attractive Dynamic Plus Attractive Video

Some content goes viral with users. Combining it with a related video or essay brings extra views.

The same rule may apply to other platforms. Select only the items with the most views, or those verified by trained grading models. Use native-language content only; otherwise translate it and verify or convert it into native form. Post it to QQ and other platforms as pictures or links.

2022-04-25

Content Usage

Use the original transcript for paraphrasing, while using danmaku for joke generation.

idea

2022-08-10

Python Suggest Binary File Name Extension

Detect media file corruption, Python suggest binary file name extension

2022-07-15

Template-Based Creation Mode, Self-Media, Article Rewriting

Web page to article

2022-06-18

Plastic Bottle Collecting Robot, Coin Suction, Coin Recycling Robot

2022-06-11

Modern, Postmodern, VTuber, Popularity Trends

2022-05-31

YouTube Monetization

2022-05-24

How to Permanently Influence the World

How to permanently influence the world: script

2022-05-16

The Interesting Life

The Interesting Life: Increase Video/Essay/Post Views

2022-05-14

Getting Anime Clips Past Review

2022-05-13

Attractive Dynamic Plus Attractive Video

2022-04-25

Content Usage
