2022-07-15

模板创作模式自媒体洗稿

from textrank4zh import TextRank4Keyword, TextRank4Sentence
content = "" # 这里是python采集下来的content html内容
text = re.sub('<.*?>','',content)
text = re.sub(r'\s','',text)
zy = ''
tr4s = TextRank4Sentence()
tr4s.analyze(text=text, lower=True, source = 'all_filters')
# 可修改num值，设置摘要长度。
for item in tr4s.get_key_sentences(num=10):
zy = zy + item.sentence

2，利用google翻译双向翻译洗稿

之前有接触一个所谓人工智能洗稿的网站小发猫，说的是利用NLP算法进行洗稿，本来我以为洗稿只有同义词替换这个办法。

后来研究了一下小发猫，我首先觉得这个绝对不是利用什么所谓的NLP算法来洗稿，研究了一下发现可能是利用google翻译进行双向翻译，就是先中文翻译英文，然后再拿翻译出来的英文再翻译成中文。

自己也开发了一个这样的伪原创工具，发现其实并不好用。如果不仔细读，这样双向翻译出来的文章还能读，但是仔细读的话。其实语法习惯还有用词根本不准确，甚至有些情况还改变了这句话原有的语义。

Translators/Paraphraser for casual usage

baidu translator (api) provided by paddlehub

baidu language detector (api)

text style transfer:

https://blog.csdn.net/qq_27590277/article/details/106991084

python google translate api:

pip install googletrans

google translate in php:

https://github.com/Stichoza/google-translate-php

paraphrase via rephrasing and reordering

pegasus paraphrase:

increase the num_beams and temperature

https://analyticsindiamag.com/how-to-paraphrase-text-using-pegasus-transformer/

https://www.thepythoncode.com/article/paraphrase-text-using-transformers-in-python

example paraphrase project using LSTM as decoder and encoder:

https://github.com/vsuthichai/paraphraser

paraphrase with t5:

https://github.com/Vamsi995/Paraphrase-Generator

paraphrase dataset:

https://github.com/Wys997/Chinese-Paraphrase-from-Quora

文本纠错

https://github.com/James4Ever0/pycorrector

数据增强变换句子形式

https://yongzhuo.blog.csdn.net/article/details/89166307

https://github.com/zhanlaoban/eda_nlp_for_Chinese

calculate perplexity:

https://github.com/DUTANGx/Chinese-BERT-as-language-model

https://github.com/James4Ever0/nlp-fluency

https://zhuanlan.zhihu.com/p/265677864

https://github.com/mattzheng/py-kenlm-model

multi-purpose tool for chinese: 偏旁部首情感分析

https://github.com/SeanLee97/xmnlp

敏感词过滤语言检测训练语料库

https://github.com/fighting41love/funNLP

paraphraser.io

multilingual paraphrase database:

paraphrase.org

simbert

https://www.zhihu.com/question/317540171

BERT：原始版本bertRoberta：哈工大开源的中文wwm roberta模型BERT-SQ：本人在百度知道相似句数据集(Sim-Query)上微调后的bert模型Roberta-SQ：同上BERT-Whitening： @苏剑林最新博客中提出的白化模型。Roberta-Whitening：同上

https://yongzhuo.blog.csdn.net/article/details/89166307

language fluency test:

https://github.com/baojunshan/nlp-fluency

many paraphraser models for english are on huggingface, but few for chinese.

https://huggingface.co/lijingxin/mt5-for-zh-paraphrase

https://pypi.org/project/genienlp/

https://github.com/salesforce/decaNLP

parrot paraphraser with nlu engines for english:

https://github.com/PrithivirajDamodaran/Parrot_Paraphraser

sentence level paraphraser:

https://github.com/vsuthichai/paraphraser

document level paraphraser, with sentence rewriting and reordering(shuffle):

https://github.com/L-Zhe/CoRPG

https://pypi.org/project/lexsub/

https://github.com/hit-joseph/lexical-paraphrase-extraction

synonyms (python library)

you can also train a contextual search tool using fine-tuned repurposed paraphrase model.

https://pypi.org/project/nlp-text-search/

文言文

https://github.com/raynardj/yuan

粤语

https://huggingface.co/x-tech

huggingface有英语翻译到其他语言的模型没有翻译成中文的模型

在线

https://github.com/nidhaloff/deep-translator

https://github.com/UlionTse/translators

translatepy

离线

https://huggingface.co/tasks/translation

https://huggingface.co/Helsinki-NLP/opus-mt-zh-en

https://github.com/argosopentech/argos-translate

libretranslate

https://github.com/Teuze/translate

https://github.com/xhlulu/dl-translate/

facebook/mbart-large-50-many-to-many-mmt

mbart50

m2m100

view under https://huggingface.co/tasks to see great models fitting exact needs.

2022-04-25

Content Usage

Use the original transcript for paraphrasing, while using danmaku for joke generation.

2022-04-09

Minor Changes Will Defeat Deduplicate Algorithm While Maintain Overall Fluency

I found several evation methods, like paraphraser, random character swapper for text, and video blur, mirror to post the same video again. I guess it is essential not to let any part of the content look like the original one.

paraphraser

2022-07-15

模板创作模式自媒体洗稿

网页转文章

2022-04-29

Translators For Casual Usage

Translators/Paraphraser for casual usage

2022-04-25

Content Usage

2022-04-09

Minor Changes Will Defeat Deduplicate Algorithm While Maintain Overall Fluency

Links

paraphraser

2022-07-15 模板创作模式 自媒体 洗稿

网页转文章

2022-04-29 Translators For Casual Usage

Translators/Paraphraser for casual usage

2022-04-25 Content Usage

2022-04-09 Minor Changes Will Defeat Deduplicate Algorithm While Maintain Overall Fluency

Links

2022-07-15

模板创作模式自媒体洗稿

2022-04-29

Translators For Casual Usage

2022-04-25

Content Usage

2022-04-09

Minor Changes Will Defeat Deduplicate Algorithm While Maintain Overall Fluency