2023-01-09
Everything You Need To Startup Your Media Project: Viral Video Generator, Viral Video Analyzer, Trend Analyzer, Automated Email Account Registration, Download Only A Portion Of Video, Peek Video Screenshots

A Microsoft-backed recommendation system project has now joined the Linux Foundation.


CartoonSegmentation can be used for creating visual effects over static cartoon images.


Use a grammar or state machine to constrain how an LLM generates tokens, so that it performs actions and calls tools correctly.
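
A minimal sketch of the idea, assuming a toy model that returns a score per token. The state machine, the token names, and `toy_scores` are all hypothetical and only illustrate masking; real constrained decoding libraries work on model logits and real vocabularies.

```python
import math

# Hypothetical tool-call grammar written as a finite state machine:
# state -> {allowed token: next state}
FSM = {
    "start": {"CALL": "tool"},
    "tool": {"search": "arg", "download": "arg"},
    "arg": {"QUERY": "end"},
    "end": {"END": "done"},
}


def constrained_greedy_decode(score_fn, fsm=FSM, state="start", final="done"):
    """Greedy decoding where only tokens allowed by the FSM can be chosen."""
    output = []
    while state != final:
        allowed = fsm[state]
        scores = score_fn(output)  # dict: token -> score from the (toy) model
        # Tokens outside `allowed` are never considered; unscored ones get -inf.
        best = max(allowed, key=lambda tok: scores.get(tok, -math.inf))
        output.append(best)
        state = allowed[best]
    return output


def toy_scores(prefix):
    # Toy model: it would rather chat ("hello") than call a tool.
    return {"hello": 5.0, "CALL": 1.0, "search": 0.2,
            "download": 0.9, "QUERY": 0.1, "END": 0.0}


print(constrained_greedy_decode(toy_scores))
```

Even though the toy model scores "hello" highest, the output always follows the CALL, tool name, argument, END shape, because tokens outside the current state are never candidates.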


Remove vocals from the video and from music downloaded from the web, then run Shazam. Alternatively, remove vocals only from the video and then run a music search.


voice auto translation by meta


use attention visualization to create pan effects, focus on different parts and illustrate separately


To get responses fast, you need to create content fast. Sometimes that means degrading aspects such as resolution.


https://github.com/audio-agi/audiosep

changedetection: monitor websites for updates

longlora

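Automatic comment-deletion bot:

Delete comments based on the comments expected for similar content (combined with a large language model)

Delete unfavorable comments based on sentiment analysis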


Adjust the facial expression of the person on the cover according to the video content and title.


Build a Bilibili clone to collect user click data.


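Bilibili limits the number of result pages on its mobile search API but not on the desktop one, presumably so that collecting material for derivative works (二创) is not affected.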


deep danbooru for anime comprehension, scene understanding etc


Do not use removable drives (NAS?) as scratch/data disk for long-term programs. Instead, use internal drives.

If possible, first sync all necessary files from the removable drive to an internal disk and run the long-running program from there, or start from the rootfs and switch to the removable drive later. When an error occurs, check whether the disk has been unmounted, try different ways to remount it, and rerun the program.

Another possible cause is insufficient current. Check for similar issues online.


Simulate, inside a short video, the effect of swiping up and down between different videos.


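If extracting keywords from covers for training and replacement, and generating your own images and text, proves too hard to implement, consider changing only the font and colors while keeping the text unchanged.

Cover text follows a reading order: top to bottom, left to right. Layout rules and merging rules (for training) can be defined accordingly.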


YouTube can now auto-generate video chapters, which may be a form of text/video summarization.

multinational, multilingual, subtitles, localizations


models for content generation:

one-for-all multi-modal generation

magma: a GPT-style multimodal model that can understand any combination of images and language


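Video creation navigation site (视频创作导航): useful for finding ideas for automated video creation.

字由: identify fonts and download free fonts.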

Startup Toolbox

tools for startup companies, including:

[Website]
[Design]
[Support and customer communication]
[Payments, billing and distribution]
[User Analytics and Reporting]
[Business Analytics]
[Automation]
[HR and Payroll]
[Forms and Surveys]
[Tech]
[Product building]
[Marketing and growth]
[Collaboration]
[Build a chatbot]
[Domains and naming]
[Legal, Account and Invoicing]
[Funding]
[Sales]
[Communities]
[Learn]
[A/B testing]
[Launch]
[Other]

Awesome Streaming

a list of live streaming and content creation tools:

[Content Creation]
[Sponsors And Affiliate]
[Setup Guides]
[Branding]
[Discord Creation]
[Brand Website]
[Uncopyrighted Music]
[FPS Accuracy]
[Subathon Framework]
[Video Editing]
[OpSec]
[Stat Tracking]
[Game Development]
[V-Tubing]
[How To Start An LLC]

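Collect the names of major uploaders (UP主) and, using the extracted nouns, remove sentences containing those names from the copy.

For information cascades, treat the recommendation system as your target data, and fit your content to what it predicts as the “to-view” videos of popular videos.

Analyze whether danmaku keywords overlap with the titles of a large number of other videos.

Detecting duplicated and reposted videos on Bilibili:

YouTube has AMV/MAD edits labeled with the English names of anime series.

Searching Bilibili for the video ID or title of a Weibo, Tumblr, TikTok, or YouTube video reveals whether the video has been reposted.

Searching for another site's URL prefix, such as https://www.youtube.com, returns links to reposted videos, but the data does not seem very reliable and the results are not very popular. For high-traffic videos you still need to crawl the home page recommendation links and search by topic.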

video highlights extraction

although you may want to train/extract that manually, it would sure be tedious and not self-updating (unless using reinforcement learning).

often we determine highlights by sound, visual and voice together. highlights often can be identified without too much context, so it can be chunk based.

bilibili

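Bilibili's “high-energy progress bar” (高能进度条) is called “most replayed” on YouTube.

Bilibili has danmaku, so highlights can be found from them. VClimax is a browser extension that locates the most exciting content by applying a threshold to the growth rate of danmaku per unit time (danmaku density probably still needs to be analyzed). It can skip the OP of some anime and pinpoint funny segments of videos (this probably still requires machine learning).

bilibili Danmaku Skip is another browser plugin that identifies highlights by analyzing danmaku with parameters such as threshold, interval, and bias.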
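
The threshold-on-density idea can be sketched as follows. This is not the code of either plugin: the function name, the default values, and the meaning given to `bias` (shifting windows earlier because viewers comment after the moment happens) are assumptions for illustration.

```python
from collections import Counter


def danmaku_highlights(timestamps, interval=10.0, threshold=3, bias=0.0):
    """Return (start, end) windows whose danmaku count reaches the threshold.

    timestamps: danmaku send times in seconds from the start of the video.
    interval:   window size in seconds.
    threshold:  minimum number of danmaku per window to count as a highlight.
    bias:       seconds to shift every window earlier (assumed semantics).
    """
    counts = Counter(int(t // interval) for t in timestamps)
    hot = sorted(b for b, n in counts.items() if n >= threshold)
    segments = []
    for b in hot:
        start, end = b * interval, (b + 1) * interval
        if segments and segments[-1][1] >= start:
            segments[-1][1] = end  # merge adjacent hot windows
        else:
            segments.append([start, end])
    return [(max(0.0, s - bias), max(0.0, e - bias)) for s, e in segments]


times = [1, 2, 3, 12, 13, 14, 50, 95, 96, 97]
print(danmaku_highlights(times, interval=10.0, threshold=3))
```

A growth-rate variant would compare each window's count with the previous window instead of using an absolute threshold.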

youtube

youtube’s most played data can be extracted by:

youtube-heatmap (nodejs, using puppeteer (bad!))

YouTube operational API (powered by shared API keys and key-less info extractors); apparently youtube-most-replayed uses this service, retrieving the data from yt.lemonslife.com, which is powered by this library

heatmap extractor

youtube.js (reverse engineered innertube api) added support for chapters and video heatmap

youtube-dl search youtube video

youtube-dl "ytsearch[optional_result_limit]:[query]"
# pass query url directly to allow pagination or filters
youtube-dl "https://www.youtube.com/results?search_query=how+to+create+android+app+in+android+studio&page=1"

record live streaming video, upload video

biliup & biliup-rs (commandline program)

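A fully automatic stream recording and uploading tool that also supports reposting Twitch and YouTube channels. It provides an interface for multi-part (分P) uploads to Bilibili.

Live-stream replays are actually not very interesting to watch; they are monotonous. Also, after uploading to Bilibili you can get predicted tags for the video.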

youtube automation toolkit: post the same content to multiple platforms: bilibili, douyin, douyu, instagram, reddit, spotify, tiktok, twitch

The idea of posting original content to multiple platforms to prevent pirating is correct, but description/title generation is a vital part of the process and must be done intelligently (by AI or a human). For now the repo is just a collection of links; if you want the tools, follow the given links.

Download a portion of video

yt-dlp (latest)

check pyjom/tests/download_sections_video_portion_partial_download_youtube_yt_dlp_bilibili/test_bilibili.sh for advanced usage of yt-dlp and more on bilibili parsing.

Since yt-dlp has been updated, you can use the --download-sections argument for YouTube.

If you want to download multiple sections of same video, you must specify video output format string via -o

But when using that without --force-keyframes-at-cuts (that is, skipping re-encoding, which speeds things up but does not guarantee video quality at the tail), you had better keep a margin of 10 seconds at the tail (it could glitch in the last 5 seconds) and 5 seconds at the head (maybe the head margin is not needed?).
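
A small sketch of how the arguments above could be assembled. The helper names and the default margins are illustrative (the margins follow the 5-second head and 10-second tail suggested above); `--download-sections`, `--force-keyframes-at-cuts`, `-o`, and the `section_start`/`section_end` output fields are yt-dlp options, but check them against your installed version.

```python
def fmt_time(seconds):
    """Format seconds as HH:MM:SS, clamping negative values to zero."""
    seconds = max(0, int(seconds))
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def ytdlp_section_args(url, sections, head_margin=5, tail_margin=10,
                       force_keyframes=False):
    """Build a yt-dlp argument list for downloading several sections.

    sections: list of (start_seconds, end_seconds) tuples.
    Without re-encoding, each section is padded with safety margins.
    """
    args = ["yt-dlp"]
    for start, end in sections:
        if not force_keyframes:
            start, end = start - head_margin, end + tail_margin
        args += ["--download-sections", f"*{fmt_time(start)}-{fmt_time(end)}"]
    if force_keyframes:
        args.append("--force-keyframes-at-cuts")
    # A distinct output name per section, so multiple sections do not collide.
    args += ["-o", "%(id)s_%(section_start)s-%(section_end)s.%(ext)s", url]
    return args


print(ytdlp_section_args("https://www.youtube.com/watch?v=V_f2QkBdbRI",
                         [(60, 90), (300, 330)]))
```

The resulting list can be passed to `subprocess.run`; building a list rather than a shell string avoids quoting problems with the `*` prefix and the output template.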

youtube-dl

first acquire download url: youtube-dl [--youtube-skip-dash-manifest] [-f 18] -g "https://www.youtube.com/watch?v=V_f2QkBdbRI" (you need to force the format.)

then use ffmpeg with the url to chop the slice: ffmpeg -ss 00:00:15.00 -i "OUTPUT-OF-FIRST URL" -t 00:00:10.00 -c copy out.mp4

RangeDownloader by A-Soul-Database

A-Soul-Database is a live-streaming replay record database designed for vtubers, organized in some way for easy information retrieval.

RangeDownloader is acting like a server, though sometimes we are not sure how fast(er) it can really be.

Viral videos

Data sources and monitors

Rebang.today

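The Bilibili recommended-tags API can reveal viewers' real-time demand.

Provides hot lists for: site-wide, Zhihu, Weibo, ITHome, Baidu, Hupu, Zhibo8, sspai, 36Kr, 52pojie, Tianya, Appinn, apprcn, Bilibili, Douyin, tech journals, V2EX, and GitHub.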

API:https://api.rebang.today/v1/items?tab=<TAB_NAME>&page=<PAGE_NUM> (potential parameters: sub_tab, date_type (default:now))

Zhihu has a dedicated hot list at https://www.zhihu.com/billboard

Zhihu Roundtable (知乎圆桌), Zhihu Explore (知乎发现)

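name                                   tab
Site-wide hot list (全站热榜)           top-all
Weibo (微博)                            weibo
V2EX                                   v2ex
Zhihu (知乎)                            zhihu
Bilibili (哔哩哔哩)                     bilibili
GitHub                                 github
Douyin (抖音)                           douyin
Tech journals (技术期刊)                journal-tech
Hupu (虎扑)                             hupu
sspai (少数派)                          sspai
Baidu (百度)                            baidu
36Kr (36氪)                             36kr
Tianya (天涯)                           tianya
52pojie (吾爱破解)                      52pojie
ITHome (IT之家)                         ithome
Site-wide, last 24 hours (全站24小时)   top-daylong
Zhibo8 (直播吧)                         zhibo8
Appinn (小众软件)                       appinn
apprcn (反斗限免)                       apprcn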
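
A sketch of building request URLs for the Rebang.today API from the tab names in the table. Only the URL pattern and parameter names come from the notes above; the helper name is made up, the service may have changed since, and the response format is not documented here, so no parsing is attempted.

```python
from urllib.parse import urlencode

BASE = "https://api.rebang.today/v1/items"


def rebang_url(tab, page=1, sub_tab=None, date_type=None):
    """Build a hot-list URL; sub_tab and date_type are optional parameters."""
    params = {"tab": tab, "page": page}
    if sub_tab:
        params["sub_tab"] = sub_tab
    if date_type:
        params["date_type"] = date_type
    return f"{BASE}?{urlencode(params)}"


for tab in ("top-all", "bilibili", "zhihu"):
    print(rebang_url(tab))
```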

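Fanqie Data (番茄数据)

Fanqie Data provides the latest popular Bilibili videos from the last 24 hours up to the last 90 days. You can search for keywords appearing in titles, descriptions, tags, and comments, or use advanced filters such as industry category, play count, like count, coin count, video duration, and audience profile to pinpoint popular videos in the field you are interested in. How the propagation index is calculated remains to be studied; it is a composite score derived from the uploader's follower count and the video's likes, plays, coins, shares, and so on. Hot words in comments are used to analyze what viewers care about, and commenting users are used to build audience profiles.

Every data site has a submission interface for indexing. If an uploader has never appeared on the home page, they most likely have not been indexed and must be submitted manually, which indirectly shows how big uploaders are discovered.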

Image recognizers

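Baidu image recognizer (百度识图)

Location of the related image-recognition project: pyjom/tests/search_engine_suggestion_based_qa_bot

It can return keywords, tags, identical images, 秒懂百科 videos, Baidu Baike data, information containing the image, and attractiveness (颜值) information.

After modifying the upload endpoint and switching http to https, it works again.

Location: pyjom/tests/viral_video_experiments/BaiduSerchImgApi

The 360 image search (360识图) endpoint was fixed by switching http to https.

Location: tests/viral_video_experiments/360ImageSearch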

Video collectors

tiktok compilation video generator

Collects popular videos on TikTok using multiple filters such as hashtags, categories, popularity, and search queries.

WeiboSpider

Requires a cookie. Collects user info, follower lists, following lists, Weibo posts, comments, and reposts, and supports keyword-based Weibo search.

botTuber: an Instagram compilation reposter to YouTube

Using instaloader and instalooter, it can download videos from Instagram. It merges a series of videos and adds an intro and outro. In its “auto” mode it has only one default title, which starts with “TRY NOT TO LAUGH”.

reddit hot videos to youtube

In the “TikTokCringe” subreddit, we can get hot posts and video links prefixed by https://v.redd.it (from TikTok to Reddit) in JSON format: https://www.reddit.com/r/TikTokCringe/hot.json?limit=12. This link looks like an API or subscription feed. Maybe Bilibili and other sites have similar “hot” JSON URLs. The way to extract video links is in atmt.sh. The tool adds transitions to every video clip.
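
A hedged sketch of pulling the v.redd.it links out of such a listing. It is not a port of atmt.sh; the function names are invented, and it assumes the usual Reddit listing layout (`data.children[].data.url`), which should be checked against a real response.

```python
import json
import urllib.request


def extract_vreddit_links(listing):
    """Collect post URLs that point at v.redd.it from a parsed listing dict."""
    links = []
    for child in listing.get("data", {}).get("children", []):
        url = child.get("data", {}).get("url", "")
        if url.startswith("https://v.redd.it"):
            links.append(url)
    return links


def fetch_hot(subreddit="TikTokCringe", limit=12):
    """Fetch the hot listing (needs network; Reddit expects a User-Agent)."""
    url = f"https://www.reddit.com/r/{subreddit}/hot.json?limit={limit}"
    req = urllib.request.Request(url, headers={"User-Agent": "hot-video-notes/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Offline example with a hand-written listing of the assumed shape:
sample = {"data": {"children": [
    {"data": {"url": "https://v.redd.it/abc123"}},
    {"data": {"url": "https://i.redd.it/picture.jpg"}},
]}}
print(extract_vreddit_links(sample))
```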

Video editors

vced

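I think it needs to be fine-tuned on large, diverse training data.

VCED automatically identifies the segments of a video that match your text description and cuts the video accordingly. The project is built on cross-modal search and vector retrieval, with separate front end and back end, to help you quickly get hands-on with next-generation search technology.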

videofy

It is a self-hosted service that summarizes an article, gets relevant text and images, and determines the mood to select BGM. Try it for yourself!

meme video maker

It uses Google Cloud to select “English” words on an image and lets the user edit the “stage” to show the meme step by step.

It requires amazon cloud services and google cloud services.

回声工坊 TRPG Replay Generator

TRPG: tabletop role-playing game, an RPG with dice rolling (random elements).

Character portraits can be animated, but as multiple PNG files.

The background can be set to one of {'black','white','greenscreen'} to create a solid-color background.

It has special requirements for media sources: use .ogg for BGM, .wav for AFX and voices, and .png for images. I could not get background video layers working, so you might consider “green screen” effects instead.

Use --ExportVideo flag to export video without GUI.

openshot and libopenshot (for python bindings)

Nightcorify speeds up audio and raises its pitch (asetrate reinterprets the samples at a new sample rate, which stretches the timeline and changes the pitch; atempo, not used here, changes the timeline but not the pitch; aresample changes the sample rate but not the timeline), and also shows the audio waveform with showwaves.
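
The filter combination can be sketched as a small command builder. This is not Nightcorify's code: the function names, the 44100 Hz input rate, and the 1.25 speed factor are assumptions; `asetrate` and `aresample` are ffmpeg audio filters.

```python
def nightcore_filter(sample_rate=44100, factor=1.25):
    """Build an ffmpeg audio filter that speeds up and raises pitch together.

    asetrate plays the samples at a higher rate (faster and higher pitched);
    aresample then converts back to the original rate for normal playback.
    """
    new_rate = round(sample_rate * factor)
    return f"asetrate={new_rate},aresample={sample_rate}"


def nightcore_command(src, dst, sample_rate=44100, factor=1.25):
    """Return the ffmpeg argument list (run it with subprocess.run)."""
    return ["ffmpeg", "-i", src, "-filter:a",
            nightcore_filter(sample_rate, factor), dst]


print(nightcore_command("in.mp3", "out.mp3"))
```

The input's real sample rate should be passed in; if it differs from the assumed 44100 Hz, the effective speed factor will be off.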

This library is complex and lacks proper documentation for Python, so it is not recommended.

keybert and summarization transformer pipeline

Check docs on transformers pipelines for default and fine-tuned task-specific models for each pipeline.

KeyBERT uses “sentence-transformers”. The author advises either “all-MiniLM-L6-v2” for English documents or “paraphrase-multilingual-MiniLM-L12-v2” for multilingual documents or any other language. Search for “multi” with the tag “summarization” on Hugging Face and you will find huge models. An mT5 model is very large, up to 2.33 GB.

Keywords-image pairs can be used for CLIP model training.

watson based video maker

It first downloads Wikipedia content via Algorithmia, then uses regex to filter out unwanted parts, uses Watson AI for sentence splitting, sets a limit on the maximum number of sentences (note: this is not summarization), then searches for images by keyword, and finally creates the video.

In another similar project IMDB (Popular Film/TV series) and Google search trends (as RSS) are included.

Auto-Editor

By passing --edit option, you can remove unwanted parts identified by motion or audio (can be combined). It can import clip with manual “cut-out”. It can export as json.

Pictory

Leveraging 3 million tagged video clips and audio tracks, it chooses the clip most semantically similar to the current scene (by extracting keywords -> searching images -> comparing images to video sources, with embeddings (CLIP) doing the work under the hood), and maps the video word by word to the timeline (to create extractive highlights and remove unwanted words like “um”).

Wisecut

Short videos can attract viewers and convert them into followers (who go on to view more of your long videos). Make short videos with music, subtitles, and facial-recognition auto-reframing (detecting the main speaker). It matches the right BGM to the type of content, with audio ducking, which can be achieved with ffmpeg or editly.

It is listed among an AI marketing tools list, which mentions copywriting, social media/email/blog marketing text/content generation (like copy.ai), text to video

Jumpcutter

An audio-silence-based video cutter. In jumpcut_file.py it chops audio into chunks and decides whether each one is silent. The core logic is to first compare the max volume of each chunk against a threshold, then check the neighbors of every chunk and cut it out if all of them are silent. It has audio speed-changing methods from audiotsm.
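
The chunk-and-neighbors logic might look roughly like this. It is a simplified reconstruction from the description above, not the code in jumpcut_file.py; the function name and the convention that a chunk's window includes itself are assumptions.

```python
def silent_chunks_to_cut(samples, chunk_size, threshold, neighbors=1):
    """Return indices of chunks that can be cut out as silence.

    A chunk is silent when its peak absolute amplitude is below the threshold.
    It is cut only if it and all chunks within `neighbors` positions are
    silent, which leaves a small margin around audible parts.
    """
    chunks = [samples[i:i + chunk_size]
              for i in range(0, len(samples), chunk_size)]
    is_silent = [max(abs(x) for x in c) < threshold for c in chunks]
    cut = []
    for i in range(len(chunks)):
        lo, hi = max(0, i - neighbors), min(len(chunks), i + neighbors + 1)
        if all(is_silent[lo:hi]):
            cut.append(i)
    return cut


audio = [0.0] * 4 + [0.9] * 2 + [0.0] * 6
print(silent_chunks_to_cut(audio, chunk_size=2, threshold=0.1))
```

In the example, the silent chunks directly next to the loud one are kept, so speech is not clipped at its edges.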

In another implementation, it uses a ring buffer built on collections.deque and applies VAD (Voice Activity Detection) with webrtcvad to every chunk of audio.
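
A minimal sketch of the ring-buffer idea with a pluggable speech detector. The function name, window size, and ratio are assumptions, and the smoothing rule is simpler than what a real implementation would use; with webrtcvad installed, the detector could be something like `lambda f: vad.is_speech(f, 16000)` for a `webrtcvad.Vad` instance.

```python
from collections import deque


def voiced_frames(frames, is_speech, window=5, ratio=0.6):
    """Smooth per-frame VAD decisions with a ring buffer.

    A frame is kept when at least `ratio` of the most recent `window`
    decisions (including its own) are speech, so brief dropouts inside
    speech are bridged and isolated blips inside silence are ignored.
    """
    ring = deque(maxlen=window)  # old decisions fall out automatically
    keep = []
    for frame in frames:
        ring.append(bool(is_speech(frame)))
        keep.append(sum(ring) >= ratio * len(ring))
    return keep


# Toy frames: 1 means the detector would say "speech", 0 means silence.
print(voiced_frames([1, 1, 0, 1, 0, 0, 0, 0], is_speech=bool, window=3))
```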

Gifcurry

Adds text to video, with typing effects; written in Haskell. You can add the -m flag to export video instead of GIF.

Backgroundremover

A commandline tool powered by torch, removing background from images and video

Moviepy most loved commandline video editor?

There are some cool text effects called “Text with moving letters” (PPT-like), and a dancing video generator based on tempo finder and video loop maker, which can help you adjust video speed according to video period and music bpm. The Star-Wars Text Effect reminds me of easing functions used with page scrolling.

Data collect/analyze

Social media statistics are time series data that should be collected regularly and can be predicted with time-series forecasting models.

open-sir

Use sirx over sir.

I think it is hard to use. Many “presumed” parameters are out there. It can fit the “reproduction rate” but not individual “alpha” and “beta” values.

In traditional SIR models, beta is the infection rate and gamma is the recovery rate. In open-sir the naming differs: its alpha plays the role of beta, and its beta plays the role of gamma.
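
For reference, a plain SIR model with the traditional names can be simulated in a few lines. This does not use open-sir; the function name, the Euler step, and the example rates are illustrative assumptions. For videos, one loose analogy is S = has not seen it yet, I = currently watching or sharing, R = lost interest.

```python
def simulate_sir(beta, gamma, s0=0.99, i0=0.01, r0=0.0, days=100, dt=1.0):
    """Euler integration of the SIR equations on population fractions.

    beta:  infection rate (open-sir calls this alpha).
    gamma: recovery rate (open-sir calls this beta).
    Returns a list of (S, I, R) tuples, one per time step.
    """
    s, i, r = s0, i0, r0
    history = [(s, i, r)]
    for _ in range(int(days / dt)):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        history.append((s, i, r))
    return history


run = simulate_sir(beta=0.3, gamma=0.1)
peak_day = max(range(len(run)), key=lambda d: run[d][1])
print("basic reproduction number:", 0.3 / 0.1, "peak day:", peak_day)
```

The ratio beta/gamma is the basic reproduction number; the infected fraction only grows while beta * S exceeds gamma.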

Youtube Viral Video Machine Learning Analysis

Refer to this document for details in data collection and machine learning methods.

Usage:

You can decide whether to copy a video when it has been posted for only a few days.

Dataset creation:

Monitor a video from the moment it is posted for a few days, compute features, then wait a month or two (by then it must have stabilized) and judge whether the video is viral by its view count.

Across multiple machine learning techniques, a few top features matter most for viral video forecasting (though you can derive your own by collecting more data, for example by applying the follower-view theory; and beware that if all your videos perform poorly, you may not get an accurate model from your data alone):

Rank  Feature Name                 Importance
1     views_acc                    12%
2     views_1                      11%
3     ageRatioReviews_1            9%
4     video_duration               9%
5     comments_1                   5%
6     channel_uploads              5%
7     ageRatioLikes_1              4%
8     comments_acc                 4%
9     channel_views                4%
10    comments_sentiment_compound  3%

ViralCaster

TitleParser.py analyzes views along with words, finding the most “popular” words or word combinations. It ships with demo data and generates “max”, “min”, and “mean” views for single words or word combinations.
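
The per-word statistics can be sketched as below. This is an illustration of the idea rather than ViralCaster's TitleParser.py; the function name, the whitespace tokenization, and the sample data are assumptions, and word combinations are left out for brevity.

```python
from collections import defaultdict
from statistics import mean


def word_view_stats(videos):
    """Map each title word to the max, min, and mean views of its videos.

    videos: iterable of (title, view_count) pairs.
    Each word is counted once per title, even if it repeats in that title.
    """
    views_by_word = defaultdict(list)
    for title, views in videos:
        for word in set(title.lower().split()):
            views_by_word[word].append(views)
    return {word: {"max": max(v), "min": min(v), "mean": mean(v)}
            for word, v in views_by_word.items()}


demo = [("Funny cat video", 100), ("Cat fails", 300), ("Dog video", 50)]
stats = word_view_stats(demo)
print(sorted(stats, key=lambda w: stats[w]["mean"], reverse=True)[:3])
```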

Predictube

peak_detection.py uses daily view counts to categorize and identify trends. “MonoIncr” might be our desired category.

Video Viralization Tool

It uses relative infection ratio instead of absolute to predict the trend. By “information cascade” it means statistics can be used to predict future view counts. It considers individuals and viewers as nodes. It suggests different relationships between parameters in SIR model and data (likes, shares, comments, new subscribers, subscribers, length, quality, tag keywords, description keywords).


2022-10-24
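Summary of pseudo-original (伪原创) methods: automatic advertorial generator (自动软文生成器), one-click advertorial generation (一键生成软文), pseudo-original content (伪原创), copy generator (文案生成器), automatic advertorial generation (自动生成软文)

These are keywords related to paraphrasing. Tools outside China are quite complete: summarizers and paraphrasers. Looking at paraphrase vendors will tell you which tools to use.

Within China there are many ready-made platforms, but the tools may be under-explored, so the keywords are listed here for easier searching.

Search for related videos and articles, extract keywords and summaries, then stitch them together.

Article pseudo-original API (shutting down soon): the site's summary of pseudo-original methods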

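As is well known, the Baidu search engine now demands ever higher content quality from websites. If a site's content quality is poor, it usually will not rank highly even with many high-quality external links, because sites with poor content tend to have a high bounce rate, which has become an important factor in Baidu's ranking algorithm.

Producing a small amount of original content for a website is not difficult, but updating every day is very hard for any grassroots webmaster, especially for sites in vertical industries. Because content in such industries is relatively fixed, publishing original content is even harder, so pseudo-original content becomes an important approach. However, traditional pseudo-original methods can hardly improve content quality and will turn the site into a spam site. From a long-term perspective, the quality of pseudo-original content is therefore harder to improve.

So how can we effectively improve the quality of pseudo-original content? I think we can start from the following aspects to bring pseudo-original content up to the quality of original content.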

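First, an innovative merging approach to pseudo-original content.

We know that pseudo-original content is usually made by finding some content online, then changing the title, shuffling the paragraphs, or even replacing synonyms with a pseudo-original tool, which makes the result hard to read. We should abandon this approach, integrate related content, and reorganize it in our own words. While sorting it out, we can add some fresh viewpoints drawn from the related content, so that the pseudo-original content presents new ideas.

When merging related content, make sure the first and last paragraphs are original, and establish your central point in those two places. This central point can usually be formed by integrating different ideas. If the webmaster is full of ideas and has independent views, he can write them in too, which effectively guarantees the quality of the pseudo-original content. Even if some passages in the text are highly similar to other content, Baidu will not penalize it.

Second, integrating content with systematic collection.

We know that some content on the Internet is related to books sold on the market, but the two cannot be identical, otherwise the books would have been copied. So we can move the content of such books onto the Internet with some optimization and innovation, turning it into very good original content that is readable and informative and becomes a favorite meal of the Baidu spider.

Another approach is to integrate existing content from the Internet, for example compiling an encyclopedia of forum posts or an encyclopedia of game strategies. Such content often does not need to be original: just collect related content online and combine it to form content with real reference value. This content is also a favorite of the Baidu spider and is likely to become a regular on Baidu's first page.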

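Third, the equivalent exchange method

(1) Word reordering: take an article titled “Five tips for game editors writing pseudo-original articles”. How do you apply equivalent exchange? It can be done with synonyms and by shuffling the order of the title keywords. You could change it to “Five tips for game editors composing pseudo-original articles” or “Five tips to help game editors write pseudo-original articles”. The title has been subtly changed, but the meaning has not. This is the equivalent exchange method.

(2) Number change: take the title “Five pseudo-original techniques”. You can remove some techniques or add some. At the very least, the search engine will consider your title different.

(3) Word substitution: as the name suggests, swap words for related words or synonyms, changing the form but not the substance.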

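Fourth, the title combination method

The combination method uses two or three of the methods summarized above together. For example, a webmaster-site article titled “How webmasters can analyze website marketing and devise strategies” can be changed to “Good strategies for conducting online marketing analysis”, which uses the equivalent exchange method and the text modification method.

Fifth, the text modification method

When the title is accurate, we can polish and embellish it, for example with a question, a rhetorical question, a comparative, a metaphor, or personification, departing completely from the original title to increase its impact. For example, “Five pseudo-original techniques” can become “Are the five pseudo-original techniques useful?”

Sixth, keep the title relevant to the content

The title is modified to reduce duplication in search engines, not to change the meaning of the original text, which would defeat the purpose of pseudo-original content. However you modify the title, first stay faithful to the original title; second, add features that better fit readers' needs. Only then can pseudo-original content achieve the intended result.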

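Seventh, methods for modifying the body text

(1) Opening summary: write the first paragraph yourself, like an introduction. If you have the energy, read the full text, write a summary, and put it at the top. If you do not have time to read it, it is also simple: write the opening yourself and be sure to work in your site's keywords;

(2) Insert anchor-text links in the body: everyone knows the role of anchor text. It helps raise the ranking of related keywords, and when others scrape your article they scrape the anchor-text links too, which amounts to gaining an external link: if you scrape me, I get a link from you, which is fair. For every 200-300 words, 2-3 anchor-text links are appropriate;

(3) Closing summary: summarize the whole article. SEO is in fact more than this content; there are small tricks to watch for. Working with search engines is meticulous work, so you need not only to do it but also to think about it, so that you can improve quickly and avoid detours;

(4) Add images: everyone knows a picture is worth a thousand words. Most domestic search engines currently cannot read image content, but the image's alt attribute can be annotated, which gives search engines a fresh impression, so they treat your content as new and index it;

(5) Paragraph swapping: this method reorders the content, but take care not to hurt the readability of the original. It must never be used on step-by-step how-to articles, for obvious reasons. The method therefore does not suit everything and should be avoided for articles with a logical flow.

The pseudo-original methods above can effectively speed up content creation and broaden its scope, but pay attention to readability rather than simply replacing synonyms with software and shuffling paragraphs. Try to work some of your own views into the pseudo-original content to give it new life, and thereby contribute more to improving site quality.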


2022-06-11
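Modern, postmodern, Vtuber, popular trends

Take content producers' content as training data and templates to generate similar content.

Take the public's comments, interactions, and danmaku as training data for a discriminator; cluster viewers and build a mapping between audience groups and content.

Modern means mainstream content that everyone accepts.

Postmodern means content that some people accept and some do not, with a tendency to become popular.

Use modern content (content that has already been popular) as material and postmodern content (about to become popular, or templates generated from past experience) as templates to synthesize content that meets the needs of the postmodern audience and has the potential to become popular.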


2022-05-16
The Interesting Life

The Interesting Life: Increase Video/Essay/Post Views

Interestingness is defined as the amount of change: the more you change, the more interesting it will be.

Find and summarize essays by conjunctions, describe the relationship between segments, and arrange them with conjunction templates.

Copy generator (文案生成器), self-media talking points (自媒体话术)
