Autonomous Machines & Society.

2022-09-12

Audio And Music Tools

Audio and music processing tools.

AI Apps for Audio

AudioStellar

GuitarML - Zak Jost Audio Enhancement Machine Learning
tyiannak/pyAudioAnalysis: Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
librosa — librosa 0.8.1 documentation python package for music and audio analysis
Yaafe/Yaafe: Audio features extraction
Ask HN: AI-Generated Music? | Hacker News

AI Music Separation

Open Source Tools and Data for Music Source Separation License: CC-BY-NC
Spleeter, open source music separation library from Deezer, from Accapella, Melody, Moises
ISSE
SUDO RM RF : SUccessive DOwnsampling and Resampling of Multi-Resolution Features which enables a more efficient way of separating sources from mixtures
Deep Audio : Audio Source Separation Without Any Training Data

AI Music Recognition

Audio Tag for recognizing music

Audio Creation

Audio and Music Programming

Alda – Text-Based Programming Language for Music Composition - Hacker News
maximecb/noisecraft: Browser-based visual programming language and platform for sound synthesis.
helio.fm libre music composition software
Faust Programming Language

AI Audio and Music Generator

Beepbox : BeepBox is an online tool for sketching and sharing instrumental melodies.
Jummbox : Jummbox is Beepbox modification (online tool for sketching and sharing chip tune melodies).

WASM Audio Processing

Amped Studio : audio processing, requires login
Soundation : audio processing, requires login
ogv.js : audio processing
Audio Processing
Learning Synthesizer and Sound
Music Coding
WASM Synth WASM Music Synthesizer

Audio Player

Online Audio Player

captbaritone/webamp: Winamp 2 reimplemented for the browser

Peer to Peer Audio

SonoBus - peer to peer music

Audio Editor

Desktop Audio Editor

Audacity, #opensource

Online Audio Editor

Audiomass, Github Audio Editor

Other Audio Tools

VB-Audio Virtual Apps audio output to audio input

Working with Virtual Audio Tools

Audio Recording - Transcription

Benefits/drawbacks:

We cannot listen to the audio output from playback apps

VAC Audio Recording

Use cases:

Record Zoom Webinar to Speechtexter, Google Docs etc.

Audio Input Mixing

Use cases:

Record with multiple input (our voice + music from AIMP) to Audacity
Broadcast with multiple input (our voice + music from AIMP) to Zoom, Youtube etc.

Audio Output Mixing

Setting:

OS Sound Setting
Output … Line 1 VAC Microphone
VAC Repeater Setting
Wave in: Line 1 VAC Microphone
Wave out: Speaker/Headphone
Start

Use cases:

Transcribe (and listen) Zoom Seminar to Speaker and Speechtexter, Google Docs
Translate Youtube Video using Google Translate

Zoom Audio Recording/Transcription

Zoom Recording Transcription

Google Meet Audio Transcription

Open Google Meet, using … using default Mic

But we cannot do both Speechtexter and Google Docs

We cannot do both Speechtexter and Voice Note

We can do both Google Meet and Speechtexter/VoiceNote/Dictanote

We can do both Zoom and Speechtexter/VoiceNote/Dictanote

music discovery

find music by combining choices of experts

build recommendation engine using spotify api

use machine learning to predict and recommend music

netease: NetEase Cloud Music song recognition (网易云听歌识曲)

demo

self-hosted

shazam algorithm

blog: create your own shazam

findit another shazam clone
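
The Shazam-style clones listed here are usually described as hashing pairs of spectrogram peaks and then voting on a time offset. Below is a minimal sketch of that idea, not the real Shazam implementation: peak picking is left out, and `fan_out`, `max_dt` and the `(f1, f2, dt)` hash layout are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

def pair_hashes(peaks, fan_out=3, max_dt=50):
    """Turn peaks [(time_frame, freq_bin), ...] into (hash, anchor_time) pairs.

    Each peak is paired with up to `fan_out` later peaks that lie
    within `max_dt` frames of it.
    """
    peaks = sorted(peaks)
    out = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt <= 0:
                continue            # same frame, not a useful pair
            if dt > max_dt:
                break               # peaks are sorted, later ones are further away
            out.append(((f1, f2, dt), t1))
            paired += 1
            if paired == fan_out:
                break
    return out

def build_index(tracks):
    """tracks: {name: peaks}. Returns hash -> [(name, anchor_time), ...]."""
    index = defaultdict(list)
    for name, peaks in tracks.items():
        for h, t in pair_hashes(peaks):
            index[h].append((name, t))
    return index

def best_match(index, query_peaks):
    """Vote on (track, time offset); the largest pile of votes wins."""
    votes = Counter()
    for h, tq in pair_hashes(query_peaks):
        for name, tr in index.get(h, ()):
            votes[(name, tr - tq)] += 1
    if not votes:
        return None
    (name, offset), count = votes.most_common(1)[0]
    return name, offset, count
```

A query recorded from the middle of a track shares its hashes with the indexed track, and all matching hashes agree on the same offset, which is what makes the vote robust against isolated accidental matches.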

audiotag.info

portify recognize local music and add them to spotify playlist

wav_to_info.py

audio identifier

nowspinning get song info and cover art

shazam

shazamio reverse engineered shazam api with many supported functions

audiorec shazam client for linux, with cli support

midomi houndify soundhound

to get track info from soundhound (no cookie):

https://www.midomi.com/api/track?trackID=100282107076607645

the houndify music recognition api:

wss://houndify.midomi.com/

some lib for midomi found over here

it can also bridge yandex music recognition api

houndify also has an api for that, but it requires credits

free credits per day: 100

soundhound now requires 1 credit

soundhound python sdk

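Workflow for scraping and analyzing video material

Focus on mainstream platforms

A mainstream platform is one that offers a ready-made search api

Without an api you need playwright, but that may take longer and drift further from the real goal of finding material

Mainstream platforms currently all have ready-made high-level apis to connect to; if no api can be found, the platform is probably not mainstream

Mainstream platforms fall into two broad categories: mainstream media and search engines

To search niche platforms, connect to a search engine instead, add advanced search parameters, and analyze the resulting links to see whether they can be downloaded directly

Features the api should have

Search: highly customizable search that can avoid duplicate results

Video information extraction (duration, view count, description, cover, subtitles, title, tags, comments)

Related-video recommendation extraction, homepage recommendation extraction

Trending list and hot search extraction, search autocompletion extraction

Video download, preferably without watermarks or subtitles

If it is a platform we publish content to, an upload feature is needed

Upload videos with cover, description, tags, collection info, subtitles

Upload articles and images

Which keywords to use

Finding the material that best suits the current generation framework takes your own trial and summarizing

Of course, suitable keywords can also be found with a feedback mechanism built on keywords, comments and view counts, or with neural networks and machine learning, or with graph-database recommendation algorithms

Tags

keyword -> video -> similar videos

a given viewer -> same author

How to analyze videos

First crop the picture-in-picture, then remove watermarks, remove text, improve image quality and raise the frame rate. To extract and recognize subtitles, recognition has to run on a specified region. Speech recognition, if done, requires separating the vocals and checking text fluency (for foreign languages or songs the text may not be very fluent)

Filter and trim duration and canvas by set criteria such as duration, volume, optical flow, text area, whether a person is present, and how large the person's movements are

Separating vocals generally goes together with the matching subtitles; it also needs voice changing and detection of how many speakers there are (with several people talking, speech recognition may not work properly and text fluency is low) and whether they are male or female

If music (BGM) is needed, generally do not extract it directly from the video; instead find keywords in the description and search for the music on a dedicated music platform. The results may also need filtering by category and by view count and comment feedback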
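
The screening step described in this section (duration, volume, optical flow, text area, presence of a person) could be written as one filter function. This is only a sketch: every field name and threshold below is hypothetical, and the measurements themselves would have to come from separate analysis tools.

```python
from dataclasses import dataclass

@dataclass
class ClipStats:
    duration_s: float         # clip length in seconds
    mean_volume_db: float     # average loudness
    optical_flow: float       # average motion magnitude
    text_area_ratio: float    # fraction of the frame covered by text, 0..1
    has_person: bool          # whether a person was detected

def keep_clip(clip, min_s=2.0, max_s=15.0, min_volume_db=-40.0,
              min_flow=0.5, max_text_area=0.2, need_person=False):
    """Return True only if the clip passes every (hypothetical) threshold."""
    if not min_s <= clip.duration_s <= max_s:
        return False
    if clip.mean_volume_db < min_volume_db:
        return False
    if clip.optical_flow < min_flow:
        return False
    if clip.text_area_ratio > max_text_area:
        return False
    if need_person and not clip.has_person:
        return False
    return True
```

Keeping the thresholds as keyword arguments makes it easy to tune them per platform or per output template without touching the measurement code.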

2022-09-12

Use Pyscenedetect Dynamically In Program

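For videos that are simply spliced together, this algorithm is as precise as a scalpel

One or two frames at the start or end may look off, but this can be corrected by adjusting start and end

Beware of videos with frequent cuts: the pieces may all belong to the same small segment, and if their order is not shuffled they may trigger copyright detection

Even after usable segments have been picked, still sample at intervals when producing from a single video, for example keeping at least 5 seconds between two segments. Do not simply extract every segment in one go; this avoids repeated content and copyright problems

Of course, videos with gradual (fade) transitions may need other detection methods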

from scenedetect import open_video, SceneManager, split_video_ffmpeg
from scenedetect.detectors import ContentDetector

def split_video_into_scenes(video_path, threshold=27.0):
    # Open our video, create a scene manager, and add a detector.
    video = open_video(video_path)
    scene_manager = SceneManager()
    scene_manager.add_detector(
        ContentDetector(threshold=threshold))
    # Run detection, then split the file at every detected cut.
    scene_manager.detect_scenes(video, show_progress=True)
    scene_list = scene_manager.get_scene_list()
    split_video_ffmpeg(video_path, scene_list, show_progress=True)
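
The advice about leaving at least 5 seconds between sampled segments can be applied to the detected scene list before splitting. A minimal sketch, assuming the scenes have already been converted to `(start_seconds, end_seconds)` tuples; the function name and the greedy selection rule are illustrative choices, not part of PySceneDetect.

```python
def pick_spaced_scenes(scenes, min_gap=5.0):
    """Greedily keep scenes that start at least `min_gap` seconds
    after the previously kept scene ended.

    scenes: iterable of (start_seconds, end_seconds) tuples.
    """
    kept = []
    last_end = None
    for start, end in sorted(scenes):
        if last_end is None or start - last_end >= min_gap:
            kept.append((start, end))
            last_end = end
    return kept
```

With PySceneDetect the conversion can be done as `[(s.get_seconds(), e.get_seconds()) for s, e in scene_list]`, since each entry of the scene list is a pair of timecodes.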

2022-09-12

Elinks/Lynx With Python: How To Speed Up Headless Website Browsing/Parsing/Scraping With Cookies

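newscrawl: an open-sourced enterprise-grade public-opinion news crawler project. Supports one-click running of any number of crawlers, scheduled crawler tasks, and batch deletion of crawlers; one-click crawler deployment; visualized crawler monitoring; configurable crawler allocation strategies for clusters; 👉 a ready-made one-click docker deployment guide with the pitfalls already worked out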

general news extractor for extracting main content of news, articles

pip3 install gne

first of all, set it up with a normal user agent

even better, we can chain it with some customized headless puppeteer/phantomjs (do not load video data), dump the dom when ready, and use elinks/lynx to analyze the dom tree.
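
A minimal sketch of the first two steps using only the standard library and the `lynx` binary. It swaps the headless puppeteer/phantomjs stage for a plain HTTP fetch, so it only suits pages that do not need JavaScript. The user agent string is an arbitrary example and the helper names are hypothetical.

```python
import subprocess
import urllib.request

# Example desktop browser user agent; any current browser string would do.
BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/105.0 Safari/537.36")

def make_request(url, user_agent=BROWSER_UA):
    """Build a request that identifies itself as a normal browser."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch_html(url, timeout=10):
    """Download the raw page bytes."""
    with urllib.request.urlopen(make_request(url), timeout=timeout) as resp:
        return resp.read()

def lynx_dump_cmd():
    """lynx reads HTML from stdin and prints rendered text without the link list."""
    return ["lynx", "-stdin", "-dump", "-nolist"]

def html_to_text(html_bytes):
    """Render HTML to plain text with lynx (must be installed)."""
    done = subprocess.run(lynx_dump_cmd(), input=html_bytes,
                          stdout=subprocess.PIPE, check=True)
    return done.stdout.decode("utf-8", errors="replace")
```

Typical use would be `html_to_text(fetch_html(url))`; a DOM dumped by a headless browser can be fed to `html_to_text` in exactly the same way.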

to test if the recommendation bar shows up:

https://v.qq.com/x/page/m0847y71q98.html

to make web page more readable:

https://github.com/luin/readability

load webpage headlessly:

https://github.com/jsdom/jsdom

https://github.com/ryanpetrello/python-zombie

ffmpeg one liners

For all snippets, check documentation for details and settings.

speed up ffmpeg encoding

ffmpeg speedup cli flags

ffmpeg -threads 4 -crf 28 -preset ultrafast

encode video from a V4L2 device, using specified settings.

x265 worked somewhat better here and produced fewer skips (although it uses 10x the CPU of x264)

ffmpeg -f video4linux2 -framerate 30 -input_format mjpeg -video_size 1920x1080 -i /dev/video6 -c:v libx265 -preset ultrafast -c:a none -crf 20 out.mp4

Convert a raw YUYV422 frame from my USB “microscope” to PNG:

other valid pixel formats are e.g. rgb24 or yuv420p

ffmpeg -f rawvideo -video_size 2592x1944 -pixel_format yuyv422 -i input_yuyv422_2592x1944.dat -f image2 output.png

Copy a portion of a video without re-encoding (stream copy). Might need to use the same container as the input

ffmpeg -i $input -ss $seek_to_seconds -t $output_length -c:v copy -c:a copy $output
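
When the cut is driven from a program, building the argument list directly avoids shell quoting problems. A small sketch that mirrors the one-liner above; the function name is hypothetical and the command is only executed if you pass it to `subprocess.run`.

```python
import shlex
import subprocess

def trim_copy_cmd(src, start_s, length_s, dst):
    """Argument list for cutting `length_s` seconds starting at `start_s`,
    copying the video and audio streams instead of re-encoding them."""
    return ["ffmpeg", "-i", src,
            "-ss", str(start_s), "-t", str(length_s),
            "-c:v", "copy", "-c:a", "copy", dst]

def run_trim(src, start_s, length_s, dst):
    """Run the cut; raises CalledProcessError if ffmpeg fails."""
    cmd = trim_copy_cmd(src, start_s, length_s, dst)
    print(shlex.join(cmd))          # show the equivalent shell command
    subprocess.run(cmd, check=True)
```

Because streams are copied, the cut can only start on a keyframe, so the actual start may differ slightly from the requested time.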

Export and show h.264 MVs

ffplay -flags2 +export_mvs input.mkv -vf codecview=mv=pf+bf+bb

Show motion vector estimate on any input:

ffplay $input -vf mestimate=epzs:mb_size=16:search_param=32,codecview=mv=pf+bf+bb

Select one frame in every 10, scale the presentation timestamps to 10x speed (0.1*PTS), deshake, and drop the audio

ffmpeg -i MOV_3147.mp4 -vf "select='not(mod(n,10))',setpts=0.1*PTS,deshake=edge=blank:rx=64:ry=64:blocksize=4:contrast=31" -tune grain -crf 17 -an wolken-2-deshake.mkv

Extreme high quality deshake:

ffmpeg -i input.mkv -vf "select='not(mod(n,20))',setpts=0.05*PTS,mestimate=hexbs,vidstabdetect=shakiness=10:result=transforms.trf" -f null -

ffmpeg -i input.mkv -vf "select='not(mod(n,20))',setpts=0.05*PTS,mestimate=hexbs,vidstabtransform=crop=black:smoothing=0:optzoom=0"

or for pass 2

ffmpeg -i MOV_3147.mp4 -vf "select='not(mod(n,20))',setpts=0.05*PTS,vidstabtransform=crop=black:smoothing=180:optzoom=0:interpol=bicubic" -an -vcodec libx265 -crf 16 -tune grain wolken-2-deshake.mkv

Tonemap an ITU.2020 (HDR, high gamut, usually 4K) video to ITU.709 (1080p)

Also see: https://stevens.li/guides/video/converting-hdr-to-sdr-with-ffmpeg/

ffmpeg -i file.mkv -vf zscale=t=linear:npl=100,format=gbrpf32le,zscale=p=bt709,tonemap=tonemap=hable,zscale=t=bt709:m=bt709:r=tv,format=yuv420p -crf 20 -acodec copy output.mkv

Extract all frames as .jpg

ffmpeg -r 1 -i file.mp4 -r 1 frames_%05d.jpg

Create video from all the frames (missing any codec spec):

ffmpeg -r 30 -i frames_%05d.jpg output.mp4

XXX the following were copied from

https://hbish.com/ffmpeg-one-liners/

Get information for an audio/video file

ffmpeg -i file.mp3

Convert video into images

ffmpeg -i video.avi image_output%d.jpg

Split audio files

Generate a section of the audio from the 30 second mark (start) for 15 seconds (duration)

ffmpeg -f mp3 -i input.mp3 -ss 00:00:30 -t 00:00:15 output.mp3

Convert avi to animated gif

ffmpeg -i video.avi output.gif

Add audio to a video file

ffmpeg -i music.mp3 -i video.avi output.mpg

Extract audio from video file

Useful for extracting music from youtube videos

avi to mp3

ffmpeg -i video.avi -vn -ar 44100 -ac 2 -ab 192k -f mp3 output.mp3

flv to mp3

ffmpeg -i video.flv -ar 44100 -ac 2 -ab 192k -f mp3 output.mp3

motion vector estimation, motion vector export, ffmpeg advanced usage

use cases

to detect hard-coded subtitles, crop the region and detect sudden changes

can also use pyscenedetect to do the job
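
One way to try the subtitle idea with ffmpeg alone is to crop a band at the bottom of the frame and run the scene-change selector on it. A sketch under assumptions: subtitles sit in the bottom 20% of the picture, and the 0.1 threshold is a guess that needs tuning per video.

```python
def subtitle_change_cmd(src, band=0.2, threshold=0.1):
    """Argument list that reports sudden changes inside the bottom `band`
    fraction of the frame, where hard-coded subtitles usually sit."""
    # crop=w:h:x:y, keeping the full width and only the lowest band
    crop = f"crop=iw:ih*{band:.3f}:0:ih*{1 - band:.3f}"
    # the inner quotes protect the comma inside gt(...) from the filter parser
    select = f"select='gt(scene,{threshold})'"
    return ["ffmpeg", "-hide_banner", "-i", src, "-an",
            "-vf", f"{crop},{select},showinfo",
            "-f", "null", "-"]
```

Each frame that passes the selector is logged by `showinfo` on stderr, so the timestamps of likely subtitle changes can be read from that log.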

pyav

docs

pip3 install av

remove/detect silence

… silencedetect A->A Detect silence.

… silenceremove A->A Remove silence.
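
A sketch of driving `silencedetect` from Python. The filter writes `silence_start:` and `silence_end:` lines to stderr, which the parser below pairs up. The noise floor and minimum duration are example values and the function names are hypothetical.

```python
import re

_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")

def silencedetect_cmd(src, noise_db=-30, min_silence_s=0.5):
    """Argument list that analyzes the audio and discards the output."""
    af = f"silencedetect=noise={noise_db}dB:d={min_silence_s}"
    return ["ffmpeg", "-hide_banner", "-i", src, "-af", af, "-f", "null", "-"]

def parse_silences(stderr_text):
    """Return [(start_s, end_s), ...] from ffmpeg's stderr output."""
    spans = []
    start = None
    for line in stderr_text.splitlines():
        m = _START.search(line)
        if m:
            start = float(m.group(1))
            continue
        m = _END.search(line)
        if m and start is not None:
            spans.append((start, float(m.group(1))))
            start = None
    return spans
```

Run the command with `subprocess.run(cmd, stderr=subprocess.PIPE, text=True)` and pass the captured `stderr` to `parse_silences`.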

frame interpolate

ffmpeg -y -i "/root/Desktop/works/pyjom/tests/random_giphy_gifs/samoyed.gif" \
-vf "minterpolate,scale=w=iw*2:h=ih*2:flags=lanczos,hqdn3d" \
-r 60 ffmpeg_samoyed.mp4

motion estimation

to get mosaic motion vectors and visualize:

ffmpeg -i "/root/Desktop/works/pyjom/tests/random_giphy_gifs/samoyed.gif" \
-vf "mestimate=epzs:mb_size=16:search_param=7, codecview=mv=pf+bf+bb"  \
mestimate_output.mp4 -y

get help

on specific filter:

ffmpeg -h filter=showspectrumpic

on all filters:

ffmpeg -filters

crop detection, picture in picture (PIP) detection

ffmpeg -i "/root/Desktop/works/pyjom/samples/video/LiEIfnsvn.mp4" \
-vf "mestimate,cropdetect=mode=mvedges,metadata=mode=print" \
-f null -

scene change detection

ffmpeg -hide_banner -i "$file" -an \
-filter:v \
"select='gt(scene,0.2)',showinfo" \
-f null \
- 2>&1
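
The command above prints one `showinfo` line per selected frame, so the cut times can be collected by reading the `pts_time:` field. A minimal parser sketch; it assumes the log was captured as text from ffmpeg's stderr, and the function name is hypothetical.

```python
import re

_PTS_TIME = re.compile(r"pts_time:\s*(\d+(?:\.\d+)?)")

def scene_change_times(stderr_text):
    """Return the timestamps (in seconds) of frames reported by showinfo."""
    times = []
    for line in stderr_text.splitlines():
        if "showinfo" not in line:
            continue                    # skip progress and banner lines
        m = _PTS_TIME.search(line)
        if m:
            times.append(float(m.group(1)))
    return times
```

The resulting list can be turned into `(start, end)` segments by pairing consecutive timestamps.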

extract motion vectors

ffmpeg can produce motion vector estimation but it is not exportable, only for internal use.

Encoded video in mp4 files already carries motion vector information, so we may not need a GPU to get that ‘optical flow’ data.

extract by using ffmpeg apis

mv-extractor Extract frames and motion vectors from H.264 and MPEG-4 encoded video.

extract from mp4 file

mpegflow for easy extraction of motion vectors stored in video files

mv-tractus: A simple tool to extract motion vectors from h264 encoded videos.

take screenshot at time:

ffmpeg -ss 01:10:35 -i invideo.mp4 -vframes 1 -q:v 3 screenshot.jpg

video denoise filters:

dctdnoiz fftdnoiz hqdn3d nlmeans owdenoise removegrain vaguedenoiser nlmeans_opencl yaepblur

super-resolution, resampling:

deeplearning model, tensorflow

env LD_LIBRARY_PATH=/root/anaconda3/pkgs/cudatoolkit-10.0.130-0/lib/:/root/anaconda3/pkgs/cudnn-7.6.5-cuda10.0_0/lib/:$LD_LIBRARY_PATH \
ffmpeg -i "/root/Desktop/works/pyjom/samples/video/LiEIfnsvn.mp4" \
-y -vf \
"sr=dnn_backend=tensorflow:model=./sr/espcn.pb,yaepblur" \
supertest.mp4

use standard scale method:

ffmpeg -y -i "/root/Desktop/works/pyjom/tests/random_giphy_gifs/samoyed.gif" \
-vf "minterpolate,scale=w=iw*2:h=ih*2:flags=lanczos,hqdn3d" \
-r 60 ffmpeg_samoyed.mp4

options:

‘fast_bilinear’

Select fast bilinear scaling algorithm.

‘bilinear’

Select bilinear scaling algorithm.

‘bicubic’

Select bicubic scaling algorithm.

‘experimental’

Select experimental scaling algorithm.

‘neighbor’

Select nearest neighbor rescaling algorithm.

‘area’

Select averaging area rescaling algorithm.

‘bicublin’

Select bicubic scaling algorithm for the luma component, bilinear for chroma components.

‘gauss’

Select Gaussian rescaling algorithm.

‘sinc’

Select sinc rescaling algorithm.

‘lanczos’

Select Lanczos rescaling algorithm. The default width (alpha) is 3 and can be changed by setting param0.

‘spline’

Select natural bicubic spline rescaling algorithm.

‘print_info’

Enable printing/debug logging.

‘accurate_rnd’

Enable accurate rounding.

‘full_chroma_int’

Enable full chroma interpolation.

‘full_chroma_inp’

Select full chroma input.

‘bitexact’

Enable bitexact output.

datasets for deeplearning, classification and evaluation

100 audio and video datasets

video datasets

youtube8m

gif ratings

official doc

vulgar rate:

r > pg-13 > pg > g

while ‘y’ stands for neither ‘youth’ nor ‘young’; it means accepting content of every rating
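
The ordering above can be used to filter results by a maximum allowed rating. A small sketch; the handling of unknown ratings (rejecting them) is an assumption, and the function name is hypothetical.

```python
# From mildest to most explicit, following r > pg-13 > pg > g
RATING_ORDER = ["g", "pg", "pg-13", "r"]

def rating_allowed(rating, max_rating="pg"):
    """True if `rating` is no more explicit than `max_rating`.
    Unknown ratings are rejected."""
    rating = rating.lower()
    if rating not in RATING_ORDER:
        return False
    return RATING_ORDER.index(rating) <= RATING_ORDER.index(max_rating.lower())
```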

implementation

giphy has many extensible apis. i guess most media platforms are all the same (complex enough), but we have to start somewhere though…

giphy has ‘clips’ now. clips are gifs with sound, just like short videos.

beta key limitations:

1000 requests per day, 42 requests per hour
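
To stay inside both limits, a client can keep the timestamps of its own requests and refuse to send when either window is full. A sliding-window sketch; the class name is hypothetical, and whether the service counts fixed or sliding windows is not stated here, so this is the conservative reading.

```python
import time
from collections import deque

class QuotaLimiter:
    """Allow a request only if every (window_seconds, max_requests) limit holds."""

    def __init__(self, limits=((3600, 42), (86400, 1000)), clock=time.monotonic):
        self.limits = limits
        self.clock = clock
        self.stamps = deque()       # times of accepted requests, oldest first

    def try_acquire(self):
        now = self.clock()
        longest = max(window for window, _ in self.limits)
        # forget requests older than the longest window
        while self.stamps and now - self.stamps[0] >= longest:
            self.stamps.popleft()
        for window, cap in self.limits:
            used = sum(1 for t in self.stamps if now - t < window)
            if used >= cap:
                return False
        self.stamps.append(now)
        return True
```

Call `try_acquire()` before each api request and sleep or skip when it returns `False`.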

~~or just use the public beta key? does that subject to the rate limit?~~

this public beta key is trashed.

var PUBLIC_BETA_API_KEY = 'dc6zaTOxFJmzC';

is this public api key? maybe it is both api and sdk key.

Gc7131jiJuvI7IdN0HZ1D7nh0ow5BU6g

api keys:

IoJVsWoxDPKBr6gOcCgOPWAB25773hqP

lTRWAEGHjB1AkfO0sk2XTdujaPB5aH7X

sdk keys:

6esYBEm9OG3wAifbBFZ2mA0Ml6Ic0rvy

sXpGFDGZs0Dv1mmNFvYaGUvYwKX0PWIh

to use api:

https://github.com/austinkelleher/giphy-api

to use sdk:

https://github.com/Giphy/giphy-js/blob/master/packages/fetch-api/README.md

find public api keys inside html:

window.GIPHY_FE_MOBILE_API_KEY = "L8eXbxrbPETZxlvgXN9kIEzQ55Df04v0"
window.GIPHY_FE_WEB_API_KEY = "Gc7131jiJuvI7IdN0HZ1D7nh0ow5BU6g"
window.GIPHY_FE_FOUR_O_FOUR_API_KEY = "MRwXFtxAnaHo3EUMrSefHWmI0eYz5aGe"
window.GIPHY_FE_STORIES_AND_GIPHY_TV_API_KEY = "3eFQvabDx69SMoOemSPiYfh9FY0nzO9x"
window.GIPHY_FE_DEFAULT_API_SERVICE_KEY = "5nt3fDeGakBKzV6lHtRM1zmEBAs6dsIc"
window.GIPHY_FE_GET_POST_HEADERS_KEY = "e0771ed7b244ec9c942bea646ad08e6bf514f51a"
window.GIPHY_FE_MEDIUM_BLOG_API_KEY = "i3dev0tcpgvcuaocfmdslony2q9er7tvfndxcszm"
window.GIPHY_FE_EMBED_KEY = "eDs1NYmCVgdHvI1x0nitWd5ClhDWMpRE"

search for ‘ear flops’ to locate the tags in ‘samoyed.html’

2022-09-10

Gfw Circumvention, Download Youtube Videos, Scrape Banned Websites

binder as colab alternative

apart from kaggle, you can also use github actions, devops and more, if only we can get the results in time with code.

github integrated ci platforms

cirrus graphql spec with artifact info

github actions api: download artifact

circleci: artifact

azure pipelines: artifacts

Blog of James Brown

2022-09-12

Audio And Music Tools

2022-09-12

Song Recognition, Music Recognition Api, Music Discovery, Audio Search, Audio Fingerprint

2022-09-12

Mastering Video And Article Uploads: Tips For Cover Images, Introductions, Tags, And More

爬取分析视频素材的流程

2022-09-12

Use Pyscenedetect Dynamically In Program

2022-09-12

Elinks/Lynx With Python: How To Speed Up Headless Website Browsing/Parsing/Scraping With Cookies

2022-09-12

Exploring Ffmpeg'S Advanced Encoding, Conversion, And Audio Functionality

2022-09-11

Motion Vector Estimation, Motion Vector Export, Ffmpeg Advanced Usage

2022-09-11

Unlocking The Power Of Ai: Exploring Deep Learning Datasets And Resources

2022-09-11

Random Giphy Gifs

2022-09-10

Gfw Circumvention, Download Youtube Videos, Scrape Banned Websites