2022-09-14
Python Retry Libraries


2022-09-13
Opencv-Python Wrappers, Without Boilerplates

imutils by pyimagesearch

caer does image resizing, image processing, and video loading.

documentation here
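A lot of what these wrappers save you is small chores such as keeping the aspect ratio when resizing. A minimal sketch of that calculation in plain Python follows; `resized_dims` is a hypothetical helper written for illustration, not a function from imutils or caer.

```python
def resized_dims(width, height, target_width=None, target_height=None):
    """Return (w, h) scaled to one target side while keeping the aspect ratio.

    Hypothetical helper for illustration; not part of imutils or caer.
    """
    if target_width is None and target_height is None:
        return width, height  # nothing to do
    if target_width is not None:
        # Scale the height by the same factor applied to the width.
        return target_width, round(height * target_width / width)
    # Otherwise scale the width by the factor applied to the height.
    return round(width * target_height / height), target_height


print(resized_dims(1920, 1080, target_width=640))
```

The resulting tuple is what you would pass to a resize call as the output size; the wrappers hide this step behind a single `width=` or `height=` argument.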


2022-09-12
Elinks/Lynx With Python: How To Speed Up Headless Website Browsing/Parsing/Scraping With Cookies

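newscrawl: an open-sourced, enterprise-grade news crawler project for public-opinion monitoring. It supports one-click launch of any number of crawlers, scheduled crawler tasks, and batch deletion of crawlers; one-click crawler deployment; visual crawler monitoring; and configurable crawler allocation strategies for clusters. 👉 Ready-made docker one-click deployment docs are included, with the pitfalls already worked through.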

general news extractor for extracting main content of news, articles

pip3 install gne

first of all, set it up with a normal user agent
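A minimal sketch of sending a browser-like user agent, assuming the page is fetched with the standard library's `urllib`. The `BROWSER_UA` string and the helper names `build_request` and `fetch_html` are illustrative assumptions, not something from the original setup.

```python
from urllib.request import Request, urlopen

# Assumed example value; any current desktop browser string would do.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Build a request that identifies itself as a regular browser."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Download a page and decode it using the charset the server declares."""
    with urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Without the header, `urllib` announces itself as `Python-urllib/<version>`, which some sites block or answer with a stripped-down page.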

even better, we can chain it with some customized headless puppeteer/phantomjs (do not load video data), dump the dom when ready, and use elinks/lynx to analyze the dom tree.
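The last step of that chain could look roughly like this: take the DOM dumped by the headless browser as an HTML string and pipe it through lynx to get rendered text. This is a sketch that assumes lynx is installed and on PATH; the helper names `lynx_dump_command` and `dom_to_text` are made up for illustration.

```python
import shutil
import subprocess


def lynx_dump_command(width=120):
    """Build the lynx command line that renders HTML from stdin as text."""
    return [
        "lynx",
        "-stdin",          # read the document from standard input
        "-force_html",     # treat the input as HTML
        "-dump",           # print the rendered text to stdout and exit
        "-nolist",         # omit the trailing list of link URLs
        f"-width={width}", # column width used for the dump
    ]


def dom_to_text(html, width=120):
    """Render an HTML string to plain text with lynx (assumed to be installed)."""
    if shutil.which("lynx") is None:
        raise RuntimeError("lynx is not installed or not on PATH")
    done = subprocess.run(
        lynx_dump_command(width),
        input=html,
        capture_output=True,
        text=True,
        check=True,
    )
    return done.stdout
```

Because lynx only sees the already-rendered DOM, it never touches the network here, so cookies and JavaScript stay the headless browser's job.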

to test if the recommendation bar shows up:

https://v.qq.com/x/page/m0847y71q98.html

to make web page more readable:

https://github.com/luin/readability

load webpage headlessly:

https://github.com/jsdom/jsdom

https://github.com/ryanpetrello/python-zombie


2022-09-08
Calling Java From Python

using jpype or pyjnius

sample code for jpype:

from jpype import *
import jpype.imports  # required: lets Java packages be imported with Python import statements
addClassPath("/root/Desktop/works/pyjom/tests/karaoke_effects/classpath/lingua.jar")
startJVM(getDefaultJVMPath())
java.lang.System.out.println("Calling Java Print from Python using Jpype!")
from com.github.pemistahl.lingua.api import *
# detector = LanguageDetectorBuilder.fromAllLanguages().withLowAccuracyMode().build()
detector = LanguageDetectorBuilder.fromAllLanguages().build() # 3.5GB just for detecting language! it is somehow crazy.
sample = 'hello world'
result = detector.detectLanguageOf(sample)
print(result, type(result)) # <java class 'com.github.pemistahl.lingua.api.Language'>
# but we can convert it into string.
strResult = str(result)
print(strResult, type(strResult))
import math
print("CALLING MATH: %d" % math.sqrt(4))
shutdownJVM()
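If there are several jars, or the script may be imported more than once, a small guard around the JVM startup can help. A sketch under assumptions: `collect_jars` and `start_jvm_once` are hypothetical helper names, and JPype is imported lazily so the jar-collecting part works even where JPype is not installed. Note that JPype cannot restart the JVM in the same process once `shutdownJVM()` has been called.

```python
from pathlib import Path


def collect_jars(directory):
    """Return sorted absolute paths of all .jar files directly inside directory."""
    return sorted(str(p.resolve()) for p in Path(directory).glob("*.jar"))


def start_jvm_once(jar_dir):
    """Start the JVM with every jar in jar_dir on the classpath, unless it is already running."""
    import jpype          # lazy import: only needed when the JVM is actually started
    import jpype.imports  # enables `from java.lang import ...` style imports

    if not jpype.isJVMStarted():
        jpype.startJVM(classpath=collect_jars(jar_dir))
```

After `start_jvm_once("classpath")` returns, Java imports such as `from com.github.pemistahl.lingua.api import LanguageDetectorBuilder` can be used, assuming the lingua jar is in that directory.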

sample for pyjnius:

import jnius_config
# jnius_config.add_options('-Xrs', '-Xmx4096m')  # -Xmx needs a unit suffix; a bare 4096 means 4096 bytes
jnius_config.set_classpath('.', "/root/Desktop/works/pyjom/tests/karaoke_effects/classpath/lingua.jar")
import jnius
jnius.autoclass('java.lang.System').out.println('Hello world')
detector = jnius.autoclass('com.github.pemistahl.lingua.api.LanguageDetectorBuilder').fromAllLanguages().build()
sample = 'hello world'
result = detector.detectLanguageOf(sample)
print(result, type(result))
# breakpoint()
strResult = result.toString()
print(strResult, type(strResult))


2022-05-31
A Python Wrapper For Ffmpeg: Simplifying Command-Line Functionality

ffmpeg python wrapper

the most popular python wrapper that turns code into ffmpeg cli args:

https://github.com/kkroening/ffmpeg-python
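To show what "code to cli args" means, here is a toy sketch in plain Python. It is not ffmpeg-python's API; `ffmpeg_args` is a hypothetical function that only handles one input, one output, and a simple `-vf` filter chain, whereas the real library builds full filter graphs.

```python
import shlex


def ffmpeg_args(src, dst, video_filters=(), overwrite=True, **output_options):
    """Translate Python arguments into an ffmpeg argument list (toy illustration)."""
    args = ["ffmpeg"]
    if overwrite:
        args.append("-y")  # overwrite the output file without asking
    args += ["-i", src]
    if video_filters:
        # Simple filter chains are comma-separated under -vf.
        args += ["-vf", ",".join(video_filters)]
    for name, value in output_options.items():
        # Each keyword becomes an output option, e.g. crf=23 -> -crf 23.
        args += [f"-{name}", str(value)]
    args.append(dst)
    return args


cmd = ffmpeg_args(
    "input.mp4", "output.mp4",
    video_filters=["scale=640:-2"], crf=23, preset="fast",
)
print(shlex.join(cmd))
```

The list form can be handed straight to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with file names.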
