2022-12-13
Web Scraping Logic

Select targets for scraping. These could be your browsing history, package indexes, or social media (dynamic content, which needs different access methods than plain web scraping).

If a target is not directly accessible, access it with proxies or cookies.

Finally, store the content in compact, usable formats, categorized and linked.
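
The three steps above can be sketched with the Python standard library. This is only a minimal sketch under assumptions: the function names (`make_opener`, `fetch`, `init_store`, `store_page`), the table layout, and the choice of SQLite as the "compact and usable" format are all hypothetical and not taken from the original notes.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect absolute link targets from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def make_opener(proxy=None, cookie=None):
    """Build an opener that optionally goes through a proxy and sends a cookie."""
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    if cookie:
        opener.addheaders.append(("Cookie", cookie))
    return opener


def fetch(url, opener, timeout=10):
    """Download one page as text."""
    with opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def init_store(conn):
    """Create the (hypothetical) tables: pages by category, links between pages."""
    conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, category TEXT, body TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS links (src TEXT, dst TEXT, UNIQUE(src, dst))")


def store_page(conn, url, category, html):
    """Store a page under a category and record its outgoing links."""
    parser = LinkParser(url)
    parser.feed(html)
    conn.execute("INSERT OR REPLACE INTO pages VALUES (?, ?, ?)", (url, category, html))
    conn.executemany("INSERT OR IGNORE INTO links VALUES (?, ?)",
                     [(url, dst) for dst in parser.links])
    conn.commit()
    return parser.links
```

A run would look like `store_page(conn, url, "package-index", fetch(url, make_opener(proxy, cookie)))`, with the proxy address and cookie string supplied by you.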

2022-12-06
Mirror Sites Change

Some mirror sites serve us poorly or block our access. We point them out, list alternatives, and provide quick fixes.

These blocks are intentional and target a specific group of people, typically by blocking a whole range of IPs.

If a mirror only blocks a range of IPs, you can use a proxy to get around the restriction.
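
The proxy workaround can be sketched as "try directly, then retry through a proxy". This is a minimal sketch, not a definitive implementation: `open_via` and `fetch_with_fallback` are hypothetical names, and the proxy address in the usage note is an assumption.

```python
import urllib.error
import urllib.request


def open_via(url, proxy=None, timeout=10):
    """Fetch url directly (proxy=None) or through the given proxy URL."""
    # An empty mapping disables any proxy picked up from the environment.
    mapping = {"http": proxy, "https": proxy} if proxy else {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(mapping))
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_fallback(url, proxy, fetch=open_via):
    """Try a direct request first; if it fails, retry through the proxy."""
    try:
        return fetch(url)
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return fetch(url, proxy)
```

For example, `fetch_with_fallback("https://mirrors.aliyun.com", "http://127.0.0.1:7890")` would only go through a local proxy (assumed here to listen on port 7890) when the direct request fails.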

actors:

https://mirrors.aliyun.com
https://mirrors.tuna.tsinghua.edu.cn/

fixes:

Currently we use some previously obtained tunnel accounts provided by topsap. This may fix the problem.

python pip:

pip3 config set global.index-url https://mirrors.ustc.edu.cn/pypi/web/simple
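
If the switch has to be applied from a setup script, the same command can be wrapped in Python. A minimal sketch, assuming `pip3` is on the PATH; `pip_index_command` and `set_pip_index` are hypothetical helper names.

```python
import subprocess

# The USTC index URL from the command above.
USTC_INDEX = "https://mirrors.ustc.edu.cn/pypi/web/simple"


def pip_index_command(index_url=USTC_INDEX, pip="pip3"):
    """Build the argument list for `pip3 config set global.index-url ...`."""
    return [pip, "config", "set", "global.index-url", index_url]


def set_pip_index(index_url=USTC_INDEX, pip="pip3"):
    """Run the command; raises CalledProcessError if pip reports a failure."""
    subprocess.run(pip_index_command(index_url, pip), check=True)
```

Building the argument list separately from running it keeps the command easy to inspect or log before it touches the user's pip configuration.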

taobao npm mirror:

http://npm.taobao.org => http://npmmirror.com
http://registry.npm.taobao.org => http://registry.npmmirror.com
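
Old URLs left in config or lock files can be rewritten with a small script. A minimal sketch covering only the two host mappings above; `OLD_TO_NEW` and `rewrite_taobao_urls` are hypothetical names.

```python
# Host mappings taken from the two lines above.
OLD_TO_NEW = {
    "registry.npm.taobao.org": "registry.npmmirror.com",
    "npm.taobao.org": "npmmirror.com",
}


def rewrite_taobao_urls(text):
    """Replace old taobao hosts with their npmmirror equivalents."""
    # Handle the longer host first so the registry mapping is applied explicitly.
    for old in sorted(OLD_TO_NEW, key=len, reverse=True):
        text = text.replace(old, OLD_TO_NEW[old])
    return text
```

It could be applied to the contents of a file such as `.npmrc` by reading the file, passing the text through `rewrite_taobao_urls`, and writing it back.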

2022-10-07
Async Requests With Python, Used For Clash Multiple Proxy Delays

async client

aiohttp

aiohttp-requests

requests-async

http3 (requests-async successor, later renamed httpx)

many_requests

curequests

asks

trip

requests-futures

async non-blocking server

trequests
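
Whichever client is picked, the delay measurement itself follows one pattern: probe every proxy concurrently, time each probe, and treat timeouts as failures. The sketch below uses only `asyncio` so it runs without extra packages; `timed` and `measure_delays` are hypothetical names, and `probe` is an assumed coroutine that would make one HTTP request through the named proxy using one of the clients listed above.

```python
import asyncio
import time


async def timed(name, probe, timeout):
    """Run one probe; return (name, delay in ms), or (name, None) on failure."""
    start = time.perf_counter()
    try:
        await asyncio.wait_for(probe(name), timeout)
    except Exception:
        # Timeouts and connection errors both count as "no delay measured".
        return name, None
    return name, (time.perf_counter() - start) * 1000


async def measure_delays(names, probe, timeout=5.0, limit=20):
    """Probe all proxies concurrently, at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def bounded(name):
        async with sem:
            return await timed(name, probe, timeout)

    results = await asyncio.gather(*(bounded(n) for n in names))
    return dict(results)
```

Called as `asyncio.run(measure_delays(proxy_names, probe))`, it returns a mapping from proxy name to measured delay, so the fastest proxy is the non-`None` entry with the smallest value.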
