2024-05-13

Telegram search engines

You can use Google directly as a Telegram search engine: restrict the query with site:t.me to search the indexed part of Telegram itself, or with site:tgstat.ru or site:telemetr.me to search external aggregators.
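A minimal sketch of building those site-restricted queries. The function name and the use of the plain https://www.google.com/search?q=... URL form are my own assumptions for illustration; only the three domains come from the note.

```python
from urllib.parse import urlencode

# Domains named in the note: Telegram's own indexed pages plus two aggregators.
TELEGRAM_SITES = ["t.me", "tgstat.ru", "telemetr.me"]


def telegram_search_urls(keyword, sites=TELEGRAM_SITES):
    """Return one Google search URL per site, each restricted with site:."""
    urls = {}
    for site in sites:
        query = f"{keyword} site:{site}"
        # urlencode escapes spaces and the colon so the URL is safe to open.
        urls[site] = "https://www.google.com/search?" + urlencode({"q": query})
    return urls


if __name__ == "__main__":
    for site, url in telegram_search_urls("search engine").items():
        print(site, url)
```

Opening the printed URLs in a browser is enough; nothing here scrapes the result pages.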

2023-01-15

Library Genesis, Getting Latest Ebooks For Free

a Russian ebook search engine

don’t get confused: it is always in a foreign language, and it is never “educational” or “scanned”.

a query like “national geographic” works perfectly. books from amazon kindle are also likely to be good.

2022-12-13

Lazero Search Engine Update Logic

docprompting generates code from retrieved documentation, using tldr and CoNaLa to train code generation from prompts

ColBERT and RoBERTa for document retrieval and embedding

the update process shall be atomic. when the update is successful, there should be a flag file created under the index directory. always check the newest index first. clean up unusable/incompatible indexes.

if there’s no previous compatible index present, build the index from the ground up, and clean up incompatible indexes if necessary. if a previous compatible index is found, decompose it into small groups that wait for merge and update.

first checksum all files along with their file names. if a file is present with a matching checksum, don’t touch it; otherwise remove it from the index, create a new index entry, or replace the existing one.
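a minimal sketch of the checksum step, assuming sha-256 and a plain dict as the manifest. the function names and the four-way split are hypothetical, not the actual lazero code.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_checksum(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_manifests(old, new):
    """Compare two manifests and say what the index has to do per file."""
    unchanged = {name for name in new if old.get(name) == new[name]}
    return {
        "unchanged": unchanged,                            # leave alone
        "added": set(new) - set(old),                      # index from scratch
        "changed": (set(new) & set(old)) - unchanged,      # replace in index
        "removed": set(old) - set(new),                    # drop from index
    }
```

the manifest from the previous run would be stored next to the index, so the next run only has to look at the "added", "changed" and "removed" sets.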

next create or merge file list.

then we scan those new files and update our index accordingly.

finally we merge our index, save it to a different place, place the flag, remove the flag of the old index, then remove the old index completely. if merging is not possible for a huge data source, we perform search in minibatches.
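the flag-based swap can be sketched like this. the flag file name, the index-000001 directory naming and the build callback are all assumptions made for illustration; the point is only that a reader never sees a half-written index, because it trusts nothing without a flag.

```python
import shutil
from pathlib import Path

FLAG_NAME = "INDEX_OK"  # hypothetical flag file name


def newest_valid_index(index_root):
    """Return the newest index directory that carries the flag, or None."""
    candidates = [
        d for d in Path(index_root).glob("index-*")
        if d.is_dir() and (d / FLAG_NAME).exists()
    ]
    # Zero-padded numbers make name order equal creation order.
    return max(candidates, key=lambda d: d.name, default=None)


def next_index_dir(index_root):
    """Pick the next unused index-NNNNNN directory name."""
    numbers = [int(d.name.split("-")[1]) for d in index_root.glob("index-*") if d.is_dir()]
    return index_root / f"index-{max(numbers, default=0) + 1:06d}"


def publish_index(index_root, build):
    """Build into a fresh directory, flag it, then retire the old index."""
    index_root = Path(index_root)
    index_root.mkdir(parents=True, exist_ok=True)
    old = newest_valid_index(index_root)
    new_dir = next_index_dir(index_root)
    new_dir.mkdir()
    try:
        build(new_dir)  # caller writes the merged index here
    except Exception:
        shutil.rmtree(new_dir, ignore_errors=True)  # never leave a half index
        raise
    (new_dir / FLAG_NAME).touch()  # commit point: the new index is now valid
    if old is not None:
        (old / FLAG_NAME).unlink()  # remove the old flag first
        shutil.rmtree(old)          # then the old index itself
    return new_dir


def cleanup_unflagged(index_root):
    """Delete index directories left without a flag by an interrupted run."""
    for d in Path(index_root).glob("index-*"):
        if d.is_dir() and not (d / FLAG_NAME).exists():
            shutil.rmtree(d)
```

if the process dies between the two flag operations, two flagged indexes exist for a moment; newest_valid_index still picks the new one, which is why the new flag is placed before the old one is removed.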

Search Engines DIY

my custom search engine built upon thesaurus/synonyms/antonyms, fzf and grep

RETRO is a retrieval-based attention network. it uses faiss, but it is unclear whether it is search related. page 8 of the paper lists different retrieval-based models to choose from. LDA (topic modeling) can assist search by discovering similar topics.

download nltk data here. when downloading manually, mind the url path and the package id so that every file ends up in the right place.

you may need to patch nltk in order to download via a proxy. these data files are hosted on github assets.

check keyword urlopen and filedownloder.py under /data/data/com.termux/files/usr/lib/python3.10/site-packages/nltk

maybe you can explore further with online search engines? select your keyword then search again.

the thesaurus will slow things down. make it into a preprocessor.
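a rough sketch of the preprocessor idea: expand the query once into regex alternations, then do a plain grep-style scan, so the thesaurus is never consulted per line. the toy thesaurus and the function names are made up; a real table might be exported once from wordnet or a similar source.

```python
import re

# Toy thesaurus for illustration only.
THESAURUS = {
    "fast": ["quick", "rapid"],
    "search": ["lookup", "query"],
}


def expand_query(query, thesaurus=THESAURUS):
    """Turn each query word into an alternation of the word and its synonyms."""
    groups = []
    for word in query.lower().split():
        variants = [word] + thesaurus.get(word, [])
        groups.append("(?:" + "|".join(re.escape(v) for v in variants) + ")")
    return groups


def grep_lines(lines, query, thesaurus=THESAURUS):
    """Return lines where every expanded query word matches, ignoring case."""
    patterns = [
        re.compile(rf"\b{group}\b", re.IGNORECASE)
        for group in expand_query(query, thesaurus)
    ]
    return [line for line in lines if all(p.search(line) for p in patterns)]


if __name__ == "__main__":
    notes = ["Quick lookup of notes", "fast car", "Rapid query engine"]
    print(grep_lines(notes, "fast search"))
```

the same expanded patterns could be joined and handed to grep -E, with fzf on top for interactive narrowing.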

related shits can be found here

search engine optimization

advertools

zinc search

markuplm: a markup language model used for feature-rich information extraction. webqa. arxiv paper: reading wikipedia to answer open-domain questions

zinc search: a go implementation of an elasticsearch alternative

I bet there are many, many alternatives. even a relational database or a graph database can be a search engine by its nature.

how the heck can i search my own notes? slice them into little segments? standard excerpt included.
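one possible way to slice notes, assuming paragraphs are separated by blank lines. the size limit, the excerpt width and the function names are arbitrary choices for illustration.

```python
import re


def slice_note(text, max_chars=300):
    """Group blank-line-separated paragraphs into segments of at most max_chars.

    A single paragraph longer than max_chars becomes its own oversized segment.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    segments, current = [], ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            segments.append(current)  # close the segment that is already full
            current = paragraph
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments


def excerpt(segment, width=60):
    """One-line preview of a segment, cut on a word boundary."""
    flat = " ".join(segment.split())
    if len(flat) <= width:
        return flat
    return flat[:width].rsplit(" ", 1)[0] + " ..."
```

each segment can then be indexed as its own document, with the excerpt shown in the result list.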

search for search engine in github.

search engines are related to spiders/crawlers.

how to utilize these search engines is a problem/challenge. use url filters, generic extractors, readability.js, summarizers like sumy.

there are many specialized search engines that can search image, video and audio. one example is Jina

semantic search tool, multimedia search tool, neural search tool

https://github.com/searxng/searxng

parse popular search engine results like baidu, bing:

https://github.com/bisohns/search-engine-parser

search and scrape news

https://github.com/01joy/news-search-engine

image search engine

https://github.com/matsui528/sis

search engines used by hackers, social engineering, onion sites:

https://github.com/edoardottt/awesome-hacker-search-engines

search engine with customized recommendation:

https://github.com/mtianyan/FunpySpiderSearchEngine

seo tools: fetch recommended and related keywords from the baidu dropdown suggestions

https://github.com/marcobiedermann/search-engine-optimization

a self-hosted, google-like search engine that can be deployed on heroku:

https://github.com/benbusby/whoogle-search

txtai:

semantic search tool

pip3 install txtai

uses sentence-transformers models from huggingface for sentence embedding

https://github.com/neuml/txtai

yacy:

distributed search engine that circumvents censorship

provide rss feeds

searx:

meta search engine self-hosted

has third-party hosted searx websites available:

https://searx.space/ total 83 online (currently)

mwmbl:

distributed crawler with a central search engine, can be self-hosted

written in python

video search engine:

generate summary from frames

https://github.com/AkshatSh/VideoSearchEngine

yuno:

context based search engine for anime, anime search engine with transformer and deep learning. text based search. more like a semantic search tool, or neural search tool.

Yuno is a context based search engine that indexes over 0.5 million anime reviews and other anime informations. To help you find anime with specific properties. This search engine will help people of r/AnimeSuggest who are looking for specific type of anime to watch.

This search engine was created to solve the problem of finding an object with specific properties and the object in this case is anime. But this search engine can be easily extended to any domain like books,movies,etc. Without the need of any kind of handcrafted dataset.

TypeSense:

dedicated client for every popular programming language

consumes much less ram than meilisearch

need to write custom web interface via nodejs

upload data via client api

MeiliSearch:

good for small dataset

consumes a whopping 900mb for my 9mb json dataset.

has intuitive web interface.

upload documents via web post.

Search Engines

2024-05-13 Telegram search engines

2023-01-15 Library Genesis, Getting Latest Ebooks For Free

2022-12-13 Lazero Search Engine Update Logic

2022-06-08 Search Engines
