loading/transforming leaked txt files is time-consuming. use pypy to speed up the process, and use the database's own batch-import method to load the data.
entity fragmentation in followthemoney is kind of for “entity recognition across multiple social platforms”, which makes it suitable for finding patterns/clients in large leaked databases.
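A minimal sketch of the batch-import idea, assuming SQLite via the standard `sqlite3` module (the notes do not name a database) and a hypothetical two-field, comma-separated line format. The table name `records` and the helpers `batched`, `parse_line`, and `bulk_load` are made up for illustration. The point is only that rows go in through `executemany` with one commit per batch instead of one commit per row.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def parse_line(line, sep=","):
    """Split one text line into a (key, value) pair; return None if malformed."""
    parts = line.rstrip("\n").split(sep, 1)
    if len(parts) != 2 or not parts[0]:
        return None
    return parts[0], parts[1]


def bulk_load(conn, lines, batch_size=10_000):
    """Insert parsed lines in batches, committing once per batch."""
    conn.execute("CREATE TABLE IF NOT EXISTS records (key TEXT, value TEXT)")
    rows = (r for r in map(parse_line, lines) if r is not None)
    total = 0
    for chunk in batched(rows, batch_size):
        with conn:  # the connection context manager commits on success
            conn.executemany("INSERT INTO records VALUES (?, ?)", chunk)
        total += len(chunk)
    return total


# Hypothetical sample input; a real run would pass an open file object instead.
conn = sqlite3.connect(":memory:")
sample = ["a,1\n", "b,2\n", "malformed\n", "c,3\n"]
loaded = bulk_load(conn, sample, batch_size=2)
```

Because `lines` is consumed lazily, the same function works on a file handle without reading the whole file into memory. Other databases have their own bulk paths (for example a COPY-style command), which would replace the `executemany` call.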
email collector
socialscan: Python library for accurately querying username and email usage on online platforms
gitscan: scans for emails and passwords (if possible) by searching github with predefined domains and rules
ghunt: needs a companion browser plugin to get credentials. can collect info on a given email
EMAGNET: collects database leaks, emails and passwords from recent pastebin records
leaked email and data
occrp (anti-corruption & crime) aleph is a bad source for getting emails (an anonymous/unauthorized user can only get a few hundred, with no clue what each email relates to). however, it has a tool called followthemoney, which works with csv files and exports cypher for neo4j
you typically find links to these databases on anonfiles.com (or elsewhere), so run a query like site:anonfiles.com email rar in duckduckgo (no DMCA censorship)
breachedforum’s index contains “credits only” threads, which require 4-8 credits to unlock. to get credits you need to create a thread (which earns 0 credits) and then get 1 credit per reply. post in trivial threads like manga.
in leakbase you earn credits and download leaked databases more easily. it has an official telegram bot that claims to leak free databases every day.
telegram bots
find telegram bot collections in privacy.club (only OSINT bots) and here (with many other bots)
although i find many leaked databases as torrents, torrent search engines usually index videos/movies rather than anything related to leaked databases.
Using aMule on macOS, Kad is firewalled (people say 2.2.1 works well, but I’d rather not use macOS); the reason is unclear. Maybe on Linux or Windows it will be different.
Some (dead) links of other databases in ed2k emule format:
holehe only checks whether an email address is registered as an account elsewhere, using “forgot my password” APIs.
sreg, found in a collection of security tools, is a deprecated tool for getting registration status by phone/email/username on multiple chinese platforms.
(email) account verifier
Usually these only verify the existence of a given email, like emailhippo (100 free requests per day per ip), or mailforguess, which checks “gmail”, “laposte”, “protonmail” and “yahoo” emails
use the noptcha or nopecha browser extension (free for 100 captcha solves per day) to solve hcaptcha and recaptcha. this extension cannot be used with a proxy.
the muumuu mail is programmatically connectable using account and password. it seems to use the default ports for these services. POP3 and IMAP are for receiving. SMTP is for sending.
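For reference, the default ports of these protocols are exposed as constants in Python's standard library, so they can be listed without opening any connection. This is only a lookup sketch: whether a given provider actually listens on these ports is an assumption to verify, and many providers accept mail submission on port 587 with STARTTLS, which has no dedicated constant in `smtplib`. The dictionary name `DEFAULT_PORTS` is made up for illustration.

```python
import imaplib
import poplib
import smtplib

# Standard-library defaults; the values in the comments are the well-known ports.
DEFAULT_PORTS = {
    "POP3 (receive)": poplib.POP3_PORT,                 # 110
    "POP3 over TLS (receive)": poplib.POP3_SSL_PORT,    # 995
    "IMAP (receive)": imaplib.IMAP4_PORT,               # 143
    "IMAP over TLS (receive)": imaplib.IMAP4_SSL_PORT,  # 993
    "SMTP (send)": smtplib.SMTP_PORT,                   # 25
    "SMTP over TLS (send)": smtplib.SMTP_SSL_PORT,      # 465
}

for name, port in DEFAULT_PORTS.items():
    print(f"{name}: {port}")
```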
this guy’s code is full of hacks. it seems to run only on his own computer and will break on the slightest error.
he stored potential password combinations and also registered accounts (need testing, some may not work) in this google doc. you can download the sheet named “Sheet1” through this api, which adds double quotes and takes more space than an export from the web interface; the method is described here.
email proxy for forwarding email to you, which I used for github registration (but with a very high block rate without a proxy)
email aliasing for sending
icloud’s “hide my email” service seems to provide only a few email aliases. but according to a 3rd party icloud alias generator (cannot be used with the chinese version of icloud) you can generate at least 10 aliases. or use hidemyemail-api to log in with pyicloud and get aliases as an API service. an account registered from the web without ever being logged in on an apple device (maybe virtualbox -> macos has a shot?) will not have email service.
to send email from an alias, you can try setting the “FROM” address to your alias via the smtp protocol, while the credentials stay the same. the working approach could be platform specific
yahoo provides the most email aliases, up to 500, but only 10 for send-only emails. however, to get a yahoo account one needs offshore phone numbers.
email collection, email scraping
search for “site:pastebin.com @yahoo.com” to get some email addresses; searching github might help as well.
mailcat finds email addresses by nickname (check if deliverable?)
use others’ links/contents to increase diversity and anonymity. put your related contents among them.
email marketing is quantity over quality. know your customers’ preferences and behaviors (language, country, life schedule (by year? month? week? time in a day?)) by linking their accounts on other platforms, telemetry.
email bulk senders are equipped with email templates, statistics (like opened or not, click data monitoring)
vary your email style and content unless you want to get blocked/trashed by servers
email templates
premail is an easy-to-use component-based build system for MJML, the email templating language
freesms: (doesn’t work with my phone number as recipient, though I found some interesting projects on github related to free SMS sending, some using OCR to crack captchas and access the API)
sms auto regist is written in go and uses yunjiema.top for sms receiving
online sms receivers are not so reliable (not even usable for yahoo registration), and those found from google searches (like free receive sms (this site has anti-JS debugging: opening the debugger automatically pauses execution), which has a simple interface for fetching data, and you can search for this site on github to get more sources and potential API adaptors like disposable phonebook) may already have yahoo accounts registered on their numbers.
Node classification: The objective here is to predict the labels of nodes by considering the labels of their neighbors.
Link prediction: In this case, the goal is to predict the relationship between various entities in a graph. This can, for example, be applied to predicting connections in social networks.
Graph clustering: This involves dividing the nodes of a graph into clusters. The partitioning can be done based on edge weights or edge distances or by considering the graphs as objects and grouping similar objects together.
Graph classification: This entails classifying a graph into a category. This can be applied in social network analysis and categorizing documents in natural language processing. Other applications in NLP include text classification, extracting semantic relationships between texts, and sequence labeling.
Computer vision: In the computer vision world, GNNs can be used to generate regions of interest for object detection. They can also be used in image classification whereby a scene graph is generated. The scene generation model then identifies objects in the image and the semantic relationship between them. Other applications in this field include interaction detection and region classification.
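To make the first three tasks above concrete, here is a small sketch on a toy graph. It deliberately uses classical non-neural baselines instead of a GNN: majority vote over neighbor labels for node classification, a common-neighbors count for link prediction, and connected components for clustering. The graph, the labels, and all function names are hypothetical and chosen only for illustration; a real GNN would learn these predictions from node features rather than apply fixed rules.

```python
from collections import Counter

# Toy undirected graph as an adjacency dict (hypothetical data).
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
LABELS = {"a": "x", "b": "x", "e": "y"}  # nodes c and d are unlabeled


def predict_label(graph, labels, node):
    """Node classification baseline: majority label among labeled neighbors."""
    votes = Counter(labels[n] for n in graph[node] if n in labels)
    return votes.most_common(1)[0][0] if votes else None


def common_neighbors(graph, u, v):
    """Link prediction baseline: score a candidate edge by shared neighbors."""
    return len(graph[u] & graph[v])


def connected_components(graph):
    """Graph clustering baseline: group nodes that are reachable from each other."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(graph[node] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters
```

For example, `predict_label(GRAPH, LABELS, "c")` votes over the labeled neighbors `a` and `b`, and `common_neighbors(GRAPH, "a", "d")` counts the single shared neighbor `c`. A higher common-neighbors score suggests a more likely missing edge.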