Keyword Extraction, Topic Modeling, Sentence Embedding
This article delves into Natural Language Processing (NLP) techniques and tools, discussing methods like keyword extraction, topic modeling, and summarization. It explores popular libraries such as AllenNLP-models, BERT Lang Street, deepmatch, fuzzywuzzy, stopwordsISO, sumy, and pyTextrank, which can be utilized for various NLP tasks.
language models
recommendation
fuzzy search
fzf a command-line fuzzy finder
iterfzf a Python binding for fzf, and its related projects
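To give a rough idea of what a fuzzy matcher does, here is a minimal sketch of subsequence-style filtering: a candidate matches when the query characters appear in it in order, not necessarily adjacent. This is not fzf's actual matching or scoring algorithm. The helper names `is_subsequence` and `fuzzy_filter` are hypothetical, and ranking by length is only a crude stand-in for real relevance scoring.

```python
def is_subsequence(query: str, candidate: str) -> bool:
    """Return True if every character of query appears in candidate, in order."""
    remaining = iter(candidate.lower())
    # `ch in remaining` consumes the iterator, which enforces the ordering.
    return all(ch in remaining for ch in query.lower())


def fuzzy_filter(query: str, candidates: list[str]) -> list[str]:
    """Keep matching candidates, shortest first (a crude relevance proxy)."""
    matches = [c for c in candidates if is_subsequence(query, c)]
    return sorted(matches, key=len)


files = ["README.md", "setup.py", "src/main.py", "docs/summary.md"]
result = fuzzy_filter("smpy", files)
print(result)  # ['src/main.py']
```

A tool like fzf adds interactive selection and a much more careful ranking on top of this basic idea.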
stopwords
from nltk.corpus import stopwords
stopwordsiso in Python
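Whatever the source of the list, stopword removal itself is a simple filter. The sketch below uses a tiny hand-written stopword set, which is an assumption made to keep the example self-contained. In practice the set would come from a library, for example `stopwords.words("english")` in nltk once the stopwords corpus has been downloaded. `remove_stopwords` is a hypothetical helper name.

```python
import re

# Tiny illustrative stopword set (an assumption for this sketch); a real
# list would be loaded from a library such as nltk or stopwordsiso.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def remove_stopwords(text: str, stopwords: set[str] = STOPWORDS) -> list[str]:
    """Lowercase the text, tokenize on letter runs, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in stopwords]


tokens = remove_stopwords("The quick brown fox is in the garden of a castle.")
print(tokens)  # ['quick', 'brown', 'fox', 'garden', 'castle']
```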
summarization
sumy Simple library and command line utility for extracting summary from HTML pages or plain texts
pytextrank Python implementation of TextRank as a spaCy pipeline extension, for graph-based natural language work plus related knowledge graph practices; used for phrase extraction and lightweight extractive summarization of text documents
summa TextRank implementation for text summarization and keyword extraction in Python 3, with optimizations on the similarity function.
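The TextRank idea behind pytextrank and summa can be sketched in plain Python: build a graph whose nodes are sentences, weight the edges by word overlap, run a PageRank-style iteration, and keep the top-ranked sentences. This is a simplified illustration, not the implementation used by either library. The sentence splitter, the set-based overlap similarity, and the function names are assumptions made for the example, and no stopword removal is applied.

```python
import math
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter on ., ! and ? (real tools use a proper tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a: set[str], b: set[str]) -> float:
    """Word overlap normalized by the log of each sentence's vocabulary size."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # log(1) == 0 would make the denominator vanish
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank_summary(text: str, n: int = 2, damping: float = 0.85,
                     iters: int = 50) -> list[str]:
    sentences = split_sentences(text)
    words = [set(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    size = len(sentences)
    # Symmetric weighted graph: edge weight = similarity of two sentences.
    weights = [[similarity(words[i], words[j]) if i != j else 0.0
                for j in range(size)] for i in range(size)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * size
    for _ in range(iters):
        # Weighted PageRank update; isolated sentences pass on no score.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(size) if out_sum[j] > 0
            )
            for i in range(size)
        ]
    top = sorted(range(size), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]  # restore original order


text = ("Cats chase mice in the barn. Mice hide from cats in the barn. "
        "Cats and mice live in the barn together. Bananas are yellow.")
summary = textrank_summary(text, n=2)
print(summary)
```

The last sentence shares no words with the others, so it receives only the baseline score and is left out of the two-sentence summary.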
keyword extraction
rake-nltk RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm that tries to determine key phrases in a body of text by analyzing how often each word appears and how often it co-occurs with other words in the text.
multi-rake Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python
yake Unsupervised Approach for Automatic Keyword Extraction using Text Features
keybert uses sentence-transformers embeddings to find the keywords and phrases most similar to the document
pke Python Keyphrase Extraction module
import jieba.analyse as ana
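The frequency and co-occurrence scoring that RAKE relies on can be sketched without any library. Candidate phrases are runs of words between stopwords and punctuation, each word is scored as degree divided by frequency, and a phrase scores the sum of its word scores. This is a minimal illustration under stated assumptions, not the rake-nltk or multi-rake implementation: the small `STOPWORDS` set and the function names are made up for the example.

```python
import re
from collections import defaultdict

# Small illustrative stopword set (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def candidate_phrases(text: str, stopwords: set[str]) -> list[list[str]]:
    """Split text into runs of content words, breaking at stopwords and punctuation."""
    phrases = []
    for fragment in re.split(r"[^\w\s-]+", text.lower()):  # punctuation ends a phrase
        current = []
        for word in fragment.split():
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text: str, stopwords: set[str] = STOPWORDS) -> list[tuple[str, float]]:
    phrases = candidate_phrases(text, stopwords)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences in the phrase, itself included
    word_score = {w: degree[w] / freq[w] for w in freq}
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


ranked = rake("Keyword extraction finds the key phrases of a document. "
              "Rapid automatic keyword extraction is a simple method.")
print(ranked[0])  # ('rapid automatic keyword extraction', 15.0)
```

Longer phrases made of words that tend to appear inside long phrases score highest, which is why the four-word phrase ranks first here.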