Keyword Extraction, Topic Modeling, Sentence Embedding

NLP

keyword extraction

topic modeling

summarization

AllenNLP-models

BERT Lang Street

deepmatch

fuzzywuzzy

stopwordsISO

sumy

pyTextrank

This article surveys Natural Language Processing (NLP) techniques and tools for keyword extraction, topic modeling, and summarization. It lists libraries such as AllenNLP-models, BERT Lang Street, deepmatch, fuzzywuzzy, stopwordsISO, sumy, and pyTextrank, grouped by the task each one covers.

Published

October 29, 2022

language models

allennlp-models

bert lang street

recommendation

deepmatch

fuzzy search

fuzzywuzzy, since renamed to thefuzz

fzf, a command-line fuzzy finder

iterfzf, a Python binding for fzf, and its related projects

rapidfuzz
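As a rough illustration of what these fuzzy matchers compute, here is a minimal standard-library sketch of a 0-100 similarity score and a best-match lookup. It uses `difflib.SequenceMatcher` as a stand-in and is not the API of fuzzywuzzy, thefuzz, or rapidfuzz; the names `similarity` and `best_match` are made up for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score from difflib's matching-blocks ratio."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def best_match(query: str, choices: list[str]) -> tuple[str, int]:
    """Return the choice most similar to the query, together with its score."""
    scored = ((choice, similarity(query, choice)) for choice in choices)
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    print(best_match("appel", ["apple", "banana", "grape"]))
```

The dedicated libraries add faster edit-distance implementations and extra scorers (partial and token-based matching), so treat this only as a picture of the basic idea.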

stopwords

from nltk.corpus import stopwords

stopwordsiso, the Stopwords ISO lists packaged for Python
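Whichever list is used, stopword removal itself is a simple filter. The sketch below assumes a tiny hand-written stopword set so that it runs without downloads; `STOPWORDS` and `remove_stopwords` are illustrative names, and in practice a full list such as nltk's `stopwords.words("english")` would likely take its place.

```python
import re

# Tiny illustrative stopword set (an assumption for this sketch);
# swap in a full list from nltk or stopwordsiso for real use.
STOPWORDS = {"a", "an", "and", "the", "of", "is", "in", "to", "for"}


def remove_stopwords(text: str, stopwords: set[str] = STOPWORDS) -> list[str]:
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token not in stopwords]


if __name__ == "__main__":
    print(remove_stopwords("The quick extraction of keywords is easy"))
```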

summarization

sumy Simple library and command line utility for extracting summary from HTML pages or plain texts

pytextrank Python implementation of TextRank as a spaCy pipeline extension, for graph-based natural language work plus related knowledge graph practices; used for phrase extraction and lightweight extractive summarization of text documents

summa TextRank implementation for text summarization and keyword extraction in Python 3, with optimizations on the similarity function.
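The TextRank idea behind pytextrank and summa can be sketched in plain Python: build a graph whose nodes are sentences, weight edges by word overlap, run a PageRank-style iteration, and keep the top-ranked sentences. This is a simplified illustration under stated assumptions (naive sentence splitting, overlap on unique words, a fixed number of iterations); it is not the implementation or API of either library, and every function name here is hypothetical.

```python
import math
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def overlap_similarity(a: set[str], b: set[str]) -> float:
    """Shared-word count normalised by the log of each sentence's vocabulary size."""
    if len(a) < 2 or len(b) < 2:
        return 0.0
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank_summary(text: str, n: int = 2, damping: float = 0.85,
                     iterations: int = 50) -> list[str]:
    """Return the n highest-ranked sentences, in their original order."""
    sentences = split_sentences(text)
    words = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    size = len(sentences)
    # Weighted, undirected sentence graph stored as a dense matrix.
    weights = [
        [overlap_similarity(words[i], words[j]) if i != j else 0.0
         for j in range(size)]
        for i in range(size)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * size
    for _ in range(iterations):
        # Weighted PageRank update: each sentence receives a share of its
        # neighbours' scores proportional to the edge weight.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(size) if out_sum[j] > 0
            )
            for i in range(size)
        ]
    top = sorted(range(size), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]
```

Sentences that share vocabulary with many others rise to the top, which is why an off-topic sentence tends to be dropped first.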

keyword extraction

rake-nltk RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm that tries to determine key phrases in a body of text by analyzing the frequency of word appearance and its co-occurrence with other words in the text.

multi-rake Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python

yake Unsupervised Approach for Automatic Keyword Extraction using Text Features
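To make the RAKE description above concrete, here is a minimal sketch of its scoring: candidate phrases are the runs of words between stopwords and punctuation, each word is scored as degree divided by frequency, and a phrase scores the sum of its words. The stopword set and the name `rake_keywords` are assumptions for illustration; this is not the rake-nltk or multi-rake API.

```python
import re
from collections import defaultdict

# Small illustrative stopword set (an assumption); real use needs a full list.
STOPWORDS = {"a", "an", "and", "the", "of", "is", "in", "to", "for", "by", "with"}


def rake_keywords(text: str, stopwords: set[str] = STOPWORDS,
                  top_k: int = 5) -> list[tuple[str, float]]:
    """Rank candidate phrases with RAKE-style degree/frequency word scores."""
    # 1. Candidate phrases: split at punctuation, then at stopwords.
    phrases: list[list[str]] = []
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current: list[str] = []
        for word in fragment.split():
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)

    # 2. Word scores: degree (co-occurrences within phrases, counting the
    #    word itself) divided by frequency.
    freq: dict[str, int] = defaultdict(int)
    degree: dict[str, int] = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {word: degree[word] / freq[word] for word in freq}

    # 3. Phrase score: sum of the scores of its words.
    phrase_scores = {
        " ".join(phrase): sum(word_score[word] for word in phrase)
        for phrase in phrases
    }
    return sorted(phrase_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

Because degree grows with phrase length, longer phrases made of words that co-occur with many others score highest, which is the behaviour RAKE is known for.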

tutorial and libraries

keybert uses sentence-transformers embeddings to pick the words and phrases most similar to the document

kwx

pke Python Keyphrase Extraction module

import jieba.analyse as ana
# methods under ana:
# ['analyzer', 'default_textrank', 'default_tfidf', 'extract_tags', 'set_idf_path', 'set_stop_words', 'textrank', 'tfidf']
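The `tfidf` and `extract_tags` names above refer to TF-IDF keyword ranking. As a rough picture of that technique, the sketch below scores terms by term frequency times a smoothed inverse document frequency computed from a small reference corpus. It is a standard-library illustration with an English regex tokenizer, not jieba's implementation; `tokenize` and `tfidf_keywords` are hypothetical names, and the choice of smoothing is an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(doc: str, corpus: list[str],
                   top_k: int = 3) -> list[tuple[str, float]]:
    """Rank the terms of doc by TF-IDF against a reference corpus."""
    doc_tokens = tokenize(doc)
    if not doc_tokens:
        return []
    tf = Counter(doc_tokens)
    corpus_vocab = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        # Document frequency: number of corpus documents containing the term.
        df = sum(1 for vocab in corpus_vocab if term in vocab)
        # Smoothed IDF so unseen terms do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[term] = count / len(doc_tokens) * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

Terms that are frequent in the document but rare in the corpus rank first; common words such as "the" still score unless a stopword list is applied beforehand.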