This article surveys Natural Language Processing (NLP) techniques and tools, covering language models, fuzzy matching for recommendation, stopword lists, summarization, and keyword extraction. It lists libraries such as allennlp-models, BERT Lang Street, deepmatch, fuzzywuzzy, stopwordsiso, sumy, and pytextrank, which can be used for these tasks.

language models

allennlp-models

bert lang street

recommendation

deepmatch

fuzzywuzzy or thefuzz

fzf, a command-line fuzzy finder

iterfzf, a Python binding for fzf, and its related projects

rapidfuzz
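
The fuzzy-matching libraries above all score how closely two strings resemble each other. As a rough, dependency-free sketch of the idea, the example below uses Python's standard-library difflib instead of any of the listed packages. The helper names similarity and best_match and the sample titles are hypothetical, and the scores will not necessarily agree with those produced by fuzzywuzzy or rapidfuzz.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score between two strings, ignoring case."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def best_match(query: str, choices: list[str]) -> tuple[str, int]:
    """Return the choice most similar to the query, together with its score."""
    scored = ((choice, similarity(query, choice)) for choice in choices)
    return max(scored, key=lambda pair: pair[1])


# Made-up sample data: look up a misspelled title in a small catalogue.
titles = ["The Matrix", "The Martian", "Mad Max: Fury Road"]
match, score = best_match("the matrx", titles)
```

For large candidate lists a dedicated library is likely the better choice, since this sketch compares the query against every candidate in pure Python.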

stopwords

from nltk.corpus import stopwords

stopwordsiso in python

summarization

sumy Simple library and command line utility for extracting summary from HTML pages or plain texts

pytextrank Python implementation of TextRank as a spaCy pipeline extension, for graph-based natural language work plus related knowledge graph practices; used for phrase extraction and lightweight extractive summarization of text documents

summa TextRank implementation for text summarization and keyword extraction in Python 3, with optimizations on the similarity function.
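
To make the graph-based idea behind the TextRank libraries above more concrete, here is a minimal pure-Python sketch of extractive summarization. It is not the implementation used by sumy, pytextrank, or summa: the sentence splitter is naive, the similarity is the plain word-overlap measure from the original TextRank formulation (without summa's optimizations), and the function names and default parameters are assumptions chosen for illustration.

```python
import math
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def overlap(a: set[str], b: set[str]) -> float:
    """Number of shared words, normalised by the log of both sentence lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids log(1) == 0 in the denominator
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank_summary(text: str, n: int = 1, damping: float = 0.85, iters: int = 50) -> list[str]:
    """Return the n highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    words = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    size = len(sentences)
    # Weighted, undirected graph: one node per sentence, edges weighted by overlap.
    weights = [
        [overlap(words[i], words[j]) if i != j else 0.0 for j in range(size)]
        for i in range(size)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * size
    # Fixed number of PageRank-style update rounds.
    for _ in range(iters):
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(size)
                if out_sum[j] > 0
            )
            for i in range(size)
        ]
    top = sorted(range(size), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]
```

Calling textrank_summary(document, n=3) would return the three sentences that share the most vocabulary with the rest of the text, which is why off-topic sentences tend to be dropped.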

keyword extraction

rake-nltk RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm that tries to determine key phrases in a body of text by analyzing the frequency of word appearance and its co-occurrence with other words in the text.

multi-rake Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python

yake Unsupervised Approach for Automatic Keyword Extraction using Text Features
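
The RAKE description above (word frequency combined with co-occurrence) can be sketched in a few lines of pure Python. This is an illustrative approximation, not the code of rake-nltk or multi-rake: the tiny STOPWORDS set is an assumption standing in for a real stopword list, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real use needs a fuller list).
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def candidate_phrases(text: str) -> list[list[str]]:
    """Split text into runs of consecutive non-stopwords, breaking at punctuation."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text: str) -> list[tuple[str, float]]:
    """Score each candidate phrase as the sum of degree/frequency over its words."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word appears
    degree = defaultdict(int)  # co-occurrence: total length of the phrases containing the word
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {word: degree[word] / freq[word] for word in freq}
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)
```

Because each word is scored by degree divided by frequency, words that mostly appear inside longer phrases score higher, so multi-word key phrases tend to rank above isolated words.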

tutorial and libraries

keybert uses sentence-transformers embeddings to find the keywords and phrases most similar to the document

kwx

pke Python Keyphrase Extraction module

import jieba.analyse as ana
# methods under ana:
# ['analyzer', 'default_textrank', 'default_tfidf', 'extract_tags', 'set_idf_path', 'set_stop_words', 'textrank', 'tfidf']
