NumPy and Pandas Deduplication

Numpy
Duplicate removal
Arrays
np.unique()
Pandas
Drop_duplicates
Dupandas
This article explains how to remove duplicates from NumPy arrays using np.unique(). It also points to dupandas for deduplication with custom matching rules and to the pandas drop_duplicates function.
Published

September 17, 2022


NumPy: remove duplicates from an array

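import numpy as np
print(np.unique(ar, axis=1))  # ar: existing 2-D array; axis=1 removes duplicate columns, axis=0 removes duplicate rows; the result is sorted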
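
To make the effect of the axis argument concrete, here is a minimal sketch on made-up data. The arrays `flat` and `ar` and the result names are assumptions chosen for illustration. It also shows one way to keep first-occurrence order, since np.unique returns sorted output.

```python
import numpy as np

# Hypothetical sample data, for illustration only
flat = np.array([3, 1, 2, 3, 1])
ar = np.array([[1, 2, 1],
               [3, 4, 3]])

# 1-D input: sorted unique values
unique_flat = np.unique(flat)

# 2-D input, axis=1: duplicate columns are removed (columns 0 and 2 are identical)
unique_cols = np.unique(ar, axis=1)

# 2-D input, axis=0: duplicate rows are removed (there are none in this array)
unique_rows = np.unique(ar, axis=0)

# np.unique sorts its result; to keep the order of first occurrence,
# take the first-occurrence indices and sort those instead
_, first_idx = np.unique(flat, return_index=True)
in_order = flat[np.sort(first_idx)]

print(unique_flat)
print(unique_cols)
print(unique_rows)
print(in_order)
```

Without an axis argument, np.unique flattens a multi-dimensional array first, so pass axis=0 or axis=1 whenever whole rows or columns are the unit of comparison.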

dupandas: remove duplicates using custom matching rules such as Levenshtein distance, spelling differences, and phonetics (fuzzy matching). It most likely targets English text.

pip install dupandas
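
The dupandas API is not shown in this note, so the following is not dupandas code. It is a minimal pure-Python sketch of the underlying idea, deduplicating strings by Levenshtein distance. The names `levenshtein`, `fuzzy_dedupe`, and `max_distance`, as well as the sample list, are hypothetical and chosen only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca with cb
        prev = curr
    return prev[-1]


def fuzzy_dedupe(values, max_distance=1):
    """Keep a value only if it is farther than max_distance from every value kept so far."""
    kept = []
    for v in values:
        # Case-insensitive comparison against everything already kept
        if all(levenshtein(v.lower(), k.lower()) > max_distance for k in kept):
            kept.append(v)
    return kept


# Hypothetical sample data with typos and case differences
names = ["Samsung", "samsung", "Samsnug", "Apple", "Aple", "Nokia"]
deduped = fuzzy_dedupe(names, max_distance=2)
print(deduped)
```

This greedy approach compares every value against every kept value, so it is only practical for small lists. A dedicated library is a better fit for large tables or for phonetic matching.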

pandas drop_duplicates

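# Keep only the last row for each (brand, style) pair; returns a new DataFrame and leaves df unchanged
df.drop_duplicates(subset=['brand', 'style'], keep='last')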
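
To show how the keep argument changes the result, here is a small sketch on a made-up DataFrame. The column values and the variable names `first`, `last`, and `none` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4.0, 4.0, 3.5, 15.0, 5.0],
})

# keep='first' (the default): keep the first row of each (brand, style) group
first = df.drop_duplicates(subset=["brand", "style"])

# keep='last': keep the last row of each group
last = df.drop_duplicates(subset=["brand", "style"], keep="last")

# keep=False: drop every row that has a duplicate
none = df.drop_duplicates(subset=["brand", "style"], keep=False)

print(first)
print(last)
print(none)
```

The original index labels are preserved in each result; pass ignore_index=True if a fresh 0..n-1 index is preferred.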