NumPy and pandas Deduplication
This article explains how to remove duplicates from NumPy arrays using np.unique(). It also points to dupandas for deduplication with custom rules and to the pandas drop_duplicates function.
NumPy: remove duplicates from an array
import numpy as np
print(np.unique(ar, axis=1))  # ar is a 2-D array; axis=1 drops duplicate columns (axis=0 drops duplicate rows)
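The one-liner above assumes an existing 2-D array named ar. As a minimal sketch of how the axis argument changes the result, the example below uses a small made-up array (the values are purely illustrative) and compares the flattened, row-wise, and column-wise calls. Note that np.unique returns its result sorted, so the original order is not preserved.

```python
import numpy as np

# Illustrative 2-D array: row 0 equals row 2, and column 0 equals column 2.
ar = np.array([[1, 2, 1],
               [3, 4, 3],
               [1, 2, 1]])

# No axis: the array is flattened and the sorted unique values are returned.
flat_unique = np.unique(ar)

# axis=0: duplicate rows are removed.
unique_rows = np.unique(ar, axis=0)

# axis=1: duplicate columns are removed (this is what the snippet above does).
unique_cols = np.unique(ar, axis=1)

print(flat_unique)
print(unique_rows)
print(unique_cols)
```

For a plain 1-D array, np.unique(ar) without an axis is usually what "remove duplicates" means.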
dupandas: remove duplicates with custom rules such as Levenshtein distance, spelling differences, and phonetics (fuzzy matching). It most likely targets English text only.
pip install dupandas
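To give a feel for what fuzzy deduplication means, here is a rough sketch that does not use dupandas at all. It relies on difflib.SequenceMatcher from the Python standard library, whose similarity ratio is a stand-in for, not the same thing as, Levenshtein distance. The function name fuzzy_dedupe, the 0.85 threshold, and the sample names are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_dedupe(values, threshold=0.85):
    """Keep the first occurrence of each group of near-identical strings.

    Hypothetical helper: two strings count as duplicates when their
    lowercased, stripped forms have a SequenceMatcher ratio >= threshold.
    """
    kept = []
    for value in values:
        normalized = value.strip().lower()
        is_duplicate = any(
            SequenceMatcher(None, normalized, k.strip().lower()).ratio() >= threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(value)
    return kept


names = ["Samsung Galaxy", "samsung galaxy", "Samsung Galxy", "Apple iPhone"]
result = fuzzy_dedupe(names)
print(result)
```

This pairwise comparison is quadratic in the number of values, so it is only reasonable for small inputs. A dedicated library would be the better choice for real data.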
pandas drop_duplicates: remove rows that repeat in the selected columns, keeping the last occurrence
df.drop_duplicates(subset=['brand', 'style'], keep='last')
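The drop_duplicates call assumes a DataFrame df with brand and style columns. A minimal runnable sketch follows, with a small made-up table (the rating column and all values are illustrative). It also shows that the method returns a new DataFrame and leaves df untouched unless the result is assigned back.

```python
import pandas as pd

# Illustrative data: some (brand, style) pairs occur more than once.
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4.0, 4.0, 3.5, 15.0, 5.0],
})

# Rows count as duplicates when both brand and style match;
# keep='last' retains the final occurrence of each pair.
deduped = df.drop_duplicates(subset=["brand", "style"], keep="last")

print(deduped)
print(len(df), len(deduped))  # the original df is not modified
```

With keep='first' (the default) the earliest row of each pair is retained instead, and keep=False drops every row that has a duplicate.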