ML | Blog of James Brown

2022-08-09

Awesome-Data-Labeling

A curated list of awesome data labeling tools

Images

labelImg - LabelImg is a graphical image annotation tool for labeling object bounding boxes in images
CVAT - Powerful and efficient Computer Vision Annotation Tool
labelme - Image Polygonal Annotation with Python
VoTT - An open source annotation and labeling tool for image and video assets
imglab - A web based tool to label images for objects that can be used to train dlib or other object detectors
Yolo_mark - GUI for marking bounding boxes of objects in images for training YOLO v3 and v2 neural networks
PixelAnnotationTool - Software that allows you to manually and quickly annotate images in directories
OpenLabeling - Label images and video for Computer Vision applications
imagetagger - An open source online platform for collaborative image labeling
Alturos.ImageAnnotation - A collaborative tool for labeling image data
deeplabel - A cross-platform image annotation tool for machine learning
MedTagger - A collaborative framework for annotating medical datasets using crowdsourcing.
Labelbox - Labelbox is the fastest way to annotate data to build and ship computer vision applications
turktool - A modern React app for scalable bounding box annotation of images
Pixie - Pixie is a GUI annotation tool which provides the bounding box, polygon, free drawing and semantic segmentation object labelling
OpenLabeler - OpenLabeler is an open source desktop application for annotating objects for AI applications
Anno-Mage - A semi-automatic image annotation tool that helps you annotate images by suggesting annotations for 80 object classes using a pre-trained model
CATMAID - Collaborative Annotation Toolkit for Massive Amounts of Image Data
make-sense - makesense.ai is a free to use online tool for labelling photos
LOST - Design your own smart Image Annotation process in a web-based environment
Annotorious - A JavaScript library for image annotation.
Sloth - Tool for labeling image and video data for computer vision research.

Text

YEDDA - A Lightweight Collaborative Text Span Annotation Tool (Chunking, NER, etc.). ACL best demo nomination.
ML-Annotate - Label text data for machine learning purposes. ML-Annotate supports binary, multi-label and multi-class labeling.
TagEditor - Annotation tool for spaCy
SMART - Smarter Manual Annotation for Resource-constrained collection of Training data
PIAF - A Question-Answering annotation tool

Audio

EchoML - Play, visualize, and annotate your audio files
audio-annotator - A JavaScript interface for annotating and labeling audio files.
audio-labeler - An in-browser app for labeling audio clips at random, using Docker and Flask.
wavesurfer.js - Simple annotation tool; check the example.
peaks.js - Browser-based audio waveform visualisation and UI component for interacting with audio waveforms, developed by BBC UK.
Praat - Doing Phonetics By Computer
Aubio - Tool designed for the extraction of annotations from audio signals.

Video

UltimateLabeling - A multi-purpose Video Labeling GUI in Python with integrated SOTA detector and tracker
VATIC - VATIC is an online video annotation tool for computer vision research that crowdsources work to Amazon’s Mechanical Turk.

Time Series

Curve - Curve is an open-source tool to help label anomalies on time-series data
TagAnomaly - Anomaly detection analysis and labeling tool, specifically for multiple time series (one time series per category)
time-series-annotator - The CrowdCurio Time Series Annotation Library implements classification tasks for time series.
WDK - The Wearables Development Toolkit (WDK) is a set of tools to facilitate the development of activity recognition applications with wearable devices.

3D

webKnossos - webKnossos is an open-source web-based tool for visualizing, annotating, and sharing large 3D image datasets. It features fast 3D data browsing, skeleton (line-segment) annotations, segmentation and proof-reading tools, mesh visualization, and collaboration features. The public instance webknossos.org hosts a collection of published datasets and can be used without a local setup.
KNOSSOS - KNOSSOS is a software tool for the visualization and annotation of 3D image data and was developed for the rapid reconstruction of neural morphology and connectivity.

Lidar

semantic-segmentation-editor - Web labelling tool for camera and LIDAR data

MultiDomain

Label Studio - Label Studio is a configurable data annotation tool that works with different data types
Dataturks - Dataturks supports E2E tagging of data items like video, images (classification, segmentation and labelling) and text (full length document annotations for PDF, Doc, Text etc) for ML projects.

2022-08-07

Opennlp, Fastai And Other Machine Learning Platforms

jax

docs

autograd and xla (Accelerated Linear Algebra)

With its updated version of Autograd, JAX can automatically differentiate native Python and NumPy functions. It can differentiate through loops, branches, recursion, and closures, and it can take derivatives of derivatives of derivatives. It supports reverse-mode differentiation (a.k.a. backpropagation) via grad as well as forward-mode differentiation, and the two can be composed arbitrarily to any order.
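A minimal sketch of the claims above, assuming a recent JAX install: `jax.grad` gives reverse mode, nesting it gives higher derivatives, and `jax.jvp` pushes a tangent forward through the reverse-mode gradient (forward-over-reverse). The function `f` is a made-up example: f(x) = x + x^2 + x^3, so at x = 2 the first three derivatives are 17, 14 and 6.

```python
import jax


def f(x):
    # A plain Python loop: f(x) = x + x**2 + x**3
    total = 0.0
    for k in range(1, 4):
        total = total + x ** k
    # A plain Python branch; allowed under jax.grad (it would fail under jax.jit)
    if x < 0:
        total = -total
    return total


df = jax.grad(f)                       # reverse mode (backpropagation)
d2f = jax.grad(jax.grad(f))            # derivative of a derivative
d3f = jax.grad(jax.grad(jax.grad(f)))  # derivative of a derivative of a derivative

# Forward mode composed with reverse mode: push tangent 1.0 through df at x = 2.0
_, d2f_fwd = jax.jvp(df, (2.0,), (1.0,))

print(f(2.0), df(2.0), d2f(2.0), d3f(2.0), d2f_fwd)
```

Both routes to the second derivative (`grad` of `grad`, and `jvp` of `grad`) should agree, which is the "composed arbitrarily to any order" point.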

XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear algebra that can accelerate TensorFlow models with potentially no source code changes.
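In JAX, XLA is reached through `jax.jit`: the first call traces the Python function and hands the whole computation to XLA, which is free to fuse the matrix multiply, bias add and `tanh` below into fewer kernels. A minimal sketch; the layer function and shapes are made up, and the printed text is the lowered program (StableHLO in recent JAX versions).

```python
import jax
import jax.numpy as jnp


def dense_layer(x, w, b):
    # matmul + bias + activation: candidates for fusion by XLA
    return jnp.tanh(x @ w + b)


dense_layer_jit = jax.jit(dense_layer)

key_x, key_w = jax.random.split(jax.random.PRNGKey(0))
x = jax.random.normal(key_x, (4, 3))
w = jax.random.normal(key_w, (3, 2))
b = jnp.zeros(2)

eager = dense_layer(x, w, b)         # op-by-op execution
compiled = dense_layer_jit(x, w, b)  # first call compiles; later calls with same shapes reuse it

# Peek at the program JAX hands to XLA
print(dense_layer_jit.lower(x, w, b).as_text()[:300])
```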

pyro

probabilistic programming

getting started

examples

sample code

numpyro

getting started

pyro implemented on NumPy, with JAX for autodiff and JIT compilation; alpha stage at the time of writing

scikit-learn

machine learning in python
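The basic scikit-learn workflow is the same across estimators: construct, `fit`, then `predict` or `score`. A minimal sketch on the bundled iris dataset; the choice of `LogisticRegression` and the split parameters are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Standardize features, then fit a multi-class logistic regression
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)  # mean accuracy on held-out data
print(f"test accuracy: {accuracy:.3f}")
```

Wrapping the scaler and model in one pipeline keeps the scaling statistics learned from the training split only.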

libsvm

install the official Python bindings:

pip install -U libsvm-official

or install the third-party Python libsvm package:

pip install libsvm

opennlp

hands-on docs

model zoo

opennlp may use ONNX Runtime (not confirmed), which could allow inference on Apple M1.

opennlp is written in java. after installing openjdk on macos with homebrew, run this to ensure openjdk is detected:

sudo ln -sfn $(brew --prefix)/opt/openjdk/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk.jdk

opennlp has a language detector for 103 languages, including Chinese. it also has a sentence detector (sentence splitter), which could possibly be trained on Chinese text.

to use opennlp with less code, call it from Kotlin, which can invoke Java libraries directly.

dl4j

deep learning library for Java

found in a Manning article about better search engine suggestions. in that example it is used with Lucene, which gains image retrieval capability through LIRE (Lucene Image Retrieval). Lucene is also available for .NET/C# as Lucene.NET.

to install Lucene.NET:

dotnet add package Lucene.Net --prerelease

xgboost

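a gradient boosting library: it trains an ensemble of decision trees, each new tree fitted to the errors of the ensemble built so far, for classification and regression models.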
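To make the idea concrete, here is a from-scratch sketch of gradient boosting for squared-error regression on a single feature, using depth-1 trees (stumps) as the weak learners. It only illustrates the principle; it is not XGBoost's implementation, which adds regularization, second-order gradient information and much faster split finding. The function names `fit_stump` and `gradient_boost` are hypothetical.

```python
import numpy as np


def fit_stump(x, r):
    """Hypothetical helper: the single threshold on x that best predicts r (least squares)."""
    best = None
    for t in np.unique(x)[:-1]:
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        err = ((r - np.where(left, lv, rv)) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Hypothetical helper: each round fits a stump to the residuals y - pred,
    which are the negative gradient of the loss 0.5 * (y - pred)**2."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred
        stump = fit_stump(x, residual)
        pred = pred + learning_rate * stump(x)  # shrunken step toward the residuals
        stumps.append(stump)
    return lambda z: base + learning_rate * sum(s(z) for s in stumps)


x = np.linspace(0.0, 1.0, 50)
y = np.where(x > 0.5, 1.0, 0.0)  # a step function the ensemble has to learn

model = gradient_boost(x, y)
print(np.mean((model(x) - y) ** 2))  # training MSE, far below the 0.25 of predicting the mean
```

The learning rate shrinks each tree's contribution, so the ensemble approaches the target gradually instead of overfitting to any single tree.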

lightgbm

Light Gradient Boosting Machine

lightgbm has an official command-line tool. to install it on macOS:

brew install lightgbm

to install the Python package on macOS:

brew install cmake
pip3 install lightgbm

pymc

examples

to enable JAX-based sampling, install numpyro or blackjax via pip.

difference between pymc3 (old) and pymc (pymc4):

pymc is optimized and faster than pymc3

pymc3 uses Theano as its backend, while pymc uses Aesara (a fork of Theano)

docs with live demo of pymc

PyMC is a probabilistic programming library for Python that allows users to build Bayesian models with a simple Python API and fit them using Markov chain Monte Carlo (MCMC) methods.

fastai

a high-level PyTorch wrapper including “out of the box” support for vision, text, tabular, and collab (collaborative filtering) models.

docs

courses

fastai was spotted on the opennlp-related Twitter list shown on opennlp's official website.

fastai reportedly did not support macOS, but maybe it does now: fastai is built on top of PyTorch, initial support starts with version 2.7.8, and the current version is 2.7.9.

searching GitHub for ‘samoyed’ turns up Imagewoof, a dog-breed classification dataset from the fastai 2020 tutorial series. more image classes, such as subcategories of cats, may be found in ImageNet.

2022-05-24

AI Training Set Annotation Tools

text annotation tool:

https://github.com/doccano/doccano

install with the SQLite 3 backend:

pip3 install doccano

video/image annotation tool, needs docker, with online demo:

https://github.com/openvinotoolkit/cvat

image labeling:

https://github.com/heartexlabs/labelImg

with audio video support

https://github.com/heartexlabs/label-studio

with audio transcription support

https://github.com/UniversalDataTool/universal-data-tool

image and video

https://github.com/Cartucho/OpenLabeling

specialized for yolo bounding boxes

https://github.com/developer0hye/Yolo_Label
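YOLO-style labeling tools such as Yolo_mark save one text file per image with one line per box, `class_id x_center y_center width height`, where all four coordinates are normalized by the image width and height (assumed here to apply to Yolo_Label too). A minimal sketch of converting pixel-space corner boxes to and from that format; the function names and example numbers are hypothetical.

```python
def to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Hypothetical helper: pixel corners -> YOLO line with normalized center and size."""
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def from_yolo_line(line, img_w, img_h):
    """Hypothetical helper: YOLO line -> (class_id, xmin, ymin, xmax, ymax) in pixels."""
    class_id, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


line = to_yolo_line(0, 100, 50, 300, 250, img_w=640, img_h=480)
print(line)  # 0 0.312500 0.312500 0.312500 0.416667
```

Rounding to six decimals means the round trip back to pixels is only approximate, which is usually fine for box annotations.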

ML

2022-08-09

Awesome-Data-Labeling

Images

Text

Audio

Video

Time Series

3D

Lidar

MultiDomain

2022-08-07

Opennlp, Fastai And Other Machine Learning Platforms

jax

pyro

numpyro

scikit-learn

libsvm

opennlp

dl4j

xgboost

lightgbm

pymc

fastai

2022-05-24

AI Training Set Annotation Tools

text annotation tool:

video/image annotation tool, needs docker, with online demo:

image labeling:

with audio video support

with audio transcription support

image and audio

specialized for yolo bounding boxes

Links
