2022-08-09
Awesome-Data-Labeling

A curated list of awesome data labeling tools

Images

  • labelImg - LabelImg is a graphical image annotation tool for labeling object bounding boxes in images

  • CVAT - Powerful and efficient Computer Vision Annotation Tool

  • labelme - Image Polygonal Annotation with Python

  • VoTT - An open source annotation and labeling tool for image and video assets

  • imglab - A web based tool to label images for objects that can be used to train dlib or other object detectors

  • Yolo_mark - GUI for marking bounding boxes of objects in images for training Yolo v3 and v2 neural networks

  • PixelAnnotationTool - Software that allows you to manually and quickly annotate images in directories

  • OpenLabeling - Label images and video for Computer Vision applications

  • imagetagger - An open source online platform for collaborative image labeling

  • Alturos.ImageAnnotation - A collaborative tool for labeling image data

  • deeplabel - A cross-platform image annotation tool for machine learning

  • MedTagger - A collaborative framework for annotating medical datasets using crowdsourcing.

  • Labelbox - Labelbox is the fastest way to annotate data to build and ship computer vision applications

  • turktool - A modern React app for scalable bounding box annotation of images

  • Pixie - Pixie is a GUI annotation tool that provides bounding box, polygon, free drawing and semantic segmentation object labelling

  • OpenLabeler - OpenLabeler is an open source desktop application for annotating objects for AI applications

  • Anno-Mage - A semi-automatic image annotation tool that helps you annotate images by suggesting annotations for 80 object classes using a pre-trained model

  • CATMAID - Collaborative Annotation Toolkit for Massive Amounts of Image Data

  • make-sense - makesense.ai is a free to use online tool for labelling photos

  • LOST - Design your own smart Image Annotation process in a web-based environment

  • Annotorious - A JavaScript library for image annotation.

  • Sloth - Tool for labeling image and video data for computer vision research.

Text

  • YEDDA - A Lightweight Collaborative Text Span Annotation Tool (Chunking, NER, etc.). ACL best demo nomination.

  • ML-Annotate - Label text data for machine learning purposes. ML-Annotate supports binary, multi-label and multi-class labeling.

  • TagEditor - Annotation tool for spaCy

  • SMART - Smarter Manual Annotation for Resource-constrained collection of Training data

  • PIAF - A Question-Answering annotation tool

Audio

  • EchoML - Play, visualize, and annotate your audio files

  • audio-annotator - A JavaScript interface for annotating and labeling audio files.

  • audio-labeler - An in-browser app for labeling audio clips at random, using Docker and Flask.

  • wavesurfer.js - Simple annotations tool, check the example.

  • peaks.js - Browser-based audio waveform visualisation and UI component for interacting with audio waveforms, developed by the BBC.

  • Praat - Doing Phonetics By Computer

  • Aubio - Tool designed for the extraction of annotations from audio signals.

Video

  • UltimateLabeling - A multi-purpose Video Labeling GUI in Python with integrated SOTA detector and tracker

  • VATIC - VATIC is an online video annotation tool for computer vision research that crowdsources work to Amazon’s Mechanical Turk.

Time Series

  • Curve - Curve is an open-source tool to help label anomalies on time-series data

  • TagAnomaly - Anomaly detection analysis and labeling tool, specifically for multiple time series (one time series per category)

  • time-series-annotator - The CrowdCurio Time Series Annotation Library implements classification tasks for time series.

  • WDK - The Wearables Development Toolkit (WDK) is a set of tools to facilitate the development of activity recognition applications with wearable devices.

3D

  • webKnossos - webKnossos is an open-source web-based tool for visualizing, annotating, and sharing large 3D image datasets. It features fast 3D data browsing, skeleton (line-segment) annotations, segmentation and proof-reading tools, mesh visualization, and collaboration features. The public instance webknossos.org hosts a collection of published datasets and can be used without a local setup.

  • KNOSSOS - KNOSSOS is a software tool for the visualization and annotation of 3D image data and was developed for the rapid reconstruction of neural morphology and connectivity.

Lidar

MultiDomain

  • Label Studio - Label Studio is a configurable data annotation tool that works with different data types

  • Dataturks - Dataturks supports end-to-end tagging of data items such as video, images (classification, segmentation and labelling) and text (full-length document annotation for PDF, Doc, Text, etc.) for ML projects.


2022-08-07
Opennlp, Fastai And Other Machine Learning Platforms

jax

docs

autograd and xla (Accelerated Linear Algebra)

With its updated version of Autograd, JAX can automatically differentiate native Python and NumPy functions. It can differentiate through loops, branches, recursion, and closures, and it can take derivatives of derivatives of derivatives. It supports reverse-mode differentiation (a.k.a. backpropagation) via grad as well as forward-mode differentiation, and the two can be composed arbitrarily to any order.
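To make "forward-mode differentiation" concrete, here is a minimal stdlib-only sketch using dual numbers. It is not JAX's API or its implementation; `Dual`, `lift`, `derivative` and `cube` are hypothetical names invented for this illustration. It shows a derivative flowing through an ordinary Python loop, and a derivative of a derivative obtained by nesting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Hypothetical dual number: a value plus its tangent (derivative part)."""
    val: object
    tan: object = 0.0

    def __add__(self, other):
        other = lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = lift(other)
        # product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)

    __rmul__ = __mul__


def lift(x):
    """Treat plain numbers as constants (zero tangent)."""
    return x if isinstance(x, Dual) else Dual(x, 0.0)


def derivative(f):
    """Return a function computing df/dx by seeding the tangent with 1."""
    def df(x):
        return lift(f(Dual(x, 1.0))).tan
    return df


def cube(x):
    # an ordinary Python loop; the tangent is carried through each step
    result = 1.0
    for _ in range(3):
        result = result * x
    return result


first = derivative(cube)                # d/dx x^3   = 3 x^2
second = derivative(derivative(cube))   # d2/dx2 x^3 = 6 x
print(first(3.0), second(3.0))
```

Nesting works for this simple case; a real system such as JAX takes more care to keep the different levels of differentiation apart.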

XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear algebra that can accelerate TensorFlow models with potentially no source code changes.

pyro

probabilistic programming

getting started

examples

sample code

numpyro

getting started

pyro implemented on numpy with a jax backend, alpha stage

scikit-learn

machine learning in python

libsvm

install official python bindings:

pip install -U libsvm-official

install the third-party python libsvm package:

pip install libsvm

opennlp

hands-on docs

model zoo

opennlp may use onnx runtime (unconfirmed), in which case it may support inference on m1.

opennlp is written in java. after installing openjdk on macos with homebrew, run this to ensure openjdk is detected:

sudo ln -sfn $(brew --prefix)/opt/openjdk/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk.jdk

opennlp has a language detector for 103 languages, including chinese. opennlp has a sentence detector (separator) which could be trained on chinese (maybe?)

to use opennlp with less code, invoke its java classes from kotlin

dl4j

deep learning library for java

found in a manning article about better search engine suggestions. in that example it is used with lucene, which gains image retrieval capability through LIRE. lucene is also available as lucene.net for dotnet/c#.

to install lucene.net:

dotnet add package Lucene.Net --prerelease

xgboost

gradient boosting trains an ensemble of decision trees, used for classification and regression models.
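As a rough illustration of the idea behind gradient boosting, the sketch below fits one-split trees (stumps) to the residuals of the current prediction, round after round. It is not xgboost's algorithm or API; `fit_stump`, `fit_boosted` and `predict` are hypothetical names, and it assumes a single numeric feature with squared-error loss.

```python
def fit_stump(xs, residuals):
    """Find the threshold split that minimises squared error on the residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not right:
            continue  # a split must leave points on both sides
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = (sum((r - lv) ** 2 for r in left)
               + sum((r - rv) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]


def fit_boosted(xs, ys, rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        # for squared error, the negative gradient is the residual y - prediction
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + learning_rate * (lv if x <= t else rv)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the scaled contribution of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lv if x <= t else rv for t, lv, rv in stumps)


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
model = fit_boosted(xs, ys)
print(predict(model, 2), predict(model, 5))
```

Each round corrects part of the remaining error, so the learning rate trades the number of rounds against the size of each step.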

lightgbm

Light Gradient Boosting Machine

has official command-line tools. installation on macos:

brew install lightgbm

install python package on macos:

brew install cmake
pip3 install lightgbm

pymc

examples

to enable jax sampling, install numpyro or blackjax via pip

differences between pymc3 (old) and pymc (version 4):

pymc is optimized and faster than pymc3

pymc3 uses theano as its backend, while pymc uses aesara (a fork of theano)

docs with live demo of pymc

PyMC is a probabilistic programming library for Python that allows users to build Bayesian models with a simple Python API and fit them using Markov chain Monte Carlo (MCMC) methods.
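To show what an MCMC fit does at its simplest, here is a stdlib-only random-walk Metropolis sketch that samples the posterior of a single mean parameter. It is not PyMC's API or its sampler; `log_posterior`, `metropolis`, the prior width, and the data values are all assumptions made up for this illustration.

```python
import math
import random


def log_posterior(mu, data, prior_sd=10.0, noise_sd=1.0):
    """Unnormalised log posterior: Normal(0, prior_sd) prior, Normal(mu, noise_sd) likelihood."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = sum(-0.5 * ((x - mu) / noise_sd) ** 2 for x in data)
    return log_prior + log_lik


def metropolis(data, n_samples=5000, step=0.5, seed=0):
    """Random-walk Metropolis: propose a nearby value, accept it with probability min(1, ratio)."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    samples = []
    for _ in range(n_samples):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # acceptance probability min(1, exp(candidate - current))
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        samples.append(mu)
    return samples


data = [4.8, 5.1, 5.3, 4.9, 5.0]    # made-up observations
samples = metropolis(data)
kept = samples[1000:]               # drop the early samples as burn-in
print(sum(kept) / len(kept))        # posterior mean estimate, near the data mean
```

A library such as PyMC replaces the hand-written log posterior with a model declaration and uses more efficient samplers, but the output is the same kind of object: a list of draws from the posterior.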

fastai

a high-level pytorch wrapper with "out of the box" support for vision, text, tabular, and collab (collaborative filtering) models.

docs

courses

fastai appears in the opennlp-related twitter list shown on the opennlp official website.

fastai may not support macos, or does it? fastai is built on top of pytorch. initial support started with 2.7.8, and the current version is 2.7.9.

searching github for 'samoyed' turns up imagewoof, a pets classification dataset from the fastai 2020 tutorial series. more image classes, such as subcategories of cats, may be found in imagenet.


2022-05-24
AI Training Set Annotation Tools

text annotation tool:

https://github.com/doccano/doccano

sqlite 3 backend:

pip3 install doccano

video/image annotation tool, needs docker, with online demo:

https://github.com/openvinotoolkit/cvat

image labeling:

https://github.com/heartexlabs/labelImg

with audio and video support

https://github.com/heartexlabs/label-studio

with audio transcription support

https://github.com/UniversalDataTool/universal-data-tool

image and audio

https://github.com/Cartucho/OpenLabeling

specialized for yolo bounding boxes

https://github.com/developer0hye/Yolo_Label
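For context on what "yolo bounding boxes" usually means on disk, the sketch below converts a pixel-space box into one line of the common Darknet-style label format (`class x_center y_center width height`, all normalised by image size). That this is exactly what the tool above writes is an assumption, and `to_yolo_line` is a hypothetical helper name.

```python
def to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-space box to a 'class xc yc w h' line normalised to [0, 1]."""
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# a 200x200 pixel box inside a 1000x800 image, class index 0
print(to_yolo_line(0, 100, 200, 300, 400, 1000, 800))
```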
