OpenNLP, fastai, and Other Machine Learning Platforms
These notes cover installing and using machine learning libraries such as OpenNLP, DL4J, XGBoost, LightGBM, PyMC, and fastai on macOS, along with supporting tools such as CMake, with pointers to examples, documentation, and courses. They also discuss whether fastai supports macOS, and introduce Imagewoof, a classification dataset from the fastai 2020 tutorial series that turns up when searching GitHub for ‘samoyed’, along with further image classes found in ImageNet.
jax
Autograd and XLA (Accelerated Linear Algebra)
With its updated version of Autograd, JAX can automatically differentiate native Python and NumPy functions. It can differentiate through loops, branches, recursion, and closures, and it can take derivatives of derivatives of derivatives. It supports reverse-mode differentiation (a.k.a. backpropagation) via grad as well as forward-mode differentiation, and the two can be composed arbitrarily to any order.
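The forward-mode differentiation described above can be sketched in plain Python with dual numbers. This is only an illustration of the idea, not how JAX is implemented: the names `Dual`, `derivative`, and `poly` are hypothetical, and only addition and multiplication are supported. In JAX itself the equivalent call would be `jax.grad(poly)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A number of the form value + deriv * eps, where eps * eps == 0."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat a plain number as a constant (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(x, 0.0)


def derivative(f):
    """Return a function that evaluates df/dx in forward mode."""
    def df(x):
        # Seed the input with derivative 1 and read the derivative back out.
        return f(Dual(x, 1.0)).deriv
    return df


def poly(x, n=3):
    """x**n + 2*x, written with an ordinary Python loop."""
    result = 1.0
    for _ in range(n):
        result = result * x
    return result + 2 * x


first = derivative(poly)(2.0)               # 3 * 2**2 + 2
second = derivative(derivative(poly))(2.0)  # 6 * 2
print(first, second)
```

Because `poly` is ordinary Python, the loop needs no special handling: the derivative is carried along with every arithmetic operation. Nesting `derivative` gives the second derivative for this simple case, mirroring the "derivatives of derivatives" claim.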
XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear algebra that can accelerate TensorFlow models with potentially no source code changes.
pyro
probabilistic programming
numpyro
Pyro implemented on a NumPy-style backend; alpha stage at the time of writing
scikit-learn
Machine learning in Python
libsvm
Install the official Python bindings:
    pip install -U libsvm-official
A third-party Python libsvm package can be installed with:
    pip install libsvm
opennlp
OpenNLP may use ONNX Runtime (unverified), in which case it may support inference on Apple M1.
OpenNLP is written in Java. After installing OpenJDK on macOS with Homebrew, run this so that the system detects OpenJDK:
    sudo ln -sfn $(brew --prefix)/opt/openjdk/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk.jdk
OpenNLP has a language detector for 103 languages, including Chinese. It also has a sentence detector (sentence splitter), which could possibly be trained on Chinese (unverified).
To use OpenNLP with less code, it can be called from Kotlin; see how to invoke Java from Kotlin.
dl4j
Deep learning library for Java.
Found in a Manning article about better search engine suggestions. In that example it is used with Lucene, which has image retrieval capability through LIRE. Lucene is also available as Lucene.NET for .NET/C#.
To install Lucene.NET:
    dotnet add package Lucene.Net --prerelease
xgboost
Gradient boosting trains an ensemble of decision trees, each one fitted to the errors left by the previous ones, and is used for classification and regression models.
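To make the idea concrete, here is a minimal sketch of gradient boosting for regression with squared error, using one-split trees (stumps) on a single feature. It is plain Python and not XGBoost's algorithm or API; all function names, the toy data, and the default `n_rounds` and `learning_rate` values are assumptions chosen for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual itself.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of the model on the given points."""
    return sum((predict_boosted(model, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(ys)


# Toy data: y = x**2 on ten points.
xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]
one_round = fit_boosted(xs, ys, n_rounds=1)
many_rounds = fit_boosted(xs, ys, n_rounds=50)
print(mse(one_round, xs, ys), mse(many_rounds, xs, ys))
```

Each round can only lower the training error, so the 50-round model fits the toy data more closely than the 1-round model. Libraries such as XGBoost add regularisation, deeper trees, and many engineering optimisations on top of this basic loop.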
lightgbm
Light Gradient Boosting Machine.
Official command-line tools are available. Installation on macOS:
    brew install lightgbm
To install the Python package on macOS, first install CMake, which is needed when the package is built from source:
    brew install cmake
pymc
To enable JAX-based sampling, install numpyro or blackjax via pip.
Differences between pymc3 (old) and pymc (version 4):
pymc is optimized and faster than pymc3.
pymc3 uses Theano as its backend, while pymc uses Aesara (a fork of Theano).
docs with live demo of pymc
PyMC is a probabilistic programming library for Python that allows users to build Bayesian models with a simple Python API and fit them using Markov chain Monte Carlo (MCMC) methods.
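As a rough illustration of what fitting a Bayesian model with MCMC means, the sketch below runs a random-walk Metropolis sampler in plain Python. It is not PyMC's API or its default sampler; the coin-flip model, the function names, and the step size, sample count, and seed are all assumptions made for this example.

```python
import math
import random


def log_posterior(theta, heads, tails):
    """Unnormalised log posterior of a coin's heads probability.

    Assumes a uniform prior on (0, 1) and a binomial likelihood.
    """
    if not 0.0 < theta < 1.0:
        return -math.inf
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def metropolis(heads, tails, n_samples=20000, step=0.2, seed=0):
    """Draw samples from the posterior with random-walk Metropolis."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


# Observed data: 7 heads and 3 tails.
samples = metropolis(heads=7, tails=3)
kept = samples[2000:]  # discard the early samples as burn-in
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 2))
```

With a uniform prior, the exact posterior here is Beta(8, 4), whose mean is 8/12, so the sample mean should land close to 0.67. PyMC automates the model specification and uses more efficient samplers than this one.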
fastai
A high-level PyTorch wrapper with “out of the box” support for vision, text, tabular, and collab (collaborative filtering) models.
fastai appears in the Twitter list related to OpenNLP that is shown on the OpenNLP official website.
fastai may not support macOS, though this is uncertain: fastai is built on top of PyTorch. Initial support starts with version 2.7.8, and the current version at the time of writing is 2.7.9.
Searching GitHub for ‘samoyed’ turns up Imagewoof, a dataset for pet classification from the fastai 2020 tutorial series. More image classes, such as subcategories of cats, may be found in ImageNet.