2022-12-16
Pyro Object Detection, 3D Convolution On Video, Remove Watermark In Video

use pyro to create an object detector

Pyro is a probabilistic programming language built on top of PyTorch that allows you to build flexible, expressive models and perform Bayesian inference. You can use Pyro to build an object detector by defining a probabilistic model that takes an input image and outputs a set of bounding boxes and class labels for the objects in the image.
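
For orientation, here is a minimal sketch of Pyro's core primitive, pyro.sample, with an observed site; the model, names, and numbers are illustrative and not part of any detector:

import torch
import pyro
import pyro.distributions as dist

def box_model(obs=None):
    # latent 2D box center with a standard-normal prior (illustrative)
    center = pyro.sample("center", dist.Normal(torch.zeros(2), torch.ones(2)).to_event(1))
    # noisy observation of the center; conditioning happens through obs=
    return pyro.sample("x", dist.Normal(center, 0.1).to_event(1), obs=obs)

# score one observation under the joint model
trace = pyro.poutine.trace(box_model).get_trace(torch.tensor([0.5, 0.5]))
print(trace.log_prob_sum())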

Here is an outline of the process you can follow to build an object detector using Pyro:

Collect and preprocess a dataset of images and annotated bounding boxes for the objects you want to detect.

Define a probabilistic model in Pyro that takes an input image and outputs a set of bounding boxes and class labels for the objects in the image. You can use a convolutional neural network (CNN) to extract features from the input image and a multi-layer perceptron (MLP) to predict the bounding boxes and class labels.

Define a loss function that measures the difference between the predicted bounding boxes and class labels and the ground truth annotations in the dataset.

Use Pyro’s optimization algorithms to optimize the model’s parameters with respect to the loss function.

Use the trained model to make predictions on new images.

Here is an example of how you can define a simple object detector using Pyro:

import torch
import torch.nn as nn
import pyro.nn as pyro_nn

class ObjectDetector(pyro_nn.PyroModule):
    def __init__(self, num_classes):
        super().__init__()
        self.num_classes = num_classes
        self.cnn = nn.Sequential(
            # Define the CNN layers here
        )
        self.mlp = nn.Sequential(
            # Define the MLP layers here
        )

    def forward(self, x):
        features = self.cnn(x)
        bboxes, labels = self.mlp(features)
        return bboxes, labels

# Define the loss function
def loss(model, x, bboxes, labels):
    predicted_bboxes, predicted_labels = model(x)
    bbox_loss = torch.mean((predicted_bboxes - bboxes) ** 2)      # e.g. MSE on box coords
    label_loss = nn.CrossEntropyLoss()(predicted_labels, labels)  # e.g. cross-entropy
    return bbox_loss + label_loss

# Define the model and optimizer
model = ObjectDetector(num_classes)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Loop over the dataset and optimize the model
for x, bboxes, labels in dataset:
    optimizer.zero_grad()
    loss_value = loss(model, x, bboxes, labels)
    loss_value.backward()
    optimizer.step()

# Use the trained model to make predictions on new images
x_new = ...  # load a new image as a tensor here
bboxes, labels = model(x_new)

This is just a simple example of how you can use Pyro to build an object detector. You can find more detailed examples and resources on the Pyro website (https://pyro.ai/) and in the Pyro documentation (https://docs.pyro.ai/).

define the CNN network and the MLP network

To define the CNN and MLP networks for an object detector using Pyro, you can subclass PyroModule from Pyro's nn module and build the layers with torch.nn.

For the CNN, you can use a combination of convolutional, pooling, and fully-connected (FC) layers to extract features from the input image. Here is an example of how you can define a simple CNN in Pyro:

import torch.nn as nn
import torch.nn.functional as F
import pyro.nn as pyro_nn

class CNN(pyro_nn.PyroModule):
    def __init__(self, num_classes):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # 32 * 7 * 7 assumes 28x28 inputs (two 2x2 poolings: 28 -> 14 -> 7)
        self.fc1 = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = x.view(-1, 32 * 7 * 7)
        x = self.fc1(x)
        return x

For the MLP, you can use a combination of fully-connected (FC) layers to predict the bounding boxes and class labels for the objects in the image. Here is an example of how you can define an MLP in Pyro:

import torch.nn as nn
import torch.nn.functional as F
import pyro.nn as pyro_nn

class MLP(pyro_nn.PyroModule):
    def __init__(self, num_classes, num_bboxes):
        super().__init__()
        # input width matches the CNN's num_classes-sized output above
        self.fc1 = nn.Linear(num_classes, 128)
        self.fc2 = nn.Linear(128, num_bboxes * 4)            # 4 coordinates per box
        self.fc3 = nn.Linear(128, num_bboxes * num_classes)  # class scores per box

    def forward(self, x):
        x = F.relu(self.fc1(x))
        bboxes = self.fc2(x)
        labels = self.fc3(x)
        return bboxes, labels

You can then combine the CNN and MLP into a single object detector model by defining a PyroModule that takes an input image and outputs the predicted bounding boxes and class labels.
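
Here is a sketch of gluing the two modules together; it assumes, as in the examples above, that the CNN's output width equals the MLP's input width (num_classes):

import pyro.nn as pyro_nn

class ObjectDetector(pyro_nn.PyroModule):
    def __init__(self, num_classes, num_bboxes):
        super().__init__()
        # reuse the CNN and MLP classes defined above
        self.cnn = CNN(num_classes)
        self.mlp = MLP(num_classes, num_bboxes)

    def forward(self, x):
        features = self.cnn(x)            # (batch, num_classes)
        bboxes, labels = self.mlp(features)
        return bboxes, labels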

To continue building an object detector using Pyro, you can define a loss function that measures the difference between the predicted bounding boxes and class labels and the ground truth annotations in the dataset. You can then use Pyro’s optimization algorithms to optimize the model’s parameters with respect to the loss function.

For example, you can define a loss function that combines a bounding box loss and a class label loss. The bounding box loss can be computed using the mean squared error (MSE) between the predicted bounding boxes and the ground truth bounding boxes. The class label loss can be computed using the cross-entropy loss between the predicted class labels and the ground truth labels.

Here is an example of how you can define the loss function:

import torch
import torch.nn as nn

def loss(model, x, bboxes, labels):
    predicted_bboxes, predicted_labels = model(x)
    bbox_loss = torch.mean((predicted_bboxes - bboxes) ** 2)      # MSE on box coordinates
    label_loss = nn.CrossEntropyLoss()(predicted_labels, labels)  # cross-entropy on classes
    return bbox_loss + label_loss

To optimize the model’s parameters, you can use an optimization algorithm such as Adam or SGD. Here is an example of how you can define an optimizer and use it to train the model:

import torch
import torch.optim as optim

# Define the model and optimizer
model = ObjectDetector(num_classes, num_bboxes)
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Loop over the dataset and optimize the model
for x, bboxes, labels in dataset:
    optimizer.zero_grad()
    loss_value = loss(model, x, bboxes, labels)
    loss_value.backward()
    optimizer.step()

Once the model is trained, you can use it to make predictions on new images. To do this, you can pass the new images through the model and use the predicted bounding boxes and class labels to draw bounding boxes around the objects in the image.

# Use the trained model to make predictions on new images
x_new = ...  # load a new image as a (1, 3, H, W) tensor
bboxes, labels = model(x_new)

# Draw bounding boxes around the objects in the image
for bbox, label in zip(bboxes, labels):
    pass  # draw the bounding box on the image (see the Pillow sketch below)
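
One way to do the elided drawing step is with Pillow; this sketch assumes (hypothetically) that bboxes is an (N, 4) tensor of (x, y, width, height) in pixels and labels an (N,) tensor of class indices:

from PIL import Image, ImageDraw

img = Image.open("new_image.jpg")  # hypothetical path
draw = ImageDraw.Draw(img)
for bbox, label in zip(bboxes.tolist(), labels.tolist()):
    x, y, w, h = bbox
    draw.rectangle([x, y, x + w, y + h], outline="red", width=2)  # box outline
    draw.text((x, y), str(label), fill="red")                     # class label
img.save("annotated.jpg")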
I hope this helps! Let me know if you have any questions.

how to do 3d convolution on video with pytorch

To perform 3D convolution on a video with PyTorch, you can use the nn.Conv3d layer. This layer applies a 3D convolution over the input tensor, which has shape (batch_size, channels, depth, height, width).

Here is an example of how you can define a 3D convolutional neural network (CNN) in PyTorch to process a video:

import torch
import torch.nn as nn
import torch.nn.functional as F

class VideoCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv3d(3, 16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv3d(16, 32, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool3d(kernel_size=2, stride=2)
        # 32 * 7 * 7 * 7 assumes 28x28x28 clips (two 2x2x2 poolings: 28 -> 14 -> 7)
        self.fc1 = nn.Linear(32 * 7 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        # x: (batch, channels, depth, height, width)
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = x.view(-1, 32 * 7 * 7 * 7)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

To process a video with this CNN, first split it into fixed-length clips of consecutive frames (the depth dimension of the 3D convolution), then pass each clip through the network. You can use the torchvision.io.read_video function to read the video; it returns a uint8 tensor of shape (frames, height, width, channels), which you can permute and scale to float before batching into clips.

import torch
import torchvision

# Read the video: (frames, height, width, channels), uint8
video, _, _ = torchvision.io.read_video("video.mp4", pts_unit="sec")

# Rearrange to (frames, channels, height, width) and scale to [0, 1]
frames = video.permute(0, 3, 1, 2).float() / 255.0

# Batch consecutive frames into clips: (1, channels, depth, height, width)
clip_len = 16
clips = [
    frames[i : i + clip_len].permute(1, 0, 2, 3).unsqueeze(0)
    for i in range(0, len(frames) - clip_len + 1, clip_len)
]

# Pass the clips through the CNN
cnn = VideoCNN()
outputs = [cnn(clip) for clip in clips]

You can then use the output of the CNN to perform various tasks, such as classification or object detection.
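
For example, for a whole-video classification task you can average the per-clip predictions (a sketch using the outputs list from the loop above):

import torch
import torch.nn.functional as F

logits = torch.cat(outputs, dim=0)          # (num_clips, 10)
probs = F.softmax(logits, dim=-1)           # per-clip class probabilities
video_class = probs.mean(dim=0).argmax()    # average over clips, pick the top class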

I hope this helps! Let me know if you have any questions.

use 3d convolution to determine location of watermark on video

To use 3D convolution to determine the location of a watermark on a video, you can build a convolutional neural network (CNN) that takes a sequence of frames from the video as input and outputs a prediction of the watermark’s location.

Here is an outline of the process you can follow to build such a CNN using PyTorch:

Collect and preprocess a dataset of videos and annotated watermark locations. To do this, you can manually label the watermark locations in a set of videos, or you can use a watermarked video and its corresponding unwatermarked version to automatically generate labels (a sketch of this automatic labeling follows the steps below).

Define a CNN in PyTorch that takes a sequence of frames from a video as input and outputs a prediction of the watermark’s location. You can use 3D convolutional layers to extract features from the frames and fully-connected (FC) layers to make the prediction.

Define a loss function that measures the difference between the predicted watermark location and the ground truth location. You can use a regression loss such as mean squared error (MSE) or mean absolute error (MAE).

Use PyTorch’s optimization algorithms to optimize the CNN’s parameters with respect to the loss function.

Use the trained CNN to make predictions on new videos.
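
For the automatic labeling mentioned in step 1, the watermark region is simply where an aligned watermarked/clean frame pair differs. A minimal sketch; the function name and threshold are illustrative:

import torch

def watermark_bbox(watermarked, clean, thresh=0.05):
    # both inputs: float tensors of shape (C, H, W) in [0, 1], spatially aligned
    diff = (watermarked - clean).abs().mean(dim=0)  # (H, W), mean over channels
    mask = diff > thresh                            # pixels changed by the watermark
    ys, xs = torch.nonzero(mask, as_tuple=True)
    if len(xs) == 0:
        return None                                 # no visible watermark
    x, y = xs.min().item(), ys.min().item()
    return (x, y, xs.max().item() - x, ys.max().item() - y)  # (x, y, width, height)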

Here is an example of how you can define a simple CNN in PyTorch to determine the location of a watermark on a video:

import torch
import torch.nn as nn
import torch.nn.functional as F

class WatermarkCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv3d(3, 16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv3d(16, 32, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool3d(kernel_size=2, stride=2)
        # 32 * 7 * 7 * 7 assumes 28x28x28 input clips, as in the VideoCNN above
        self.fc1 = nn.Linear(32 * 7 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 4)  # 4 values for (x, y, width, height)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = x.view(-1, 32 * 7 * 7 * 7)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

To train this CNN, you can loop over the dataset and optimize the CNN’s parameters with respect to the loss function. Here is an example of how you can define the loss function and optimizer, and use them to train the CNN:

import torch
import torch.nn as nn
import torch.optim as optim

# Define the loss function
def loss(predictions, targets):
    return nn.MSELoss()(predictions, targets)

# Define the model and optimizer
model = WatermarkCNN()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Loop over the dataset and optimize the model
for frames, watermark_location in dataset:
    optimizer.zero_grad()
    predictions = model(frames)
    loss_value = loss(predictions, watermark_location)
    loss_value.backward()
    optimizer.step()

Once the model is trained, you can use it to make predictions on new videos. To do this, you can pass the frames of the video through the model and use the predicted watermark location to draw a bounding box around the watermark.

# Use the trained model to make predictions on new videos
frames = ...  # load a clip as a (1, 3, depth, height, width) tensor
predictions = model(frames)

# Draw a bounding box around the watermark
x, y, width, height = predictions[0].tolist()
# Draw the bounding box on the frames, e.g. with Pillow as shown earlier

I hope this helps! Let me know if you have any questions.


2022-08-06
Deeplearning On Macos M-Series Processors

it is funny that macOS still supports AMD GPUs, which means any Intel Mac (not M-series!) can utilize internal/external AMD GPUs as long as frameworks like jax and pytorch support MPS/Metal.


calling python code from swift using pythonkit:

func downloadVideo(link: String) {
    let sys = Python.import("sys")
    sys.path.append(dirPath)               // dirPath: directory containing sample.py
    let example = Python.import("sample")  // import sample.py as a Python module
    let response = example.downloadVideo(link, dirPath)
    videoPath = String(response)
}

run macos in docker with kvm

neural engine

it is used for coreml inference, not training

run coreml on hackintosh

first, download macOS Monterey using the mac.

then, install it on the hackintosh, with the associated nvidia drivers.

next, test gpu availability via the system info panel.

then install xcode command line tools and check coreml availability

run coreml with swift on linux

darling is at a very premature stage, much like early wine. it is now testing something called "darlingserver", a full userspace implementation, which is prone to tons of problems. the swift repl is not working, and installing xcode command line tools 14 will hang the thing. i suggest doing light model training on a macbook air and converting the model to onnx if you want to use it everywhere.

before reinstalling darling, make sure you have removed all darling-related files by checking updatedb; locate darling | grep -v <compile directory>

visit here to install darling from source (maybe that’s the only way)

if you want to install darling on kali, you must move all deep learning models out to other disks, and collect all other big files somewhere else or trash them. use a systemwide user broadcast to warn me if any of the disks is missing, and automatic symlink updates to adapt to external disk mountpoint changes.

darling can install xcode command line tools with the macos sdk, so maybe it can run coreml models with swift on the cpu. gpu support is currently unknown; that probably requires metal support.

thermal and battery life concerns, and more

consider using external AMD GPUs (eGPUs) over thunderbolt 3 to avoid overheating. currently that can only be done with intel Macs.

battery life is currently bad for intel/amd notebooks on the x86-64 architecture.

heavy lifting jobs are likely to be run on a Mac Studio with M1 Ultra and 128GB RAM. a Macbook Air M1 with 8GB RAM is simply not feasible.

aside from Apple platforms, these APIs are virtually useless.

to run these on non-apple machines, you need to tweak and install macOS on an x86-64 platform with macOS-supported GPUs (which may perform poorly). that setup definitely won't take advantage of the huge RAM shared between CPU and GPU, may run CoreML/CreateML poorly, and may not support deepspeed stage 2/3 or BMI (big model inference).

Non-Supported NVIDIA Cards, use AMD GPU instead

macOS releases after High Sierra no longer support NVIDIA GPUs on the Mac.

Mojave, Catalina, and Big Sur work only with AMD graphics, Intel onboard graphics, and a very small number of old NVIDIA products. Suppose you have a GTX 1070, 1080, or the like: you cannot use anything after High Sierra, because Nvidia does not provide driver updates for Mac and the cards cannot be used any other way.

In general, the Turing, Pascal, and Maxwell series will never be supported again. The latest macOS version that can use these series is High Sierra.

tensorflow with m1 support

using the tensorflow metal plugin, which sets up miniforge and installs tensorflow-metal within.

install without miniforge (works!)

pip3 install tensorflow-macos tensorflow-metal

validation:

python3 -c "import tensorflow as tf; physical_devices = tf.config.list_physical_devices('GPU'); print('Num GPUs:', len(physical_devices)); print(physical_devices)"
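
a quick sanity check that ops actually land on the metal GPU (plain tensorflow API, nothing metal-specific):

import tensorflow as tf

# force placement on the (metal-backed) GPU and run a matmul there
with tf.device("/GPU:0"):
    a = tf.random.normal((4096, 4096))
    b = tf.random.normal((4096, 4096))
    c = tf.matmul(a, b)
print(c.device)  # expect something like /device:GPU:0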

pytorch with m1 support, using MPS (Metal Performance Shaders)

install from the nightly release channel, with a minimum system version requirement of 12.3 (which this machine qualified for after a system update, now 12.5)

# MPS acceleration is available on MacOS 12.3+
pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

validation

python3 -c "import torch; print('MPS available:', torch.backends.mps.is_available()); print('Built with MPS:', torch.backends.mps.is_built())"
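
typical usage is just a device move; everything else is normal pytorch:

import torch

# select the MPS device, falling back to CPU when unavailable
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
x = torch.randn(1024, 1024, device=device)
w = torch.randn(1024, 1024, device=device)
y = x @ w          # runs on the Apple GPU via Metal
print(y.device)    # mps:0 on a supported machine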

run python inside swift

use pythonkit

automatic machine learning using CreateML

import CreateML

CreateML is similar to other AutoML tools, like AutoKeras and AutoTrain by Huggingface (which works by training a selected set of models against user-provided data)

using CoreML

curated, largest coreml models collection

CoreML models can be created with CreateML, and some customization can be done via the MLCustomLayer protocol.
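
from the python side, coremltools is another way to produce CoreML models; a sketch of the torch-tracing route (model choice and shapes are illustrative):

import torch
import torchvision
import coremltools as ct

# trace a small torchvision model, then convert the trace to Core ML
model = torchvision.models.mobilenet_v2(weights=None).eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)
mlmodel = ct.convert(traced, inputs=[ct.TensorType(shape=example.shape)])
mlmodel.save("MobileNetV2.mlmodel")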

onnxruntime can run onnx models on CoreML, e.g. from c#, since that library is maintained by microsoft.
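
the same idea works from python, assuming an onnxruntime build with the CoreML execution provider enabled (e.g. the onnxruntime-silicon wheel); the model path and shapes are illustrative:

import numpy as np
import onnxruntime as ort

# fall back to CPU if the CoreML provider is unavailable in this build
sess = ort.InferenceSession(
    "model.onnx",
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)
input_name = sess.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = sess.run(None, {input_name: dummy})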

to install c# on macos:

brew install dotnet-sdk

to install and launch dotnet repl:

dotnet tool install -g dotnet-repl
dotnet repl

paddlepaddle support

convert into onnx first, then run on onnxruntime.

paddlepaddle itself currently supports running only on the M1 CPU, via rosetta 2.
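
a hedged sketch of the onnx conversion step, assuming paddle 2.x's paddle.onnx.export (which wraps paddle2onnx); the layer is a stand-in:

import paddle
from paddle.static import InputSpec

# trace a stand-in layer and write model.onnx
layer = paddle.nn.Linear(224, 10)
spec = InputSpec([None, 224], "float32", "x")
paddle.onnx.export(layer, "model", input_spec=[spec])  # saves model.onnx

the resulting file can then be run with onnxruntime as in the snippet above.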

Swift Core ML 3 implementations of GPT-2, DistilGPT-2, BERT, and DistilBERT for Question answering.

train an image classifier and a text classifier, noting that CreateMLUI is deprecated (gone)

train a source code classifier with flightschool, which is a provider of free swift tutorial books

classifying sounds with coreml returns the sound type along with a timestamp

detect human pose using coreml

apple speech recognition api request

pytorch mps backend

text classification using createml

onnx model zoo

Getting CoreML Models

CoreML Model Zoo

FCRN-DepthPrediction (Depth Estimation): Predict the depth from a single image.

MNIST (Drawing Classification): Classify a single handwritten digit (supports digits 0-9).

UpdatableDrawingClassifier (Drawing Classification): Drawing classifier that learns to recognize new drawings based on a K-Nearest Neighbors (KNN) model.

MobileNetV2 (Image Classification): The MobileNetV2 architecture trained to classify the dominant object in a camera frame or image.

Resnet50 (Image Classification): A Residual Neural Network that will classify the dominant object in a camera frame or image.

SqueezeNet (Image Classification): A small Deep Neural Network architecture that classifies the dominant object in a camera frame or image.

DeeplabV3 (Image Segmentation): Segment the pixels of a camera frame or image into a predefined set of classes.

YOLOv3 (Object Detection): Locate and classify 80 different types of objects present in a camera frame or image.

YOLOv3-Tiny (Object Detection): Locate and classify 80 different types of objects present in a camera frame or image.

PoseNet (Pose Estimation): Estimates up to 17 joint positions for each person in an image.

Text

BERT-SQuAD (Question Answering): Find answers to questions about paragraphs of text.

Apple Machine Learning Related APIs (may need user permission, within or without xcode, by means of Info.plist or similar)

Vision: Build features that can process and analyze images and video using computer vision.

Image Classification: Automatically identify the content in images.
Image Saliency: Quantify and visualize the key part of an image or where in the image people are likely to look.
Image Alignment: Analyze and manage the alignment of images.
Image Similarity: Generate a feature print to compute distance between images.
Object Detection: Find and label objects in images.
Object Tracking: Track moving objects in video.
Trajectory Detection: Detect the trajectory of objects in motion in video.
Contour Detection: Trace the edges of objects and features in images and video.
Text Detection: Detect regions of visible text in images.
Text Recognition: Find, recognize, and extract text from images.
Face Detection: Detect human faces in images.
Face Tracking: Track faces from a camera feed in real time.
Face Landmarks: Find facial features in images by detecting landmarks on faces.
Face Capture Quality: Compare face capture quality in a set of images.
Human Body Detection: Find regions that contain human bodies in images.
Body Pose: Detect landmarks on people in images and video.
Hand Pose: Detect landmarks on human hands in images and video.
Animal Recognition: Find cats and dogs in images.
Barcode Detection: Detect and analyze barcodes in images.
Rectangle Detection: Find rectangular regions in images.
Horizon Detection: Determine the horizon angle in images.
Optical Flow: Analyze the pattern of motion of objects between consecutive video frames.
Person Segmentation (New): Produce a matte image for a person in an image.
Document Detection (New): Detect rectangular regions in images that contain text.

Natural Language: Analyze natural language text and deduce its language-specific metadata.

Tokenization: Enumerate the words in text strings.
Language Identification: Recognize the language of bodies of text.
Named Entity Recognition: Use a linguistic tagger to name entities in a string.
Part of Speech Tagging: Classify nouns, verbs, adjectives, and other parts of speech in a string.
Word Embedding: Get a vector representation for any word and find similarity between two words or nearest neighbors for a word.
Sentence Embedding: Get a vector representation for any string and find similarity between two strings.
Sentiment Analysis: Score text as positive, negative, or neutral based on the sentiment.

Speech: Take advantage of speech recognition and saliency features for a variety of languages.

Speech Recognition: Recognize and analyze speech in audio and get back data like transcripts.

Sound Analysis: Analyze audio and recognize it as a particular type, such as laughter or applause.

Sound Classification: Analyze sounds in audio using the built-in sound classifier or a custom Core ML sound classification model.
