Project structure of: PaddlePaddle/PaddleVideo
__init__.py
Python module, licenses, imports PaddleVideo class.
README.md
Video action detection for abnormal behavior. SlowFast+FasterRCNN.
get_image_label.py
Trains and validates object detection models using video frames.
README.md
Detect UAVs in restricted zones using PaddleDetection.
action.py
Basketball action detection Python script
logger.py
Custom logger class for news stripper app
feature_extractor.py
Audio feature extraction for basketball actions.
model_config.py
ModelAudio: Extracts audio features, slices data, appends to list.
vgg_params.py
Global VGGish model parameters. Audio feature extraction, PCA, embedding. Adjustable settings.
audio_infer.py
Audio inference model for predicting basketball actions.
bmn_infer.py
Basketball action detection via BMN inferencing
lstm_infer.py
Basketball LSTM action detection model with GPU optim.
pptsm_infer.py
PaddleVideo-based action detection with PPTSM inference
__init__.py
Registers readers for different file formats. Alphabetical order.
audio_reader.py
AudioReader class for YouTube-8M dataset management.
bmninf_reader.py
Reads and processes BMN model data for video analysis.
feature_reader.py
Python FeatureReader for YouTube-8M dataset.
reader_utils.py
Video input data reader classes and singleton reader_zoo.
tsminf_reader.py
TSMINF reader: multiprocessing, jpg format, ML transformations
config_utils.py
Loads, processes, prints config files with nested dictionaries.
preprocess.py
Python FFmpeg utils for frames, audio, MP4 download
process_result.py
Processes video results, applies NMS
eval.py
Finds the best IoU and score thresholds for evaluating basketball actions.
predict.py
Basketball Action Prediction, JSON Output.
README.md
Basketball action detection app, F1-score 80.14%, PaddlePaddle 2.0
__init__.py
EIVideo __init__.py: Sets root path, defines constants, constructs full paths.
api.py
EIVideo API: Retrieves, converts, and annotates images/videos.
main.py
Trains PaddleVideo model with distributed training and mixed precision.
__init__.py
EIVideo/paddlevideo initialization file
__init__.py
Imports, defines, exports, licenses
builder.py
PaddleVideo dataset loader, signal handlers for SIGINT/TERM.
__init__.py
Image preprocessing pipelines for PaddleVideo's EIVideo application.
compose.py
Flexible composition of pipeline components.
custom_transforms_f.py
Paddle Video's image preprocessing classes for resizing, aspect ratio adjustment, and custom cropping transforms.
registry.py
Organizes PaddleVideo functionalities in 4 registries.
__init__.py
Apache licensed PaddleVideo library init file for VOSMetric and build_metric functions.
base.py
Abstract base class for PaddleVideo metrics.
build.py
Apache License v2.0, Python build metric tool
registry.py
Registry initializer for metric management.
vos_metric.py
VOS Metric: Video Object Segmentation.
__init__.py
Imports and registers PaddleVideo modules, includes popular models.
__init__.py
DeepLab import for deeplab_manet module included in __all__ list.
aspp_manet.py
ASPP-MANET backbone model initialization in Paddle
decoder_manet.py
Paddle Decoder class for Manet architecture
deeplab_manet.py
Introduces FrozenBatchNorm2d, DeepLab network backbone with frozen BatchNorm layers.
resnet_manet.py
ResNet-MANET model with BatchNorm, ReLU, residual blocks.
builder.py
Builds computer vision models with configuration.
__init__.py
PaddleVideo framework: BaseSegment, Manet classes.
__init__.py
PaddlePaddle components for video segmentation
base.py
Base class for semi-Video Object Segmentation in PaddlePaddle
manet_stage1.py
Manet Stage 1 Video Segmentation
__init__.py
Module init, copyright, license, imports IntVOS.
IntVOS.py
Compute L2 distances, apply attention, feature extraction, and pooling.
registry.py
Registry classes for video pipeline components in PaddleVideo's EIVideo.
weight_init.py
Initialize weights for PaddlePaddle layer with custom options
__init__.py
Imports "test_model" from "test.py", adds to __all__
test.py
Test model function for gradient-free testing with multi-card support.
__init__.py
PaddleVideo utility functions and classes import, logger/profiler setup.
build_utils.py
Builds object from config and registry.
config.py
EIVideo: Config file parsing, printing, visualizing, and checking.
dist_utils.py
Distributed computing utilities for PaddleVideo's EIVideo module.
logger.py
Colorful, configurable, non-propagating logger for PaddleVideo.
manet_utils.py
OpenCV, PaddleVideo utilities with PyTorch init.
precise_bn.py
PreciseBN: recompute BN stats for improved accuracy
profiler.py
Profiler initialization and stop for PaddlePaddle's operator-level timing.
record.py
Logs video processing metrics for clarity.
registry.py
Registry class for mapping names to objects
save_load.py
Adapts ViT model, loads/saves PaddlePaddle models.
version.py
PaddleVideo version: 0.0.1, Apache License 2.0
README.MD
Chinese CLI video annotation tool guide
setup.py
Source citation required for code
version.py
EIVideo version 0.1a by Acer Zhang (credit requested)
__init__.py
QEIVideo Root & Version
build_gui.py
Video processing GUI builder with PyQt5.
__init__.py
Author, date, copyright - EIVideo's QEIVideo gui init.
demo.py
DrawFrame class QWidget for path drawing and mouse events.
ui_main_window.py
EIVideo: UI-driven video/painting app with "Hi" message.
start.py
Initialize QApplication, create GUI, run event loop.
__init__.py
EIVideo QEIVideo gui module init comment
__init__.py
QEIVideo gui init, author, date, copyright.
demo.py
PyQt5 video player UI with interactive buttons and tabs.
version.py
EIVideo app version info by Acer Zhang 01/11/2022.
PaintBoard.py
PaintBoard: Widget for drawing, clearing, changing pen attributes, and retrieving content.
README.md
Windows video annotation tool with Baidu AI, maintained by QPT-Family on GitHub.
cmd
Updating EIVideo on GitHub, managing branches and code.
demo.ui
Qt application UI video demo with interactive elements.
README.md
Fight Recognition model guide for PaddleVideo.
README.md
OpenPose for figure skating action data processing
download.sh
Download, extract, and delete 4 tar files related to FootballAction.
dataset_url.list
13 EuroCup2016 football videos from BCEBOS
download_dataset.sh
Downloads 12 EuroCup2016 videos using wget.
url.list
Unique EuroCup2016 URLs for MP4 videos.
url_val.list
List of hashed MP4 URLs for EuroCup2016 videos.
get_frames_pcm.py
Python script extracts frames and audio from MP4 videos.
get_instance_for_bmn.py
Generates BMN-formatted output from ground truth data.
get_instance_for_lstm.py
Calculates IoU, checks hits, splits datasets for handling football actions.
get_instance_for_pptsm.py
Processes video data, extracts action instances, creates dataset.
extract_bmn.py
Classify and extract features from videos using pre-trained model.
extract_feat.py
Baidu models extract audio, classify videos.
action.py
Action detection system with ML/DL for Baidu Cloud
logger.py
Custom logger class for news stripper app
feature_extractor.py
Extracts MFCC and STFT audio features for football action detection.
model_config.py
Extracts audio features, slices data, returns feature list.
vgg_params.py
Global VGGish model parameters for audio feature extraction
audio_infer.py
Audio inference model for predicting football actions.
bmn_infer.py
Paddle BMN Infer action detection model, averaging predictions.
lstm_infer.py
LSTM model for football action prediction
pptsm_infer.py
PPTSM model inference for football actions.
__init__.py
Imports and registers readers for map files.
audio_reader.py
AudioReader class for YouTube-8M dataset management.
bmninf_reader.py
BMNINF reader for football action detection
feature_reader.py
Attention-based LSTM feature reader for FootballAction
reader_utils.py
Reader registration and lookup for video input data.
tsminf_reader.py
TSMINF Reader: JPG video dataset, threading, action detection.
config_utils.py
Loads and processes config files for organization.
preprocess.py
Four FFmpeg functions for video, audio handling
process_result.py
Processes video results with NMS
eval.py
Evaluates precision, recall, and F1 scores for football action prediction models.
predict.py
Loads model, reads videos, predicts actions, stores results.
README.md
Enhanced FootballAction model in PaddleVideo.
config.py
Ma-Net config file: Imports, parses, trains, tests.
custom_transforms_f.py
Data augmentation transforms for PaddlePaddle
DAVIS2017.md
Download and organize DAVIS2017 dataset.
DAVIS2017_cn.md
DAVIS2017 dataset prep for Ma-Net
davis_2017_f.py
DAVIS 2017 dataset loader, Ma-Net preprocessing
helpers.py
Helps with image data conversion, masking, and normalization.
samplers.py
Randomly samples identities and instances with replacement.
aspp.py
ASPP module layer for Ma-Net CNN
__init__.py
Backbone network builder for specified model and stride.
drn.py
Dilated Residual Network (DRN) model & MA-Net architecture in PaddlePaddle
mobilenet.py
MobileNetV2 model for Ma-Net application.
resnet.py
ResNet architecture with batch normalization, ReLU, output strides, residual connections.
xception.py
AlignedXception network for image classification
decoder.py
Decoder neural network layer for class prediction
deeplab.py
Deeplab: BatchNorm, ASPP, decoder, freeze.
IntVOS.py
Video object segmentation with PaddlePaddle and neural networks.
loss.py
Custom loss function for image classification tasks with optional hard example mining.
README.md
MA-Net PaddleVideo model testing and training script
README_cn.md
Chinese Ma-Net video segmentation model README.
run.sh
Train, test DeepLabV3_coco on DAVIS dataset.
test.py
DAVIS2017 image processing with PaddlePaddle, 8-turn interactive classification.
train_stage1.py
Ma-Net video object detection, training and visualization.
train_stage2.py
Trains stage 2 Ma-Net models, applies loss, evaluates performance.
api.py
Tensor handling utilities for PyTorch and PaddlePaddle
mask_damaging.py
Damages and scales mask with rotations and translations.
meters.py
Computes and stores average values.
utils.py
Converts labels to RGB color map.
download.sh
Downloads ernie model, checkpoints, and test dataset.
eval_and_save_model.sh
Evaluates and saves AttentionLstmErnie model in specified directories.
inference.sh
Inference script for AttentionLstmErnie model.
README.md
Multimodal video classification with PaddlePaddle 2.0
accuracy_metrics.py
Accuracy metrics calculator for multimodal video tagging models.
config.py
Merges and prints config sections.
__init__.py
MultimodalVideoTag: ATTENTIONLSTMERNIE reader imported.
ernie_task_reader.py
ERNIE task reader: preprocesses text, formats sequences for ERNIE models.
feature_reader.py
Multimodal LSTM, attention, NextVlad feature reader.
reader_utils.py
ReaderZoo manages reader instances with singleton pattern.
tokenization.py
Tokenization and Unicode conversion for text
eval_and_save_model.py
Multimodal video tagging with PaddlePaddle and AttentionLstmErnie.
inference.py
Paddle Video Inference: Multimodal Tagging, GPU Support
attention_lstm_ernie.py
ERNIE-based LSTM attention model for multimodal video tagging
ernie.py
ERNIE model, multimodal video tagging, Paddle
transformer_encoder.py
Transformer encoder layer for NLP tasks
train.py
Video training: PaddlePaddle, feeds, outputs, loss, optimizer, build, train, save.
utils.py
Multimodal Video Tagging Utilities
train.sh
Train Attention LSTM Ernie model with GPU optimization.
Readme.md
PP-Care video understanding app with TSM and ResNet50
prepare_dataset.py
Prepare PaddleVideo and PPHuman dataset keypoints.
README.md
Trains PaddleVideo JSON for PP-Human inference
README.md
PaddleVideo applications: action detection, recognition, classification, and analysis.
__init__.py
Imports base model and trainer modules.
base_dataset.py
Base class for video feature datasets.
base_model.py
Base class for PaddleVideo models
base_trainer.py
T2VLAD trainer: multi-epoch, metrics tracking, model saving.
download_features.sh
Download, extract MSRVTT datasets
data_loaders.py
Efficient data loader for training with LRU cache, PP library, MSRVTT dataset.
MSRVTT_dataset.py
MSR-VTT dataset loader and checker
__init__.py
Imports all logger functions and classes in PaddleVideo's T2VLAD app.
log_parser.py
Computes performance metrics for epochs.
logger.py
Configures logger based on JSON file or uses basic setup.
loss.py
Contrastive loss, max margin ranking, T2VLAD models
metric.py
Calculates retrieval metrics and offers visualization options.
model.py
Video analysis CENet model with T2VLAD and BERT.
net_vlad.py
T2VLAD: VLAD representation network.
text.py
TextEmbedding: Word2Vec for video descriptions and queries. CPU-only, GPT or Word2Vec.
parse_config.py
ConfigParser: argument parsers, slave mode, directories, config, experiment settings.
README.md
Trains T2VLAD text video retrieval model using PaddleVideo on MSR-VTT dataset.
README_en.md
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval.
test.py
Compress predictions using PaddleVideo library.
train.py
Trains video analysis model, handles args, saves checkpoints
__init__.py
Imports all from "trainer" module.
trainer.py
Trains video retrieval model, efficient sample copies, logs progress, computes Mean Average Precision.
__init__.py
Imports all util module functions.
util.py
Data processing, categorizing, feature adjustment, Tensor format utilities.
README.md
Table Tennis Action Recognition with VideoSwinTransformer
submission_format_transfer.py
Converts JSON timestamps, formats for table tennis analysis.
extract_bmn_for_tabletennis.py
Classify videos, extract features, predict bounding boxes.
fix_bad_label.py
Fixes bad labels in Table Tennis application
get_instance_for_bmn.py
Generates BMN model ground truth data for table tennis.
gts_format_transfer.py
JSON file format transfer tool
action.py
Baidu Cloud action detection Python script
logger.py
Custom logger for action detection in Table Tennis app
feature_extractor.py
Extracts audio features for Table Tennis prediction using VGG-16.
model_config.py
Audio feature extraction for Table Tennis predictions
vgg_params.py
Global VGGish model parameters defined, with customizable audio features extraction.
audio_infer.py
Audio inference model for PaddleVideo predictions
bmn_infer.py
GPU-optimized action detection inference model
lstm_infer.py
Table Tennis LSTM Action Detection Model
pptsm_infer.py
InferModel class for PPTSM model inference with GPU.
__init__.py
Imports reader classes, registers alphabetically.
bmninf_reader.py
BMNINFReader: TableTennis Action Detection Dataset.
feature_reader.py
Table tennis action detection from YouTube-8M dataset using LSTM, attention cluster, and nextVlad models.
reader_utils.py
ReaderZoo class for PaddleVideo TableTennis error handling and reader management.
config_utils.py
PaddleVideo TableTennis config parsing utils
preprocess.py
Video, audio extraction and sorting utility.
process_result.py
Calculates video results with suppression, removes overlaps, and processes video properties.
eval.py
Optimize F1 scores for table tennis action prediction.
predict.py
Video prediction setup for TableTennis using PaddleVideo
val_split.py
Splits JSON table tennis gts into training and validation sets
main.py
Trains PaddleVideo models, supports dist. test.
__init__.py
PaddleVideo library license and imports.
__init__.py
Video dataset loading and processing module for PaddleVideo.
builder.py
Python file constructs video pipelines using PaddleVideo & PaddlePaddle.
__init__.py
Video and frame datasets for PaddleVideo.
base.py
Video dataset class for loading and preparing data.
frame_rec.py
PaddleVideo's FrameRecDataset for raw frame loading and transformation.
video.py
PaddleVideo: Efficient Video Dataset Loading and Processing
__init__.py
PaddleVideo library functions for video analysis tasks.
augmentations.py
Scaling and multi-cropping for PaddleVideo
compose.py
Composes video pipeline elements, handles lists, old config workaround.
decode.py
PaddleVideo's VideoDecoder: Decodes MP4, RGB frames, audio, and masks.
mix.py
Mixup operator for video quality assessment.
sample.py
Sampler class for PIL image data sampling in video files.
registry.py
Manages registries for pipelines and datasets in PaddleVideo.
__init__.py
Video Quality Assessment app initialization
base.py
BaseMetric: Foundation for video quality metrics, subclasses required.
build.py
Build and import metrics for PaddleVideo library.
quality_metric.py
Calculates PLCC & SROCC using numpy and scipy stats.
registry.py
Video quality assessment metrics registry in PaddleVideo library.
__init__.py
Video quality assessment models registration and exportation.
__init__.py
Backbone models: ResNet, ResNetTweaksTSM imported and defined.
resnet.py
ResNet backbone using ConvBNLayer and blocks.
resnet_tweaks_tsm.py
TSM ResNet model with configurable layers
builder.py
Model builder for computer vision components
__init__.py
PaddleVideo: BaseRecognizer & Recognizer2D classes defined.
__init__.py
Imports, adds to __all__, and disclaims.
base.py
Base class for model recognizers in PaddleVideo
recognizer2d.py
Trains 2D models using PaddleVideo's Recognizer2D class
__init__.py
Head models for Video Quality Assessment in PaddleVideo.
base.py
PaddleVideo BaseHead model for Video Quality Assessment
tsm_rec_head.py
TSMRecHead: TSN-based classifier for TSNs
tsn_head.py
TSN head for video quality assessment
__init__.py
Implements loss functions for PaddleVideo.
base.py
Base loss function for PaddleVideo, requires subclass for _forward method.
l1_loss.py
L1 Loss: Image/Video Quality Assessment
smooth_l1_loss.py
Smooth L1 Loss Function
registry.py
Registers various models in PaddleVideo's VQA app.
weight_init.py
Custom weight initialization in PaddlePaddle
__init__.py
PaddleVideo: Building Video Quality Optimizer and LR.
custom_lr.py
Learning rate schedulers for PaddleVideo optimization.
lr.py
Creates learning rate scheduler for PaddleVideo
optimizer.py
Constructs optimizer and learning rate scheduler for parameter optimization.
__init__.py
Import, define, train, test functions for PaddleVideo's Video Quality Assessment module.
test.py
Tests Paddle model with parallel processing
train.py
Train video quality model with PaddleVideo.
__init__.py
PaddleVideo utils module with Registry, build, and more.
build_utils.py
Module Builder from Config
config.py
Config and AttrDict functions for PaddleVideo.
dist_utils.py
Dist utils for PaddleVideo's distributed video quality assessment.
logger.py
Logger class for PaddleVideo's VQA app, enabling distributed logging.
precise_bn.py
Precise batch normalization updates for PaddleVideo.
record.py
Records metrics for Video Quality Assessment in PaddleVideo.
registry.py
Registry: Objects registration and retrieval via names.
save_load.py
Save, load functions for model weights using Paddle.
version.py
PaddleVideo library version "0.0.1" under Apache License 2.0.
README.md
Video quality assessment model using ppTSM network on KonVid-150k dataset.
run.sh
TSM model training and testing script
save_model.sh
Saves best model from TSM_pptsm.yaml with 32 segments.
setup.py
Video understanding with PaddlePaddle toolkits setup.
eval.py
Prepares environment, imports libraries, defines functions, and runs tests.
FineTune.md
Fine-tune VideoTag model with custom data, AttentionLSTM, TSN
__init__.py
Video metrics import from metrics_util
accuracy_metrics.py
Calculates video accuracy metrics in PaddleVideo's VideoTag app.
metrics_util.py
Evaluates metrics in video analysis tasks.
average_precision_calculator.py
Calculates interpolated average precision for classification tasks.
eval_util.py
Video classification model evaluation metrics in PaddleVideo
mean_average_precision_calculator.py
Mean Average Precision Calculator for YouTube-8M Dataset
__init__.py
Registered models AttentionLSTM and TSN for easy retrieval.
__init__.py
Imports functions and classes from "attention_lstm.py" for easy access.
attention_lstm.py
AttentionLSTM model for video tagging
lstm_attention.py
LSTM Attention Model: Dynamic LSTM, Dropout
model.py
VideoTag app's Python model module for Paddle.
__init__.py
Imports all "tsn" subdirectory components.
tsn.py
TSN model class with segmentation and training parameters.
tsn_res_model.py
TSN ResNet model for PaddlePaddle
utils.py
Decompress, download functions and AttrDict class.
predict.py
Predicts video tags using PaddleVideo's models.
__init__.py
VideoTag reader classes imported, registered. Alphabetical order.
feature_reader.py
YouTube-8M LSTM DataReader in Python.
kinetics_reader.py
KineticsReader: Efficient Kinetics dataset reader with data augmentation.
reader_utils.py
Register and retrieve readers in Zoo
README.md
VideoTag: Large-scale video classification via image modeling and sequence learning.
Run.md
VideoTag app installation guide and usage.
Test.md
Test VideoTag model on custom data using Python script.
train.py
Video tagging model training in Python
tsn_extractor.py
Inference script for videos with Infer model.
config_utils.py
Config utils for PaddleVideo's VideoTag app
train_utils.py
PaddlePaddle training utility with dataloader.
utility.py
Ensures PaddlePaddle version 1.6.0 installed, handles GPU usage.
videotag_test.py
Video tagging model performance testing with PaddlePaddle and PaddleVideo.
README.md
Run benchmark script for TimeSformer model in PaddleVideo.
run_all.sh
Benchmark TimeSformer on PaddleVideo with UCF101 dataset.
run_benchmark.sh
TimeSformer video classification benchmark tests
prepare_asrf_data.py
Prepares ASRF data for classification.
transform_segmentation_label.py
Process video data for labeling and organization.
download_dataset.sh
Download and unzip NTU-RGB-D skeleton data, then statistics.
get_raw_denoised_data.py
Processes and denoises NTU RGB-D dataset data.
get_raw_skes_data.py
Extracts body info from skeleton data using NTU RGB-D dataset.
seq_transformation.py
Python script for NTU RGB+D data processing, splitting, and encoding.
auto-log.cmake
Autolog external project finder and inclusion with CMake
postprocess_op.h
Softmax Inplace Postprocessing Class
preprocess_op.h
Image preprocessing operations for PaddleVideo.
utility.h
Utility functions for PaddleVideo operations.
video_rec.h
Video recognition class with OpenCV, PaddlePaddle integration
readme.md
Deploys PaddleVideo models, C++, displays results, requires libcudnn.so fix.
readme_en.md
Deploy PaddleVideo models on Linux with Docker support.
main.cpp
Process video frames using OpenCV and PaddleVideo.
postprocess_op.cpp
Softmax postprocessing for PaddleVideo.
preprocess_op.cpp
Normalizes and scales images for inference in PaddleVideo library.
utility.cpp
Utility functions: ReadDict, frame capture, and conversion.
video_rec.cpp
Video AI inference, processing time measurement, GPU optimized.
build.sh
Setup and compile script for OpenCV, PaddlePaddle, CUDA, cuDNN, TensorRT.
paddle_env_install.sh
Setup PaddleVideo C++ serving environment with TensorRT.
preprocess_ops.py
Composes image processing steps, preprocesses video frames.
readme.md
Deploy Paddle Serving in Docker with GPU/CPU options.
readme_en.md
Accelerates PaddleServing installation with Docker, Linux, and GPU support.
run_cpp_serving.sh
Runs PaddleVideo server with PP-TSM/TSN models on different ports.
serving_client.py
Video client for Paddle Serving using PaddleVideo
predict_onnx.py
Perform video object detection with ONNX predictor.
readme.md
Converts PaddlePaddle to ONNX for inference
readme_en.md
Convert Paddle2ONNX PP-TSN model for video prediction.
pipeline_http_client.py
Serves PaddleVideo models, parses args, sends video data via HTTP.
pipeline_rpc_client.py
Web serving client for PaddleVideo models
readme.md
Deploy PaddlePaddle model for serving with PaddleServing.
readme_en.md
Deploy PaddleServing for HTTP deep learning prediction.
recognition_web_service.py
Python web service for image recognition using PaddlePaddle
utils.py
Converts video frames to numpy array, base64 strings.
quant_post_static.py
Introduces quantization in PaddleVideo for GPU utilization.
readme.md
PaddleVideo model compression with PaddleSlim.
readme_en.md
Model compression library for PaddleVideo with quantization, pruning, and distillation.
benchmark.md
PaddleVideo: Speed benchmark, Slowfast 2x faster, Action segmentation on Breakfast dataset.
ActivityNet.md
ActivityNet dataset prep for PaddleVideo
AVA.md
AVA dataset: download, cut, frame extraction, organization.
fsd.md
Figure Skating Dataset: 30 fps, Open Pose key points, train/test data available.
k400.md
Download and extract Kinetics-400 dataset frames.
msrvtt.md
MSR-VTT: 10K videos, ActBERT model, multi-modal transformers.
ntu-rgbd.md
Prepares NTU RGB+D dataset for CTR-GCN
Oxford_RobotCar.md
Oxford-RobotCar: Day-Night Depth Estimation Dataset
README.md
Comprehensive action recognition datasets table.
SegmentationDataset.md
Video action segmentation dataset using breakfast, 50salads, and gtea datasets with pre-training model features.
ucf101.md
UCF101 dataset organization tool
ucf24.md
UCF24 dataset preparation with PaddleVideo
youtube8m.md
Massive YouTube video classification dataset.
install.md
Install PaddlePaddle and PaddleVideo, requirements, setup, usage.
SlowFast_FasterRCNN_en.md
Train SlowFast Faster RCNN model for action detection
adds.md
Estimates depth using day/night images with ADDS-DepthNet code.
bmn.md
BMN model: temporal action proposal generation with commands.
yowo.md
YOWO: Single-stage, channel fusion, attention, UCF101-24 pre-trained.
actbert.md
ActBERT: Multimodal, global action, TNT block, SOTA video-language.
transnetv2.md
TransNetV2: Deep Learning Video Segmentation Model
README.md
Model Zoo: Comprehensive Action Recognition & Segmentation Models.
agcn.md
AGCN: Improved ST-GCN for Video Recognition
agcn2s.md
AGCN2s: Enhanced ST-GCN for motion recognition.
attention_lstm.md
AttentionLSTM: LSTMs, Attention layer, YouTube-8M, PaddleVideo
ctrgcn.md
PaddlePaddle, CTR-GCN, NTU-RGB+D, bone-based behavior recognition, 99.9988% accuracy
movinet.md
MoViNet: Lightweight Video Recognition Model
posec3d.md
Trains PoseC3D model, tests without GPU.
pp-timesformer.md
PP-TimeSformer: Enhanced TimeSformer for video recognition, multi-GPU support.
pp-tsm.md
PP-TSM: Action Recognition, UCF101/Kinetics, PaddlePaddle, ResNet101
pp-tsn.md
Enhanced TSN model for video recognition
slowfast.md
SlowFast model: Video Recognition with Multigrid Training.
stgcn.md
ST-GCN model training, testing, and inference.
timesformer.md
TimeSformer: Efficient Video Classifier
tokenshift_transformer.md
Token Shift Transformer: High Accuracy Video Classification
tsm.md
Trains TSM model on UCF-101, Kinetics-400 datasets. ResNet-50, PaddlePaddle, AMP, Momentum optimization, L2_Decay, three sampling methods.
tsn.md
TSN: 2D-CNN video classification with ResNet-50 and Kinetics-400.
tsn_dali.md
Improve TSN training speed with DALI for action recognition.
videoswin.md
SOTA Video-Swin-Transformer model with multi-scale, efficient attention.
asrf.md
Improved video action segmentation model using PaddlePaddle.
cfbi.md
CFBI video segmentation model, ECCV 2020.
mstcn.md
Trains, evaluates, and compares MS-TCN video action segmentation models.
quick_start.md
PaddleVideo: Install, Use, Action Recognition
tools.md
Tools for PaddleVideo: model parameters, FLOPs calculation, and exported model testing. Python 3.7 required.
accelerate.md
English/Chinese tutorial links provided.
Action Recognition Datasets
Action Recognition Datasets: Essential resources for training and evaluation.
Action Recognition Papers
Efficient Action Recognition Methods
config.md
Inversion-of-Control, Dependency Injection with Config File
customized_usage.md
Customize PaddleVideo framework elements.
demos
Demo: Action Recognition with Various Algorithms
deployment.md
Convert dygraph models for deployment with PaddleInference.
modular_design.md
Modular Design Tutorial Translation
pp-tsm.md
Optimized video recognition model for high performance and fast inference.
Spatio-Temporal Action Detection Papers
Comprehensive list of Spatio-Temporal Action Detection papers, 2015-2017.
summarize.md
Video Classification: RGB, Skeleton, Deep Learning
Temporal Action Detection Papers
Action detection and localization in videos through research papers.
TSM.md
TSM: Efficient Video Understanding, Balanced Performance
usage.md
PaddleVideo Linux setup, multi-card training, log format, fine-tuning, benchmarking
main.py
Parallelized PaddleVideo model training with distributed environments.
MANIFEST.in
Includes essential files for PaddleVideo package distribution.
__init__.py
PaddleVideo library initialization.
__init__.py
PaddleVideo loader/init functions for datasets, dataloaders, pipelines.
builder.py
Constructs PaddleVideo pipeline, builds loader for distributed training.
dali_loader.py
Dali-powered Paddle video loader
__init__.py
Imports PaddleVideo dataset classes for video understanding.
actbert_dataset.py
ActBERT dataset setup for PaddlePaddle video processing
asrf_dataset.py
Action segmentation dataset loader for PaddleVideo
ava_dataset.py
AVA Dataset Class for PaddleVideo
base.py
BaseDataset class for PaddlePaddle with loading, preparing, retrieving. Supports list results.
bmn_dataset.py
BMNDataset: Action localization video datasets loader.
davis_dataset.py
VOS Test class for Davis 2017 dataset in PaddleVideo.
feature.py
PaddleVideo library's FeatureDataset initialization and methods.
frame.py
PaddleVideo library: load, transform, process video data.
MRI.py
MRI dataset loader for action recognition
MRI_SlowFast.py
MRI_SlowFast: Imports, classes, processes video data for training/validation.
ms_tcn_dataset.py
MS-TCN dataset class and loader
msrvtt.py
Prepares MSRVTT dataset for training/testing.
oxford.py
Oxford dataset class for PaddleVideo.
skeleton.py
Skeleton Dataset Loader for Action Recognition
slowfast_video.py
Video dataset for action recognition using PaddleVideo's SFVideoDataset.
ucf101_skeleton.py
UCF101 Skeleton Dataset PaddleVideo Class
ucf24_dataset.py
Ucf24Dataset: PaddleVideo's UCF24 dataset loader.
video.py
VideoDataset: Raw video loader with index, handling corruption.
__init__.py
PaddleVideo pipeline components and modules import.
anet_pipeline.py
PaddleVideo: Feature extraction, IoU calculation, and data preparation.
augmentations.py
PaddleVideo loader with resize, augmentation.
augmentations_ava.py
AVA dataset augmentation and resizing in PaddleVideo.
compose.py
Unifies pipeline components, flexible transformations.
decode.py
TimeSformer model for video decoding in PaddleVideo
decode_image.py
Decode image pipeline for PaddleVideo.
decode_sampler.py
Video decoding pipeline with PIL conversion.
decode_sampler_MRI.py
Decodes and samples MRI frames for SFMRI.
mix.py
Video Mix Operator: Data Augmentation for Image Classification
multimodal.py
Introduces FeaturePadding class for multimodal data preprocessing.
sample.py
Python code defines PaddleVideo pipeline for efficient frame sampling.
sample_ava.py
SampleFrames class for PaddleVideo loader pipelines in OpenCV.
sample_ucf24.py
SamplerUCF24: Pipeline for sampling frames from videos.
segmentation.py
Multi-scale segmentation for PaddleVideo.
segmentation_pipline.py
Segmentation sampler for PaddleVideo pipelines.
skeleton_pipeline.py
Data processing classes for efficient PaddleVideo.
registry.py
Custom pipeline and dataset registries for PaddleVideo.
__init__.py
Imports various metrics for video analysis.
__init__.py
Imports ANETproposal as public API.
anet_prop.py
Efficiently calculates metrics for ActivityNet proposals.
metrics.py
Calculates precision, recall, and CorLoc metrics for object detection.
np_box_list.py
Manages bounding boxes, checks validity.
np_box_ops.py
Numpy bounding box ops for IoU calculation
object_detection_evaluation.py
Object detection evaluation module for AVA dataset with mAP metrics.
per_image_evaluation.py
Python code for AVA performance metrics evaluation.
README.md
Action recognition metrics evaluation tool
standard_fields.py
Standardizes object detection metrics.
ava_metric.py
AVA metric computation for PaddleVideo.
ava_utils.py
AVA metric handling for video sequences
base.py
PaddleVideo Base Metrics Class
bmn_metric.py
Calculates BMN metric for object detection.
build.py
Python script, Apache License 2.0, builds metrics from config
center_crop_metric.py
Center crop metric for PaddleVideo.
center_crop_metric_MRI.py
Calculates top-1/5 accuracy metrics for image classification tasks with multi-GPU support.
depth_metric.py
DepthMetric: Process, accumulate, all-reduce, and average metrics.
msrvtt_metric.py
MSR-VTT metrics computation and loggingmulti_crop_metric.py
MultiCrop metric for PaddleVideo accuracyrecall.py
Calculate recall for object detection using IoU and image proposals.registry.py
Manage diverse metrics with Registry class in utils module.segmentation_metric.py
Label change detection with precision, recall, F1 score in video segmentation.skeleton_metric.py
Skeleton metric: Skeleton-based model performance evaluation.transnetv2_metric.py
TransNetV2 metrics: precision, recall, F1-score calculation.ucf24_utils.py
Ucf24Metrics: UCF101 dataset utilities for precision/recall, mAP.vos_metric.py
VOSMetric: Video Object Segmentation Metricsaverage_precision_calculator.py
Average precision calculator for video detection tasks.eval_util.py
Evaluates PaddleVideo performance in Youtube8m using Hit@1 and precision.mean_average_precision_calculator.py
Calculates mAP for ranked lists with interpolated precisions.
yowo_metric.py
YOWOMetric class for PaddleVideo metrics calculation and logging
__init__.py
PaddleVideo model registry and functions initialization__init__.py
Imports MaxIoUAssignerAVA as public API.max_iou_assigner_ava.py
Max IoU assigner for AVA bounding boxes
__init__.py
Backbone models for video analysis in PaddleVideo.actbert.py
BERT embeddings for multimodal video action recognition with ACTBERT backboneadds.py
PaddleVideo backbone with ResNet V1.5, DepthDecoder, and PoseDecoder for diverse pose estimation.agcn.py
AGCN: Adaptive GCN Backbone for Graph Convolution Tasksagcn2s.py
Temporal Convolutional Networks with GCN Units for NTURGB+Dasrf.py
ASRF backbone for PaddleVideo tasksbmn.py
BMN model: Masks, ConvLayers for BMN backbone in Paddle.aicfbi.py
FPN-based CFBI model for feature extractionctrgcn.py
CTRGCN backbone for video models with batch norm and NTUGraphdarknet.py
Darknet backbone with ConvBNLayer, concatenation, convolutions.deeplab.py
Deeplab: PaddlePaddle Network for Semantic Segmentationmovinet.py
MoViNet model with MobileNetV2 layers for video analysisms_tcn.py
MSTCN backbone, SingleStageModel class, Kaiming initializationpptsm_mv2.py
MobileNetV2 backbone for PaddlePaddlepptsm_mv3.py
PPTSM-Mv3 Backbone for PaddleVideopptsm_v2.py
Python module: PPTSM_v2 video backbone.resnet.py
ResNet backbone model with dynamic blocksresnet3d.py
3D ResNet model for PaddleVideoresnet3d_slowonly.py
SlowOnly 3D ResNet backbone with Slowfast pathwayresnet_slowfast.py
ResNetSlowFast for video recognition with slow-fast pathways.resnet_slowfast_MRI.py
Initialize ResNet SlowFast MRI model for video analysis.resnet_tsm.py
ResNet-TSM Backbone: Deprecated, Needs Updatesresnet_tsm_MRI.py
ResNet-TSM MRI model in PaddleVideo.resnet_tsn_MRI.py
ResNet-TSN model, PaddlePaddle, weights init, results outputresnet_tweaks_tsm.py
TSM ResNet backbone for Temporal Shift Module modelsresnet_tweaks_tsn.py
ResNet TSN backbones in PaddleVideo library.resnext101.py
ResNeXt-101 PaddlePaddle model defined.stgcn.py
STGCN model for spatio-temporal data processing in Pythonswin_transformer.py
Swin Transformer backbone for image processing in PaddleVideo.toshift_vit.py
Vision Transformer Model with Token Shifttransnetv2.py
OctConv3D 3D convolutional layer for shot transition detection in TransNetV2vit.py
VisionTransformer-based video processing in PaddleVideo.vit_tweaks.py
Vit_tweaks: VisionTransformer enhancements.yowo.py
YOWO: YOLO-style backbone for video classification
bbox_utils.py
Bounding box utilities for YOLO.builder.py
Dynamic video object detection model builder.__init__.py
PaddleVideo framework: base classes for model modeling.__init__.py
PaddleVideo detectors: Base, FastRCNN, TwoStage.base.py
BaseDetector: Parent for detectors, common features and abstract train_step.fast_rcnn.py
FastRCNN: Two-stage object detection.two_stage.py
Two-stage detector SlowFast model Python implementation.
__init__.py
Imports BaseEstimator and DepthEstimator classesbase.py
BaseEstimator class for PaddleVideo modelingdepth_estimator.py
DepthEstimator: Depth estimation model with forward_net, loss.
__init__.py
PaddleVideo localizers for video tasks in various techniques.base.py
Base class for PaddlePaddle localization modelsbmn_localizer.py
BMN Localizer Model for PaddleVideoyowo_localizer.py
YOWOLocalizer: NMS, matching, precision, recall, F-score.yowo_utils.py
YOLOv2 post-processing with PaddlePaddle
__init__.py
Multimodal Model Framework for PaddleVideoactbert.py
Multimodal ActBert model for text, video, action prediction.base.py
Multimodal model base class for PaddleVideo.
__init__.py
PaddleVideo Partitioner Initialization Filebase.py
Base partitioner class for PaddleVideo's modeling frameworktransnetv2_partitioner.py
TransNetV2 Partitioner for PaddleVideo
__init__.py
Imported recognizers for video recognition tasks in PaddleVideo.base.py
Base recognizer model initialization and operation modes.recognizer1d.py
1D recognizer model in PaddleVideorecognizer2d.py
2D video model for PaddleVideo analysisrecognizer3d.py
Recognizer3D: 3D model framework, training, validation, testingrecognizer3dMRI.py
3D Recognizer model for PaddleVideo training, validation, and testing.recognizer_gcn.py
Introduces GCN Recognizer model for PaddleVideo.recognizer_movinet_frame.py
MoViNetRecognizerFrame: Framework for model training, testing, inference.recognizer_transformer.py
Transformer-based recognizer model implementationrecognizer_transformer_MRI.py
RecognizerTransformer_MRI: Multi-view MRI classifier.recognizerDistillation.py
Recognizer distillation in PaddleVideo framework.recognizerMRI.py
2D Image Classifier Model using RecognizerMRI.
__init__.py
Segment models in PaddleVideo, BaseSegment and CFBI.base.py
Semi-Video Object Segmentation abstract base classcfbi.py
CFBI model for image segmentation and video processing with instance-level attention.utils.py
PaddleVideo segment matching, ASPP models, padding, distances, data prep.
__init__.py
PaddleVideo segmenter module with 3 classes for feature extraction, segmentation, and separation.asrf.py
ASRF segmenter: ASRF backbone, head network, validation.base.py
BaseSegmenter: Foundation for PaddleVideo segmenters with train, valid, test, and inference.ms_tcn.py
MS-TCN video segmenter modelutils.py
Gaussian smoothing, Kaiming Uniform for layer init.
__init__.py
Imports various head classes for video tasks.adds_head.py
AddHead: Detection, Loss, Metrics, Multi-GPU Supportagcn2s_head.py
AGCN2sHead: PaddleVideo model head with channels, classes, people, dropoutasrf_head.py
ASRF head: Action recognition, conv. layers, precision, recall, F1attention_lstm_head.py
Attention LSTM head for PaddleVideobase.py
Base PaddleVideo head: Classification, loss/accuracy, label smoothing.bbox_head.py
Object detection bbox head with classification targets, dropout, and recall/precision calculation.cfbi_head.py
CollaborativeEnsemblerMS neural network architecture.ctrgcn_head.py
CTRGCN Head: Neural network head for CTR-GCN model in PaddleVideoi3d_head.py
I3D head with pooling, dropout, and options.movinet_head.py
MoViNetHead: BaseHead extension, unmodified input.ms_tcn_head.py
MS-TCN Head model: F1 scoring, loss calculation.pptimesformer_head.py
PaddlePaddle TimeSformer Headpptsm_head.py
PaddlePaddle Video: ppTSMHead class for TSN models.pptsn_head.py
PaddlePaddle ppTSN head for classification tasks with dropout.roi_extractor.py
RoIAlign extractor for PaddlePaddleroi_head.py
ROI alignment head with NMS for bounding boxes.single_straight3d.py
3D ROI extractor class for feature extractionslowfast_head.py
SlowFast head 3D projection PaddleVideo classstgcn_head.py
STGCNHead: Convolutional layer, forward pass in PaddlePaddletimesformer_head.py
TimeSformer Head: Model Head for TimeSformertoken_shift_head.py
TokenShiftHead: Paddle's Linear classification headtransnetv2_head.py
TransNetV2Head: CV model head, loss calculation.tsm_head.py
TSM head: TSNHead extension for classification taskstsn_head.py
TSN head: Adaptive pooling, dropout, class scores
__init__.py
Imports diverse losses for PaddleVideo tasks.actbert_loss.py
ActBert loss function codeasrf_loss.py
Custom video modeling loss functions definedbase.py
Base loss function class for PaddlePaddle.bmn_loss.py
BMN loss function, PaddleVideo.cross_entropy_loss.py
Custom PaddlePaddle loss for classification tasksdepth_loss.py
Calculates depth losses for disparity tasks.distillation_loss.py
Distillation loss, KL divergence, CrossEntropy, weighted average.transnetv2_loss.py
TransNetV2 loss calculation classyowo_loss.py
Optimize hard examples, YOLOv3-style losses for bounding box location, confidence, classification on GPU.
registry.py
Model registration for efficient organization.__init__.py
Licensing, RandomSampler import.random_sampler.py
RandomSampler: Balanced positive-negative bbox sampling.
weight_init.py
Initialize layer weights in PaddlePaddle.
__init__.py
Imports optimizer and learning rate functions under Apache 2.0 license.custom_lr.py
Custom warmup & decay schedulers for PaddleVideo.lr.py
Learning rate scheduler construction.optimizer.py
Initializes optimizer configurations, handles decay and creates optimizer.
__init__.py
PaddleVideo tasks module initialization filetest.py
Test PaddlePaddle models in paralleltrain.py
Train PaddlePaddle's Distributed Model with AMP, Log, Evaluate, and Save.train_dali.py
Trains DALI TSN model with optimization steps, logs metrics.train_multigrid.py
Trains PaddleVideo models with multigrid configuration.
__init__.py
PaddleVideo utils initialization.build_utils.py
Build utility function for object creation from config and registry.config.py
Configure and load YAML files with AttrDict, logger, visualization support.dist_utils.py
Distributed computing utilities for PaddleVideo.logger.py
PaddleVideo's colorful logger setup with non-propagation.__init__.py
Imports useful functions from PaddleVideo library.batchnorm_helper.py
BatchNorm3D helper for PaddlePaddleinterval_helper.py
Determine evaluation epoch using multigrid schedulemultigrid.py
Manages multigrid training schedules.save_load_helper.py
Ensures state dict consistency and loads pre-trained weights.short_sampler.py
DistributedShortSampler: Efficient Distributed Data Loading
precise_bn.py
Precise Batch Normalization: recomputes BN statistics for more accurate evaluationprofiler.py
PaddleVideo Profiler: Performance Analysis and Optimization Toolrecord.py
Records metrics, calculates means, logs progress in training processes.registry.py
Registry for mapping names to objects with registration and retrieval methods.save_load.py
Save/load functions for PaddlePaddle models.
version.py
PaddleVideo version 0.0.1, Apache License 2.0
README.md
Advanced video processing library with deployment supportREADME_en.md
Deep learning library for video processingrun.sh
Trains deep learning models for computer vision tasks using PaddlePaddle.setup.py
PaddleVideo setup: PaddlePaddle-powered video tool, Python 3.7.benchmark_train.sh
Benchmarks PaddleVideo model, adjusting batch sizes and precisions for performance.common_func.sh
Common function script for parsing, setting, and status checking.compare_results.py
Compares log results with ground truth for testing.extract_loss.py
Extracts and processes expressions for calculations.prepare.sh
Prepare PaddlePaddle's video models, handle data & install packagesREADME.md
TIPC-supported PaddleVideo tutorials and features.test_inference_cpp.sh
PaddleVideo model inference tests with MKLDNN or float point precision.test_paddle2onnx.sh
Convert PaddlePaddle to ONNX.test_ptq_inference_python.sh
Script for PaddleVideo model inference tests on GPUs/CPUs with varied batch sizes.test_serving_infer_cpp.sh
Bash script initializes model, serves via Python/C++, tests web function with IFS separation.test_serving_infer_python.sh
Configures model serving, sets up API server, transfers model, and handles cleanup tasks.test_train_dy2static_python.sh
Trains and analyzes two models, compares losses, logs differences.test_train_inference_python.sh
Efficient PaddleVideo model training and evaluation.test_train_inference_python_npu.sh
Update config for NPU, disable MKLDNN on non-x86_64, set Python to 3.9, change exec script to "npu".test_train_inference_python_xpu.sh
XPU PaddleVideo script with Python 3.9 NPU, logs start.
__init__.py
Imports and defines package contents.ava_predict.py
AVA model prediction with PaddleVideo and OpenCV.export_model.py
Export PaddleVideo model functions.predict.py
Paddle Video Tool: Predict with TensorRT YOWOsummary.py
PaddleVideo model construction and evaluation.utils.py
PaddleVideo tools for inference, detection, and pose estimationwheel.py
Wheel.py: Downloads, initializes, and iterates model results for video label prediction.