Project structure of: NExT-ChatV/NExT-Chat
nextchat.py
NextChat model configuration code
eval.py
Training arguments config for model.
nextchat_fsdp.py
Trains LlamaDecoderLayer with mixed precision, distributed learning
nextchat_non.py
Training args for ML model: output dir, batch size, epochs, learning rate.
nextchat_eval_multi_rec.py
Multi-predict NextChat model evaluation setup
nextchat_eval_multi_res.py
NextChat model prep for evaluation and prediction
nextchat_eval_pope.py
Base config for NExT-Chat training, evaluation, and prediction tasks.
nextchat_eval_reg_cap.py
Configure NextChat for evaluation & prediction
nextchat_stage1.py
NextChat Stage 1 model initialization and training arguments.
nextchat_stage2.py
NextChat Stage 2 config: base, train args, model-specific, 2 epochs, 4096 tokens, FairSeq
nextchat_stage3.py
NextChat stage 3 config: dataset, model, training params.
DATA.md
Organize Shikra data in 'data' folder, update image settings.
eval_pope.sh
Launch 8-process CUDA finetuning job for NextChat model evaluation.
eval_rec.sh
Multi-modal language model finetuning with evaluation.
eval_reg_cap.sh
Multi-process accelerated ML model fine-tuning, config: nextchat_eval_reg_cap.py
eval_res.sh
Accelerate library launched, executes mllm/pipeline with specific config, CUDA devices.
__init__.py
Prepares arguments for program usage.
config.py
Training configuration setup.
__init__.py
Imports essentials from base_conversation module.
base_conversation.py
Conversation class, styling functions, gradio/OpenAI support, renewable energy benefits.
__init__.py
Imports and consolidates functionalities for data processing.
builder.py
Builds NExT-Chat dataset with transforms
__init__.py
Imports various process functions for handling data processing.
box_process_function.py
BoxFormatProcess & TokenFormatter combined for NExT-Chat model preprocessing
chat_process_function.py
Process chat data for AI models.
root.py
Conversation data processors with placeholders and three function types.
single_image_convsation.py
Single-image conversation dataset class.
single_image_interactive.py
Single image dataset class for chat apps.
__init__.py
Imports util functions for data tasks.
compute_metrics.py
Compute transformer model metrics and decode IDs
concatenate_dataset.py
Concatenates, undersamples, and oversamples PyTorch datasets.
flickr30k_entities_utils.py
Flattens Flickr30kEntities dataset for efficient storage.
io.py
Reads images from paths, handles S3, logs time
mixin.py
Mixin-based question template dataset class
transform.py
Image and bounding box transformations for NExT-Chat dataset
bash_demo.py
Test bash demo for NextChat model interaction.
demo_util.py
Generate AI chat system responses with image processing
web_demo.py
GUI chatbot with image recognition, grounding, captioning, and response capabilities.
__init__.py
Imports modules for trainer classes and collator initialization.
base_engine.py
Base engine: collator, prediction, metrics, loss, evaluation, saving.
builder.py
Prepare trainer-collator for model with preprocessor dictionary
nextchat.py
NextChatTrainer: MMLLM-based chatbot model trainer with state saving.
__init__.py
Load pre-trained models with NextChat.
__init__.py
Load pre-existing models for further processing or training.
build_nextchat.py
NextChat model builder initialization and customization
builder.py
Load pre-trained NExT-Chat models.
__init__.py
Imports NextChat classes
nextchat_base.py
Initializes AI chat models, handles multimodal inputs, checks shapes. Prints box_iou mean.
nextchat_seg.py
Language Modeling Class with Vision Tower
modeling_sam.py
Image attending Transformer model with MHA, window attention.
sam_loss.py
Custom loss functions for segmentation models. Focal, Dice, and IOU losses combined.
transforms.py
Image resizing and padding transforms in transforms.py
finetune.py
Trains ML model, logs metrics, saves models, handles errors, multi-predict.
finetune_mem.py
Python script finetunes LLaMA with FlashAttn for memory efficiency.
__init__.py
Imports utilities for image and text tasks.
box_ops.py
Box operation utilities for bounding box manipulation and more.
common.py
Image processing and text manipulation utilities.
README.md
NextChat: Chat LMM with image-based text generation, improved model
requirements.txt
Essential Python libraries for codebase execution.
run_stage1.sh
Fine-tune multimodal language model in 2 epochs.
run_stage2.sh
Runs Stage 2, finetunes model with 8 processes
run_stage3.sh
Runs 8-process accelerated training for stage 3 model, saving every 5000 steps.
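
The listing mentions box operation utilities (box_ops.py) and a printed box_iou mean (nextchat_base.py). As a rough illustration of what such a metric computes, here is a minimal pure-Python sketch. It assumes boxes in (x1, y1, x2, y2) corner format; the function names and signatures are hypothetical and are not taken from the repository, which most likely operates on PyTorch tensors instead.

```python
from typing import Iterable, Sequence

# Assumption: a box is (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Sequence[float]


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes count as zero."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (hypothetical helper)."""
    # Corners of the overlapping rectangle, if there is one.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Iterable[Box], targets: Iterable[Box]) -> float:
    """Average IoU over paired predicted and target boxes."""
    ious = [box_iou(p, t) for p, t in zip(preds, targets)]
    return sum(ious) / len(ious) if ious else 0.0
```

A call such as mean_iou(predicted_boxes, ground_truth_boxes) averages the per-pair overlap, which is the kind of quantity a "box_iou mean" log line would report.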