Project structure of: NExT-ChatV/NExT-Chat
nextchat.py - NextChat model configuration code

eval.py - Training arguments config for model.
nextchat_fsdp.py - Trains LlamaDecoderLayer with mixed precision, distributed learning
nextchat_non.py - Training args for ML model: output dir, batch size, epochs, learning rate.

nextchat_eval_multi_rec.py - Multi-predict NextChat model evaluation setup
nextchat_eval_multi_res.py - NextChat model prep for evaluation and prediction
nextchat_eval_pope.py - Base config for NExT-Chat training, evaluation, and prediction tasks.
nextchat_eval_reg_cap.py - Configure NextChat for evaluation & prediction
nextchat_stage1.py - NextChat Stage 1 model initialization and training arguments.
nextchat_stage2.py - NextChat Stage 2 config: base, train args, model-specific, 2 epochs, 4096 tokens, FairSeq
nextchat_stage3.py - NextChat stage 3 config: dataset, model, training params.

DATA.md - Organize Shikra data in 'data' folder, update image settings.
eval_pope.sh - Launch 8-process CUDA finetuning job for NextChat model evaluation.
eval_rec.sh - Multi-modal language model finetuning with evaluation.
eval_reg_cap.sh - Multi-process accelerated ML model fine-tuning, config: nextchat_eval_reg_cap.py
eval_res.sh - Accelerate library launched, executes mllm/pipeline with specific config, CUDA devices.
__init__.py - Prepares arguments for program usage.
config.py - Training configuration setup.

__init__.py - Imports essentials from base_conversation module.
base_conversation.py - Conversation class, styling functions, gradio/OpenAI support, renewable energy benefits.

__init__.py - Imports and consolidates functionalities for data processing.
builder.py - Builds NExT-Chat dataset with transforms
__init__.py - Imports various process functions for handling data processing.
box_process_function.py - BoxFormatProcess & TokenFormatter combined for NExT-Chat model preprocessing
chat_process_function.py - Process chat data for AI models.

root.py - Conversation data processors with placeholders and three function types.
single_image_convsation.py - Single-image conversation dataset class.
single_image_interactive.py - Single image dataset class for chat apps.
__init__.py - Imports util functions for data tasks.
compute_metrics.py - Compute transformer model metrics and decode IDs
concatenate_dataset.py - Concatenates, undersamples, and oversamples PyTorch datasets.
flickr30k_entities_utils.py - Flattens Flickr30kEntities dataset for efficient storage.
io.py - Reads images from paths, handles S3, logs time
mixin.py - Mixin-based question template dataset class
transform.py - Image and bounding box transformations for NExT-Chat dataset

bash_demo.py - Test bash demo for NextChat model interaction.
demo_util.py - Generate AI chat system responses with image processing
web_demo.py - GUI chatbot with image recognition, grounding, captioning, and response capabilities.

__init__.py - Imports modules for trainer classes and collator initialization.
base_engine.py - Base engine: collator, prediction, metrics, loss, evaluation, saving.
builder.py - Prepare trainer-collator for model with preprocessor dictionary
nextchat.py - NextChatTrainer: MMLLM-based chatbot model trainer with state saving.

__init__.py - Load pre-trained models with NextChat.
__init__.py - Load pre-existing models for further processing or training.
build_nextchat.py - NextChat model builder initialization and customization
builder.py - Load pre-trained NExT-Chat models.

__init__.py - Imports NextChat classes
nextchat_base.py - Initializes AI chat models, handles multimodal inputs, checks shapes. Prints box_iou mean.
nextchat_seg.py - Language Modeling Class with Vision Tower

modeling_sam.py - Image attending Transformer model with MHA, window attention.
sam_loss.py - Custom loss functions for segmentation models. Focal, Dice, and IOU losses combined.
transforms.py - Image resizing and padding transforms

finetune.py - Trains ML model, logs metrics, saves models, handles errors, multi-predict.
finetune_mem.py - Python script finetunes LLaMA with FlashAttn for memory efficiency.

__init__.py - Imports utilities for image and text tasks.
box_ops.py - Box operation utilities for bounding box manipulation and more.
common.py - Image processing and text manipulation utilities.

README.md - NextChat: Chat LMM with image-based text generation, improved model
requirements.txt - Essential Python libraries for codebase execution.
run_stage1.sh - Fine-tune multimodal language model in 2 epochs.
run_stage2.sh - Runs Stage 2, finetunes model with 8 processes
run_stage3.sh - Runs 8-process accelerated training for stage 3 model, saving every 5000 steps.