[email protected]
		
	commited on
		
		
					Commit 
							
							·
						
						07a7f16
	
1
								Parent(s):
							
							5a57d92
								
Add training code link
Browse files
    	
README.md CHANGED
@@ -34,7 +34,7 @@ In this repo, we are open-sourcing NVLM-1.0-D-72B (decoder-only architecture), t
 
 
 ## Reference(s)
-[Paper](https://arxiv.org/abs/2409.11402)   [Inference Code (HF)](https://huggingface.co/nvidia/NVLM-D-72B/tree/main)   [Training Code
+[Paper](https://arxiv.org/abs/2409.11402)   [Inference Code (HF)](https://huggingface.co/nvidia/NVLM-D-72B/tree/main)   [Training Code](https://github.com/NVIDIA/Megatron-LM/tree/NVLM-1.0/examples/multimodal/nvlm)   [Website](https://research.nvidia.com/labs/adlr/NVLM-1/)
 
 ## Benchmark Results
 We train our model with legacy [Megatron-LM](https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/legacy) and adapt the codebase to Huggingface for model hosting, reproducibility, and inference.
@@ -103,7 +103,7 @@ Results (as of September 17th, 2024) in the multimodal benchmarks are as follows
 When converting Megatron checkpoint to Huggingface, we adapt [InternVL codebase](https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B) to support model loading and multi-GPU inference in HF.
 We also use the tokenizer from [Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/tree/main) when adapting the tokenizer to Huggingface, as it contains extra special tokens for vision tasks, e.g., `<|vision_pad|>`.
 We train NVLM-1.0-D-72B based on the [Qwen2-72B-Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct/tree/main) text-only model and [InternViT-6B-448px-V1-5](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5) ViT model with our large-scale high-quality multimodal dataset.
-For training code, please refer to [Megatron-
+For training code, please refer to [Megatron-Core](https://github.com/NVIDIA/Megatron-LM/tree/NVLM-1.0/examples/multimodal/nvlm).
 
 
 ### Prepare the environment
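The README lines in this diff mention adapting the InternVL codebase for model loading and multi-GPU inference in HF, and reusing the Qwen2.5-72B-Instruct tokenizer for its vision special tokens. A minimal sketch of what such loading might look like with the standard Transformers API is below; it is not part of the commit, and `device_map="auto"` sharding is an assumption (the model repo may ship its own GPU-splitting logic).

```python
# Hypothetical sketch, not taken from the commit: load NVLM-D-72B via
# Hugging Face Transformers for multi-GPU inference.
import torch
from transformers import AutoModel, AutoTokenizer

path = "nvidia/NVLM-D-72B"

# Tokenizer adapted from Qwen2.5-72B-Instruct; it carries extra vision
# special tokens such as <|vision_pad|>.
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
print("<|vision_pad|>" in tokenizer.get_vocab())

# Custom NVLM-D modeling code lives in the repo, hence trust_remote_code=True.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map="auto",  # assumption: shard the 72B weights across GPUs (requires accelerate)
).eval()
```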