Tags: Text Generation · Transformers · Safetensors · PyTorch · nvidia · conversational
suhara committed · verified
Commit e5610bb · 1 Parent(s): dc376c2

Update README.md

Files changed (1): README.md (+5 −1)
README.md CHANGED
@@ -129,9 +129,10 @@ Our models are designed and optimized to run on NVIDIA GPU-accelerated systems.
 ## Software Integration
 
 - Runtime Engine(s): NeMo 25.07.nemotron-nano-v2
-- Supported Hardware Microarchitecture Compatibility: NVIDIA A10G, NVIDIA H100-80GB, NVIDIA A100
+- Supported Hardware Microarchitecture Compatibility: NVIDIA A10G, NVIDIA H100-80GB, NVIDIA A100, Jetson AGX Thor
 - Operating System(s): Linux
 
+
 ### **Use it with Transformers**
 
 The snippet below shows how to use this model with Huggingface Transformers (tested on version 4.48.3).
@@ -276,6 +277,9 @@ docker run --runtime nvidia --gpus all \
   --mamba_ssm_cache_dtype float32
 ```
 
+For Jetson AGX Thor, please use [this vLLM container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?version=25.09-py3).
+
+
 #### Using Budget Control with a vLLM Server
 
 The thinking budget allows developers to keep accuracy high and meet response‑time targets \- which is especially crucial for customer support, autonomous agent steps, and edge devices where every millisecond counts.
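As a rough illustration of the budget-control idea the diff refers to (not the README's actual client), the sketch below builds an OpenAI-style chat-completion payload for a vLLM server and bounds generation length as a crude thinking budget. The served-model name, the endpoint path, and the use of `max_tokens` as the budget mechanism are all assumptions.

```python
# Hedged sketch: cap how many tokens the model may spend, treating the
# generation limit as a simple thinking budget. Model name and budget
# mechanism are assumptions, not the README's documented client.
import json


def build_budgeted_request(prompt: str, thinking_budget: int) -> dict:
    """Build a chat-completion payload whose generation cap acts as a
    crude thinking budget (assumed mechanism)."""
    return {
        "model": "nemotron-nano-v2",    # assumed served-model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": thinking_budget,  # hard cap on reasoning + answer tokens
    }


payload = build_budgeted_request("Explain KV caching in one paragraph.", 512)
print(json.dumps(payload, indent=2))

# Sending it to a locally running vLLM server would use the standard
# OpenAI-compatible route, e.g. POST http://localhost:8000/v1/chat/completions
# with this payload as the JSON body.
```

Keeping the budget in one builder function makes it easy to tune per call site (e.g. a small budget for latency-critical agent steps, a larger one for offline summarization).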