- amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix
- amd/Phi-3.5-mini-instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix
- amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix
- amd/Qwen1.5-7B-Chat-awq-g128-int4-asym-bf16-onnx-ryzen-strix
AMD
together we advance_AI
AI is increasingly pervasive across the modern world. It’s driving our smart technology in retail, cities, factories and healthcare, and transforming our digital homes. AMD offers advanced AI acceleration from data center to edge, enabling high performance and high efficiency to make the world smarter.
Getting Started with Hugging Face Transformers
This guide shows how to run the most common Hugging Face transformer models for inference on select AMD Instinct™ accelerators and AMD Radeon™ GPUs using the AMD ROCm™ software. The same setup can serve as a starting point for fine-tuning a base model or developing your own model. General Linux and ML experience is a prerequisite.
1. Confirm you have a supported AMD hardware platform
Is my hardware supported with ROCm on Linux?
2. Install ROCm driver, libraries and tools
Follow the detailed installation instructions for your Linux-based platform.
3. Install Machine Learning Frameworks
Pip installation is an easy way to acquire all the required packages and is described in more detail below.
If you prefer to use a container strategy, check out the pre-built images at ROCm Docker Hub and AMD Infinity Hub after installing the required dependencies.
PyTorch
AMD ROCm is fully integrated into the mainline PyTorch ecosystem. Pip wheels are built and tested as part of the stable and nightly releases. Go to pytorch.org and use the 'Install PyTorch' widget. Select 'Stable + Linux + Pip + Python + ROCm' to get the specific pip installation command.
An example command line (note the versioning of the whl file):
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
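After installing, it can help to confirm that the wheel is a ROCm build and that a GPU is visible. The snippet below is a minimal sketch, not an official AMD check; the helper name `rocm_report` is hypothetical. It relies on two PyTorch behaviors: ROCm builds expose AMD GPUs through the `torch.cuda` API, and `torch.version.hip` is set only on ROCm builds.

```python
def rocm_report():
    """Summarize the local PyTorch install (hypothetical helper for illustration)."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None on CPU-only or CUDA builds, a version string on ROCm builds.
        "hip_version": getattr(torch.version, "hip", None),
        # ROCm builds report AMD GPUs through the torch.cuda namespace.
        "gpu_available": torch.cuda.is_available(),
    }
    if info["gpu_available"]:
        info["gpu_name"] = torch.cuda.get_device_name(0)
    return info


print(rocm_report())
```

If `hip_version` is `None`, the installed wheel is most likely not the ROCm build, and the install command should be re-checked against the widget on pytorch.org.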
TensorFlow
AMD ROCm support is upstreamed into the TensorFlow GitHub repository. Pre-built wheels are hosted on pypi.org.
The latest version can be installed with this command:
pip install tensorflow-rocm
4. Use a Hugging Face Model
Now that you have the base requirements installed, install the Hugging Face Transformers library to get access to the latest transformer models.
pip install transformers
This allows you to easily import any of the base models into your Python application. Here is an example using GPT2 in PyTorch:
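# Load the pretrained GPT-2 tokenizer and base model from the Hugging Face Hub
from transformers import GPT2Tokenizer, GPT2Model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')
# Tokenize the input text into PyTorch tensors
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
# Run a forward pass; output.last_hidden_state holds the per-token hidden states
output = model(**encoded_input)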
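The example above runs on the CPU unless the model and inputs are moved to a device. The following is a minimal sketch of GPU placement, assuming a ROCm build of PyTorch, where AMD GPUs are addressed with the usual `'cuda'` device string. The helper names `pick_device` and `embed_on_device` are hypothetical and not part of Transformers.

```python
def pick_device():
    """Return 'cuda' when PyTorch sees a GPU (ROCm builds reuse this name), else 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def embed_on_device(text, model_name="gpt2"):
    """Run GPT-2 on the selected device and return the last hidden states."""
    # Imported here so pick_device() stays usable without transformers installed.
    import torch
    from transformers import GPT2Tokenizer, GPT2Model

    device = pick_device()
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    # Move the weights to the device and switch to inference mode.
    model = GPT2Model.from_pretrained(model_name).to(device).eval()
    # The tokenizer output must live on the same device as the model.
    encoded_input = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():  # no gradients are needed for inference
        output = model(**encoded_input)
    return output.last_hidden_state
```

Calling `embed_on_device("Replace me by any text you'd like.")` downloads the GPT-2 weights on first use and returns a tensor of shape (batch, tokens, hidden size).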
All of the 200+ standard transformer models are regularly tested with our supported hardware platforms, so derivatives of those core models should also function correctly. Let us know if you run into issues at our ROCm Community page.
Here are a few of the more popular ones to get you started:
Click on the 'Use in Transformers' button to see the exact code to import a specific model into your Python application.
5. Optimum Support
For a deeper dive into using Hugging Face libraries on AMD GPUs, see the Optimum page, which covers getting started with Hugging Face models as well as Flash Attention 2, GPTQ quantization, and ONNX Runtime integration.
Serving a model with TGI
Text Generation Inference (TGI) provides an end-to-end solution to deploy large language models for inference at scale.
TGI is already usable in production on AMD Instinct™ GPUs through the Docker image ghcr.io/huggingface/text-generation-inference:latest-rocm. Make sure to refer to the documentation concerning the support and any limitations.
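Once a TGI container is running, it can be queried over HTTP. The sketch below uses only the Python standard library and posts to TGI's `/generate` route. The base URL `http://localhost:8080` is an assumption that depends on how the container's port was published, and the helper names `build_generate_request` and `generate` are hypothetical.

```python
import json
import urllib.request


def build_generate_request(base_url, prompt, max_new_tokens=50):
    """Build a POST request for the TGI /generate route."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, prompt, max_new_tokens=50, timeout=60):
    """Send the prompt to a running TGI server and return the generated text."""
    request = build_generate_request(base_url, prompt, max_new_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["generated_text"]
```

With a server listening on the assumed address, `generate("http://localhost:8080", "What is ROCm?")` returns the completion as a string. Keeping request construction separate from the network call makes the payload easy to inspect without a running server.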
Benchmarking
Optimum-Benchmark is a utility for benchmarking the performance of transformers on AMD GPUs, in both single-device and distributed settings, with various supported optimizations and quantization schemes.
Useful Links and Blogs
- Detailed Llama-3 results: Run TGI on AMD Instinct MI300X
- Detailed Llama-2 results showcasing the Optimum benchmark on AMD Instinct MI250
- Check out our blog titled Run a ChatGPT-like Chatbot on a Single GPU with ROCm
- Complete ROCm Documentation for installation and usage
- Find extended training content and connect with the development community at the Developer Hub
Collections
- amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid
- amd/Phi-3.5-mini-instruct-awq-g128-int4-asym-fp16-onnx-hybrid
- amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-fp16-onnx-hybrid
- amd/Qwen1.5-7B-Chat-awq-g128-int4-asym-fp16-onnx-hybrid