ConvNeXt V2 (base-sized model)
ConvNeXt V2 model pretrained using the FCMAE framework at resolution 224x224, on the ImageNet-1K dataset. It was introduced in the paper ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders by Woo et al. and first released in this repository.
Additionally, we initialize the head and the final LayerNorm of this model. The head uses an initialization scale of 0.001 (its randomly initialized parameters are multiplied by 0.001), as recommended in the original paper.
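As a rough illustration of the head initialization described above, the sketch below builds a stand-alone final LayerNorm and linear head in PyTorch and multiplies the head's randomly initialized parameters by 0.001. The layer sizes (1024 features, 1000 classes), the truncated-normal initializer with a standard deviation of 0.02, and the helper name init_head are assumptions made for illustration; they are not taken from this model card and may differ from the code actually used to create the checkpoint.

```python
import torch
from torch import nn

HEAD_INIT_SCALE = 0.001  # scale quoted in this model card


def init_head(hidden_size: int = 1024, num_labels: int = 1000):
    """Hypothetical helper: build a freshly initialized final LayerNorm and head."""
    norm = nn.LayerNorm(hidden_size)  # PyTorch default init: weight = 1, bias = 0
    head = nn.Linear(hidden_size, num_labels)
    # Assumed random init: truncated-normal weights and zero bias.
    nn.init.trunc_normal_(head.weight, std=0.02)
    nn.init.zeros_(head.bias)
    with torch.no_grad():
        # Multiply the random init by the scale so the initial logits start near zero.
        head.weight.mul_(HEAD_INIT_SCALE)
        head.bias.mul_(HEAD_INIT_SCALE)
    return norm, head


torch.manual_seed(0)
norm, head = init_head()
features = torch.randn(2, 1024)  # stand-in for pooled backbone features
logits = head(norm(features))
print(logits.shape, logits.abs().max().item())
```

With such a small scale, the freshly initialized head produces logits close to zero, so early fine-tuning steps start from a nearly uniform prediction over the classes.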
Disclaimer: The team releasing ConvNeXt V2 did not write a model card for this model, so this model card has been written by the Hugging Face team. This model card is a fork of the one at https://huggingface.co/facebook/convnextv2-base-22k-224
Model description
ConvNeXt V2 is a pure convolutional model (ConvNet) that introduces a fully convolutional masked autoencoder framework (FCMAE) and a new Global Response Normalization (GRN) layer to ConvNeXt. ConvNeXt V2 significantly improves the performance of pure ConvNets on various recognition benchmarks.
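To make the GRN layer more concrete, here is a minimal PyTorch sketch that follows the formulation given in the ConvNeXt V2 paper: aggregate a per-channel L2 norm over the spatial dimensions, divide it by its mean across channels, and use the result to recalibrate the input with a residual connection. The sketch assumes channels-last input of shape (N, H, W, C); the class name, the epsilon value, and the zero initialization of gamma and beta are assumptions of this sketch, and it is not necessarily identical to the implementation in the transformers library.

```python
import torch
from torch import nn


class GRN(nn.Module):
    """Sketch of Global Response Normalization for (N, H, W, C) inputs."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        # Learnable affine parameters, assumed here to start at zero.
        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Global aggregation: L2 norm over the spatial dims, one value per channel.
        gx = torch.linalg.vector_norm(x, ord=2, dim=(1, 2), keepdim=True)
        # Divisive normalization: each channel's norm relative to the channel mean.
        nx = gx / (gx.mean(dim=-1, keepdim=True) + self.eps)
        # Calibrate the input and add the residual connection.
        return self.gamma * (x * nx) + self.beta + x


grn = GRN(dim=8)
x = torch.randn(2, 7, 7, 8)
out = grn(x)
print(out.shape)
```

Because gamma and beta start at zero in this sketch, the layer initially acts as an identity mapping and learns how strongly to recalibrate channels during training.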
Intended uses & limitations
You can use the raw model for image classification. See the model hub for versions fine-tuned on a task that interests you.
How to use
Here is how to use this model to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet classes:
from transformers import AutoImageProcessor, ConvNextV2ForImageClassification
import torch
from datasets import load_dataset

# load a sample image
dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

preprocessor = AutoImageProcessor.from_pretrained("facebook/convnextv2-base-22k-224")
model = ConvNextV2ForImageClassification.from_pretrained("facebook/convnextv2-base-22k-224")

# preprocess the image and return PyTorch tensors
inputs = preprocessor(image, return_tensors="pt")

# run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# model predicts one of the 1000 ImageNet classes
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
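Beyond the single top prediction, it can be useful to look at the k most likely classes together with their probabilities. The sketch below shows one way to do this with softmax and topk. To stay self-contained it uses made-up logits and a small hypothetical id2label mapping in place of model(**inputs).logits and model.config.id2label; the helper name top_k_predictions is likewise an assumption of this sketch.

```python
import torch


def top_k_predictions(logits, id2label, k=5):
    """Hypothetical helper: return (label, probability) pairs for the first image."""
    probs = logits.softmax(dim=-1)  # convert logits to class probabilities
    k = min(k, probs.shape[-1])     # guard against k exceeding the class count
    top_probs, top_ids = probs.topk(k, dim=-1)
    return [
        (id2label[class_id], prob)
        for class_id, prob in zip(top_ids[0].tolist(), top_probs[0].tolist())
    ]


# Made-up values standing in for real model outputs and labels.
example_logits = torch.tensor([[0.5, 3.0, 1.0, -2.0]])
example_id2label = {0: "goldfish", 1: "tabby cat", 2: "tiger cat", 3: "lynx"}

predictions = top_k_predictions(example_logits, example_id2label, k=2)
for label, prob in predictions:
    print(f"{label}: {prob:.3f}")
```

With the real model, the same helper could be called as top_k_predictions(logits, model.config.id2label) right after the inference step.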
For more code examples, see the documentation.
BibTeX entry and citation info
@article{DBLP:journals/corr/abs-2301-00808,
  author     = {Sanghyun Woo and
                Shoubhik Debnath and
                Ronghang Hu and
                Xinlei Chen and
                Zhuang Liu and
                In So Kweon and
                Saining Xie},
  title      = {ConvNeXt {V2:} Co-designing and Scaling ConvNets with Masked Autoencoders},
  journal    = {CoRR},
  volume     = {abs/2301.00808},
  year       = {2023},
  url        = {https://doi.org/10.48550/arXiv.2301.00808},
  doi        = {10.48550/arXiv.2301.00808},
  eprinttype = {arXiv},
  eprint     = {2301.00808},
  timestamp  = {Tue, 10 Jan 2023 15:10:12 +0100},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2301-00808.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}