ConvNeXt V2 (base-sized model)

ConvNeXt V2 model pretrained with the FCMAE framework on the ImageNet-1K dataset at resolution 224x224. It was introduced in the paper "ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders" by Woo et al. and first released in this repository.

Additionally, we initialize the classification head and the final layer norm of this model. The head is initialized with a scale of 0.001 (the random initialization is multiplied by 0.001), as recommended in the original paper.
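
The scaled head initialization described above can be sketched as follows. This is a minimal illustration and not the exact code used for this checkpoint: the helper name `init_scaled_head`, the truncated-normal base initializer with std 0.02, the zero bias, and the 1024-dimensional input are all assumptions made for the example.

```python
import torch
from torch import nn


def init_scaled_head(in_features: int, num_classes: int, scale: float = 0.001) -> nn.Linear:
    """Create a linear classification head whose random init is multiplied by `scale`."""
    head = nn.Linear(in_features, num_classes)
    # Assumed base initializer (hypothetical choice for this sketch).
    nn.init.trunc_normal_(head.weight, std=0.02)
    nn.init.zeros_(head.bias)
    # Shrink the random initialization so the head starts close to zero.
    with torch.no_grad():
        head.weight.mul_(scale)
        head.bias.mul_(scale)
    return head


# Assumed sizes: 1024 input features, 1,000 ImageNet classes.
head = init_scaled_head(in_features=1024, num_classes=1000)
print(head.weight.abs().max())
```

Starting from a near-zero head keeps the initial logits small, so early training steps are driven mostly by the pretrained backbone features.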

Disclaimer: The team releasing ConvNeXt V2 did not write a model card for this model, so this model card has been written by the Hugging Face team. This model card is forked from https://huggingface.co/facebook/convnextv2-base-22k-224.

Model description

ConvNeXt V2 is a pure convolutional model (ConvNet) that introduces a fully convolutional masked autoencoder framework (FCMAE) and a new Global Response Normalization (GRN) layer to ConvNeXt. ConvNeXt V2 significantly improves the performance of pure ConvNets on various recognition benchmarks.
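
As a rough illustration of the GRN layer mentioned above, the sketch below follows the three steps described in the ConvNeXt V2 paper: global feature aggregation, divisive normalization across channels, and feature calibration with a residual connection. It assumes channels-last input of shape (N, H, W, C); the class name `GRN`, the `eps` value, and the tensor sizes are choices made for this example, not a description of the layer inside this checkpoint.

```python
import torch
from torch import nn


class GRN(nn.Module):
    """Global Response Normalization for channels-last input (N, H, W, C)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        # Learnable scale and shift, zero-initialized so the layer starts as identity.
        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Global feature aggregation: L2 norm of each channel over the spatial dims.
        gx = torch.linalg.vector_norm(x, ord=2, dim=(1, 2), keepdim=True)
        # Divisive normalization: each channel's norm relative to the channel mean.
        nx = gx / (gx.mean(dim=-1, keepdim=True) + self.eps)
        # Feature calibration plus residual connection.
        return self.gamma * (x * nx) + self.beta + x


x = torch.randn(2, 7, 7, 8)  # assumed toy input: batch 2, 7x7 spatial, 8 channels
grn = GRN(dim=8)
y = grn(x)
print(y.shape)
```

Because `gamma` and `beta` start at zero, the layer initially returns its input unchanged and only learns to rescale channels during training.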

Intended uses & limitations

You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.

How to use

Here is how to use this model to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet classes:

from transformers import AutoImageProcessor, ConvNextV2ForImageClassification
import torch
from datasets import load_dataset

dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

preprocessor = AutoImageProcessor.from_pretrained("facebook/convnextv2-base-22k-224")
model = ConvNextV2ForImageClassification.from_pretrained("facebook/convnextv2-base-22k-224")

inputs = preprocessor(image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# model predicts one of the 1000 ImageNet classes
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
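
To inspect more than the single best class, the logits can be turned into probabilities and ranked. The following is a minimal sketch: `top_k_labels` is a hypothetical helper, and the stand-in logits and labels exist only so the snippet runs without downloading a model. In practice you would pass `logits` and `model.config.id2label` from the example above.

```python
import torch


def top_k_labels(logits: torch.Tensor, id2label: dict, k: int = 5):
    """Return (label, probability) pairs for the k highest-scoring classes of one image."""
    # Softmax over the class dimension, then take the first (only) image in the batch.
    probs = logits.softmax(dim=-1)[0]
    top = torch.topk(probs, k)
    return [(id2label[i], p) for p, i in zip(top.values.tolist(), top.indices.tolist())]


# Stand-in data (assumed for illustration only).
dummy_logits = torch.tensor([[0.1, 2.0, -1.0, 0.5]])
dummy_id2label = {0: "cat", 1: "dog", 2: "bird", 3: "fish"}

result = top_k_labels(dummy_logits, dummy_id2label, k=2)
for label, prob in result:
    print(f"{label}: {prob:.3f}")
```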

For more code examples, we refer to the documentation.

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2301-00808,
  author    = {Sanghyun Woo and
               Shoubhik Debnath and
               Ronghang Hu and
               Xinlei Chen and
               Zhuang Liu and
               In So Kweon and
               Saining Xie},
  title     = {ConvNeXt {V2:} Co-designing and Scaling ConvNets with Masked Autoencoders},
  journal   = {CoRR},
  volume    = {abs/2301.00808},
  year      = {2023},
  url       = {https://doi.org/10.48550/arXiv.2301.00808},
  doi       = {10.48550/arXiv.2301.00808},
  eprinttype = {arXiv},
  eprint    = {2301.00808},
  timestamp = {Tue, 10 Jan 2023 15:10:12 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2301-00808.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Safetensors: model size 88.7M params, tensor type F32