Video Classification

This repository stores all model weights fine-tuned on our MicroG-4M dataset.

Please see our GitHub repository and dataset first:


Performance comparison of models fine-tuned on MicroG-4M for HAR

| Arch | TC | Backbone | #Params (M) | mAP (%) | F1-score (%) | Recall (%) | AUROC (%) |
|---|---|---|---|---|---|---|---|
| C2D | 8×8 | R50 | 23.61 | 29.51 | 8.09 | 6.58 | 83.49 |
| C2D NLN | 8×8 | R50 | 30.97 | 44.64 | 28.30 | 24.86 | 89.40 |
| I3D | 8×8 | R50 | 27.33 | 46.41 | 26.37 | 22.25 | 88.79 |
| I3D NLN | 8×8 | R50 | 34.68 | 47.12 | 28.07 | 24.65 | 88.52 |
| Slow | 8×8 | R50 | 31.74 | 45.19 | 26.13 | 22.77 | 88.49 |
| Slow | 4×16 | R50 | 31.74 | 46.37 | 28.72 | 25.38 | 88.30 |
| SlowFast | 8×8 | R50 | 33.76 | 43.02 | 22.63 | 18.98 | 88.51 |
| SlowFast | 4×16 | R50 | 33.76 | 42.10 | 23.69 | 20.18 | 87.54 |
| MViTv1 | 16×4 | B-CONV | 36.34 | 12.86 | 5.54 | 4.66 | 74.63 |
| MViTv2 | 16×4 | S | 34.27 | 15.14 | 8.16 | 7.17 | 78.61 |
| X3D | 13×6 | S | 2.02 | 14.07 | 5.77 | 4.52 | 78.23 |
| X3D | 16×5 | L | 4.37 | 18.70 | 9.15 | 7.47 | 78.27 |
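This card does not state how the metric columns are computed. For orientation, the sketch below computes them as they are commonly defined for multi-label video classification: per-class average precision, F1, recall and AUROC, each macro-averaged over classes and reported in percent. The macro averaging, the 0.5 decision threshold for F1 and recall, skipping classes with no positive clip, and the helper names (`average_precision`, `auroc`, `macro_metrics`) are all assumptions for illustration, not the evaluation code behind the table.

```python
from typing import Dict, List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one class: mean precision at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_metrics(scores: List[List[float]], labels: List[List[int]],
                  threshold: float = 0.5) -> Dict[str, float]:
    """Hypothetical helper: macro-averaged mAP, F1, recall and AUROC in percent.

    scores[i][c] is the predicted probability of class c for clip i;
    labels[i][c] is 1 if clip i carries class c, else 0.
    """
    aps, f1s, recalls, aucs = [], [], [], []
    for c in range(len(scores[0])):
        s = [row[c] for row in scores]
        y = [row[c] for row in labels]
        if 1 not in y:
            continue  # assumption: classes without any positive clip are skipped
        pred = [1 if v >= threshold else 0 for v in s]
        tp = sum(1 for p, t in zip(pred, y) if p == 1 and t == 1)
        fp = sum(1 for p, t in zip(pred, y) if p == 1 and t == 0)
        fn = sum(1 for p, t in zip(pred, y) if p == 0 and t == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn)  # tp + fn > 0 because the class has a positive
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        aps.append(average_precision(s, y))
        f1s.append(f1)
        recalls.append(recall)
        if 0 in y:  # AUROC needs at least one negative clip
            aucs.append(auroc(s, y))

    def mean_pct(xs: List[float]) -> float:
        return 100.0 * sum(xs) / len(xs) if xs else float("nan")

    return {"mAP": mean_pct(aps), "F1": mean_pct(f1s),
            "Recall": mean_pct(recalls), "AUROC": mean_pct(aucs)}


if __name__ == "__main__":
    # Toy example: 3 clips, 2 classes.
    scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
    labels = [[1, 0], [0, 1], [1, 1]]
    print(macro_metrics(scores, labels))
```

If scikit-learn is available, `sklearn.metrics.average_precision_score` and `sklearn.metrics.roc_auc_score` with `average="macro"` compute the AP and AUROC parts on the same label/score matrices.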

Note:

  • All models were pretrained on the Kinetics-400 dataset and then fine-tuned on MicroG-4M.
  • TC denotes the temporal configuration (number of input frames × sampling rate).
  • #Params indicates the number of parameters (in millions, M).
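Under the usual reading of this notation (as in SlowFast-style codebases), a T×τ configuration feeds T frames to the model, taking every τ-th raw frame, so one clip covers a window of T × τ raw frames. The sketch below illustrates that arithmetic; the helper names and the choice to clamp indices that run past the end of a short video are assumptions, since this card does not describe the exact frame sampling used for these checkpoints.

```python
from typing import List, Tuple


def parse_tc(tc: str) -> Tuple[int, int]:
    """Split a TC string such as "8×8" or "16x4" into (frames, sampling rate)."""
    frames, rate = tc.replace("x", "×").split("×")
    return int(frames), int(rate)


def temporal_span(tc: str) -> int:
    """Number of raw video frames covered by one clip: frames × sampling rate."""
    frames, rate = parse_tc(tc)
    return frames * rate


def clip_frame_indices(num_frames: int, sampling_rate: int,
                       start: int, video_length: int) -> List[int]:
    """Indices of the raw frames fed to the model for one clip (hypothetical helper).

    Takes every `sampling_rate`-th frame starting at `start`; indices past the
    end of the video are clamped to the last frame (an assumption: real
    loaders differ in how they handle short videos).
    """
    return [min(start + i * sampling_rate, video_length - 1) for i in range(num_frames)]


if __name__ == "__main__":
    for tc in ["8×8", "4×16", "16×4", "13×6", "16×5"]:
        print(tc, "->", temporal_span(tc), "raw frames")
    print(clip_frame_indices(8, 8, start=0, video_length=300))
```

Read this way, the 8×8 and 4×16 variants of Slow and SlowFast cover the same 64-frame window but sample it at different temporal densities, while the X3D configurations (13×6 and 16×5) cover 78 and 80 raw frames.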
Datasets used to train LEI-QI-233/MicroG-4M-models