mlx-DeepDanbooru
Pure MLX implementation of the DeepDanbooru neural network for Apple Silicon chips (M1, M2, M3, M4).
mlx-DeepDanbooru runs on MacBook Pro / Air, Mac mini, and iMac.
Usage
Image-to-text tagging (captioning / CLIP-style interrogation) with the DeepDanbooru model on Apple devices.
MLX DeepDanBooru Model
This mlx-DeepDanbooru model implementation is inspired by the PyTorch implementation AUTOMATIC1111/TorchDeepDanbooru.
Installation
conda create -n mlx026 python=3.12
conda activate mlx026
# install the Python dependencies
pip install numpy
pip install pillow
MLX is available on PyPI. To install the Python API, run:
pip install mlx
mlx-DeepDanbooru is based on mlx version 0.26.1.
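To match that baseline exactly, you can pin the version at install time and check what ended up in the environment; nearby 0.26.x releases will likely work too, but 0.26.1 is the stated baseline:
pip install mlx==0.26.1
pip show mlx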
Inference
python infer.py
Image interrogation:
import time

import numpy as np
from PIL import Image

# using Apple Silicon's MLX, not PyTorch
import mlx.core as mx
from mlxDeepDanBooru.mlx_deep_danbooru_model import mlxDeepDanBooruModel

model_path = "models/model-resnet_custom_v3_mlx.npz"
tags_path = 'models/tags-resnet_custom_v3_mlx.npy'

# load the converted weights and materialize the lazy parameters
mlx_dan = mlxDeepDanBooruModel()
mlx_dan.load_weights(model_path)
mx.eval(mlx_dan.parameters())

model_tags = np.load(tags_path)
print(f'total tags: {len(model_tags)}')

def danbooru_tags(fpath):
    tags = []
    # DeepDanbooru expects a 512x512 RGB image scaled to [0, 1]
    pic = Image.open(fpath).convert("RGB").resize((512, 512))
    a = np.expand_dims(np.array(pic, dtype=np.float32), 0) / 255
    x = mx.array(a)
    y = mlx_dan(x)[0]
    for i, p in enumerate(y):
        if p >= 0.5:
            # the 0.5 threshold can be changed as needed: 0.0 ~ 1.0
            # print(model_tags[i].item(), p)
            tags.append(model_tags[i].item())
    return tags

image_count = 0

def image_infer(fpath):
    global image_count
    tags = danbooru_tags(fpath)
    image_count += 1
    return tags

t1 = time.time()
tags_1 = image_infer("example/1.png")
tags_2 = image_infer("example/2.png")
t2 = time.time()

print(tags_1)
# will show tags: ['1girl', 'beach', 'black_hair', 'blurry', 'blurry_background', 'blurry_foreground', 'building', 'bush', 'christmas_tree', 'day', 'depth_of_field', 'field', 'grass', 'lake', 'looking_at_viewer', 'mountain', 'nature', 'outdoors', 'palm_leaf', 'palm_tree', 'park', 'park_bench', 'path', 'photo_background', 'plant', 'river', 'road', 'skirt', 'sky', 'smile', 'striped', 'striped_dress', 'striped_shirt', 'tree', 'vertical-striped_shirt', 'vertical_stripes', 'rating:safe']
print(tags_2)
# will show tags: ['1girl', '3d', 'blurry', 'blurry_background', 'blurry_foreground', 'brown_eyes', 'brown_hair', 'bush', 'christmas_tree', 'cosplay_photo', 'day', 'depth_of_field', 'field', 'floral_print', 'foliage', 'forest', 'garden', 'grass', 'jungle', 'lips', 'long_hair', 'long_sleeves', 'looking_at_viewer', 'nature', 'on_grass', 'outdoors', 'palm_tree', 'park', 'path', 'plant', 'potted_plant', 'realistic', 'smile', 'solo', 'tree', 'upper_body', 'white_dress', 'rating:safe']
print("-----------")
print(f'infer speed (with mlx): {(t2 - t1) / image_count} seconds per image')
Performance
On the 1024x1024 images in the example folder, on a Mac mini M4:
MLX DeepDanbooru inference speed: 1.7 seconds per image
MPS + PyTorch inference speed: 0.8 seconds per image
CPU + PyTorch inference speed: 2.5 seconds per image
Currently MPS + PyTorch is faster than MLX.

Bench: 351 images at 720x1280 and 540x720:
Windows 11, Nvidia RTX 4070 Ti, CUDA + PyTorch: 0.3 seconds per image, power consumption 260 ~ 300 W
Mac mini M4, mlx-DeepDanbooru: 1.68 seconds per image, power consumption 8 ~ 12 W
Mac mini M4, mlx-DeepDanbooru with multiprocessing (run infer_multiprocessing.py): 0.42 seconds per image
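The multiprocessing run amortizes per-image overhead by tagging images in several worker processes at once. Below is a minimal sketch of that approach, assuming the same model and tag files as above; the pool size and helper names are illustrative, and the actual infer_multiprocessing.py in this repo may be organized differently. Each worker loads its own copy of the model once and then tags its share of the images.

import glob
import multiprocessing as mp

import numpy as np
from PIL import Image
import mlx.core as mx
from mlxDeepDanBooru.mlx_deep_danbooru_model import mlxDeepDanBooruModel

_model = None
_tags = None

def _init_worker(model_path, tags_path):
    # runs once in each worker process: load a private copy of the model
    global _model, _tags
    _model = mlxDeepDanBooruModel()
    _model.load_weights(model_path)
    mx.eval(_model.parameters())
    _tags = np.load(tags_path)

def _tag_image(fpath, threshold=0.5):
    # same preprocessing and thresholding as in the single-process example
    pic = Image.open(fpath).convert("RGB").resize((512, 512))
    x = mx.array(np.expand_dims(np.array(pic, dtype=np.float32), 0) / 255)
    y = _model(x)[0]
    return [_tags[i].item() for i, p in enumerate(y) if p >= threshold]

if __name__ == "__main__":
    files = sorted(glob.glob("example/*.png"))
    with mp.Pool(
        processes=4,  # illustrative; tune for your machine
        initializer=_init_worker,
        initargs=("models/model-resnet_custom_v3_mlx.npz",
                  "models/tags-resnet_custom_v3_mlx.npy"),
    ) as pool:
        for fpath, tags in zip(files, pool.map(_tag_image, files)):
            print(fpath, tags)

Loading the weights in the Pool initializer means each worker pays the model-loading cost only once, and the workers then split the image list between them.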