updating issue

#3
by minaiosu - opened

Hi John, with some profiles containing over 500 LoRAs/models, I find it challenging to download or update them because the one-hour limit is triggered too quickly. Typically, it downloads around 150–200 files, and then after one hour, I try again. However, it appears to download and overwrite everything, so after reaching 250–300 files, the one-hour limit is triggered again. This means the download rate for updating a profile is only about 50 files per hour.

Is there a way to check the already downloaded files and ignore them so that only missing files are downloaded instead of redownloading everything? I have a profile with over 1,000 LoRAs, and I've been updating it since it only had 100 LoRAs using the monthly tab. However, I find it challenging to download or update profiles with such a high volume of content.

I'll give it a try. Since the Hugging Face API is well developed, it's easy to check a file's hash in advance; the issue was obtaining the SHA256 hash from Civitai. I managed to do that part today. I plan to work on the rest tomorrow.

import re
import urllib.parse

import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

def get_user_agent():
    return 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0'

def get_civitai_sha256(dl_url: str, api_key=""):
    # True when the query string pins key k to a value that the file entry contradicts.
    def is_invalid_file(qs: dict, file_info: dict, k: str):
        return k in qs and file_info.get(k) is not None and qs[k][0] != file_info.get(k)

    if "https://civitai.com/api/download/models/" not in dl_url:
        return None
    headers = {'User-Agent': get_user_agent(), 'content-type': 'application/json'}
    if api_key:
        headers['Authorization'] = f'Bearer {api_key}'
    session = requests.Session()
    retries = Retry(total=5, backoff_factor=1, status_forcelist=[500, 502, 503, 504])
    session.mount("https://", HTTPAdapter(max_retries=retries))
    # Split the download URL into a model-version ID and an optional query string.
    m = re.match(r'https://civitai.com/api/download/models/(\d+)\??(.+)?', dl_url)
    if m is None:
        return None
    url = 'https://civitai.com/api/v1/model-versions/' + m.group(1)
    qs = urllib.parse.parse_qs(m.group(2) or "")
    # Without explicit options, assume the default file is the Model itself.
    if "type" not in qs:
        qs["type"] = ["Model"]
    try:
        r = session.get(url, headers=headers, timeout=(5.0, 15))
        if not r.ok:
            return None
        data = r.json()
        if not isinstance(data.get("files"), list):
            return None
        sha256 = None
        # Keep the hash of any file that passes the query-string filters.
        for d in data["files"]:
            if any(is_invalid_file(qs, d, k) for k in ("type", "format", "size", "fp")):
                continue
            sha256 = d["hashes"]["SHA256"].lower()
        return sha256
    except Exception as e:
        print(e)
        return None

print(get_civitai_sha256("https://civitai.com/api/download/models/1335639"))
print(get_civitai_sha256("https://civitai.com/api/download/models/1335639?type=Model&format=SafeTensor"))
print(get_civitai_sha256("https://civitai.com/api/download/models/1335639?type=Training%20Data"))

I found a troublesome bug related to Transformers and Spaces, so I'll investigate that first.

@minaiosu It's not very strict, but I've implemented it: if a file with the same hash as the one being downloaded already exists in the repository you are uploading to, the download is skipped.
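In other words, the rule amounts to something like this rough sketch (should_skip is a hypothetical name; get_civitai_sha256 is the function from the code above):

def should_skip(dl_url: str, repo_hashes: set, api_key: str = "") -> bool:
    # Skip only when the Civitai hash is known and already present in the
    # upload repo; an unknown hash means download anyway, to be safe.
    sha256 = get_civitai_sha256(dl_url, api_key)
    return sha256 is not None and sha256 in repo_hashes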

It worked like a charm, thank you very much! I'm trying to rescue as many profiles as possible after the recent Civitai Buzz monetization changes. Some creators are just deleting content, while others are archiving it, like this user bluvoll. Is there a way to retrieve archived files without triggering security alerts? Or is it perhaps possible with a Civitai membership key?

Civitai Buzz monetization changes. Some creators are just deleting content, while others are archiving it, like this user bluvoll.

Oh...

Is there a way to retrieve archived files without triggering security alerts?

idk. We can't download without an API key in the first place, but I don't think there are any download options other than adding an API key...
Do you have any ideas?🤔
https://github.com/civitai/civitai/wiki/REST-API-Reference

"The REST API is designed so that when a model is marked as “Archived,” its files field is empty, meaning no downloadable file is provided. While on the website you might be able to “save to vault” and then download the file through the vault interface, that behavior isn’t exposed via the public API, reverse‐engineering the vault download endpoint—would be an unofficial workaround and may violate Civitai’s terms of service. Essentially, if a creator has archived a model, the intended mechanism is to prevent direct downloads via the API, there isn’t an API‐supported way to bypass this. "

Hmmm...
I can help with manual downloads and uploads, but I'm a bit busy with my everyday life.
I'll help out as much as I can, so if you need to manually download a model from a specific author, let me know.
I've never tried downloading an archive...🤔

Edit:
This looks like a pretty big disaster.😭
https://civitai.com/user/bluvoll/posts

Hello John, I think Civitai either updated something or there's an error. It's no longer grabbing the files properly, or if it does grab them, the download goes straight to "skipped" and doesn't actually download them, treating them as if they were already downloaded.

It seems to be working fine on my test Spaces with the same code. For now, I've rebooted.🤔
However, tomorrow is apparently the day of Civitai's system change, so there may be some changes.

Edit:
I added some error handling to the skip-related part. The behavior should be the same with or without it, but just in case.

This is weird; it's happening again. I'm receiving 12 links but downloading only 8. Yesterday the issue was resolved after a reboot. Is there a glitch causing some kind of bottleneck, or perhaps an issue with the skip code?

The current skip code is simple (it's difficult to fetch Civitai's exact file name before downloading...): if even one file with the same hash already exists in the repository being uploaded to, the file is skipped before downloading.
Beyond that, nothing else is checked. But it's hard to imagine SHA256 hashes colliding that often...
It's a puzzling case that it only works halfway.

Edit:
One possibility is that I made a mistake in the file that is assumed to be the default when no options are given (when the URL ends with just a number). Currently it is set to download the Model, but is that specification enough...? It may actually be necessary to specify fp16 and so on...
There may be hidden conditions.

Edit:
A failure so bad it's almost funny. But I found a critical bug: I forgot to break out of the loop. When multiple files were included, there were cases where the wrong hash was returned.🥶 Fixed.
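For reference, the fix amounts to breaking out of the file loop in get_civitai_sha256 as soon as a match is found, so a later file in the list can no longer overwrite the matched hash:

# Inside get_civitai_sha256:
for d in data["files"]:
    if any(is_invalid_file(qs, d, k) for k in ("type", "format", "size", "fp")):
        continue
    sha256 = d["hashes"]["SHA256"].lower()
    break  # the missing escape: without it, the last file in the list wins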

NoobAI is now an option, and probably VPRED soon too; there are no options for them yet. Some are already tagging models as noobai after the Illustrious issues: https://civitai.com/models/908341/noobreal-v30

Thank you. I didn't know that VPRED was going to be added. I've just added it.😀

Got this error: "Could not parse server response: SyntaxError: Unexpected token '<', ..." Probably a syntax error somewhere?

I think that's a response from the Civitai server (an HTML page coming back where JSON is expected). Or rather, if there were a syntax error on this side, the Python interpreter would crash as soon as the code was referenced...
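One way to keep an HTML error page from being parsed as data is to check the response before decoding it (a sketch, not the Space's actual code):

def safe_json(r):
    # "Unexpected token '<'" is the first character of an HTML page being fed
    # to a JSON parser; refuse anything not declared as JSON.
    if "application/json" not in r.headers.get("content-type", ""):
        return None
    try:
        return r.json()
    except ValueError:
        return None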

So weird. Sometimes the error appears but can be fixed by searching and adding the files again. Also, for some reason it doesn't grab workflow download links; for example, on https://civitai.com/user/nouvo_ai/models it only picks up 5 links.

Nice! I added type=Archive.

tab "Other' also broken, to picking or getting links https://civitai.com/user/rockrockrock/models

Hmm... Maybe fixed.
