updating issue
Hi John, with some profiles containing over 500 LoRAs/models, I find it challenging to download or update them because the one-hour limit is triggered too quickly. Typically, it downloads around 150–200 files, and then after one hour, I try again. However, it appears to download and overwrite everything, so after reaching 250–300 files, the one-hour limit is triggered again. This means the download rate for updating a profile is only about 50 files per hour.
Is there a way to check the already downloaded files and ignore them so that only missing files are downloaded instead of redownloading everything? I have a profile with over 1,000 LoRAs, and I've been updating it since it only had 100 LoRAs using the monthly tab. However, I find it challenging to download or update profiles with such a high volume of content.
I'll give it a try. Since the Hugging Face API is well developed, it's easy to check the hash in advance; the issue was obtaining the SHA256 hash from Civitai. I managed to do that part today, and I plan to work on the rest tomorrow.
import re
import urllib.parse

import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry


def get_user_agent():
    return 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0'


def get_civitai_sha256(dl_url: str, api_key=""):
    # A file entry is rejected when the URL's query string asks for a specific
    # type/format/size/fp and the entry reports a different, non-empty value.
    def is_invalid_file(qs: dict, d: dict, k: str):
        return k in qs.keys() and qs[k][0] != d.get(k, None) and d.get(k, None) is not None

    if "https://civitai.com/api/download/models/" not in dl_url: return None
    headers = {'User-Agent': get_user_agent(), 'content-type': 'application/json'}
    if api_key: headers['Authorization'] = f'Bearer {api_key}'
    base_url = 'https://civitai.com/api/v1/model-versions/'
    session = requests.Session()
    retries = Retry(total=5, backoff_factor=1, status_forcelist=[500, 502, 503, 504])
    session.mount("https://", HTTPAdapter(max_retries=retries))
    # Pull the model-version ID and the optional query string out of the download URL.
    m = re.match(r'https://civitai.com/api/download/models/(\d+)\??(.+)?', dl_url)
    if m is None: return None
    url = base_url + m.group(1)
    qs = urllib.parse.parse_qs(m.group(2) or "")
    if "type" not in qs.keys(): qs["type"] = ["Model"]  # a bare URL defaults to the Model file
    try:
        r = session.get(url, headers=headers, timeout=(5.0, 15))
        if not r.ok: return None
        data = dict(r.json())
        if "files" not in data.keys() or not isinstance(data["files"], list): return None
        for d in data["files"]:
            if is_invalid_file(qs, d, "type") or is_invalid_file(qs, d, "format") or is_invalid_file(qs, d, "size") or is_invalid_file(qs, d, "fp"): continue
            return d["hashes"]["SHA256"].lower()  # first matching file wins
        return None
    except Exception as e:
        print(e)
        return None


print(get_civitai_sha256("https://civitai.com/api/download/models/1335639"))
print(get_civitai_sha256("https://civitai.com/api/download/models/1335639?type=Model&format=SafeTensor"))
print(get_civitai_sha256("https://civitai.com/api/download/models/1335639?type=Training%20Data"))
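As for the Hugging Face side being easy: a minimal sketch of collecting the existing hashes from a repo with huggingface_hub, assuming the siblings[].lfs.sha256 layout of the current client (treat the attribute names as an assumption, not the Space's actual code):

from huggingface_hub import HfApi

def get_repo_sha256_set(repo_id: str, token=None) -> set:
    # files_metadata=True asks the Hub to include per-file metadata;
    # LFS-tracked files then expose their SHA256 via sibling.lfs.
    info = HfApi(token=token).model_info(repo_id, files_metadata=True)
    return {s.lfs.sha256.lower() for s in info.siblings if s.lfs is not None}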
I found a troublesome bug related to Transformers and Spaces, so I'll investigate that first.
It worked like a charm, thank you very much! I'm trying to rescue as many profiles as possible after the recent Civitai Buzz monetization changes. Some creators are just deleting content, while others are archiving it, like this user bluvoll. Is there a way to retrieve archived files without triggering security alerts? Or is it perhaps possible with a Civitai membership key?
Civitai Buzz monetization changes. Some creators are just deleting content, while others are archiving it, like this user bluvoll.
Oh...
Is there a way to retrieve archived files without triggering security alerts?
idk. We can't download without an API key in the first place, but I don't think there are any download options other than adding an API key...
Do you have any ideas?🤔
https://github.com/civitai/civitai/wiki/REST-API-Reference
"The REST API is designed so that when a model is marked as “Archived,” its files field is empty, meaning no downloadable file is provided. While on the website you might be able to “save to vault” and then download the file through the vault interface, that behavior isn’t exposed via the public API, reverse‐engineering the vault download endpoint—would be an unofficial workaround and may violate Civitai’s terms of service. Essentially, if a creator has archived a model, the intended mechanism is to prevent direct downloads via the API, there isn’t an API‐supported way to bypass this. "
Hmmm...
I can help with manual downloads and uploads, but I'm a bit busy with my everyday life.
I'll help out as much as I can, so if you need to manually download a model from a specific author, let me know.
I've never tried downloading an archive...🤔
Edit:
This looks like a pretty big disaster.😭
https://civitai.com/user/bluvoll/posts
Hello John, I think Civitai either updated something or there's an error. It's no longer grabbing the files properly, or, if it does grab them, the download goes straight to "skipped" and doesn't actually download them, treating them as if they were already downloaded.
It seems to be working fine on my test Spaces with the same code. For now, I've rebooted.🤔
However, tomorrow is apparently the day of Civitai's system change, so there may be some changes.
Edit:
I added error handling for the skip-related part. The behavior should be the same with or without it, but just in case.
This is weird; it's happening again. I'm receiving 12 links but downloading only 8. Yesterday the issue was resolved after a reboot. Is there a glitch causing some kind of bottleneck, or perhaps an issue with the skip code?
The current skip logic is simple (because it's difficult to determine Civitai's exact file name before downloading...): if even one file with the same hash already exists in the repository being uploaded to, the file is skipped before downloading.
Beyond that, nothing else is checked. Still, it's hard to imagine SHA256 hashes colliding that often...
It's a puzzling case that it only half works.
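Roughly, that skip decision amounts to the following (a sketch gluing together the two helpers above; should_skip is a hypothetical name, not the Space's actual code):

def should_skip(dl_url: str, repo_hashes: set, api_key: str = "") -> bool:
    # Skip only when Civitai reports a SHA256 and that hash already exists
    # somewhere in the target repo; if the hash can't be determined, download.
    sha256 = get_civitai_sha256(dl_url, api_key)
    return sha256 is not None and sha256 in repo_hashes

# Usage: compute the repo's hash set once per run, then test each link.
# repo_hashes = get_repo_sha256_set("username/some-repo")  # hypothetical repo id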
Edit:
One possibility is that I made a mistake in which file is assumed to be the default when the URL has no options (when it ends with just a number). Currently it's set to download the Model file, but is that specification enough...? It may actually be necessary to specify fp16, etc...
There may be hidden conditions.
Edit:
A failure so bad it's almost funny. But I found a critical bug: I forgot to break out of the loop, so when a version contained multiple files, the wrong hash could be returned.🥶 Fixed.
NoobAI is now an option on Civitai, and probably VPRED soon too, but there are no options for them here yet. Some are already tagging models as noobai after some Illustrious issues: https://civitai.com/models/908341/noobreal-v30
Thank you. I didn't know that VPRED was going to be added. I've just added it.😀
Got this error: Could not parse server response: SyntaxError: Unexpected token '<', " ... Probably a syntax error somewhere?
I think that's a response from the Civitai server: an "Unexpected token '<'" usually means an HTML error page came back where JSON was expected. Or rather, if there were a syntax error on this side, the Python interpreter would crash as soon as it was hit...
So weird. Sometimes the error appears but can be fixed by searching and adding the files again. Also, for some reason it doesn't grab workflow download links; for example, on https://civitai.com/user/nouvo_ai/models it only picks up 5 links.
Nice! I added type=Archive.
Hmm... Maybe fixed.
Working well. John, can you add the option to download Hunyuan Video?
I've added it.😀 (If the spelling is correct, it should work...)
Working pretty well. I think it's just missing Wan Video, which is becoming more popular.
Thanks for pointing that out. I'd been meaning to add Wan, and when I added the names of the other video models I forgot about Wan itself...
Hello John, I think something broke after the recent update. I can't download files; it shows "skipped" or doesn't download them.
Thanks. It seems that Gradio, or rather FastAPI, is currently buggy, but adding the following works around the problem for older Gradio Spaces. So I think it's fixed now.
https://huggingface.co/spaces/John6666/civitai_to_hf/blob/main/requirements.txt
pydantic==2.10.6
Hello John, can you add SD 2.0, SD 3.0, HiDream, SDXL Lightning, SDXL Hyper, SVD, PixArt a and E, and Hunyuan 1? For some reason, model variety keeps increasing.
I tried to make it look like Civitai. There were too many, so I created an “Advanced” accordion and put them in there.
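For reference, the layout described here might look roughly like this in Gradio (a sketch; the choice lists are illustrative, not the Space's actual options):

import gradio as gr

COMMON = ["SD 1.5", "SDXL 1.0", "Pony", "Illustrious", "NoobAI", "Flux.1 D"]
ADVANCED = ["SD 2.0", "SD 3.0", "HiDream", "SDXL Lightning", "SDXL Hyper",
            "SVD", "PixArt a", "PixArt E", "Hunyuan 1", "Hunyuan Video", "Wan Video"]

with gr.Blocks() as demo:
    # Common base models stay visible; the long tail goes inside an accordion.
    base_models = gr.CheckboxGroup(choices=COMMON, label="Base model")
    with gr.Accordion("Advanced", open=False):
        more_models = gr.CheckboxGroup(choices=ADVANCED, label="More base models")

demo.launch()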
Great job, thank you so much; it's been incredibly helpful. I noticed that some links include two files (for both positive and negative). Is there any way to add type=Negative&format=Other? Here's an example: https://civitai.com/models/1526885/glikodinembeddings
I added it, but it seems that it has not been used except for that one case.😅 That's unusual. Perhaps Civitai introduced it once but decided it was unnecessary.
Hi, the repo dup is working pretty well now, but I'm noticing something strange with the download. This profile, https://civitai.com/user/Quiron, has 611 models, but the downloader is only retrieving 244 files, even with all options enabled (except training data). I inspected almost all the links and found no anomalies; the downloader just can't grab them.
Civitai's API is acting up...😅
https://civitai.com/api/v1/models?period=AllTime&username=Quiron&limit=100&page=1
https://civitai.com/api/v1/models?period=AllTime&username=Quiron&limit=100&page=2  # same output as page=1...
"metadata": {
    "currentPage": 2,
    "pageSize": 100
}
# nextPage & nextCursor are missing...

https://civitai.com/api/v1/models?period=AllTime&username=bolero537&limit=100&page=1  # good example
"metadata": {
    "nextCursor": "342|3392|474612",
    "nextPage": "https://civitai.com/api/v1/models?period=AllTime&username=bolero537&limit=100&page=1&cursor=342%7C3392%7C474612",
    "currentPage": 1,
    "pageSize": 100
}
Still broken. Is there any way to fix it, or has it just not been updated yet?
There's no way to fix it because Civitai's API itself isn't returning the right results for now...😭
The Civitai REST API (v1) returns a limited number of items per request (typically 100 to 200 models or images), so you need proper pagination or cursor looping to retrieve everything. Many users report that even with pagination, the API only returns a fraction of the total (e.g., just 142 models from a user who has thousands), which suggests some data may not be exposed via the public endpoints. Currently, a working tool like dreamfast/go-civitai-downloader supports full-page looping (cursor-based pagination). This was an opinion I found about the bug; sadly, you'd have to build the tool yourself.
supports full-page looping (cursor-based pagination)
So it was a bug where the next page wasn't being returned...😅
I didn't know about that, so I had implemented it to behave similarly, by following the cursor. (Specifying 0 for the page number makes it follow.) The problem is that the model data itself doesn't carry cursor information. Probably the database collection process running behind the API is missing some data.
We need to find a scraped database someone else has built... or create one ourselves, but I don't have the time for that...
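For reference, cursor-following against the models endpoint looks roughly like this (a sketch relying on the metadata.nextPage field, as shown in the good example above; not the Space's actual code):

import requests

def iter_civitai_models(username: str, limit: int = 100):
    url = "https://civitai.com/api/v1/models"
    params = {"period": "AllTime", "username": username, "limit": limit}
    while url:
        r = requests.get(url, params=params, timeout=(5.0, 15))
        r.raise_for_status()
        data = r.json()
        yield from data.get("items", [])
        # Follow metadata.nextPage (a full URL with the cursor baked in) instead
        # of incrementing the page number, which is what broke above.
        url = data.get("metadata", {}).get("nextPage")
        params = None  # nextPage already carries the query string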