updating issue

#3
by minaiosu - opened

Hi John, with some profiles containing over 500 LoRAs/models, I find it challenging to download or update them because the one-hour limit is triggered too quickly. Typically, it downloads around 150–200 files, and then after one hour, I try again. However, it appears to download and overwrite everything, so after reaching 250–300 files, the one-hour limit is triggered again. This means the download rate for updating a profile is only about 50 files per hour.

Is there a way to check the already downloaded files and ignore them so that only missing files are downloaded instead of redownloading everything? I have a profile with over 1,000 LoRAs, and I've been updating it since it only had 100 LoRAs using the monthly tab. However, I find it challenging to download or update profiles with such a high volume of content.

I'll give it a try. Since the Hugging Face API is well developed, it's easy to check a file's hash in advance; the issue was obtaining the SHA256 hash from Civitai. I managed to do that part today. I plan to work on the rest tomorrow.

import re
import urllib.parse

import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

def get_user_agent():
    return 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0'

def get_civitai_sha256(dl_url: str, api_key=""):
    # True when the query string pins key k to a value that the file entry contradicts.
    def is_invalid_file(qs: dict, file_info: dict, k: str):
        return k in qs and file_info.get(k) is not None and qs[k][0] != file_info.get(k)

    if "https://civitai.com/api/download/models/" not in dl_url:
        return None
    headers = {'User-Agent': get_user_agent(), 'content-type': 'application/json'}
    if api_key:
        headers['Authorization'] = f'Bearer {api_key}'
    session = requests.Session()
    retries = Retry(total=5, backoff_factor=1, status_forcelist=[500, 502, 503, 504])
    session.mount("https://", HTTPAdapter(max_retries=retries))
    # Split the download URL into a model-version ID and an optional query string.
    m = re.match(r'https://civitai.com/api/download/models/(\d+)\??(.+)?', dl_url)
    if m is None:
        return None
    url = 'https://civitai.com/api/v1/model-versions/' + m.group(1)
    qs = urllib.parse.parse_qs(m.group(2) or "")
    # Without explicit options, assume the default file is the Model itself.
    if "type" not in qs:
        qs["type"] = ["Model"]
    try:
        r = session.get(url, headers=headers, timeout=(5.0, 15))
        if not r.ok:
            return None
        data = r.json()
        if not isinstance(data.get("files"), list):
            return None
        sha256 = None
        # Keep the hash of any file that passes the query-string filters.
        for d in data["files"]:
            if any(is_invalid_file(qs, d, k) for k in ("type", "format", "size", "fp")):
                continue
            sha256 = d["hashes"]["SHA256"].lower()
        return sha256
    except Exception as e:
        print(e)
        return None

print(get_civitai_sha256("https://civitai.com/api/download/models/1335639"))
print(get_civitai_sha256("https://civitai.com/api/download/models/1335639?type=Model&format=SafeTensor"))
print(get_civitai_sha256("https://civitai.com/api/download/models/1335639?type=Training%20Data"))

I found a troublesome bug related to Transformers and Spaces, so I'll investigate that first.

@minaiosu It's not very strict, but I've implemented it: if a file with the same hash as the one being downloaded already exists in the repository you are uploading to, the download is skipped.
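In other words, the rule amounts to something like this rough sketch (should_skip is a hypothetical name; get_civitai_sha256 is the function from the code above):

def should_skip(dl_url: str, repo_hashes: set, api_key: str = "") -> bool:
    # Skip only when the Civitai hash is known and already present in the
    # upload repo; an unknown hash means download anyway, to be safe.
    sha256 = get_civitai_sha256(dl_url, api_key)
    return sha256 is not None and sha256 in repo_hashes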

It worked like a charm, thank you very much! I'm trying to rescue as many profiles as possible after the recent Civitai Buzz monetization changes. Some creators are just deleting content, while others are archiving it, like this user bluvoll. Is there a way to retrieve archived files without triggering security alerts? Or is it perhaps possible with a Civitai membership key?

Civitai Buzz monetization changes. Some creators are just deleting content, while others are archiving it, like this user bluvoll.

Oh...

Is there a way to retrieve archived files without triggering security alerts?

idk. We can't download without an API key in the first place, but I don't think there are any download options other than adding an API key...
Do you have any ideas?🤔
https://github.com/civitai/civitai/wiki/REST-API-Reference

"The REST API is designed so that when a model is marked as “Archived,” its files field is empty, meaning no downloadable file is provided. While on the website you might be able to “save to vault” and then download the file through the vault interface, that behavior isn’t exposed via the public API, reverse‐engineering the vault download endpoint—would be an unofficial workaround and may violate Civitai’s terms of service. Essentially, if a creator has archived a model, the intended mechanism is to prevent direct downloads via the API, there isn’t an API‐supported way to bypass this. "

Hmmm...
I can help with manual downloads and uploads, but I'm a bit busy with my everyday life.
I'll help out as much as I can, so if you need to manually download a model from a specific author, let me know.
I've never tried downloading an archive...🤔

Edit:
This looks like a pretty big disaster.😭
https://civitai.com/user/bluvoll/posts

Hello John, I think Civitai either updated something or there's an error. It's no longer grabbing the files properly, or if it does grab them, the download goes straight to "skipped" and doesn't actually download them, treating them as if they were already downloaded.

It seems to be working fine on my test Spaces with the same code. For now, I've rebooted.🤔
However, tomorrow is apparently the day of Civitai's system change, so there may be some changes.

Edit:
I added some error handling to the skip-related part. The behavior should be the same with or without it, but just in case.

This is weird; it's happening again. I'm receiving 12 links but downloading only 8. Yesterday the issue was resolved after a reboot. Is there a glitch causing some kind of bottleneck, or perhaps an issue with the skip code?

The current skip code is simple (it's difficult to fetch Civitai's exact file name before downloading...): if even one file with the same hash already exists in the repository being uploaded to, the file is skipped before downloading.
Beyond that, nothing else is checked. But it's hard to imagine SHA256 hashes colliding that often...
It's a puzzling case that it only works halfway.

Edit:
One possibility is that I made a mistake in the file that is assumed to be the default when no options are given (when the URL ends with just a number). Currently it is set to download the Model, but is that specification enough...? It may actually be necessary to specify fp16 and so on...
There may be hidden conditions.

Edit:
A failure so bad it's almost funny. But I found a critical bug: I forgot to break out of the loop. When multiple files were included, there were cases where the wrong hash was returned.🥶 Fixed.
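For reference, the fix amounts to breaking out of the file loop in get_civitai_sha256 as soon as a match is found, so a later file in the list can no longer overwrite the matched hash:

# Inside get_civitai_sha256:
for d in data["files"]:
    if any(is_invalid_file(qs, d, k) for k in ("type", "format", "size", "fp")):
        continue
    sha256 = d["hashes"]["SHA256"].lower()
    break  # the missing escape: without it, the last file in the list wins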

NoobAI is now an option, and probably VPRED soon too; there are no options for them yet. Some are already tagging models as noobai after the Illustrious issues: https://civitai.com/models/908341/noobreal-v30

Thank you. I didn't know that VPRED was going to be added. I've just added it.😀

Got this error: "Could not parse server response: SyntaxError: Unexpected token '<', ..." Probably a syntax error somewhere?

I think that's a response from the Civitai server (an HTML page coming back where JSON is expected). Or rather, if there were a syntax error on this side, the Python interpreter would crash as soon as the code was referenced...
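One way to keep an HTML error page from being parsed as data is to check the response before decoding it (a sketch, not the Space's actual code):

def safe_json(r):
    # "Unexpected token '<'" is the first character of an HTML page being fed
    # to a JSON parser; refuse anything not declared as JSON.
    if "application/json" not in r.headers.get("content-type", ""):
        return None
    try:
        return r.json()
    except ValueError:
        return None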

So weird. Sometimes the error appears but can be fixed by searching and adding the files again. Also, for some reason it doesn't grab workflow download links; for example, on https://civitai.com/user/nouvo_ai/models it only picks up 5 links.

Nice! I added type=Archive.

tab "Other' also broken, to picking or getting links https://civitai.com/user/rockrockrock/models

Hmm... Maybe fixed.
