MiaoshouAI committed on
Commit
dfb68da
·
verified ·
1 Parent(s): 98fcffd

Update README.md

  1. README.md +68 -3
README.md CHANGED
@@ -1,3 +1,68 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: mit
3
+ ---
4
+ # Florence-2-base-PromptGen v1.5
5
+ This is a major version update for PromptGen. It adds two new caption instructions: \<GENERATE_TAGS\> and \<MIXED_CAPTION\>.
6
+ Accuracy is also much improved thanks to a new training dataset. This version no longer uses Civitai data, because LoRA trigger words and inaccurate tags in that data caused misinterpretation.
7
+
8
+ # About PromptGen
9
+ Florence-2-base-PromptGen is a model trained for [MiaoshouAI Tagger for ComfyUI](https://github.com/miaoshouai/ComfyUI-Miaoshouai-Tagger).
10
+ It is an advanced image captioning tool based on the [Microsoft Florence-2 Model](https://huggingface.co/microsoft/Florence-2-base) and fine-tuned to perfection.
11
+
12
+ ## Why another tagging model?
13
+ Most vision models today are trained mainly for general vision recognition, but prompting and image tagging for model training call for captions with a quite different format and level of detail.
14
+ Florence-2-base-PromptGen is trained specifically to improve the accuracy and experience of prompting and tagging. The model was originally trained on images and cleaned tags from Civitai (version 1.5 no longer uses this data), so that tagging an image yields the kind of prompt you would use to generate it.
15
+ You won't get annoying captions like "This image is about a girl...".
16
+
17
+ ## Instruction prompt:
18
+ \<GENERATE_TAGS\> generates a prompt as Danbooru-style tags
19
+ \<CAPTION\> a one line caption for the image
20
+ \<DETAILED_CAPTION\> a structured caption format which detects the position of the subjects in the image
21
+ \<MORE_DETAILED_CAPTION\> a very detailed description for the image
22
+ \<MIXED_CAPTION\> a mixed caption that combines the more detailed caption with tags. This is especially useful for FLUX models when T5XXL and CLIP_L are used together. A new node has been added to MiaoshouAI Tagger for ComfyUI to support this instruction.
23
+
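
The instruction tokens listed above are passed to the model as plain strings such as `<MIXED_CAPTION>` (the backslashes in this README are only markdown escapes). As a minimal sketch, a small lookup can validate a requested instruction before it is sent to the model. The helper name `resolve_instruction` and the `DEPRECATED` table are hypothetical and not part of the model or the tagger; the deprecation entry reflects the version history in this README.

```python
# Instruction tokens and short descriptions, copied from the list in this README.
INSTRUCTIONS = {
    "<GENERATE_TAGS>": "prompt as danbooru style tags",
    "<CAPTION>": "one line caption",
    "<DETAILED_CAPTION>": "structured caption with subject positions",
    "<MORE_DETAILED_CAPTION>": "very detailed description",
    "<MIXED_CAPTION>": "more detailed caption combined with tags",
}

# Assumption for illustration: map the deprecated token to its replacement.
DEPRECATED = {"<GENERATE_PROMPT>": "<GENERATE_TAGS>"}


def resolve_instruction(name: str) -> str:
    """Hypothetical helper: normalize a name to a supported instruction token."""
    token = name.strip()
    if not token.startswith("<"):
        # Accept bare names such as "mixed_caption".
        token = f"<{token.upper()}>"
    token = DEPRECATED.get(token, token)
    if token not in INSTRUCTIONS:
        raise ValueError(f"Unsupported instruction: {name!r}")
    return token
```

For example, `resolve_instruction("mixed_caption")` returns `"<MIXED_CAPTION>"`, and the result can be used as the `prompt` value when calling the model.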
24
+ ## Version History:
25
+ For version 1.5, you will notice the following:
26
+ 1. \<GENERATE_PROMPT\> is deprecated and replaced by \<GENERATE_TAGS\>
27
+ 2. A new mode for \<MIXED_CAPTION\>
28
+ 3. Much improved accuracy for \<DETAILED_CAPTION\> and \<MORE_DETAILED_CAPTION\>
29
+ 4. Improved ability to recognize watermarks on images.
30
+
31
+
32
+ ## How to use:
33
+
34
+ To use this model, you can load it directly from the Hugging Face Model Hub:
35
+
36
+ ```python
+ import requests
+ import torch
+ from PIL import Image
+ from transformers import AutoModelForCausalLM, AutoProcessor
+
+ # Run on the GPU when one is available, otherwise fall back to the CPU.
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ # trust_remote_code is required because the model ships its own modeling code.
+ model = AutoModelForCausalLM.from_pretrained("MiaoshouAI/Florence-2-base-PromptGen-v1.5", trust_remote_code=True).to(device)
+ processor = AutoProcessor.from_pretrained("MiaoshouAI/Florence-2-base-PromptGen-v1.5", trust_remote_code=True)
+
+ # Pick one of the instruction prompts listed above.
+ prompt = "<MORE_DETAILED_CAPTION>"
+
+ # Download a sample image.
+ url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)
+
+ # Deterministic beam search decoding.
+ generated_ids = model.generate(
+     input_ids=inputs["input_ids"],
+     pixel_values=inputs["pixel_values"],
+     max_new_tokens=1024,
+     do_sample=False,
+     num_beams=3
+ )
+ generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
+
+ # Strip special tokens and format the result for the chosen task.
+ parsed_answer = processor.post_process_generation(generated_text, task=prompt, image_size=(image.width, image.height))
+
+ print(parsed_answer)
+ ```
61
+
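
The snippet above prints `parsed_answer` as returned by the processor. The sketch below shows one way to turn that result into a clean caption string or a tag list. It assumes that `parsed_answer` is a dict keyed by the task string, as the upstream Florence-2 processor returns for text tasks, and that `<GENERATE_TAGS>` output is comma-separated; neither is confirmed by this README. The helper names `extract_caption` and `split_tags` are hypothetical.

```python
from __future__ import annotations


def extract_caption(parsed_answer: dict, task: str) -> str:
    """Return the text for `task` with whitespace collapsed.

    Assumption: parsed_answer maps the task string to the generated text.
    """
    text = parsed_answer.get(task, "")
    return " ".join(str(text).split())


def split_tags(tag_text: str) -> list[str]:
    """Split comma-separated tags, dropping empties and case-insensitive duplicates."""
    seen = set()
    tags = []
    for raw in tag_text.split(","):
        tag = raw.strip()
        key = tag.lower()
        if tag and key not in seen:
            seen.add(key)
            tags.append(tag)
    return tags
```

With a result such as `{"<GENERATE_TAGS>": "1girl, solo, long hair, solo"}`, calling `split_tags(extract_caption(result, "<GENERATE_TAGS>"))` yields the tags in their original order with the repeated `solo` removed.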
62
+ ## Use with MiaoshouAI Tagger for ComfyUI
63
+ If you just want to use this model, you can run it through ComfyUI-Miaoshouai-Tagger:
64
+
65
+ https://github.com/miaoshouai/ComfyUI-Miaoshouai-Tagger
66
+
67
+ Detailed installation and usage instructions are available there.
68
+ (If you have already installed MiaoshouAI Tagger, you need to update the node in ComfyUI Manager first or use git pull to get the latest update.)