Image analysis only in Chinese?
Well, no matter what I do, the model returns image analysis only in Chinese (ggml-model-Q6_K.gguf with mmproj-model-f16.gguf). The only way it returns results in English is when flash attention is enabled in llama.cpp and the k/v cache is quantized to q5_0; for example, q4_0, q8_0, q5_1, or f16 k/v cache quantizations return Chinese image analysis results no matter what I try. In conversation mode it works fine in English, but image analysis returns English responses "ONLY" when q5_0 k/v cache quantization is used.
Edit: I spoke too soon. With q5_0 k/v quantization it "sometimes" returns results in English, but out of 10 repeated queries with the exact same command, 6 are in Chinese and 4 in English.
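For reference, here is a minimal sketch of how I'm toggling these settings; the paths are placeholders, and I'm assuming llama.cpp's standard `-fa`, `-ctk`, and `-ctv` flags are accepted by `llama-mtmd-cli` the same way as by `llama-cli`:

```python
import subprocess

# Sketch of the invocation used for these tests (paths are placeholders).
# In llama.cpp, quantizing the V cache requires flash attention, hence
# -fa is paired with the -ctk/-ctv settings below.
command = [
    "llama-mtmd-cli.exe",
    "-m", "ggml-model-Q6_K.gguf",
    "--mmproj", "mmproj-model-f16.gguf",
    "--image", "test.png",
    "-p", "Analyze this image in high detail",
    "-fa",            # enable flash attention
    "-ctk", "q5_0",   # K cache quantization
    "-ctv", "q5_0",   # V cache quantization (q4_0/q8_0/q5_1/f16 all gave Chinese)
]
result = subprocess.run(command, capture_output=True, timeout=60)
print(result.stdout.decode("utf-8", errors="ignore"))
```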
I've seen your issue, and it seems a bit strange. I'll check it out and get back to you as soon as possible.
@phoebdroid
I haven't rigorously compared different gguf quantization methods before, but I've reproduced your issue. I'm not sure how to fix it yet.
However, I've found that Q4_0 or Q4_K_M don't cause this issue.
Perhaps you could try using these quantization methods.
I'll also need to spend more time learning the differences and solutions for each gguf quantization method.
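In case it helps, here is a rough sketch of producing those ggufs with llama.cpp's `llama-quantize` tool (file names are placeholders; ideally you would start from the f16 gguf rather than requantizing an already-quantized one):

```python
import subprocess

# Requantize the f16 gguf to Q4_K_M with llama.cpp's llama-quantize.
# Usage: llama-quantize <input.gguf> <output.gguf> <type>
subprocess.run(
    [
        "llama-quantize",
        "ggml-model-f16.gguf",     # input: highest-precision gguf available
        "ggml-model-Q4_K_M.gguf",  # output
        "Q4_K_M",                  # target quantization type (or "Q4_0")
    ],
    check=True,
)
```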
You mean the model quantized to Q4, or the k/v cache? Because for me, the k/v cache at Q4 is definitely Chinese. As for the model, I try to stay away from low-quantization models, especially for image analysis, since that seems to degrade quality much more than with pure text. But I can give it a try if you want me to test Q4 weights.
@phoebdroid
I'm suggesting you use the Q4_0 or Q4_K_M ggufs for testing. I'm also curious why they follow Chinese and English questions more accurately than the Q6.
I can't quite pinpoint the cause, but I've discovered this phenomenon and wanted to share it with you. If you have some free time, you can try it out.
Sure, I'm downloading Q4_K_M as we speak; I'll test and let you know. As for why higher quants fail at English while lower ones succeed, that baffles me too. If I had to take a wild guess, I'd say maybe the original merge and the source models had this Chinese tendency, and that tendency is preserved better at higher quants? But like I said, I have no idea and am just taking a shot in the dark, so don't quote me on this.
OK, I've downloaded the Q4_K_M gguf, and yes, this one is not in Chinese. ~~However, image recognition is absurdly bad~~ In image analysis it actually seems to be better than Gemma3; for OCR, however, it does OK. Now, first I'll share my OCR test, run against Gemma3 4B IT QAT (Q4_0 gguf), with the model responses evaluated by Qwen3 30B A3B Thinking 2507. I'll share the reference image and screenshots of the evaluation done by Qwen. In my next message, I'll do the same for image analysis.
The reference image is:
And the chat history (note: Model 1 is MiniCPM-V-4, Model 2 is Gemma3):
OK, I must apologize and correct myself: MiniCPM-V-4 is NOT absurdly bad at image analysis. As a matter of fact, it held its own fairly well against Gemma3 in this test (I had another run where it completely botched an image analysis, and I spoke too soon based on that). In this test it's even more verbose than Gemma3. Here are the test results. (Note: Qwen mixed up the model names in the last evaluation, calling model 1 model 2 and vice versa.) Model 1 in this test is MiniCPM-V-4 and model 2 is Gemma3.
reference image:
Now screenshots of chat history:
Note: in the last re-evaluation Qwen again mixed up the model names (with one "llama" slip), but the rest is correct, and it correctly evaluated model 1 (MiniCPM-V-4) as superior for image analysis. I actually agree, because it delivers more verbose output than Gemma.
OK, wait, there's even more; again I spoke too soon. My own user interface tests work, but direct CLI calls do NOT, because in my image analysis tool code I actually do a cleanup pass on responses and UTF-8 decoding:
```python
import subprocess

def _process_image_analysis(command):
    """Helper to run the image analysis subprocess and clean up the output."""
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            timeout=60
        )
    except subprocess.TimeoutExpired:
        return None, "Image analysis timed out"

    # Decode output using UTF-8, ignoring errors
    raw_output = result.stdout.decode('utf-8', errors='ignore')
    stderr_output = result.stderr.decode('utf-8', errors='ignore')

    # Check for errors in stderr
    if result.returncode != 0:
        error_message = f"Image analysis failed (stderr): {stderr_output}"
        return None, error_message

    # Define technical patterns to filter out
    technical_patterns = [
        'loading model:',
        'encoding image slice',
        'image decoded (batch',
        'decoding image batch',
        'time:',
        'ms',
    ]

    # Split output into lines and filter
    lines = raw_output.split('\n')
    filtered_lines = []
    for line in lines:
        line = line.strip()  # Remove leading/trailing whitespace
        if not line:
            continue  # Skip empty lines
        # Check if line contains any technical pattern
        if any(pattern.lower() in line.lower() for pattern in technical_patterns):
            continue
        filtered_lines.append(line)

    # Join the filtered lines and remove any extra blank lines
    cleaned_output = '\n'.join(filtered_lines).strip()
    return cleaned_output, None
```
That's why calling the model through my own user interface's image analysis tool works. Now, the portion of my tool that builds the subprocess call to llama-mtmd-cli is this:
```python
try:
    # Build the llama.cpp CLI command with the image path
    llama_command = [
        llama_cli_path,
        "-m", model_path,
        "--mmproj", mmproj_path,
        "-p", llm_prompt,
        "-c", "4096", "-ngl", "99",
        "--no-mmap", "--samplers", "top_k;temperature", "--sampling-seq",
        "kt", "--top-k", "20", "--temp", "0.7"
    ]
    llama_command.extend(["--image", str(image_to_analyze)])

    # Use the helper function to process the image analysis
    cleaned_output, error = _process_image_analysis(llama_command)
except Exception as e:
    cleaned_output, error = None, f"Image analysis failed: {e}"
```
and the command this creates is exactly this:

```
C:\AI_Workshop\llama\llama.cpp\build\bin>llama-mtmd-cli.exe -m e:\LLMa_Models\ggml-model-Q4_K_M.gguf --mmproj e:\LLMa_Models\mmproj-model-f16.gguf --image C:\AI_Workshop\llama\ChatApp\downloads\2.png -p "Analyze this image in high detail" -c 4096 -ngl 99 --no-mmap --samplers top_k;temperature --sampling-seq kt --top-k 20 --temp 0.7
```
However, when I run that exact same command in the terminal, the results are as follows (a few repetitions to make sure):
```
image decoded (batch 1/1) in 10 ms
该图片展示了一群风格独特、色彩鲜艳的角色。他们似乎处于一个充满活力和动感的场景之中。
首先，我们可以看到一个坐在凳子上的蓝色角色。她的蓝色服装和蓝色凳子形成了鲜明的对比，使得这个角色显得格外突出。
其次，我们可以看到一个穿着紫色上衣、红色裤子以及彩色手套和鞋子的角色。她的服装颜色非常鲜艳，给人一种充满活力和动感的感觉。
最后，我们可以看到一个穿着黑色皮夹克的男性角色。他的黑色皮夹克给人一种神秘和冷酷的感觉。
总体而言，这张图片展示了一群风格独特、色彩鲜艳的角色。他们的服装颜色非常鲜艳，给人一种充满活力和动感的感觉。同时，这些角色的造型也透露出一种神秘和冷酷的感觉。
llama_perf_context_print: load time = 338.24 ms
llama_perf_context_print: prompt eval time = 573.19 ms / 611 tokens ( 0.94 ms per token, 1065.96 tokens per second)
llama_perf_context_print: eval time = 518.64 ms / 168 runs ( 3.09 ms per token, 323.92 tokens per second)
llama_perf_context_print: total time = 1635.80 ms / 779 tokens
llama_perf_context_print: graphs reused = 162
```

(Translation of the Chinese output: "This image shows a group of distinctively styled, brightly colored characters, apparently in a scene full of energy and movement. First, we can see a blue character sitting on a stool; her blue outfit and the blue stool form a sharp contrast, making her stand out. Next, a character wearing a purple top, red pants, and colorful gloves and shoes; her clothing colors are very vivid, giving a lively, dynamic impression. Finally, a male character in a black leather jacket, which gives a mysterious, cold impression. Overall, the image shows a group of distinctively styled, brightly colored characters; their vivid clothing conveys energy and movement, while their styling also hints at mystery and coldness.")
```
image decoded (batch 1/1) in 7 ms
这幅图像描绘了一群人坐在一个昏暗的酒吧或休息区。人物的服饰和姿势暗示了他们可能正在进行某种社交活动。
这个场景可以用于展示社交活动、夜生活或虚拟角色互动的情境。
llama_perf_context_print: load time = 337.32 ms
llama_perf_context_print: prompt eval time = 574.66 ms / 611 tokens ( 0.94 ms per token, 1063.24 tokens per second)
llama_perf_context_print: eval time = 159.51 ms / 51 runs ( 3.13 ms per token, 319.73 tokens per second)
llama_perf_context_print: total time = 1271.92 ms / 662 tokens
llama_perf_context_print: graphs reused = 49
```

(Translation of the Chinese output: "This image depicts a group of people sitting in a dimly lit bar or lounge area. Their clothing and poses suggest they may be engaged in some kind of social activity. This scene could be used to depict social activities, nightlife, or virtual character interaction.")
Therefore, when called in the terminal via a direct call to llama-mtmd-cli.exe, the model (the Q4_K_M gguf) still outputs weirdly encoded text; it was my own image analysis tool's UTF-8 decoding that made it look like it was working.
@phoebdroid
Is this garbled text?
I didn't quite understand your case the first time I read it.
Could you please send me your test images and questions so I can run some tests and review your response a few more times?
My email: [email protected]
Let's keep the conversation here so that others can also read and benefit in the future. What happens is:
Issue: Model Output Uses UTF-8 Encoding, Causing Incompatibility with Standard Windows CMD
Model: ggml-model-Q4_K_M.gguf
Problem Description:
The ggml-model-Q4_K_M.gguf model generates text output encoded as UTF-8. While UTF-8 is a modern and widely used standard, it is not the default encoding for the standard Windows Command Prompt (cmd.exe).
This leads to two distinct problems:
1. Garbled Output in CMD: When the model is run directly in a standard cmd.exe terminal, its output appears as garbled, unreadable text. This is because the terminal typically uses an older, single-byte encoding (like Code Page 437) and cannot correctly render the multi-byte characters in the model's UTF-8 output.
2. Python Subprocess Errors: When running the model from a Python script via the subprocess module, it can cause a 'charmap' codec can't encode error. This happens because Python, by default, communicates with the Windows operating system using the system's legacy code page.
Root Cause Analysis:
The issue is an encoding mismatch: the model outputs modern UTF-8 text, but the standard Windows command-line environment expects a legacy encoding.
Other models work without this issue because their text output is limited to the basic ASCII character set. ASCII characters are represented the same way in both UTF-8 and legacy code pages, so the conflict never occurs. This specific model, however, produces characters (like special symbols or punctuation) outside the basic ASCII set, which exposes the underlying encoding incompatibility of the environment.
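A minimal Python sketch of the mismatch (the sample string here is illustrative, not taken from the model's actual output):

```python
# The same UTF-8 bytes, decoded two ways.
sample = "该图片展示了一群角色"   # illustrative stand-in for the model's output
raw = sample.encode("utf-8")      # what the model process writes to stdout

print(raw.decode("utf-8"))        # correct text, as my tool's decode step sees it
print(raw.decode("cp437"))        # mojibake, as a legacy cmd.exe code page renders it

# The Python-side failure mode: encoding the text for a console whose
# stdout uses a legacy code page raises the 'charmap' error.
try:
    sample.encode("cp1252")       # stand-in for sys.stdout's legacy encoding
except UnicodeEncodeError as exc:
    print(exc)                    # "'charmap' codec can't encode character ..."
```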
Conclusion for Report:
The ggml-model-Q4_K_M.gguf model has an implicit requirement for a UTF-8-compatible environment. This makes it unusable "out of the box" in standard Windows cmd.exe and difficult to automate. The model should either be modified to produce only ASCII characters for maximum compatibility, or it must be clearly documented that it requires a UTF-8 terminal (such as the modern Windows Terminal, or cmd.exe after running the command chcp 65001).
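On the Python side, the robust fix is to pin the decoding explicitly instead of relying on the console code page; a minimal sketch, using the `llama_command` list from my tool above:

```python
import subprocess

# Decode the subprocess output as UTF-8 regardless of the Windows code page.
# errors="replace" keeps the call robust against any stray bytes.
result = subprocess.run(
    llama_command,
    capture_output=True,
    encoding="utf-8",
    errors="replace",
    timeout=60,
)
print(result.stdout)
```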
@phoebdroid
So that's the case! Thank you very much for your careful observation.
This is an issue we hadn't anticipated. We will include the UTF-8 dependency in the model readme.