gemma 3n android studio
Can I add Gemma 3n to my Android Studio project?
Please share some info and advice.
Thank you!
Hi @kevingamaliel,
Yes, you absolutely can add Gemma 3n to your Android Studio project! Google has designed Gemma 3n specifically for efficient on-device execution, and they provide the necessary tools and resources through Google AI Edge.
You can use the Google AI Edge tools and libraries (specifically the LLM Inference API) to load and run a pre-trained Gemma 3n model directly on your Android device.
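For a rough idea of what that looks like, here is a minimal Kotlin sketch of the LLM Inference API. The model file name and path are placeholders; you download or bundle the Gemma 3n `.task` file yourself and add the `com.google.mediapipe:tasks-genai` dependency to your module's Gradle file.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Minimal sketch: load a Gemma 3n .task file with the MediaPipe LLM Inference API
// and run a single prompt. The model path below is a placeholder; point it at
// wherever your app stores the model.
fun runGemma3n(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma-3n-E4B-it-int4.task") // placeholder path
        .setMaxTokens(1024)
        .build()

    val llmInference = LlmInference.createFromOptions(context, options)
    return try {
        llmInference.generateResponse(prompt)
    } finally {
        llmInference.close()
    }
}
```

In a real app you would typically create the `LlmInference` instance once, off the main thread, and reuse it, since loading a multi-gigabyte model is expensive.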
Kindly follow this link, where the steps are explained one by one. If you have any concerns, let us know and we will assist you. Thank you.
Hello,
I'm encountering a persistent initialization issue with `MediaPipeLlmBackend` in an Android RAG setup.
Setup:
- **Libs:** `com.google.mediapipe:tasks-genai:0.10.24`, `com.google.ai.edge.localagents:localagents-rag:0.2.0` (declared as in the Gradle sketch after this list)
- **Model:** `gemma-3n-E4B-it-int4.task` (~4.4GB, SHA1: 12ec504f5e1f4f1039faeff35d0c8f36a0befc09), loaded from the app's internal files directory.
- **Device:** Samsung S24 Ultra
- **Architecture:** A `SharedLlmBackend` singleton (with reference counting) manages a single `MediaPipeLlmBackend` instance. Both my `GemmaSDKService` (for direct chat) and `AiEdgeRagService` (for RAG) use this singleton's `acquire()` and `release()` methods.
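For reference, a sketch of the Gradle declarations for these two libraries, assuming a Kotlin DSL module-level build file:

```kotlin
// Module-level build.gradle.kts (Kotlin DSL assumed) - the two libraries listed above
dependencies {
    implementation("com.google.mediapipe:tasks-genai:0.10.24")
    implementation("com.google.ai.edge.localagents:localagents-rag:0.2.0")
}
```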
Problem:
While `SharedLlmBackend` successfully creates the `MediaPipeLlmBackend` object, and `AiEdgeRagService` acquires this instance, the RAG chain's dummy inference calls during its initialization polling loop (10+ retries, >30 seconds) consistently get the `W/MediaPipeLlmBackend: LLM inference is not initialized yet!` warning. This prevents `AiEdgeRagService` from becoming ready.
Key Code Snippets:
1. `SharedLlmBackend.kt`:

```kotlin
// In SharedLlmBackend.kt
object SharedLlmBackend {
    private var backendInstance: MediaPipeLlmBackend? = null
    private var currentModelPath: String? = null
    private var referenceCount = 0
    private val mutex = Mutex()

    suspend fun acquire(context: Context, path: String): MediaPipeLlmBackend? = mutex.withLock {
        Log.i(TAG, "acquire() called for path: \"$path\". Current backend: \"$currentModelPath\", refCount: $referenceCount")
        if (backendInstance != null && currentModelPath == path) {
            referenceCount++
            Log.i(TAG, "Returning existing backend for \"$path\". New refCount: $referenceCount")
            return@withLock backendInstance
        }
        if (backendInstance != null && currentModelPath != path) { // Path changed
            Log.w(TAG, "Path changed. Closing existing backend for \"$currentModelPath\".")
            try { backendInstance?.close() } catch (e: Exception) { Log.e(TAG, "Error closing previous backend for \"$currentModelPath\": ${e.message}", e) }
        }
        backendInstance = null; currentModelPath = null; referenceCount = 0
        Log.i(TAG, "Attempting to create new MediaPipeLlmBackend for path: \"$path\"")
        return try {
            val llmOptions = LlmInference.LlmInferenceOptions.builder()
                .setModelPath(path).setMaxTokens(1024).build() // Using 1024 to match AiEdgeRagService
            val sessionOptions = LlmInferenceSession.LlmInferenceSessionOptions.builder().build()
            val newBackend = MediaPipeLlmBackend(context.applicationContext, llmOptions, sessionOptions)
            backendInstance = newBackend
            currentModelPath = path
            referenceCount = 1
            Log.i(TAG, "Successfully created MediaPipeLlmBackend for \"$path\". New refCount: $referenceCount")
            newBackend
        } catch (e: Exception) {
            Log.e(TAG, "CRITICAL: Failed to create MediaPipeLlmBackend for \"$path\": ${e.message}", e)
            backendInstance = null; currentModelPath = null; referenceCount = 0
            null
        }
    }

    // ... release() and close() methods omitted for brevity ...
}
```
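For context, the elided `release()` is conceptually the mirror of `acquire()`. A hedged sketch of what it does (not the actual implementation):

```kotlin
// Sketch only: decrement the reference count and close the shared backend
// once no service holds it any more.
suspend fun release() = mutex.withLock {
    if (referenceCount > 0) referenceCount--
    Log.i(TAG, "release() called. New refCount: $referenceCount")
    if (referenceCount == 0 && backendInstance != null) {
        try {
            backendInstance?.close()
        } catch (e: Exception) {
            Log.e(TAG, "Error closing backend for \"$currentModelPath\": ${e.message}", e)
        }
        backendInstance = null
        currentModelPath = null
    }
}
```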
2. `AiEdgeRagService.initialize()`:

```kotlin
// In AiEdgeRagService.initialize()
mediaPipeLlmBackendForRag = SharedLlmBackend.acquire(application.applicationContext, gemmaModelPathForRag!!)
if (mediaPipeLlmBackendForRag == null) {
    Log.e(TAG, "[initialize] CRITICAL: SharedLlmBackend.acquire returned null for path: $gemmaModelPathForRag")
    throw IllegalStateException("SharedLlmBackend.acquire returned null")
}
Log.i(TAG, "[initialize] Successfully acquired MediaPipeLlmBackend from SharedLlmBackend.")

retrievalAndInferenceChain = RetrievalAndInferenceChain(ChainConfig.create(mediaPipeLlmBackendForRag!!, /* ... */))
Log.i(TAG, "[initialize] RetrievalAndInferenceChain READY.")

var ragBackendChainReady = false
// DEFAULT_MAX_RETRIES = 10, DEFAULT_RETRY_DELAY_MS = 3000L
repeat(DEFAULT_MAX_RETRIES) { tryIdx ->
    Log.d(TAG, "[initialize] Attempting dummy RAG inference #${tryIdx + 1}...")
    try {
        val dummyRequest = RetrievalRequest.create("Test RAG readiness.", RetrievalConfig.create(1, 0.1f, RetrievalConfig.TaskType.QUESTION_ANSWERING))
        val response = retrievalAndInferenceChain!!.invoke(dummyRequest, null).await()
        val responseText = response.text?.trim()
        Log.d(TAG, "[initialize] Dummy RAG inference result #${tryIdx + 1}: '${responseText?.take(100)}...'")
        if (responseText?.contains("not initialized", ignoreCase = true) == false && !responseText.isNullOrBlank()) {
            ragBackendChainReady = true; Log.i(TAG, "[initialize] Dummy RAG inference SUCCEEDED on try #${tryIdx + 1}."); return@repeat
        }
        Log.w(TAG, "[initialize] Dummy RAG inference NOT READY (try #${tryIdx + 1}). Response: $responseText")
    } catch (ex: Exception) {
        Log.w(TAG, "[initialize] Dummy RAG inference FAILED on try #${tryIdx + 1}: ${ex.message}", ex)
    }
    if (!ragBackendChainReady && tryIdx < DEFAULT_MAX_RETRIES - 1) delay(DEFAULT_RETRY_DELAY_MS)
}
if (!ragBackendChainReady) {
    Log.e(TAG, "[initialize] CRITICAL: RAG backend NOT READY after polling!")
    resetAllVars()
    return@withContext false
}
```
Key Log Snippets:

```
I/SharedLlmBackend: acquire() called for path: "/data/user/0/com.your.app/files/gemma-3n-E4B-it-int4.task". Current backend path: "null", refCount: 0
I/SharedLlmBackend: Attempting to create new MediaPipeLlmBackend instance for path: "/data/user/0/com.your.app/files/gemma-3n-E4B-it-int4.task"
I/MediaPipeLlmBackend: Constructor.
I/SharedLlmBackend: Successfully created and acquired new MediaPipeLlmBackend for "/data/user/0/com.your.app/files/gemma-3n-E4B-it-int4.task". New refCount: 1
I/AiEdgeRagService: [initialize] Successfully acquired/created MediaPipeLlmBackend instance from SharedLlmBackend.
I/AiEdgeRagService: [initialize] RetrievalAndInferenceChain READY.
D/AiEdgeRagService: [initialize] Attempting dummy RAG inference #1...
W/MediaPipeLlmBackend: LLM inference is not initialized yet!
D/AiEdgeRagService: [initialize] Dummy RAG inference result #1: 'LLM inference is not initialized yet!...'
W/AiEdgeRagService: [initialize] Dummy RAG inference NOT READY (try #1). Response: LLM inference is not initialized yet!
... (repeats for multiple retries) ...
E/AiEdgeRagService: [initialize] CRITICAL: RAG inference chain (MediaPipeLlmBackend) NOT READY after polling!
```
Questions:
- Is there a known characteristic of the `gemma-3n-E4B-it-int4.task` model (or large on-device LLMs generally via the `LlmInference` API) that would cause such a long delay (>30-60 seconds) for the internal inference engine to become fully operational after the `MediaPipeLlmBackend` object is constructed on Android (e.g., on a capable device like an S24 Ultra)?
- Is there a more direct way to check/ensure the readiness of the `LlmInference` engine managed by `MediaPipeLlmBackend`, other than attempting an inference call?
- Are there specific `LlmInference.LlmInferenceOptions` (e.g., related to delegates or model loading parameters) or `LlmInferenceSession.LlmInferenceSessionOptions` that are recommended or critical for robust and timely initialization of this model size/type on Android? (Note: previous attempts to use `BaseOptions.setDelegate()` with `LlmInferenceOptions` led to an "unresolved reference" for `tasks.core.Delegate`.)
We've confirmed the `MediaPipeLlmBackend` object is created and shared via the singleton. The primary issue appears to be the internal readiness of the LLM engine itself within that backend.
Any insights or diagnostic suggestions would be greatly appreciated.
Thank you!