Building Real-Time OCR on Android With ML Kit

This practical Android engineering guide walks through building a real-time OCR application using ML Kit and CameraX after abandoning a more complex Tesseract-based approach. Beyond text recognition itself, the article focuses on production challenges such as frame throttling, camera lifecycle management, memory leaks, device rotation, low-light performance, OCR validation, and resource cleanup, offering lessons learned from shipping multiple OCR-enabled Android apps.


This content originally appeared on HackerNoon and was authored by Love Garg

In early 2026, I'm sitting there, 8 PM, trying to get Tesseract OCR to work. JNI errors. Native library crashes. My APK is 52MB for some reason. I'm Googling "tesseract android crash armeabi" for the hundredth time.

Wednesday evening I deleted everything and tried ML Kit. Had it working by 10 PM. Felt like an idiot for not trying it sooner.

I'm not here to trash Tesseract—it's powerful if you know what you're doing. But if you just need to scan some text (receipts, menus, documents, whatever), ML Kit is so much easier it's not even funny. Yeah, you're stuck with Google Play Services. But I've shipped 4 apps with it and exactly zero users have complained about that.

Here's What We're Building

Point camera. See text. That's it.

Everyone thinks the hard part is the actual OCR. Nope. ML Kit handles that fine. The real problems:

  • The camera fires at 60 fps. You process maybe 3 per second.
  • Blurry frames everywhere (or maybe I just have shaky hands?)
  • CameraX lifecycle stuff that makes zero sense until you've leaked memory twice
  • Device rotation (my nemesis)

Built this flow four times now. Still find new ways to screw it up.

Why ML Kit Over Tesseract?

Let me show you what I'm talking about:

| Thing | Tesseract | ML Kit |
|----|----|----|
| Setup | NDK, native libs, 3hrs of Stack Overflow | One Gradle line |
| APK Size | +30-50MB | +10MB (downloads on first run) |
| Integration | JNI wrappers, pray it works | Works with CameraX out of the box |
| Crashes | Different ones per device | Pretty stable |
| Handwriting | Bad but trainable | Completely useless |
| When it breaks | You're on your own | Usually works |

Tesseract is better if you need custom training or weird fonts. For everything else? ML Kit.

Gradle Setup (The Easy Part)

dependencies {
    implementation("androidx.camera:camera-camera2:1.3.0")
    implementation("androidx.camera:camera-lifecycle:1.3.0")
    implementation("androidx.camera:camera-view:1.3.0")

    implementation("com.google.mlkit:text-recognition:16.0.0")
}

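The text-recognition model is ~10MB. With the Google Play Services variant (com.google.android.gms:play-services-mlkit-text-recognition), it downloads on first run. I found that out when users on mobile data got mad at me, and now I show a warning. The bundled com.google.mlkit:text-recognition artifact above works the other way: the model ships inside your APK, so there's no first-run download, but the APK grows instead.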

Camera Setup (Where It Gets Real)

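import android.os.Bundle
import android.util.Log
import androidx.appcompat.app.AppCompatActivity
import androidx.camera.core.CameraSelector
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.Preview
import androidx.camera.lifecycle.ProcessCameraProvider
import androidx.camera.view.PreviewView
import androidx.core.content.ContextCompat
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.TextRecognizer
import com.google.mlkit.vision.text.latin.TextRecognizerOptions
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

class OCRActivity : AppCompatActivity() {
    private lateinit var cameraExecutor: ExecutorService
    private lateinit var textRecognizer: TextRecognizer
    private lateinit var viewFinder: PreviewView

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_ocr)
        viewFinder = findViewById(R.id.viewFinder)  // PreviewView in activity_ocr.xml (id assumed)

        textRecognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
        // One background thread for frame analysis, so OCR never blocks the UI thread
        cameraExecutor = Executors.newSingleThreadExecutor()

        // allPermissionsGranted()/requestPermissions(): the usual CAMERA permission helpers (not shown)
        if (allPermissionsGranted()) {
            startCamera()
        } else {
            requestPermissions()
        }
    }

    private fun startCamera() {
        val cameraProviderFuture = ProcessCameraProvider.getInstance(this)

        cameraProviderFuture.addListener({
            val cameraProvider = cameraProviderFuture.get()

            val preview = Preview.Builder().build().also {
                it.setSurfaceProvider(viewFinder.surfaceProvider)
            }

            val imageAnalyzer = ImageAnalysis.Builder()
                // Drop stale frames instead of queueing them (explained below)
                .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
                .build()
                .also {
                    it.setAnalyzer(cameraExecutor, TextAnalyzer(textRecognizer) { result ->
                        runOnUiThread { updateUI(result) }  // updateUI() is shown later
                    })
                }

            try {
                // Unbind anything left from a previous startCamera() call before rebinding
                cameraProvider.unbindAll()
                cameraProvider.bindToLifecycle(
                    this,
                    CameraSelector.DEFAULT_BACK_CAMERA,
                    preview,
                    imageAnalyzer
                )
            } catch (e: Exception) {
                Log.e(TAG, "Camera binding failed", e)
            }
        }, ContextCompat.getMainExecutor(this))
    }

    companion object {
        private const val TAG = "OCRActivity"
    }
}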

See that STRATEGY_KEEP_ONLY_LATEST? Let me explain why this matters.

The Frame Processing Problem

Here's what happens without it:

Camera: "Here's frame 1!"
You: *starts processing frame 1* (takes 200ms)
Camera: "Here's frame 2!"
Camera: "Here's frame 3!"
Camera: "Here's frame 4!"
You: *still processing frame 1*
User: *moves camera to look at something else*
You: *finally finishes frame 1, starts frame 2*
User: "Why is it showing text from 3 seconds ago?"

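With STRATEGY_KEEP_ONLY_LATEST, CameraX just throws away the frames you can't keep up with and always hands you the newest one. Wasteful? Sure. But it's the only way to stay in sync with what the user is actually looking at. It's actually CameraX's default, but spelling it out makes the intent obvious. The pile-up above is what you get with STRATEGY_BLOCK_PRODUCER and a deep image queue.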

Processing Frames (The Important Stuff)

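import android.os.SystemClock
import android.util.Log
import androidx.camera.core.ExperimentalGetImage
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.Text
import com.google.mlkit.vision.text.TextRecognizer

class TextAnalyzer(
    private val recognizer: TextRecognizer,
    private val onTextRecognized: (Text) -> Unit
) : ImageAnalysis.Analyzer {

    // Only read and written on the analyzer thread, so no synchronization needed
    private var lastAnalyzedTimestamp = 0L
    private val throttleMs = 300L  // my magic number

    @ExperimentalGetImage
    override fun analyze(imageProxy: ImageProxy) {
        // Monotonic clock: unaffected by wall-clock changes
        val currentTimestamp = SystemClock.elapsedRealtime()

        // Skip if we processed a frame too recently
        if (currentTimestamp - lastAnalyzedTimestamp < throttleMs) {
            imageProxy.close()
            return
        }

        val mediaImage = imageProxy.image
        if (mediaImage == null) {
            imageProxy.close()
            return
        }
        // Start the throttle window when a frame is accepted, not when OCR succeeds
        // (the success listener runs on the main thread, not this one)
        lastAnalyzedTimestamp = currentTimestamp

        val image = InputImage.fromMediaImage(mediaImage, imageProxy.imageInfo.rotationDegrees)

        recognizer.process(image)
            .addOnSuccessListener { visionText -> onTextRecognized(visionText) }
            .addOnFailureListener { e -> Log.e(TAG, "OCR failed", e) }
            .addOnCompleteListener {
                imageProxy.close()  // DO NOT FORGET THIS
            }
    }

    companion object {
        private const val TAG = "TextAnalyzer"
    }
}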

About That 300ms Throttle

Tried different values:

  • 100ms: Phone got hot, battery died fast
  • 500ms: Felt laggy and unresponsive
  • 300ms: Sweet spot

Also the processing time varies wildly:

  • "STOP" sign → 50ms
  • Restaurant menu → 400ms
  • Full page of text → sometimes 600ms
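The throttle itself is just a timestamp comparison. Below it is pulled out into a small class so it can be unit-tested off-device. This is a sketch: FrameThrottle, its injectable clock, and the acceptedFrames() simulation are hypothetical names, not CameraX or ML Kit API. The simulation also assumes a 30 fps camera (a frame every ~33ms).

```kotlin
// Hypothetical helper: lets a frame through only if at least minGapMs
// have passed since the last frame it let through.
class FrameThrottle(
    private val minGapMs: Long = 300L,
    // Injectable clock so tests don't depend on real time.
    // On Android, SystemClock.elapsedRealtime() is a good monotonic source.
    private val nowMs: () -> Long = { System.nanoTime() / 1_000_000 }
) {
    private var lastAcceptedMs: Long? = null

    /** Returns true (and records the time) if this frame should be analyzed. */
    fun tryAcquire(): Boolean {
        val now = nowMs()
        val last = lastAcceptedMs
        if (last != null && now - last < minGapMs) return false
        lastAcceptedMs = now
        return true
    }
}

// Usage: feed ~1 second of frames through the throttle with a fake clock
// and count how many would reach the OCR step.
fun acceptedFrames(frameCount: Int = 30, frameIntervalMs: Long = 33L, minGapMs: Long = 300L): Int {
    var fakeNow = 0L
    val throttle = FrameThrottle(minGapMs) { fakeNow }
    var accepted = 0
    repeat(frameCount) { frame ->
        fakeNow = frame * frameIntervalMs
        if (throttle.tryAcquire()) accepted++
    }
    return accepted
}
```

With a 300ms gap, acceptedFrames() lets 3 of the 30 frames through, which matches the "you process maybe 3" feeling. Dropping the gap to 100ms lets 8 through. In an analyzer, you'd call tryAcquire() at the top of analyze() and close the ImageProxy immediately when it returns false.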

The imageProxy.close() Thing

Forget this and your camera freezes after 10 seconds. No error. No crash. Just frozen.

I debugged this for over an hour once. Logs showed nothing. Camera just stopped sending frames. Finally found a random Stack Overflow comment explaining it: CameraX won't deliver the next frame to your analyzer until the current ImageProxy is closed. Close the proxy or CameraX stops working.

What You Actually Get

private fun updateUI(result: Text) {
    if (result.text.isBlank()) {
        textView.text = "Point at some text..."
        return
    }

    textView.text = result.text

    // Or if you need more control:
    for (block in result.textBlocks) {
        val blockText = block.text
        val bounds = block.boundingBox  // useful for drawing rectangles

        for (line in block.lines) {
            val lineText = line.text
            // line.confidence is ALWAYS null btw

            for (element in line.elements) {
                // individual words
            }
        }
    }
}

Quick rant: line.confidence is always null on the free on-device model. I spent 2 hours checking my code thinking it was broken. Nope. Just doesn't give confidence scores. Only the paid cloud API does that. Docs mention this but not anywhere obvious.
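Without a confidence score to filter on, live scanning can use a cheap substitute: accept text only once the same string comes back on several consecutive analyzed frames. Flicker and half-read results rarely repeat exactly. This is a minimal sketch. StableTextFilter and the default of 3 repeats are hypothetical choices, not anything ML Kit provides.

```kotlin
// Hypothetical helper: accept OCR text only after the same (whitespace-normalized)
// string has been seen on `requiredRepeats` consecutive analyzed frames.
class StableTextFilter(private val requiredRepeats: Int = 3) {
    private var candidate: String? = null
    private var repeats = 0

    /** Feed each frame's text; returns it once it is stable, otherwise null. */
    fun offer(rawText: String): String? {
        // Collapse runs of whitespace so spacing jitter doesn't reset the count
        val text = rawText.trim().replace(Regex("\\s+"), " ")
        if (text.isEmpty()) {
            candidate = null
            repeats = 0
            return null
        }
        if (text == candidate) {
            repeats++
        } else {
            candidate = text
            repeats = 1
        }
        return if (repeats >= requiredRepeats) text else null
    }
}
```

You'd call offer(result.text) from the analyzer callback and update the UI only when it returns non-null. With the 300ms throttle, three repeats means at least ~600ms before text is accepted, so this trades some responsiveness for stability.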

Things That WILL Break

Memory Leaks

My first version crashed every 40 seconds. Heap dumps, profilers, everything. Finally realized:

override fun onDestroy() {
    super.onDestroy()
    textRecognizer.close()  // forgot this
    cameraExecutor.shutdown()
}

ML Kit keeps models in memory. Big ones. Don't close the recognizer → activity recreates → leak the whole model. Do that a few times and you OOM. Learned this the hard way.

Rotation Hell

I hardcoded rotation to 0 because I only tested portrait. Worked great! Friend tested landscape: "Dude, this is completely broken."

ML Kit was trying to read sideways text. Obviously terrible.

val image = InputImage.fromMediaImage(
    mediaImage,
    imageProxy.imageInfo.rotationDegrees  // just use this
)

Low Light = Bad Times

Tried everything: contrast adjustment, sharpening, noise reduction. Maybe 5% improvement. Not worth the complexity.

What worked:

  • "Please turn on more lights" message
  • Flashlight toggle
  • Higher exposure (but then motion blur)

Sometimes bad input = bad output. No code fix.
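There's no code fix for the image itself, but you can automate the decision to show the "turn on more lights" message by checking average brightness. This sketch rests on three assumptions. First, the analysis frames use CameraX's default YUV_420_888 format, where plane 0 is luminance (Y) with a pixel stride of 1. Second, you've copied imageProxy.planes[0].buffer into a ByteArray. Third, the threshold of 50 is a made-up starting point, not a tested value. averageLuma and isTooDark are hypothetical names.

```kotlin
// Assumed threshold on a 0..255 brightness scale; tune it on real devices.
const val LOW_LIGHT_LUMA_THRESHOLD = 50.0

// Hypothetical helper: average brightness of a Y (luma) plane.
// Y bytes are unsigned (0 = black, 255 = white) but Kotlin's Byte is signed,
// so each value is masked with 0xFF. rowStride may exceed width (row padding),
// so only the first `width` bytes of each row are read. Samples every
// `sampleEvery`-th pixel in both directions to keep it cheap per frame.
fun averageLuma(y: ByteArray, width: Int, height: Int, rowStride: Int, sampleEvery: Int = 4): Double {
    require(width > 0 && height > 0 && rowStride >= width && sampleEvery > 0)
    var sum = 0L
    var count = 0
    for (row in 0 until height step sampleEvery) {
        val rowStart = row * rowStride
        for (col in 0 until width step sampleEvery) {
            sum += y[rowStart + col].toInt() and 0xFF
            count++
        }
    }
    return sum.toDouble() / count
}

fun isTooDark(y: ByteArray, width: Int, height: Int, rowStride: Int): Boolean =
    averageLuma(y, width, height, rowStride) < LOW_LIGHT_LUMA_THRESHOLD
```

Take width and height from the ImageProxy and rowStride from planes[0].rowStride. Requiring a few dark frames in a row before showing the message avoids flashing it whenever the camera briefly points at something dark. The flashlight toggle can call enableTorch(true) on the cameraControl of the Camera object that bindToLifecycle() returns.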

Device Performance

  • My Pixel 6: 80-120ms per frame
  • Random Samsung from 2019: 300-500ms per frame

Test on the worst device you can find. Your flagship lies to you.
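To collect numbers like these on your own devices, one option is to time each recognizer.process() call and summarize the results. Note SystemClock.elapsedRealtime() in analyze(), then read it again in the completion listener. Below is a sketch of the summary side. LatencyStats is a hypothetical helper that uses nearest-rank percentiles.

```kotlin
import kotlin.math.ceil

// Hypothetical helper: collects per-frame OCR durations and reports percentiles.
// Not synchronized: call record() from one thread only.
class LatencyStats {
    private val samplesMs = mutableListOf<Long>()

    fun record(durationMs: Long) {
        samplesMs += durationMs
    }

    /** Nearest-rank percentile for p in (0, 100]; null if nothing was recorded. */
    fun percentile(p: Double): Long? {
        require(p > 0.0 && p <= 100.0)
        if (samplesMs.isEmpty()) return null
        val sorted = samplesMs.sorted()
        val rank = ceil(p / 100.0 * sorted.size).toInt()  // 1-based rank
        return sorted[rank - 1]
    }

    fun summary(): String =
        "n=${samplesMs.size} p50=${percentile(50.0)}ms p95=${percentile(95.0)}ms max=${samplesMs.maxOrNull()}ms"
}
```

Log summary() every few hundred frames on each test device. Percentiles show the occasional slow frame that an average hides. Task completion listeners run on the main thread by default, so recording from there keeps all calls on a single thread.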

Real World Use Cases

Extracting Specific Stuff

ML Kit gives you raw text. Want prices? Write a regex.

private fun extractPrice(text: String): String? {
    val pattern = """[$₹€£]\s*\d+(?:[.,]\d{2})?""".toRegex()
    return pattern.find(text)?.value
}

But watch out: ML Kit adds random spaces in numbers when image quality is bad.

"$49.99" → "$ 4 9 . 9 9"

private fun cleanPrice(text: String): String {
    return text.replace("""\s+""".toRegex(), "")
}

Spent an afternoon debugging this.
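The two helpers need to run in that order. On the raw text, extractPrice stops at "$ 4", because the \s* only covers the gap right after the currency symbol. Below is a sketch of the combined step. findPrice is a hypothetical name, and the pattern is the one from extractPrice minus the \s*, since the whitespace is already gone by then.

```kotlin
// extractPrice's pattern without \s*, written as a regular string
// (hence the escaped \$ and the doubled backslashes).
private val PRICE_PATTERN = Regex("[\$₹€£]\\d+(?:[.,]\\d{2})?")

/** Hypothetical helper: strip OCR-inserted spaces, then return the first price, if any. */
fun findPrice(ocrText: String): String? {
    val compact = ocrText.replace(Regex("\\s+"), "")
    return PRICE_PATTERN.find(compact)?.value
}
```

With this, "Total $ 4 9 . 9 9" comes back as "$49.99". Two caveats apply. First, stripping every space also glues neighboring tokens together. Second, the pattern truncates thousands separators: "$1,299.99" comes back as "$1,29". Extend the pattern if your receipts have four-digit amounts.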

Document Scanning

For documents, I don't use live scanning. Different approach:

Take photo → crop → OCR once → let user edit

private fun processDocument(bitmap: Bitmap) {
    val image = InputImage.fromBitmap(bitmap, 0)

    recognizer.process(image)
        .addOnSuccessListener { visionText ->
            displayText(visionText.text)
        }
        .addOnFailureListener { e ->
            showError(e)
        }
}

Better because: no motion blur, proper framing, can preprocess.

Testing (The Annoying Part)

Can't really automate OCR testing well. What I do:

Keep a folder of problem images from production. Blurry receipts, tilted text, bad lighting. About 30 images now. Before shipping anything, run through all of them manually.

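import android.graphics.BitmapFactory
import androidx.test.platform.app.InstrumentationRegistry
import com.google.android.gms.tasks.Tasks
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions
import java.util.concurrent.TimeUnit
import org.junit.Assert.assertTrue

// Instrumented test helper; problem images assumed to live in src/androidTest/res
class OCRTest {
    private val resources = InstrumentationRegistry.getInstrumentation().context.resources
    private val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

    fun testImage(resourceId: Int, expectedText: String) {
        val bitmap = BitmapFactory.decodeResource(resources, resourceId)
        val image = InputImage.fromBitmap(bitmap, 0)
        // Block (off the main thread) until OCR finishes, so the assertion runs before the test returns
        val result = Tasks.await(recognizer.process(image), 10, TimeUnit.SECONDS)
        assertTrue("Expected \"$expectedText\" in: ${result.text}", result.text.contains(expectedText))
    }
}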

Not elegant. Catches regressions though.
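A contains() check can be too strict, because OCR drifting by a single character fails the whole check. A more forgiving comparison normalizes case and whitespace, then allows a small edit distance. This sketch assumes you store the full expected text for each problem image. ocrMatches and its 10% error budget are hypothetical.

```kotlin
/** Levenshtein edit distance (insertions, deletions, substitutions), O(n*m) time, O(m) memory. */
fun editDistance(a: String, b: String): Int {
    var prev = IntArray(b.length + 1) { it }
    var curr = IntArray(b.length + 1)
    for (i in 1..a.length) {
        curr[0] = i
        for (j in 1..b.length) {
            val cost = if (a[i - 1] == b[j - 1]) 0 else 1
            curr[j] = minOf(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        }
        val tmp = prev; prev = curr; curr = tmp  // reuse the two rows
    }
    return prev[b.length]
}

private fun normalize(s: String) = s.lowercase().replace(Regex("\\s+"), " ").trim()

/** Hypothetical check: true if the OCR output is within maxErrorRate of the expected text. */
fun ocrMatches(actual: String, expected: String, maxErrorRate: Double = 0.1): Boolean {
    val a = normalize(actual)
    val e = normalize(expected)
    if (e.isEmpty()) return a.isEmpty()
    return editDistance(a, e) <= (e.length * maxErrorRate).toInt()
}
```

Keep the budget low. At 10%, a 12-character line like "Total $49.99" tolerates exactly one wrong character.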

When ML Kit Fails

Real limitations:

Handwriting: Completely useless. Built a handwritten notes scanner. Gave up after a week. Need cloud Vision API or specialized service. On-device can't even handle cursive.

Complex layouts: Tables, newspapers, multi-column stuff—text comes back in random order. Receipt scanner that needed items next to prices? The ordering logic took longer than the OCR.
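For the receipt case, the ordering logic boils down to two steps: group lines whose vertical centers are close into one row, then read each row left to right. This is a minimal sketch of that idea. OcrLine is a hypothetical stand-in for an ML Kit Text.Line plus its boundingBox, so the code runs off-device. The half-line-height tolerance is an assumption.

```kotlin
import kotlin.math.abs

// Hypothetical stand-in for a Text.Line and its boundingBox
// (pixel coordinates, y grows downward).
data class OcrLine(val text: String, val left: Int, val top: Int, val right: Int, val bottom: Int) {
    val centerY: Int get() = (top + bottom) / 2
    val height: Int get() = bottom - top
}

/**
 * Groups lines into visual rows (top to bottom), each row sorted left to right.
 * A line joins the current row when its vertical center is within half the
 * height of that row's first line (assumed tolerance; tune on real receipts).
 */
fun groupIntoRows(lines: List<OcrLine>): List<List<OcrLine>> {
    val rows = mutableListOf<MutableList<OcrLine>>()
    for (line in lines.sortedBy { it.centerY }) {
        val current = rows.lastOrNull()
        if (current != null && abs(line.centerY - current.first().centerY) <= current.first().height / 2) {
            current.add(line)
        } else {
            rows.add(mutableListOf(line))
        }
    }
    return rows.map { row -> row.sortedBy { it.left } }
}
```

On-device, you'd build OcrLine values from result.textBlocks.flatMap { it.lines }, skipping any line whose boundingBox is null. Then join each row with joinToString(" ") { it.text } to get "Coffee 4.50"-style item/price pairs. This assumes the receipt is roughly level; a tilted photo smears neighboring rows together.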

Damaged documents: Coffee stains, faded text, crumpled paper. Doesn't work. Tesseract with preprocessing is sometimes better, but then you're back to NDK problems.

My translation app uses ML Kit for quick scans (fast, offline) but has "high quality mode" using cloud Vision API. Slower, costs money, way more accurate. Sometimes you compromise.

Wrap Up

Things I wish I knew before starting:

  • 300ms throttle works

  • Close your resources

  • Test on crappy devices

  • Validate everything, OCR lies

  • Handwriting is impossible


If you need perfect accuracy (medical, financial stuff), use the cloud API or add a human review. For normal apps, on-device is fine.

This code is simplified. Real production needs architecture, error handling, all that. But it should save you a week of debugging.

Try ML Kit before Tesseract. Seriously.

Love Garg | Sciencx (2026-05-29T12:18:52+00:00) Building Real-Time OCR on Android With ML Kit. Retrieved from https://www.scien.cx/2026/05/29/building-real-time-ocr-on-android-with-ml-ki/
