This content originally appeared on HackerNoon and was authored by Love Garg
In early 2026, I'm sitting there, 8 PM, trying to get Tesseract OCR to work. JNI errors. Native library crashes. My APK is 52MB for some reason. I'm Googling "tesseract android crash armeabi" for the hundredth time.
Wednesday evening I deleted everything and tried ML Kit. Had it working by 10 PM. Felt like an idiot for not trying this sooner.
I'm not here to trash Tesseract—it's powerful if you know what you're doing. But if you just need to scan some text (receipts, menus, documents, whatever), ML Kit is so much easier it's not even funny. Yeah, you're stuck with Google Play Services. But I've shipped 4 apps with it and exactly zero users have complained about that.
Here's What We're Building
Point camera. See text. That's it.
Everyone thinks the hard part is the actual OCR. Nope. ML Kit handles that fine. The real problems:
- The camera fires at 60 fps. You process maybe 3.
- Blurry frames everywhere (or maybe I just have shaky hands?)
- CameraX lifecycle stuff that makes zero sense until you've leaked memory twice
- Device rotation (my nemesis)
Built this flow four times now. Still find new ways to screw it up.
Why ML Kit Over Tesseract?
Let me show you what I'm talking about:
| Thing | Tesseract | ML Kit |
|----|----|----|
| Setup | NDK, native libs, 3hrs of Stack Overflow | One Gradle line |
| APK Size | +30-50MB | +10MB (downloads on first run) |
| Integration | JNI wrappers, pray it works | Works with CameraX out of box |
| Crashes | Different ones per device | Pretty stable |
| Handwriting | Bad but trainable | Completely useless |
| When it breaks | You're on your own | Usually works |
Tesseract is better if you need custom training or weird fonts. For everything else? ML Kit.
Gradle Setup (The Easy Part)
dependencies {
    implementation("androidx.camera:camera-camera2:1.3.0")
    implementation("androidx.camera:camera-lifecycle:1.3.0")
    implementation("androidx.camera:camera-view:1.3.0")
    implementation("com.google.mlkit:text-recognition:16.0.0")
}
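One catch: the text-recognition model can download on first run (~10MB in my case). That happens with the Play Services variant, com.google.android.gms:play-services-mlkit-text-recognition; the com.google.mlkit:text-recognition artifact bundles the model into the APK instead. Found out about the download when users on mobile data got mad at me. Now I show a warning.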
Camera Setup (Where It Gets Real)
class OCRActivity : AppCompatActivity() {
    private lateinit var cameraExecutor: ExecutorService
    private lateinit var textRecognizer: TextRecognizer

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_ocr)
        textRecognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
        cameraExecutor = Executors.newSingleThreadExecutor()
        if (allPermissionsGranted()) {
            startCamera()
        } else {
            requestPermissions()
        }
    }

    private fun startCamera() {
        val cameraProviderFuture = ProcessCameraProvider.getInstance(this)
        cameraProviderFuture.addListener({
            val cameraProvider = cameraProviderFuture.get()
            val preview = Preview.Builder().build().also {
                it.setSurfaceProvider(viewFinder.surfaceProvider)
            }
            val imageAnalyzer = ImageAnalysis.Builder()
                .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
                .build()
                .also {
                    it.setAnalyzer(cameraExecutor, TextAnalyzer(textRecognizer) { result ->
                        runOnUiThread { updateUI(result) }
                    })
                }
            try {
                cameraProvider.unbindAll()
                cameraProvider.bindToLifecycle(
                    this,
                    CameraSelector.DEFAULT_BACK_CAMERA,
                    preview,
                    imageAnalyzer
                )
            } catch (e: Exception) {
                Log.e(TAG, "Camera binding failed", e)
            }
        }, ContextCompat.getMainExecutor(this))
    }
}
See that STRATEGY_KEEP_ONLY_LATEST? Let me explain why this matters.
The Frame Processing Problem
Here's what happens without it:
Camera: "Here's frame 1!"
You: *starts processing frame 1* (takes 200ms)
Camera: "Here's frame 2!"
Camera: "Here's frame 3!"
Camera: "Here's frame 4!"
You: *still processing frame 1*
User: *moves camera to look at something else*
You: *finally finishes frame 1, starts frame 2*
User: "Why is it showing text from 3 seconds ago?"
With STRATEGY_KEEP_ONLY_LATEST, it just throws away frames you can't keep up with. Wasteful? Sure. But it's the only way to stay in sync with what the user is actually looking at.
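To make the idea concrete, here's a tiny conceptual model of a "keep only latest" hand-off in plain Kotlin. This is a sketch of the strategy, not how CameraX implements it; LatestSlot is a made-up name, and it assumes a single producer thread.

```kotlin
import java.util.concurrent.atomic.AtomicReference

/**
 * Conceptual model of a "keep only latest" hand-off: the producer overwrites
 * whatever is still waiting, so the consumer never works through a stale backlog.
 * LatestSlot is a hypothetical name, not a CameraX class.
 */
class LatestSlot<T : Any> {
    private val slot = AtomicReference<T?>(null)

    /** Number of items that were overwritten before anyone consumed them. */
    var dropped = 0
        private set

    /** Producer side: replace any unprocessed item and count it as dropped. */
    fun offer(item: T) {
        if (slot.getAndSet(item) != null) dropped++
    }

    /** Consumer side: take the newest item, or null if nothing is waiting. */
    fun poll(): T? = slot.getAndSet(null)
}
```

If the camera offers frames 1 through 4 while the consumer is busy, the next poll() returns frame 4 and dropped is 3. A queue would have handed back frame 1 instead, which is exactly the "text from 3 seconds ago" problem.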
Processing Frames (The Important Stuff)
class TextAnalyzer(
    private val recognizer: TextRecognizer,
    private val onTextRecognized: (Text) -> Unit
) : ImageAnalysis.Analyzer {

    // Written from the ML Kit callback, read on the analyzer thread
    @Volatile
    private var lastAnalyzedTimestamp = 0L
    private val throttleMs = 300L // my magic number

    @androidx.camera.core.ExperimentalGetImage
    override fun analyze(imageProxy: ImageProxy) {
        val currentTimestamp = System.currentTimeMillis()
        // Skip if we processed a frame too recently
        if (currentTimestamp - lastAnalyzedTimestamp < throttleMs) {
            imageProxy.close()
            return
        }
        val mediaImage = imageProxy.image
        if (mediaImage != null) {
            val image = InputImage.fromMediaImage(
                mediaImage,
                imageProxy.imageInfo.rotationDegrees
            )
            recognizer.process(image)
                .addOnSuccessListener { visionText ->
                    onTextRecognized(visionText)
                    lastAnalyzedTimestamp = currentTimestamp
                }
                .addOnFailureListener { e ->
                    Log.e(TAG, "OCR failed", e)
                }
                .addOnCompleteListener {
                    imageProxy.close() // DO NOT FORGET THIS
                }
        } else {
            imageProxy.close()
        }
    }

    private companion object {
        const val TAG = "TextAnalyzer"
    }
}
About That 300ms Throttle
Tried different values:
- 100ms: Phone got hot, battery died fast
- 500ms: Felt laggy and unresponsive
- 300ms: Sweet spot
Also the processing time varies wildly:
- "STOP" sign → 50ms
- Restaurant menu → 400ms
- Full page of text → sometimes 600ms
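If you want to try different throttle values without a camera attached, the timing check can be pulled out into a small class with an injectable clock. A minimal sketch: FrameThrottle and tryAcquire are hypothetical names, and unlike the analyzer above it records the timestamp when a frame is accepted rather than when OCR succeeds.

```kotlin
/**
 * Minimum-interval gate for frame analysis. The clock is injected so the logic
 * can be unit tested with fake time. FrameThrottle is a hypothetical helper.
 */
class FrameThrottle(
    private val minIntervalMs: Long = 300L,
    private val clock: () -> Long = System::currentTimeMillis
) {
    @Volatile
    private var lastAcceptedMs: Long? = null

    /** Returns true if enough time has passed, and records the accept time. */
    fun tryAcquire(): Boolean {
        val now = clock()
        val last = lastAcceptedMs
        if (last != null && now - last < minIntervalMs) return false
        lastAcceptedMs = now
        return true
    }
}
```

Inside analyze() the usage would be roughly: if tryAcquire() returns false, close the imageProxy and return. Because the accept time is recorded up front, failed OCR attempts get throttled too.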
The imageProxy.close() Thing
Forget this and your camera freezes after 10 seconds. No error. No crash. Just frozen.
I debugged this for over an hour once. Logs showed nothing. Camera just stopped sending frames. Finally found a random Stack Overflow comment mentioning it. Close the proxy or CameraX stops working.
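Why no error? A rough mental model, assuming the camera hands out frames from a small fixed pool of image buffers: once every buffer is held by your code, there is nothing left to deliver, so frames just stop. The toy simulation below uses made-up names (FramePool, deliveredFrames) and a made-up capacity; the real buffer count is an implementation detail.

```kotlin
/**
 * Toy model of a fixed pool of image buffers. FramePool and its capacity are
 * hypothetical; this only illustrates why unreleased frames stall delivery.
 */
class FramePool(private val capacity: Int) {
    private var inUse = 0

    /** Camera side: returns false when every buffer is still held by the app. */
    fun tryAcquire(): Boolean {
        if (inUse == capacity) return false
        inUse++
        return true
    }

    /** App side: the equivalent of calling imageProxy.close(). */
    fun release() {
        if (inUse > 0) inUse--
    }
}

/** Attempts to deliver `frames` frames, optionally "forgetting" to release each one. */
fun deliveredFrames(frames: Int, capacity: Int, closeEachFrame: Boolean): Int {
    val pool = FramePool(capacity)
    var delivered = 0
    repeat(frames) {
        if (pool.tryAcquire()) {
            delivered++
            if (closeEachFrame) pool.release()
        }
    }
    return delivered
}
```

With a capacity of 4 and 100 attempted frames, closing each frame delivers all 100; never closing delivers 4 and then silently nothing, which matches the "just frozen" symptom.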
What You Actually Get
private fun updateUI(result: Text) {
    if (result.text.isBlank()) {
        textView.text = "Point at some text..."
        return
    }
    textView.text = result.text
    // Or if you need more control:
    for (block in result.textBlocks) {
        val blockText = block.text
        val bounds = block.boundingBox // useful for drawing rectangles
        for (line in block.lines) {
            val lineText = line.text
            // line.confidence is ALWAYS null btw
            for (element in line.elements) {
                // individual words
            }
        }
    }
}
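Quick rant: line.confidence always came back null for me on the free on-device model. I spent 2 hours checking my code thinking it was broken. Nope. The versions I used just didn't give confidence scores; only the paid cloud API did. Docs mention this but not anywhere obvious. Newer Text Recognition v2 releases document a confidence value on lines and elements, so check the API reference for the version you ship.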
Things That WILL Break
Memory Leaks
My first version crashed every 40 seconds. Heap dumps, profilers, everything. Finally realized:
override fun onDestroy() {
    super.onDestroy()
    textRecognizer.close() // forgot this
    cameraExecutor.shutdown()
}
ML Kit keeps models in memory. Big ones. Don't close the recognizer → activity recreates → leak the whole model. Few times and you OOM. Learned this the hard way.
Rotation Hell
I hardcoded rotation to 0 because I only tested portrait. Worked great! Friend tested landscape: "Dude, this is completely broken."
ML Kit was trying to read sideways text. Obviously terrible.
val image = InputImage.fromMediaImage(
    mediaImage,
    imageProxy.imageInfo.rotationDegrees // just use this
)
Low Light = Bad Times
Tried everything: contrast adjustment, sharpening, noise reduction. Maybe 5% improvement. Not worth the complexity.
What worked:
- "Please turn on more lights" message
- Flashlight toggle
- Higher exposure (but then motion blur)
Sometimes bad input = bad output. No code fix.
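Code can't fix a dark frame, but it can decide when to show that "turn on more lights" message. One rough way, sketched below, is to average the brightness of the frame's luma (Y) plane. Assumptions to flag: the bytes would come from something like imageProxy.planes[0].buffer on a YUV frame, that buffer may include row padding which this sketch ignores, and the threshold of 50 is a made-up starting point you'd need to tune.

```kotlin
/**
 * Mean brightness (0..255) of a luma (Y) plane. Bytes are signed in Kotlin,
 * so each one is masked to its unsigned value before summing.
 */
fun averageLuma(yPlane: ByteArray): Double {
    if (yPlane.isEmpty()) return 0.0
    var sum = 0L
    for (b in yPlane) sum += (b.toInt() and 0xFF)
    return sum.toDouble() / yPlane.size
}

/** Hypothetical cutoff: below this, show the "turn on more lights" hint. */
const val LOW_LIGHT_THRESHOLD = 50.0

/** True when the frame is dark enough that OCR is probably not worth running. */
fun isTooDark(yPlane: ByteArray): Boolean = averageLuma(yPlane) < LOW_LIGHT_THRESHOLD
```

Checking this before calling the recognizer also saves the OCR cost on frames that were never going to work.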
Device Performance
My Pixel 6: 80-120ms per frame
Random Samsung from 2019: 300-500ms
Test on the worst device you can find. Your flagship lies to you.
Real World Use Cases
Extracting Specific Stuff
ML Kit gives you raw text. Want prices? Write a regex.
private fun extractPrice(text: String): String? {
    val pattern = """[$₹€£]\s*\d+(?:[.,]\d{2})?""".toRegex()
    return pattern.find(text)?.value
}
But watch out: ML Kit adds random spaces in numbers when image quality is bad.
"$49.99" → "$ 4 9 . 9 9"
private fun cleanPrice(text: String): String {
    return text.replace("""\s+""".toRegex(), "")
}
Spent an afternoon debugging this.
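One wrinkle: the strict price regex won't match the spaced-out version in the first place, and stripping whitespace from the whole OCR result would glue unrelated words together. A possible way around both, sketched here with hypothetical names (spacedPrice, extractPriceTolerant), is to let the pattern tolerate stray whitespace between characters and only clean the matched piece.

```kotlin
// Same shape as the price regex above, but every digit and separator may be
// preceded by stray whitespace, so "$ 4 9 . 9 9" still matches as one price.
private val spacedPrice = Regex("""[₹€£$](?:\s*\d)+(?:\s*[.,](?:\s*\d){2})?""")

/** Finds the first price-like match and strips the whitespace inside it. */
fun extractPriceTolerant(text: String): String? =
    spacedPrice.find(text)?.value?.replace(Regex("""\s+"""), "")
```

The trade-off: being this tolerant can merge neighbors that really were separate, so "$5 2 apples" would come back as "$52". Whether that's acceptable depends on what your receipts look like.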
Document Scanning
For documents, I don't use live scanning. Different approach:
Take photo → crop → OCR once → let user edit
private fun processDocument(bitmap: Bitmap) {
    val image = InputImage.fromBitmap(bitmap, 0)
    recognizer.process(image)
        .addOnSuccessListener { visionText ->
            displayText(visionText.text)
        }
        .addOnFailureListener { e ->
            showError(e)
        }
}
Better because: no motion blur, proper framing, can preprocess.
Testing (The Annoying Part)
Can't really automate OCR testing well. What I do:
Keep a folder of problem images from production. Blurry receipts, tilted text, bad lighting. About 30 images now. Before shipping anything, run through all of them manually.
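class OCRTest {
    private val recognizer =
        TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

    // Call from an instrumented test, off the main thread (Tasks.await requires that)
    fun testImage(resources: Resources, resourceId: Int, expectedText: String) {
        val bitmap = BitmapFactory.decodeResource(resources, resourceId)
        val image = InputImage.fromBitmap(bitmap, 0)
        // Block until OCR finishes. An assert inside an async listener runs
        // after the test method has returned, so it can never fail the test.
        val result = Tasks.await(recognizer.process(image))
        check(result.text.contains(expectedText)) {
            "Expected \"$expectedText\" in: ${result.text}"
        }
    }
}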
Not elegant. Catches regressions though.
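An exact contains() check gets brittle when OCR flips a single character between runs. If that bites, one option is to compare with a similarity score instead. Below is a small plain-Kotlin sketch using edit distance; ocrSimilarity is a made-up helper, and whatever pass threshold you pick (0.9, say) is an assumption to tune against your own image folder.

```kotlin
/** Classic dynamic-programming edit distance, using two rolling rows. */
fun levenshtein(a: String, b: String): Int {
    var prev = IntArray(b.length + 1) { it }
    var curr = IntArray(b.length + 1)
    for (i in 1..a.length) {
        curr[0] = i
        for (j in 1..b.length) {
            val cost = if (a[i - 1] == b[j - 1]) 0 else 1
            curr[j] = minOf(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        }
        val tmp = prev
        prev = curr
        curr = tmp
    }
    return prev[b.length]
}

/** Similarity in 0.0..1.0 after trimming, collapsing whitespace, and lowercasing. */
fun ocrSimilarity(expected: String, actual: String): Double {
    fun norm(s: String) = s.trim().replace(Regex("""\s+"""), " ").lowercase()
    val e = norm(expected)
    val a = norm(actual)
    val longest = maxOf(e.length, a.length)
    return if (longest == 0) 1.0 else 1.0 - levenshtein(e, a).toDouble() / longest
}
```

A regression check then becomes "similarity stayed above the threshold" rather than "the exact string appeared", which tolerates a stray space or a misread character while still catching real breakage.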
When ML Kit Fails
Real limitations:
Handwriting: Completely useless. Built a handwritten notes scanner. Gave up after a week. Need cloud Vision API or specialized service. On-device can't even handle cursive.
Complex layouts: Tables, newspapers, multi-column stuff—text comes back in random order. Receipt scanner that needed items next to prices? The ordering logic took longer than the OCR.
Damaged documents: Coffee stains, faded text, crumpled paper—doesn't work. Tesseract with preprocessing sometimes better but then back to NDK problems.
My translation app uses ML Kit for quick scans (fast, offline) but has "high quality mode" using cloud Vision API. Slower, costs money, way more accurate. Sometimes you compromise.
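For the layout-ordering problem above (items next to prices), the usual starting point is to group lines into visual rows by their bounding boxes and then sort each row left to right. A minimal sketch, not the logic from any of the apps mentioned: OcrLine is a hypothetical stand-in you'd fill from each line's boundingBox (which can be null), and it assumes the receipt is roughly upright.

```kotlin
/** Minimal stand-in for an OCR line: text plus bounding-box edges in pixels. */
data class OcrLine(val text: String, val left: Int, val top: Int, val bottom: Int) {
    val centerY: Int get() = (top + bottom) / 2
}

/**
 * Groups lines into visual rows: a line joins the current row when its vertical
 * center falls inside the row's first line. Each row is then sorted left to right.
 */
fun groupIntoRows(lines: List<OcrLine>): List<List<OcrLine>> {
    val rows = mutableListOf<MutableList<OcrLine>>()
    for (line in lines.sortedBy { it.centerY }) {
        val anchor = rows.lastOrNull()?.first()
        if (anchor != null && line.centerY in anchor.top..anchor.bottom) {
            rows.last().add(line)
        } else {
            rows.add(mutableListOf(line))
        }
    }
    return rows.map { row -> row.sortedBy { it.left } }
}
```

Tilted or crumpled receipts break the "same row" test quickly, which is a big part of why this kind of ordering logic ends up taking longer than the OCR itself.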
Wrap Up
Things I wish I knew before starting:
- 300ms throttle works
- Close your resources
- Test on crappy devices
- Validate everything, OCR lies
- Handwriting is impossible
If you need perfect accuracy (medical, financial stuff), use the cloud API or add a human review. For normal apps, on-device is fine.
This code is simplified. Real production needs architecture, error handling, all that. But it should save you a week of debugging.
Try ML Kit before Tesseract. Seriously.
Love Garg | Sciencx (2026-05-29T12:18:52+00:00) Building Real-Time OCR on Android With ML Ki. Retrieved from https://www.scien.cx/2026/05/29/building-real-time-ocr-on-android-with-ml-ki/