CameraX + MLKit to create an ultra-simple OCR solution

2025-07-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)11/24 Report--

Optical character recognition (OCR) is commonly used in scenarios such as ID card scanning. Many technical solutions exist; this article describes an implementation based on CameraX + MLKit. CameraX drives the camera viewfinder and preview, and MLKit recognizes the text in the captured images.

1. Implementing the camera preview with CameraX

1.1 Introduction to CameraX

Android introduced a newer camera framework, Camera2, which is more capable than the earlier Camera1 API (for example in its support for multiple cameras) but is also more costly to use. Against this background Google released CameraX, a wrapper built on top of Camera2 that makes the API much easier to use. With very little code we can build camera applications for specific scenarios, and OCR is a typical one.

CameraX introduces the concept of a UseCase to deliver its camera capabilities. UseCases decouple the functional modules so that each one can focus on a specific area. CameraX ships several commonly used UseCase implementations that cover most scenarios:

Preview: provides the camera viewfinder and preview

ImageCapture: takes pictures and saves them

ImageAnalysis: processes preview frames

The OCR scenario in this article uses the Preview and ImageAnalysis UseCases. Preview displays the camera viewfinder, and ImageAnalysis hands the captured frames over for OCR analysis.

Next, let's build the camera preview with CameraX step by step.

1.2 Adding CameraX to the project

First, add the CameraX libraries in Gradle:

    implementation "androidx.camera:camera-lifecycle:1.2.0"
    implementation "androidx.camera:camera-view:1.2.0"
    implementation "androidx.camera:camera-camera2:1.2.0"

In addition, because the app uses the camera, declare the camera permission in AndroidManifest:

    <uses-permission android:name="android.permission.CAMERA" />

1.3 Getting a ProcessCameraProvider instance

CameraX accesses the camera through ProcessCameraProvider. As the name implies, it represents the camera service available within an application process, so ProcessCameraProvider is a per-process singleton that is obtained through getInstance. Creation is asynchronous, so the instance is returned through a future, cameraProviderFuture:

    // cameraProviderFuture asynchronously returns the ProcessCameraProvider instance
    val cameraProviderFuture = ProcessCameraProvider.getInstance(context)

    // Listen for the ProcessCameraProvider to become available
    cameraProviderFuture.addListener(
        Runnable {
            // Get the cameraProvider
            val cameraProvider = cameraProviderFuture.get()
            ...
        },
        ContextCompat.getMainExecutor(context) // Executor that runs the Runnable
    )

Once the ProcessCameraProvider singleton has been obtained in the Runnable, it can be used to assemble UseCases and implement the camera features.

An important feature of CameraX is that it is lifecycle-aware. The camera is opened and closed automatically as the application moves between foreground and background, which reduces the mental burden on developers. ProcessCameraProvider associates each UseCase with a LifecycleOwner when the UseCase is bound.

A UseCase has onStateAttached / onStateDetached called in step with the Lifecycle, so when we write a custom UseCase we can do custom pre- and post-processing there.

1.4 Adding the Preview UseCase

    // Select the rear lens
    val cameraSelector = CameraSelector.Builder()
        .requireLensFacing(CameraSelector.LENS_FACING_BACK)
        .build()

    // Add the Preview UseCase
    cameraProvider.bindToLifecycle(lifecycleOwner, cameraSelector, preview)

As shown above, ProcessCameraProvider#bindToLifecycle adds the Preview.

Creating the Preview UseCase is very simple:

    val preview = Preview.Builder().build().apply {
        setSurfaceProvider(previewView.surfaceProvider)
    }

The key to creating a Preview is setting the Surface used for rendering, which is obtained from the PreviewView.

PreviewView is a custom View provided by CameraX for displaying the camera preview stream. Internally it can switch between TextureView and SurfaceView as needed.

SurfaceView performs better, but before Android 7.0 it cannot support general View capabilities such as rotation, transparency, and animation, so TextureView has to be used instead. PreviewView uses the performance-first SurfaceView by default. If better compatibility is needed, set previewView.implementationMode = PreviewView.ImplementationMode.COMPATIBLE.

1.5 Laying out the PreviewView

PreviewView can be declared in an XML layout like any other View, using the androidx.camera.view.PreviewView element.

If we use Compose to render the UI, the PreviewView can be displayed through AndroidView. The Compose code that shows the camera preview is roughly as follows:

    @Composable
    fun CameraScreen() {
        // Composition locals must be read in composable scope, not inside the listener
        val context = LocalContext.current
        val lifecycleOwner = LocalLifecycleOwner.current

        // Get the ProcessCameraProvider
        val cameraProviderFuture = remember { ProcessCameraProvider.getInstance(context) }

        // Display the preview
        AndroidView(
            modifier = Modifier.fillMaxSize(),
            factory = { ctx ->
                PreviewView(ctx).apply {
                    cameraProviderFuture.addListener({
                        val cameraProvider = cameraProviderFuture.get()
                        val preview = // omitted
                        val cameraSelector = // omitted
                        cameraProvider.unbindAll()
                        cameraProvider.bindToLifecycle(lifecycleOwner, cameraSelector, preview)
                    }, ContextCompat.getMainExecutor(ctx))
                }
            }
        )
    }

2. Implementing text recognition with MLKit

2.1 Introduction to MLKit

MLKit is Google's machine learning library for mobile developers. It helps mobile applications use a variety of on-device intelligence offline, such as:

Vision: QR code scanning, text recognition, face detection, object detection, etc.

Natural language processing: language identification, smart reply, translation, etc.

These on-device technologies make applications smarter while still maintaining high performance. More importantly, they are free and do not depend on GMS (Google Mobile Services).

2.2 Adding MLKit

This article mainly uses MLKit's text recognition feature. Only the following dependency is needed:

    implementation 'com.google.mlkit:text-recognition-chinese:16.0.0-6'

text-recognition-chinese can recognize Chinese characters, while other artifacts recognize other non-Latin scripts such as Japanese and Korean.

2.3 Image analysis with CameraX

Earlier we implemented the camera preview through Preview. Next we add ImageAnalysis to the CameraProvider so that it receives camera preview frames for image analysis and processing.

    val imageAnalysis = ImageAnalysis.Builder()
        .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
        .build()
        .apply {
            // Set the image analyzer
            setAnalyzer(
                Executors.newSingleThreadExecutor(),
                OcrAnalyzer { result: String ->
                    // OCR is processed with MLKit and the result is returned here
                }
            )
        }

    // Add the ImageAnalysis capability and associate it with the Lifecycle
    cameraProvider.bindToLifecycle(
        LocalLifecycleOwner.current, cameraSelector, preview, imageAnalysis
    )

setBackpressureStrategy sets the buffering policy between the production and consumption of preview frames. Its default value, ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST, means that frames arriving while the current frame is still being analyzed are dropped automatically, with only the most recent one kept, which avoids queuing.
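To make the keep-only-latest policy concrete, here is a minimal conceptual sketch in plain Kotlin. It is not CameraX code and does not reflect how CameraX is implemented internally: LatestOnlyBuffer, offer, poll, and dropped are hypothetical names introduced only for illustration. The idea it models is that a newly produced frame overwrites a frame that is still waiting, so the consumer only ever sees the newest one.

```kotlin
// Conceptual model of a keep-only-latest buffer. This is NOT CameraX code:
// LatestOnlyBuffer is a hypothetical class used only to illustrate the policy.
class LatestOnlyBuffer<T : Any> {
    private var pending: T? = null

    // Number of frames that were overwritten before anyone consumed them.
    var dropped: Int = 0
        private set

    // Producer side: a new frame replaces any frame that is still waiting.
    @Synchronized
    fun offer(frame: T) {
        if (pending != null) dropped++
        pending = frame
    }

    // Consumer side: take the newest waiting frame, or null if none is waiting.
    @Synchronized
    fun poll(): T? {
        val frame = pending
        pending = null
        return frame
    }
}
```

If frames 1, 2, and 3 are offered while the consumer is busy, the next poll() returns 3 and dropped is 2. A slow analyzer therefore lowers the analysis frame rate instead of building up a backlog of stale frames.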

ImageAnalysis#setAnalyzer adds a custom image analyzer. Here we define an OcrAnalyzer that implements the OCR function based on MLKit.

2.4 Custom OcrAnalyzer

    class OcrAnalyzer(
        private val onRecognized: (result: String) -> Unit
    ) : ImageAnalysis.Analyzer {

        // Get a text recognizer that can recognize Chinese
        private val recognition =
            TextRecognition.getClient(ChineseTextRecognizerOptions.Builder().build())

        // Process each frame
        override fun analyze(imageProxy: ImageProxy) {
            val image = imageProxy.image
            if (image != null) {
                val imageRotation = imageProxy.imageInfo.rotationDegrees
                val inputImage = InputImage.fromMediaImage(image, imageRotation)
                recognition.process(inputImage)
                    .addOnSuccessListener { recognizedText ->
                        val textBlocks = recognizedText.textBlocks
                        // Parse textBlocks to get the required information and return it
                        extractText(textBlocks)?.let { onRecognized(it) }
                        imageProxy.close()
                    }
                    .addOnFailureListener {
                        // Close the frame on failure too, so the next one can be delivered
                        imageProxy.close()
                    }
            } else {
                // No image to analyze: still release the frame
                imageProxy.close()
            }
        }
    }

The ImageProxy passed to ImageAnalysis.Analyzer contains the preview frame information:

ImageProxy.image: the image data

ImageInfo.rotationDegrees: the rotation of the image, derived from the device orientation

InputImage.fromMediaImage builds the InputImage from these two parameters, and the InputImage is then submitted to recognition for processing. Here recognition is a TextRecognizer that can recognize Chinese.

2.5 Parsing TextBlocks

After text recognition, TextRecognition returns a Block / Line / Element data structure, which allows further fine-grained parsing.

A Block represents a natural paragraph and consists of several Lines, and each Line contains multiple Elements (words).
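The nesting can be pictured with a small plain-Kotlin model. The Block, Line, and Element data classes below are hypothetical stand-ins for MLKit's result types, reduced to the fields needed to show the hierarchy, and allLineTexts is an illustrative helper rather than an MLKit function. Joining elements without a separator is an assumption that suits Chinese text; Latin text would normally be joined with spaces.

```kotlin
// Hypothetical stand-ins for MLKit's Block / Line / Element result types,
// reduced to the fields needed to show the nesting.
data class Element(val text: String)

data class Line(val elements: List<Element>) {
    // Join the elements with no separator (an assumption that suits Chinese text).
    val text: String
        get() = elements.joinToString("") { it.text }
}

data class Block(val lines: List<Line>)

// Flatten every Block into a single list of line strings,
// ignoring how the recognizer grouped lines into paragraphs.
fun allLineTexts(blocks: List<Block>): List<String> =
    blocks.flatMap { it.lines }.map { it.text }
```

Flattening like this is useful when the grouping into Blocks is unpredictable but each piece of information is known to sit on its own Line.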

Suppose we want to extract the name and the ID number from an ID card. Although we cannot be sure what Blocks a layout like an ID card's will be recognized as, the name and the ID number are certainly on different Lines. We define the extractText method to gather all the Lines under the Blocks and parse them uniformly:

    private fun extractText(textBlocks: List<Text.TextBlock>): String {
        val lines = textBlocks.flatMap { it.lines }
        var name = "unknown"
        var id = "unknown"
        lines.forEach { line ->
            // Join without a separator; the default ", " would break label matching
            val lineText = line.elements.joinToString("") { it.text }
            // "姓名" is the "Name" label printed on the card
            if (lineText.contains("姓名")) {
                name = lineText.substringAfter("姓名")
            }
            // "公民身份号码" is the "Citizen ID number" label printed on the card
            if (lineText.contains("公民身份号码")) {
                id = lineText.substringAfter("公民身份号码")
            }
        }
        return "$name\n$id"
    }

When recognition succeeds, the name and the ID number are returned on two separate lines.
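The parsing step can also be tried without a camera by feeding it plain strings. The sketch below is a variation on the article's approach, not the author's code: IdCardFields and parseIdCardLines are hypothetical names, the name label is passed in as a parameter, and the ID number is located by its shape (17 digits followed by a digit or X) rather than by its label, on the assumption that OCR may split or misread the label.

```kotlin
// Hypothetical result type; the article's extractText returns a single String instead.
data class IdCardFields(val name: String?, val idNumber: String?)

// 17 digits followed by a digit or X: the 18-character layout of the ID number.
private val ID_NUMBER = Regex("""\d{17}[\dXx]""")

// Scan recognized lines for a labelled name and for an ID-number-shaped token.
fun parseIdCardLines(lines: List<String>, nameLabel: String): IdCardFields {
    var name: String? = null
    var idNumber: String? = null
    for (raw in lines) {
        // OCR output often contains stray spaces, so remove them first.
        val line = raw.replace(" ", "")
        if (name == null && line.contains(nameLabel)) {
            name = line.substringAfter(nameLabel).takeIf { it.isNotEmpty() }
        }
        if (idNumber == null) {
            idNumber = ID_NUMBER.find(line)?.value
        }
    }
    return IdCardFields(name, idNumber)
}
```

For example, calling parseIdCardLines with the made-up lines "姓名 张三" and "公民身份号码 11010119900101123X" and the label "姓名" yields the name "张三" and the ID number "11010119900101123X". Fields that are not found stay null, which lets the caller keep analyzing frames until both are present.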

In conclusion, even in a small text-recognition scenario like this one, the out-of-the-box ease of use of CameraX and MLKit is evident. As official Google toolkits, they also work well with other Jetpack components such as Compose. Thanks to Google's strong developer ecosystem, developers can build their own mobile applications at low cost.

CameraX: https://developer.android.com/training/camerax

MLKit: https://developers.google.com/ml-kit

This article comes from the WeChat official account AndroidPub (ID: gh_e312d1adb6ec), author: fundroid
