Problem Statement
Concept from MOMO v1
(DubHacks 2022, YH Wong, R Liao, RJ Mao, M Lee) · Video
How it works
The image processing pipeline uses the OpenCV, Tesseract and PIL libraries to run OCR on your uploaded image.
1. Ingest the Image
The original image is uploaded to the backend, a Dockerized Flask server hosted on AWS Lightsail. We use a 1x Small instance with 1GB of RAM and a shared vCPU, so performance is not optimal.
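A minimal sketch of what the ingest endpoint could look like; the route name, form-field name, and the run_pipeline entry point are illustrative assumptions, not the actual server code.

```python
import cv2
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)

def run_pipeline(img):
    # Placeholder for steps 2-6 below; returns the image size as a stub.
    h, w = img.shape[:2]
    return {"width": w, "height": h}

@app.route("/ocr", methods=["POST"])  # hypothetical route name
def ocr():
    # Decode the uploaded file into an OpenCV BGR image, in memory
    upload = request.files["image"]  # hypothetical form-field name
    data = np.frombuffer(upload.read(), dtype=np.uint8)
    img = cv2.imdecode(data, cv2.IMREAD_COLOR)
    if img is None:
        return jsonify(error="could not decode image"), 400
    return jsonify(run_pipeline(img))
```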
2. Detect Regions of Interest
Image Convolutions / Blurring · Morphological Dilation
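A rough sketch of this step with OpenCV: a Gaussian blur (an image convolution) suppresses fine detail, and morphological dilation grows the remaining structures so the receipt's text merges into one detectable blob. The kernel sizes here are illustrative, not the pipeline's tuned values.

```python
import cv2

def detect_roi(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Gaussian blur suppresses noise and fine texture
    blurred = cv2.GaussianBlur(gray, (9, 9), 0)
    # Dilation grows bright regions so nearby text strokes
    # merge into a single connected area
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))
    return cv2.dilate(blurred, kernel, iterations=2)
```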
3. Highlight Region in Image
Morphological Dilation / Erosion · Canny Edge Detection · Contour Approximation
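One plausible implementation of this step: erode to trim noise the earlier dilation amplified, run Canny edge detection, then approximate the largest contour with a coarse polygon. The Canny thresholds and approximation epsilon are assumptions.

```python
import cv2

def find_receipt_quad(dilated):
    # Erosion cleans up small artifacts left by dilation
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    cleaned = cv2.erode(dilated, kernel, iterations=1)
    # Canny marks strong intensity edges; thresholds are illustrative
    edges = cv2.Canny(cleaned, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None  # no candidate region detected
    # Approximate the largest contour with a coarse polygon;
    # for a receipt this should collapse to ~4 corner points
    largest = max(contours, key=cv2.contourArea)
    peri = cv2.arcLength(largest, True)
    return cv2.approxPolyDP(largest, 0.02 * peri, True)
```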
4. Perspective Correction to Fit Region
Perspective Transformations / Warping
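A standard four-point warp, sketched under the assumption that the previous step yields the receipt's four corners in order: cv2.getPerspectiveTransform computes the homography and cv2.warpPerspective applies it. The output dimensions are illustrative.

```python
import cv2
import numpy as np

def warp_to_rect(img, quad, width=600, height=800):
    # quad: the 4 corner points from the previous step, assumed ordered
    # top-left, top-right, bottom-right, bottom-left
    src = quad.reshape(4, 2).astype(np.float32)
    dst = np.array([[0, 0], [width - 1, 0],
                    [width - 1, height - 1], [0, height - 1]],
                   dtype=np.float32)
    # Homography mapping the skewed receipt onto an upright rectangle
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(img, M, (width, height))
```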
5. Image Binarization
Apply preprocessing to create a black-and-white image for better text detection. We use adaptive thresholding to account for differences in brightness and contrast across the region.
Binarization · Adaptive Thresholding
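Unlike a single global threshold, adaptive thresholding computes a local threshold per pixel neighborhood, so a shadowed corner of the receipt binarizes as cleanly as a well-lit one. A sketch with illustrative block size and offset:

```python
import cv2

def binarize(warped):
    gray = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
    return cv2.adaptiveThreshold(
        gray, 255,
        cv2.ADAPTIVE_THRESH_GAUSSIAN_C,  # Gaussian-weighted local mean
        cv2.THRESH_BINARY,
        blockSize=31,  # neighborhood size (must be odd)
        C=10,          # constant subtracted from the local mean
    )
```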
6. Run OCR Model to Extract Information
Use PyTesseract to detect text in the image and parse the relevant information with regular expressions. Return this data as the API response.
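A sketch of this final step; the regex pattern and extracted fields are illustrative assumptions about what the relevant information on a receipt looks like, not the pipeline's actual patterns.

```python
import re
import pytesseract

def extract_fields(binary):
    text = pytesseract.image_to_string(binary)
    # e.g. match a line like "TOTAL  $42.17"
    match = re.search(r"total\s*\$?(\d+\.\d{2})", text, re.IGNORECASE)
    return {
        "raw_text": text,
        "total": float(match.group(1)) if match else None,
    }
```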
Discussion
What problems did you encounter?
Are there next steps you would take if you kept working on the project?
How does your approach differ from others? Was that beneficial?
The initial concept was based on my DubHacks 2022 submission, hence the v2. We replaced the original third-party OCR API with our own pipeline, which lets us tune the preprocessing steps specifically for receipts, and we can now inspect the intermediate images produced at each stage. While detection accuracy has not quite caught up, the design keeps the OCR module swappable: we could feasibly drop in a PyTorch or Google Vision model without much modification.
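One way that drop-in swap could look, assuming a small backend interface; the class names here are illustrative, not taken from the actual codebase.

```python
from abc import ABC, abstractmethod

class OCRBackend(ABC):
    # Any OCR engine only needs to implement this one method
    @abstractmethod
    def extract_text(self, image) -> str: ...

class TesseractBackend(OCRBackend):
    def extract_text(self, image) -> str:
        import pytesseract
        return pytesseract.image_to_string(image)

# A PyTorch or Google Vision backend would implement the same method,
# leaving the rest of the pipeline untouched.
```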