handwritten circular annotation removal from scanned image

Question

I have images of printed text with handwritten circular annotations drawn on them, and I want to remove those annotations. I have tried some of the thresholding methods discussed in various StackOverflow threads, but the results are not what I expected.

The method I am using works well when the annotation is drawn with a blue pen, but when it is drawn with a black pen, thresholding and erosion do not produce the expected output.

Here is a sample of the results I get on blue annotations with the thresholding and erosion method:

Image (input on the left and output on the right)

Code

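import cv2
import numpy as np
from google.colab.patches import cv2_imshow  # Colab-only display helper

img = cv2.imread("/content/Scan_0101.jpg")
cv2_imshow(img)

# OpenCV loads images as BGR, so channel 0 is blue. Blue ink and white paper
# are both bright in this channel, while black print stays dark.
blue = img[:, :, 0]
ret, thresh = cv2.threshold(blue, 120, 255, cv2.THRESH_BINARY)
cv2_imshow(thresh)

kernel = np.ones((3, 3), np.uint8)
erosion = cv2.erode(thresh, kernel, iterations=1)
# Erosion can only turn white pixels black, so this OR is equal to thresh.
mask = cv2.bitwise_or(erosion, thresh)
#cv2_imshow(erosion)

# Replicate the mask to three channels and paint every masked pixel white.
white = cv2.merge([mask, mask, mask])
result = cv2.bitwise_or(img, white)

# Eroding the result thickens the remaining dark text strokes.
erosion = cv2.erode(result, kernel, iterations=1)
cv2_imshow(erosion)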

Here is a sample of the results I get on black annotations with the same thresholding and erosion method:

Image (input on the left and output on the right)

Is there a suggested approach for this problem, or a way to modify this code so that it produces the required results?

to expand on Yves's answer: to decompose this picture, you need to start by extracting the text, i.e. OCR. then, whatever's remaining is those drawn circles and background. those two are easy to separate because background is always white... so the circles are what remains when you remove the two known components from the picture. — Christoph Rackwitz, Commented May 19, 2022 at 16:58

score 1 · Accepted Answer · 2022-05-19 12:34:03Z


You must understand that, because the gray values of the printed text and those of the handwriting are in the same range, no thresholding method in the world can work.
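A small numeric illustration of this point. The four BGR colors below are made-up values chosen for the example, not measurements from the scans, and `channel_spread` is a hypothetical helper. The blue channel thresholded at 120 (as in the question's code) separates blue ink from print, but a black pen lands on the same side of the threshold as the printed text, and it has no color spread to exploit either.

```python
import numpy as np

def channel_spread(bgr):
    """Per-pixel max channel minus min channel (0 for a pure gray pixel)."""
    bgr = bgr.astype(np.int16)
    return (bgr.max(axis=2) - bgr.min(axis=2)).astype(np.uint8)

# Assumed sample colors in BGR order, for illustration only.
paper = (245, 245, 245)
printed_text = (30, 30, 30)
blue_pen = (200, 70, 40)
black_pen = (35, 35, 35)

pixels = np.array([[paper, printed_text, blue_pen, black_pen]], dtype=np.uint8)

# Same test as the question's code: blue channel, threshold at 120.
blue_above_120 = pixels[0, :, 0] > 120
chroma = channel_spread(pixels)[0]

for name, bright, c in zip(["paper", "printed text", "blue pen", "black pen"],
                           blue_above_120, chroma):
    print(f"{name:13s} blue>120: {bright!s:5s} spread: {c}")
```

With these values the blue pen is pushed to white together with the paper, while the printed text and the black pen both stay dark, so no single threshold can tell them apart.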

In fact, no algorithm at all can succeed without "hints" on what characters look like or don't look like. Even the stroke thickness is not distinctive enough.

The only possible indication is that the circles are made of a long, smooth stroke. And removing them where they cross the characters is just impossible.

edited May 19, 2022 at 12:34

answered May 19, 2022 at 12:28

user1196549

By the way, the resolution of these samples is insufficient for reliable OCR.
– user1196549
Commented May 19, 2022 at 12:30


fana · Accepted Answer · 2022-05-20 05:09:39Z

Some parts of the handwritten circles (those lying in the line-spacing regions) may be extractable under the assumption that the letters are aligned on common text lines. In your image, I think the upper and lower parts of the circle would be extracted.

Then, if you track the black line starting from the extracted parts (assuming smooth curvature), it may be possible to detect the whole connected handwritten circle.

However, in practice I think such a process will run into many difficulties, especially because characters will be cut wherever the curve is removed.
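A minimal sketch of the first step only (finding stroke fragments in the line-spacing regions) using a horizontal projection profile. The synthetic layout and the 0.25 cutoff are assumptions for illustration, and the curve-tracking step is not implemented here.

```python
import numpy as np

# Synthetic binary page: True means ink. Three "text lines" of dense ink.
text = np.zeros((80, 120), dtype=bool)
for top in (10, 35, 60):
    text[top:top + 10, 5:115:2] = True

# A thin elliptical ring around part of the middle line (stand-in for the pen loop).
rows, cols = np.mgrid[0:80, 0:120]
v = ((rows - 40) / 12) ** 2 + ((cols - 60) / 30) ** 2
ring = (v >= 0.92) & (v <= 1.0)

ink = text | ring

# Horizontal projection: amount of ink in each row.
profile = ink.sum(axis=1)

# Rows with little ink compared to the busiest row are treated as line spacing.
is_gap = profile < 0.25 * profile.max()

# Any ink found in those rows is a candidate fragment of the handwritten loop.
seed = ink & is_gap[:, None]

print("gap rows containing ink:", np.flatnonzero(seed.any(axis=1)).tolist())
```

On this synthetic page the recovered fragments are the top and bottom arcs of the loop. The parts of the loop that pass through the text rows are not recovered and would have to come from the tracking step.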
