0

I have scanned images of printed text with handwritten circular annotations drawn on them, and I want to remove those annotations from the input image. I have tried several of the thresholding methods discussed in many threads on StackOverflow, but the results are not what I expected.

The method I am using works really well when the annotation is drawn with a blue pen, but when it is drawn with a black pen, thresholding and erosion do not produce the expected output.

Here is a sample of the results I achieved on blue annotations with the thresholding and erosion method (input on the left, output on the right):

[image]

Code

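import cv2
import numpy as np
from google.colab.patches import cv2_imshow

img = cv2.imread("/content/Scan_0101.jpg")
cv2_imshow(img)

# OpenCV loads images in BGR order, so channel 0 is the blue channel.
wimg = img[:, :, 0]

# White paper and blue ink are both bright in the blue channel (255 in thresh);
# black text is dark (0 in thresh).
ret, thresh = cv2.threshold(wimg, 120, 255, cv2.THRESH_BINARY)
cv2_imshow(thresh)

kernel = np.ones((3, 3), np.uint8)
erosion = cv2.erode(thresh, kernel, iterations=1)
# Note: the eroded image is never brighter than thresh, so this OR equals thresh.
mask = cv2.bitwise_or(erosion, thresh)
#cv2_imshow(erosion)

# Replicate the mask across three channels, then OR it with the input:
# bright (masked) pixels become white, dark text pixels are kept.
white = cv2.merge([mask, mask, mask])
result = cv2.bitwise_or(img, white)
# Eroding the result thickens the remaining dark text strokes.
erosion = cv2.erode(result, kernel, iterations=1)
cv2_imshow(erosion)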

Here is a sample of the results I achieved on black annotations with the thresholding and erosion method (input on the left, output on the right):

[image]

Is there a suggested approach for this problem, or a way to modify this code to produce the required results?

2
  • What is wimg = img[:, :, 0] doing?
    – FiddleStix
    Commented May 19, 2022 at 11:00
  • To expand on Yves's answer: to decompose this picture, you need to start by extracting the text, i.e. OCR. Then whatever remains is the drawn circles and the background. Those two are easy to separate because the background is always white, so the circles are what remains when you remove the two known components from the picture.
    Commented May 19, 2022 at 16:58

2 Answers

1

You must understand that because the gray values of the printed text and those of the handwriting are in the same range, no thresholding method in the world can work.
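To make the point about value ranges concrete, here is a minimal sketch of why the blue-pen case is separable and the black-pen case is not. It measures per-pixel chroma (largest minus smallest of the B, G, R values). The function names, the `min_chroma` cutoff, and the pixel values used below are illustrative assumptions, not measurements from the question's scans.

```python
import numpy as np

def chroma(bgr):
    """Per-pixel (max - min) over the B, G, R channels of a uint8 image."""
    c = bgr.astype(np.int16)  # widen so the subtraction cannot wrap around
    return (c.max(axis=2) - c.min(axis=2)).astype(np.uint8)

def colored_ink_mask(bgr, min_chroma=60):
    """True where a pixel is clearly colored (hypothetical cutoff)."""
    return chroma(bgr) >= min_chroma
```

With assumed pixel values such as white paper (255, 255, 255), printed text (20, 20, 20), black pen (30, 30, 35) and blue pen (200, 60, 40) in BGR order, only the blue pen exceeds the cutoff. The black pen is indistinguishable from the printed text in both chroma and brightness, which is why no threshold separates them.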

In fact, no algorithm at all can succeed without "hints" on what characters look like or don't look like. Even the stroke thickness is not distinctive enough.

The only possible indication is that the circles are made of a long, smooth stroke. And removing them where they cross the characters is just impossible.

1
  • By the way, the resolution of these samples is insufficient for reliable OCR.
    – user1196549
    Commented May 19, 2022 at 12:30
0

Some parts of the handwritten circles (those lying in the line-spacing regions) can probably be extracted by assuming that many letters are aligned on the same line. In your image, I think the upper and lower parts of the circle would be extracted.

Then, if you track the black line starting from the extracted parts (assuming smooth curvature), you may be able to detect the connected handwritten circle.

However, in practice I think such a process will encounter many difficulties, especially because characters will be cut off when the curve is removed.
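The line-spacing assumption can be sketched with a horizontal projection profile: rows that contain far less ink than the busiest text row are treated as spacing, and any ink found in them is a candidate piece of the annotation. This covers only the first step (finding seed pixels), not the curve tracking. `interline_ink` and `gap_fraction` are hypothetical names, and the sketch assumes the text lines are horizontal and the image is already binarized.

```python
import numpy as np

def interline_ink(binary_ink, gap_fraction=0.1):
    """binary_ink: 2-D bool array, True where a pixel is ink.

    Returns a bool array marking ink located in rows judged to be line spacing.
    """
    row_counts = binary_ink.sum(axis=1)
    # Rows with much less ink than the densest row are treated as spacing.
    gap_rows = row_counts <= gap_fraction * row_counts.max()
    seeds = np.zeros_like(binary_ink)
    seeds[gap_rows] = binary_ink[gap_rows]
    return seeds
```

The seed pixels would be the starting points for the tracking step described above. Skewed scans or descenders reaching into the spacing rows would produce false seeds, so a deskew step and a less naive row test are likely needed on real data.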
