How to Extract Text from Rotated (Skewed) Text-Images Using CRAFT, OpenCV and pytesseract

As an AI engineer, I was tasked with creating a module that extracts text from billboards in 360° images. I built a billboard detection model with YOLO v8 and planned to run a text detection model together with pytesseract on the detected billboards. The text detection model I chose was CRAFT. Because these were 360° real-world images, most of the detected billboards contained skewed or rotated text, which made it difficult to extract text accurately with pytesseract. You may face the same problem when extracting text from real-world objects.

I searched the internet for solutions. There were many based on OpenCV, but they would work for one image and fail on another. Finally, I found an approach that works for almost all cases. It uses CRAFT for text detection and OpenCV to retrieve the skew angle and manipulate the image; here, manipulating simply means rotating. This blog illustrates a workflow for extracting text accurately from rotated (skewed) text-images.

We will use the following six images to demonstrate the problem and its solution.

Test Images

Prerequisites

Before we get started, let’s make sure we have all the necessary libraries in place. Here’s a list of prerequisites, along with links for easy installation:

1. Python (preferably version 3.8)
2. NumPy
3. OpenCV
4. CRAFT
5. pytesseract
6. CUDA (optional)

Import Libraries and Load Models

Import installed Libraries

import os  # needed below to set TESSDATA_PREFIX

import cv2
import numpy as np
import pytesseract
from craft_text_detector import Craft

Load CRAFT and pytesseract models. CRAFT is used for text detection and pytesseract is used for text extraction.

# set cuda=True only if CUDA is set up on your system
# see https://github.com/clovaai/CRAFT-pytorch for more information on CRAFT
craft_detector = Craft(crop_type="poly", cuda=True, text_threshold=0.8, link_threshold=0.4, low_text=0.25)

# adjust the following paths to match where Tesseract is installed
pytesseract.pytesseract.tesseract_cmd = r'C://Program Files//Tesseract-OCR//tesseract.exe'
os.environ["TESSDATA_PREFIX"] = "C://Program Files//Tesseract-OCR//tessdata"

# custom config for pytesseract:
# --oem 3 uses the default OCR engine, based on what is available
# --psm 11 looks for sparse text, finding as much text as possible in no particular order
custom_config = r'-c preserve_interword_spaces=5 --oem 3 --psm 11'
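The page segmentation mode is the setting I would experiment with first: `--psm 7` treats a crop as a single text line, while `--psm 11` looks for sparse text. As a minimal sketch, the helper below assembles a config string of the same shape as `custom_config`. The name `build_tesseract_config` is hypothetical and not part of pytesseract, and it writes `preserve_interword_spaces` as 0 or 1 on the assumption that the variable is an on/off flag.

```python
def build_tesseract_config(psm=11, oem=3, preserve_spaces=True):
    """Assemble a Tesseract config string (hypothetical helper, not a pytesseract API)."""
    if not 0 <= psm <= 13:
        raise ValueError("page segmentation modes range from 0 to 13")
    if not 0 <= oem <= 3:
        raise ValueError("OCR engine modes range from 0 to 3")
    # assumption: preserve_interword_spaces behaves as an on/off flag
    spaces = 1 if preserve_spaces else 0
    return f"-c preserve_interword_spaces={spaces} --oem {oem} --psm {psm}"


sparse_text_config = build_tesseract_config(psm=11)  # whole billboard, scattered text
single_line_config = build_tesseract_config(psm=7)   # one cropped text line
```

Either string can be passed as the `config` argument of `pytesseract.image_to_string`, which makes it easy to compare modes on the same crop.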

Text Extraction On Rotated(Skewed) Text-Images

Let's see what problems we face when extracting text from rotated text-images. The general workflow is:

  1. Detect text using CRAFT and return the text's Regions of Interest (ROIs).

def detect_text_rois(image):

    # create a copy of image
    img = image.copy()

    # get ROIs for text using CRAFT
    text_rois = craft_detector.detect_text(img)['boxes']

    # get angled rois
    angled_rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon
            # convert box points to numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)

            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)

            # get vertices of box (astype replaces np.int0, which NumPy 2.0 removed)
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)

            # append box_vertices on angled_rois list
            angled_rois.append(box_vertices)

            # draw rectangle box around texts
            cv2.drawContours(img, [box_vertices], 0, (255, 0, 0), 2)
    return img

Bounding Box for angled ROIs of text

2. Use pytesseract to extract text from the angled ROIs.

def detect_text_rois_and_extract_text(image):

    # create a copy of image
    img = image.copy()
    img_text = img.copy()

    # get ROIs for text using CRAFT
    text_rois = craft_detector.detect_text(img)['boxes']

    # get angled rois
    angled_rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # Ensure at least 3 points for a polygon
            # convert box points to numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)

            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)

            # get vertices of box
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)  # np.int0 was removed in NumPy 2.0

            # append box_vertices on angled_rois list
            angled_rois.append(box_vertices)
            # draw rectangle box
            cv2.drawContours(img_text, [box_vertices], 0, (255, 0, 0), 2)

    for index,roi in enumerate(angled_rois):
        # get the bounding box from the angled ROI
        x, y, w, h = cv2.boundingRect(roi)

        # get text using pytesseract
        text_roi = img[y : y+h, x : x+w]

        # little Preprocessing
        roi_gray = cv2.cvtColor(text_roi, cv2.COLOR_BGR2GRAY)

        # get lines of text from pytesseract
        text_line = f'Line {index + 1} : ' + pytesseract.image_to_string(roi_gray, lang='eng', config=custom_config)

        # put text on image (strip() also drops the trailing form feed Tesseract may append)
        cv2.putText(img_text, text_line.strip(), (0, 30 + index * 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2, cv2.LINE_AA)


    return img_text

Text Extracted from Skewed Text-Image
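One practical caveat about the cropping step in the function above: the rotated rectangle around a text region can poke outside the image, so the bounding box may start at a negative x or y. A negative start index in a NumPy slice counts from the end of the array, which can give an empty or wrong crop. Here is a small sketch of a clipping helper; `clamped_bounding_box` is a hypothetical name, and it only approximates what `cv2.boundingRect` followed by clipping would give.

```python
def clamped_bounding_box(vertices, image_width, image_height):
    """Axis-aligned box (x, y, w, h) around the vertices, clipped to the image.

    Hypothetical helper: roughly cv2.boundingRect followed by clipping.
    """
    xs = [int(x) for x, _ in vertices]
    ys = [int(y) for _, y in vertices]
    x0 = max(min(xs), 0)
    y0 = max(min(ys), 0)
    x1 = min(max(xs) + 1, image_width)    # exclusive right edge
    y1 = min(max(ys) + 1, image_height)   # exclusive bottom edge
    return x0, y0, max(x1 - x0, 0), max(y1 - y0, 0)


# a box that sticks out of a 50 x 30 image on every side
box = clamped_bounding_box([(-5, 10), (40, -3), (60, 20), (15, 33)], 50, 30)
```

A width or height of 0 means the box lies entirely outside the image, so that ROI can be skipped before calling `cv2.cvtColor`.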

As the extracted text in the illustrations shows, pytesseract works well on straight text-images. It tries its best on rotated (skewed) text too, but the results are not reliable. So, to extract text from rotated (skewed) text-images, we first need to straighten (deskew) them.

General Workflow to Straighten (Deskew) Text-Images

Now we will discuss the general workflow for straightening text-images. For now, the focus is on the concept behind each step; complete, readable code is given at the end.

  1. Find the largest angled ROI of text. We will rotate the text-image based on the angle made by this largest ROI.

def detect_largest_angled_text_roi(image):

    # create a copy of image
    img = image.copy()

    # get ROIs for text using CRAFT
    text_rois = craft_detector.detect_text(img)['boxes']

    # get angled rois
    angled_rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon

            # convert box points to numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)

            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)

            # get vertices of box (astype replaces np.int0, which NumPy 2.0 removed)
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)

            # append box_vertices on angled_rois list
            angled_rois.append(box_vertices)

    if angled_rois:

        # get largest roi
        largest_roi = max(angled_rois, key=cv2.contourArea)

        # draw rectangle box around the largest text region
        cv2.drawContours(img, [largest_roi], 0, (255, 0, 0), 2)

    return img

Largest ROI of text
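To make the selection step concrete: `max(angled_rois, key=cv2.contourArea)` simply keeps the box with the largest polygon area. As far as I know, that area matches the classic shoelace formula, so the same choice can be sketched in plain Python. The name `polygon_area` and the two sample boxes are made up for illustration.

```python
def polygon_area(vertices):
    """Unsigned area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


small = [(0, 0), (4, 0), (4, 2), (0, 2)]           # 4 x 2 rectangle
large = [(10, 10), (30, 20), (25, 30), (5, 20)]    # tilted quadrilateral

# same idea as max(angled_rois, key=cv2.contourArea)
largest = max([small, large], key=polygon_area)
```

Picking the largest box assumes the biggest text on a billboard shares its orientation with the rest of the text, which seems reasonable for a single flat sign.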

  2. Fit a line to the largest ROI. We could use cv2.minAreaRect(), which returns the center (x, y), the size (width, height) and the angle of rotation, but I found the angle it returns to be very inconsistent. So we get the angle a different way, using cv2.fitLine(). The following is just for visualization, to show how a line is fitted to the largest text ROI.

def fit_line_on_largest_roi(image):

    # create a copy of image
    img = image.copy()

    # get ROIs for text using CRAFT
    text_rois = craft_detector.detect_text(img)['boxes']

    # get angled rois
    angled_rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon

            # convert box points to numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)

            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)

            # get vertices of box (astype replaces np.int0, which NumPy 2.0 removed)
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)

            # append box_vertices on angled_rois list
            angled_rois.append(box_vertices)

    if angled_rois:

        # get largest roi
        largest_roi = max(angled_rois, key=cv2.contourArea)

        # draw rectangle box around the largest text region
        cv2.drawContours(img, [largest_roi], 0, (255, 0, 0), 2)

        # get fitted line: unit direction (vx, vy) and a point (x, y) on the line
        vx, vy, x, y = cv2.fitLine(largest_roi, cv2.DIST_L2, 0, 0.01, 0.01).flatten()

        # draw the line across the image; skipped for near-vertical lines,
        # where dividing by vx would blow up
        if abs(vx) > 1e-3:
            lefty = int((-x * vy / vx) + y)
            righty = int(((img.shape[1] - x) * vy / vx) + y)
            cv2.line(img, (0, lefty), (img.shape[1] - 1, righty), (0, 255, 0), 2)

    return img

Line Fitted on largest ROI of text
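For intuition about what the fitted line represents: with `cv2.DIST_L2` the fit is a least-squares one, and as I understand it the resulting direction is the principal axis of the points. The sketch below reproduces that idea in plain Python for the four corners of a rotated rectangle. `fit_line_direction` and `rectangle_corners` are hypothetical helpers written for this illustration; they are not OpenCV functions and may not match OpenCV's output digit for digit.

```python
import math


def fit_line_direction(points):
    """Return (vx, vy, x0, y0): unit direction and a point on the best-fit line."""
    n = len(points)
    x0 = sum(x for x, _ in points) / n
    y0 = sum(y for _, y in points) / n
    sxx = sum((x - x0) ** 2 for x, _ in points)
    syy = sum((y - y0) ** 2 for _, y in points)
    sxy = sum((x - x0) * (y - y0) for x, y in points)
    # orientation of the principal axis, always within (-90, 90] degrees
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta), x0, y0


def rectangle_corners(cx, cy, half_w, half_h, angle_deg):
    """Corners of a rectangle centred at (cx, cy) and rotated by angle_deg."""
    a = math.radians(angle_deg)
    ux, uy = math.cos(a), math.sin(a)
    return [
        (cx + sw * half_w * ux - sh * half_h * uy,
         cy + sw * half_w * uy + sh * half_h * ux)
        for sw, sh in ((-1, -1), (1, -1), (1, 1), (-1, 1))
    ]


# a 100 x 20 text box tilted by 30 degrees in image coordinates
vx, vy, x0, y0 = fit_line_direction(rectangle_corners(100, 60, 50, 10, 30))
angle_deg = math.degrees(math.atan2(vy, vx))
```

Because the line follows the long side of the box, the recovered angle is the tilt of the text line itself, which is the quantity the rotation step needs.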

  3. Get the angle of the fitted line. This angle is used to rotate, and thereby straighten, the text-image.

def get_angle_of_fitted_line(image):

    # create a copy of image
    img = image.copy()

    # get ROIs for text using CRAFT
    text_rois = craft_detector.detect_text(img)['boxes']

    # get angled rois
    angled_rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon

            # convert box points to numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)

            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)

            # get vertices of box (astype replaces np.int0, which NumPy 2.0 removed)
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)

            # append box_vertices on angled_rois list
            angled_rois.append(box_vertices)

    if angled_rois:

        # get largest roi
        largest_roi = max(angled_rois, key=cv2.contourArea)

        # draw rectangle box around the largest text region
        cv2.drawContours(img, [largest_roi], 0, (255, 0, 0), 2)

        # get fitted line: unit direction (vx, vy) and a point (x, y) on the line
        vx, vy, x, y = cv2.fitLine(largest_roi, cv2.DIST_L2, 0, 0.01, 0.01).flatten()

        # draw the line across the image; skipped for near-vertical lines,
        # where dividing by vx would blow up
        if abs(vx) > 1e-3:
            lefty = int((-x * vy / vx) + y)
            righty = int(((img.shape[1] - x) * vy / vx) + y)
            cv2.line(img, (0, lefty), (img.shape[1] - 1, righty), (0, 255, 0), 2)

        # get angle of the fitted line in degrees
        angle = float(np.degrees(np.arctan2(vy, vx)))

        # display angle
        cv2.putText(img, f'Angle : {angle}', (0, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2, cv2.LINE_AA)

    return img

Angle of Fitted Line
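One detail worth keeping in mind about such angles: a line has no preferred direction, so (vx, vy) and (-vx, -vy) describe the same fitted line even though `arctan2` gives angles 180° apart for them. If the direction vector ever comes from another source, folding the angle into the range (-90°, 90°] keeps the correction from turning the text upside down. A small sketch, with `normalize_skew_angle` being a hypothetical helper name:

```python
import math


def normalize_skew_angle(angle_deg):
    """Fold a line angle into (-90, 90] degrees; lines repeat every 180 degrees."""
    angle = angle_deg % 180.0   # now within [0, 180)
    if angle > 90.0:
        angle -= 180.0
    return angle


# the same line described by two opposite direction vectors
angle_a = normalize_skew_angle(math.degrees(math.atan2(1.0, 1.0)))    # from 45 degrees
angle_b = normalize_skew_angle(math.degrees(math.atan2(-1.0, -1.0)))  # from -135 degrees
```

This only removes the 180° ambiguity of the line; it cannot tell whether the text itself was upside down to begin with.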

  4. Finally, rotate the image by the calculated angle.

def rotate_text_image(image):

    # create a copy of image
    img = image.copy()

    # get ROIs for text using CRAFT
    text_rois = craft_detector.detect_text(img)['boxes']

    # get angled rois
    angled_rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon

            # convert box points to numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)

            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)

            # get vertices of box (astype replaces np.int0, which NumPy 2.0 removed)
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)

            # append box_vertices on angled_rois list
            angled_rois.append(box_vertices)

    if angled_rois:

        # get largest roi
        largest_roi = max(angled_rois, key=cv2.contourArea)

        # draw rectangle box around the largest text region
        cv2.drawContours(img, [largest_roi], 0, (255, 0, 0), 2)

        # get fitted line: unit direction (vx, vy) and a point (x, y) on the line
        vx, vy, x, y = cv2.fitLine(largest_roi, cv2.DIST_L2, 0, 0.01, 0.01).flatten()

        # draw the line across the image; skipped for near-vertical lines,
        # where dividing by vx would blow up
        if abs(vx) > 1e-3:
            lefty = int((-x * vy / vx) + y)
            righty = int(((img.shape[1] - x) * vy / vx) + y)
            cv2.line(img, (0, lefty), (img.shape[1] - 1, righty), (0, 255, 0), 2)

        # get angle of the fitted line in degrees
        angle = float(np.degrees(np.arctan2(vy, vx)))

        height, width = img.shape[:2]
        center = (width // 2, height // 2)

        # if angle made by roi is 90 degree then no rotation needed
        if angle != 90:
            rotation_matrix = cv2.getRotationMatrix2D(center, angle, scale=1.0)
            img = cv2.warpAffine(img, rotation_matrix, (width, height))

    return img

Rotation of Text-Images with respect to fitted line
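It may help to see why passing the fitted angle straight into `cv2.getRotationMatrix2D` levels the text. The sketch below builds the 2x3 matrix in the form given in the OpenCV documentation and applies it to a point lying on a line tilted by 30° in image coordinates (y pointing down). `rotation_matrix_2d` and `apply_affine` are hypothetical pure-Python stand-ins, not OpenCV calls.

```python
import math


def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """2x3 affine matrix in the form documented for cv2.getRotationMatrix2D."""
    cx, cy = center
    alpha = scale * math.cos(math.radians(angle_deg))
    beta = scale * math.sin(math.radians(angle_deg))
    return [
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ]


def apply_affine(matrix, point):
    """Map a single (x, y) point through a 2x3 affine matrix."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )


center = (200, 100)
tilt = math.radians(30)
# a point 10 px along a text line that runs 30 degrees downwards to the right
on_line = (center[0] + 10 * math.cos(tilt), center[1] + 10 * math.sin(tilt))

matrix = rotation_matrix_2d(center, 30)
leveled = apply_affine(matrix, on_line)   # ends up level with the center
```

The center stays where it is and the point lands 10 px to its right at the same height, so a line with fitted angle θ becomes horizontal after rotating by θ.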

After rotation, the fitted lines all appear parallel to the x-axis, so the images are straightened. But there is a slight problem: part of the original image is cut out of the frame. So we must add padding to the text-image.

  5. Pad the images and then rotate.

def pad_and_rotate_text_image(image):

    # set padding size
    padding_size = 50

    # pad with black pixels
    padded_image = np.zeros((image.shape[0] + 2 * padding_size, image.shape[1] + 2 * padding_size, 3), dtype=np.uint8)
    padded_image[padding_size:padding_size + image.shape[0], padding_size:padding_size + image.shape[1]] = image

    # create a copy of image
    img = padded_image.copy()

    # get ROIs for text using CRAFT
    text_rois = craft_detector.detect_text(img)['boxes']

    # get angled rois
    angled_rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon

            # convert box points to numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)

            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)

            # get vertices of box (astype replaces np.int0, which NumPy 2.0 removed)
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)

            # append box_vertices on angled_rois list
            angled_rois.append(box_vertices)

    if angled_rois:

        # get largest roi
        largest_roi = max(angled_rois, key=cv2.contourArea)

        # draw rectangle box around the largest text region
        cv2.drawContours(img, [largest_roi], 0, (255, 0, 0), 2)

        # get fitted line: unit direction (vx, vy) and a point (x, y) on the line
        vx, vy, x, y = cv2.fitLine(largest_roi, cv2.DIST_L2, 0, 0.01, 0.01).flatten()

        # draw the line across the image; skipped for near-vertical lines,
        # where dividing by vx would blow up
        if abs(vx) > 1e-3:
            lefty = int((-x * vy / vx) + y)
            righty = int(((img.shape[1] - x) * vy / vx) + y)
            cv2.line(img, (0, lefty), (img.shape[1] - 1, righty), (0, 255, 0), 2)

        # get angle of the fitted line in degrees
        angle = float(np.degrees(np.arctan2(vy, vx)))

        height, width = img.shape[:2]
        center = (width // 2, height // 2)

        # if angle made by roi is 90 degree then no rotation needed
        if angle != 90:
            rotation_matrix = cv2.getRotationMatrix2D(center, angle, scale=1.0)
            img = cv2.warpAffine(img, rotation_matrix, (width, height))

    return img

Padding and Rotation of Text-Image
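The fixed 50-pixel border worked for my images, but how much padding is really needed depends on the image size and the rotation angle. As a rough sketch of a different approach from the fixed padding used above, the bounding box of a rotated w x h image can be computed from the sine and cosine of the angle, and the padding derived from it. `rotated_canvas_size` and `padding_needed` are hypothetical helper names, and the 400 x 200 image is just an example.

```python
import math


def rotated_canvas_size(width, height, angle_deg):
    """Width and height of the bounding box of an image rotated about its center."""
    a = math.radians(angle_deg)
    cos_a, sin_a = abs(math.cos(a)), abs(math.sin(a))
    # round first so floating-point noise does not push ceil() up by one pixel
    new_w = math.ceil(round(width * cos_a + height * sin_a, 6))
    new_h = math.ceil(round(width * sin_a + height * cos_a, 6))
    return new_w, new_h


def padding_needed(width, height, angle_deg):
    """Per-side padding (pad_x, pad_y) so that nothing is cut off after rotation."""
    new_w, new_h = rotated_canvas_size(width, height, angle_deg)
    pad_x = max(math.ceil((new_w - width) / 2), 0)
    pad_y = max(math.ceil((new_h - height) / 2), 0)
    return pad_x, pad_y


# a 400 x 200 billboard crop rotated by 30 degrees
canvas = rotated_canvas_size(400, 200, 30)
pad_x, pad_y = padding_needed(400, 200, 30)
```

For this example the vertical padding comes out larger than 50 pixels, so for strongly tilted billboards it may be worth computing the padding from a first angle estimate instead of hard-coding it.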

  6. Finally, extract text from the rotated (deskewed) text-images.

def extract_text_from_deskewed_text_image(image):

    # set padding size
    padding_size = 50

    # pad with black pixels
    padded_image = np.zeros((image.shape[0] + 2 * padding_size, image.shape[1] + 2 * padding_size, 3), dtype=np.uint8)
    padded_image[padding_size:padding_size + image.shape[0], padding_size:padding_size + image.shape[1]] = image

    # create a copy of image
    img = padded_image.copy()

    # get ROIs for text using CRAFT
    text_rois = craft_detector.detect_text(img)['boxes']

    # get angled rois
    angled_rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon

            # convert box points to numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)

            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)

            # get vertices of box (astype replaces np.int0, which NumPy 2.0 removed)
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)

            # append box_vertices on angled_rois list
            angled_rois.append(box_vertices)

    if angled_rois:

        # get largest roi
        largest_roi = max(angled_rois, key=cv2.contourArea)

        # get fitted line: only the unit direction (vx, vy) is needed here
        vx, vy, _, _ = cv2.fitLine(largest_roi, cv2.DIST_L2, 0, 0.01, 0.01).flatten()

        # get angle of the fitted line in degrees
        angle = float(np.degrees(np.arctan2(vy, vx)))

        height, width = img.shape[:2]
        center = (width // 2, height // 2)

        # if angle made by roi is 90 degree then no rotation needed
        if angle != 90:
            rotation_matrix = cv2.getRotationMatrix2D(center, angle, scale=1.0)
            img = cv2.warpAffine(img, rotation_matrix, (width, height))

    # get ROIs for text again, this time on the deskewed image
    text_rois = craft_detector.detect_text(img)['boxes']

    # draw on a separate copy so boxes and labels never end up in the OCR crops
    img_text = img.copy()

    # get deskewed rois
    deskewed_rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon

            # convert box points to numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)

            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)

            # get vertices of box
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)

            # append box_vertices on deskewed_rois list
            deskewed_rois.append(box_vertices)

            # draw rectangle box around texts
            cv2.drawContours(img_text, [box_vertices], 0, (255, 0, 0), 2)

    # extract text
    for index, roi in enumerate(deskewed_rois):
        # get the bounding box from the deskewed ROI
        x, y, w, h = cv2.boundingRect(roi)

        # crop the text region from the clean deskewed image
        text_roi = img[y:y + h, x:x + w]

        # little preprocessing
        roi_gray = cv2.cvtColor(text_roi, cv2.COLOR_BGR2GRAY)

        # get lines of text from pytesseract
        text_line = f'Line {index + 1} : ' + pytesseract.image_to_string(roi_gray, lang='eng', config=custom_config)

        # put text on image (strip() also drops the trailing form feed Tesseract may append)
        cv2.putText(img_text, text_line.strip(), (0, 30 + index * 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2, cv2.LINE_AA)

    return img_text

Text Extracted from padded Rotated(Deskewed) Text-Image

We can see that pytesseract accurately extracts text from the straightened text-images. This demonstrates that skewed text-images can be rotated (deskewed) with CRAFT and OpenCV so that pytesseract extracts the text accurately.

Final Code

The following code is more readable and modular.

def pad_image(image):
    # set padding size
    padding_size = 50

    # pad with black pixels
    padded_image = np.zeros((image.shape[0] + 2 * padding_size, image.shape[1] + 2 * padding_size, 3), dtype=np.uint8)
    padded_image[padding_size:padding_size + image.shape[0], padding_size:padding_size + image.shape[1]] = image

    return padded_image

def get_rois(image):

    # get ROIs for text using CRAFT
    text_rois = craft_detector.detect_text(image)['boxes']

    # get rois
    rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon

            # convert box points to numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)

            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)

            # get vertices of box (astype replaces np.int0, which NumPy 2.0 removed)
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)

            # append box_vertices on rois list
            rois.append(box_vertices)

    return rois

def rotate_text_image(image, angled_rois):

    # get largest roi
    largest_roi = max(angled_rois, key=cv2.contourArea)

    # get fitted line: only the unit direction (vx, vy) is needed here
    vx, vy, _, _ = cv2.fitLine(largest_roi, cv2.DIST_L2, 0, 0.01, 0.01).flatten()

    # get angle of the fitted line in degrees
    angle = float(np.degrees(np.arctan2(vy, vx)))

    height, width = image.shape[:2]
    center = (width // 2, height // 2)

    # if angle made by roi is 90 degree then no rotation needed
    if angle != 90:
        rotation_matrix = cv2.getRotationMatrix2D(center, angle, scale=1.0)
        image = cv2.warpAffine(image, rotation_matrix, (width, height))

    return image

def extract_text(image, text_rois):

    # copy of image to draw the extracted text on
    img_text = image.copy()

    for index, roi in enumerate(text_rois):
        # get the bounding box from the ROI
        x, y, w, h = cv2.boundingRect(roi)

        # crop the text region
        text_roi = image[y:y + h, x:x + w]

        # little preprocessing
        roi_gray = cv2.cvtColor(text_roi, cv2.COLOR_BGR2GRAY)

        # get lines of text from pytesseract
        text_line = f'Line {index + 1} : ' + pytesseract.image_to_string(roi_gray, lang='eng', config=custom_config)

        # put text on image (strip() also drops the trailing form feed Tesseract may append)
        cv2.putText(img_text, text_line.strip(), (0, 30 + index * 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2, cv2.LINE_AA)

    return img_text

def extract_text_from_deskewed_text_image(image):

    # pad image
    img = pad_image(image)

    # get angled rois
    angled_rois = get_rois(img)

    # rotate text image
    if angled_rois:
        img = rotate_text_image(img, angled_rois)

    # get deskewed rois
    deskewed_rois = get_rois(img)

    # extract text
    img_text = extract_text(img, deskewed_rois)

    return img_text
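One optional refinement: the text extraction step numbers the lines in whatever order the ROIs arrive, and I have not checked whether CRAFT guarantees any particular order. If the "Line N" labels should follow reading order, the ROIs can be sorted by the position of their centers before extraction. A minimal sketch, where `sort_rois_top_to_bottom` is a hypothetical helper and the sample boxes are made up:

```python
def sort_rois_top_to_bottom(rois):
    """Order ROIs by the y-coordinate of their center, then by x."""
    def center(roi):
        xs = [float(x) for x, _ in roi]
        ys = [float(y) for _, y in roi]
        return sum(ys) / len(ys), sum(xs) / len(xs)
    return sorted(rois, key=center)


# three boxes given out of reading order; each box is four (x, y) vertices
bottom = [(10, 200), (110, 200), (110, 230), (10, 230)]
top = [(10, 20), (110, 20), (110, 50), (10, 50)]
middle = [(10, 100), (110, 100), (110, 130), (10, 130)]

ordered = sort_rois_top_to_bottom([bottom, top, middle])
```

The helper also accepts the NumPy vertex arrays used throughout this post, since it only iterates over (x, y) pairs. It assumes one ROI per text line, which is most plausible once the image has been deskewed.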

Hope the blog was useful to you. Cheers.

You can connect with me here:

Github

Linkedin
