As an AI engineer, I was tasked with building a module to extract text from billboards in 360° images. I built a billboard detection model with YOLOv8 and planned to run pytesseract on top of a text detection model applied to the detected billboards. The text detection model I chose was CRAFT. Because the images were real-world 360° captures, most of the detected billboards contained skewed or rotated text, which made it difficult to extract text accurately with pytesseract. You may run into the same problem whenever you extract text from real-world objects.
I searched the internet for solutions. There were many OpenCV-based approaches, but they would work on one image and fail on another. Finally, I figured out a method that works for almost all cases: CRAFT for text detection and OpenCV for retrieving the angle and manipulating the image, where manipulating simply means rotating it. This blog illustrates a workflow to extract text accurately from rotated (skewed) text images.
We will use the following six images to demonstrate and solve the problem.
Test Images
Prerequisites
Before we get started, let’s make sure we have all the necessary libraries in place. Here’s a list of prerequisites, along with links for easy installation:
1. Python (preferably version 3.8)
2. NumPy
3. OpenCV
4. CRAFT
5. pytesseract
6. CUDA (optional)
Import Libraries and Load Models
Import the installed libraries.
import os
import numpy as np
import cv2
import pytesseract
from craft_text_detector import Craft
Load the CRAFT model and configure pytesseract. CRAFT is used for text detection and pytesseract for text extraction.
# set cuda = True if you have set up CUDA on your system
# go through https://github.com/clovaai/CRAFT-pytorch for more information on CRAFT
craft_detector = Craft(crop_type="poly", cuda=True, text_threshold=0.8, link_threshold=0.4, low_text=0.25)
# set the following paths according to how you installed pytesseract
# custom config for pytesseract:
# --oem 3 selects the OCR engine based on what is available
# --psm 11 treats the image as sparse text, finding as much text as possible in no particular order
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
os.environ["TESSDATA_PREFIX"] = r'C:\Program Files\Tesseract-OCR\tessdata'
custom_config = r'-c preserve_interword_spaces=5 --oem 3 --psm 11'
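Before moving on, it can help to sanity-check that the Tesseract paths above are correct. Here is a minimal check; the image path is hypothetical, so point it at any image containing text:
# quick sanity check for the Tesseract setup
sample = cv2.imread('test_images/test_1.jpg')  # hypothetical path, use any text image you have
sample_gray = cv2.cvtColor(sample, cv2.COLOR_BGR2GRAY)
print(pytesseract.image_to_string(sample_gray, lang='eng', config=custom_config))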
Text Extraction on Rotated (Skewed) Text-Images
Let's see what problems we face while extracting text from rotated text-images. The general workflow is:
1. Detect text using CRAFT and return the text Regions of Interest (ROIs).
def detect_text_rois(image):
    # create a copy of the image
    img = image.copy()
    # get ROIs for text using CRAFT
    text_rois = craft_detector.detect_text(img)['boxes']
    # get angled rois
    angled_rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon
            # convert box points to a numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)
            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)
            # get the vertices of the box
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)
            # append box_vertices to the angled_rois list
            angled_rois.append(box_vertices)
            # draw a rectangle around the text
            cv2.drawContours(img, [box_vertices], 0, (255, 0, 0), 2)
    return img
Bounding Box for angled ROIs of text
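For reference, each visualization function in this post is used the same way: read an image with OpenCV, pass it to the function, and save or display the annotated image it returns. A minimal sketch, with hypothetical file paths:
image = cv2.imread('test_images/test_1.jpg')   # hypothetical input path
annotated = detect_text_rois(image)
cv2.imwrite('results/test_1_rois.jpg', annotated)  # hypothetical output path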
2. Use pytesseract to extract text from the angled ROIs.
def detect_text_rois_and_extract_text(image):
    # create a copy of the image
    img = image.copy()
    img_text = img.copy()
    # get ROIs for text using CRAFT
    text_rois = craft_detector.detect_text(img)['boxes']
    # get angled rois
    angled_rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon
            # convert box points to a numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)
            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)
            # get the vertices of the box
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)
            # append box_vertices to the angled_rois list
            angled_rois.append(box_vertices)
            # draw a rectangle around the text
            cv2.drawContours(img_text, [box_vertices], 0, (255, 0, 0), 2)
    for index, roi in enumerate(angled_rois):
        # get the upright bounding box from the angled ROI
        x, y, w, h = cv2.boundingRect(roi)
        # crop the text region
        text_roi = img[y:y + h, x:x + w]
        # a little preprocessing
        roi_gray = cv2.cvtColor(text_roi, cv2.COLOR_BGR2GRAY)
        # get a line of text from pytesseract
        text_line = f'Line {index + 1} : ' + pytesseract.image_to_string(roi_gray, lang='eng', config=custom_config)
        # put the text on the image
        cv2.putText(img_text, text_line.strip('\n'), (0, 30 + index * 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2, cv2.LINE_AA)
    return img_text
Text Extracted from Skewed Text-Image
From the illustrations above, we can see that pytesseract works well for straight text images. It tries its best on rotated (skewed) text too, but it is not reliable. So, to extract text from these rotated (skewed) text images, we first need to straighten (deskew) them.
General Workflow to Straighten (Deskew) Text-Images
Now let's discuss the general workflow to straighten the text images. For now we will focus on the concept behind each step; complete and readable code is given at the end.
1. Find the largest angled ROI of text. We will rotate the text-image based on the angle made by this largest ROI.
def detect_largest_angled_text_roi(image):
    # create a copy of the image
    img = image.copy()
    # get ROIs for text using CRAFT
    text_rois = craft_detector.detect_text(img)['boxes']
    # get angled rois
    angled_rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon
            # convert box points to a numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)
            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)
            # get the vertices of the box
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)
            # append box_vertices to the angled_rois list
            angled_rois.append(box_vertices)
    if angled_rois:
        # get the largest roi
        largest_roi = max(angled_rois, key=cv2.contourArea)
        # draw a rectangle around it
        cv2.drawContours(img, [largest_roi], 0, (255, 0, 0), 2)
    return img
Largest ROI of text
2. Fit a line to the largest ROI. We could use cv2.minAreaRect(), which returns the center (x, y), the (width, height) and the angle of rotation, but I found the angle it returns to be very inconsistent. So we get the angle a different way, using cv2.fitLine. The following is just for visualizing and understanding how a line is fitted to the largest text ROI.
def fit_line_on_largest_roi(image):
    # create a copy of the image
    img = image.copy()
    # get ROIs for text using CRAFT
    text_rois = craft_detector.detect_text(img)['boxes']
    # get angled rois
    angled_rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon
            # convert box points to a numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)
            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)
            # get the vertices of the box
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)
            # append box_vertices to the angled_rois list
            angled_rois.append(box_vertices)
    if angled_rois:
        # get the largest roi
        largest_roi = max(angled_rois, key=cv2.contourArea)
        # draw a rectangle around it
        cv2.drawContours(img, [largest_roi], 0, (255, 0, 0), 2)
        # fit a line through the largest roi
        [vx, vy, x, y] = cv2.fitLine(largest_roi, cv2.DIST_L2, 0, 0.01, 0.01)
        # calculate start and end points for the line to be drawn
        lefty = int((-x * vy / vx) + y)
        righty = int(((img.shape[1] - x) * vy / vx) + y)
        # draw the fitted line
        cv2.line(img, (0, lefty), (img.shape[1] - 1, righty), (0, 255, 0), 2)
    return img
Line Fitted on largest ROI of text
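For clarity, cv2.fitLine returns a unit direction vector (vx, vy) and a point (x, y) on the fitted line; the angle used in the next step comes from that direction vector. A small standalone check with made-up points shows the idea:
# made-up points lying roughly on a line at ~20 degrees from the x-axis
pts = np.array([[0, 0], [100, 36], [200, 73], [300, 109]], dtype=np.float32)
vx, vy, x, y = cv2.fitLine(pts, cv2.DIST_L2, 0, 0.01, 0.01).flatten()
angle = np.degrees(np.arctan2(vy, vx))
print(f'direction: ({vx:.2f}, {vy:.2f}), angle: {angle:.1f} degrees')  # roughly 20.0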
3. Get the angle of the fitted line. This angle is used to rotate the text image and straighten it.
def get_angle_of_fitted_line(image):
    # create a copy of the image
    img = image.copy()
    # get ROIs for text using CRAFT
    text_rois = craft_detector.detect_text(img)['boxes']
    # get angled rois
    angled_rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon
            # convert box points to a numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)
            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)
            # get the vertices of the box
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)
            # append box_vertices to the angled_rois list
            angled_rois.append(box_vertices)
    if angled_rois:
        # get the largest roi
        largest_roi = max(angled_rois, key=cv2.contourArea)
        # draw a rectangle around it
        cv2.drawContours(img, [largest_roi], 0, (255, 0, 0), 2)
        # fit a line through the largest roi
        [vx, vy, x, y] = cv2.fitLine(largest_roi, cv2.DIST_L2, 0, 0.01, 0.01)
        # calculate start and end points for the line to be drawn
        lefty = int((-x * vy / vx) + y)
        righty = int(((img.shape[1] - x) * vy / vx) + y)
        # draw the fitted line
        cv2.line(img, (0, lefty), (img.shape[1] - 1, righty), (0, 255, 0), 2)
        # get the angle of the fitted line
        angle_rad = np.arctan2(vy, vx)
        angle = np.degrees(angle_rad)[0]
        # display the angle
        cv2.putText(img, f'Angle : {angle}', (0, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2, cv2.LINE_AA)
    return img
Angle of Fitted Line
4. Finally, rotate the image by the calculated angle.
def rotate_text_image(image):
    # create a copy of the image
    img = image.copy()
    # get ROIs for text using CRAFT
    text_rois = craft_detector.detect_text(img)['boxes']
    # get angled rois
    angled_rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon
            # convert box points to a numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)
            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)
            # get the vertices of the box
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)
            # append box_vertices to the angled_rois list
            angled_rois.append(box_vertices)
    if angled_rois:
        # get the largest roi
        largest_roi = max(angled_rois, key=cv2.contourArea)
        # draw a rectangle around it
        cv2.drawContours(img, [largest_roi], 0, (255, 0, 0), 2)
        # fit a line through the largest roi
        [vx, vy, x, y] = cv2.fitLine(largest_roi, cv2.DIST_L2, 0, 0.01, 0.01)
        # calculate start and end points for the line to be drawn
        lefty = int((-x * vy / vx) + y)
        righty = int(((img.shape[1] - x) * vy / vx) + y)
        # draw the fitted line
        cv2.line(img, (0, lefty), (img.shape[1] - 1, righty), (0, 255, 0), 2)
        # get the angle of the fitted line
        angle_rad = np.arctan2(vy, vx)
        angle = np.degrees(angle_rad)[0]
        height, width = img.shape[:2]
        center = (width // 2, height // 2)
        # if the angle made by the roi is 90 degrees, no rotation is needed
        if angle != 90:
            rotation_matrix = cv2.getRotationMatrix2D(center, angle, scale=1.0)
            img = cv2.warpAffine(img, rotation_matrix, (width, height))
    return img
Rotation of Text-Images with respect to fitted line
From the fitted lines we can see that all of them are now roughly parallel to the x-axis, so the images are straightened. But there is a slight problem: part of the original image gets cut out of the frame. So we must pad the text-image first.
5. Pad the image and then rotate it.
def padd_and_rotate_text_image(image):
    # set the padding size
    padding_size = 50
    # pad with black pixels
    padded_image = np.zeros((image.shape[0] + 2 * padding_size, image.shape[1] + 2 * padding_size, 3), dtype=np.uint8)
    padded_image[padding_size:padding_size + image.shape[0], padding_size:padding_size + image.shape[1]] = image
    # create a copy of the padded image
    img = padded_image.copy()
    # get ROIs for text using CRAFT
    text_rois = craft_detector.detect_text(img)['boxes']
    # get angled rois
    angled_rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon
            # convert box points to a numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)
            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)
            # get the vertices of the box
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)
            # append box_vertices to the angled_rois list
            angled_rois.append(box_vertices)
    if angled_rois:
        # get the largest roi
        largest_roi = max(angled_rois, key=cv2.contourArea)
        # draw a rectangle around it
        cv2.drawContours(img, [largest_roi], 0, (255, 0, 0), 2)
        # fit a line through the largest roi
        [vx, vy, x, y] = cv2.fitLine(largest_roi, cv2.DIST_L2, 0, 0.01, 0.01)
        # calculate start and end points for the line to be drawn
        lefty = int((-x * vy / vx) + y)
        righty = int(((img.shape[1] - x) * vy / vx) + y)
        # draw the fitted line
        cv2.line(img, (0, lefty), (img.shape[1] - 1, righty), (0, 255, 0), 2)
        # get the angle of the fitted line
        angle_rad = np.arctan2(vy, vx)
        angle = np.degrees(angle_rad)[0]
        height, width = img.shape[:2]
        center = (width // 2, height // 2)
        # if the angle made by the roi is 90 degrees, no rotation is needed
        if angle != 90:
            rotation_matrix = cv2.getRotationMatrix2D(center, angle, scale=1.0)
            img = cv2.warpAffine(img, rotation_matrix, (width, height))
    return img
Padding and Rotation of Text-Image
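As a side note, the manual zero-padding above can also be written with OpenCV's cv2.copyMakeBorder, which produces the same black border; a minimal equivalent sketch:
padding_size = 50
padded_image = cv2.copyMakeBorder(image, padding_size, padding_size, padding_size, padding_size,
                                  borderType=cv2.BORDER_CONSTANT, value=(0, 0, 0))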
6. Finally, extract text from the rotated (deskewed) text-images.
def extract_text_from_deskewed_text_image(image):
    # set the padding size
    padding_size = 50
    # pad with black pixels
    padded_image = np.zeros((image.shape[0] + 2 * padding_size, image.shape[1] + 2 * padding_size, 3), dtype=np.uint8)
    padded_image[padding_size:padding_size + image.shape[0], padding_size:padding_size + image.shape[1]] = image
    # create a copy of the padded image
    img = padded_image.copy()
    # get ROIs for text using CRAFT
    text_rois = craft_detector.detect_text(img)['boxes']
    # get angled rois
    angled_rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon
            # convert box points to a numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)
            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)
            # get the vertices of the box
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)
            # append box_vertices to the angled_rois list
            angled_rois.append(box_vertices)
    if angled_rois:
        # get the largest roi
        largest_roi = max(angled_rois, key=cv2.contourArea)
        # fit a line through the largest roi
        [vx, vy, x, y] = cv2.fitLine(largest_roi, cv2.DIST_L2, 0, 0.01, 0.01)
        # get the angle of the fitted line
        angle_rad = np.arctan2(vy, vx)
        angle = np.degrees(angle_rad)[0]
        height, width = img.shape[:2]
        center = (width // 2, height // 2)
        # if the angle made by the roi is 90 degrees, no rotation is needed
        if angle != 90:
            rotation_matrix = cv2.getRotationMatrix2D(center, angle, scale=1.0)
            img = cv2.warpAffine(img, rotation_matrix, (width, height))
    # get ROIs for text again on the deskewed image
    text_rois = craft_detector.detect_text(img)['boxes']
    # get deskewed rois
    deskewed_rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon
            # convert box points to a numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)
            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)
            # get the vertices of the box
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)
            # append box_vertices to the deskewed_rois list
            deskewed_rois.append(box_vertices)
            # draw a rectangle around the text
            cv2.drawContours(img, [box_vertices], 0, (255, 0, 0), 2)
    # extract text
    img_text = img.copy()
    for index, roi in enumerate(deskewed_rois):
        # get the upright bounding box from the deskewed ROI
        x, y, w, h = cv2.boundingRect(roi)
        # crop the text region
        text_roi = img[y:y + h, x:x + w]
        # a little preprocessing
        roi_gray = cv2.cvtColor(text_roi, cv2.COLOR_BGR2GRAY)
        # get a line of text from pytesseract
        text_line = f'Line {index + 1} : ' + pytesseract.image_to_string(roi_gray, lang='eng', config=custom_config)
        # put the text on the image
        cv2.putText(img_text, text_line.strip('\n'), (0, 30 + index * 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2, cv2.LINE_AA)
    return img_text
Text Extracted from padded Rotated(Deskewed) Text-Image
We can see that pytesseract accurately extracts text from the straightened text-images. This demonstrates that skewed text-images can be rotated (deskewed) with CRAFT and OpenCV, after which pytesseract extracts the text accurately.
Final Code
The following code is more readable and modular.
def padd_image(image):
    # set the padding size
    padding_size = 50
    # pad with black pixels
    padded_image = np.zeros((image.shape[0] + 2 * padding_size, image.shape[1] + 2 * padding_size, 3), dtype=np.uint8)
    padded_image[padding_size:padding_size + image.shape[0], padding_size:padding_size + image.shape[1]] = image
    return padded_image
def get_rois(image):
    # get ROIs for text using CRAFT
    text_rois = craft_detector.detect_text(image)['boxes']
    # get rois
    rois = []
    for roi in text_rois:
        if len(roi) >= 3:  # ensure at least 3 points for a polygon
            # convert box points to a numpy array for easier manipulation
            box_points = np.array(roi, dtype=np.int32).reshape(-1, 2)
            # calculate the minimum bounding rectangle
            rotated_rect = cv2.minAreaRect(box_points)
            # get the vertices of the box
            box_vertices = cv2.boxPoints(rotated_rect)
            box_vertices = box_vertices.astype(np.int32)
            # append box_vertices to the rois list
            rois.append(box_vertices)
    return rois
def rotate_text_image(image, angled_rois):
    # get the largest roi
    largest_roi = max(angled_rois, key=cv2.contourArea)
    # fit a line through the largest roi
    [vx, vy, x, y] = cv2.fitLine(largest_roi, cv2.DIST_L2, 0, 0.01, 0.01)
    # get the angle of the fitted line
    angle_rad = np.arctan2(vy, vx)
    angle = np.degrees(angle_rad)[0]
    height, width = image.shape[:2]
    center = (width // 2, height // 2)
    # if the angle made by the roi is 90 degrees, no rotation is needed
    if angle != 90:
        rotation_matrix = cv2.getRotationMatrix2D(center, angle, scale=1.0)
        image = cv2.warpAffine(image, rotation_matrix, (width, height))
    return image
def extract_text(image, text_rois):
    # copy of the image for annotation
    img_text = image.copy()
    for index, roi in enumerate(text_rois):
        # get the upright bounding box from the angled ROI
        x, y, w, h = cv2.boundingRect(roi)
        # crop the text region
        text_roi = image[y:y + h, x:x + w]
        # a little preprocessing
        roi_gray = cv2.cvtColor(text_roi, cv2.COLOR_BGR2GRAY)
        # get a line of text from pytesseract
        text_line = f'Line {index + 1} : ' + pytesseract.image_to_string(roi_gray, lang='eng', config=custom_config)
        # put the text on the image
        cv2.putText(img_text, text_line.strip('\n'), (0, 30 + index * 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2, cv2.LINE_AA)
    return img_text
def extract_text_from_deskewed_text_image(image):
    # pad the image
    img = padd_image(image)
    # get angled rois
    angled_rois = get_rois(img)
    # rotate the text image
    if angled_rois:
        img = rotate_text_image(img, angled_rois)
    # get deskewed rois
    deskewed_rois = get_rois(img)
    # extract text
    img_text = extract_text(img, deskewed_rois)
    return img_text
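To close the loop, here is a minimal sketch of how the final pipeline can be run over the test images; the file names are hypothetical, and the unload calls follow the craft-text-detector README:
if __name__ == '__main__':
    for i in range(1, 7):
        image = cv2.imread(f'test_images/test_{i}.jpg')  # hypothetical paths
        if image is None:
            continue
        result = extract_text_from_deskewed_text_image(image)
        cv2.imwrite(f'results/result_{i}.jpg', result)
    # free the CRAFT models when done (per the craft-text-detector README)
    craft_detector.unload_craftnet_model()
    craft_detector.unload_refinenet_model()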
I hope this blog was useful to you. Cheers.