Real-Time Image Processing using WebSockets and Flask in Python and JavaScript

Introduction

Real-time image processing has become increasingly important in recent years, with applications ranging from facial recognition to autonomous vehicles. In this tutorial, we will learn how to perform real-time image processing on frames from the client’s webcam. We will use Flask, Socket.IO, OpenCV, and HTML/CSS/JavaScript to create a simple web application that captures real-time video from the user’s camera, performs some image processing on it, and displays the processed video back to the user. The complete code is available in the accompanying GitHub repository.

Prerequisites

Before we start, you should have a basic understanding of Python, HTML, CSS, and JavaScript. You will also need to install the requirements listed below. It is best to create and activate a virtual environment before installing them, which you can do by running the following commands:

# Create Virtual Environment
python -m venv env

# Activate Virtual Environment
# Linux/macOS
source env/bin/activate
# Windows (Command Prompt)
env\Scripts\activate

Install requirements by running pip install -r requirements.txt

# requirements.txt

Flask-SocketIO==4.3.1
python-engineio==3.13.2
python-socketio==4.6.0
Flask==2.0.3
Werkzeug==2.0.3
opencv_python==4.7.0.68
numpy==1.24.2
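
The version pins matter because the page loads a 2.x Socket.IO JavaScript client, which, as far as the python-socketio compatibility notes go, pairs with the 4.x line of python-socketio. As a quick sanity check after installation, the following sketch compares the installed versions against the pins using the standard library's importlib.metadata. The check_pins helper and the PINNED dictionary are illustrative names, not part of the tutorial's code.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements.txt above.
PINNED = {
    "Flask-SocketIO": "4.3.1",
    "python-engineio": "3.13.2",
    "python-socketio": "4.6.0",
    "Flask": "2.0.3",
    "Werkzeug": "2.0.3",
    "opencv-python": "4.7.0.68",
    "numpy": "1.24.2",
}


def check_pins(pins, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

If the script prints nothing, every pinned package is installed at the expected version.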

Understanding the Code

The code provided consists of three main parts: HTML, JavaScript, and Python.

HTML

The HTML code contains a video element that displays the captured video, a canvas element used to grab individual video frames before they are sent to the server, and an img element that shows the processed frames. The JavaScript file, which contains the logic to capture the frames and send them to the server, is linked at the bottom of the HTML code.

JavaScript

The JavaScript code contains the logic to capture the video frames, convert them to Base64 format, and send them to the server using Socket.IO. The code also receives the processed image from the server and displays it on the webpage.

Python

The Python code contains the Flask application and Socket.IO code that receives the video frames, processes them, and sends back the processed image to the client.

Building the Application

Now let’s dive into building the application step by step.

Step 1: Setting Up Flask and Socket.IO

We will start by creating a Flask application and setting up Socket.IO:

from flask import Flask, render_template, send_from_directory
from flask_socketio import SocketIO, emit

app = Flask(__name__, static_folder="./templates/static")
app.config["SECRET_KEY"] = "secret!"
socketio = SocketIO(app)

Step 2: Setting up the Index Route

We will now set up the index route, which will be the main page of our web application. In this step, we will create a simple HTML template that will contain the video and canvas elements.

@app.route("/")
def index():
    return render_template("index.html")

HTML Template

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <title>Flask Client Camera Web App</title>

    <style>
     #videoElement {
      transform: rotateY(180deg);
      -webkit-transform:rotateY(180deg); /* Safari and Chrome */
      -moz-transform:rotateY(180deg); /* Firefox */

     }
    </style>
    <script
      src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
    <script
      src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/2.0.3/socket.io.js"></script>
  </head>

  <body>

    <div id="container">
      <video autoplay playsinline id="videoElement"></video>
      <canvas id="canvas" width="400" height="300"></canvas>
    </div>

    <div class="video">
      <img id="photo" width="400" height="300">
    </div>
    <script src="{{ url_for('static',filename='script.js') }}"></script>
  </body>

</html>

Step 3: Capturing Video from the User’s Camera

Now that we have set up the HTML and CSS, we can move on to writing the JavaScript code. We will first create a socket object to connect to the server using Socket.IO. We will then get references to the canvas and video elements and request the video stream from the user’s camera.

var socket = io.connect(
  window.location.protocol + "//" + document.domain + ":" + location.port
);
socket.on("connect", function () {
  console.log("Connected...!", socket.connected);
});

var canvas = document.getElementById("canvas");
var context = canvas.getContext("2d");
const video = document.querySelector("#videoElement");

video.width = 400;
video.height = 300;

if (navigator.mediaDevices.getUserMedia) {
  navigator.mediaDevices
    .getUserMedia({
      video: true,
    })
    .then(function (stream) {
      video.srcObject = stream;
      video.play();
    })
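    .catch(function (error) {
      // Report camera errors (for example, permission denied) instead of ignoring them
      console.error("Could not access the camera:", error);
    });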
}

Here, we create a socket object and connect it to the server. We then request access to the user’s camera with the getUserMedia API and attach the resulting stream to the video element, whose width and height we set to 400 and 300, respectively. The canvas and its 2D context are used in the next snippet to grab individual frames. Next, we set up a function that captures a frame from the video at a fixed rate and sends it to the server for processing.

const FPS = 10;
setInterval(() => {
  const width = video.width;
  const height = video.height;
  context.drawImage(video, 0, 0, width, height);
  var data = canvas.toDataURL("image/jpeg", 0.5);
  context.clearRect(0, 0, width, height);
  socket.emit("image", data);
}, 1000 / FPS);

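// Look up the <img> element explicitly rather than relying on an implicit global
const photo = document.getElementById("photo");
socket.on("processed_image", function (image) {
  photo.setAttribute("src", image);
});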

Here, the FPS constant sets the capture rate to 10 frames per second, so the callback runs every 100 ms. Each time it runs, we draw the current video frame onto the canvas and call toDataURL() to get the frame as a base64-encoded JPEG data URL at quality 0.5. We then clear the canvas with clearRect() and send the data URL to the server by emitting an “image” event on the socket. We also register a handler for the processed_image event, which receives the processed frame from the server and displays it in the img element on the page. Now that we have written the JavaScript code, we can move on to writing the Python code to process the image on the server.
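
To get a feel for what these settings cost, the arithmetic can be sketched in a few lines of Python. Base64 turns every 3 bytes into 4 characters, so each frame grows by about a third before it is sent. The function names and the 15 kB frame size below are assumptions chosen for illustration; the actual JPEG size depends on the scene and the quality setting.

```python
import math

FPS = 10  # same value as the FPS constant in the JavaScript code


def frame_interval_ms(fps):
    """Delay between captures, as passed to setInterval (1000 / FPS)."""
    return 1000 / fps


def base64_length(num_bytes):
    """Length of the padded base64 text for num_bytes of binary data."""
    return 4 * math.ceil(num_bytes / 3)


def upload_rate_bytes_per_s(jpeg_bytes, fps):
    """Approximate payload per second, ignoring the data URL prefix and protocol framing."""
    return base64_length(jpeg_bytes) * fps


# Hypothetical example: a 15 kB JPEG frame.
print(frame_interval_ms(FPS))                 # 100.0
print(base64_length(15_000))                  # 20000
print(upload_rate_bytes_per_s(15_000, FPS))   # 200000
```

Under these assumptions the client uploads roughly 200 kB per second, which is why the tutorial keeps the frame small and the JPEG quality low.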

Step 4: Processing Client Webcam Stream

The Python code will use Flask and Flask-SocketIO to set up a web server and handle incoming connections from the client. First, we need to import the required libraries.

import base64
import os
import cv2
import numpy as np
from flask import Flask, render_template, send_from_directory
from flask_socketio import SocketIO, emit

We use the Flask library to create an instance of the web application, and the Flask-SocketIO library to add support for real-time communication between the client and server. Next, we create a Flask application and initialize the SocketIO object.

app = Flask(__name__, static_folder="./templates/static")
app.config["SECRET_KEY"] = "secret!"
socketio = SocketIO(app)

We set a secret key for the Flask application, which is used to sign session cookies. We then initialize the SocketIO object with our Flask application. Next, we define a helper function that will convert a base64-encoded image to a NumPy array that can be processed using OpenCV.
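
The hard-coded "secret!" value is fine for a local demo, but outside of one you would normally not commit the key to source control. One possible approach, sketched here with only the standard library, reads the key from an environment variable and falls back to a random key for local development. The load_secret_key helper and the FLASK_SECRET_KEY variable name are hypothetical and not part of the tutorial's code.

```python
import os
import secrets


def load_secret_key(env_var="FLASK_SECRET_KEY"):
    """Return the key from the environment, or a random one for local development."""
    key = os.environ.get(env_var)
    if key:
        return key
    # 32 random bytes, hex-encoded (64 characters). A new key is generated on
    # each start, so existing session cookies stop validating after a restart.
    return secrets.token_hex(32)


# In the application this would replace the literal:
#   app.config["SECRET_KEY"] = load_secret_key()
```

Returning to the tutorial's code, the helper function that decodes incoming frames looks like this: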

def base64_to_image(base64_string):
    # Extract the base64 encoded binary data from the input string
    base64_data = base64_string.split(",")[1]
    # Decode the base64 data to bytes
    image_bytes = base64.b64decode(base64_data)
    # Convert the bytes to numpy array
    image_array = np.frombuffer(image_bytes, dtype=np.uint8)
    # Decode the numpy array as an image using OpenCV
    image = cv2.imdecode(image_array, cv2.IMREAD_COLOR)
    return image

This function takes a base64-encoded string as input, extracts the binary data, decodes it to bytes, and converts it to a NumPy array. Finally, it decodes the NumPy array as an image using OpenCV and returns the resulting image. Next, we define a function to handle incoming connections from the client.
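
The helper assumes its input is always a well-formed data URL of the form data:<mime type>;base64,<payload>. If the string has no comma, split(",")[1] raises an IndexError, and if the bytes are not a valid image, cv2.imdecode returns None. The sketch below shows the string-handling half with explicit validation, using only the standard library. split_data_url is a hypothetical name, and the sample payload is just the three bytes that start a JPEG file, not a full image.

```python
import base64
import binascii


def split_data_url(data_url):
    """Split 'data:<mime>;base64,<payload>' into (mime_type, raw_bytes).

    Raises ValueError if the string is not a base64 data URL.
    """
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    try:
        raw = base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid base64") from exc
    return mime_type, raw


# JPEG files start with the bytes FF D8 FF, which base64-encode to "/9j/".
sample = "data:image/jpeg;base64," + base64.b64encode(b"\xff\xd8\xff").decode()
print(split_data_url(sample))  # ('image/jpeg', b'\xff\xd8\xff')
```

The raw bytes returned here are what the tutorial's helper passes on to np.frombuffer and cv2.imdecode. With decoding covered, the next piece of server code is the connection handler: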

@socketio.on("connect")
def test_connect():
    print("Connected")
    emit("my response", {"data": "Connected"})

This function is called whenever a client connects to the server. It simply prints a message to the console and sends a “my response” event back to the client with the message “Connected”. We then define a function to handle incoming images from the client.

@socketio.on("image")
def receive_image(image):
    # Decode the base64-encoded image data
    image = base64_to_image(image)

    # Perform image processing using OpenCV
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    frame_resized = cv2.resize(gray, (640, 360))

    # Encode the processed image as a JPEG-encoded base64 string
    encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), 90]
    result, frame_encoded = cv2.imencode(".jpg", frame_resized, encode_param)
    processed_img_data = base64.b64encode(frame_encoded).decode()

    # Prepend the base64-encoded string with the data URL prefix
    b64_src = "data:image/jpeg;base64,"
    processed_img_data = b64_src + processed_img_data

    # Send the processed image back to the client
    emit("processed_image", processed_img_data)

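This function is called whenever an “image” event is received from the client. It first decodes the base64-encoded image data using the base64_to_image helper function we defined earlier. It then converts the frame to grayscale, resizes it to 640×360, encodes the result as a JPEG at quality 90, wraps it in a base64 data URL, and emits it back to the client as a “processed_image” event. Next, we run the application: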

if __name__ == "__main__":
    socketio.run(app, debug=True, port=5000, host='0.0.0.0')
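
The last step of the image handler, turning encoded JPEG bytes into a string that an img element can use as its src, can be isolated into a small helper. This is a minimal sketch using only the standard library; to_data_url is a hypothetical name, and the three-byte input stands in for the real output of cv2.imencode.

```python
import base64


def to_data_url(image_bytes, mime_type="image/jpeg"):
    """Wrap already-encoded image bytes in a base64 data URL."""
    payload = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


# Stand-in for the bytes produced by cv2.imencode(".jpg", ...): only the JPEG magic bytes.
print(to_data_url(b"\xff\xd8\xff"))  # data:image/jpeg;base64,/9j/
```

In the handler, the equivalent call would be to_data_url(frame_encoded.tobytes()), since cv2.imencode returns the encoded image as a NumPy array.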

Here is the complete Python code:

import base64
import os

import cv2
import numpy as np
from flask import Flask, render_template, send_from_directory
from flask_socketio import SocketIO, emit

app = Flask(__name__, static_folder="./templates/static")
app.config["SECRET_KEY"] = "secret!"
socketio = SocketIO(app)


@app.route("/favicon.ico")
def favicon():
    return send_from_directory(
        os.path.join(app.root_path, "static"),
        "favicon.ico",
        mimetype="image/vnd.microsoft.icon",
    )


def base64_to_image(base64_string):
    # Extract the base64 encoded binary data from the input string
    base64_data = base64_string.split(",")[1]
    # Decode the base64 data to bytes
    image_bytes = base64.b64decode(base64_data)
    # Convert the bytes to numpy array
    image_array = np.frombuffer(image_bytes, dtype=np.uint8)
    # Decode the numpy array as an image using OpenCV
    image = cv2.imdecode(image_array, cv2.IMREAD_COLOR)
    return image


@socketio.on("connect")
def test_connect():
    print("Connected")
    emit("my response", {"data": "Connected"})


@socketio.on("image")
def receive_image(image):
    # Decode the base64-encoded image data
    image = base64_to_image(image)

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    frame_resized = cv2.resize(gray, (640, 360))

    encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), 90]

    result, frame_encoded = cv2.imencode(".jpg", frame_resized, encode_param)

    processed_img_data = base64.b64encode(frame_encoded).decode()

    b64_src = "data:image/jpeg;base64,"
    processed_img_data = b64_src + processed_img_data

    emit("processed_image", processed_img_data)


@app.route("/")
def index():
    return render_template("index.html")


if __name__ == "__main__":
    socketio.run(app, debug=True, port=5000, host='0.0.0.0')

With both the Python and JavaScript code in place, we can start the Flask server and test the application by visiting http://localhost:5000 in a web browser. If everything is set up correctly, the browser will ask for permission to use the camera, and the page will then show the live webcam video alongside the grayscale version processed in real time by the Python code running on the server.

Conclusion

In conclusion, we have seen how to perform real-time image processing using the client’s webcam and Python Flask as the backend server. We used JavaScript to capture the video stream from the user’s camera and send it to the server for processing. On the server side, we decoded the image data using OpenCV and performed some image processing operations on it. Finally, we encoded the processed image data and sent it back to the client to display on the web page. Real-time image processing has numerous applications, such as face detection, object tracking, and augmented reality. By using the techniques and tools demonstrated in this tutorial, you can build web applications that perform more complex image processing tasks in real time.
