Explore the future of Web Scraping. Request a free invite to ScrapeCon 2024

How to import a CSV file into a MySQL database using Python

Using iris data, we will learn how to import a CSV file into a MySQL database using Python

image

Prerequisites

Python 3.8.3, MySQL Workbench 8.0.22, mysql-connector-python

To perform this task, you will need to:

  1. Prepare or identify your data

  2. Connect to MySQL and create a database

  3. Create a table and Import the CSV data into the MySQL database

Step 1. Prepare or identify your data

To begin, prepare or identify the CSV file that you’d like to import to MySQL database. For example, I loaded iris data from GitHub.

import pandas as pd
irisData = pd.read_csv(‘[https://github.com/Muhd-Shahid/Write-Raw-File-into-Database-Server/raw/main/iris.csv',index_col=False](https://github.com/Muhd-Shahid/Write-Raw-File-into-Database-Server/raw/main/iris.csv',index_col=False))
irisData.head()

image

Where, data is the comma delimiter. Each line of the CSV file is terminated by a newline character.

Step 2. Connect to the MySQL and create a database

Now let’s create connection object to connect to MySQL server. The connect() constructor creates a connection to the MySQL server and returns a MySQLConnection object.

import mysql.connector as msql
from mysql.connector import Error

try:
    conn = msql.connect(host='localhost', user='root',
                        password='sql@123')
    if conn.is_connected():
        cursor = conn.cursor()
        cursor.execute("CREATE DATABASE irisDB")
        print("irisDB database is created")

except Error as e:
    print("Error while connecting to MySQL", e)

Step 3. Create a table & Import the CSV data into the MySQL table

3.1. Inserting Pandas DataFrames Into Databases Using INSERT

First, we will insure that column and datatype parameters in the SQL table that we will create must match the number of columns and data types of the CSV file.

Data types of iris DataData types of iris Data

We will create iris table under irisDB database and insert the records in MySQL server.

try:
    conn = msql.connect(host='localhost',
                           database='irisDB', user='root',
                           password='sql@123')
    if conn.is_connected():
        cursor = conn.cursor()
        cursor.execute("select database();")
        record = cursor.fetchone()
        print("You're connected to database: ", record)
        cursor.execute('DROP TABLE IF EXISTS iris;')
        print('Creating table....')
        cursor.execute("CREATE TABLE iris (sepal_length FLOAT(2,1)
                        NOT NULL, sepal_width FLOAT(2,1) NOT NULL,
                        petal_length FLOAT(2,1) NOT NULL,
                        petal_width FLOAT(2,1),species CHAR(11)NOT
                        NULL)")
        print("iris table is created....")
        for i,row in irisData.iterrows():
            sql = "INSERT INTO irisdb.iris VALUES (%s,%s,%s,%s,%s)"
            cursor.execute(sql, tuple(row))
            print("Record inserted")
            # the connection is not autocommitted by default, so we
             must commit to save our changes
            conn.commit()
except Error as e:
    print("Error while connecting to MySQL", e)

3.1.1. Query the database to check our work

let’s query the database to make sure that our inserted data has been saved correctly.

# Execute query
sql = "SELECT * FROM iris"
cursor.execute(sql)

# Fetch all the records
result = cursor.fetchall()
for i in result:
    print(i)

image

3.2. Inserting Pandas DataFrames into a Database Using the to_sql() Function

# import the module
from sqlalchemy import create_engine
# create sqlalchemy engine
engine = create_engine("mysql+pymysql://{user}:{pw}[@localhost](http://twitter.com/localhost)/{db}"
                      .format(user="root", pw="sql@123",
                      db="irisdb"))
# Insert whole DataFrame into MySQL
irisData.to_sql('iris', con = engine, if_exists = 'append', chunksize = 1000,index=False)

What’s happening here?

  • iris is the name of table into which we want to insert our DataFrame.

  • con = engine provides the connection details (recall that we created engine using our authentication details in the previous step).

  • if_exists = 'append'checks whether the table we specified already exists or not, and then appends the new data (if it does exist) or creates a new table (if it doesn’t).

  • chunksize writes records in batches of a given size at a time. By default, all rows will be written at once.

3.2.1. Query the database to check our work

Again, let’s query the database to make sure that our inserted data has been saved correctly.

# Execute query
sql = "SELECT * FROM iris"
cursor.execute(sql)

# Fetch all the records
result = cursor.fetchall()
for i in result:
    print(i)

image

# Close the connection
if (conn.is_connected()):
    cursor.close()
    conn.close()
    print(“MySQL connection is closed”)

Conclusion

In this tutorial, We have discussed how to import Pandas DataFrames into MySQL databases using two different methods, including the highly efficient to_sql() method.

All the code for this article is available as a Jupyter Notebook on GitHub.

Find the Fastest Way to Import CSV Data Into MySQL Database Using Python




Continue Learning