SQL on Google Colab
Colab is a hosted Jupyter Notebook service that lets you run Python code in the cloud with access to GPUs and TPUs. While it’s mainly used for Python, Colab also lets you run SQL commands, making it easy to combine data manipulation and analysis in a single workflow. There are two main ways to use SQL in Colab: magic commands and standard Python database libraries. Magic commands are the simplest and most straightforward, but this post walks through the library approach, pairing pandas with Python’s built-in sqlite3 module, since it keeps query results in dataframes. To start, we first import the required libraries.
import pandas as pd
import sqlite3
Next, we define two SQL helper functions: `pd_to_sqlDB`, which takes a pandas dataframe `input_df` and uploads it to a SQLite table `table_name`, and `sql_query_to_pd`, which runs a query against that database and returns the results as a dataframe.
def pd_to_sqlDB(input_df: pd.DataFrame,
                table_name: str,
                db_name: str = 'default.db') -> None:
    '''Take a pandas dataframe `input_df` and upload it to a `table_name` SQLite table

    Args:
        input_df (pd.DataFrame): Dataframe containing the data to upload to SQLite
        table_name (str): Name of the SQLite table to upload to
        db_name (str, optional): Name of the SQLite database in which the table is created.
            Defaults to 'default.db'.
    '''
    # Step 1: Set up local logging
    import logging
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s %(levelname)s: %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S')

    # Step 2: Find the columns in the dataframe
    cols = input_df.columns
    cols_string = ','.join(cols)
    val_wildcard_string = ','.join(['?'] * len(cols))

    # Step 3: Connect to the DB file if it exists, else create a new file
    con = sqlite3.connect(db_name)
    cur = con.cursor()
    logging.info(f'SQL DB {db_name} created')

    # Step 4: Create the table (IF NOT EXISTS keeps re-runs of the cell from failing)
    sql_string = f"""CREATE TABLE IF NOT EXISTS {table_name} ({cols_string});"""
    cur.execute(sql_string)
    logging.info(f'SQL Table {table_name} created with {len(cols)} columns')

    # Step 5: Upload the dataframe
    rows_to_upload = input_df.to_dict(orient='split')['data']
    sql_string = f"""INSERT INTO {table_name} ({cols_string}) VALUES ({val_wildcard_string});"""
    cur.executemany(sql_string, rows_to_upload)
    logging.info(f'{len(rows_to_upload)} rows uploaded to {table_name}')

    # Step 6: Commit the changes and close the connection
    con.commit()
    con.close()
def sql_query_to_pd(sql_query_string: str, db_name: str = 'default.db') -> pd.DataFrame:
    '''Execute an SQL query and return the results as a pandas dataframe

    Args:
        sql_query_string (str): SQL query string to execute
        db_name (str, optional): Name of the SQLite database to execute the query in.
            Defaults to 'default.db'.

    Returns:
        pd.DataFrame: Results of the SQL query in a pandas dataframe
    '''
    # Step 1: Connect to the SQL DB
    con = sqlite3.connect(db_name)

    # Step 2: Execute the SQL query
    cursor = con.execute(sql_query_string)

    # Step 3: Fetch the data and column names
    result_data = cursor.fetchall()
    cols = [description[0] for description in cursor.description]

    # Step 4: Close the connection
    con.close()

    # Step 5: Return as a dataframe
    return pd.DataFrame(result_data, columns=cols)
Finally, we can upload a dataframe and execute a query against it using the code given below.
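This is a minimal end-to-end sketch of the two helpers together; the CSV file name, table name, and query below are hypothetical placeholders rather than the dataset from the original notebook.

# Step 1: Read a CSV file into a dataframe (hypothetical file name)
input_df = pd.read_csv('sample_data.csv')

# Step 2: Upload the dataframe to a SQLite table
pd_to_sqlDB(input_df, table_name='sample_table', db_name='default.db')

# Step 3: Write the SQL query (placeholder query)
sql_query_string = """
    SELECT *
    FROM sample_table
    LIMIT 10
"""

# Step 4: Execute the query and inspect the results as a dataframe
result_df = sql_query_to_pd(sql_query_string, db_name='default.db')
result_df.head()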
Advantages of using SQL in Google Colab:
- Simplicity: The syntax is straightforward and easy to understand, making it accessible to beginners and experienced users alike.
- Efficiency: Single-line queries can be executed swiftly, while multi-line queries can be organized into code blocks for better readability (see the magic-command sketch after this list).
- Integration with Python: SQL commands can be seamlessly interlaced with Python code, enabling a powerful combination of data analysis and programming.
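For illustration, the magic-command route mentioned at the top might look roughly like the sketch below. This is a hypothetical sketch assuming the ipython-sql extension, which is not used elsewhere in this post; the database file and table name are the same placeholders as above.

# Cell 1: install and load the ipython-sql extension (not preinstalled in Colab)
!pip install ipython-sql
%load_ext sql
%sql sqlite:///default.db

# Cell 2: a single-line query via the %sql line magic
%sql SELECT * FROM sample_table LIMIT 5

# Cell 3: a multi-line query via the %%sql cell magic
# (in a real notebook, %%sql must be the first line of its own cell)
%%sql
SELECT *
FROM sample_table
LIMIT 5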
Whether you’re a seasoned data scientist or a budding SQL enthusiast, you can put SQL to work within the versatile environment of Google Colab.
The fully implemented code and the dataset used are available at jyotidabass/SQL-on-google-colab (github.com).
Cheers!! Happy reading!! Keep learning!!