IBM Data Engineering: Databases and SQL for Data Science with Python
The IBM course on Databases and SQL for Data Science with Python provides a hands-on introduction to querying, analyzing, and managing relational data using SQL. While the fundamentals were familiar from previous work, the course helped reinforce best practices around joins, subqueries, and database design, while also introducing key concepts like stored procedures, views, and ACID transactions essential for scalable data workflows
Module 1: Getting Started with SQL
1. SQL and Databases: The Basics
SQL: Structured Query Language for interacting with relational databases
Database: Structured collection of data (tables, columns, rows)
DBMS: Software system to manage databases (e.g. MySQL, PostgreSQL, SQLite)
SQL commands (DML):
SELECT
,INSERT
,UPDATE
,DELETE
2. SELECT Statement
Retrieves data from one or more tables
SELECT * FROM table; SELECT column1, column2 FROM table;
Use
WHERE
for filtering rows based on a condition:
SELECT * FROM books WHERE book_id = 'B1';
Comparators:
=
,!=
,<>
,<
,>
,<=
,>=
3. Aggregations and Row Limiting
COUNT(*)
: count all rowsCOUNT(column)
: count non-null rowsDISTINCT
: returns unique values from a columnLIMIT
+OFFSET
: restrict and paginate results
4. Data Insertion and Updates
INSERT INTO table (col1, col2) VALUES ('val1', 'val2');
UPDATE books SET title = 'New Title' WHERE book_id = 'B1';
DELETE FROM books WHERE book_id IN ('B2', 'B3');
Module 2: Relational Databases and Tables
1. Core Concepts
Entity = table, Attribute = column
Primary Key = unique row identifier
Foreign Key = column that links to another table's primary key
Common data types:
INT
,VARCHAR
,DATE
, etc.
2. Schema Modifications
Add column:
ALTER TABLE author ADD telephone_number BIGINT;
Modify data type:
ALTER TABLE author MODIFY telephone_number VARCHAR(15);
Delete all rows:
TRUNCATE TABLE books;
3. Table Creation and Deletion
CREATE TABLE author (
id VARCHAR(5),
name VARCHAR(100),
PRIMARY KEY (id)
);
DROP TABLE author;
Module 3: Intermediate SQL
1. String Patterns and Ranges
LIKE '%text%'
for pattern matchingBETWEEN a AND b
for range filtering (inclusive)IN ('A', 'B')
for value list matching
2. Sorting and Grouping
ORDER BY column [DESC]
GROUP BY
clusters rows for aggregate functionsHAVING
filters grouped results (used after aggregation)
3. Built-in Functions
Aggregate:
SUM()
,AVG()
,MIN()
,MAX()
,COUNT()
Scalar/String:
ROUND()
,LENGTH()
,UCASE()
,LCASE()
Can be aliased using
AS
4. Date/Time Functions
Functions like:
YEAR()
,MONTH()
,DAYOFWEEK()
,HOUR()
,MINUTE()
Used for slicing and analyzing datetime data
5. Subqueries
Nested queries in
SELECT
,WHERE
, orFROM
Can be used as derived tables:
SELECT * FROM (SELECT id FROM table) AS sub;
6. Multi-Table Queries
Implicit JOIN (older style):
SELECT * FROM table1, table2 WHERE table1.id = table2.id;
Aliased joins:
SELECT a.name, b.salary FROM emp a, sal b WHERE a.id = b.emp_id;
Module 4: Accessing Databases Using Python
1. SQL from Jupyter Notebooks
%load_ext sql
enables magic commands%sql sqlite:///dbfile.db
connects to SQLite database
2. Python DB-API
Core objects:
Connection
: manages the database sessionCursor
: runs queries and retrieves results
Typical usage:
conn = sqlite3.connect('db')
cursor = conn.cursor()
cursor.execute("SELECT * FROM table")
data = cursor.fetchall()
3. SQL Magic and Jupyter Commands
%ls
,%timeit
,%history
,%%sql
,%%HTML
, etc.Write multi-line SQL with
%%sql
4. Working with pandas
Load CSV:
df = pd.read_csv("file.csv")
Load into SQL:
df.to_sql("table", conn)
Query into DataFrame:
df = pd.read_sql("SELECT * FROM table", conn)
Visualize with seaborn, matplotlib
5. Optional: IBM DB2
Use
ibm_db.connect()
andibm_db.exec_immediate()
Module 5: Assignment & Real-World Practices
1. Real-World CSV Handling
Use backticks for special characters in headers:
`column name`
Use
LIMIT
in SQL instead of slicing in pandasBreak long queries across lines using
\
2. Schema Exploration
SQLite:
sqlite_master
,PRAGMA TABLE_INFO()
MySQL:
SHOW TABLES
,DESCRIBE table;
DB2:
SELECT * FROM SYSCAT.TABLES
Module 6: Advanced SQL for Data Engineers
1. Views
Virtual tables defined by a SELECT statement
Don't store data; updated dynamically
Good for:
Simplifying joins
Hiding sensitive fields
Restricting columns
CREATE VIEW view_name AS SELECT ... FROM ... WHERE ...;
2. Stored Procedures
Encapsulate logic in a DB function
Accept parameters and return results
Benefits:
Performance
Reusability
Security
DELIMITER $$
CREATE PROCEDURE my_proc(IN var INT)
BEGIN
-- SQL logic
END $$
DELIMITER ;
CALL my_proc(100);
3. ACID Transactions
Ensure data safety in multi-step operations
BEGIN;
-- SQL statements
COMMIT;
-- or
ROLLBACK;
ACID = Atomic, Consistent, Isolated, Durable
4. Joins Recap
INNER JOIN
: only matchesLEFT JOIN
: all from left + matches from rightRIGHT JOIN
: all from right + matches from leftFULL JOIN
: everything with NULLs where missing
SELECT * FROM a
LEFT JOIN b ON a.id = b.id;
Final Thoughts
This course did a great job combining SQL fundamentals with real-world data workflows. The refresher on basic statements like SELECT
, INSERT
, and JOIN
helped to support more advanced ideas like subqueries, views, and transactions in context.
What stood out most was how smoothly SQL integrates with Python through tools like Jupyter and pandas — making it a practical tool for querying and analyzing data. The optional material on stored procedures and ACID transactions also introduced a more engineering-focused perspective I know will come in handy.
Ultimately, this course reminded me that knowing how to write a query is just one part — understanding why you're designing the data the way you are is what takes things to the next level.
Course Certificate: View on Coursera
All notes and opinions are personal interpretations of the IBM Python for Data Science, AI & Development course on Coursera.