The IBM course on Databases and SQL for Data Science with Python provides a hands-on introduction to querying, analyzing, and managing relational data using SQL. While the fundamentals were familiar from previous work, the course helped reinforce best practices around joins, subqueries, and database design, while also introducing key concepts like stored procedures, views, and ACID transactions essential for scalable data workflows

Module 1: Getting Started with SQL

1. SQL and Databases: The Basics

SQL: Structured Query Language for interacting with relational databases
Database: Structured collection of data (tables, columns, rows)
DBMS: Software system to manage databases (e.g. MySQL, PostgreSQL, SQLite)
SQL commands (DML): SELECT, INSERT, UPDATE, DELETE

2. SELECT Statement

Retrieves data from one or more tables

SELECT * FROM table; SELECT column1, column2 FROM table;

Use WHERE for filtering rows based on a condition:

SELECT * FROM books WHERE book_id = 'B1';

Comparators: =, !=, <>, <, >, <=, >=

3. Aggregations and Row Limiting

COUNT(*): count all rows
COUNT(column): count non-null rows
DISTINCT: returns unique values from a column
LIMIT + OFFSET: restrict and paginate results

4. Data Insertion and Updates

INSERT INTO table (col1, col2) VALUES ('val1', 'val2');
UPDATE books SET title = 'New Title' WHERE book_id = 'B1';
DELETE FROM books WHERE book_id IN ('B2', 'B3');

Module 2: Relational Databases and Tables

1. Core Concepts

Entity = table, Attribute = column
Primary Key = unique row identifier
Foreign Key = column that links to another table's primary key
Common data types: INT, VARCHAR, DATE, etc.

2. Schema Modifications

Add column:

ALTER TABLE author ADD telephone_number BIGINT;

Modify data type:

ALTER TABLE author MODIFY telephone_number VARCHAR(15);

Delete all rows:

TRUNCATE TABLE books;

3. Table Creation and Deletion

CREATE TABLE author (
id VARCHAR(5),
name VARCHAR(100),
PRIMARY KEY (id)
);
DROP TABLE author;

Module 3: Intermediate SQL

1. String Patterns and Ranges

LIKE '%text%' for pattern matching
BETWEEN a AND b for range filtering (inclusive)
IN ('A', 'B') for value list matching

2. Sorting and Grouping

ORDER BY column [DESC]
GROUP BY clusters rows for aggregate functions
HAVING filters grouped results (used after aggregation)

3. Built-in Functions

Aggregate: SUM(), AVG(), MIN(), MAX(), COUNT()
Scalar/String: ROUND(), LENGTH(), UCASE(), LCASE()
Can be aliased using AS

4. Date/Time Functions

Functions like: YEAR(), MONTH(), DAYOFWEEK(), HOUR(), MINUTE()
Used for slicing and analyzing datetime data

5. Subqueries

Nested queries in SELECT, WHERE, or FROM
Can be used as derived tables:

SELECT * FROM (SELECT id FROM table) AS sub;

6. Multi-Table Queries

Implicit JOIN (older style):

SELECT * FROM table1, table2 WHERE table1.id = table2.id;

Aliased joins:

SELECT a.name, b.salary FROM emp a, sal b WHERE a.id = b.emp_id;

Module 4: Accessing Databases Using Python

1. SQL from Jupyter Notebooks

%load_ext sql enables magic commands
%sql sqlite:///dbfile.db connects to SQLite database

2. Python DB-API

Core objects:
- Connection: manages the database session
- Cursor: runs queries and retrieves results
Typical usage:

conn = sqlite3.connect('db')
cursor = conn.cursor()
cursor.execute("SELECT * FROM table")
data = cursor.fetchall()

3. SQL Magic and Jupyter Commands

%ls, %timeit, %history, %%sql, %%HTML, etc.
Write multi-line SQL with %%sql

4. Working with pandas

Load CSV:

df = pd.read_csv("file.csv")

Load into SQL:

df.to_sql("table", conn)

Query into DataFrame:

df = pd.read_sql("SELECT * FROM table", conn)

Visualize with seaborn, matplotlib

5. Optional: IBM DB2

Use ibm_db.connect() and ibm_db.exec_immediate()

Module 5: Assignment & Real-World Practices

1. Real-World CSV Handling

Use backticks for special characters in headers: `column name`
Use LIMIT in SQL instead of slicing in pandas
Break long queries across lines using \

2. Schema Exploration

SQLite: sqlite_master, PRAGMA TABLE_INFO()
MySQL: SHOW TABLES, DESCRIBE table;
DB2: SELECT * FROM SYSCAT.TABLES

Module 6: Advanced SQL for Data Engineers

1. Views

Virtual tables defined by a SELECT statement
Don't store data; updated dynamically
Good for:
- Simplifying joins
- Hiding sensitive fields
- Restricting columns

CREATE VIEW view_name AS SELECT ... FROM ... WHERE ...;

2. Stored Procedures

Encapsulate logic in a DB function
Accept parameters and return results
Benefits:
- Performance
- Reusability
- Security

DELIMITER $$
CREATE PROCEDURE my_proc(IN var INT)
BEGIN
-- SQL logic
END $$
DELIMITER ;
CALL my_proc(100);

3. ACID Transactions

Ensure data safety in multi-step operations

BEGIN;
-- SQL statements
COMMIT;
-- or
ROLLBACK;

ACID = Atomic, Consistent, Isolated, Durable

4. Joins Recap

INNER JOIN: only matches
LEFT JOIN: all from left + matches from right
RIGHT JOIN: all from right + matches from left
FULL JOIN: everything with NULLs where missing

SELECT * FROM a
LEFT JOIN b ON a.id = b.id;

Final Thoughts

This course did a great job combining SQL fundamentals with real-world data workflows. The refresher on basic statements like SELECT, INSERT, and JOIN helped to support more advanced ideas like subqueries, views, and transactions in context.

What stood out most was how smoothly SQL integrates with Python through tools like Jupyter and pandas — making it a practical tool for querying and analyzing data. The optional material on stored procedures and ACID transactions also introduced a more engineering-focused perspective I know will come in handy.

Ultimately, this course reminded me that knowing how to write a query is just one part — understanding why you're designing the data the way you are is what takes things to the next level.

Course Certificate: View on Coursera

All notes and opinions are personal interpretations of the IBM Python for Data Science, AI & Development course on Coursera.

IBM Data Engineering: Databases and SQL for Data Science with Python