IBM Data Engineering: Databases and SQL for Data Science with Python

The IBM course on Databases and SQL for Data Science with Python provides a hands-on introduction to querying, analyzing, and managing relational data using SQL. While the fundamentals were familiar from previous work, the course helped reinforce best practices around joins, subqueries, and database design, while also introducing key concepts like stored procedures, views, and ACID transactions essential for scalable data workflows

 

Module 1: Getting Started with SQL

1. SQL and Databases: The Basics

  • SQL: Structured Query Language for interacting with relational databases

  • Database: Structured collection of data (tables, columns, rows)

  • DBMS: Software system to manage databases (e.g. MySQL, PostgreSQL, SQLite)

  • SQL commands (DML): SELECT, INSERT, UPDATE, DELETE

2. SELECT Statement

  • Retrieves data from one or more tables

SELECT * FROM table; SELECT column1, column2 FROM table;

  • Use WHERE for filtering rows based on a condition:

SELECT * FROM books WHERE book_id = 'B1';

  • Comparators: =, !=, <>, <, >, <=, >=

3. Aggregations and Row Limiting

  • COUNT(*): count all rows

  • COUNT(column): count non-null rows

  • DISTINCT: returns unique values from a column

  • LIMIT + OFFSET: restrict and paginate results

4. Data Insertion and Updates

INSERT INTO table (col1, col2) VALUES ('val1', 'val2');

UPDATE books SET title = 'New Title' WHERE book_id = 'B1';

DELETE FROM books WHERE book_id IN ('B2', 'B3');


Module 2: Relational Databases and Tables

1. Core Concepts

  • Entity = table, Attribute = column

  • Primary Key = unique row identifier

  • Foreign Key = column that links to another table's primary key

  • Common data types: INT, VARCHAR, DATE, etc.

2. Schema Modifications

  • Add column:

ALTER TABLE author ADD telephone_number BIGINT;

  • Modify data type:

ALTER TABLE author MODIFY telephone_number VARCHAR(15);

  • Delete all rows:

TRUNCATE TABLE books;

3. Table Creation and Deletion

CREATE TABLE author (

id VARCHAR(5),

name VARCHAR(100),

PRIMARY KEY (id)

);

DROP TABLE author;


Module 3: Intermediate SQL

1. String Patterns and Ranges

  • LIKE '%text%' for pattern matching

  • BETWEEN a AND b for range filtering (inclusive)

  • IN ('A', 'B') for value list matching

2. Sorting and Grouping

  • ORDER BY column [DESC]

  • GROUP BY clusters rows for aggregate functions

  • HAVING filters grouped results (used after aggregation)

3. Built-in Functions

  • Aggregate: SUM(), AVG(), MIN(), MAX(), COUNT()

  • Scalar/String: ROUND(), LENGTH(), UCASE(), LCASE()

  • Can be aliased using AS

4. Date/Time Functions

  • Functions like: YEAR(), MONTH(), DAYOFWEEK(), HOUR(), MINUTE()

  • Used for slicing and analyzing datetime data

5. Subqueries

  • Nested queries in SELECT, WHERE, or FROM

  • Can be used as derived tables:

SELECT * FROM (SELECT id FROM table) AS sub;

6. Multi-Table Queries

  • Implicit JOIN (older style):

SELECT * FROM table1, table2 WHERE table1.id = table2.id;

  • Aliased joins:

SELECT a.name, b.salary FROM emp a, sal b WHERE a.id = b.emp_id;


Module 4: Accessing Databases Using Python

1. SQL from Jupyter Notebooks

  • %load_ext sql enables magic commands

  • %sql sqlite:///dbfile.db connects to SQLite database

2. Python DB-API

  • Core objects:

    • Connection: manages the database session

    • Cursor: runs queries and retrieves results

  • Typical usage:

conn = sqlite3.connect('db')

cursor = conn.cursor()

cursor.execute("SELECT * FROM table")

data = cursor.fetchall()

3. SQL Magic and Jupyter Commands

  • %ls, %timeit, %history, %%sql, %%HTML, etc.

  • Write multi-line SQL with %%sql

4. Working with pandas

  • Load CSV:

df = pd.read_csv("file.csv")

  • Load into SQL:

df.to_sql("table", conn)

  • Query into DataFrame:

df = pd.read_sql("SELECT * FROM table", conn)

  • Visualize with seaborn, matplotlib

5. Optional: IBM DB2

  • Use ibm_db.connect() and ibm_db.exec_immediate()


Module 5: Assignment & Real-World Practices

1. Real-World CSV Handling

  • Use backticks for special characters in headers: `column name`

  • Use LIMIT in SQL instead of slicing in pandas

  • Break long queries across lines using \

2. Schema Exploration

  • SQLite: sqlite_master, PRAGMA TABLE_INFO()

  • MySQL: SHOW TABLES, DESCRIBE table;

  • DB2: SELECT * FROM SYSCAT.TABLES


Module 6: Advanced SQL for Data Engineers

1. Views

  • Virtual tables defined by a SELECT statement

  • Don't store data; updated dynamically

  • Good for:

    • Simplifying joins

    • Hiding sensitive fields

    • Restricting columns

CREATE VIEW view_name AS SELECT ... FROM ... WHERE ...;

2. Stored Procedures

  • Encapsulate logic in a DB function

  • Accept parameters and return results

  • Benefits:

    • Performance

    • Reusability

    • Security

DELIMITER $$

CREATE PROCEDURE my_proc(IN var INT)

BEGIN

-- SQL logic

END $$

DELIMITER ;

CALL my_proc(100);

3. ACID Transactions

  • Ensure data safety in multi-step operations

BEGIN;

-- SQL statements

COMMIT;

-- or

ROLLBACK;

  • ACID = Atomic, Consistent, Isolated, Durable

4. Joins Recap

  • INNER JOIN: only matches

  • LEFT JOIN: all from left + matches from right

  • RIGHT JOIN: all from right + matches from left

  • FULL JOIN: everything with NULLs where missing

SELECT * FROM a

LEFT JOIN b ON a.id = b.id;


Final Thoughts

This course did a great job combining SQL fundamentals with real-world data workflows. The refresher on basic statements like SELECT, INSERT, and JOIN helped to support more advanced ideas like subqueries, views, and transactions in context.

What stood out most was how smoothly SQL integrates with Python through tools like Jupyter and pandas — making it a practical tool for querying and analyzing data. The optional material on stored procedures and ACID transactions also introduced a more engineering-focused perspective I know will come in handy.

Ultimately, this course reminded me that knowing how to write a query is just one part — understanding why you're designing the data the way you are is what takes things to the next level.

Course Certificate: View on Coursera

All notes and opinions are personal interpretations of the IBM Python for Data Science, AI & Development course on Coursera.

Next
Next

IBM Data Engineering: Introduction to Relational Databases