IBM Data Engineering: Introduction to Relational Databases
The IBM course on Introduction to Relational Databases is part of the professional certificate path and focuses on the fundamentals of relational database design, structure, and practical usage. While much of the hands-on work felt familiar from past projects, this course helped solidify the theoretical background, especially around data modeling, normalization, and the relational model
Module 1: Relational Database Concepts
This module covered a lot of foundational theory around how databases are structured, used, and designed. It explained the differences between types of data (structured, semi-structured, unstructured), where relational databases fit in, and why they're useful.
Key ideas:
Data types can be structured (spreadsheets, SQL tables), semi-structured (JSON, XML), or unstructured (text files, images, videos).
File formats like
.csv
,.json
,.xlsx
,.xml
, etc., come into play depending on how and where the data is stored.Relational databases store structured data in a table format with columns and rows. Examples include MySQL, PostgreSQL, IBM Db2, Oracle, etc.
The idea of information models vs data models was new to me:
An information model is more abstract and used for understanding relationships and concepts (used early in planning).
A data model is a technical implementation of the information model, with specific data types, constraints, and table structure.
Also reviewed:
Entity Relationship Diagrams (ERDs) and how they map to tables.
Basics of relational theory like sets, tuples, relations.
OLTP vs OLAP systems: one for fast transactional updates, the other for big analytical queries.
Things I wanted to remember:
Use
VARCHAR
for variable-length strings, andCHAR(n)
when values are always the same length.Data types help with sorting, filtering, and enforcing consistency.
PostgreSQL and MySQL both support most standard SQL concepts, even though they differ in tooling.
The idea of using a conceptual model first (like an ERD) is really useful before jumping into schema design
Self-check notes:
Q: Why separate information and data models?
A: Because the information model captures the business logic, while the data model implements it in SQL.
Q: What's the role of ERDs?
A: They visualize how entities relate to each other, which helps with normalization and planning joins.
Q: What makes a relation different from a table?
A: Technically nothing — a relation is just the formal name for a table in relational algebra
Module 2: Using Relational Databases
This module was more hands-on, and went into the actual process of creating tables, loading data, defining keys, and understanding normalization. It also explained the hierarchy of database components (instances, databases, schemas, objects).
Key SQL types:
DDL (Data Definition Language):
CREATE
,ALTER
,DROP
,TRUNCATE
DML (Data Manipulation Language):
INSERT
,SELECT
,UPDATE
,DELETE
Tables can be created either through GUI tools or by writing SQL directly. The course walked through examples in both MySQL and PostgreSQL. It also introduced tools like mysqldump
for backing up and restoring, and how to import/export CSVs.
Indexes
Improve search speed by storing pointers to row locations.
Great for
SELECT
queries but come with performance costs forINSERT
/UPDATE
/DELETE
.Can enforce uniqueness when created as
UNIQUE INDEX
.
Constraints
Primary key: ensures each row is unique
Foreign key: enforces relationship between tables
CHECK: restricts allowed values
NOT NULL, DEFAULT, etc.
Normalization
This was one of the more important concepts:
1NF: Every field holds atomic values (no multiple values per cell)
2NF: Every non-key column must depend on the full primary key
3NF: No transitive dependencies (columns must depend directly on the key)
In practice, this means splitting tables and removing repeated values. It improves data integrity and reduces redundancy. The course also noted that OLAP systems may relax normalization to prioritize performance.
Self-check notes:
Q: When should you use TRUNCATE
instead of DELETE
?
A: When you want to remove all rows quickly and don't need to log each deletion.
Q: What’s the tradeoff of indexing a table heavily?
A: Reads get faster, but inserts and updates get slower because indexes must be maintained.
Q: What does 2NF actually fix?
A: It removes partial dependencies by making sure each column depends on the entire primary key, not just part of it.
Module 3: MySQL and PostgreSQL
This module introduced practical environments. I worked with PostgreSQL using pgAdmin
and with MySQL using phpMyAdmin
. The tools were slightly different, but the concepts were the same.
What I did:
Created databases and tables through GUI tools.
Used CLI commands like
pg_dump
,mysqldump
,source
, andrestore
.Explored different types of GUIs (Workbench, phpMyAdmin, pgAdmin) and what each was good for.
Key tools and concepts:
In MySQL:
mysqldump
to back upmysql
with<
to restorephpMyAdmin
as a web GUI
In PostgreSQL:
pgAdmin
for ERDs, data import/export, and visual table editingpg_dump
andpsql < file.sql
for CLI backup/restore
Views and Materialized Views
A view is like a virtual table defined by a
SELECT
query. It can simplify joins or hide sensitive columns.A materialized view stores the data physically and can be refreshed manually. Useful for performance when data doesn't change frequently.
Module 4: Normalization in Practice
This module gave me a project-based task: take a messy dataset and normalize it into 2NF and 3NF. I had to identify entities, extract attributes, create separate tables, and define keys and relationships.
What I did:
Split multi-product transactions into two tables:
sales_transaction
andsales_detail
Moved product category and type info into a new
product_type
tableUsed composite keys where appropriate
Defined foreign keys linking all relevant tables
My ERD structure ended up like this:
sales_transaction
(transaction-level info)sales_detail
(line items, withtransaction_id
+product_id
as composite PK)product
(individual items)product_type
(category and type info)
Self-check notes:
Q: Why split the transaction into a header/detail structure?
A: Because each transaction can have multiple products. Putting all that in one table violates 2NF.
Q: What’s the benefit of moving category/type into their own table?
A: To avoid redundant strings and make the data easier to update or extend (e.g. add metadata to a type).
Q: Should composite keys be used often?
A: Only when both fields together are necessary to uniquely identify a row — like product-in-transaction.
Final Thoughts
This course struck a nice balance between database theory and hands-on practice. It helped me better understand the reason behind things I was already doing and filled in a few conceptual gaps - especially around normalization, constraints, and the different roles of views in data workflows.
What stood out most was how consistent relational principles are across different platforms. Once I understood the basics - like ER diagrams, primary/foreign keys, normalization rules, and the SQL commands - I could apply that knowledge equally well in MySQL, PostgreSQL, or Db2.
It’s also a good reminder that no matter how user-friendly the tools get, it’s the schema thinking that really matters. That’s what keeps your data clean, scalable, and reliable.
Course Certificate: View on Coursera
All notes and opinions are personal interpretations of the IBM Python for Data Science, AI & Development course on Coursera.