IBM Data Engineering: Introduction to Relational Databases

The IBM course on Introduction to Relational Databases is part of the professional certificate path and focuses on the fundamentals of relational database design, structure, and practical usage. While much of the hands-on work felt familiar from past projects, this course helped solidify the theoretical background, especially around data modeling, normalization, and the relational model.


Module 1: Relational Database Concepts

This module covered a lot of foundational theory around how databases are structured, used, and designed. It explained the differences between types of data (structured, semi-structured, unstructured), where relational databases fit in, and why they're useful.

Key ideas:

  • Data types can be structured (spreadsheets, SQL tables), semi-structured (JSON, XML), or unstructured (text files, images, videos).

  • File formats like .csv, .json, .xlsx, .xml, etc., come into play depending on how and where the data is stored.

  • Relational databases store structured data in a table format with columns and rows. Examples include MySQL, PostgreSQL, IBM Db2, Oracle, etc.

The idea of information models vs data models was new to me:

  • An information model is more abstract and used for understanding relationships and concepts (used early in planning).

  • A data model is a technical implementation of the information model, with specific data types, constraints, and table structure.
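
To make that concrete, here's a small hypothetical sketch (the table and column names are mine, not the course's): the information model might just say "a customer places orders", while the data model pins that idea down in SQL:

    -- Data-model realization of "a customer places orders"
    CREATE TABLE customer (
        customer_id INT PRIMARY KEY,
        name        VARCHAR(100) NOT NULL
    );

    CREATE TABLE orders (
        order_id    INT PRIMARY KEY,
        customer_id INT NOT NULL,
        order_date  DATE NOT NULL,
        FOREIGN KEY (customer_id) REFERENCES customer(customer_id)
    );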

Also reviewed:

  • Entity Relationship Diagrams (ERDs) and how they map to tables.

  • Basics of relational theory, like sets, tuples, and relations.

  • OLTP vs OLAP systems: one for fast transactional updates, the other for big analytical queries.

Things I wanted to remember:

  • Use VARCHAR for variable-length strings, and CHAR(n) when values are always the same length (see the type sketch after this list).

  • Data types help with sorting, filtering, and enforcing consistency.

  • PostgreSQL and MySQL both support most standard SQL concepts, even though they differ in tooling.

  • The idea of using a conceptual model first (like an ERD) is really useful before jumping into schema design.
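
A small sketch of the VARCHAR vs CHAR(n) choice (the table is made up for illustration):

    -- CHAR(n) for values that are always the same length, VARCHAR for the rest
    CREATE TABLE country (
        code CHAR(2) PRIMARY KEY,     -- ISO codes are always exactly 2 characters
        name VARCHAR(100) NOT NULL    -- country names vary in length
    );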

Self-check notes:

Q: Why separate information and data models?
A: Because the information model captures the business logic, while the data model implements it in SQL.

Q: What's the role of ERDs?
A: They visualize how entities relate to each other, which helps with normalization and planning joins.

Q: What makes a relation different from a table?
A: Practically nothing. A relation is the formal name for a table in relational theory, though strictly a relation is a set of tuples (no duplicate rows, no inherent ordering), while SQL tables can contain duplicates.


Module 2: Using Relational Databases

This module was more hands-on, and went into the actual process of creating tables, loading data, defining keys, and understanding normalization. It also explained the hierarchy of database components (instances, databases, schemas, objects).

Key SQL types:

  • DDL (Data Definition Language): CREATE, ALTER, DROP, TRUNCATE

  • DML (Data Manipulation Language): INSERT, SELECT, UPDATE, DELETE
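
A quick contrast between the two families, using a throwaway example table:

    -- DDL: defines and changes structure
    CREATE TABLE employee (
        emp_id INT PRIMARY KEY,
        name   VARCHAR(100)
    );
    ALTER TABLE employee ADD COLUMN salary DECIMAL(10,2);
    TRUNCATE TABLE employee;   -- removes all rows but keeps the table definition

    -- DML: works with the data inside that structure
    INSERT INTO employee (emp_id, name, salary) VALUES (1, 'Ada', 90000.00);
    UPDATE employee SET salary = 95000.00 WHERE emp_id = 1;
    SELECT name, salary FROM employee;
    DELETE FROM employee WHERE emp_id = 1;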

Tables can be created either through GUI tools or by writing SQL directly. The course walked through examples in both MySQL and PostgreSQL. It also introduced tools like mysqldump for backing up and restoring, and how to import/export CSVs.
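
The CSV import syntax differs between the two engines; assuming a table named sales and a file sales.csv, the commands look roughly like this:

    -- PostgreSQL (server-side COPY; psql also has a client-side \copy variant)
    COPY sales FROM '/path/to/sales.csv' WITH (FORMAT csv, HEADER true);

    -- MySQL
    LOAD DATA LOCAL INFILE '/path/to/sales.csv'
    INTO TABLE sales
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    IGNORE 1 LINES;   -- skip the header row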

Indexes

  • Improve search speed by storing pointers to row locations.

  • Great for SELECT queries but come with performance costs for INSERT/UPDATE/DELETE.

  • Can enforce uniqueness when created as UNIQUE INDEX.
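
For example, reusing the hypothetical employee table from above, plus an email column:

    ALTER TABLE employee ADD COLUMN email VARCHAR(255);

    -- Plain index: speeds up lookups that filter or sort on name
    CREATE INDEX idx_employee_name ON employee (name);

    -- Unique index: also guarantees no two rows share an email
    CREATE UNIQUE INDEX idx_employee_email ON employee (email);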

Constraints

  • Primary key: ensures each row is unique

  • Foreign key: enforces relationship between tables

  • CHECK: restricts allowed values

  • NOT NULL, DEFAULT, etc.
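
One table can pull all of these together (the names are illustrative, and a product_type table is assumed to exist):

    CREATE TABLE product (
        product_id INT PRIMARY KEY,                 -- primary key: unique row identity
        type_id    INT NOT NULL,                    -- NOT NULL: a value is required
        name       VARCHAR(100) NOT NULL,
        price      DECIMAL(10,2) DEFAULT 0.00,      -- DEFAULT: fallback value on insert
        CHECK (price >= 0),                         -- CHECK: restricts allowed values
        FOREIGN KEY (type_id) REFERENCES product_type(type_id)  -- enforces the relationship
    );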

Normalization

This was one of the more important concepts:

  • 1NF: Every field holds atomic values (no multiple values per cell)

  • 2NF: Every non-key column must depend on the full primary key

  • 3NF: No transitive dependencies (columns must depend directly on the key)

In practice, this means splitting tables and removing repeated values. It improves data integrity and reduces redundancy. The course also noted that OLAP systems may relax normalization to prioritize performance.
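
A compressed sketch of what a 3NF fix looks like (names are mine): dept_name depends on dept_id rather than directly on the staff key, so it moves into its own table.

    -- Before: transitive dependency (staff_id -> dept_id -> dept_name)
    CREATE TABLE staff_flat (
        staff_id  INT PRIMARY KEY,
        name      VARCHAR(100),
        dept_id   INT,
        dept_name VARCHAR(50)    -- repeated for every staff member in the department
    );

    -- After (3NF): every non-key column depends directly on its own table's key
    CREATE TABLE department (
        dept_id   INT PRIMARY KEY,
        dept_name VARCHAR(50)
    );
    CREATE TABLE staff (
        staff_id INT PRIMARY KEY,
        name     VARCHAR(100),
        dept_id  INT,
        FOREIGN KEY (dept_id) REFERENCES department(dept_id)
    );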

Self-check notes:

Q: When should you use TRUNCATE instead of DELETE?
A: When you want to remove all rows quickly and don't need to log each deletion.

Q: What’s the tradeoff of indexing a table heavily?
A: Reads get faster, but inserts and updates get slower because indexes must be maintained.

Q: What does 2NF actually fix?
A: It removes partial dependencies by making sure each column depends on the entire primary key, not just part of it.


Module 3: MySQL and PostgreSQL

This module introduced practical environments. I worked with PostgreSQL using pgAdmin and with MySQL using phpMyAdmin. The tools were slightly different, but the concepts were the same.

What I did:

  • Created databases and tables through GUI tools.

  • Used CLI commands like pg_dump and mysqldump for backups, plus psql and the mysql client's source command for restores.

  • Explored different types of GUIs (Workbench, phpMyAdmin, pgAdmin) and what each was good for.

Key tools and concepts:

  • In MySQL:

    • mysqldump to back up

    • mysql with < to restore

    • phpMyAdmin as a web GUI

  • In PostgreSQL:

    • pgAdmin for ERDs, data import/export, and visual table editing

    • pg_dump and psql < file.sql for CLI backup/restore
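
From a shell, those round trips look roughly like this (database and file names are placeholders):

    # MySQL: dump to a file, restore by piping it back in
    mysqldump -u root -p shop_db > shop_db_backup.sql
    mysql -u root -p shop_db < shop_db_backup.sql

    # PostgreSQL: same idea with pg_dump and psql
    pg_dump -U postgres shop_db > shop_db_backup.sql
    psql -U postgres shop_db < shop_db_backup.sql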

Views and Materialized Views

  • A view is like a virtual table defined by a SELECT query. It can simplify joins or hide sensitive columns.

  • A materialized view stores the data physically and can be refreshed manually. Useful for performance when data doesn't change frequently.
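
A minimal sketch of both, reusing the hypothetical staff table from the 3NF example:

    -- View: a saved SELECT; no data is stored, so it always reflects the base table
    CREATE VIEW staff_public AS
    SELECT staff_id, name FROM staff;   -- hides dept_id or anything sensitive

    -- Materialized view (PostgreSQL syntax): the result set is stored physically
    CREATE MATERIALIZED VIEW staff_per_department AS
    SELECT dept_id, COUNT(*) AS headcount
    FROM staff
    GROUP BY dept_id;

    -- Refreshed manually whenever the underlying data has changed
    REFRESH MATERIALIZED VIEW staff_per_department;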


Module 4: Normalization in Practice

This module gave me a project-based task: take a messy dataset and normalize it into 2NF and 3NF. I had to identify entities, extract attributes, create separate tables, and define keys and relationships.

What I did:

  • Split multi-product transactions into two tables: sales_transaction and sales_detail

  • Moved product category and type info into a new product_type table

  • Used composite keys where appropriate

  • Defined foreign keys linking all relevant tables

My ERD structure ended up like this:

  • sales_transaction (transaction-level info)

  • sales_detail (line items, with transaction_id + product_id as composite PK)

  • product (individual items)

  • product_type (category and type info)
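
As DDL, the schema comes out roughly like this (column names beyond the keys are my reconstruction, not the course's exact ones):

    CREATE TABLE product_type (
        type_id   INT PRIMARY KEY,
        category  VARCHAR(50),
        type_name VARCHAR(50)
    );

    CREATE TABLE product (
        product_id INT PRIMARY KEY,
        name       VARCHAR(100),
        type_id    INT,
        FOREIGN KEY (type_id) REFERENCES product_type(type_id)
    );

    CREATE TABLE sales_transaction (
        transaction_id INT PRIMARY KEY,
        sale_date      DATE
    );

    CREATE TABLE sales_detail (
        transaction_id INT,
        product_id     INT,
        quantity       INT,
        PRIMARY KEY (transaction_id, product_id),   -- composite key: product-in-transaction
        FOREIGN KEY (transaction_id) REFERENCES sales_transaction(transaction_id),
        FOREIGN KEY (product_id) REFERENCES product(product_id)
    );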

Self-check notes:

Q: Why split the transaction into a header/detail structure?
A: Because each transaction can have multiple products. In a single table, transaction-level columns would depend on only part of the composite key (the transaction_id), which violates 2NF.

Q: What’s the benefit of moving category/type into their own table?
A: To avoid redundant strings and make the data easier to update or extend (e.g. add metadata to a type).

Q: Should composite keys be used often?
A: Only when both fields together are necessary to uniquely identify a row — like product-in-transaction.


Final Thoughts

This course struck a nice balance between database theory and hands-on practice. It helped me better understand the reasoning behind things I was already doing and filled in a few conceptual gaps - especially around normalization, constraints, and the different roles of views in data workflows.

What stood out most was how consistent relational principles are across different platforms. Once I understood the basics - like ER diagrams, primary/foreign keys, normalization rules, and the SQL commands - I could apply that knowledge equally well in MySQL, PostgreSQL, or Db2.

It’s also a good reminder that no matter how user-friendly the tools get, it’s the schema thinking that really matters. That’s what keeps your data clean, scalable, and reliable.

Course Certificate: View on Coursera

All notes and opinions are personal interpretations of the IBM Introduction to Relational Databases course on Coursera.
