IBM Data Engineering: Python

This course was the second part of the IBM Data Engineering Certificate. It covered Python programming fundamentals, data structures, file handling, basic data manipulation with pandas and NumPy, and practical tools for data collection such as REST APIs and web scraping.


Module 1 – Python Basics

This module introduced the core building blocks of Python, including data types, expressions, variables, and string manipulation. It also covered how to run and document Python code using Jupyter Notebooks. The content serves as a general-purpose foundation for later data work.

Topics:

  • Python characteristics: readable syntax, large standard library, popular in data science

  • Jupyter Notebooks: code and markdown cells, interactive execution

  • Data types: int, float, str, bool

  • Type casting: float(2) returns 2.0

  • Expressions and variables: storing values using =, working with operators

  • String operations:

    • Indexing: x[0], x[-1]

    • Slicing: x[::2]

    • Methods: .upper(), .lower(), .replace(), .find()

    • Escape sequences: \n, \t, \\
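
A minimal sketch of the string operations above (the sample string is purely illustrative):

greeting = "Hello, World"
print(greeting[0])      # 'H' (indexing starts at 0)
print(greeting[-1])     # 'd' (negative indices count from the end)
print(greeting[::2])    # 'Hlo ol' (every second character)
print(greeting.upper())                     # 'HELLO, WORLD'
print(greeting.replace("World", "Python"))  # 'Hello, Python'
print(greeting.find("World"))               # 7 (index where the substring starts)
print("Line one\nLine two")                 # \n starts a new line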


Module 2 – Python Data Structures

The second module focused on Python’s built-in data structures and how to use them for organizing and manipulating collections of data. Lists, tuples, dictionaries, and sets were covered along with their characteristics and associated methods. Understanding these is essential for working with more complex data formats later in the course.

Lists and Tuples:

  • Both are ordered sequences

  • Tuples are immutable, created with (a, b)

  • Lists are mutable, created with [a, b]

  • List operations: the .append() and .extend() methods, removing elements with the del statement, slicing, and cloning with A = B[:]

  • Tuples can’t be changed after creation
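
A short sketch of lists versus tuples in practice (the data is illustrative):

ratings = [9, 6, 8]          # list: mutable
ratings.append(7)            # [9, 6, 8, 7]
ratings.extend([5, 10])      # [9, 6, 8, 7, 5, 10]
del ratings[0]               # [6, 8, 7, 5, 10]
clone = ratings[:]           # independent copy; changing clone leaves ratings intact

point = (3, 4)               # tuple: immutable
# point[0] = 5 would raise a TypeError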

Dictionaries:

  • Key-value pairs, created with {key: value}

  • Keys must be unique and immutable

  • Access with dict['key']

  • Methods: .keys(), .values()
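
A minimal dictionary sketch (the album names and years are illustrative):

albums = {"Thriller": 1982, "Back in Black": 1980}
print(albums["Thriller"])    # 1982 (access by key)
albums["The Wall"] = 1979    # add a new key-value pair
print(albums.keys())         # dict_keys(['Thriller', 'Back in Black', 'The Wall'])
print(albums.values())       # dict_values([1982, 1980, 1979])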

Sets:

  • Unordered, only unique elements

  • Created with {1, 2, 3} or set(); note that {} creates an empty dictionary, not an empty set

  • Methods: .union(), .intersection(), .issubset(), .issuperset()
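
A quick sketch of the set operations above:

a = {1, 2, 3}
b = {3, 4, 5}
print(a.union(b))            # {1, 2, 3, 4, 5}
print(a.intersection(b))     # {3}
print({1, 2}.issubset(a))    # True
print(a.issuperset({3}))     # True
print(set())                 # set() is the empty set; {} would be an empty dict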


Module 3 – Python Programming Fundamentals

This part introduced control flow and reusable logic through conditionals, loops, and functions. Exception handling and object-oriented programming were also introduced to encourage better code structure and error management. The material helps build clean and maintainable Python code that can be scaled for data projects.

Conditionals:

  • if, elif, else

  • Comparison operators: ==, !=, <, >

  • Logical operators: and, or, not
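
A minimal conditional sketch (the age value is illustrative):

age = 17
if age > 18:
    print("adult")
elif age == 18:
    print("just turned 18")
else:
    print("minor")             # this branch runs for age = 17

if age > 0 and not age > 18:   # logical operators combine conditions
    print("between 1 and 18")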

Loops:

  • for loops for iterating over ranges or sequences

  • while loops for repeating based on conditions

  • range(start, stop) excludes the stop value

  • enumerate() for getting index and value in a loop
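
A short loop sketch pulling these pieces together (the list is illustrative):

colours = ["red", "yellow", "green"]
for i in range(0, 3):        # 0, 1, 2; the stop value 3 is excluded
    print(colours[i])

for i, colour in enumerate(colours):   # index and value together
    print(i, colour)

n = 0
while n < 3:                 # repeats as long as the condition holds
    n = n + 1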

Functions:

  • Defined with def

  • Can take multiple or optional parameters

  • return keyword sends back a value

  • Local vs global scope

  • Variable-length input: *args
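
A minimal function sketch (names and values are illustrative):

def add(a, b=1):             # b is an optional parameter with a default
    return a + b

def total(*args):            # variable-length input arrives as a tuple
    result = 0               # result is local to the function
    for x in args:
        result = result + x
    return result

print(add(5))                # 6
print(add(5, 2))             # 7
print(total(1, 2, 3))        # 6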

Exception Handling:

  • Use try, except, else, finally

  • Common exceptions: ZeroDivisionError, ValueError, IndexError, KeyError, TypeError, FileNotFoundError
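
A compact sketch of the full try/except/else/finally pattern:

try:
    value = 10 / 0           # raises ZeroDivisionError
except ZeroDivisionError:
    print("cannot divide by zero")
else:
    print("no exception raised")   # runs only if the try block succeeds
finally:
    print("always runs")           # cleanup happens either way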

Classes and Objects:

  • Create a class using class ClassName(object):

  • Constructor: __init__(self, ...)

  • Attributes and methods accessed with self.attribute and self.method()

  • dir(object) shows available attributes and methods
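
A minimal class sketch (the Circle example is illustrative):

class Circle(object):
    def __init__(self, radius, colour):   # constructor sets the attributes
        self.radius = radius
        self.colour = colour

    def area(self):                       # methods access attributes via self
        return 3.14159 * self.radius ** 2

c = Circle(2, "red")
print(c.radius)              # 2
print(c.area())              # 12.56636
# dir(c) lists the available attributes and methods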


Module 4 – Working with Data in Python

This module focused on reading, writing, and parsing data from various file formats. It introduced pandas for structured tabular data and NumPy for numerical arrays and operations. The content builds core data handling skills needed for cleaning and exploring real-world datasets.


File handling with open():

  • Modes: 'r', 'w', 'a', 'r+', 'a+'

  • with open(...) automatically closes file

  • .readline(), .readlines(), .read(n), .seek(), .tell()

  • Writing lines: loop with .write()
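
A short sketch writing and then reading a text file ("example.txt" is a placeholder path):

with open("example.txt", "w") as f:        # the file closes automatically
    for line in ["first line\n", "second line\n"]:
        f.write(line)

with open("example.txt", "r") as f:
    print(f.readline())      # reads a single line
    print(f.tell())          # current position in the file
    f.seek(0)                # jump back to the start
    print(f.readlines())     # all lines from here, as a list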

pandas basics:

  • import pandas as pd

  • Read files: pd.read_csv(), pd.read_excel()

  • Create DataFrame from dict: pd.DataFrame({...})

  • Access data: df['col'], df[['col1', 'col2']], .iloc[], .loc[]

  • Filter with condition: df[df['col'] > value]

  • Use .unique() to get unique values
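
A minimal pandas sketch (the DataFrame contents are illustrative):

import pandas as pd

df = pd.DataFrame({"name": ["Ada", "Bo", "Cy"], "score": [88, 92, 75]})

print(df["score"])           # single column as a Series
print(df[["name", "score"]]) # multiple columns as a DataFrame
print(df.iloc[0, 1])         # 88 (row and column by integer position)
print(df.loc[0, "name"])     # 'Ada' (row and column by label)
print(df[df["score"] > 80])  # rows where the condition holds
print(df["name"].unique())   # ['Ada' 'Bo' 'Cy']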

NumPy:

  • import numpy as np

  • Arrays: np.array([1, 2, 3])

  • Shape info: .ndim, .shape, .size

  • Element-wise math: a + b, a * 2

  • Dot product: np.dot(a, b)

  • Aggregation methods: .mean(), .max(), .min()

  • np.linspace(start, stop, num) for evenly spaced values
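
A quick NumPy sketch covering the points above:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a.ndim, a.shape, a.size)    # 1 (3,) 3
print(a + b)                      # [5 7 9] (element-wise)
print(a * 2)                      # [2 4 6]
print(np.dot(a, b))               # 32
print(a.mean(), a.max(), a.min()) # 2.0 3 1
print(np.linspace(0, 1, 5))       # [0.   0.25 0.5  0.75 1.  ]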


Module 5 – REST APIs, Web Scraping, and Working with Files

The final module shifted into external data acquisition using web technologies. It covered REST API calls with requests, HTML parsing with BeautifulSoup, and handling of formats like JSON and XML. These tools are essential for collecting data from dynamic online sources and integrating them into a Python-based workflow.

REST APIs:

  • APIs allow access to external systems or data

  • HTTP methods: GET, POST, PUT, DELETE

  • URL structure: scheme (http://), base URL, and route

  • Parameters passed with params={...}

  • Headers provide metadata (e.g. Authorization)

  • JSON response accessed with .json()

  • Status codes:

    • 200: success

    • 4xx (e.g. 400): client error

    • 5xx (e.g. 500): server error

Example:

import requests

url = "http://httpbin.org/get"
payload = {"name": "Joseph", "ID": "123"}   # sent as query parameters: ?name=Joseph&ID=123
r = requests.get(url, params=payload)
data = r.json()                             # parse the JSON response body into a dict

Web Scraping:

  • Use requests to get HTML

  • Use BeautifulSoup to parse HTML

  • HTML structure: tags (<html>, <head>, <body>, <tr>, <td>)

  • Use .find(), .find_all(), .parent, .children to navigate

  • Extract attributes with .attrs, text with .text

Example:

import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com").text   # download the raw HTML
soup = BeautifulSoup(html, "html.parser")         # parse it into a navigable tree
rows = soup.find_all("tr")                        # every table row in the document

File Formats:

  • CSV: use pandas.read_csv()

  • JSON:

import json

with open("file.json") as f:
    data = json.load(f)

  • XML:

import xml.etree.ElementTree as ET

tree = ET.parse("file.xml")
root = tree.getroot()

  • HTML tables: use pandas.read_html()


Final Thoughts

This course provided a well-rounded introduction to Python with a focus on practical skills used in data science. Topics ranged from syntax and control flow to file parsing, DataFrame operations, and programmatic data collection. The final module, which covered APIs, HTML scraping, and format handling, introduced tools commonly used in real-world data pipelines. These skills are foundational for any further work in analytics, machine learning, or data engineering.

The next step will be a hands-on mini-project that reinforces these concepts, but there are also plans to continue practicing independently—especially around authenticated APIs, structured data scraping, and integration into existing tools and workflows.

Course Certificate: View on Coursera

All notes and opinions are personal interpretations of the IBM Python for Data Science, AI & Development course on Coursera.
