IBM Data Engineering: Python

This course was the second part of the IBM Data Engineering Certificate. It covered Python programming fundamentals, data structures, file handling, basic data manipulation with pandas and NumPy, and practical tools for data collection such as REST APIs and web scraping.


Module 1 – Python Basics

This module introduced the core building blocks of Python, including data types, expressions, variables, and string manipulation. It also covered how to run and document Python code using Jupyter Notebooks. The content serves as a general-purpose foundation for later data work.

Topics:

  • Python characteristics: readable syntax, large standard library, popular in data science

  • Jupyter Notebooks: code and markdown cells, interactive execution

  • Data types: int, float, str, bool

  • Type casting: float(2) returns 2.0

  • Expressions and variables: storing values using =, working with operators

  • String operations:

    • Indexing: x[0], x[-1]

    • Slicing: x[::2]

    • Methods: .upper(), .lower(), .replace(), .find()

    • Escape sequences: \n, \t, \\
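
A minimal sketch of the string operations above (the sample string is purely illustrative):

greeting = "Hello, World"
print(greeting[0])      # 'H' (indexing starts at 0)
print(greeting[-1])     # 'd' (negative indices count from the end)
print(greeting[::2])    # 'Hlo ol' (every second character)
print(greeting.upper())                     # 'HELLO, WORLD'
print(greeting.replace("World", "Python"))  # 'Hello, Python'
print(greeting.find("World"))               # 7 (index where the substring starts)
print("Line one\nLine two")                 # \n starts a new line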


Module 2 – Python Data Structures

The second module focused on Python’s built-in data structures and how to use them for organizing and manipulating collections of data. Lists, tuples, dictionaries, and sets were covered along with their characteristics and associated methods. Understanding these is essential for working with more complex data formats later in the course.

Lists and Tuples:

  • Both are ordered sequences

  • Tuples are immutable, created with (a, b)

  • Lists are mutable, created with [a, b]

  • List operations: the .append() and .extend() methods, removing elements with the del statement, slicing, and cloning with A = B[:]

  • Tuples can’t be changed after creation
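
A short sketch of lists versus tuples in practice (the data is illustrative):

ratings = [9, 6, 8]          # list: mutable
ratings.append(7)            # [9, 6, 8, 7]
ratings.extend([5, 10])      # [9, 6, 8, 7, 5, 10]
del ratings[0]               # [6, 8, 7, 5, 10]
clone = ratings[:]           # independent copy; changing clone leaves ratings intact

point = (3, 4)               # tuple: immutable
# point[0] = 5 would raise a TypeError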

Dictionaries:

  • Key-value pairs, created with {key: value}

  • Keys must be unique and immutable

  • Access with dict['key']

  • Methods: .keys(), .values()
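
A minimal dictionary sketch (the album names and years are illustrative):

albums = {"Thriller": 1982, "Back in Black": 1980}
print(albums["Thriller"])    # 1982 (access by key)
albums["The Wall"] = 1979    # add a new key-value pair
print(albums.keys())         # dict_keys(['Thriller', 'Back in Black', 'The Wall'])
print(albums.values())       # dict_values([1982, 1980, 1979])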

Sets:

  • Unordered, only unique elements

  • Created with {1, 2, 3} or set(); note that {} creates an empty dictionary, not an empty set

  • Methods: .union(), .intersection(), .issubset(), .issuperset()
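
A quick sketch of the set operations above:

a = {1, 2, 3}
b = {3, 4, 5}
print(a.union(b))            # {1, 2, 3, 4, 5}
print(a.intersection(b))     # {3}
print({1, 2}.issubset(a))    # True
print(a.issuperset({3}))     # True
print(set())                 # set() is the empty set; {} would be an empty dict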


Module 3 – Python Programming Fundamentals

This part introduced control flow and reusable logic through conditionals, loops, and functions. Exception handling and object-oriented programming were also introduced to encourage better code structure and error management. The material helps build clean and maintainable Python code that can be scaled for data projects.

Conditionals:

  • if, elif, else

  • Comparison operators: ==, !=, <, >

  • Logical operators: and, or, not
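
A minimal conditional sketch (the age value is illustrative):

age = 17
if age > 18:
    print("adult")
elif age == 18:
    print("just turned 18")
else:
    print("minor")             # this branch runs for age = 17

if age > 0 and not age > 18:   # logical operators combine conditions
    print("between 1 and 18")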

Loops:

  • for loops for iterating over ranges or sequences

  • while loops for repeating based on conditions

  • range(start, stop) excludes the stop value

  • enumerate() for getting index and value in a loop
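
A short loop sketch pulling these pieces together (the list is illustrative):

colours = ["red", "yellow", "green"]
for i in range(0, 3):        # 0, 1, 2; the stop value 3 is excluded
    print(colours[i])

for i, colour in enumerate(colours):   # index and value together
    print(i, colour)

n = 0
while n < 3:                 # repeats as long as the condition holds
    n = n + 1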

Functions:

  • Defined with def

  • Can take multiple or optional parameters

  • return keyword sends back a value

  • Local vs global scope

  • Variable-length input: *args
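
A minimal function sketch (names and values are illustrative):

def add(a, b=1):             # b is an optional parameter with a default
    return a + b

def total(*args):            # variable-length input arrives as a tuple
    result = 0               # result is local to the function
    for x in args:
        result = result + x
    return result

print(add(5))                # 6
print(add(5, 2))             # 7
print(total(1, 2, 3))        # 6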

Exception Handling:

  • Use try, except, else, finally

  • Common exceptions: ZeroDivisionError, ValueError, IndexError, KeyError, TypeError, FileNotFoundError
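
A compact sketch of the full try/except/else/finally pattern:

try:
    value = 10 / 0           # raises ZeroDivisionError
except ZeroDivisionError:
    print("cannot divide by zero")
else:
    print("no exception raised")   # runs only if the try block succeeds
finally:
    print("always runs")           # cleanup happens either way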

Classes and Objects:

  • Create a class using class ClassName(object):

  • Constructor: __init__(self, ...)

  • Attributes and methods accessed with self.attribute and self.method()

  • dir(object) shows available attributes and methods
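
A minimal class sketch (the Circle example is illustrative):

class Circle(object):
    def __init__(self, radius, colour):   # constructor sets the attributes
        self.radius = radius
        self.colour = colour

    def area(self):                       # methods access attributes via self
        return 3.14159 * self.radius ** 2

c = Circle(2, "red")
print(c.radius)              # 2
print(c.area())              # 12.56636
# dir(c) lists the available attributes and methods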


Module 4 – Working with Data in Python

This module focused on reading, writing, and parsing data from various file formats. It introduced pandas for structured tabular data and NumPy for numerical arrays and operations. The content builds core data handling skills needed for cleaning and exploring real-world datasets.


File handling with open():

  • Modes: 'r', 'w', 'a', 'r+', 'a+'

  • with open(...) automatically closes file

  • .readline(), .readlines(), .read(n), .seek(), .tell()

  • Writing lines: loop with .write()
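
A short sketch writing and then reading a text file ("example.txt" is a placeholder path):

with open("example.txt", "w") as f:        # the file closes automatically
    for line in ["first line\n", "second line\n"]:
        f.write(line)

with open("example.txt", "r") as f:
    print(f.readline())      # reads a single line
    print(f.tell())          # current position in the file
    f.seek(0)                # jump back to the start
    print(f.readlines())     # all lines from here, as a list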

pandas basics:

  • import pandas as pd

  • Read files: pd.read_csv(), pd.read_excel()

  • Create DataFrame from dict: pd.DataFrame({...})

  • Access data: df['col'], df[['col1', 'col2']], .iloc[], .loc[]

  • Filter with condition: df[df['col'] > value]

  • Use .unique() to get unique values
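
A minimal pandas sketch (the DataFrame contents are illustrative):

import pandas as pd

df = pd.DataFrame({"name": ["Ada", "Bo", "Cy"], "score": [88, 92, 75]})

print(df["score"])           # single column as a Series
print(df[["name", "score"]]) # multiple columns as a DataFrame
print(df.iloc[0, 1])         # 88 (row and column by integer position)
print(df.loc[0, "name"])     # 'Ada' (row and column by label)
print(df[df["score"] > 80])  # rows where the condition holds
print(df["name"].unique())   # ['Ada' 'Bo' 'Cy']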

NumPy:

  • import numpy as np

  • Arrays: np.array([1, 2, 3])

  • Shape info: .ndim, .shape, .size

  • Element-wise math: a + b, a * 2

  • Dot product: np.dot(a, b)

  • Aggregation methods: .mean(), .max(), .min()

  • np.linspace(start, stop, num) for evenly spaced values
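
A quick NumPy sketch covering the points above:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a.ndim, a.shape, a.size)    # 1 (3,) 3
print(a + b)                      # [5 7 9] (element-wise)
print(a * 2)                      # [2 4 6]
print(np.dot(a, b))               # 32
print(a.mean(), a.max(), a.min()) # 2.0 3 1
print(np.linspace(0, 1, 5))       # [0.   0.25 0.5  0.75 1.  ]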


Module 5 – REST APIs, Web Scraping, and Working with Files

The final module shifted into external data acquisition using web technologies. It covered REST API calls with requests, HTML parsing with BeautifulSoup, and handling of formats like JSON and XML. These tools are essential for collecting data from dynamic online sources and integrating them into a Python-based workflow.

REST APIs:

  • APIs allow access to external systems or data

  • HTTP methods: GET, POST, PUT, DELETE

  • URL structure: scheme (http://), base URL, and route

  • Parameters passed with params={...}

  • Headers provide metadata (e.g. Authorization)

  • JSON response accessed with .json()

  • Status codes:

    • 200: success

    • 4xx (e.g. 400): client error

    • 5xx (e.g. 500): server error

Example:

import requests

url = "http://httpbin.org/get"
payload = {"name": "Joseph", "ID": "123"}   # sent as query parameters: ?name=Joseph&ID=123
r = requests.get(url, params=payload)
data = r.json()                             # parse the JSON response body into a dict

Web Scraping:

  • Use requests to get HTML

  • Use BeautifulSoup to parse HTML

  • HTML structure: tags (<html>, <head>, <body>, <tr>, <td>)

  • Use .find(), .find_all(), .parent, .children to navigate

  • Extract attributes with .attrs, text with .text

Example:

import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com").text   # download the raw HTML
soup = BeautifulSoup(html, "html.parser")         # parse it into a navigable tree
rows = soup.find_all("tr")                        # every table row in the document

File Formats:

  • CSV: use pandas.read_csv()

  • JSON:

import json

with open("file.json") as f:
    data = json.load(f)

  • XML:

import xml.etree.ElementTree as ET

tree = ET.parse("file.xml")
root = tree.getroot()

  • HTML tables: use pandas.read_html()


Final Thoughts

This course provided a well-rounded introduction to Python with a focus on practical skills used in data science. Topics ranged from syntax and control flow to file parsing, DataFrame operations, and programmatic data collection. The final module, which covered APIs, HTML scraping, and format handling, introduced tools commonly used in real-world data pipelines. These skills are foundational for any further work in analytics, machine learning, or data engineering.

The next step will be a hands-on mini-project that reinforces these concepts, but there are also plans to continue practicing independently—especially around authenticated APIs, structured data scraping, and integration into existing tools and workflows.

Course Certificate: View on Coursera

All notes and opinions are personal interpretations of the IBM Python for Data Science, AI & Development course on Coursera.
