IBM Data Engineering: Python
This course was the second part of the IBM Data Engineering Certificate. It covered Python programming fundamentals, data structures, file handling, basic data manipulation with pandas and numpy, and practical tools for data collection like REST APIs and web scraping.
Module 1 – Python Basics
This module introduced the core building blocks of Python, including data types, expressions, variables, and string manipulation. It also covered how to run and document Python code using Jupyter Notebooks. The content serves as a general-purpose foundation for later data work.
Topics:
Python characteristics: readable syntax, large standard library, popular in data science
Jupyter Notebooks: code and markdown cells, interactive execution
Data types: int, float, str, bool
Type casting: float(2) returns 2.0
Expressions and variables: storing values with =, working with operators
String operations:
Indexing: x[0], x[-1]
Slicing: x[::2]
Methods: .upper(), .lower(), .replace(), .find()
Escape sequences: \n, \t, \\
Module 2 – Python Data Structures
The second module focused on Python’s built-in data structures and how to use them for organizing and manipulating collections of data. Lists, tuples, dictionaries, and sets were covered along with their characteristics and associated methods. Understanding these is essential for working with more complex data formats later in the course.
Lists and Tuples:
Both are ordered sequences
Tuples are immutable, created with (a, b)
Lists are mutable, created with [a, b]
List methods: .append(), .extend(), del, slicing, cloning with A = B[:]
Tuples can't be changed after creation
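Example (my own sketch of the list and tuple behaviour above):
point = (3, 4)          # tuple: immutable
nums = [1, 2, 3]        # list: mutable

nums.append(4)          # [1, 2, 3, 4]
nums.extend([5, 6])     # [1, 2, 3, 4, 5, 6]
del nums[0]             # [2, 3, 4, 5, 6]

A = nums                # A is another name for the same list
B = nums[:]             # B is an independent clone
nums[0] = 99
print(A[0], B[0])       # 99 2

# point[0] = 5 would raise a TypeError: tuples can't be changed after creation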
Dictionaries:
Key-value pairs, created with {key: value}
Keys must be unique and immutable
Access with dict['key']
Methods: .keys(), .values()
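Example (my own):
album = {"title": "Thriller", "year": 1982}    # keys must be unique and immutable
print(album["title"])                          # access by key
album["artist"] = "Michael Jackson"            # add a new key-value pair
print(album.keys())                            # dict_keys(['title', 'year', 'artist'])
print(album.values())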
Sets:
Unordered, only unique elements
Created with {1, 2, 3} or set() (an empty {} creates a dict, not a set)
Methods: .union(), .intersection(), .issubset(), .issuperset()
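Example (my own):
A = {1, 2, 2, 3}             # duplicates are dropped: {1, 2, 3}
B = set([2, 3, 4])
print(A.union(B))            # {1, 2, 3, 4}
print(A.intersection(B))     # {2, 3}
print({2, 3}.issubset(A))    # True
print(A.issuperset({2}))     # True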
Module 3 – Python Programming Fundamentals
This part introduced control flow and reusable logic through conditionals, loops, and functions. Exception handling and object-oriented programming were also introduced to encourage better code structure and error management. The material helps build clean and maintainable Python code that can be scaled for data projects.
Conditionals:
if, elif, else
Comparison operators: ==, !=, <, >
Logical operators: and, or, not
Loops:
for loops for iterating over ranges or sequences
while loops for repeating based on conditions
range(start, stop) excludes the stop value
enumerate() for getting index and value in a loop
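Example (a small sketch of my own combining conditionals, loops, range(), and enumerate()):
scores = [45, 72, 88]

for i, score in enumerate(scores):    # index and value together
    if score >= 85:
        label = "high"
    elif score >= 60:
        label = "medium"
    else:
        label = "low"
    print(i, score, label)

count = 0
while count < 3:                      # repeats while the condition holds
    count += 1

print(list(range(1, 4)))              # [1, 2, 3] -- the stop value is excluded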
Functions:
Defined with def
Can take multiple or optional parameters
return keyword sends back a value
Local vs. global scope
Variable-length input: *args
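Example (my own sketch; the function names are made up):
def add(a, b=1):          # b is an optional parameter with a default value
    return a + b          # return sends the result back to the caller

def total(*args):         # variable-length input
    result = 0            # local variable, not visible outside the function
    for x in args:
        result += x
    return result

print(add(5))             # 6
print(add(5, 10))         # 15
print(total(1, 2, 3))     # 6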
Exception Handling:
Use try, except, else, finally
Common exceptions: ZeroDivisionError, ValueError, IndexError, KeyError, TypeError, FileNotFoundError
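Example (my own sketch using a made-up safe_divide helper):
def safe_divide(a, b):
    try:
        result = a / b                  # may raise ZeroDivisionError or TypeError
    except ZeroDivisionError:
        print("cannot divide by zero")
    except TypeError:
        print("both inputs must be numbers")
    else:
        return result                   # runs only when no exception was raised
    finally:
        print("division attempted")     # always runs

print(safe_divide(10, 2))   # 5.0
print(safe_divide(10, 0))   # handled, returns None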
Classes and Objects:
Create a class using class ClassName(object):
Constructor: __init__(self, ...)
Attributes and methods accessed with self.attribute and self.method()
dir(object) shows available attributes and methods
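Example (my own sketch with a made-up Circle class):
class Circle(object):
    def __init__(self, radius, color="red"):   # constructor sets the attributes
        self.radius = radius
        self.color = color

    def area(self):                             # methods use self to reach attributes
        return 3.14159 * self.radius ** 2

c = Circle(2)
print(c.radius, c.color)   # 2 red
print(c.area())            # 12.56636
print(dir(c))              # lists available attributes and methods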
Module 4 – Working with Data in Python
This module focused on reading, writing, and parsing data from various file formats. It introduced pandas for structured tabular data and numpy for numerical arrays and operations. The content builds the core data handling skills needed for cleaning and exploring real-world datasets.
File handling with open():
Modes: 'r', 'w', 'a', 'r+', 'a+'
with open(...) automatically closes the file
Reading: .readline(), .readlines(), .read(n), .seek(), .tell()
Writing lines: loop with .write()
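Example (my own sketch; example.txt is just a placeholder filename):
lines = ["first line\n", "second line\n"]

with open("example.txt", "w") as f:   # 'w' overwrites; the file is closed automatically
    for line in lines:
        f.write(line)

with open("example.txt", "r") as f:
    print(f.readline())               # one line
    print(f.tell())                   # current position in the file
    f.seek(0)                         # jump back to the beginning
    print(f.readlines())              # all lines as a list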
pandas basics:
import pandas as pd
Read files: pd.read_csv(), pd.read_excel()
Create DataFrame from dict: pd.DataFrame({...})
Access data: df['col'], df[['col1', 'col2']], .iloc[], .loc[]
Filter with a condition: df[df['col'] > value]
Use .unique() to get unique values
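Example (my own sketch with made-up data; data.csv in the last line is a placeholder):
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Ben", "Ana"],   # column name -> list of values
                   "score": [85, 62, 91]})

print(df["name"])                # single column (a Series)
print(df[["name", "score"]])     # several columns (a DataFrame)
print(df.iloc[0, 1])             # by position: 85
print(df.loc[0, "score"])        # by label: 85
print(df[df["score"] > 80])      # filter rows with a condition
print(df["name"].unique())       # ['Ana' 'Ben']
# df = pd.read_csv("data.csv") would load the same kind of structure from a file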
NumPy:
import numpy as np
Arrays: np.array([1, 2, 3])
Shape info: .ndim, .shape, .size
Element-wise math: a + b, a * 2
Dot product: np.dot(a, b)
Universal functions: .mean(), .max(), .min()
np.linspace(start, stop, num) for evenly spaced values
Module 5 – REST APIs, Web Scraping, and Working with Files
The final module shifted into external data acquisition using web technologies. It covered REST API calls with requests, HTML parsing with BeautifulSoup, and handling of formats like JSON and XML. These tools are essential for collecting data from dynamic online sources and integrating them into a Python-based workflow.
REST APIs:
APIs allow access to external systems or data
HTTP methods: GET, POST, PUT, DELETE
URL structure: scheme (http://), base URL, route
Parameters passed with params={...}
Headers provide metadata (e.g. Authorization)
JSON response accessed with .json()
Status codes: 200 success, 400 client error, 500 server error
Example:
import requests
url = "http://httpbin.org/get"
payload = {"name": "Joseph", "ID": "123"}
r = requests.get(url, params=payload)
data = r.json()
Web Scraping:
Use requests to get HTML
Use BeautifulSoup to parse HTML
HTML structure: tags (<html>, <head>, <body>, <tr>, <td>)
Use .find(), .find_all(), .parent, .children to navigate
Extract attributes with .attrs, text with .text
Example:
import requests
from bs4 import BeautifulSoup
html = requests.get("https://example.com").text
soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr")
File Formats:
CSV: use pandas.read_csv()
JSON:
import json
with open("file.json") as f:
    data = json.load(f)
XML:
import xml.etree.ElementTree as ET
tree = ET.parse("file.xml")
root = tree.getroot()
HTML tables: use pandas.read_html()
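For HTML tables, a minimal sketch of my own (the URL is a placeholder, and pd.read_html needs an HTML parser such as lxml installed):
import pandas as pd

# read_html returns a list of DataFrames, one per <table> found on the page
tables = pd.read_html("https://example.com/page-with-tables")
first_table = tables[0]
print(first_table.head())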
Final Thoughts
This course provided a well-rounded introduction to Python with a focus on practical skills used in data science. Topics ranged from syntax and control flow to file parsing, DataFrame operations, and programmatic data collection. The final module, which covered APIs, HTML scraping, and format handling, introduced tools commonly used in real-world data pipelines. These skills are foundational for any further work in analytics, machine learning, or data engineering.
The next step will be a hands-on mini-project that reinforces these concepts, but there are also plans to continue practicing independently—especially around authenticated APIs, structured data scraping, and integration into existing tools and workflows.
Course Certificate: View on Coursera
All notes and opinions are personal interpretations of the IBM Python for Data Science, AI & Development course on Coursera.