Data Visualization & Pandas
Course At a Glance
Category
Data Science
Level
Intermediate
Age Group
14–17 years
Prerequisite
Introduction to Data Analysis
Duration
28 Hours
Modules
4 Modules
Program Outcomes
By the end of this course, students will be able to:
- 1
Use pandas and Python to efficiently load, manipulate, and clean structured datasets.
- 2
Apply advanced data visualisation techniques to explore trends and relationships in data.
- 3
Develop data-driven insights and present findings through clear and informative visualisations.
Introduction to Pandas & DataFrames
Students build a deep working knowledge of pandas — the professional standard for data manipulation in Python. Topics progress from Series and DataFrames to advanced selection, groupby, pivot tables, and cross-tabulation.
| # | Lesson Title | What Students Learn | Activity / Project | Key Methods / Libraries |
|---|---|---|---|---|
| 1.1 | Pandas Series & DataFrames | Review core pandas data structures. Understand Series as 1D array and DataFrame as 2D table. Explore the index. | Build: Create a DataFrame of 10 countries and a Series of GDP values from dicts/lists. Inspect with shape/dtypes. | pd.Series(), pd.DataFrame(), .index, .dtype, .shape, .values |
| 1.2 | Loading & Inspecting Real Datasets | Load CSV, Excel, and JSON files into DataFrames. Set column names, skip headers, and parse dates. | Guided Exploration: Load datasets in different formats. Print summary statistics and identify columns needing cleaning. | pd.read_csv(), pd.read_excel(), pd.read_json(), parse_dates=, .info(), .describe() |
| 1.3 | Indexing & Selecting Data | Master selection using [], .loc[], and .iloc[]. Select single cells, slices, and multiple columns. | Exercises: Use .loc and .iloc to select specific rows and columns from a 50-row student dataset. | df['col'], df[['a','b']], .loc[row, col], .iloc[i, j] |
| 1.4 | Boolean Filtering & Compound Conditions | Filter rows using boolean conditions and compound operators (&, |, ~). Use .isin() and .between(). | Data Detective: Load a cities dataset and filter using multiple progressive conditions (e.g., European cities > 1M population). | df[mask], &, |, ~, .isin(), .between(), .str.startswith() |
| 1.5 | Sorting & Ranking | Sort with .sort_values(). Add ranks with .rank(). Use .nlargest() and .nsmallest() for quick top-N queries. | Build: 'Leaderboard Generator' — sort a video game sales dataset and find the top 3 games per genre. | .sort_values(), ascending=, .rank(), .nlargest(), .nsmallest() |
| 1.6 | Adding, Renaming & Dropping Columns | Add computed columns using direct assignment and np.where(). Rename columns, drop unwanted ones, and reorder. | Build: Add a pass/fail column to a student dataset using np.where(). Rename and reorder columns. | df['new'] = expr, np.where(), .rename(), .drop(), .copy() |
| 1.7 | GroupBy & Aggregation | Split DataFrames by categorical columns. Apply single/multiple aggregations (.sum, .mean) using .agg(). | Build: 'Sales Summary Report' — group retail data by category/region to compute total revenue and max order. | .groupby(), .agg({'col': ['mean','sum']}), named agg, .reset_index() |
| 1.8 | Pivot Tables & Cross-Tabulation | Create pivot tables to reshape data. Use pd.crosstab() for frequency cross-tabulations and add margins. | Analysis: Create a pivot table and crosstab on a Titanic dataset to explore survival rates by class and gender. | pd.pivot_table(), pd.crosstab(), values=, aggfunc=, margins= |
Data Cleaning & Preparation
Students master professional-grade data cleaning techniques: missing value imputation, deduplication, standardisation, type conversion, merging, and reshaping with melt and stack.
| # | Lesson Title | What Students Learn | Activity / Project | Key Methods / Libraries |
|---|---|---|---|---|
| 2.1 | Detecting & Handling Missing Values | Audit missing data per column. Visualise patterns with sns.heatmap(). Choose to drop or fill values. | Audit Lab: Generate a missing data summary for a housing dataset. Visualise and justify a strategy. | .isnull(), .isnull().sum(), .dropna(), .fillna(), sns.heatmap(df.isnull()) |
| 2.2 | Filling & Imputing Missing Values | Apply fill strategies (.fillna, .ffill, .bfill). Impute missing values based on group means using .transform(). | Build: 'Smart Imputer' — fill missing age with median, and missing salary with group mean by job role. | .fillna(df['col'].mean()), .ffill(), .bfill(), .transform('mean') |
| 2.3 | Removing Duplicates & Inconsistent Data | Remove duplicate rows. Identify and fix inconsistent categorical values using .replace() and string methods. | Clean-Up Sprint: Fix duplicate entries, inconsistent gender labels, and trailing whitespace in a customer dataset. | .duplicated(), .drop_duplicates(), .replace(), .str.lower(), .str.strip() |
| 2.4 | Data Type Conversion & Parsing | Convert types with .astype(). Parse datetime strings with pd.to_datetime() and extract datetime components. | Build: Parse string prices to float, parse date strings to datetime, and extract month/day for an orders dataset. | .astype(), pd.to_datetime(), .dt.year, .dt.month_name(), errors='coerce' |
| 2.5 | String Operations on Text Data | Use the .str accessor for vectorised string operations (.split, .replace, .extract). | Text Mining Task: Extract review word counts, star ratings using regex, and count keyword mentions in product reviews. | .str.contains(), .str.split(), .str.extract(r'pattern'), .str.replace() |
| 2.6 | Merging & Joining DataFrames | Combine DataFrames using pd.merge() (inner, left, right, outer) and pd.concat() for stacking. | Build: 'Student Database Merge' — merge student info, scores, and attendance on student_id. Compare join types. | pd.merge(how='inner/left/outer'), on=, pd.concat(axis=0/1) |
| 2.7 | Reshaping Data: Melt & Stack | Transform wide-format to long-format using pd.melt(). Use .stack() and .unstack() to pivot axes. | Reshape Challenge: Melt a wide-format student score dataset into long format, group by subject, and compute averages. | pd.melt(), id_vars=, value_vars=, .stack(), .unstack() |
| 2.8 | Full Cleaning Pipeline Project | Apply end-to-end cleaning to a raw dataset: audit, handle missing, fix types, standardise, and merge. | Capstone Clean: Clean a deliberately messy dataset. Produce a documented pipeline and a before/after quality report. | Full Module 2 — pandas cleaning pipeline |
Advanced Data Visualisation
Students create publication-quality and interactive visualisations using pandas plotting, matplotlib, seaborn, and plotly. Includes grouped bars, KDE, pair plots, time series, and multi-panel dashboards.
| # | Lesson Title | What Students Learn | Activity / Project | Key Methods / Libraries |
|---|---|---|---|---|
| 3.1 | Pandas Built-in Plotting | Use pandas .plot() as a wrapper for matplotlib to quickly create line, bar, hist, and scatter plots. | Quick-Plot Session: Create 4 distinct charts from an economic dataset using only df.plot() with titles. | df.plot(kind='bar/line/hist/box'), df['col'].plot(), title=, xlabel= |
| 3.2 | Advanced Bar & Line Charts | Create grouped and stacked bar charts. Plot multi-line time series and add secondary y-axes. | Build: 'Grouped Comparison Chart' — plot grouped/stacked bars for multi-year sales data. Plot a multi-line time series. | df.plot(kind='bar', stacked=True), ax.twinx(), plt.legend() |
| 3.3 | Distribution Plots: Histograms, KDE & Box Plots | Overlay KDE curves on histograms. Create box plots and violin plots to show medians, IQRs, and outliers. | Distribution Analysis: Plot overlapping histograms, KDE comparisons, box plots, and violin plots for exam scores. | sns.histplot(kde=True), sns.boxplot(), sns.violinplot(), hue=, alpha= |
| 3.4 | Scatter Plots, Pair Plots & Correlation | Create scatter plots with hue/size mapping. Add regression lines. Use pairplots and correlation heatmaps. | Correlation Exploration: Visualise the Iris dataset using scatter plots, sns.pairplot(), and a correlation heatmap. | sns.scatterplot(hue=, size=), sns.regplot(), sns.pairplot(), .corr(), sns.heatmap(annot=True) |
| 3.5 | Time Series Visualisation | Set datetime indices. Resample data by month/year and compute rolling window moving averages. | Build: 'Trend Chart' — plot raw daily values, a 7-day rolling average, and resampled monthly totals with annotations. | .set_index(), .resample('M').sum(), .rolling(7).mean(), plt.annotate() |
| 3.6 | Categorical Plots & Count Charts | Create count plots, barplots with confidence intervals, and swarm plots for categorical frequency data. | Build: 'Survey Data Visualiser' — create count plots, bar plots, and swarm plots on a categorical survey dataset. | sns.countplot(), sns.barplot(), sns.swarmplot(), order=, palette= |
| 3.7 | Multi-Panel Dashboards & Annotation | Design multi-panel figures using plt.subplots(). Add figure-level annotations and share axes cleanly. | Build: 'Analytical Dashboard' — combine 6 different charts into a single styled dashboard figure with annotations. | plt.subplots(2,3), sharex=, tight_layout(), ax.annotate(), plt.suptitle() |
| 3.8 | Interactive Visualisation with Plotly | Create interactive bar, scatter, and choropleth (map) charts with hover and zoom functionality. | Build: Recreate charts using plotly.express and export as interactive HTML files, including a world choropleth map. | import plotly.express as px, px.scatter(), px.choropleth(), .write_html() |
Data Project / Mini Capstone
Students apply all skills to a self-chosen real-world dataset. The full pipeline — loading, cleaning, EDA, visualisation, interpretation, and presentation — is completed as a data story.
| # | Lesson Title | What Students Learn | Activity / Project | Key Methods / Libraries |
|---|---|---|---|---|
| 4.1 | Capstone Briefing & Dataset Selection | Choose a real-world dataset and define a specific analytical question for the capstone data story. | Dataset Audit: Browse 4 provided datasets, run .info(), and write a 1-paragraph project brief stating the central question. | pd.read_csv(), .info(), .describe(), project brief |
| 4.2 | Analysis Planning | Translate the central question into 4–6 sub-questions mapped to specific pandas operations and chart types. | Planning Deliverable: Submit a detailed Analysis Plan mapping questions to pandas/seaborn methods. | Analysis planning, chart-type mapping, cleaning checklist |
| 4.3 | Data Loading & Full Cleaning Pipeline | Execute a complete cleaning pipeline (missing data, types, text standardisation, duplicates) on the dataset. | Build Sprint: Output a data quality report (before vs after) and save the clean dataset as a CSV. | .isnull(), .dropna(), .fillna(), .astype(), pd.to_datetime(), .drop_duplicates() |
| 4.4 | Exploratory Data Analysis | Answer sub-questions using pandas filtering, groupby, and pivoting. Identify interesting patterns. | Build Sprint: Write pandas code to answer each sub-question. Add a one-sentence interpretation for each output. | .groupby(), .agg(), .sort_values(), .corr(), pd.pivot_table() |
| 4.5 | Visualisation Build | Create one carefully designed chart per sub-question. Combine them into a polished dashboard figure. | Build Sprint: Produce all visualisations with correct labels and annotations, combining them into a final dashboard. | sns, plt.subplots(), plt.annotate(), plt.savefig(), px (plotly) |
| 4.6 | Data Story & Written Findings | Write a structured findings narrative (introduction, methodology, findings, conclusion, limitations). | Report Writing: Write the data story referencing specific statistics from the charts. Peer-review for clarity. | Data storytelling, insight writing, referencing statistics |
| 4.7 | Presentation Preparation & Rehearsal | Structure a 6-minute presentation: overview, cleaning, key findings, and conclusion. | Dress Rehearsal: Deliver a timed practice presentation. Refine chart narration based on teacher feedback. | Presentation structure, data storytelling, chart narration |
| 4.8 | Final Capstone Presentation Day | Deliver the completed data project presentation, narrating the charts and handling Q&A. | Final Presentation: 6-minute live data project presentation. Assessed on cleaning, depth, visuals, and insight. | Full course — Pandas & Data Visualisation |
Teaching Notes & Tips
Pacing Guidance
Each module contains 8 lessons of approximately 50–60 minutes, totalling ~28 hours. Module 2 (Cleaning) lessons 2.6 (merging) and 2.7 (reshaping) often need an extra 15–20 minutes. Module 4 runs as project sprints.
Differentiation
Advanced students can explore: pandas method chaining, pd.qcut(), advanced plotly Dash for interactive dashboards, or scikit-learn for basic regression. Core focus should be placed on fundamental pandas cleaning operations.
Assessment Criteria
Capstone assessed on: (1) Cleaning Quality — complete, documented pipeline. (2) Analysis Depth. (3) Visualisation Quality — proper formatting, annotations. (4) Interpretation — data-driven insights. (5) Presentation.
Tools & Environment
Recommended: JupyterLab (via Anaconda) for inline chart rendering and documentation. Required libraries: pandas, numpy, matplotlib, seaborn, plotly. Python 3.9+. Students should navigate Jupyter seamlessly.
Suggested Capstone Datasets
World Happiness Report (Kaggle), Netflix Titles (Kaggle), Airbnb Listings, Global CO₂ Emissions, Olympic Games History, Spotify Top Songs.
Prior Knowledge Expected
Students should be comfortable with: pandas basics (.read_csv, .head), matplotlib chart creation, Python functions, and loops. Completion of Introduction to Data Analysis is strongly recommended.
Data Visualization & Pandas · Intermediate · Ages 14–17 · © Course Curriculum
Enroll Your Child Now