    Data Visualization & Pandas

    Data Science · Intermediate · Ages 14–17 · 28 Hours

    Course At a Glance

    Category: Data Science
    Level: Intermediate
    Age Group: 14–17 years
    Prerequisite: Introduction to Data Analysis
    Duration: 28 Hours
    Modules: 4 Modules

    Program Outcomes

    By the end of this course, students will be able to:

    1. Use pandas and Python to efficiently load, manipulate, and clean structured datasets.

    2. Apply advanced data visualisation techniques to explore trends and relationships in data.

    3. Develop data-driven insights and present findings through clear and informative visualisations.

    Module 1

    Introduction to Pandas & DataFrames

    Students build a deep working knowledge of pandas — the professional standard for data manipulation in Python. Topics progress from Series and DataFrames to advanced selection, groupby, pivot tables, and cross-tabulation.

    Approx. 7 hrs
    1.1 Pandas Series & DataFrames
        What Students Learn: Review core pandas data structures. Understand Series as 1D array and DataFrame as 2D table. Explore the index.
        Activity / Project: Build: Create a DataFrame of 10 countries and a Series of GDP values from dicts/lists. Inspect with shape/dtypes.
        Key Methods / Libraries: pd.Series(), pd.DataFrame(), .index, .dtype, .shape, .values

    1.2 Loading & Inspecting Real Datasets
        What Students Learn: Load CSV, Excel, and JSON files into DataFrames. Set column names, skip headers, and parse dates.
        Activity / Project: Guided Exploration: Load datasets in different formats. Print summary statistics and identify columns needing cleaning.
        Key Methods / Libraries: pd.read_csv(), pd.read_excel(), pd.read_json(), parse_dates=, .info(), .describe()

    1.3 Indexing & Selecting Data
        What Students Learn: Master selection using [], .loc[], and .iloc[]. Select single cells, slices, and multiple columns.
        Activity / Project: Exercises: Use .loc and .iloc to select specific rows and columns from a 50-row student dataset.
        Key Methods / Libraries: df['col'], df[['a','b']], .loc[row, col], .iloc[i, j]

    1.4 Boolean Filtering & Compound Conditions
        What Students Learn: Filter rows using boolean conditions and compound operators (&, |, ~). Use .isin() and .between().
        Activity / Project: Data Detective: Load a cities dataset and filter using multiple progressive conditions (e.g., European cities > 1M population).
        Key Methods / Libraries: df[mask], &, |, ~, .isin(), .between(), .str.startswith()

    1.5 Sorting & Ranking
        What Students Learn: Sort with .sort_values(). Add ranks with .rank(). Use .nlargest() and .nsmallest() for quick top-N queries.
        Activity / Project: Build: 'Leaderboard Generator' - sort a video game sales dataset and find the top 3 games per genre.
        Key Methods / Libraries: .sort_values(), ascending=, .rank(), .nlargest(), .nsmallest()

    1.6 Adding, Renaming & Dropping Columns
        What Students Learn: Add computed columns using direct assignment and np.where(). Rename columns, drop unwanted ones, and reorder.
        Activity / Project: Build: Add a pass/fail column to a student dataset using np.where(). Rename and reorder columns.
        Key Methods / Libraries: df['new'] = expr, np.where(), .rename(), .drop(), .copy()

    1.7 GroupBy & Aggregation
        What Students Learn: Split DataFrames by categorical columns. Apply single/multiple aggregations (.sum, .mean) using .agg().
        Activity / Project: Build: 'Sales Summary Report' - group retail data by category/region to compute total revenue and max order.
        Key Methods / Libraries: .groupby(), .agg({'col': ['mean','sum']}), named agg, .reset_index()

    1.8 Pivot Tables & Cross-Tabulation
        What Students Learn: Create pivot tables to reshape data. Use pd.crosstab() for frequency cross-tabulations and add margins.
        Activity / Project: Analysis: Create a pivot table and crosstab on a Titanic dataset to explore survival rates by class and gender.
        Key Methods / Libraries: pd.pivot_table(), pd.crosstab(), values=, aggfunc=, margins=

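As a minimal sketch of how lessons 1.6 to 1.8 fit together, the example below adds a computed column with np.where(), summarises revenue with named aggregation, and reshapes the same data with pd.pivot_table(). The `sales` table, its column names, and the 75 threshold are invented for illustration and are not taken from the course materials.

```python
import numpy as np
import pandas as pd

# Hypothetical retail orders (made-up values, for illustration only)
sales = pd.DataFrame({
    "category": ["Toys", "Toys", "Books", "Books", "Books"],
    "region":   ["North", "South", "North", "North", "South"],
    "revenue":  [120.0, 80.0, 50.0, 70.0, 30.0],
})

# Lesson 1.6 style: computed column using np.where (threshold is an assumption)
sales["size"] = np.where(sales["revenue"] >= 75, "large", "small")

# Lesson 1.7 style: named aggregation gives readable output column names
summary = (
    sales.groupby("category")
         .agg(total_revenue=("revenue", "sum"), max_order=("revenue", "max"))
         .reset_index()
)

# Lesson 1.8 style: one row per category, one column per region
pivot = pd.pivot_table(
    sales, values="revenue", index="category",
    columns="region", aggfunc="sum", fill_value=0,
)

print(summary)
print(pivot)
```

Named aggregation is used here instead of the dictionary form so that the summary columns do not end up with a two-level header, which tends to be easier for students to read.
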
    Module 2

    Data Cleaning & Preparation

    Students master professional-grade data cleaning techniques: missing value imputation, deduplication, standardisation, type conversion, merging, and reshaping with melt and stack.

    Approx. 7 hrs
    2.1 Detecting & Handling Missing Values
        What Students Learn: Audit missing data per column. Visualise patterns with sns.heatmap(). Choose to drop or fill values.
        Activity / Project: Audit Lab: Generate a missing data summary for a housing dataset. Visualise and justify a strategy.
        Key Methods / Libraries: .isnull(), .isnull().sum(), .dropna(), .fillna(), sns.heatmap(df.isnull())

    2.2 Filling & Imputing Missing Values
        What Students Learn: Apply fill strategies (.fillna, .ffill, .bfill). Impute missing values based on group means using .transform().
        Activity / Project: Build: 'Smart Imputer' - fill missing age with median, and missing salary with group mean by job role.
        Key Methods / Libraries: .fillna(df['col'].mean()), .ffill(), .bfill(), .transform('mean')

    2.3 Removing Duplicates & Inconsistent Data
        What Students Learn: Remove duplicate rows. Identify and fix inconsistent categorical values using .replace() and string methods.
        Activity / Project: Clean-Up Sprint: Fix duplicate entries, inconsistent gender labels, and trailing whitespace in a customer dataset.
        Key Methods / Libraries: .duplicated(), .drop_duplicates(), .replace(), .str.lower(), .str.strip()

    2.4 Data Type Conversion & Parsing
        What Students Learn: Convert types with .astype(). Parse datetime strings with pd.to_datetime() and extract datetime components.
        Activity / Project: Build: Parse string prices to float, parse date strings to datetime, and extract month/day for an orders dataset.
        Key Methods / Libraries: .astype(), pd.to_datetime(), .dt.year, .dt.month_name(), errors='coerce'

    2.5 String Operations on Text Data
        What Students Learn: Use the .str accessor for vectorised string operations (.split, .replace, .extract).
        Activity / Project: Text Mining Task: Extract review word counts, star ratings using regex, and count keyword mentions in product reviews.
        Key Methods / Libraries: .str.contains(), .str.split(), .str.extract(r'pattern'), .str.replace()

    2.6 Merging & Joining DataFrames
        What Students Learn: Combine DataFrames using pd.merge() (inner, left, right, outer) and pd.concat() for stacking.
        Activity / Project: Build: 'Student Database Merge' - merge student info, scores, and attendance on student_id. Compare join types.
        Key Methods / Libraries: pd.merge(how='inner/left/outer'), on=, pd.concat(axis=0/1)

    2.7 Reshaping Data: Melt & Stack
        What Students Learn: Transform wide-format to long-format using pd.melt(). Use .stack() and .unstack() to pivot axes.
        Activity / Project: Reshape Challenge: Melt a wide-format student score dataset into long format, group by subject, and compute averages.
        Key Methods / Libraries: pd.melt(), id_vars=, value_vars=, .stack(), .unstack()

    2.8 Full Cleaning Pipeline Project
        What Students Learn: Apply end-to-end cleaning to a raw dataset: audit, handle missing, fix types, standardise, and merge.
        Activity / Project: Capstone Clean: Clean a deliberately messy dataset. Produce a documented pipeline and a before/after quality report.
        Key Methods / Libraries: Full Module 2 - pandas cleaning pipeline

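The 'Smart Imputer' idea from lesson 2.2 and the melt step from lesson 2.7 could look roughly like the sketch below. The `staff` and `wide` tables and all their values are hypothetical, chosen only so the results are easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical staff data with gaps (made-up values)
staff = pd.DataFrame({
    "job_role": ["dev", "dev", "dev", "ops", "ops"],
    "age":      [30.0, np.nan, 40.0, 25.0, 35.0],
    "salary":   [50.0, 70.0, np.nan, 40.0, np.nan],
})

# Fill missing age with the overall median
staff["age"] = staff["age"].fillna(staff["age"].median())

# Fill missing salary with the mean of the same job role.
# transform("mean") returns one value per row, aligned with the original index.
role_mean = staff.groupby("job_role")["salary"].transform("mean")
staff["salary"] = staff["salary"].fillna(role_mean)

# Hypothetical wide-format scores: one column per subject
wide = pd.DataFrame({
    "student": ["Ana", "Ben"],
    "maths":   [80, 60],
    "art":     [70, 90],
})

# Melt to long format: one row per (student, subject) pair
long_scores = pd.melt(
    wide, id_vars="student", value_vars=["maths", "art"],
    var_name="subject", value_name="score",
)
subject_avg = long_scores.groupby("subject")["score"].mean()

print(staff)
print(subject_avg)
```

Using .transform() rather than .agg() matters here: the result has the same length as the original column, so it can be passed straight to .fillna().
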
    Module 3

    Advanced Data Visualisation

    Students create publication-quality and interactive visualisations using pandas plotting, matplotlib, seaborn, and plotly. Includes grouped bars, KDE, pair plots, time series, and multi-panel dashboards.

    Approx. 7 hrs
    3.1 Pandas Built-in Plotting
        What Students Learn: Use pandas .plot() as a wrapper for matplotlib to quickly create line, bar, hist, and scatter plots.
        Activity / Project: Quick-Plot Session: Create 4 distinct charts from an economic dataset using only df.plot() with titles.
        Key Methods / Libraries: df.plot(kind='bar/line/hist/box'), df['col'].plot(), title=, xlabel=

    3.2 Advanced Bar & Line Charts
        What Students Learn: Create grouped and stacked bar charts. Plot multi-line time series and add secondary y-axes.
        Activity / Project: Build: 'Grouped Comparison Chart' - plot grouped/stacked bars for multi-year sales data. Plot a multi-line time series.
        Key Methods / Libraries: df.plot(kind='bar', stacked=True), ax.twinx(), plt.legend()

    3.3 Distribution Plots: Histograms, KDE & Box Plots
        What Students Learn: Overlay KDE curves on histograms. Create box plots and violin plots to show medians, IQRs, and outliers.
        Activity / Project: Distribution Analysis: Plot overlapping histograms, KDE comparisons, box plots, and violin plots for exam scores.
        Key Methods / Libraries: sns.histplot(kde=True), sns.boxplot(), sns.violinplot(), hue=, alpha=

    3.4 Scatter Plots, Pair Plots & Correlation
        What Students Learn: Create scatter plots with hue/size mapping. Add regression lines. Use pairplots and correlation heatmaps.
        Activity / Project: Correlation Exploration: Visualise the Iris dataset using scatter plots, sns.pairplot(), and a correlation heatmap.
        Key Methods / Libraries: sns.scatterplot(hue=, size=), sns.regplot(), sns.pairplot(), .corr(), sns.heatmap(annot=True)

    3.5 Time Series Visualisation
        What Students Learn: Set datetime indices. Resample data by month/year and compute rolling window moving averages.
        Activity / Project: Build: 'Trend Chart' - plot raw daily values, a 7-day rolling average, and resampled monthly totals with annotations.
        Key Methods / Libraries: .set_index(), .resample('M').sum(), .rolling(7).mean(), plt.annotate()

    3.6 Categorical Plots & Count Charts
        What Students Learn: Create count plots, barplots with confidence intervals, and swarm plots for categorical frequency data.
        Activity / Project: Build: 'Survey Data Visualiser' - create count plots, bar plots, and swarm plots on a categorical survey dataset.
        Key Methods / Libraries: sns.countplot(), sns.barplot(), sns.swarmplot(), order=, palette=

    3.7 Multi-Panel Dashboards & Annotation
        What Students Learn: Design multi-panel figures using plt.subplots(). Add figure-level annotations and share axes cleanly.
        Activity / Project: Build: 'Analytical Dashboard' - combine 6 different charts into a single styled dashboard figure with annotations.
        Key Methods / Libraries: plt.subplots(2,3), sharex=, tight_layout(), ax.annotate(), plt.suptitle()

    3.8 Interactive Visualisation with Plotly
        What Students Learn: Create interactive bar, scatter, and choropleth (map) charts with hover and zoom functionality.
        Activity / Project: Build: Recreate charts using plotly.express and export as interactive HTML files, including a world choropleth map.
        Key Methods / Libraries: import plotly.express as px, px.scatter(), px.choropleth(), .write_html()

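The data preparation behind the lesson 3.5 'Trend Chart' might be sketched as follows, using made-up daily values. One assumption to flag: pandas 2.2 and later deprecate the 'M' resample alias in favour of 'ME', so this sketch uses 'MS' (month start), which both older and newer versions accept. The `plot_trend` helper is hypothetical and imports matplotlib only when called.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: 90 days starting 2024-01-01, values 0..89
raw = pd.DataFrame({
    "date":  pd.date_range("2024-01-01", periods=90, freq="D"),
    "value": np.arange(90, dtype=float),
})

# A datetime index is required for resample()
daily = raw.set_index("date")

# 7-day rolling average (first 6 rows are NaN: not enough history yet)
daily["rolling_7"] = daily["value"].rolling(7).mean()

# Monthly totals, labelled by the first day of each month
monthly = daily["value"].resample("MS").sum()


def plot_trend(daily, monthly):
    """Hypothetical helper: draw the two-panel trend chart with matplotlib."""
    import matplotlib.pyplot as plt

    fig, (ax_top, ax_bottom) = plt.subplots(2, 1, figsize=(8, 6))
    daily["value"].plot(ax=ax_top, alpha=0.4, label="daily")
    daily["rolling_7"].plot(ax=ax_top, label="7-day rolling mean")
    ax_top.legend()
    monthly.plot(kind="bar", ax=ax_bottom, title="Monthly totals")
    fig.tight_layout()
    return fig


print(monthly)
```

In a notebook, calling `plot_trend(daily, monthly)` would render the figure inline; annotations for notable points could then be added with ax.annotate().
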
    Module 4

    Data Project / Mini Capstone

    Students apply all skills to a self-chosen real-world dataset. The full pipeline — loading, cleaning, EDA, visualisation, interpretation, and presentation — is completed as a data story.

    Approx. 7 hrs
    4.1 Capstone Briefing & Dataset Selection
        What Students Learn: Choose a real-world dataset and define a specific analytical question for the capstone data story.
        Activity / Project: Dataset Audit: Browse 4 provided datasets, run .info(), and write a 1-paragraph project brief stating the central question.
        Key Methods / Libraries: pd.read_csv(), .info(), .describe(), project brief

    4.2 Analysis Planning
        What Students Learn: Translate the central question into 4–6 sub-questions mapped to specific pandas operations and chart types.
        Activity / Project: Planning Deliverable: Submit a detailed Analysis Plan mapping questions to pandas/seaborn methods.
        Key Methods / Libraries: Analysis planning, chart-type mapping, cleaning checklist

    4.3 Data Loading & Full Cleaning Pipeline
        What Students Learn: Execute a complete cleaning pipeline (missing data, types, text standardisation, duplicates) on the dataset.
        Activity / Project: Build Sprint: Output a data quality report (before vs after) and save the clean dataset as a CSV.
        Key Methods / Libraries: .isnull(), .dropna(), .fillna(), .astype(), pd.to_datetime(), .drop_duplicates()

    4.4 Exploratory Data Analysis
        What Students Learn: Answer sub-questions using pandas filtering, groupby, and pivoting. Identify interesting patterns.
        Activity / Project: Build Sprint: Write pandas code to answer each sub-question. Add a one-sentence interpretation for each output.
        Key Methods / Libraries: .groupby(), .agg(), .sort_values(), .corr(), pd.pivot_table()

    4.5 Visualisation Build
        What Students Learn: Create one carefully designed chart per sub-question. Combine them into a polished dashboard figure.
        Activity / Project: Build Sprint: Produce all visualisations with correct labels and annotations, combining them into a final dashboard.
        Key Methods / Libraries: sns, plt.subplots(), plt.annotate(), plt.savefig(), px (plotly)

    4.6 Data Story & Written Findings
        What Students Learn: Write a structured findings narrative (introduction, methodology, findings, conclusion, limitations).
        Activity / Project: Report Writing: Write the data story referencing specific statistics from the charts. Peer-review for clarity.
        Key Methods / Libraries: Data storytelling, insight writing, referencing statistics

    4.7 Presentation Preparation & Rehearsal
        What Students Learn: Structure a 6-minute presentation: overview, cleaning, key findings, and conclusion.
        Activity / Project: Dress Rehearsal: Deliver a timed practice presentation. Refine chart narration based on teacher feedback.
        Key Methods / Libraries: Presentation structure, data storytelling, chart narration

    4.8 Final Capstone Presentation Day
        What Students Learn: Deliver the completed data project presentation, narrating the charts and handling Q&A.
        Activity / Project: Final Presentation: 6-minute live data project presentation. Assessed on cleaning, depth, visuals, and insight.
        Key Methods / Libraries: Full course - Pandas & Data Visualisation
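
One possible shape for the before/after data quality report in lesson 4.3 is sketched below. The `quality_report` helper, the choice of metrics, and the tiny `raw` table are all assumptions made for illustration; the course does not prescribe a report format.

```python
import pandas as pd


def quality_report(df):
    """Hypothetical helper: count rows, missing cells, and duplicate rows."""
    return {
        "rows": len(df),
        "missing_cells": int(df.isnull().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Made-up messy data: one duplicate row, one missing city, one unparseable temp
raw = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", None, "Cairo"],
    "temp": ["4.5", "4.5", "n/a", "19.0", "30.1"],
})
before = quality_report(raw)

clean = (
    raw.drop_duplicates()
       # errors="coerce" turns unparseable strings such as "n/a" into NaN
       .assign(temp=lambda d: pd.to_numeric(d["temp"], errors="coerce"))
       .dropna()
       .reset_index(drop=True)
)
after = quality_report(clean)

# Side-by-side comparison: one row per metric
report = pd.DataFrame({"before": before, "after": after})
print(report)
```

Dropping every incomplete row is the simplest strategy, not necessarily the best one; students would be expected to justify whether to drop or impute for their own dataset. The cleaned table could then be saved with `clean.to_csv(...)`.
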

    Teaching Notes & Tips

    Pacing Guidance

    Each module contains 8 lessons of approximately 50–60 minutes, totalling ~28 hours across the course. In Module 2 (Cleaning), lessons 2.6 (merging) and 2.7 (reshaping) often need an extra 15–20 minutes. Module 4 runs as project sprints.

    Differentiation

    Advanced students can explore pandas method chaining, pd.qcut(), Plotly Dash for interactive dashboards, or scikit-learn for basic regression. The core focus should remain on fundamental pandas cleaning operations.
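
For students taking up the extension topics, method chaining combined with pd.qcut() could be introduced with a sketch like the one below. The `scores` data and the quartile labels are invented for illustration.

```python
import pandas as pd

# Hypothetical exam scores (made-up values)
scores = pd.DataFrame({
    "student": list("ABCDEFGH"),
    "score":   [35, 42, 58, 61, 67, 74, 88, 95],
})

# One chained expression: each step returns a new DataFrame
result = (
    scores
    # pd.qcut splits values into equal-sized bins by quantile
    .assign(quartile=lambda d: pd.qcut(d["score"], q=4,
                                       labels=["Q1", "Q2", "Q3", "Q4"]))
    .query("score >= 50")
    .sort_values("score", ascending=False)
    .reset_index(drop=True)
)

print(result)
```

Unlike pd.cut(), which uses fixed-width bins, pd.qcut() chooses bin edges so that each bin holds roughly the same number of rows, which is why each quartile here contains two students before filtering.
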

    Assessment Criteria

    Capstone assessed on: (1) Cleaning Quality — complete, documented pipeline. (2) Analysis Depth. (3) Visualisation Quality — proper formatting, annotations. (4) Interpretation — data-driven insights. (5) Presentation.

    Tools & Environment

    Recommended: JupyterLab (via Anaconda) for inline chart rendering and documentation. Required libraries: pandas, numpy, matplotlib, seaborn, plotly. Python 3.9+. Students should already be able to navigate Jupyter comfortably.

    Suggested Capstone Datasets

    World Happiness Report (Kaggle), Netflix Titles (Kaggle), Airbnb Listings, Global CO₂ Emissions, Olympic Games History, Spotify Top Songs.

    Prior Knowledge Expected

    Students should be comfortable with pandas basics (pd.read_csv(), .head()), matplotlib chart creation, Python functions, and loops. Completion of Introduction to Data Analysis is strongly recommended.
