📊Course Curriculum

Introduction to Data Analysis

Data Science · Beginner · Ages 14–17 · 30 Hours

Course At a Glance

Program Outcomes

By the end of this course, students will be able to:

1. Understand basic concepts of data and how to analyse datasets using Python.
2. Perform simple data manipulation and analysis using Python lists, dictionaries, NumPy, and pandas.
3. Visualise data using charts to identify patterns and present findings clearly.

Module 1

Introduction to Data & Python Review

Students learn what data is and how it is used in the real world. A structured Python review (variables, lists, dictionaries, functions, and CSV files) prepares students for data analysis. NumPy is introduced as the numerical foundation for data work.

Approx. 8 hrs

#	Lesson Title	What Students Learn	Activity / Project	Key Tools / Concepts
1.1	What is Data & Why Does It Matter?	Define data and explore how it is used. Distinguish between quantitative and qualitative data. Understand the data analysis pipeline: Collect → Clean → Analyse → Visualise → Interpret.	Discussion & Exploration: Find 3 real-world datasets online. Identify data types and questions they could answer.	`Data types: int, float, str, bool, categorical`
1.2	Python Review: Variables & Data Types	Refresh variables, strings, integers, floats, booleans, and type conversion. Use type() and f-strings. Perform basic calculations.	Warm-Up Exercises: Calculate total, average, highest, and lowest scores manually using Python arithmetic. Print formatted results.	`int, float, str, type(), round(), f-strings`
1.3	Python Review: Lists & Loops	Revisit lists. Create, index, slice, and iterate over lists with for loops. Apply built-in functions: sum(), len(), min(), max(), sorted().	Build: 'Class Score Analyser' — compute total, count, average, min, and max of a list using functions.	`[], sum(), len(), min(), max(), sorted(), for loop`
1.4	Python Review: Dictionaries & Functions	Revisit dictionaries (key: value pairs) and use them to represent rows of data. Write functions to organise code into reusable tools.	Build: 'Student Record Store' — store records in a list of dicts. Write functions: get_average(), get_top_student(), and print_report().	`dict, list of dicts, def, return, .items()`
1.5	Loading & Inspecting Data from CSV Files	Introduce the CSV format. Use Python's csv module to load a dataset into a list of dictionaries and inspect the data.	Guided Exercise: Load 'students.csv'. Print row count, column names, first 5 rows, and data types.	`import csv, csv.DictReader, list(), .keys()`
1.6	Data Quality: Missing & Inconsistent Data	Understand 'dirty data': missing values, formatting issues, duplicates, outliers. Learn basic cleaning strategies.	Clean-Up Lab: Write a Python script to read a messy CSV, fix formatting, skip missing rows, and remove duplicates.	`.strip(), .lower(), if val != '', set() for duplicates`
1.7	Introduction to NumPy	Install and import NumPy. Understand NumPy arrays. Use vectorised arithmetic and compute statistics such as mean and standard deviation.	Exercises: Convert a list to a NumPy array. Compute mean, median, standard deviation, and weighted average.	`import numpy as np, np.array(), np.mean(), np.median(), np.std(), np.average(weights=)`
1.8	Module 1 Review & Mini Challenge	Consolidate Module 1 skills: Python basics, CSV loading, data cleaning, and NumPy basics.	Mini Challenge: Load a real CSV dataset, clean it, compute stats with NumPy, and print a summary.	`Full Module 1 — csv, numpy, cleaning, stats`
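The Module 1 Mini Challenge (Lesson 1.8) strings together CSV loading, cleaning, and NumPy statistics. A minimal sketch of that workflow follows. It assumes a tiny hypothetical dataset held in a string (RAW_CSV) so the script runs without a 'students.csv' file on disk, and the helper names load_rows() and clean_rows() are illustrative rather than part of the course materials.

```python
import csv
import io

import numpy as np

# Hypothetical messy data standing in for 'students.csv':
# stray spaces, inconsistent capitalisation, a missing score, and a duplicate row.
RAW_CSV = """name,subject,score
 Alice ,Maths,82
bob,maths,75
Chloe,Science,
bob,maths,75
Dev,SCIENCE,91
"""


def load_rows(text):
    """Read CSV text into a list of dicts (one dict per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Normalise text, skip rows with a missing score, and drop exact duplicates."""
    cleaned = []
    seen = set()
    for row in rows:
        name = row["name"].strip().title()
        subject = row["subject"].strip().lower()
        score = row["score"].strip()
        if score == "":
            continue  # skip rows with a missing score
        key = (name, subject, score)  # dicts are not hashable, so track a tuple
        if key in seen:
            continue  # skip duplicate rows
        seen.add(key)
        cleaned.append({"name": name, "subject": subject, "score": int(score)})
    return cleaned


rows = clean_rows(load_rows(RAW_CSV))
scores = np.array([row["score"] for row in rows])

print(f"Rows after cleaning: {len(rows)}")
print(f"Mean: {np.mean(scores):.1f}, median: {np.median(scores):.1f}, std: {np.std(scores):.2f}")
```

To read a real file instead, replace io.StringIO(text) with open('students.csv', newline=''). Note that csv.DictReader returns every value as a string, which is why the score is converted with int() during cleaning.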

Module 2

Working with Data in Python

Students learn to manipulate and analyse datasets using descriptive statistics, filtering, grouping, and pandas DataFrames. The module progresses from pure Python techniques to professional-grade pandas operations.

Approx. 8 hrs

#	Lesson Title	What Students Learn	Activity / Project	Key Tools / Concepts
2.1	Descriptive Statistics	Understand measures of centre (mean, median, mode) and spread (range, variance, std dev). Know when to use mean vs. median.	Analysis Task: Compute mean and median for two datasets (one with outliers). Explain the differences.	`np.mean(), np.median(), statistics.mode(), np.std(), np.var()`
2.2	Filtering & Sorting Data	Filter a dataset using list comprehensions and NumPy boolean indexing. Sort lists of dictionaries using sorted() with a lambda key.	Build: 'Data Detective' — filter students by score, sort in descending order, and answer questions.	`[x for x in data if cond], sorted(key=lambda ...), np boolean indexing`
2.3	Grouping & Aggregating Data	Group data by category and compute aggregate stats. Build a frequency counter and introduce the split-apply-combine pattern.	Build: 'Subject Analyser' — group student records by subject to compute average, min, and max scores.	`dict grouping, defaultdict, frequency counter, group aggregation`
2.4	Introduction to Pandas	Understand pandas DataFrames. Load CSVs with pd.read_csv() and inspect with .head(), .info(), and .describe().	Guided Exploration: Load a dataset, run inspection methods, filter rows, and select columns.	`import pandas as pd, pd.read_csv(), .head(), .info(), .describe(), .loc[], .iloc[]`
2.5	Pandas: Filtering, Sorting & Adding Columns	Apply boolean filtering (&, |), sort with .sort_values(), and add new computed columns.	Build: 'Grade Report Generator' — filter passing students, add a grade band column, and save using .to_csv().	`df[df['col'] > val], &, |, .sort_values(), df['new'] = expr, .to_csv()`
2.6	Pandas: Grouping & Aggregation	Use .groupby() to split a DataFrame and compute statistics with .agg(), .mean(), and .sum().	Build: 'Department Summary Report' — group an employee dataset by department and calculate averages.	`.groupby(), .agg(), .reset_index(), .mean(), .sum(), .count()`
2.7	Handling Missing Data & Data Types	Detect missing values. Decide whether to drop (.dropna()) or fill (.fillna()). Convert data types and rename columns.	Clean-Up Project: Given a messy dataset, write a full cleaning pipeline (detect, handle missing, fix types, rename).	`.isnull(), .dropna(), .fillna(), .astype(), .rename()`
2.8	Exploratory Data Analysis (EDA) Mini Project	Apply all Module 2 skills to perform a complete Exploratory Data Analysis on a real-world dataset.	EDA Project: Load, inspect, clean, group, and compute statistics for a chosen dataset. Write a summary of 3 findings.	`Full Module 2 — pandas EDA workflow`
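Module 2 moves from pure Python to pandas. The sketch below uses hypothetical student records to compute the same per-subject averages both ways: first with a defaultdict (Lesson 2.3), then with a DataFrame using .isnull(), .dropna(), .astype(), boolean filtering, and .groupby() (Lessons 2.4 to 2.7). The data and variable names are assumptions for illustration.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical student records (a list of dicts, as in Module 1).
records = [
    {"name": "Alice", "subject": "maths", "score": 82},
    {"name": "Bob", "subject": "maths", "score": 75},
    {"name": "Chloe", "subject": "science", "score": None},  # missing value
    {"name": "Dev", "subject": "science", "score": 91},
    {"name": "Ella", "subject": "science", "score": 64},
]

# Pure Python: group scores by subject with a defaultdict, then aggregate by hand.
groups = defaultdict(list)
for rec in records:
    if rec["score"] is not None:  # skip missing scores
        groups[rec["subject"]].append(rec["score"])
pure_python_avg = {subject: sum(s) / len(s) for subject, s in groups.items()}

# pandas: the same analysis with a DataFrame.
df = pd.DataFrame(records)
print(df["score"].isnull().sum(), "missing score(s)")
df = df.dropna(subset=["score"]).copy()  # drop rows with no score
df["score"] = df["score"].astype(int)    # None made the column float; convert back

# Boolean filtering: wrap each condition in parentheses when combining with & or |.
strong_science = df[(df["subject"] == "science") & (df["score"] >= 70)]

# Split-apply-combine: group, aggregate, then sort by the mean.
summary = (
    df.groupby("subject")["score"]
    .agg(["mean", "min", "max", "count"])
    .reset_index()
    .sort_values("mean", ascending=False)
)
print(pure_python_avg)
print(strong_science)
print(summary)
```

The parentheses around each condition matter because & and | bind more tightly than comparison operators such as == and >=, so leaving them out produces an error or an unintended expression.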

Module 3

Data Visualisation Basics

Students learn to create, customise, and interpret data visualisations using matplotlib and seaborn. Chart types covered include line, bar, pie, histogram, scatter, and heatmap.

Approx. 8 hrs

#	Lesson Title	What Students Learn	Activity / Project	Key Tools / Concepts
3.1	Introduction to Matplotlib	Understand the anatomy of a matplotlib figure. Create, customise, and save basic line charts.	Build: 'Temperature Trend Chart' — plot a week of daily temperatures as a line chart with labels and a grid.	`import matplotlib.pyplot as plt, plt.plot(), plt.title(), plt.savefig()`
3.2	Bar Charts	Create vertical and horizontal bar charts. Customise colours and add data labels using plt.text().	Build: 'Subject Average Scores' — plot average scores per subject as a vertical bar chart, and top countries as a horizontal bar chart.	`plt.bar(), plt.barh(), plt.text(), color, edgecolor, width`
3.3	Pie Charts & Donut Charts	Create pie and donut charts. Understand when to use part-to-whole charts and how to explode slices.	Build: 'Grade Distribution Pie Chart' — visualise grade bands and explode a slice. Convert to a donut chart.	`plt.pie(), labels, autopct, explode, wedgeprops (donut)`
3.4	Histograms & Distributions	Create histograms with plt.hist(). Choose bins and understand skewness, spread, and outliers.	Build: 'Score Distribution' — plot a histogram with 10 bins. Compare two classes using transparency.	`plt.hist(), bins, alpha, edgecolor, density`
3.5	Scatter Plots & Correlation	Create scatter plots to explore relationships between two numerical variables. Add a linear trend line.	Build: 'Study Hours vs Score' — scatter plot study hours against test scores. Add a linear trend line.	`plt.scatter(), s, c, alpha, np.polyfit(), np.poly1d()`
3.6	Multiple Plots & Subplots	Create dashboard-style figures with plt.subplots(). Arrange charts in rows and columns with a shared title.	Build: 'Data Dashboard' — a 2x2 subplot figure showing a bar chart, histogram, pie chart, and scatter plot.	`plt.subplots(), fig, ax, plt.suptitle(), figsize, tight_layout()`
3.7	Customisation & Styling	Apply matplotlib style sheets. Customise tick labels, add annotations, and set axis limits.	Style Sprint: Apply 'seaborn-v0_8' to the Data Dashboard. Add descriptive annotations and rotate labels.	`plt.style.use(), plt.annotate(), plt.xticks(rotation=), cmap`
3.8	Introduction to Seaborn	Install and import seaborn. Use seaborn for cleaner default styling and statistical charts such as histograms and heatmaps.	Comparison Exercise: Recreate matplotlib charts using seaborn. Create a correlation heatmap.	`import seaborn as sns, sns.barplot(), sns.histplot(), sns.heatmap(), hue=`
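A possible version of the Lesson 3.6 'Data Dashboard', combining the chart types from Lessons 3.2 to 3.5, is sketched below. The data is hypothetical and the output filename dashboard.png is an assumption. With plt.subplots(), each chart is drawn through its own Axes object (ax.bar(), ax.set_title(), and so on) rather than the plt.* functions used for single charts.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display

import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data for the dashboard.
subjects = ["Maths", "Science", "English", "History"]
averages = [78, 74, 81, 69]
study_hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
test_scores = np.array([52, 58, 61, 67, 70, 76, 79, 85])
grade_labels = ["A", "B", "C", "D"]
grade_counts = [5, 9, 7, 3]

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Bar chart with data labels placed just above each bar.
bars = axes[0, 0].bar(subjects, averages, color="steelblue", edgecolor="black")
for bar, value in zip(bars, averages):
    axes[0, 0].text(bar.get_x() + bar.get_width() / 2, value + 1, str(value), ha="center")
axes[0, 0].set_title("Average Score by Subject")

# Histogram of test scores.
axes[0, 1].hist(test_scores, bins=5, edgecolor="black", alpha=0.7)
axes[0, 1].set_title("Score Distribution")

# Pie chart turned into a donut by giving the wedges a width.
axes[1, 0].pie(grade_counts, labels=grade_labels, autopct="%1.0f%%", wedgeprops={"width": 0.4})
axes[1, 0].set_title("Grade Distribution")

# Scatter plot with a straight-line (degree 1) fit from np.polyfit.
slope, intercept = np.polyfit(study_hours, test_scores, 1)
trend = np.poly1d([slope, intercept])
axes[1, 1].scatter(study_hours, test_scores)
axes[1, 1].plot(study_hours, trend(study_hours), color="red", linestyle="--")
axes[1, 1].set_title("Study Hours vs Score")
axes[1, 1].set_xlabel("Hours studied")
axes[1, 1].set_ylabel("Test score")

fig.suptitle("Data Dashboard")
fig.tight_layout()
fig.savefig("dashboard.png", dpi=100)
```

In a Jupyter notebook, the matplotlib.use("Agg") line can be dropped and plt.show() used to display the figure inline.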

Module 4

Mini Data Project

Students apply all skills to a self-chosen real-world dataset. The full data analysis pipeline — loading, cleaning, exploring, visualising, and interpreting — is completed and presented.

Approx. 6 hrs

#	Lesson Title	What Students Learn	Activity / Project	Key Tools / Concepts
4.1	Project Briefing & Dataset Selection	Choose a real-world dataset. Explore datasets briefly with .head() and .describe() to ensure viability.	Dataset Exploration: Load 4 provided datasets, explore, select one, and write 2 questions it could answer.	`pd.read_csv(), .head(), .describe(), .info()`
4.2	Project Planning & Questions	Write 3–5 specific analysis questions. Plan pandas operations and chart types for each question.	Planning Deliverable: Complete a Project Plan Sheet specifying dataset, questions, analysis steps, and charts.	`Planning — analysis questions, chart type mapping`
4.3	Data Loading & Cleaning	Perform a full data quality audit and apply a cleaning pipeline (handle NaNs, fix types, drop duplicates).	Build Sprint: Clean dataset. Print 'before and after' comparisons and save 'clean_data.csv'.	`.isnull().sum(), .dropna(), .fillna(), .astype(), .drop_duplicates()`
4.4	Exploratory Analysis	Answer analysis questions using pandas filtering, grouping, aggregation, and sorting. Identify interesting patterns.	Build Sprint: Run pandas code to answer all questions. Annotate each result with a one-sentence interpretation.	`.groupby(), .agg(), .sort_values(), .value_counts(), .corr()`
4.5	Data Visualisation	Create one chart per question using matplotlib/seaborn. Combine all charts into a multi-panel figure.	Build Sprint: Produce charts, apply a consistent style, and add highlight annotations. Save the dashboard.	`plt.subplots(), plt.savefig(), sns charts, plt.style.use(), plt.annotate()`
4.6	Findings & Written Interpretation	Write a structured analysis report interpreting the visualisations. Practise data storytelling.	Report Writing: Complete a findings template. Write 2–3 sentences interpreting each chart clearly.	`Interpretation, insight writing, data storytelling`
4.7	Presentation Preparation	Structure a 5-minute presentation: dataset intro, questions, findings with charts, and dataset limitations.	Dress Rehearsal: Deliver a timed practice presentation. Teacher provides feedback on clarity and confidence.	`Presentation structure, data storytelling`
4.8	Final Presentation Day	Deliver the completed mini data project presentation and answer questions from the audience.	Final Presentation: 5-minute live presentation of charts and findings. Assessed on analysis depth, chart clarity, and interpretation.	`Full course — Data Analysis pipeline`
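Lesson 4.3 asks for a cleaning pipeline with 'before and after' comparisons, and Lesson 4.4 for pandas code that answers each question. One way to structure both is sketched below on a small hypothetical dataset. The audit() helper is illustrative, and filling the missing sales value with the column median is only one option; dropping the row is equally valid if students can justify it.

```python
import pandas as pd

# Hypothetical raw data standing in for a student's chosen dataset:
# one duplicate row, a missing region, a missing sales figure, and years stored as text.
raw = pd.DataFrame({
    "region": ["North", "South", "South", "East", None, "North"],
    "sales": [120, 95, 95, 130, 80, None],
    "year": ["2021", "2021", "2021", "2022", "2022", "2022"],
})


def audit(df, label):
    """Print a short data quality report for 'before and after' comparisons."""
    print(f"--- {label} ---")
    print("rows:", len(df))
    print("missing values per column:", df.isnull().sum().to_dict())
    print("duplicate rows:", df.duplicated().sum())


audit(raw, "before")

# Cleaning pipeline: remove duplicates, drop rows that cannot be grouped, fill, fix types.
clean = raw.drop_duplicates().dropna(subset=["region"]).copy()
# Assumption: fill a missing sales figure with the median of the remaining rows.
clean["sales"] = clean["sales"].fillna(clean["sales"].median())
clean["year"] = clean["year"].astype(int)

audit(clean, "after")
clean.to_csv("clean_data.csv", index=False)

# Example analysis question: which region has the highest total sales?
by_region = clean.groupby("region")["sales"].sum().sort_values(ascending=False)
print(by_region)
print(clean["region"].value_counts())
```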

Teaching Notes & Tips

Pacing Guidance

Modules 1–3 each contain 8 lessons of approximately 50–60 minutes (about 8 hours per module). Module 4 also has 8 lessons but totals about 6 hours and runs as project sprints. Lessons 1.7 (NumPy) and 2.4 (pandas intro) often need extra time.

Differentiation

Advanced students can explore pandas .pivot_table(), time-series analysis with pd.to_datetime(), interactive charts with Plotly, or basic linear regression with scikit-learn. Students needing support should consolidate core pandas skills before moving on to seaborn.
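For the pandas extensions listed above, a short sketch of pd.to_datetime() and .pivot_table() follows. The sales records and column names are hypothetical, chosen only to show the two calls working together.

```python
import pandas as pd

# Hypothetical sales records with dates stored as text.
sales = pd.DataFrame({
    "date": ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-17", "2024-02-28"],
    "region": ["North", "South", "North", "South", "North"],
    "units": [10, 7, 12, 5, 8],
})

# Time series: parse the text dates, then derive a year-month label from each one.
sales["date"] = pd.to_datetime(sales["date"])
sales["month"] = sales["date"].dt.strftime("%Y-%m")

# Pivot table: one row per month, one column per region, summing units sold.
table = sales.pivot_table(
    index="month", columns="region", values="units", aggfunc="sum", fill_value=0
)
print(table)
```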

Assessment Criteria

Mini project assessed on: (1) Data Loading & Cleaning. (2) Analysis Depth. (3) Visualisation Quality. (4) Interpretation in plain English. (5) Presentation Confidence.

Tools & Environment

Recommended: Jupyter Notebook or JupyterLab (via Anaconda). Alternatively, VS Code with the Jupyter extension. Required libraries: numpy, pandas, matplotlib, seaborn. Python 3.9+ recommended.
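Before the first lesson, a quick check can confirm the Python version and that the four required libraries import correctly. The script below is a suggested sketch rather than an official setup tool; check_environment() is a hypothetical helper name.

```python
import importlib
import sys

REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn"]


def check_environment(min_version=(3, 9)):
    """Report the Python version and whether each course library can be imported."""
    python_ok = sys.version_info >= min_version
    print(f"Python {sys.version.split()[0]} ({'OK' if python_ok else 'upgrade recommended'})")
    missing = []
    for name in REQUIRED:
        try:
            module = importlib.import_module(name)
            print(f"{name:<11} {getattr(module, '__version__', 'unknown version')}")
        except ImportError:
            print(f"{name:<11} NOT INSTALLED")
            missing.append(name)
    return python_ok, missing


python_ok, missing = check_environment()
if missing:
    print("Install the missing libraries, e.g. with: pip install " + " ".join(missing))
```

On an Anaconda installation all four libraries are normally included, so the check mainly helps students using a plain Python or VS Code setup.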

Suggested Datasets (Module 4)

World Happiness Report (Kaggle), Video Game Sales (Kaggle), Titanic Passengers, Premier League Results, COVID-19 Statistics, Student Performance Dataset.

Prior Knowledge Expected

Students must be confident with: Python variables, lists, for loops, dictionaries, writing functions, and reading files. Students who have completed Python Fundamentals are well prepared.
