Introduction to Data Analysis
Course At a Glance
Category: Data Science
Level: Beginner
Age Group: 14–17 years
Prerequisite: Basic Python Knowledge
Duration: 30 Hours
Modules: 4 Modules
Program Outcomes
By the end of this course, students will be able to:
1. Understand basic concepts of data and how to analyse datasets using Python.
2. Perform simple data manipulation and analysis using Python lists, dictionaries, NumPy, and pandas.
3. Visualise data using charts to identify patterns and present findings clearly.
Module 1: Introduction to Data & Python Review
Students learn what data is and how it is used in the real world. A structured Python review (lists, dicts, functions, CSV files) builds readiness for data analysis. NumPy is introduced as the numerical foundation for data work.
| # | Lesson Title | What Students Learn | Activity / Project | Key Tools / Concepts |
|---|---|---|---|---|
| 1.1 | What is Data & Why Does It Matter? | Define data and explore how it is used. Distinguish between quantitative and qualitative data. Understand the data analysis pipeline: Collect → Clean → Analyse → Visualise → Interpret. | Discussion & Exploration: Find 3 real-world datasets online. Identify data types and questions they could answer. | Data types: int, float, str, bool, categorical |
| 1.2 | Python Review: Variables & Data Types | Refresh variables, strings, integers, floats, booleans, and type conversion. Use type() and f-strings. Perform basic calculations. | Warm-Up Exercises: Calculate total, average, highest, and lowest scores manually using Python arithmetic. Print formatted results. | int, float, str, type(), round(), f-strings |
| 1.3 | Python Review: Lists & Loops | Revisit lists. Create, index, slice, and iterate over lists with for loops. Apply built-in functions: sum(), len(), min(), max(), sorted(). | Build: 'Class Score Analyser' — compute total, count, average, min, and max of a list using functions. | [], sum(), len(), min(), max(), sorted(), for loop |
| 1.4 | Python Review: Dictionaries & Functions | Revisit dictionaries (key: value pairs) for row representation. Write functions to organise code into reusable tools. | Build: 'Student Record Store' — store records in a list of dicts. Write functions: get_average(), get_top_student(), and print_report(). | dict, list of dicts, def, return, .items() |
| 1.5 | Loading & Inspecting Data from CSV Files | Introduce CSV format. Use Python's csv module to load a dataset into a list of dictionaries and inspect the data. | Guided Exercise: Load 'students.csv'. Print row count, column names, first 5 rows, and data types. | import csv, csv.DictReader, list(), keys() |
| 1.6 | Data Quality: Missing & Inconsistent Data | Understand 'dirty data': missing values, formatting issues, duplicates, outliers. Learn basic cleaning strategies. | Clean-Up Lab: Write a Python script to read a messy CSV, fix formatting, skip rows with missing values, and remove duplicates. | .strip(), .lower(), if val != '', set() for duplicates |
| 1.7 | Introduction to NumPy | Install and import NumPy. Understand NumPy arrays. Use vectorised arithmetic and compute statistics such as mean, median, and standard deviation. | Exercises: Convert a list to a NumPy array. Compute mean, median, standard deviation, and weighted average. | import numpy as np, np.array(), np.mean(), np.median(), np.std(), np.average(weights=) |
| 1.8 | Module 1 Review & Mini Challenge | Consolidate Module 1 skills: Python basics, CSV loading, data cleaning, and NumPy basics. | Mini Challenge: Load a real CSV dataset, clean it, compute stats with NumPy, and print a summary. | Full Module 1 — csv, numpy, cleaning, stats |
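
The Module 1 workflow (load a CSV into a list of dictionaries, clean it, then compute statistics with NumPy) could look roughly like the sketch below. The column names and the inline data are hypothetical stand-ins for a file such as 'students.csv', and the cleaning rules are one possible choice, not a prescribed solution.

```python
import csv
import io

import numpy as np

# Hypothetical messy data standing in for 'students.csv'.
# The column names (name, subject, score) are assumptions for illustration.
RAW_CSV = """name,subject,score
 Alice ,Maths,82
bob,maths,
Chen,Science,91
 Alice ,Maths,82
Dara,SCIENCE,67
"""


def load_rows(text):
    """Read CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Fix formatting, skip rows with a missing score, and drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        subject = row["subject"].strip().lower()
        score = row["score"].strip()
        if score == "":
            continue  # missing value: skip the whole row
        key = (name, subject, score)
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        cleaned.append({"name": name, "subject": subject, "score": float(score)})
    return cleaned


rows = load_rows(RAW_CSV)
clean = clean_rows(rows)
scores = np.array([r["score"] for r in clean])

print(f"Rows loaded: {len(rows)}, rows kept: {len(clean)}")
print(f"Mean: {np.mean(scores):.1f}")
print(f"Median: {np.median(scores):.1f}")
print(f"Std dev: {np.std(scores):.1f}")
```

In class, `io.StringIO(RAW_CSV)` would be swapped for `open('students.csv', newline='')` so that students read from a real file.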
Module 2: Working with Data in Python
Students learn to manipulate and analyse datasets using descriptive statistics, filtering, grouping, and pandas DataFrames. The module progresses from pure Python techniques to professional-grade pandas operations.
| # | Lesson Title | What Students Learn | Activity / Project | Key Tools / Concepts |
|---|---|---|---|---|
| 2.1 | Descriptive Statistics | Understand measures of centre (mean, median, mode) and spread (range, variance, std dev). Know when to use mean vs. median. | Analysis Task: Compute mean and median for two datasets (one with outliers). Explain differences. | np.mean(), np.median(), statistics.mode(), np.std(), np.var() |
| 2.2 | Filtering & Sorting Data | Filter a dataset using list comprehension and boolean indexing. Sort lists of dictionaries using sorted() with a lambda. | Build: 'Data Detective' — filter students by score, sort descending, and answer questions. | [x for x in data if cond], sorted(key=lambda), np boolean indexing |
| 2.3 | Grouping & Aggregating Data | Group data by category and compute aggregate stats. Build a frequency counter and introduce split-apply-combine. | Build: 'Subject Analyser' — group student records by subject to compute average, min, and max scores. | dict grouping, defaultdict, frequency counter, group aggregation |
| 2.4 | Introduction to Pandas | Understand pandas DataFrames. Load CSVs with pd.read_csv() and inspect with .head(), .info(), and .describe(). | Guided Exploration: Load a dataset, run inspection methods, filter rows, and select columns. | import pandas as pd, pd.read_csv(), .head(), .describe(), .loc[], .iloc[] |
| 2.5 | Pandas: Filtering, Sorting & Adding Columns | Apply boolean filtering (&, \|), sort with .sort_values(), and add new computed columns. | Build: 'Grade Report Generator', which filters passing students, adds a grade band column, and saves the result using .to_csv(). | df[df['col'] > val], &, \|, .sort_values(), df['new'] = expr, .to_csv() |
| 2.6 | Pandas: Grouping & Aggregation | Use .groupby() to split a DataFrame and compute statistics with .agg(), .mean(), and .sum(). | Build: 'Department Summary Report' — group an employee dataset by department and calculate averages. | .groupby(), .agg(), .reset_index(), .mean(), .count() |
| 2.7 | Handling Missing Data & Data Types | Detect missing values. Decide whether to drop (.dropna()) or fill (.fillna()). Convert data types and rename columns. | Clean-Up Project: Given a messy dataset, write a full cleaning pipeline (detect, handle missing, fix types, rename). | .isnull(), .dropna(), .fillna(), .astype(), .rename() |
| 2.8 | Exploratory Data Analysis (EDA) Mini Project | Apply all Module 2 skills to perform a complete Exploratory Data Analysis on a real-world dataset. | EDA Project: Load, inspect, clean, group, and compute statistics for a chosen dataset. Write a summary of 3 findings. | Full Module 2 — pandas EDA workflow |
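
As a rough illustration of how the pandas lessons in Module 2 fit together, the sketch below inspects a small dataset, handles a missing value, filters, sorts, and groups. The data, the column names, and the pass mark of 50 are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical data standing in for a CSV file on disk.
CSV_TEXT = """name,subject,score
Alice,Maths,82
Bob,Maths,
Chen,Science,91
Dara,Science,67
Eli,Maths,45
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Lesson 2.7: detect missing values, then drop rows with no score.
missing_per_column = df.isnull().sum()
clean = df.dropna(subset=["score"]).copy()

# Lesson 2.5: add a computed column and filter with combined conditions.
PASS_MARK = 50  # assumed threshold, not part of the course material
clean["passed"] = clean["score"] >= PASS_MARK
passing_maths = clean[clean["passed"] & (clean["subject"] == "Maths")]

# Lesson 2.5: sort from highest to lowest score.
ranked = clean.sort_values("score", ascending=False)

# Lesson 2.6: split by subject and compute several statistics at once.
summary = (
    clean.groupby("subject")["score"]
    .agg(["mean", "min", "max", "count"])
    .reset_index()
)

print(missing_per_column)
print(ranked[["name", "score"]])
print(summary)
```

The `.copy()` after `.dropna()` is a defensive choice: it makes sure the new column is added to an independent DataFrame and not to a view of the original.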
Module 3: Data Visualisation Basics
Students learn to create, customise, and interpret data visualisations using matplotlib and seaborn. Chart types covered include line, bar, pie, histogram, scatter, and heatmap.
| # | Lesson Title | What Students Learn | Activity / Project | Key Tools / Concepts |
|---|---|---|---|---|
| 3.1 | Introduction to Matplotlib | Understand the anatomy of a matplotlib figure. Create, customise, and save basic line charts. | Build: 'Temperature Trend Chart' — plot a week of daily temperatures as a line chart with labels and a grid. | import matplotlib.pyplot as plt, plt.plot(), plt.title(), plt.savefig() |
| 3.2 | Bar Charts | Create vertical and horizontal bar charts. Customise colours and add data labels using plt.text(). | Build: 'Subject Average Scores' — plot vertical bar charts for scores and a horizontal chart for top countries. | plt.bar(), plt.barh(), plt.text(), color, edgecolor, width |
| 3.3 | Pie Charts & Donut Charts | Create pie and donut charts. Understand when to use part-to-whole charts and how to explode slices. | Build: 'Grade Distribution Pie Chart' — visualise grade bands and explode a slice. Convert to a donut chart. | plt.pie(), labels, autopct, explode, wedgeprops (donut) |
| 3.4 | Histograms & Distributions | Create histograms with plt.hist(). Choose bins and understand skewness, spread, and outliers. | Build: 'Score Distribution' — plot a histogram with 10 bins. Compare two classes using transparency. | plt.hist(), bins, alpha, edgecolor, density |
| 3.5 | Scatter Plots & Correlation | Create scatter plots to explore relationships between two numerical variables. Add a linear trend line. | Build: 'Study Hours vs Score' — scatter plot study hours against test scores. Add a linear trend line. | plt.scatter(), s, c, alpha, np.polyfit(), np.poly1d() |
| 3.6 | Multiple Plots & Subplots | Create dashboard-style figures with plt.subplots(). Arrange charts in rows and columns with a shared title. | Build: 'Data Dashboard' — a 2x2 subplot figure showing a bar chart, histogram, pie chart, and scatter plot. | plt.subplots(), fig, ax, plt.suptitle(), figsize, tight_layout() |
| 3.7 | Customisation & Styling | Apply matplotlib style sheets. Customise tick labels, add annotations, and set axis limits. | Style Sprint: Apply 'seaborn-v0_8' to the Data Dashboard. Add descriptive annotations and rotate labels. | plt.style.use(), plt.annotate(), plt.xticks(rotation=), cmap |
| 3.8 | Introduction to Seaborn | Install and import seaborn. Use seaborn for cleaner defaults and statistical charts like histplots and heatmaps. | Comparison Exercise: Recreate matplotlib charts using seaborn. Create a correlation heatmap. | import seaborn as sns, sns.barplot(), sns.histplot(), sns.heatmap(), hue= |
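
A 2x2 figure of the kind described in lesson 3.6 might be assembled as in the sketch below. All data values are invented for illustration, and the `Agg` backend is selected only so the script can run without a display; in a Jupyter notebook that line can be left out.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data for each panel.
subjects = ["Maths", "Science", "English"]
averages = [72, 68, 75]
scores = np.array([45, 52, 58, 61, 64, 67, 70, 72, 75, 78, 81, 88, 93])
grade_counts = {"A": 3, "B": 5, "C": 4, "D": 1}
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
results = np.array([48, 55, 57, 65, 70, 72, 80, 86])

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Top left: bar chart of average score per subject.
axes[0, 0].bar(subjects, averages, color="steelblue", edgecolor="black")
axes[0, 0].set_title("Average score by subject")

# Top right: histogram of individual scores.
axes[0, 1].hist(scores, bins=10, edgecolor="black")
axes[0, 1].set_title("Score distribution")

# Bottom left: pie chart of grade bands.
axes[1, 0].pie(
    list(grade_counts.values()),
    labels=list(grade_counts.keys()),
    autopct="%1.0f%%",
)
axes[1, 0].set_title("Grade bands")

# Bottom right: scatter plot with a fitted straight line.
slope, intercept = np.polyfit(hours, results, 1)
axes[1, 1].scatter(hours, results, alpha=0.7)
axes[1, 1].plot(hours, slope * hours + intercept, color="red")
axes[1, 1].set_title("Study hours vs score")
axes[1, 1].set_xlabel("Hours studied")
axes[1, 1].set_ylabel("Score")

fig.suptitle("Data Dashboard")
fig.tight_layout()
fig.savefig("dashboard.png")
```

Each panel is drawn through its own `ax` object, which keeps the four charts independent and makes it easy to restyle one without touching the others.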
Module 4: Mini Data Project
Students apply all skills to a self-chosen real-world dataset. The full data analysis pipeline — loading, cleaning, exploring, visualising, and interpreting — is completed and presented.
| # | Lesson Title | What Students Learn | Activity / Project | Key Tools / Concepts |
|---|---|---|---|---|
| 4.1 | Project Briefing & Dataset Selection | Choose a real-world dataset. Explore datasets briefly with .head() and .describe() to ensure viability. | Dataset Exploration: Load 4 provided datasets, explore, select one, and write 2 questions it could answer. | pd.read_csv(), .head(), .describe(), .info() |
| 4.2 | Project Planning & Questions | Write 3–5 specific analysis questions. Plan pandas operations and chart types for each question. | Planning Deliverable: Complete a Project Plan Sheet specifying dataset, questions, analysis steps, and charts. | Planning — analysis questions, chart type mapping |
| 4.3 | Data Loading & Cleaning | Perform a full data quality audit and apply a cleaning pipeline (handle NaNs, fix types, drop duplicates). | Build Sprint: Clean dataset. Print 'before and after' comparisons and save 'clean_data.csv'. | .isnull().sum(), .dropna(), .fillna(), .astype(), .drop_duplicates() |
| 4.4 | Exploratory Analysis | Answer analysis questions using pandas filtering, grouping, aggregation, and sorting. Identify interesting patterns. | Build Sprint: Run pandas code to answer all questions. Annotate each result with a one-sentence interpretation. | .groupby(), .agg(), .sort_values(), .value_counts(), .corr() |
| 4.5 | Data Visualisation | Create one chart per question using matplotlib/seaborn. Combine all charts into a multi-panel figure. | Build Sprint: Produce charts, apply a consistent style, and add highlight annotations. Save the dashboard. | plt.subplots(), plt.savefig(), sns charts, plt.style.use(), plt.annotate() |
| 4.6 | Findings & Written Interpretation | Write a structured analysis report interpreting the visualisations. Practise data storytelling. | Report Writing: Complete a findings template. Write 2–3 sentences interpreting each chart clearly. | Interpretation, insight writing, data storytelling |
| 4.7 | Presentation Preparation | Structure a 5-minute presentation: dataset intro, questions, findings with charts, and dataset limitations. | Dress Rehearsal: Deliver a timed practice presentation. Teacher provides feedback on clarity and confidence. | Presentation structure, data storytelling |
| 4.8 | Final Presentation Day | Deliver the completed mini data project presentation, answering Q&A. | Final Presentation: 5-minute live presentation of charts and findings. Assessed on analysis depth, chart clarity, and interpretation. | Full course — Data Analysis pipeline |
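
For the 'before and after' comparison in lesson 4.3, one possible approach is a small audit helper that is run on the raw and the cleaned data. The helper name `audit`, the dataset, and the choice to fill missing scores with the median are assumptions for this sketch; students may reasonably choose different cleaning rules for their own dataset.

```python
import pandas as pd


def audit(df):
    """Return simple data-quality counts for a DataFrame (hypothetical helper)."""
    return {
        "rows": len(df),
        "missing_values": int(df.isnull().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Hypothetical raw data: one duplicated row, one missing score,
# and a 'year' column stored as text.
raw = pd.DataFrame(
    {
        "country": ["A", "B", "B", "C", "D"],
        "year": ["2020", "2020", "2020", "2021", "2021"],
        "score": [7.1, 6.4, 6.4, None, 5.2],
    }
)

before = audit(raw)

clean = raw.drop_duplicates().copy()
clean["year"] = clean["year"].astype(int)  # fix the data type
clean["score"] = clean["score"].fillna(clean["score"].median())  # fill the gap

after = audit(clean)

print("Before:", before)
print("After: ", after)

# Lesson 4.4: a first analysis question answered with groupby.
mean_by_year = clean.groupby("year")["score"].mean()
print(mean_by_year)

# To keep the cleaned data, students could finish with:
# clean.to_csv("clean_data.csv", index=False)
```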
Teaching Notes & Tips
Pacing Guidance
Modules 1–3 each contain 8 lessons of approximately 50–60 minutes. Module 4 is shorter (6 hrs across its 8 sessions) and runs as project sprints. Lessons 1.7 (NumPy) and 2.4 (pandas intro) often need extra time.
Differentiation
Advanced students can explore: pandas .pivot_table(), time-series analysis with pd.to_datetime(), interactive charts with plotly, or basic linear regression with scikit-learn. Students needing support should focus on core pandas before seaborn.
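As a starting point for the extension topics, the sketch below combines `pd.to_datetime()` with `.pivot_table()`. The sales data and column names are invented for illustration only.

```python
import pandas as pd

# Hypothetical dataset with dates stored as text.
sales = pd.DataFrame(
    {
        "date": ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-18"],
        "region": ["North", "South", "North", "South"],
        "units": [10, 7, 12, 9],
    }
)

# Convert the text column to real dates, then extract the month number.
sales["date"] = pd.to_datetime(sales["date"])
sales["month"] = sales["date"].dt.month

# One row per month, one column per region, summed units in each cell.
table = sales.pivot_table(
    index="month", columns="region", values="units", aggfunc="sum"
)
print(table)
```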
Assessment Criteria
Mini project assessed on: (1) Data Loading & Cleaning. (2) Analysis Depth. (3) Visualisation Quality. (4) Interpretation in plain English. (5) Presentation Confidence.
Tools & Environment
Recommended: Jupyter Notebook or JupyterLab (via Anaconda). Alternatively, VS Code with Jupyter extension. Required libraries: numpy, pandas, matplotlib, seaborn. Python 3.9+ recommended.
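Before the first lesson, a quick check along the following lines can confirm that each machine has the required libraries. The function name `check_environment` is hypothetical, and the script only reports what it finds; it does not install anything.

```python
import importlib
import sys

REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn"]


def check_environment(packages=REQUIRED):
    """Map each package name to its version string, or None if it is missing."""
    status = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
            status[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            status[name] = None
    return status


status = check_environment()

print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
for name, version in status.items():
    print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```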
Suggested Datasets (Module 4)
World Happiness Report (Kaggle), Video Game Sales (Kaggle), Titanic Passengers, Premier League Results, COVID-19 Statistics, Student Performance Dataset.
Prior Knowledge Expected
Students must be confident with: Python variables, lists, for loops, dictionaries, writing functions, and reading files. Students who have completed Python Fundamentals are well prepared.