5
Evaluate

Data Analysis Project

Project Overview

"Conduct an original data analysis investigation using real air quality data, applying statistical and visualization methods to answer a research question of your choosing."

Project Options

Option A: Trend Analysis

Analyze 10+ years of data to detect trends.

  • Is air quality improving or worsening?
  • How do trends vary by pollutant?
  • What factors explain observed trends?
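
As a starting point for the first question, a straight line can be fitted to annual mean concentrations. The sketch below is a minimal illustration using ordinary least squares in plain Python. The function name `ols_slope` and the annual values are invented for illustration and are not real measurements. A real project might prefer a method that is less sensitive to outliers, and should also test whether the slope differs from zero.

```python
from statistics import mean


def ols_slope(years, values):
    """Return the least-squares (slope, intercept) of values against years."""
    x_bar, y_bar = mean(years), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical annual mean PM2.5 values (ug/m3), invented for illustration only.
years = list(range(2010, 2020))
annual_mean = [12.0, 11.6, 11.5, 11.0, 10.8, 10.1, 10.0, 9.7, 9.5, 9.0]

slope, intercept = ols_slope(years, annual_mean)
print(f"Estimated trend: {slope:.3f} ug/m3 per year")
```

A negative slope suggests improving air quality over the period, but the slope alone does not say why; explaining the trend needs additional data such as emissions or meteorology.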

Option B: Spatial Analysis

Compare air quality across locations.

  • Urban vs. suburban vs. rural differences
  • Environmental justice analysis
  • Near-road vs. background sites

Option C: Event Analysis

Study air quality during specific events.

  • Wildfire smoke episodes
  • COVID-19 lockdown effects
  • Holiday fireworks impacts
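
A simple first look at an event is to compare the mean concentration during the event window with the mean over the days just before it. The sketch below assumes daily readings stored in a dictionary keyed by date. The function `event_vs_baseline`, the 7-day baseline, and all values are hypothetical choices made for illustration.

```python
from datetime import date, timedelta
from statistics import mean


def event_vs_baseline(readings, event_start, event_end, baseline_days=7):
    """Compare the event-window mean with the mean of the preceding days.

    readings maps date -> concentration.
    Returns (baseline_mean, event_mean, percent_change).
    """
    baseline_start = event_start - timedelta(days=baseline_days)
    baseline = [v for d, v in readings.items() if baseline_start <= d < event_start]
    event = [v for d, v in readings.items() if event_start <= d <= event_end]
    baseline_mean, event_mean = mean(baseline), mean(event)
    percent_change = 100.0 * (event_mean - baseline_mean) / baseline_mean
    return baseline_mean, event_mean, percent_change


# Hypothetical daily values: 7 baseline days followed by a 3-day event.
first_day = date(2020, 9, 1)
values = [10.0] * 7 + [40.0, 60.0, 50.0]
readings = {first_day + timedelta(days=i): v for i, v in enumerate(values)}

base, ev, pct = event_vs_baseline(readings, date(2020, 9, 8), date(2020, 9, 10))
print(f"Baseline {base:.1f}, event {ev:.1f}, change {pct:.0f}%")
```

A before-and-after comparison like this ignores weather and seasonal effects, so it is best treated as exploratory rather than as evidence of cause.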

Option D: Predictive Modeling

Build and evaluate forecast models.

  • Next-day concentration prediction
  • AQI category classification
  • Model comparison study
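
A model comparison study needs a baseline and a shared error metric. The sketch below compares two very simple next-day forecasts, persistence (tomorrow equals today) and a 3-day moving average, using mean absolute error. The function names and the daily values are hypothetical, and neither model is suggested as the intended approach; they only give a reference point that a more capable model should beat.

```python
from statistics import mean


def persistence_forecast(series):
    """Forecast each day (from the second onward) as the previous day's value."""
    return series[:-1]


def moving_average_forecast(series, window=3):
    """Forecast each day as the mean of the previous `window` days."""
    return [mean(series[i - window:i]) for i in range(window, len(series))]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and forecast values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical daily concentrations, invented for illustration only.
daily = [8.0, 9.5, 12.0, 15.5, 14.0, 10.0, 7.5, 8.0, 11.0, 13.0]
window = 3

# Score both models on the same days: those with a full window of history.
actual = daily[window:]
persist_pred = persistence_forecast(daily)[window - 1:]
ma_pred = moving_average_forecast(daily, window)

print("Persistence MAE:", round(mean_absolute_error(actual, persist_pred), 2))
print("Moving-average MAE:", round(mean_absolute_error(actual, ma_pred), 2))
```

Scoring both forecasts on identical days keeps the comparison fair. A full project would also hold out a separate test period that was not used to tune the models.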

Project Requirements

Minimum Data Requirements

  • At least 1 year of data, or equivalent coverage for spatial analysis (Option A requires 10+ years)
  • Data from at least 2 sources (e.g., AQS + meteorology, multiple sites)
  • At least 1000 data points total

Analysis Requirements

  • Descriptive statistics (mean, median, percentiles, etc.)
  • At least one inferential statistical test (regression, t-test, etc.)
  • Appropriate handling of missing data
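
The three requirements above can be sketched together with the Python standard library. This example drops missing readings, summarizes each group, and runs a permutation test for a difference in means. The permutation test is used here in place of the t-test named in the list because it needs no external packages. The function names, the site labels, and all values are hypothetical. Dropping missing values is only one option and is reasonable mainly when gaps are few and unrelated to concentration.

```python
import random
from statistics import mean, median, quantiles


def drop_missing(values):
    """Remove missing readings (None) and report how many were dropped."""
    kept = [v for v in values if v is not None]
    return kept, len(values) - len(kept)


def describe(values):
    """Return count, mean, median, and 10th/90th percentiles."""
    deciles = quantiles(values, n=10)  # 9 cut points: 10th, 20th, ..., 90th
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "p10": deciles[0],
        "p90": deciles[-1],
    }


def permutation_test(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation test for a difference in means; returns a p-value."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical readings (ug/m3) with gaps, invented for illustration only.
urban = [14.2, 15.1, None, 13.8, 16.0, 15.5, 14.9, None, 15.8, 14.4]
rural = [9.1, 8.7, 9.9, None, 8.2, 9.4, 10.1, 8.8, 9.0, 9.6]

urban_clean, urban_dropped = drop_missing(urban)
rural_clean, rural_dropped = drop_missing(rural)
print("Urban:", describe(urban_clean), "dropped:", urban_dropped)
print("Rural:", describe(rural_clean), "dropped:", rural_dropped)

p_value = permutation_test(urban_clean, rural_clean)
print("Permutation test p-value:", round(p_value, 4))
```

Reporting how many values were dropped, as `drop_missing` does, is part of handling missing data appropriately: it lets a reader judge whether the gaps could bias the result.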

Visualization Requirements

  • At least 4 distinct visualizations
  • Different chart types (not 4 line plots)
  • Professional formatting and labeling

Deliverables

Technical Report (50%)

  1. Introduction and research question
  2. Data description and sources
  3. Methods (analysis approach)
  4. Results with visualizations
  5. Discussion and conclusions
  6. Limitations and future work
  7. References

Presentation (30%)

  • 10-minute oral presentation
  • Slides with key visualizations
  • Clear explanation for a general audience
  • Response to questions

Reproducibility Package (20%)

  • Data files (or access instructions)
  • Analysis code/spreadsheet
  • Documentation for replication
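
One optional way to document the data files is a manifest that lists each file with its size and checksum, so that someone replicating the work can confirm they have the same inputs. The sketch below is an illustration only. The function `build_manifest` and the sample file name are hypothetical, and the demo writes a tiny placeholder file to a temporary folder instead of using real data.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def build_manifest(data_dir):
    """Map each file under data_dir to its size in bytes and SHA-256 checksum."""
    root = Path(data_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = {
                "bytes": path.stat().st_size,
                "sha256": digest,
            }
    return manifest


# Demo on a temporary folder holding one placeholder file (not real data).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "pm25_sample.csv").write_text("date,pm25\n2020-01-01,8.0\n")
    manifest = build_manifest(tmp)
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON alongside the code gives reviewers a quick check that their downloaded files match the ones analyzed.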

Assessment Rubric

| Criterion | Excellent (4) | Proficient (3) | Developing (2) | Beginning (1) |
|---|---|---|---|---|
| Research Question | Clear, focused, answerable with data | Good question, minor issues | Vague or too broad | No clear question |
| Data Quality | Appropriate data, documented, quality-checked | Good data with documentation | Limited documentation | Inappropriate data |
| Analysis | Appropriate methods, correctly applied | Good methods, minor issues | Some methodological problems | Inappropriate methods |
| Visualization | Clear, effective, professional | Good visualizations | Adequate but issues | Poor or misleading |
| Conclusions | Well-supported, acknowledges limitations | Generally supported | Overreach from data | Unsupported claims |
| Communication | Clear, well-organized, professional | Clear presentation | Unclear in places | Difficult to follow |

Timeline

| Week | Milestone | Deliverable |
|---|---|---|
| 1 | Topic selection and data acquisition | Proposal (1 page) |
| 2 | Data cleaning and exploration | Initial EDA summary |
| 3 | Analysis and visualization | Draft figures |
| 4 | Report writing and presentation prep | Final report and slides |

Unit Summary

This unit introduced data science methods for air quality analysis, from diverse data sources and statistical techniques to machine learning and visualization. The capstone project challenges you to synthesize these skills by conducting original research. The ability to work with real data, apply appropriate methods, and communicate findings clearly is valuable far beyond air quality: these are transferable skills for any data-rich field.
