Data Analysis Project
Project Overview
Conduct an original data analysis investigation using real air quality data, applying statistical and visualization methods to answer a research question of your choosing.
Project Options
Option A: Trend Analysis
Analyze 10+ years of data to detect trends.
- Is air quality improving or worsening?
- How do trends vary by pollutant?
- What factors explain observed trends?
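One possible starting point for the trend question is a robust slope estimate on annual means. The sketch below uses the Theil-Sen estimator (the median of all pairwise slopes), which is less sensitive to a single unusual year than an ordinary least-squares fit. The function name `theil_sen_slope` and the data values are hypothetical and invented for illustration; they are not real measurements, and other trend methods may suit your data better.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(years, values):
    """Return the median of the slopes between all pairs of points."""
    slopes = [
        (v2 - v1) / (y2 - y1)
        for (y1, v1), (y2, v2) in combinations(zip(years, values), 2)
        if y2 != y1
    ]
    if not slopes:
        raise ValueError("need at least two observations from distinct years")
    return median(slopes)


# Hypothetical annual mean PM2.5 values (ug/m3), invented for illustration.
# The 2015 value is deliberately elevated to mimic one unusual year.
years = list(range(2010, 2020))
annual_mean = [12.0, 11.5, 11.0, 10.5, 10.0, 14.0, 9.0, 8.5, 8.0, 7.5]

slope = theil_sen_slope(years, annual_mean)
print(f"Estimated trend: {slope:.2f} ug/m3 per year")
```

A negative slope suggests declining concentrations. A slope alone does not establish significance, so a full analysis would pair it with a formal trend test.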
Option B: Spatial Analysis
Compare air quality across locations.
- Urban vs. suburban vs. rural differences
- Environmental justice analysis
- Near-road vs. background sites
Option C: Event Analysis
Study air quality during specific events.
- Wildfire smoke episodes
- COVID-19 lockdown effects
- Holiday fireworks impacts
Option D: Predictive Modeling
Build and evaluate forecast models.
- Next-day concentration prediction
- AQI category classification
- Model comparison study
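For a model comparison study, it can help to score a simple baseline first so that more complex models have something to beat. The sketch below assumes a persistence baseline (tomorrow's value is predicted to equal today's) scored with mean absolute error. The function names and the daily values are hypothetical, chosen only to illustrate the idea.

```python
def persistence_forecast(series):
    """Predict each day's value as the previous day's observed value.

    Returns forecasts for days 2..n, so the result has len(series) - 1 items.
    """
    return series[:-1]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily PM2.5 concentrations (ug/m3), invented for illustration.
daily_pm25 = [8.0, 10.0, 9.0, 15.0, 14.0, 11.0]

predicted = persistence_forecast(daily_pm25)
actual = daily_pm25[1:]  # observations aligned with the forecasts
baseline_mae = mean_absolute_error(actual, predicted)
print(f"Persistence baseline MAE: {baseline_mae:.2f} ug/m3")
```

Any candidate forecast model can then be scored with the same `mean_absolute_error` function on the same days, which keeps the comparison fair.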
Project Requirements
Minimum Data Requirements
- At least 1 year of data (or equivalent for spatial analysis); Option A requires 10 or more years
- Data from at least 2 sources (e.g., AQS + meteorology, multiple sites)
- At least 1000 data points total
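Combining two sources usually means aligning them on a shared key such as the date. The sketch below assumes each source has already been reduced to a dictionary keyed by ISO date string; the variable names and values are hypothetical, and real AQS or meteorological files will need parsing first.

```python
def join_on_date(left, right):
    """Inner join: keep only the dates that appear in both sources."""
    shared_dates = sorted(left.keys() & right.keys())
    return [(date, left[date], right[date]) for date in shared_dates]


# Hypothetical daily records, invented for illustration.
pm25_by_date = {"2023-07-01": 12.1, "2023-07-02": 35.4, "2023-07-03": 18.0}
wind_by_date = {"2023-07-01": 3.2, "2023-07-03": 1.1, "2023-07-04": 4.0}

merged = join_on_date(pm25_by_date, wind_by_date)
dropped = len(pm25_by_date) - len(merged)
print(f"{len(merged)} matched days, {dropped} PM2.5 days without wind data")
```

Reporting how many records were lost in the join is worth including in the data description, since unmatched dates are a form of missing data.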
Analysis Requirements
- Descriptive statistics (mean, median, percentiles, etc.)
- At least one inferential statistical test (regression, t-test, etc.)
- Appropriate handling of missing data
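The three analysis requirements can be sketched together using only the Python standard library. This example assumes missing readings appear as `None` or NaN and simply drops them, which is only one of several defensible choices (imputation may be more appropriate for your data). It then computes summary statistics and Welch's t statistic for two sites. All function names and data values are hypothetical.

```python
import math
from statistics import mean, median, quantiles, variance


def drop_missing(values):
    """Remove None and NaN entries (one simple way to handle missing data)."""
    return [v for v in values if v is not None and not math.isnan(v)]


def describe(values):
    """Basic descriptive statistics, including quartiles."""
    q = quantiles(values, n=4)  # three cut points: 25th, 50th, 75th percentile
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "p25": q[0],
        "p75": q[2],
    }


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical daily PM2.5 readings (ug/m3) with gaps, invented for illustration.
urban_raw = [14.0, None, 16.0, 15.0, 17.0, 13.0]
rural_raw = [9.0, 8.0, float("nan"), 10.0, 7.0, 11.0]

urban = drop_missing(urban_raw)
rural = drop_missing(rural_raw)

print(describe(urban))
print(describe(rural))
t_stat, dof = welch_t(urban, rural)
print(f"Welch t = {t_stat:.2f}, df = {dof:.1f}")
```

Converting the t statistic to a p-value requires the t distribution, which a statistics library or a spreadsheet function can supply. Whichever approach you use, state in your report how many values were missing and how they were handled.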
Visualization Requirements
- At least 4 distinct visualizations
- A mix of chart types (for example, not four line plots)
- Professional formatting and labeling
Deliverables
Technical Report (50%)
- Introduction and research question
- Data description and sources
- Methods (analysis approach)
- Results with visualizations
- Discussion and conclusions
- Limitations and future work
- References
Presentation (30%)
- 10-minute oral presentation
- Slides with key visualizations
- Clear explanation for general audience
- Response to questions
Reproducibility Package (20%)
- Data files (or access instructions)
- Analysis code/spreadsheet
- Documentation for replication
Assessment Rubric
| Criterion | Excellent (4) | Proficient (3) | Developing (2) | Beginning (1) |
|---|---|---|---|---|
| Research Question | Clear, focused, answerable with data | Good question, minor issues | Vague or too broad | No clear question |
| Data Quality | Appropriate data, documented, quality-checked | Good data with documentation | Limited documentation | Inappropriate data |
| Analysis | Appropriate methods, correctly applied | Good methods, minor issues | Some methodological problems | Inappropriate methods |
| Visualization | Clear, effective, professional | Good visualizations | Adequate, with some issues | Poor or misleading |
| Conclusions | Well-supported, acknowledges limitations | Generally supported | Overreach from data | Unsupported claims |
| Communication | Clear, well-organized, professional | Clear presentation | Unclear in places | Difficult to follow |
Timeline
| Week | Milestone | Deliverable |
|---|---|---|
| 1 | Topic selection and data acquisition | Proposal (1 page) |
| 2 | Data cleaning and exploration | Initial EDA summary |
| 3 | Analysis and visualization | Draft figures |
| 4 | Report writing and presentation prep | Final report and slides |
Unit Summary
This unit has introduced data science methods for air quality analysis, from diverse data sources and statistical techniques to machine learning and visualization. The capstone project challenges you to synthesize these skills by conducting original research. The ability to work with real data, apply appropriate methods, and communicate findings clearly is valuable far beyond air quality; these skills transfer to any data-rich field.