Air Quality ETL Pipeline Example

Description

This project is an example of how to use XetHub to store and run an ETL (Extract, Transform, Load) pipeline. The pipeline processes air quality measurements retrieved through the OpenAQ API and displays the data interactively.

Data Sources

OpenAQ - an API designed for aggregating and sharing open air quality data from around the world.

The current data set uses a JSON aggregation of the data through 2022.

Parameters

  • pm1 - PM1 ➡️ Mass concentration of particulate matter less than 1 micrometer in diameter, µg/m³
  • pm10 - PM10 ➡️ Mass concentration of particulate matter less than 10 micrometers in diameter, µg/m³
  • pm25 - PM2.5 ➡️ Mass concentration of particulate matter less than 2.5 micrometers in diameter, µg/m³
  • um003 - PM0.3 ➡️ count, particles/cm³
  • um005 - PM0.5 ➡️ count, particles/cm³
  • um010 - PM1 ➡️ count, particles/cm³
  • um025 - PM2.5 ➡️ count, particles/cm³
  • um050 - PM5.0 ➡️ count, particles/cm³
  • um100 - PM10 ➡️ count, particles/cm³
  • pressure ➡️ Atmospheric or barometric pressure, hPa
  • temperature ➡️ °C
  • humidity ➡️ %
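
For code that labels or validates readings, the parameter list above can be kept as a small lookup table. The following is only a sketch: the names `PARAMETER_UNITS` and `format_measurement` are hypothetical and do not come from this repository.

```python
# Units for each parameter, copied from the Parameters list above.
# The dictionary name and helper below are illustrative assumptions.
PARAMETER_UNITS = {
    "pm1": "µg/m³",
    "pm10": "µg/m³",
    "pm25": "µg/m³",
    "um003": "particles/cm³",
    "um005": "particles/cm³",
    "um010": "particles/cm³",
    "um025": "particles/cm³",
    "um050": "particles/cm³",
    "um100": "particles/cm³",
    "pressure": "hPa",
    "temperature": "°C",
    "humidity": "%",
}


def format_measurement(parameter: str, value: float) -> str:
    """Render a reading together with its unit, e.g. for a chart label."""
    unit = PARAMETER_UNITS[parameter]  # raises KeyError for unknown parameters
    return f"{parameter}: {value} {unit}"
```

Looking up an unknown parameter raises `KeyError`, which may be a convenient way to catch misspelled parameter names early.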

Pipeline

  • Individual units upload the current set of measurements to data/raw_upload/<country>/<location>/. See data/scripts/upload_new_data.sh for an example of doing this.
  • XetHub Actions: the ETL process (src/etl/aggregate.py) runs every day. It reads all the raw uploads and writes an aggregated CSV file containing all previous measurements for each site.
  • Capsules: a capsule runs a Streamlit app (src/capsule/app.py) that provides interactive data exploration for the repo.
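
The aggregation step described above could look roughly like the sketch below. It is not the contents of src/etl/aggregate.py. It assumes that each raw upload is a JSON file holding a list of flat measurement records, and the function names and output layout (one CSV per location) are hypothetical.

```python
import csv
import json
from pathlib import Path


def aggregate_site(site_dir: Path, out_csv: Path) -> int:
    """Merge every raw JSON upload for one site into a single CSV file.

    Assumes each upload is a JSON list of flat records (hypothetical layout).
    Returns the number of rows written.
    """
    rows = []
    for upload in sorted(site_dir.glob("*.json")):
        with upload.open(encoding="utf-8") as fh:
            rows.extend(json.load(fh))

    # Use the union of keys so uploads with differing parameters still fit;
    # DictWriter leaves missing values as empty cells.
    fieldnames = sorted({key for row in rows for key in row})
    out_csv.parent.mkdir(parents=True, exist_ok=True)
    with out_csv.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def aggregate_all(raw_root: Path, out_root: Path) -> dict:
    """Walk <raw_root>/<country>/<location>/ and aggregate each site."""
    counts = {}
    for site_dir in sorted(p for p in raw_root.glob("*/*") if p.is_dir()):
        country, location = site_dir.parts[-2:]
        out_csv = out_root / country / f"{location}.csv"
        counts[f"{country}/{location}"] = aggregate_site(site_dir, out_csv)
    return counts
```

A call such as `aggregate_all(Path("data/raw_upload"), Path("data/aggregated"))` would then rebuild every per-site file, where `data/aggregated` is a made-up output directory. Rebuilding from all raw uploads on each run keeps the step idempotent, at the cost of rereading old files every day.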