RasterFrames
RasterFrames brings together Earth-observation (EO) data access, cloud computing, and DataFrame-based data science. It provides a DataFrame-centric view over arbitrary raster data in a horizontally scalable compute environment, enabling spatiotemporal queries, map algebra raster operations, and compatibility with the ecosystem of Spark ML algorithms. By using the DataFrame as a unified cognitive and compute model, RasterFrames makes the rapidly growing EO data footprint accessible to general analysts and EO specialists in a form that scales from the laptop to the supercomputer.
Core Features
- Language Support
  - Python
  - Scala
  - SQL
- Readers
  - Catalog-based Spark DataSource for heterogeneous multi-band raster data sets
  - GeoTIFF format via GeoTrellis JVM reader
  - GDAL-supported formats via GeoTrellis GDAL bindings
  - GeoTrellis layer reader
  - GeoJSON format via JTS parser
  - Landsat 8 and MODIS NBAR on AWS PDS catalog readers
- Writers
  - GeoTIFF
  - GeoTrellis layers
  - Parquet compatible
- Spatial Relations
  - Spatial relation query and filtering support via GeoMesa
  - Standard DE-9IM topological relations: Intersects, Contains, Within, Covers, etc.
  - Raster join between DataFrames of arbitrary raster data
  - Spatial join between raster and vector DataFrames
- Operations
  - "Map Algebra"
  - Reprojection
  - Masking
  - Rasterization
  - NoData and cell-type handling
  - Spatiotemporal and metadata filtering
  - Local, zonal, and aggregate statistics
- Interoperability
  - Spark Ecosystem, including Spark ML
  - NumPy tile encoding
  - Pandas conversions
  - GeoTrellis
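To give a rough sense of what the "Map Algebra", NoData handling, and aggregate statistics entries refer to, the sketch below applies a cell-wise normalized difference (the formula behind indices such as NDVI) to two tiny tiles in plain Python. It deliberately does not use the RasterFrames API: the `Tile` alias, the `NODATA` marker, the helper names, and the sample values are all assumptions made for illustration only.

```python
from typing import List, Optional

# A "tile" here is just a small 2-D grid of cell values (hypothetical type alias).
Tile = List[List[Optional[float]]]

# Hypothetical NoData marker for this sketch; real rasters encode NoData per cell type.
NODATA = None


def normalized_difference(a: Tile, b: Tile) -> Tile:
    """Local (cell-by-cell) operation: (a - b) / (a + b), propagating NoData."""
    result: Tile = []
    for row_a, row_b in zip(a, b):
        out_row: List[Optional[float]] = []
        for x, y in zip(row_a, row_b):
            # A cell is NoData if either input is NoData or the ratio is undefined.
            if x is NODATA or y is NODATA or (x + y) == 0:
                out_row.append(NODATA)
            else:
                out_row.append((x - y) / (x + y))
        result.append(out_row)
    return result


def tile_mean(tile: Tile) -> Optional[float]:
    """Aggregate statistic over one tile that skips NoData cells."""
    values = [v for row in tile for v in row if v is not NODATA]
    return sum(values) / len(values) if values else None


# Made-up near-infrared and red band values, with one NoData cell.
nir: Tile = [[0.75, 0.5], [0.25, NODATA]]
red: Tile = [[0.25, 0.5], [0.25, 0.125]]

ndvi = normalized_difference(nir, red)
mean_ndvi = tile_mean(ndvi)
print(ndvi, mean_ndvi)
```

In a DataFrame setting the same idea would be applied to tile columns rather than Python lists, so that the per-cell work is distributed across the cluster.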
Implemented Standards
- Geographic JSON (GeoJSON)
- Georeferenced Tagged Image File Format (GeoTIFF)
- Well-Known Binary (WKB)
- Well-Known Text (WKT)
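As a small illustration of how three of these standards relate, the following sketch encodes the same 2-D point as WKT, WKB, and GeoJSON using only the Python standard library. It is not how RasterFrames reads or writes geometries (the feature list above names JTS for GeoJSON parsing); the function names and the sample coordinates are assumptions made for this example, and only the simplest geometry type (Point) is handled.

```python
import json
import struct
from typing import Tuple


def point_to_wkt(x: float, y: float) -> str:
    """Well-Known Text: a human-readable geometry string."""
    return f"POINT ({x} {y})"


def point_to_wkb(x: float, y: float) -> bytes:
    """Well-Known Binary for a 2-D point, little-endian.

    Layout: 1 byte for byte order (1 = little-endian), a uint32 geometry
    type (1 = Point), then two float64 coordinates.
    """
    return struct.pack("<BIdd", 1, 1, x, y)


def point_from_wkb(data: bytes) -> Tuple[float, float]:
    """Decode a 2-D WKB point, honoring the byte-order flag."""
    endian = "<" if data[0] == 1 else ">"
    geom_type, x, y = struct.unpack(endian + "Idd", data[1:21])
    if geom_type != 1:
        raise ValueError("this sketch only handles Point geometries")
    return x, y


def point_to_geojson(x: float, y: float) -> str:
    """GeoJSON geometry object serialized as a JSON string."""
    return json.dumps({"type": "Point", "coordinates": [x, y]})


# Sample coordinates chosen arbitrarily for the example.
wkt = point_to_wkt(1.0, 2.0)
wkb = point_to_wkb(1.0, 2.0)
geojson = point_to_geojson(1.0, 2.0)

print(wkt)
print(wkb.hex())
print(geojson)
```

WKB is the compact form suited to storing geometries in binary columns, while WKT and GeoJSON are easier to read and to exchange with other tools.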