Conversion between GeoJSON and GeoParquet Formats in Python and JavaScript

GeoJSON

GeoJSON is a format for encoding a variety of geographic data structures. A GeoJSON object may represent a geometry, a feature, or a collection of features. GeoJSON supports the following geometry types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection. Features in GeoJSON contain a geometry object and additional properties, and a feature collection represents a list of features.
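
To make the structure described above concrete, here is a minimal sketch that builds a FeatureCollection using only the Python standard library. The helper name make_feature and the example property values are hypothetical and chosen purely for illustration.

```python
import json


def make_feature(geometry_type, coordinates, properties):
    """Build a GeoJSON Feature dict from a geometry type, coordinates, and properties."""
    return {
        "type": "Feature",
        "geometry": {"type": geometry_type, "coordinates": coordinates},
        "properties": properties,
    }


# A Point is a single [x, y] position.
point = make_feature("Point", [10.0, 20.0], {"name": "example point"})

# A Polygon is a list of linear rings; each ring repeats its first position at the end.
square = make_feature(
    "Polygon",
    [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]],
    {"name": "example square"},
)

# A FeatureCollection is a list of features under the "features" key.
feature_collection = {"type": "FeatureCollection", "features": [point, square]}
geojson_text = json.dumps(feature_collection, indent=2)
```

The resulting geojson_text string can be written to a .geojson file and opened by any GeoJSON-aware tool.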

GeoParquet

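GeoParquet is a specification for storing geospatial vector data in Apache Parquet files. A GeoParquet file is an ordinary Parquet file with one or more geometry columns, typically encoded as Well-Known Binary (WKB), plus file-level metadata that describes those columns (for example, their encoding and coordinate reference system). Any Parquet-compatible system can read the file, and GeoParquet-aware tools can decode the geometry column into geometry objects such as GeoJSON geometries. Because Parquet is a compressed, columnar format, GeoParquet is well suited to storing and querying large collections of features.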
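
In the GeoParquet specification, geometry values are typically stored as Well-Known Binary (WKB) rather than as text. As a rough illustration of what such a binary value looks like, the sketch below encodes and decodes a single 2D point with the standard library. The function names are hypothetical, and only little-endian 2D points are handled; real files should be read with a library such as GeoPandas or Shapely.

```python
import struct


def encode_wkb_point(x, y):
    """Encode a 2D point as little-endian WKB: byte order, geometry type, x, y."""
    # First 1 = little-endian byte order flag, second 1 = WKB type code for Point.
    return struct.pack("<BIdd", 1, 1, x, y)


def decode_wkb_point(data):
    """Decode a little-endian 2D WKB point into a GeoJSON-style geometry dict."""
    byte_order, geometry_type, x, y = struct.unpack("<BIdd", data)
    if byte_order != 1 or geometry_type != 1:
        raise ValueError("only little-endian 2D WKB points are handled in this sketch")
    return {"type": "Point", "coordinates": [x, y]}


wkb = encode_wkb_point(10.0, 20.0)
geometry = decode_wkb_point(wkb)
```

A 2D WKB point occupies 21 bytes (1 byte order flag, 4 type bytes, and two 8-byte doubles), which is considerably more compact than the equivalent GeoJSON text.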

Usage of GeoJSON and GeoParquet in high-resolution spatial transcriptomics

High-resolution spatial transcriptomics (ST) is a technique that allows researchers to study gene expression in individual cells within a tissue sample. ST data is typically stored as a collection of spatially resolved gene expression profiles, where each profile is associated with a spatial location in the tissue. GeoJSON and GeoParquet formats are commonly used to store and query ST data, as they provide a flexible and efficient way to represent spatial information.

For example, the 10x Genomics Xenium Onboard Analysis (XOA) software performs cell segmentation and stores the cell boundaries in cell_boundaries.parquet. This is a plain Parquet file that does not contain geometry objects. The following code converts it to a GeoParquet file containing Polygons:


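import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon

cell_boundaries = pd.read_parquet(
    "/Volumes/lzxc/10x_Xenium_5k/Xenium_V1_Human_Colon_Cancer_P1_CRC_Add_on_FFPE/"
    "cell_boundaries.parquet"
)
# Assumption: the Xenium file stores one vertex per row in the columns vertex_x and
# vertex_y; build a "coords" column of (x, y) tuples if it is not already present.
if "coords" not in cell_boundaries.columns:
    cell_boundaries["coords"] = list(
        zip(cell_boundaries["vertex_x"], cell_boundaries["vertex_y"])
    )
# Collect the vertices of each cell into a list, one row per cell.
cell_boundaries_agg = cell_boundaries.groupby("cell_id").agg({"coords": list})
# Build one Polygon per cell from its ordered vertices.
polys = [Polygon(coords) for coords in cell_boundaries_agg["coords"]]
# Keep geometry as the first column and id as the second.
gdf = gpd.GeoDataFrame(geometry=polys)
gdf["id"] = list(cell_boundaries_agg.index)
gdf.to_parquet("cell_boundaries.geoparquet")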

The resulting GeoParquet file can then be used to visualize the cell boundaries.

Conversion in Python

Several Python libraries can convert between GeoJSON and GeoParquet. One popular choice is GeoPandas, which provides a high-level interface for working with geospatial data. The following snippet reads a GeoJSON file and writes it as GeoParquet using GeoPandas:


import geopandas as gpd
gdf = gpd.read_file('data.geojson')
gdf.to_parquet('data.geoparquet')
                        

Conversion in JavaScript

JavaScript libraries can also convert between GeoJSON and GeoParquet. Turf.js is a popular general-purpose geospatial library, but it does not read Parquet files. The following snippet instead decodes the Parquet file into an Arrow table and uses @loaders.gl/wkt to parse the WKB geometry column, converting a GeoParquet file to GeoJSON:


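// Assumed imports (not shown in the original snippet); the exact entry points depend on
// the package versions and bundler. readParquet is assumed to come from a parquet-wasm
// release whose readParquet returns Arrow IPC bytes.
import { tableFromIPC } from "apache-arrow";
import { readParquet } from "parquet-wasm";
import { parseSync } from "@loaders.gl/core";
import { WKBLoader } from "@loaders.gl/wkt";

// Fetch a GeoParquet file and convert it to a GeoJSON FeatureCollection,
// decoding the WKB geometry column with @loaders.gl/wkt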
async function fetchParquetFile(parquetUrl) {
    // log the starting time
    console.time("fetching parquet file")
    const response = await fetch(`https://example.com/${parquetUrl}`);
    const parquetBuffer = await response.arrayBuffer();
    // log the total time used for fetching the parquet file
    console.timeEnd("fetching parquet file")
    // log the starting time for reading and decoding the parquet file
    console.time("reading and decoding parquet file")
    const parquetBytes = new Uint8Array(parquetBuffer);
    const decodedBytes = readParquet(parquetBytes);
    const arrowTable = tableFromIPC(decodedBytes);

    const geojsonFeatures = arrowTable.getChildAt(0).toArray()
        .map((row) => {
            const data = parseSync(row, WKBLoader);
            return data
        })
    const feature_id = arrowTable.getChildAt(1).toArray()
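    // The optional third column holds cluster labels. getChildAt returns null when the
    // column does not exist, so compare against null (this also covers undefined).
    const clusterColumn = arrowTable.getChildAt(2);
    const feature_cluster = clusterColumn != null ? clusterColumn.toArray() : null;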
    const geojson = {
        "type": "FeatureCollection",
        "features": []
    }
    for (let i = 0; i < geojsonFeatures.length; i++) {
        const geometry = geojsonFeatures[i];
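        // WKBLoader may return a binary geometry with a flat positions array instead of
        // GeoJSON coordinates; rebuild a single polygon ring, assuming 2D (x, y) positions
        if (geometry.coordinates === undefined) {
            const points = [];
            for (let j = 0; j < geometry.positions.value.length; j += 2) {
                points.push([geometry.positions.value[j], geometry.positions.value[j + 1]]);
            }
            geometry.coordinates = [points];
        }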
        geojson.features.push({
            "type": "Feature",
            "geometry": geometry,
            "properties": {
                "id": feature_id[i],
                "cluster": feature_cluster !== null ? feature_cluster[i] : null
            }
        })
    }
    // log the total time used for reading and decoding the parquet file
    console.timeEnd("reading and decoding parquet file")
    return geojson
}
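
The step in the function above that turns a flat array of positions into nested GeoJSON coordinates can also be sketched in Python. This is an illustrative equivalent, not part of the JavaScript code; the names positions_to_ring and polygon_feature are hypothetical, and the sketch assumes 2D positions and a single outer ring per polygon.

```python
def positions_to_ring(flat_positions, size=2):
    """Group a flat [x0, y0, x1, y1, ...] sequence into [[x0, y0], [x1, y1], ...]."""
    if len(flat_positions) % size != 0:
        raise ValueError("flat_positions length must be a multiple of size")
    return [
        [flat_positions[i], flat_positions[i + 1]]
        for i in range(0, len(flat_positions), size)
    ]


def polygon_feature(flat_positions, feature_id, cluster=None):
    """Wrap a single-ring polygon and its attributes in a GeoJSON Feature dict."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Polygon",
            "coordinates": [positions_to_ring(flat_positions)],
        },
        "properties": {"id": feature_id, "cluster": cluster},
    }


# A triangle given as flat x, y pairs, with the first vertex repeated to close the ring.
feature = polygon_feature([0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0], "cell-1")
```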