API reference

Core

pyogrio.detect_write_driver(path)

Attempt to infer the driver for a path by extension or prefix. Only drivers that support write capabilities will be detected.

If the path cannot be resolved to a single driver, a ValueError will be raised.

Parameters
pathstr
Returns
str

name of the driver, if detected

pyogrio.get_gdal_config_option(name)

Get the value for a GDAL configuration option.

Parameters
namestr

name of the option to retrieve

Returns
value of the option or None if not set

'ON' / 'OFF' are normalized to True / False.

pyogrio.list_drivers(read=False, write=False)

List drivers available in GDAL.

Parameters
read: bool, optional (default: False)

If True, will only return drivers that are known to support read capabilities.

write: bool, optional (default: False)

If True, will only return drivers that are known to support write capabilities.

Returns
dict

Mapping of driver name to file mode capabilities: "r": read, "w": write. Drivers that are available but with unknown support are marked with "?".
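
As a rough illustration of working with the returned mapping, the sketch below filters drivers by capability. The helper name drivers_with and the sample dictionary are hypothetical; the sample only mimics the documented shape (it assumes combined capabilities appear as a string such as "rw") and is not real output from list_drivers.

```python
def drivers_with(drivers, capability):
    """Return the sorted names of drivers whose mode string contains `capability`."""
    return sorted(name for name, modes in drivers.items() if capability in modes)


# Illustrative sample shaped like the documented return value (not real output).
sample = {
    "GPKG": "rw",
    "GeoJSON": "rw",
    "OpenFileGDB": "r",
    "HypotheticalDriver": "?",  # available, but read/write support unknown
}

writable = drivers_with(sample, "w")
unknown = drivers_with(sample, "?")
```

In practice the dictionary would come from pyogrio.list_drivers(); passing write=True to that function performs the same kind of filtering on the GDAL side.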

pyogrio.list_layers(path_or_buffer, /)

List layers available in an OGR data source.

NOTE: includes both spatial and nonspatial layers.

Parameters
path_or_bufferstr or pathlib.Path
Returns
ndarray shape (n, 2)

Array of pairs of [<layer name>, <layer geometry type>]. Note: geometry type is None for nonspatial layers.
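
A minimal sketch of separating spatial from nonspatial layers, based on the pairs described above. The helper split_layers and the sample layer names are hypothetical; a plain list of pairs stands in for the array that list_layers returns.

```python
def split_layers(layers):
    """Split [name, geometry type] pairs into spatial and nonspatial layer names."""
    spatial = [name for name, geometry_type in layers if geometry_type is not None]
    nonspatial = [name for name, geometry_type in layers if geometry_type is None]
    return spatial, nonspatial


# Illustrative pairs shaped like the documented return value (not real output).
sample_layers = [
    ["roads", "LineString"],
    ["parcels", "MultiPolygon"],
    ["owners", None],  # nonspatial layer: geometry type is None
]

spatial, nonspatial = split_layers(sample_layers)
```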

pyogrio.read_bounds(path_or_buffer, /, layer=None, skip_features=0, max_features=None, where=None, bbox=None, mask=None)

Read bounds of each feature.

This can be used to assist with spatial indexing and partitioning, in order to avoid reading all features into memory. It is roughly 2-3x faster than reading the full geometry and attributes of a dataset.

Parameters
path_or_bufferpathlib.Path or str

data source path

layerint or str, optional (default: first layer)

If an integer is provided, it corresponds to the index of the layer within the data source. If a string is provided, it must match the name of the layer in the data source. Defaults to the first layer in the data source.

skip_featuresint, optional (default: 0)

Number of features to skip from the beginning of the file before returning features. Must be less than the total number of features in the file.

max_featuresint, optional (default: None)

Number of features to read from the file. Must be less than the total number of features in the file minus skip_features (if used).

wherestr, optional (default: None)

Where clause to filter features in layer by attribute values. Uses a restricted form of SQL WHERE clause, defined here: http://ogdi.sourceforge.net/prop/6.2.CapabilitiesMetadata.html. Examples: "ISO_A3 = 'CAN'", "POP_EST > 10000000 AND POP_EST < 100000000".

bboxtuple of (xmin, ymin, xmax, ymax), optional (default: None)

If present, will be used to filter records whose geometry intersects this box. This must be in the same CRS as the dataset. If GEOS is present and used by GDAL, only geometries that intersect this bbox will be returned; if GEOS is not available or not used by GDAL, all geometries with bounding boxes that intersect this bbox will be returned.

maskShapely geometry, optional (default: None)

If present, will be used to filter records whose geometry intersects this geometry. This must be in the same CRS as the dataset. If GEOS is present and used by GDAL, only geometries that intersect this geometry will be returned; if GEOS is not available or not used by GDAL, all geometries with bounding boxes that intersect the bounding box of this geometry will be returned. Requires Shapely >= 2.0. Cannot be combined with bbox keyword.

Returns
tuple of (fids, bounds)

fids are global IDs read from the FID field of the dataset. bounds are an ndarray of shape (4, n) containing xmin, ymin, xmax, ymax.
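
To illustrate how the returned bounds can support spatial partitioning, here is a small sketch that selects the FIDs whose bounding boxes intersect a query window. The helper fids_in_window and the sample values are hypothetical; nested lists stand in for the (4, n) array, which unpacks into its four rows the same way.

```python
def fids_in_window(fids, bounds, window):
    """Return FIDs whose bounding box intersects window = (xmin, ymin, xmax, ymax)."""
    wxmin, wymin, wxmax, wymax = window
    xmins, ymins, xmaxs, ymaxs = bounds  # one row per coordinate, n values each
    return [
        fid
        for fid, xmin, ymin, xmax, ymax in zip(fids, xmins, ymins, xmaxs, ymaxs)
        if xmin <= wxmax and xmax >= wxmin and ymin <= wymax and ymax >= wymin
    ]


# Illustrative values shaped like the documented return value (not real output).
sample_fids = [0, 1, 2]
sample_bounds = [
    [0, 5, 10],  # xmin
    [0, 5, 10],  # ymin
    [1, 6, 11],  # xmax
    [1, 6, 11],  # ymax
]

selected = fids_in_window(sample_fids, sample_bounds, (4, 4, 7, 7))
```

The selected FIDs could then be passed to the fids keyword of read_dataframe so that only those features are read in full.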

pyogrio.read_info(path_or_buffer, /, layer=None, encoding=None, force_feature_count=False, force_total_bounds=False, **kwargs)

Read information about an OGR data source.

crs, geometry and total_bounds will be None and features will be 0 for a nonspatial layer.

features will be -1 if this is an expensive operation for this driver. You can force it to be calculated using the force_feature_count parameter.

total_bounds is the 2-dimensional extent of all features within the dataset: (xmin, ymin, xmax, ymax). It will be None if this is an expensive operation for this driver or if the data source is nonspatial. You can force it to be calculated using the force_total_bounds parameter.

Parameters
path_or_bufferstr or pathlib.Path
layerint or str, optional (default: first layer)

Name or index of layer in data source. Reads the first layer by default.

encodingstr, optional (default: None)

If present, will be used as the encoding for reading string values from the data source, unless encoding can be inferred directly from the data source.

force_feature_countbool, optional (default: False)

True if the feature count should be computed even if it is expensive.

force_total_boundsbool, optional (default: False)

True if the total bounds should be computed even if it is expensive.

**kwargs

Additional driver-specific dataset open options passed to OGR. Invalid options will trigger a warning.

Returns
dict

A dictionary with the following keys:

{
    "crs": "<crs>",
    "fields": <ndarray of field names>,
    "dtypes": <ndarray of field dtypes>,
    "encoding": "<encoding>",
    "geometry_type": "<geometry type>",
    "features": <feature count or -1>,
    "total_bounds": <tuple with total bounds or None>,
    "driver": "<driver>",
    "capabilities": <dict of driver capabilities>,
    "dataset_metadata": <dict of dataset metadata or None>,
    "layer_metadata": <dict of layer metadata or None>
}
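
The sketch below shows one way to decide which force_* keywords a second read_info call would need, based on the sentinel values described above. The helper name needs_forced_read and the sample dictionaries are hypothetical and only mimic the documented keys.

```python
def needs_forced_read(info):
    """Return the force_* keyword arguments needed to fill in skipped values."""
    kwargs = {}
    if info["features"] == -1:
        # Counting features was skipped because it is expensive for this driver.
        kwargs["force_feature_count"] = True
    if info["total_bounds"] is None and info["geometry_type"] is not None:
        # Bounds were skipped for a spatial layer; nonspatial layers never have bounds.
        kwargs["force_total_bounds"] = True
    return kwargs


# Illustrative dictionaries with a subset of the documented keys (not real output).
expensive_layer = {"features": -1, "total_bounds": None, "geometry_type": "Point"}
nonspatial_layer = {"features": 0, "total_bounds": None, "geometry_type": None}

forced = needs_forced_read(expensive_layer)
not_forced = needs_forced_read(nonspatial_layer)
```

A follow-up call could then look like read_info(path, **needs_forced_read(info)), which only pays the extra cost when a value was actually skipped.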

pyogrio.set_gdal_config_options(options)

Set GDAL configuration options.

Options are listed here: https://trac.osgeo.org/gdal/wiki/ConfigOptions

No error is raised if invalid option names are provided.

These options are applied for an entire session rather than for individual functions.

Parameters
optionsdict

If present, provides a mapping of option name / value pairs for GDAL configuration options. True / False are normalized to 'ON' / 'OFF'. A value of None for a config option can be used to clear out a previously set value.
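
The normalization described for set_gdal_config_options and get_gdal_config_option can be modeled roughly as follows. This is a simplified sketch of the documented behavior only, not pyogrio's implementation; the helper names are hypothetical, and passing other values through unchanged is an assumption.

```python
def to_gdal_value(value):
    """Model how a Python value is normalized before being set as a GDAL option."""
    if value is None:
        return None  # None clears a previously set option
    if isinstance(value, bool):
        return "ON" if value else "OFF"
    return value  # assumption: other values are passed through unchanged


def from_gdal_value(value):
    """Model how a stored GDAL option value is normalized when it is read back."""
    if value == "ON":
        return True
    if value == "OFF":
        return False
    return value


round_trip = from_gdal_value(to_gdal_value(True))
```

Under this model, setting an option to True and reading it back yields True again, even though GDAL stores the string 'ON'.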

GeoPandas integration

pyogrio.read_dataframe(path_or_buffer, /, layer=None, encoding=None, columns=None, read_geometry=True, force_2d=False, skip_features=0, max_features=None, where=None, bbox=None, mask=None, fids=None, sql=None, sql_dialect=None, fid_as_index=False, use_arrow=None, arrow_to_pandas_kwargs=None, **kwargs)

Read from an OGR data source to a GeoPandas GeoDataFrame or Pandas DataFrame. If the data source does not have a geometry column or read_geometry is False, a DataFrame will be returned.

Requires geopandas >= 0.8.

Parameters
path_or_bufferpathlib.Path or str, or bytes buffer

A dataset path or URI, or raw buffer.

layerint or str, optional (default: first layer)

If an integer is provided, it corresponds to the index of the layer within the data source. If a string is provided, it must match the name of the layer in the data source. Defaults to the first layer in the data source.

encodingstr, optional (default: None)

If present, will be used as the encoding for reading string values from the data source, unless encoding can be inferred directly from the data source.

columnslist-like, optional (default: all columns)

List of column names to import from the data source. Column names must exactly match the names in the data source, and will be returned in the order they occur in the data source. To avoid reading any columns, pass an empty list-like.

read_geometrybool, optional (default: True)

If True, will read geometry into a GeoSeries. If False, a Pandas DataFrame will be returned instead.

force_2dbool, optional (default: False)

If the geometry has Z values, setting this to True will cause those to be ignored and 2D geometries to be returned.

skip_featuresint, optional (default: 0)

Number of features to skip from the beginning of the file before returning features. If greater than available number of features, an empty DataFrame will be returned. Using this parameter may incur significant overhead if the driver does not support the capability to randomly seek to a specific feature, because it will need to iterate over all prior features.

max_featuresint, optional (default: None)

Number of features to read from the file.

wherestr, optional (default: None)

Where clause to filter features in layer by attribute values. If the data source natively supports SQL, its specific SQL dialect should be used (e.g. SQLite and GeoPackage: SQLITE, PostgreSQL). If it doesn’t, the OGRSQL WHERE syntax should be used. Note that the SQL dialect cannot be overridden here; that is only possible with the sql parameter. Examples: "ISO_A3 = 'CAN'", "POP_EST > 10000000 AND POP_EST < 100000000".

bboxtuple of (xmin, ymin, xmax, ymax), optional (default: None)

If present, will be used to filter records whose geometry intersects this box. This must be in the same CRS as the dataset. If GEOS is present and used by GDAL, only geometries that intersect this bbox will be returned; if GEOS is not available or not used by GDAL, all geometries with bounding boxes that intersect this bbox will be returned. Cannot be combined with mask keyword.

maskShapely geometry, optional (default: None)

If present, will be used to filter records whose geometry intersects this geometry. This must be in the same CRS as the dataset. If GEOS is present and used by GDAL, only geometries that intersect this geometry will be returned; if GEOS is not available or not used by GDAL, all geometries with bounding boxes that intersect the bounding box of this geometry will be returned. Requires Shapely >= 2.0. Cannot be combined with bbox keyword.

fidsarray-like, optional (default: None)

Array of integer feature id (FID) values to select. Cannot be combined with other keywords to select a subset (skip_features, max_features, where, bbox, mask, or sql). Note that the starting index is driver and file specific (e.g. typically 0 for Shapefile and 1 for GeoPackage, but can still depend on the specific file). The performance of reading a large number of features using FIDs is also driver specific.

sqlstr, optional (default: None)

The SQL statement to execute. Look at the sql_dialect parameter for more information on the syntax to use for the query. When combined with other keywords like columns, skip_features, max_features, where, bbox, or mask, those are applied after the SQL query. Be aware that this can have an impact on performance (e.g. filtering with the bbox or mask keywords may not use spatial indexes). Cannot be combined with the layer or fids keywords.

sql_dialectstr, optional (default: None)

The SQL dialect the SQL statement is written in. Possible values:

  • None: if the data source natively supports SQL, its specific SQL dialect will be used by default (e.g. SQLite and GeoPackage: SQLITE, PostgreSQL). If the data source doesn’t natively support SQL, the OGRSQL dialect is the default.

  • 'OGRSQL': can be used on any data source. Performance can suffer when used on data sources with native support for SQL.

  • 'SQLITE': can be used on any data source. All SpatiaLite functions can be used. Performance can suffer on data sources with native support for SQL, except for GeoPackage and SQLite as this is their native SQL dialect.

fid_as_indexbool, optional (default: False)

If True, will use the FIDs of the features that were read as the index of the GeoDataFrame. May start at 0 or 1 depending on the driver.

use_arrowbool, optional (default: False)

Whether to use Arrow as the transfer mechanism of the read data from GDAL to Python (requires GDAL >= 3.6 and pyarrow to be installed). When enabled, this provides a further speed-up. Defaults to False, but this default can also be globally overridden by setting the PYOGRIO_USE_ARROW=1 environment variable.

arrow_to_pandas_kwargsdict, optional (default: None)

When use_arrow is True, these kwargs will be passed to the to_pandas call for the arrow to pandas conversion.

**kwargs

Additional driver-specific dataset open options passed to OGR. Invalid options will trigger a warning.

Returns
GeoDataFrame or DataFrame (if no geometry is present)
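
Several of the parameters above are documented as mutually exclusive. The sketch below collects those rules into a hypothetical pre-check, check_read_filters, that could run before calling read_dataframe. It only mirrors the combinations listed in this reference and is not pyogrio's own validation code.

```python
def check_read_filters(layer=None, skip_features=0, max_features=None,
                       where=None, bbox=None, mask=None, fids=None, sql=None):
    """Raise ValueError for keyword combinations documented as incompatible."""
    if bbox is not None and mask is not None:
        raise ValueError("bbox cannot be combined with mask")

    if fids is not None:
        # fids excludes every other way of selecting a subset of features.
        other_filters = {
            "skip_features": skip_features or None,  # 0 means "not used"
            "max_features": max_features,
            "where": where,
            "bbox": bbox,
            "mask": mask,
            "sql": sql,
        }
        used = [name for name, value in other_filters.items() if value is not None]
        if used:
            raise ValueError("fids cannot be combined with: " + ", ".join(used))

    if sql is not None and layer is not None:
        raise ValueError("sql cannot be combined with layer")


# A valid combination: attribute filter plus bounding box.
check_read_filters(where="POP_EST > 10000000", bbox=(0, 0, 1, 1))
```

Checking the combination up front makes it possible to report a conflict before any data source is opened.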

pyogrio.write_dataframe(df, path, layer=None, driver=None, encoding=None, geometry_type=None, promote_to_multi=None, nan_as_null=True, append=False, dataset_metadata=None, layer_metadata=None, metadata=None, dataset_options=None, layer_options=None, **kwargs)

Write GeoPandas GeoDataFrame to an OGR file format.

Parameters
dfGeoDataFrame or DataFrame

The data to write. For attribute columns of the “object” dtype, all values will be converted to strings to be written to the output file, except None and np.nan, which will be set to NULL in the output file.

pathstr

path to file

layerstr, optional (default: None)

layer name

driverstring, optional (default: None)

The OGR format driver used to write the vector file. By default write_dataframe attempts to infer driver from path.

encodingstr, optional (default: None)

If present, will be used as the encoding for writing string values to the file.

geometry_typestring, optional (default: None)

By default, the geometry type of the layer will be inferred from the data, after applying the promote_to_multi logic. If the data only contains a single geometry type (after applying the logic of promote_to_multi), this type is used for the layer. If the data (still) contains mixed geometry types, the output layer geometry type will be set to “Unknown”.

This parameter does not modify the geometry, but it will try to force the layer type of the output file to this value. Use this parameter with caution because using a non-default layer geometry type may result in errors when writing the file, may be ignored by the driver, or may result in invalid files. Possible values are: “Unknown”, “Point”, “LineString”, “Polygon”, “MultiPoint”, “MultiLineString”, “MultiPolygon” or “GeometryCollection”.

promote_to_multibool, optional (default: None)

If True, will convert singular geometry types in the data to their corresponding multi geometry type for writing. By default, will convert mixed singular and multi geometry types to multi geometry types for drivers that do not support mixed singular and multi geometry types. If False, geometry types will not be promoted, which may result in errors or invalid files when attempting to write mixed singular and multi geometry types to drivers that do not support such combinations.

nan_as_nullbool, optional (default: True)

For floating point columns (float32 / float64), whether NaN values are written as “null” (missing value). Defaults to True because in pandas NaNs are typically used as missing value. Note that when set to False, behaviour is format specific: some formats don’t support NaNs by default (e.g. GeoJSON will skip this property) or might treat them as null anyway (e.g. GeoPackage).

appendbool, optional (default: False)

If True, and the data source specified by path already exists and the driver supports appending to an existing data source, the data will be appended to the existing records in the data source. NOTE: append support is limited to specific drivers and GDAL versions.

dataset_metadatadict, optional (default: None)

Metadata to be stored at the dataset level in the output file; limited to drivers that support writing metadata, such as GPKG, and silently ignored otherwise. Keys and values must be strings.

layer_metadatadict, optional (default: None)

Metadata to be stored at the layer level in the output file; limited to drivers that support writing metadata, such as GPKG, and silently ignored otherwise. Keys and values must be strings.

metadatadict, optional (default: None)

alias of layer_metadata

dataset_optionsdict, optional

Dataset creation options (format specific) passed to OGR. Specify as a key-value dictionary.

layer_optionsdict, optional

Layer creation options (format specific) passed to OGR. Specify as a key-value dictionary.

**kwargs

Additional driver-specific dataset or layer creation options passed to OGR. pyogrio will attempt to automatically pass those keywords either as dataset or as layer creation option based on the known options for the specific driver. Alternatively, you can use the explicit dataset_options or layer_options keywords to manually do this (for example if an option exists as both dataset and layer option).
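
The interaction between geometry_type inference and promote_to_multi described above can be modeled roughly as follows. This is a simplified sketch of the documented rule for an explicit True or False only; the driver-dependent default (promote_to_multi=None) is not modeled, and the helper name infer_layer_geometry_type is hypothetical rather than part of pyogrio.

```python
# Singular geometry types and their corresponding multi types.
MULTI_TYPES = {
    "Point": "MultiPoint",
    "LineString": "MultiLineString",
    "Polygon": "MultiPolygon",
}


def infer_layer_geometry_type(geometry_types, promote_to_multi=False):
    """Model the layer geometry type chosen for the geometry types found in the data."""
    types = set(geometry_types)
    if promote_to_multi:
        # Singular types are promoted; multi types are left as they are.
        types = {MULTI_TYPES.get(name, name) for name in types}
    # A single remaining type is used directly; anything else is "Unknown".
    return types.pop() if len(types) == 1 else "Unknown"


mixed = ["Polygon", "MultiPolygon"]
promoted = infer_layer_geometry_type(mixed, promote_to_multi=True)
not_promoted = infer_layer_geometry_type(mixed, promote_to_multi=False)
```

Under this model, mixed Polygon and MultiPolygon data yields a MultiPolygon layer when promotion is enabled and an Unknown layer type otherwise.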

Reading as Arrow data

pyogrio.raw.read_arrow(path_or_buffer, /, layer=None, encoding=None, columns=None, read_geometry=True, force_2d=False, skip_features=0, max_features=None, where=None, bbox=None, mask=None, fids=None, sql=None, sql_dialect=None, return_fids=False, **kwargs)

Read OGR data source into a pyarrow Table.

See docstring of read for parameters.

Returns
(dict, pyarrow.Table)

Returns a tuple of meta information about the data source in a dict, and a pyarrow Table with data.

Meta is:

{
    "crs": "<crs>",
    "fields": <ndarray of field names>,
    "encoding": "<encoding>",
    "geometry_type": "<geometry_type>",
    "geometry_name": "<name of geometry column in arrow table>"
}

pyogrio.raw.open_arrow(path_or_buffer, /, layer=None, encoding=None, columns=None, read_geometry=True, force_2d=False, skip_features=0, max_features=None, where=None, bbox=None, mask=None, fids=None, sql=None, sql_dialect=None, return_fids=False, batch_size=65536, **kwargs)

Open OGR data source as a stream of pyarrow record batches.

See docstring of read for parameters.

The RecordBatchStreamReader is reading from a stream provided by OGR and must not be accessed after the OGR dataset has been closed, i.e. after the context manager has been closed.

Returns
(dict, pyarrow.RecordBatchStreamReader)

Returns a tuple of meta information about the data source in a dict, and a pyarrow RecordBatchStreamReader with data.

Meta is:

{
    "crs": "<crs>",
    "fields": <ndarray of field names>,
    "encoding": "<encoding>",
    "geometry_type": "<geometry_type>",
    "geometry_name": "<name of geometry column in arrow table>"
}

Examples

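>>> from pyogrio.raw import open_arrow
>>> import pyarrow as pa
>>> import shapely
>>>
>>> # `path` must refer to an existing OGR data source
>>> with open_arrow(path) as source:
...     meta, reader = source
...     # each iteration yields one pyarrow RecordBatch; use it inside the block
...     for batch in reader:
...         geometries = shapely.from_wkb(batch[meta["geometry_name"]])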