API reference
Core
- pyogrio.detect_write_driver(path)
Attempt to infer the driver for a path by extension or prefix. Only drivers that support write capabilities will be detected.
If the path cannot be resolved to a single driver, a ValueError will be raised.
- Parameters
- path : str
- Returns
- str
name of the driver, if detected
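A minimal illustration (the file names are hypothetical; the outputs shown are what GDAL typically reports for these extensions):
>>> from pyogrio import detect_write_driver
>>> detect_write_driver("countries.gpkg")   # hypothetical path; inferred from the .gpkg extension
'GPKG'
>>> detect_write_driver("countries.shp")
'ESRI Shapefile'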
- pyogrio.get_gdal_config_option(name)
Get the value for a GDAL configuration option.
- Parameters
- name : str
name of the option to retrieve
- Returns
- value of the option or None if not set
'ON' / 'OFF' are normalized to True / False.
- pyogrio.list_drivers(read=False, write=False)
List drivers available in GDAL.
- Parameters
- read: bool, optional (default: False)
If True, will only return drivers that are known to support read capabilities.
- write: bool, optional (default: False)
If True, will only return drivers that are known to support write capabilities.
- Returns
- dict
Mapping of driver name to file mode capabilities: "r": read, "w": write. Drivers that are available but with unknown support are marked with "?".
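An illustrative sketch of the returned mapping (the exact drivers and mode strings depend on the local GDAL build):
>>> from pyogrio import list_drivers
>>> drivers = list_drivers(write=True)   # restrict to drivers known to support writing
>>> drivers["GPKG"]                      # illustrative value
'rw'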
- pyogrio.list_layers(path_or_buffer, /)
List layers available in an OGR data source.
NOTE: includes both spatial and nonspatial layers.
- Parameters
- path : str or pathlib.Path
- Returns
- ndarray shape (2, n)
array of pairs of [<layer name>, <layer geometry type>]. Note: geometry is None for nonspatial layers.
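For example, with a hypothetical GeoPackage containing one spatial and one nonspatial layer (file and layer names are invented, output is illustrative):
>>> from pyogrio import list_layers
>>> list_layers("countries.gpkg")
array([['countries', 'MultiPolygon'],
       ['country_stats', None]], dtype=object)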
- pyogrio.read_bounds(path_or_buffer, /, layer=None, skip_features=0, max_features=None, where=None, bbox=None)
Read bounds of each feature.
This can be used to assist with spatial indexing and partitioning, in order to avoid reading all features into memory. It is roughly 2-3x faster than reading the full geometry and attributes of a dataset.
- Parameters
- path : pathlib.Path or str
data source path
- layer : int or str, optional (default: first layer)
If an integer is provided, it corresponds to the index of the layer within the data source. If a string is provided, it must match the name of the layer in the data source. Defaults to first layer in data source.
- skip_features : int, optional (default: 0)
Number of features to skip from the beginning of the file before returning features. Must be less than the total number of features in the file.
- max_features : int, optional (default: None)
Number of features to read from the file. Must be less than the total number of features in the file minus skip_features (if used).
- where : str, optional (default: None)
Where clause to filter features in layer by attribute values. Uses a restricted form of SQL WHERE clause, defined here: http://ogdi.sourceforge.net/prop/6.2.CapabilitiesMetadata.html Examples: "ISO_A3 = 'CAN'", "POP_EST > 10000000 AND POP_EST < 100000000"
- bbox : tuple of (xmin, ymin, xmax, ymax), optional (default: None)
If present, will be used to filter records whose geometry intersects this box. This must be in the same CRS as the dataset. If GEOS is present and used by GDAL, only geometries that intersect this bbox will be returned; if GEOS is not available or not used by GDAL, all geometries with bounding boxes that intersect this bbox will be returned.
- Returns
- tuple of (fids, bounds)
fids are global IDs read from the FID field of the dataset; bounds are an ndarray of shape (4, n) containing xmin, ymin, xmax, ymax.
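A hedged sketch of using the bounds as a cheap spatial pre-filter before a full read; the file name and the filter window are hypothetical:
>>> from pyogrio import read_bounds, read_dataframe
>>> fids, bounds = read_bounds("countries.gpkg")        # hypothetical file
>>> xmin, ymin, xmax, ymax = bounds                     # each is an array of length n
>>> keep = fids[(xmax >= 0) & (xmin <= 40) & (ymax >= 40) & (ymin <= 60)]
>>> df = read_dataframe("countries.gpkg", fids=keep)    # read only the selected features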
- pyogrio.read_info(path_or_buffer, /, layer=None, encoding=None, force_feature_count=False, force_total_bounds=False, **kwargs)
Read information about an OGR data source.
crs, geometry and total_bounds will be None and features will be 0 for a nonspatial layer. features will be -1 if this is an expensive operation for this driver. You can force it to be calculated using the force_feature_count parameter. total_bounds is the 2-dimensional extent of all features within the dataset: (xmin, ymin, xmax, ymax). It will be None if this is an expensive operation for this driver or if the data source is nonspatial. You can force it to be calculated using the force_total_bounds parameter.
- Parameters
- path : str or pathlib.Path
- layer : int or str, optional
Name or index of layer in data source. Reads the first layer by default.
- encoding : str, optional (default: None)
If present, will be used as the encoding for reading string values from the data source, unless encoding can be inferred directly from the data source.
- force_feature_count : bool, optional (default: False)
True if the feature count should be computed even if it is expensive.
- force_total_bounds : bool, optional (default: False)
True if the total bounds should be computed even if it is expensive.
- **kwargs
Additional driver-specific dataset open options passed to OGR. Invalid options will trigger a warning.
- Returns
- dict
A dictionary with the following keys:
{
    "crs": "<crs>",
    "fields": <ndarray of field names>,
    "dtypes": <ndarray of field dtypes>,
    "encoding": "<encoding>",
    "geometry": "<geometry type>",
    "features": <feature count or -1>,
    "total_bounds": <tuple with total bounds or None>,
    "driver": "<driver>",
    "capabilities": <dict of driver capabilities>,
    "dataset_metadata": <dict of dataset metadata or None>,
    "layer_metadata": <dict of layer metadata or None>
}
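An illustrative call (the file name and reported values are hypothetical):
>>> from pyogrio import read_info
>>> info = read_info("countries.gpkg", force_feature_count=True)   # hypothetical file
>>> info["features"], info["geometry"], info["driver"]             # illustrative values
(177, 'MultiPolygon', 'GPKG')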
- pyogrio.set_gdal_config_options(options)
Set GDAL configuration options.
Options are listed here: https://trac.osgeo.org/gdal/wiki/ConfigOptions
No error is raised if invalid option names are provided.
These options are applied for an entire session rather than for individual functions.
- Parameters
- options : dict
If present, provides a mapping of option name / value pairs for GDAL configuration options. True / False are normalized to 'ON' / 'OFF'. A value of None for a config option can be used to clear out a previously set value.
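A short sketch of the normalization described above, using the standard GDAL option CPL_DEBUG:
>>> from pyogrio import set_gdal_config_options, get_gdal_config_option
>>> set_gdal_config_options({"CPL_DEBUG": True})   # True is written as 'ON'
>>> get_gdal_config_option("CPL_DEBUG")            # 'ON' is normalized back to True
True
>>> set_gdal_config_options({"CPL_DEBUG": None})   # clear the previously set value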
GeoPandas integration
- pyogrio.read_dataframe(path_or_buffer, /, layer=None, encoding=None, columns=None, read_geometry=True, force_2d=False, skip_features=0, max_features=None, where=None, bbox=None, fids=None, sql=None, sql_dialect=None, fid_as_index=False, use_arrow=False, **kwargs)
Read from an OGR data source to a GeoPandas GeoDataFrame or Pandas DataFrame. If the data source does not have a geometry column or read_geometry is False, a DataFrame will be returned.
Requires geopandas >= 0.8.
- Parameters
- path_or_buffer : pathlib.Path or str, or bytes buffer
A dataset path or URI, or raw buffer.
- layer : int or str, optional (default: first layer)
If an integer is provided, it corresponds to the index of the layer within the data source. If a string is provided, it must match the name of the layer in the data source. Defaults to first layer in data source.
- encoding : str, optional (default: None)
If present, will be used as the encoding for reading string values from the data source, unless encoding can be inferred directly from the data source.
- columns : list-like, optional (default: all columns)
List of column names to import from the data source. Column names must exactly match the names in the data source, and will be returned in the order they occur in the data source. To avoid reading any columns, pass an empty list-like.
- read_geometry : bool, optional (default: True)
If True, will read geometry into a GeoSeries. If False, a Pandas DataFrame will be returned instead.
- force_2d : bool, optional (default: False)
If the geometry has Z values, setting this to True will cause those to be ignored and 2D geometries to be returned.
- skip_features : int, optional (default: 0)
Number of features to skip from the beginning of the file before returning features. If greater than available number of features, an empty DataFrame will be returned. Using this parameter may incur significant overhead if the driver does not support the capability to randomly seek to a specific feature, because it will need to iterate over all prior features.
- max_features : int, optional (default: None)
Number of features to read from the file.
- where : str, optional (default: None)
Where clause to filter features in layer by attribute values. If the data source natively supports SQL, its specific SQL dialect should be used (eg. SQLite and GeoPackage: SQLITE, PostgreSQL). If it doesn’t, the OGRSQL WHERE syntax should be used. Note that it is not possible to overrule the SQL dialect; this is only possible when you use the sql parameter. Examples: "ISO_A3 = 'CAN'", "POP_EST > 10000000 AND POP_EST < 100000000"
- bbox : tuple of (xmin, ymin, xmax, ymax) (default: None)
If present, will be used to filter records whose geometry intersects this box. This must be in the same CRS as the dataset. If GEOS is present and used by GDAL, only geometries that intersect this bbox will be returned; if GEOS is not available or not used by GDAL, all geometries with bounding boxes that intersect this bbox will be returned.
- fids : array-like, optional (default: None)
Array of integer feature id (FID) values to select. Cannot be combined with other keywords to select a subset (skip_features, max_features, where, bbox or sql). Note that the starting index is driver and file specific (e.g. typically 0 for Shapefile and 1 for GeoPackage, but can still depend on the specific file). The performance of reading a large number of features using FIDs is also driver specific.
- sql : str, optional (default: None)
The SQL statement to execute. Look at the sql_dialect parameter for more information on the syntax to use for the query. When combined with other keywords like columns, skip_features, max_features, where or bbox, those are applied after the SQL query. Be aware that this can have an impact on performance (e.g. filtering with the bbox keyword may not use spatial indexes). Cannot be combined with the layer or fids keywords.
- sql_dialect : str, optional (default: None)
The SQL dialect the SQL statement is written in. Possible values:
- None: if the data source natively supports SQL, its specific SQL dialect will be used by default (eg. SQLite and GeoPackage: SQLITE, PostgreSQL). If the data source doesn’t natively support SQL, the OGRSQL dialect is the default.
- 'OGRSQL': can be used on any data source. Performance can suffer when used on data sources with native support for SQL.
- 'SQLITE': can be used on any data source. All SpatiaLite functions can be used. Performance can suffer on data sources with native support for SQL, except for GeoPackage and SQLite as this is their native SQL dialect.
- fid_as_index : bool, optional (default: False)
If True, will use the FIDs of the features that were read as the index of the GeoDataFrame. May start at 0 or 1 depending on the driver.
- use_arrow : bool, default False
Whether to use Arrow as the transfer mechanism of the read data from GDAL to Python (requires GDAL >= 3.6 and pyarrow to be installed). When enabled, this provides a further speed-up.
- **kwargs
Additional driver-specific dataset open options passed to OGR. Invalid options will trigger a warning.
- Returns
- GeoDataFrame or DataFrame (if no geometry is present)
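A hedged example combining a column subset, an attribute filter and a bbox filter; the file, column names and coordinates are hypothetical:
>>> from pyogrio import read_dataframe
>>> df = read_dataframe("countries.gpkg", columns=["ISO_A3", "POP_EST"],
>>>                     where="POP_EST > 10000000", bbox=(-10.0, 35.0, 30.0, 60.0))
>>> df.head()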
- pyogrio.write_dataframe(df, path, layer=None, driver=None, encoding=None, geometry_type=None, promote_to_multi=None, nan_as_null=True, append=False, dataset_metadata=None, layer_metadata=None, metadata=None, dataset_options=None, layer_options=None, **kwargs)
Write GeoPandas GeoDataFrame to an OGR file format.
- Parameters
- df : GeoDataFrame or DataFrame
The data to write. For attribute columns of the “object” dtype, all values will be converted to strings to be written to the output file, except None and np.nan, which will be set to NULL in the output file.
- path : str
path to file
- layer : str, optional (default: None)
layer name
- driver : string, optional (default: None)
The OGR format driver used to write the vector file. By default write_dataframe attempts to infer driver from path.
- encoding : str, optional (default: None)
If present, will be used as the encoding for writing string values to the file.
- geometry_type : string, optional (default: None)
By default, the geometry type of the layer will be inferred from the data, after applying the promote_to_multi logic. If the data only contains a single geometry type (after applying the logic of promote_to_multi), this type is used for the layer. If the data (still) contains mixed geometry types, the output layer geometry type will be set to “Unknown”.
This parameter does not modify the geometry, but it will try to force the layer type of the output file to this value. Use this parameter with caution because using a non-default layer geometry type may result in errors when writing the file, may be ignored by the driver, or may result in invalid files. Possible values are: “Unknown”, “Point”, “LineString”, “Polygon”, “MultiPoint”, “MultiLineString”, “MultiPolygon” or “GeometryCollection”.
- promote_to_multi : bool, optional (default: None)
If True, will convert singular geometry types in the data to their corresponding multi geometry type for writing. By default, will convert mixed singular and multi geometry types to multi geometry types for drivers that do not support mixed singular and multi geometry types. If False, geometry types will not be promoted, which may result in errors or invalid files when attempting to write mixed singular and multi geometry types to drivers that do not support such combinations.
- nan_as_null : bool, default True
For floating point columns (float32 / float64), whether NaN values are written as “null” (missing value). Defaults to True because in pandas NaNs are typically used as a missing value. Note that when set to False, behaviour is format specific: some formats don’t support NaNs by default (e.g. GeoJSON will skip this property) or might treat them as null anyway (e.g. GeoPackage).
- append : bool, optional (default: False)
If True, and the data source specified by path already exists and the driver supports appending to an existing data source, the data will be appended to the existing records in the data source. NOTE: append support is limited to specific drivers and GDAL versions.
- dataset_metadata : dict, optional (default: None)
Metadata to be stored at the dataset level in the output file; limited to drivers that support writing metadata, such as GPKG, and silently ignored otherwise. Keys and values must be strings.
- layer_metadata : dict, optional (default: None)
Metadata to be stored at the layer level in the output file; limited to drivers that support writing metadata, such as GPKG, and silently ignored otherwise. Keys and values must be strings.
- metadata : dict, optional (default: None)
alias of layer_metadata
- dataset_options : dict, optional
Dataset creation options (format specific) passed to OGR. Specify as a key-value dictionary.
- layer_options : dict, optional
Layer creation options (format specific) passed to OGR. Specify as a key-value dictionary.
- **kwargs
Additional driver-specific dataset or layer creation options passed to OGR. pyogrio will attempt to automatically pass those keywords either as dataset or as layer creation option based on the known options for the specific driver. Alternatively, you can use the explicit dataset_options or layer_options keywords to manually do this (for example if an option exists as both dataset and layer option).
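A minimal sketch of a round trip; the file names and metadata are hypothetical, and the GPKG driver is inferred from the .gpkg extension:
>>> from pyogrio import read_dataframe, write_dataframe
>>> df = read_dataframe("countries.shp")                # hypothetical input file
>>> write_dataframe(df, "countries.gpkg", layer="countries", promote_to_multi=True,
>>>                 layer_metadata={"description": "admin 0 countries"})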
Reading as Arrow data
- pyogrio.raw.read_arrow(path_or_buffer, /, layer=None, encoding=None, columns=None, read_geometry=True, force_2d=False, skip_features=0, max_features=None, where=None, bbox=None, fids=None, sql=None, sql_dialect=None, return_fids=False, **kwargs)
Read OGR data source into a pyarrow Table.
See docstring of read for parameters.
- Returns
- (dict, pyarrow.Table)
Returns a tuple of meta information about the data source in a dict, and a pyarrow Table with data.
- Meta is: {
    "crs": "<crs>",
    "fields": <ndarray of field names>,
    "encoding": "<encoding>",
    "geometry_type": "<geometry_type>",
    "geometry_name": "<name of geometry column in arrow table>",
}
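For illustration, a short sketch that reads a hypothetical file into a pyarrow Table and decodes the WKB geometry column, mirroring the open_arrow example below:
>>> from pyogrio.raw import read_arrow
>>> import shapely
>>> meta, table = read_arrow("countries.gpkg")                    # hypothetical file
>>> geometries = shapely.from_wkb(table[meta["geometry_name"]])   # decode the WKB column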
- pyogrio.raw.open_arrow(path_or_buffer, /, layer=None, encoding=None, columns=None, read_geometry=True, force_2d=False, skip_features=0, max_features=None, where=None, bbox=None, fids=None, sql=None, sql_dialect=None, return_fids=False, batch_size=65536, **kwargs)
Open OGR data source as a stream of pyarrow record batches.
See docstring of read for parameters.
The RecordBatchStreamReader reads from a stream provided by OGR and must not be accessed after the OGR dataset has been closed, i.e. after the context manager has exited.
- Returns
- (dict, pyarrow.RecordBatchStreamReader)
Returns a tuple of meta information about the data source in a dict, and a pyarrow RecordBatchStreamReader with data.
- Meta is: {
    "crs": "<crs>",
    "fields": <ndarray of field names>,
    "encoding": "<encoding>",
    "geometry_type": "<geometry_type>",
    "geometry_name": "<name of geometry column in arrow table>",
}
Examples
>>> from pyogrio.raw import open_arrow
>>> import pyarrow as pa
>>> import shapely
>>>
>>> with open_arrow(path) as source:
>>>     meta, reader = source
>>>     for table in reader:
>>>         geometries = shapely.from_wkb(table[meta["geometry_name"]])