API reference
Core
Vectorized vector I/O using OGR.
- pyogrio.detect_write_driver(path)
Attempt to infer the driver for a path by extension or prefix.
Only drivers that support write capabilities will be detected.
If the path cannot be resolved to a single driver, a ValueError will be raised.
- Parameters:
- path: str
data source path
- Returns:
- str
name of the driver, if detected
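Example (illustrative): a minimal sketch of inferring the driver from a file extension; the path is hypothetical and only the extension matters here.
>>> from pyogrio import detect_write_driver
>>> detect_write_driver("data.gpkg")  # hypothetical path; ".gpkg" maps to the GeoPackage driver
'GPKG'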
- pyogrio.get_gdal_config_option(name)
Get the value for a GDAL configuration option.
- Parameters:
- name: str
name of the option to retrieve
- Returns:
- value of the option or None if not set
'ON' / 'OFF' are normalized to True / False.
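Example (illustrative): a minimal sketch of querying a configuration option; it assumes nothing has been set for this option earlier in the session (see set_gdal_config_options below).
>>> from pyogrio import get_gdal_config_option
>>> # returns None when the option was never set; 'ON'/'OFF' values come back as True/False
>>> get_gdal_config_option("CPL_DEBUG")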
- pyogrio.list_drivers(read=False, write=False)
List drivers available in GDAL.
- Parameters:
- read: bool, optional (default: False)
If True, will only return drivers that are known to support read capabilities.
- write: bool, optional (default: False)
If True, will only return drivers that are known to support write capabilities.
- Returns:
- dict
Mapping of driver name to file mode capabilities: "r": read, "w": write. Drivers that are available but with unknown support are marked with "?".
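Example (illustrative): filtering to write-capable drivers; the exact set returned depends on the local GDAL build.
>>> from pyogrio import list_drivers
>>> writable = list_drivers(write=True)  # mapping of driver name to capability string
>>> "GPKG" in writable                   # GeoPackage is typically available in GDAL builds
True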
- pyogrio.list_layers(path_or_buffer, /)
List layers available in an OGR data source.
NOTE: includes both spatial and nonspatial layers.
- Parameters:
- path_or_buffer: str, pathlib.Path, bytes, or file-like
A dataset path or URI, raw buffer, or file-like object with a read method.
- Returns:
- ndarray shape (2, n)
array of pairs of [<layer name>, <layer geometry type>] Note: geometry is None for nonspatial layers.
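Example (illustrative): listing the layers of a hypothetical multi-layer GeoPackage.
>>> from pyogrio import list_layers
>>> layers = list_layers("example.gpkg")  # hypothetical path
>>> layers                                # array of [<layer name>, <geometry type>] pairs; geometry type is None for nonspatial layers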
- pyogrio.read_bounds(path_or_buffer, /, layer=None, skip_features=0, max_features=None, where=None, bbox=None, mask=None)
Read bounds of each feature.
This can be used to assist with spatial indexing and partitioning, in order to avoid reading all features into memory. It is roughly 2-3x faster than reading the full geometry and attributes of a dataset.
- Parameters:
- path_or_buffer: str, pathlib.Path, bytes, or file-like
A dataset path or URI, raw buffer, or file-like object with a read method.
- layer: int or str, optional (default: first layer)
If an integer is provided, it corresponds to the index of the layer within the data source. If a string is provided, it must match the name of the layer in the data source. Defaults to first layer in data source.
- skip_features: int, optional (default: 0)
Number of features to skip from the beginning of the file before returning features. Must be less than the total number of features in the file.
- max_features: int, optional (default: None)
Number of features to read from the file. Must be less than the total number of features in the file minus skip_features (if used).
- where: str, optional (default: None)
Where clause to filter features in layer by attribute values. Uses a restricted form of SQL WHERE clause, defined here: http://ogdi.sourceforge.net/prop/6.2.CapabilitiesMetadata.html Examples: "ISO_A3 = 'CAN'", "POP_EST > 10000000 AND POP_EST < 100000000"
- bbox: tuple of (xmin, ymin, xmax, ymax), optional (default: None)
If present, will be used to filter records whose geometry intersects this box. This must be in the same CRS as the dataset. If GEOS is present and used by GDAL, only geometries that intersect this bbox will be returned; if GEOS is not available or not used by GDAL, all geometries with bounding boxes that intersect this bbox will be returned.
- mask: Shapely geometry, optional (default: None)
If present, will be used to filter records whose geometry intersects this geometry. This must be in the same CRS as the dataset. If GEOS is present and used by GDAL, only geometries that intersect this geometry will be returned; if GEOS is not available or not used by GDAL, all geometries with bounding boxes that intersect the bounding box of this geometry will be returned. Requires Shapely >= 2.0. Cannot be combined with the bbox keyword.
- Returns:
- tuple of (fids, bounds)
fids are global IDs read from the FID field of the dataset.
bounds are ndarray of shape (4, n) containing xmin, ymin, xmax, ymax.
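Example (illustrative): a sketch of using bounds to pre-filter features before a full read; the path and bbox values are hypothetical.
>>> from pyogrio import read_bounds
>>> fids, bounds = read_bounds("example.gpkg", bbox=(0.0, 50.0, 5.0, 55.0))
>>> bounds.shape  # (4, n): xmin, ymin, xmax, ymax for each selected feature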
- pyogrio.read_info(path_or_buffer, /, layer=None, encoding=None, force_feature_count=False, force_total_bounds=False, **kwargs)
Read information about an OGR data source.
crs, geometry and total_bounds will be None and features will be 0 for a nonspatial layer.
features will be -1 if this is an expensive operation for this driver. You can force it to be calculated using the force_feature_count parameter.
total_bounds is the 2-dimensional extent of all features within the dataset: (xmin, ymin, xmax, ymax). It will be None if this is an expensive operation for this driver or if the data source is nonspatial. You can force it to be calculated using the force_total_bounds parameter.
fid_column is the name of the FID field in the data source, if the FID is physically stored (e.g. in GPKG). If the FID is just a sequence, fid_column will be "" (e.g. ESRI Shapefile).
geometry_name is the name of the field where the main geometry is stored in the data source, if the field name can be customized (e.g. in GPKG). If no custom name is supported, geometry_name will be "" (e.g. ESRI Shapefile).
encoding will be UTF-8 if either the native encoding is likely to be UTF-8 or GDAL can automatically convert from the detected native encoding to UTF-8.
- Parameters:
- path_or_buffer: str, pathlib.Path, bytes, or file-like
A dataset path or URI, raw buffer, or file-like object with a read method.
- layer: int or str, optional
Name or index of layer in data source. Reads the first layer by default.
- encoding: str, optional (default: None)
If present, will be used as the encoding for reading string values from the data source, unless encoding can be inferred directly from the data source.
- force_feature_count: bool, optional (default: False)
True if the feature count should be computed even if it is expensive.
- force_total_bounds: bool, optional (default: False)
True if the total bounds should be computed even if it is expensive.
- **kwargs
Additional driver-specific dataset open options passed to OGR. Invalid options will trigger a warning.
- Returns:
- dict
A dictionary with the following keys:
{ "layer_name": "<layer name>", "crs": "<crs>", "fields": <ndarray of field names>, "dtypes": <ndarray of field dtypes>, "encoding": "<encoding>", "fid_column": "<fid column name or "">", "geometry_name": "<geometry column name or "">", "geometry_type": "<geometry type>", "features": <feature count or -1>, "total_bounds": <tuple with total bounds or None>, "driver": "<driver>", "capabilities": "<dict of driver capabilities>" "dataset_metadata": "<dict of dataset metadata or None>" "layer_metadata": "<dict of layer metadata or None>" }
- pyogrio.set_gdal_config_options(options)
Set GDAL configuration options.
Options are listed here: https://trac.osgeo.org/gdal/wiki/ConfigOptions
No error is raised if invalid option names are provided.
These options are applied for an entire session rather than for individual functions.
- Parameters:
- options: dict
If present, provides a mapping of option name / value pairs for GDAL configuration options. True / False are normalized to 'ON' / 'OFF'. A value of None for a config option can be used to clear out a previously set value.
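Example (illustrative): setting, reading back, and clearing a configuration option for the current session.
>>> from pyogrio import set_gdal_config_options, get_gdal_config_option
>>> set_gdal_config_options({"CPL_DEBUG": True})   # True is normalized to 'ON'
>>> get_gdal_config_option("CPL_DEBUG")
True
>>> set_gdal_config_options({"CPL_DEBUG": None})   # clear the previously set value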
- pyogrio.vsi_listtree(path: str | Path, pattern: str | None = None)
Recursively list the contents of a VSI directory.
An fnmatch pattern can be specified to filter the directories/files returned.
- Parameters:
- path: str or pathlib.Path
Path to the VSI directory to be listed.
- pattern: str, optional
Pattern to filter results, in fnmatch format.
- pyogrio.vsi_rmtree(path: str | Path)
Recursively remove VSI directory.
- Parameters:
- path: str or pathlib.Path
path to the VSI directory to be removed.
- pyogrio.vsi_unlink(path: str | Path)
Remove a VSI file.
- Parameters:
- path: str or pathlib.Path
Path to the vsimem file to be removed.
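Example (illustrative): a sketch of inspecting and cleaning up the in-memory (/vsimem/) filesystem; the directory name is hypothetical.
>>> from pyogrio import vsi_listtree, vsi_rmtree
>>> vsi_listtree("/vsimem/")                    # everything currently stored in memory
>>> vsi_listtree("/vsimem/", pattern="*.gpkg")  # only in-memory GeoPackage files
>>> vsi_rmtree("/vsimem/cache")                 # hypothetical directory; removes it and its contents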
GeoPandas integration
- pyogrio.read_dataframe(path_or_buffer, /, layer=None, encoding=None, columns=None, read_geometry=True, force_2d=False, skip_features=0, max_features=None, where=None, bbox=None, mask=None, fids=None, sql=None, sql_dialect=None, fid_as_index=False, use_arrow=None, on_invalid='raise', arrow_to_pandas_kwargs=None, **kwargs)
Read from an OGR data source to a GeoPandas GeoDataFrame or Pandas DataFrame.
If the data source does not have a geometry column or read_geometry is False, a DataFrame will be returned.
Requires geopandas >= 0.8.
- Parameters:
- path_or_buffer: pathlib.Path or str, or bytes buffer
A dataset path or URI, raw buffer, or file-like object with a read method.
- layer: int or str, optional (default: first layer)
If an integer is provided, it corresponds to the index of the layer within the data source. If a string is provided, it must match the name of the layer in the data source. Defaults to first layer in data source.
- encoding: str, optional (default: None)
If present, will be used as the encoding for reading string values from the data source. By default will automatically try to detect the native encoding and decode to UTF-8.
- columns: list-like, optional (default: all columns)
List of column names to import from the data source. Column names must exactly match the names in the data source, and will be returned in the order they occur in the data source. To avoid reading any columns, pass an empty list-like. If combined with the where parameter, must include columns referenced in the where expression or the data may not be correctly read; the data source may return empty results or raise an exception (behavior varies by driver).
- read_geometry: bool, optional (default: True)
If True, will read geometry into a GeoSeries. If False, a Pandas DataFrame will be returned instead.
- force_2d: bool, optional (default: False)
If the geometry has Z values, setting this to True will cause those to be ignored and 2D geometries to be returned.
- skip_features: int, optional (default: 0)
Number of features to skip from the beginning of the file before returning features. If greater than available number of features, an empty DataFrame will be returned. Using this parameter may incur significant overhead if the driver does not support the capability to randomly seek to a specific feature, because it will need to iterate over all prior features.
- max_features: int, optional (default: None)
Number of features to read from the file.
- where: str, optional (default: None)
Where clause to filter features in layer by attribute values. If the data source natively supports SQL, its specific SQL dialect should be used (e.g. SQLite and GeoPackage: SQLITE, PostgreSQL). If it doesn't, the OGRSQL WHERE syntax should be used. Note that it is not possible to overrule the SQL dialect; this is only possible when you use the sql parameter. Examples: "ISO_A3 = 'CAN'", "POP_EST > 10000000 AND POP_EST < 100000000"
- bbox: tuple of (xmin, ymin, xmax, ymax), optional (default: None)
If present, will be used to filter records whose geometry intersects this box. This must be in the same CRS as the dataset. If GEOS is present and used by GDAL, only geometries that intersect this bbox will be returned; if GEOS is not available or not used by GDAL, all geometries with bounding boxes that intersect this bbox will be returned. Cannot be combined with the mask keyword.
- mask: Shapely geometry, optional (default: None)
If present, will be used to filter records whose geometry intersects this geometry. This must be in the same CRS as the dataset. If GEOS is present and used by GDAL, only geometries that intersect this geometry will be returned; if GEOS is not available or not used by GDAL, all geometries with bounding boxes that intersect the bounding box of this geometry will be returned. Requires Shapely >= 2.0. Cannot be combined with the bbox keyword.
- fids: array-like, optional (default: None)
Array of integer feature id (FID) values to select. Cannot be combined with other keywords to select a subset (skip_features, max_features, where, bbox, mask, or sql). Note that the starting index is driver and file specific (e.g. typically 0 for Shapefile and 1 for GeoPackage, but can still depend on the specific file). The performance of reading a large number of features using FIDs is also driver specific and depends on the value of use_arrow. The order of the rows returned is undefined. If you would like to sort based on FID, use fid_as_index=True to have the index of the GeoDataFrame returned set to the FIDs of the features read. If use_arrow=True, the number of FIDs is limited to 4997 for drivers with 'OGRSQL' as default SQL dialect. To read a larger number of FIDs, set use_arrow=False.
- sql: str, optional (default: None)
The SQL statement to execute. Look at the sql_dialect parameter for more information on the syntax to use for the query. When combined with other keywords like columns, skip_features, max_features, where, bbox, or mask, those are applied after the SQL query. Be aware that this can have an impact on performance (e.g. filtering with the bbox or mask keywords may not use spatial indexes). Cannot be combined with the layer or fids keywords.
- sql_dialect: str, optional (default: None)
The SQL dialect the SQL statement is written in. Possible values:
None: if the data source natively supports SQL, its specific SQL dialect will be used by default (e.g. SQLite and GeoPackage: SQLITE, PostgreSQL). If the data source doesn't natively support SQL, the OGRSQL dialect is the default.
‘OGRSQL’: can be used on any data source. Performance can suffer when used on data sources with native support for SQL.
‘SQLITE’: can be used on any data source. All spatialite functions can be used. Performance can suffer on data sources with native support for SQL, except for Geopackage and SQLite as this is their native SQL dialect.
- fid_as_index: bool, optional (default: False)
If True, will use the FIDs of the features that were read as the index of the GeoDataFrame. May start at 0 or 1 depending on the driver.
- use_arrow: bool, optional (default: False)
Whether to use Arrow as the transfer mechanism of the read data from GDAL to Python (requires GDAL >= 3.6 and pyarrow to be installed). When enabled, this provides a further speed-up. Defaults to False, but this default can also be globally overridden by setting the PYOGRIO_USE_ARROW=1 environment variable.
- on_invalid: str, optional (default: "raise")
The action to take when an invalid geometry is encountered. Possible values:
raise: an exception will be raised if a WKB input geometry is invalid.
warn: invalid WKB geometries will be returned as None and a warning will be raised.
ignore: invalid WKB geometries will be returned as None without a warning.
- arrow_to_pandas_kwargs: dict, optional (default: None)
When use_arrow is True, these kwargs will be passed to the to_pandas call for the arrow to pandas conversion.
- **kwargs
Additional driver-specific dataset open options passed to OGR. Invalid options will trigger a warning.
- Returns:
- GeoDataFrame or DataFrame (if no geometry is present)
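Example (illustrative): filtered reads into a GeoDataFrame; the path, column names, and filter values are hypothetical, and the second call assumes pyarrow and GDAL >= 3.6 are available.
>>> from pyogrio import read_dataframe
>>> df = read_dataframe("example.gpkg", columns=["NAME", "POP_EST"], where="POP_EST > 10000000")
>>> df = read_dataframe("example.gpkg", bbox=(-10.0, 35.0, 30.0, 60.0), use_arrow=True)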
- pyogrio.write_dataframe(df, path, layer=None, driver=None, encoding=None, geometry_type=None, promote_to_multi=None, nan_as_null=True, append=False, use_arrow=None, dataset_metadata=None, layer_metadata=None, metadata=None, dataset_options=None, layer_options=None, **kwargs)
Write GeoPandas GeoDataFrame to an OGR file format.
- Parameters:
- df: GeoDataFrame or DataFrame
The data to write. For attribute columns of the “object” dtype, all values will be converted to strings to be written to the output file, except None and np.nan, which will be set to NULL in the output file.
- path: str or io.BytesIO
path to output file on writeable file system or an io.BytesIO object to allow writing to memory. Will raise NotImplementedError if an open file handle is passed; use BytesIO instead. NOTE: support for writing to memory is limited to specific drivers.
- layer: str, optional (default: None)
layer name to create. If writing to memory and a layer name is not provided, the layer name will be set to a UUID4 value.
- driver: string, optional (default: None)
The OGR format driver used to write the vector file. By default attempts to infer driver from path. Must be provided to write to memory.
- encoding: str, optional (default: None)
If present, will be used as the encoding for writing string values to the file. Use with caution, only certain drivers support encodings other than UTF-8.
- geometry_type: string, optional (default: None)
By default, the geometry type of the layer will be inferred from the data, after applying the promote_to_multi logic. If the data only contains a single geometry type (after applying the logic of promote_to_multi), this type is used for the layer. If the data (still) contains mixed geometry types, the output layer geometry type will be set to “Unknown”.
This parameter does not modify the geometry, but it will try to force the layer type of the output file to this value. Use this parameter with caution because using a non-default layer geometry type may result in errors when writing the file, may be ignored by the driver, or may result in invalid files. Possible values are: “Unknown”, “Point”, “LineString”, “Polygon”, “MultiPoint”, “MultiLineString”, “MultiPolygon” or “GeometryCollection”.
- promote_to_multi: bool, optional (default: None)
If True, will convert singular geometry types in the data to their corresponding multi geometry type for writing. By default, will convert mixed singular and multi geometry types to multi geometry types for drivers that do not support mixed singular and multi geometry types. If False, geometry types will not be promoted, which may result in errors or invalid files when attempting to write mixed singular and multi geometry types to drivers that do not support such combinations.
- nan_as_null: bool, default True
For floating point columns (float32 / float64), whether NaN values are written as “null” (missing value). Defaults to True because in pandas NaNs are typically used as missing value. Note that when set to False, behaviour is format specific: some formats don’t support NaNs by default (e.g. GeoJSON will skip this property) or might treat them as null anyway (e.g. GeoPackage).
- append: bool, optional (default: False)
If True, and the data source specified by path already exists and the driver supports appending to an existing data source, the data will be appended to the existing records in the data source. Not supported for writing to in-memory files. NOTE: append support is limited to specific drivers and GDAL versions.
- use_arrow: bool, optional (default: False)
Whether to use Arrow as the transfer mechanism of the data to write from Python to GDAL (requires GDAL >= 3.8 and pyarrow to be installed). When enabled, this provides a further speed-up. Defaults to False, but this default can also be globally overridden by setting the PYOGRIO_USE_ARROW=1 environment variable. Using Arrow does not support writing an object-dtype column with mixed types.
- dataset_metadata: dict, optional (default: None)
Metadata to be stored at the dataset level in the output file; limited to drivers that support writing metadata, such as GPKG, and silently ignored otherwise. Keys and values must be strings.
- layer_metadata: dict, optional (default: None)
Metadata to be stored at the layer level in the output file; limited to drivers that support writing metadata, such as GPKG, and silently ignored otherwise. Keys and values must be strings.
- metadata: dict, optional (default: None)
alias of layer_metadata
- dataset_options: dict, optional
Dataset creation options (format specific) passed to OGR. Specify as a key-value dictionary.
- layer_options: dict, optional
Layer creation options (format specific) passed to OGR. Specify as a key-value dictionary.
- **kwargs
Additional driver-specific dataset or layer creation options passed to OGR. pyogrio will attempt to automatically pass those keywords either as dataset or as layer creation option based on the known options for the specific driver. Alternatively, you can use the explicit dataset_options or layer_options keywords to manually do this (for example if an option exists as both dataset and layer option).
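Example (illustrative): a minimal sketch of writing a small GeoDataFrame to a GeoPackage; the output path is hypothetical and the driver is inferred from the ".gpkg" extension.
>>> import geopandas
>>> from shapely.geometry import Point
>>> from pyogrio import write_dataframe
>>> gdf = geopandas.GeoDataFrame({"name": ["a", "b"]}, geometry=[Point(0, 0), Point(1, 1)], crs="EPSG:4326")
>>> write_dataframe(gdf, "points.gpkg", layer="points")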
Arrow integration
- pyogrio.read_arrow(path_or_buffer, /, layer=None, encoding=None, columns=None, read_geometry=True, force_2d=False, skip_features=0, max_features=None, where=None, bbox=None, mask=None, fids=None, sql=None, sql_dialect=None, return_fids=False, **kwargs)
Read OGR data source into a pyarrow Table.
See docstring of read for parameters.
- Returns:
- (dict, pyarrow.Table)
Returns a tuple of meta information about the data source in a dict, and a pyarrow Table with data.
- Meta is:
{
    "crs": "<crs>",
    "fields": <ndarray of field names>,
    "encoding": "<encoding>",
    "geometry_type": "<geometry_type>",
    "geometry_name": "<name of geometry column in arrow table>",
}
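Example (illustrative): reading a hypothetical data source into a pyarrow Table; assumes a GDAL build with Arrow support (GDAL >= 3.6) and pyarrow installed.
>>> from pyogrio import read_arrow
>>> meta, table = read_arrow("example.gpkg")  # hypothetical path
>>> meta["geometry_name"], table.num_rows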
- pyogrio.open_arrow(path_or_buffer, /, layer=None, encoding=None, columns=None, read_geometry=True, force_2d=False, skip_features=0, max_features=None, where=None, bbox=None, mask=None, fids=None, sql=None, sql_dialect=None, return_fids=False, batch_size=65536, use_pyarrow=False, **kwargs)
Open OGR data source as a stream of Arrow record batches.
See docstring of read for parameters.
The returned object is reading from a stream provided by OGR and must not be accessed after the OGR dataset has been closed, i.e. after the context manager has been closed.
By default this function returns a generic stream object implementing the Arrow PyCapsule Protocol (i.e. having an __arrow_c_stream__ method). This object can then be consumed by your Arrow implementation of choice that supports this protocol. Optionally, you can specify use_pyarrow=True to directly get the stream as a pyarrow.RecordBatchReader.
- Returns:
- (dict, pyarrow.RecordBatchReader or ArrowStream)
Returns a tuple of meta information about the data source in a dict, and a data stream object (a generic ArrowStream object, or a pyarrow RecordBatchReader if use_pyarrow is set to True).
- Meta is:
{
    "crs": "<crs>",
    "fields": <ndarray of field names>,
    "encoding": "<encoding>",
    "geometry_type": "<geometry_type>",
    "geometry_name": "<name of geometry column in arrow table>",
}
- Other Parameters:
- batch_size: int (default: 65_536)
Maximum number of features to retrieve in a batch.
- use_pyarrow: bool (default: False)
If True, return a pyarrow RecordBatchReader instead of a generic ArrowStream object. In the default case, this stream object needs to be passed to another library supporting the Arrow PyCapsule Protocol to consume the stream of data.
Examples
>>> from pyogrio.raw import open_arrow
>>> import pyarrow as pa
>>> import shapely
>>>
>>> with open_arrow(path) as source:
>>>     meta, stream = source
>>>     # wrap the arrow stream object in a pyarrow RecordBatchReader
>>>     reader = pa.RecordBatchReader.from_stream(stream)
>>>     geom_col = meta["geometry_name"] or "wkb_geometry"
>>>     for batch in reader:
>>>         geometries = shapely.from_wkb(batch[geom_col])
The returned stream object needs to be consumed by a library implementing the Arrow PyCapsule Protocol. In the above example, pyarrow is used through its RecordBatchReader. For this case, you can also specify use_pyarrow=True to directly get this result as a short-cut:
>>> with open_arrow(path, use_pyarrow=True) as source:
>>>     meta, reader = source
>>>     geom_col = meta["geometry_name"] or "wkb_geometry"
>>>     for batch in reader:
>>>         geometries = shapely.from_wkb(batch[geom_col])
- pyogrio.write_arrow(arrow_obj, path, layer=None, driver=None, geometry_name=None, geometry_type=None, crs=None, encoding=None, append=False, dataset_metadata=None, layer_metadata=None, metadata=None, dataset_options=None, layer_options=None, **kwargs)
Write an Arrow-compatible data source to an OGR file format.
- Parameters:
- arrow_obj
The Arrow data to write. This can be any Arrow-compatible tabular data object that implements the Arrow PyCapsule Protocol (i.e. has an __arrow_c_stream__ method), for example a pyarrow Table or RecordBatchReader.
- path: str or io.BytesIO
path to output file on writeable file system or an io.BytesIO object to allow writing to memory. NOTE: support for writing to memory is limited to specific drivers.
- layer: str, optional (default: None)
layer name to create. If writing to memory and a layer name is not provided, the layer name will be set to a UUID4 value.
- driver: string, optional (default: None)
The OGR format driver used to write the vector file. By default attempts to infer driver from path. Must be provided to write to memory.
- geometry_name: str, optional (default: None)
The name of the column in the input data that will be written as the geometry field. Will be inferred from the input data if the geometry column is annotated as a "geoarrow.wkb" or "ogc.wkb" extension type. Otherwise needs to be specified explicitly.
- geometry_type: str
The geometry type of the written layer. Currently, this needs to be specified explicitly when creating a new layer with geometries. Possible values are: “Unknown”, “Point”, “LineString”, “Polygon”, “MultiPoint”, “MultiLineString”, “MultiPolygon” or “GeometryCollection”.
This parameter does not modify the geometry, but it will try to force the layer type of the output file to this value. Use this parameter with caution because using a wrong layer geometry type may result in errors when writing the file, may be ignored by the driver, or may result in invalid files.
- crs: str, optional (default: None)
WKT-encoded CRS of the geometries to be written.
- encoding: str, optional (default: None)
Only used for the .dbf file of ESRI Shapefiles. If not specified, uses the default locale.
- append: bool, optional (default: False)
If True, and the data source specified by path already exists and the driver supports appending to an existing data source, the data will be appended to the existing records in the data source. Not supported for writing to in-memory files. NOTE: append support is limited to specific drivers and GDAL versions.
- dataset_metadata: dict, optional (default: None)
Metadata to be stored at the dataset level in the output file; limited to drivers that support writing metadata, such as GPKG, and silently ignored otherwise. Keys and values must be strings.
- layer_metadata: dict, optional (default: None)
Metadata to be stored at the layer level in the output file; limited to drivers that support writing metadata, such as GPKG, and silently ignored otherwise. Keys and values must be strings.
- metadata: dict, optional (default: None)
alias of layer_metadata
- dataset_options: dict, optional
Dataset creation options (format specific) passed to OGR. Specify as a key-value dictionary.
- layer_options: dict, optional
Layer creation options (format specific) passed to OGR. Specify as a key-value dictionary.
- **kwargs
Additional driver-specific dataset or layer creation options passed to OGR. pyogrio will attempt to automatically pass those keywords either as dataset or as layer creation option based on the known options for the specific driver. Alternatively, you can use the explicit dataset_options or layer_options keywords to manually do this (for example if an option exists as both dataset and layer option).
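Example (illustrative): a sketch of writing a pyarrow Table with a plain WKB geometry column; it assumes shapely is available to produce WKB and pyproj to produce a WKT CRS string, and the output path is hypothetical.
>>> import pyarrow as pa
>>> import shapely
>>> from pyproj import CRS
>>> from pyogrio import write_arrow
>>> wkb = [shapely.to_wkb(shapely.Point(0, 0)), shapely.to_wkb(shapely.Point(1, 1))]
>>> table = pa.table({"name": ["a", "b"], "geometry": wkb})  # geometry stored as raw WKB bytes
>>> write_arrow(table, "points.gpkg", geometry_name="geometry", geometry_type="Point", crs=CRS("EPSG:4326").to_wkt())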