The ohsome-planet tool can be used to transform OSM (history) PBF files and OSM replication OSC files into Parquet format with native GEO support. In addition, you can use it to turn an OSM changeset file (osm.bz2) into a PostgreSQL database table and keep it up to date with the OSM planet replication changeset files.
It creates the actual geometries of OSM elements (nodes, ways and relations). The tool can join information from OSM changesets such as hashtags, the OSM editor or usernames. You can join country codes to every OSM element by passing a boundary dataset as additional input.
You can use the ohsome-planet data to perform a wide range of geospatial analyses, e.g. using DuckDB, GeoPandas or QGIS. Display the data directly on a map and start playing around!
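For instance, once you have produced contribution Parquet files with the contributions command described below, a DuckDB query along the following lines counts features carrying a building tag per country. The file path and the tag key are only illustrative, and the countries column is filled only if you pass --country-file.
-- count features with a building tag per country in the latest snapshot
-- (illustrative path and tag key; adjust to your --data output directory)
SELECT country, count(*) AS n
FROM (
    SELECT unnest(countries) AS country
    FROM read_parquet('contributions/latest/*.parquet')
    WHERE list_contains(map_keys(tags), 'building')
)
GROUP BY country
ORDER BY n DESC;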
Installation requires Java 21.
First, clone the repository and its submodules. Then, build it with Maven.
git clone --recurse-submodules https://github.com/GIScience/ohsome-planet.git
cd ohsome-planet
./mvnw clean package -DskipTests
There are three main modes to run ohsome-planet:
- Contributions: OSM .pbf --> Parquet
- Changesets: OSM Changesets .bz2 --> PostgreSQL
- Replication: OSM Diffs .osc --> Parquet / PostgreSQL
Transform an OSM (history/latest) .pbf file into Parquet format.
You can download the full latest or history planet, or download PBF files for smaller regions from Geofabrik.
To process a given .pbf file, provide it via the --pbf parameter as in the following example.
Here we use a history file for Berlin obtained from Geofabrik.
java -jar ohsome-planet-cli/target/ohsome-planet.jar contributions \
--data /data/ohsome-planet/berlin \
--pbf /data/osm/berlin-internal.osh.pbf \
--changeset-db "jdbc:postgresql://localhost:5432/postgres?user=your_user&password=your_password" \
--country-file /data/world.csv \
--parallel 8 \
--overwrite
The parameters --parallel, --country-file, --changeset-db and --overwrite are optional. Find more detailed information on usage in docs/CLI.md. To see all available parameters, call the tool with the --help parameter.
When using a history PBF file, the output files are split into history and latest contributions.
All contributions which are a) not deleted and b) visible in OSM at the timestamp of the extract are considered latest.
The remaining contributions, e.g. deleted elements or old versions, are considered history.
The number of threads (--parallel parameter) defines the number of files which will be created.
/data/ohsome-planet/berlin
└── contributions
├── history
│ ├── node-0-history.parquet
│ ├── ...
│ ├── way-0-history.parquet
│ ├── ...
│ ├── relation-0-history.parquet
│ └── ...
└── latest
├── node-0-latest.parquet
├── ...
├── way-0-latest.parquet
├── ...
├── relation-0-latest.parquet
└── ...
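A quick way to check this split is to count the rows in both folders with DuckDB. The paths below are assumed to be relative to the --data directory used above.
-- compare the size of the latest snapshot with the full history
SELECT 'latest' AS part, count(*) AS contributions
FROM read_parquet('contributions/latest/*.parquet')
UNION ALL
SELECT 'history' AS part, count(*) AS contributions
FROM read_parquet('contributions/history/*.parquet');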
Import an OSM changesets .bz2 file into PostgreSQL.
First, create an empty PostgreSQL database with the PostGIS extension or provide a connection to an existing database. For instance, you can set it up like this.
export OHSOME_PLANET_DB_USER=your_user
export OHSOME_PLANET_DB_PASSWORD=your_password
docker run -d \
--name ohsome_planet_changeset_db \
-e POSTGRES_PASSWORD=$OHSOME_PLANET_DB_PASSWORD \
-e POSTGRES_USER=$OHSOME_PLANET_DB_USER \
-p 5432:5432 \
postgis/postgis
Second, download the full changeset file from the OSM planet server. If you want to clip the extent to a smaller region, you can use the changeset-filter command of the osmium tool. This might take a few minutes. Currently, there is no provider of pre-processed or regional changeset file extracts.
osmium changeset-filter \
--bbox=8.319,48.962,8.475,49.037 \
changesets-latest.osm.bz2 \
changesets-latest-karlsruhe.osm.bz2
Then, process the OSM changesets .bz2 file as in the following example.
java -jar ohsome-planet-cli/target/ohsome-planet.jar changesets \
--bz2 data/changesets-latest-karlsruhe.osm.bz2 \
--changeset-db "jdbc:postgresql://localhost:5432/postgres?user=your_user&password=your_password" \
--create-tables \
--overwrite
The parameters --create-tables and --overwrite are optional. Find more detailed information on usage in docs/CLI.md. To see all available parameters, call the tool with the --help parameter.
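To sanity-check the import, you can query the changeset database afterwards. Note that the table name used below is only an assumption for illustration; see docs/CLI.md for the actual table layout.
-- hypothetical check: count imported changesets
-- (the table name "changesets" is an assumption, not confirmed by this README)
SELECT count(*) AS imported_changesets
FROM changesets;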
Transform OSM replication .osc files into Parquet format and keep the changeset PostgreSQL database up to date.
The ohsome-planet tool can also be used to generate updates from the replication files provided by the OSM planet server. Geofabrik also provides updates for regional extracts.
If you want to update both datasets your command should look like this:
java -jar ohsome-planet-cli/target/ohsome-planet.jar replications \
--data path/to/data \
--changeset-db "jdbc:postgresql://localhost:5432/postgres?user=your_user&password=your_password" \
--parallel 8 \
--country-file data/world.csv \
--parquet-data path/to/parquet/output/ \
--continue
Just like for the contributions command, you can use the optional --parallel, --country-file and --parquet-data parameters here as well.
The optional --continue flag can be used to make the update process run as a continuous service, which will wait and fetch new changes from the OSM planet server.
If you want to only update changesets you can use the --just-changesets flag. You can do the same for contributions with --just-contributions.
Find more detailed information on usage in docs/CLI.md. To see all available parameters, call the tool with the --help parameter.
Contributions will be written as Parquet files matching those found in the replication source.
This mimics the structure of the OSM Planet Server.
You can use the top level state files (state.txt or state.csv) to find the most recent sequence number.
/data/ohsome-planet/berlin
└── updates
├── 006
│ ├── 942
│ │ ├── 650.opc.parquet
│ │ ├── 650.state.txt
│ │ ├── ...
│ │ ├── 001.opc.parquet
│ │ └── 001.state.txt
│ ├── 941
│ ├── ...
│ └── 001
├── state.csv
└── state.txt
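To see how far your local copy has progressed, you can read the state files or the update Parquet files with DuckDB. The queries below assume the .opc.parquet update files share the contribution schema shown in the next section.
-- show the top-level replication state (column names depend on the file contents)
SELECT * FROM read_csv_auto('updates/state.csv');
-- newest contribution timestamp across all downloaded update files
SELECT max(valid_from) AS latest_edit
FROM read_parquet('updates/*/*/*.opc.parquet');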
You can inspect your results easily using DuckDB. Take a look at our collection of useful queries to find many analysis examples.
-- list all columns
DESCRIBE FROM read_parquet('contributions/*/*.parquet');
-- result
┌───────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ status │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ valid_from │ TIMESTAMP WITH TIME ZONE │ YES │ NULL │ NULL │ NULL │
│ valid_to │ TIMESTAMP WITH TIME ZONE │ YES │ NULL │ NULL │ NULL │
│ osm_type │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ osm_id │ BIGINT │ YES │ NULL │ NULL │ NULL │
│ osm_version │ INTEGER │ YES │ NULL │ NULL │ NULL │
│ osm_minor_version │ INTEGER │ YES │ NULL │ NULL │ NULL │
│ osm_edits │ INTEGER │ YES │ NULL │ NULL │ NULL │
│ osm_last_edit │ TIMESTAMP WITH TIME ZONE │ YES │ NULL │ NULL │ NULL │
│ user │ STRUCT(id INTEGER, "name" VARCHAR) │ YES │ NULL │ NULL │ NULL │
│ tags │ MAP(VARCHAR, VARCHAR) │ YES │ NULL │ NULL │ NULL │
│ tags_before │ MAP(VARCHAR, VARCHAR) │ YES │ NULL │ NULL │ NULL │
│ changeset │ STRUCT(id BIGINT, created_at TIMESTAMP WITH TIME ZONE, closed_at TIMESTAMP WITH TIME ZONE, tags MAP(VARCHAR, VARCHAR), hashtags VARCHAR[], editor VARCHAR, numChanges INTEGER) │ YES │ NULL │ NULL │ NULL │
│ bbox │ STRUCT(xmin DOUBLE, ymin DOUBLE, xmax DOUBLE, ymax DOUBLE) │ YES │ NULL │ NULL │ NULL │
│ centroid │ STRUCT(x DOUBLE, y DOUBLE) │ YES │ NULL │ NULL │ NULL │
│ xzcode │ STRUCT("level" INTEGER, code BIGINT) │ YES │ NULL │ NULL │ NULL │
│ geometry_type │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ geometry │ GEOMETRY │ YES │ NULL │ NULL │ NULL │
│ area │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ area_delta │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ length │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ length_delta │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ contrib_type │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ refs_count │ INTEGER │ YES │ NULL │ NULL │ NULL │
│ refs │ BIGINT[] │ YES │ NULL │ NULL │ NULL │
│ members_count │ INTEGER │ YES │ NULL │ NULL │ NULL │
│ members │ STRUCT("type" VARCHAR, id BIGINT, "timestamp" TIMESTAMP WITH TIME ZONE, "role" VARCHAR, geometry_type VARCHAR, geometry BLOB)[] │ YES │ NULL │ NULL │ NULL │
│ countries │ VARCHAR[] │ YES │ NULL │ NULL │ NULL │
│ build_time │ BIGINT │ YES │ NULL │ NULL │ NULL │
├───────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────┤
│ 29 rows 6 columns │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
This is a list of resources that you might want to take a look at to get a better understanding of the core concepts used for this project. In general, you should gain some understanding of the raw OSM (history) data format and know how to build geometries from nodes, ways and relations. Furthermore, knowledge about (Geo)Parquet files is useful as well.
What is the OSM PBF File Format?
- https://wiki.openstreetmap.org/wiki/PBF_Format
- you can download history PBF files for smaller regions from Geofabrik
- full planet downloads: https://planet.openstreetmap.org/planet/full-history/
What is parquet?
- https://parquet.apache.org/docs/file-format/
- https://github.com/apache/parquet-java
- https://github.com/apache/parquet-format
What is RocksDB?
- RocksDB is a storage engine with key/value interface, where keys and values are arbitrary byte streams. It is a C++ library. It was developed at Facebook based on LevelDB and provides backwards-compatible support for LevelDB APIs.
- https://github.com/facebook/rocksdb/wiki
How to build OSM geometries (for multipolygons)?
- https://wiki.openstreetmap.org/wiki/Relation:multipolygon#Examples_in_XML
- https://osmcode.org/osm-testdata/
- https://github.com/GIScience/oshdb/blob/a196cc990a75fa35841ca0908f323c3c9fc06b9a/oshdb-util/src/main/java/org/heigit/ohsome/oshdb/util/geometry/OSHDBGeometryBuilderInternal.java#L469
- For relations that consist of more than 500 members we skip MultiPolygon geometry building and fall back to GeometryCollection. Check MEMBERS_THRESHOLD in ohsome-contributions/src/main/java/org/heigit/ohsome/contributions/contrib/ContributionGeometry.java.
- For contributions with status deleted we use the geometry of the previous version. This allows you to spatially filter for deleted elements as well, e.g. by bounding box (see the example query below). In OSM itself, deleted elements do not have any geometry.
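A sketch of such a filter in DuckDB, assuming the status column holds the value deleted for these contributions and using the bbox struct from the schema above (the coordinates are only an example):
-- deleted contributions inside a bounding box, located via the previous version's geometry
SELECT osm_type, osm_id, valid_from
FROM read_parquet('contributions/history/*.parquet')
WHERE status = 'deleted'            -- assumed status value, check your data
  AND bbox.xmin >= 13.0 AND bbox.xmax <= 13.8
  AND bbox.ymin >= 52.3 AND bbox.ymax <= 52.7
LIMIT 10;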