pbfcut Tutorial: Step-by-Step Setup and Usage Tips
pbfcut is a command-line utility (or a library, depending on your environment) designed to efficiently extract, filter, and split Protocolbuffer Binary Format (PBF) files, commonly used for OpenStreetMap (OSM) data, into smaller, more manageable sections. This tutorial walks through installation, typical workflows, advanced options, performance tips, and troubleshooting so you can integrate pbfcut into data pipelines or use it for offline map processing.
What is a PBF file and why use pbfcut?
A PBF file stores OSM data (nodes, ways, relations) in a compressed binary format. PBF is preferred over plain XML for large datasets because it’s smaller and faster to parse. However, individual PBF files, such as planet.osm.pbf or country extracts, can still be gigabytes in size. pbfcut helps by:
- extracting specific geographic regions or object types,
- splitting large PBFs into smaller tile-based or size-limited chunks,
- applying simple filters to reduce downstream processing time.
Use pbfcut when you need to reduce dataset size, extract subregions, or prepare inputs for tools like osm2pgsql, imposm, or custom OSM processors.
Installing pbfcut
Installation steps vary by platform and by whether pbfcut is provided as a standalone binary, part of a toolkit, or a Python/Node library. Below are common installation patterns.
- Prebuilt binary (Linux/macOS)
- Download the release for your OS from the project’s GitHub releases.
- Make it executable and move it to your PATH:
chmod +x pbfcut
sudo mv pbfcut /usr/local/bin/
- Homebrew (macOS / Linuxbrew)
- If available via a tap:
brew install <tap>/pbfcut
- From source (C++/Go/Rust projects)
- Clone the repo, then build (example for Go):
git clone https://github.com/example/pbfcut.git
cd pbfcut
go build ./cmd/pbfcut
- Python package (if pbfcut exposes CLI via Python)
- Via pip:
pip install pbfcut
- Then verify:
pbfcut --version
If the project you’re using differs, consult its README for exact steps.
Basic usage patterns
Below are common command-line patterns. Replace filenames and coordinates with your own.
- Extract a bounding box
pbfcut extract --bbox minLon,minLat,maxLon,maxLat input.osm.pbf -o output.osm.pbf
This keeps nodes/ways/relations intersecting the bbox.
- Split by tile grid or size
pbfcut split --tile 0.25 input.osm.pbf -o outdir/
pbfcut split --size 500M input.osm.pbf -o outdir/
- Filter by object type or tag
pbfcut filter --type node,way --tag amenity=school input.osm.pbf -o schools.osm.pbf
- Convert to other formats (if supported)
pbfcut convert --format osmxml input.osm.pbf -o output.osm.xml
Always run with --help to see tool-specific flags:
pbfcut --help
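Because flags differ between implementations, a script can check the help text before relying on an option. The sketch below assumes only that the tool lists its flags when run with --help. The helper name has_flag is hypothetical, and --threads is just one of the example flags mentioned in this tutorial, not a confirmed pbfcut option.

```shell
#!/bin/sh
# has_flag TOOL FLAG
# Succeeds if FLAG appears as a whole word in the output of "TOOL --help".
# TOOL is a parameter, so the same check works for any CLI, not only pbfcut.
has_flag() {
    "$1" --help 2>&1 | grep -qwF -- "$2"
}

# Usage (hypothetical flag; only pass it when the installed build lists it):
#   if has_flag pbfcut --threads; then
#       echo "this build advertises --threads"
#   fi
```

Matching whole words keeps a query for --thread from succeeding just because --threads is listed.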
Example: Extracting a city from a country PBF
- Identify city bounding box (use OpenStreetMap, Geofabrik extracts, or a gazetteer).
- Run extract:
pbfcut extract --bbox -0.489,51.28,0.236,51.686 great-britain-latest.osm.pbf -o london.osm.pbf
- Optionally filter to relevant features (for example, roads and buildings):
pbfcut filter --tag 'highway=*' --tag 'building=*' london.osm.pbf -o london_roads_buildings.osm.pbf
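A swapped or truncated bounding box is an easy mistake to make, and on a large input it can cost a long run before anyone notices. The following is a minimal sketch of a sanity check, assuming the minLon,minLat,maxLon,maxLat order shown earlier. The function name valid_bbox is hypothetical and not part of pbfcut.

```shell
#!/bin/sh
# valid_bbox "minLon,minLat,maxLon,maxLat"
# Succeeds only if the string has four numeric fields, longitudes lie in
# [-180, 180], latitudes lie in [-90, 90], and each min is below its max.
# Boxes that cross the antimeridian (minLon > maxLon) are rejected.
valid_bbox() {
    printf '%s\n' "$1" | awk -F, '
        function isnum(s) { return s ~ /^-?[0-9]+(\.[0-9]+)?$/ }
        {
            ok = (NF == 4 && isnum($1) && isnum($2) && isnum($3) && isnum($4))
            minlon = $1 + 0; minlat = $2 + 0; maxlon = $3 + 0; maxlat = $4 + 0
            ok = ok && minlon >= -180 && maxlon <= 180 && minlat >= -90 && maxlat <= 90
            ok = ok && minlon < maxlon && minlat < maxlat
            exit(ok ? 0 : 1)
        }'
}

# Usage, with the London bounding box from this section:
#   bbox="-0.489,51.28,0.236,51.686"
#   valid_bbox "$bbox" || echo "bad bbox: $bbox" >&2
```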
Advanced options and tips
- Keep node/way/relation consistency: When filtering ways or relations, ensure referenced nodes are kept. Use flags like --keep-referenced or --complete-ways, depending on the implementation.
- Use streaming to avoid high memory use: Prefer streamed processing for very large PBFs.
- Parallel processing: If supported, enable multiple worker threads to speed parsing and writing (e.g., --threads 4).
- Preserve metadata: If you need timestamps, changeset IDs, or user info, enable --preserve-metadata.
- Limit by tags using boolean logic: Some versions support complex tag expressions (e.g., '(amenity=school or amenity=university) and building=*').
Performance considerations
- I/O fast path: Place input and output on SSDs and use large buffer sizes.
- Memory: Monitor RAM use; if you run out, prefer a streaming mode, and add swap only as a last resort.
- CPU: Use multiple cores if the tool supports threading.
- Temporary files: Clean up intermediate files, or place them on tmpfs, to improve speed.
Integrating pbfcut into pipelines
- With osm2pgsql: Extract the region first, then import the smaller PBF to reduce import time.
- With vector tile generation: Split input by tile and run tile generator per tile in parallel.
- With CI/CD: Use pbfcut in build steps to produce lightweight test fixtures.
Example bash snippet to split and process tiles in parallel:
pbfcut split --tile 0.5 input.osm.pbf -o tiles/
find tiles/ -name '*.pbf' -print0 | xargs -0 -n1 -P8 tile_processor
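When choosing a tile size for a split like this, it helps to estimate how many tiles a region will produce. The sketch below assumes that --tile means square tiles measured in degrees and that the grid starts at the bounding box's south-west corner. Neither assumption is confirmed for any particular pbfcut build, and tile_count is a hypothetical helper.

```shell
#!/bin/sh
# tile_count "minLon,minLat,maxLon,maxLat" TILE_DEGREES
# Prints how many square tiles of TILE_DEGREES per side cover the bbox,
# assuming the grid is anchored at the bbox's south-west corner.
tile_count() {
    printf '%s\n' "$1" | awk -F, -v size="$2" '
        # Round up, with a small tolerance for floating-point noise.
        function ceil(x,    i) { x -= 1e-9; i = int(x); return (x > i) ? i + 1 : i }
        {
            cols = ceil(($3 - $1) / size)
            rows = ceil(($4 - $2) / size)
            print cols * rows
        }'
}

# Usage:
#   tile_count "-0.489,51.28,0.236,51.686" 0.5
```

A grid aligned to global multiples of the tile size can produce an extra row or column, so treat the result as a lower bound when sizing the worker count.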
Troubleshooting
- Missing ways/nodes after filtering: enable options that keep referenced nodes/ways or run a repair step to rebuild topology.
- Corrupt output: verify input integrity (use osmium or osmosis to check) and ensure pbfcut is up-to-date.
- Slow performance: check disk I/O, ensure no swap thrashing, enable threading, or split input before processing.
- Permission issues: ensure executable permissions and write access to output directories.
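Several of these problems can be caught before a long run starts. The following sketch checks only the file-system conditions listed above (a readable, non-empty input and a writable output directory). The function name preflight is hypothetical, and it does not inspect the PBF contents.

```shell
#!/bin/sh
# preflight INPUT OUTDIR
# Prints one line per problem to stderr; succeeds only when none are found.
preflight() {
    rc=0
    if [ ! -r "$1" ]; then
        echo "cannot read input: $1" >&2
        rc=1
    elif [ ! -s "$1" ]; then
        echo "input is empty: $1" >&2
        rc=1
    fi
    if [ ! -d "$2" ] || [ ! -w "$2" ]; then
        echo "output directory missing or not writable: $2" >&2
        rc=1
    fi
    return "$rc"
}

# Usage:
#   preflight input.osm.pbf outdir/ || exit 1
```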
Alternatives and complementary tools
- osmium-tool: robust toolkit for PBF manipulation with many features.
- osmosis: long-standing Java-based tool for OSM data processing.
- osmconvert/osmfilter: lightweight filtering and conversion utilities.
- imposm/osm2pgsql: importers that load OSM data into PostGIS databases, typically to feed tile renderers.
Tool | Strengths | When to use |
---|---|---|
pbfcut | Simple extraction and splitting, easy CLI | Quick region splits or tag-based slices |
osmium-tool | Powerful and fast, many operations | Complex transformations and repairs |
osmosis | Flexible pipelines, many plugins | Legacy workflows and Java environments |
osmconvert | Very fast for simple conversions | Quick format changes and basic filters |
Example workflows
- Create a small development dataset:
- Extract bbox -> filter tags -> split into small files for CI.
- Prepare tiles for vector tile renderer:
- Split by tile -> process each tile in parallel -> feed into tile pipeline.
- Reduce planet file to country:
- Extract country polygon -> filter for features of interest -> import to PostGIS.
Final notes
- Read the pbfcut project’s README for exact flags and behavior; implementations vary.
- Test on a small sample before running on large PBFs.
- Keep backups of original files; extraction and filtering discard data, so the output cannot be turned back into the full dataset.
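One way to confirm that a processing step left the source file alone is to compare checksums before and after it runs. This is a minimal sketch using the POSIX cksum utility; unchanged_after is a hypothetical wrapper, and the wrapped command can be any tool.

```shell
#!/bin/sh
# unchanged_after FILE COMMAND [ARGS...]
# Runs COMMAND, then fails if FILE's checksum differs from before the run.
# Otherwise it returns COMMAND's own exit status.
unchanged_after() {
    file=$1
    shift
    before=$(cksum < "$file") || return 1
    rc=0
    "$@" || rc=$?
    after=$(cksum < "$file") || return 1
    if [ "$before" != "$after" ]; then
        echo "warning: $file was modified" >&2
        return 1
    fi
    return "$rc"
}

# Usage:
#   unchanged_after input.osm.pbf cp input.osm.pbf sample-copy.osm.pbf
```

cksum is a fast CRC, not a cryptographic hash, which is enough to detect accidental overwrites.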