Binary Format Specification
The Raptor GTFS Pipeline converts GTFS data into a compact binary format optimized for the RAPTOR routing algorithm. The output consists of three binary files and a JSON manifest.
File Structure
output_directory/
├── routes.bin # Route data (v2 format)
├── stops.bin # Stop data (v2 format)
├── index.bin # Index data
├── manifest.json # Metadata and checksums
└── (optional) debug files if --debug-json true
routes.bin (v2 Format)
Header Structure
Magic: b"RRT2" (4 bytes)
Schema Version: uint16 (2 bytes) = 2
Route Count: uint32 (4 bytes)
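As a minimal sketch of how a reader might validate this header, the snippet below unpacks the three fields with Python's `struct` module. It assumes the fields are packed back to back with no padding and are little-endian (per the Technical Notes); the names `HEADER` and `read_routes_header` are illustrative and not part of the pipeline.

```python
import struct

ROUTES_MAGIC = b"RRT2"
# "<" selects little-endian with no padding: 4-byte magic, uint16, uint32.
HEADER = struct.Struct("<4sHI")


def read_routes_header(buf: bytes) -> tuple[int, int]:
    """Return (schema_version, route_count) from the first 10 bytes of buf."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer too short for routes.bin header")
    magic, version, route_count = HEADER.unpack_from(buf, 0)
    if magic != ROUTES_MAGIC:
        raise ValueError(f"unexpected magic: {magic!r}")
    return version, route_count


# Build a sample header in memory and parse it back.
sample = HEADER.pack(ROUTES_MAGIC, 2, 56)
print(read_routes_header(sample))
```

The same pattern should apply to stops.bin by swapping in its magic value.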
Route Data Structure
For each route:
Route ID: uint32 (4 bytes)
Name Length: uint16 (2 bytes)
Name: UTF-8 bytes (variable)
Stop Count: uint32 (4 bytes)
Trip Count: uint32 (4 bytes)
Stop IDs: stop_count × uint32 (4 bytes each)
Trip IDs: trip_count × uint32 (4 bytes each)
Flat Stop Times: (trip_count × stop_count) × int32 (4 bytes each, delta-encoded, row-major)
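The field list above can be read sequentially. The following sketch parses a single route record from a byte buffer; it assumes little-endian fields with no padding between them, and `read_route` is a hypothetical helper, not the pipeline's own reader. The stop times are returned exactly as stored, so they are still delta-encoded. The sample values (including the choice of seconds since midnight) are made up for illustration.

```python
import struct


def read_route(buf: bytes, offset: int = 0) -> tuple[dict, int]:
    """Parse one route record at `offset`; return (route, next_offset)."""
    route_id, name_len = struct.unpack_from("<IH", buf, offset)
    offset += 6
    name = buf[offset:offset + name_len].decode("utf-8")
    offset += name_len
    stop_count, trip_count = struct.unpack_from("<II", buf, offset)
    offset += 8
    stop_ids = list(struct.unpack_from(f"<{stop_count}I", buf, offset))
    offset += 4 * stop_count
    trip_ids = list(struct.unpack_from(f"<{trip_count}I", buf, offset))
    offset += 4 * trip_count
    n_times = trip_count * stop_count
    # Returned as stored: each trip row is still delta-encoded.
    raw_times = list(struct.unpack_from(f"<{n_times}i", buf, offset))
    offset += 4 * n_times
    route = {
        "id": route_id,
        "name": name,
        "stop_ids": stop_ids,
        "trip_ids": trip_ids,
        "raw_stop_times": raw_times,
    }
    return route, offset


# Hand-built sample record (illustrative values only).
name_bytes = "Bakerloo Line".encode("utf-8")
record = (
    struct.pack("<IH", 1, len(name_bytes)) + name_bytes
    + struct.pack("<II", 4, 2)
    + struct.pack("<4I", 101, 102, 103, 104)
    + struct.pack("<2I", 1001, 1002)
    + struct.pack("<8i", 28800, 300, 300, 300, 30600, 300, 300, 300)
)
route, end = read_route(record)
print(route["name"], route["stop_ids"], end == len(record))
```

Returning the next offset lets a caller loop over all records by feeding each result back in as the starting position.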
Encoding Details
- Trips are pre-sorted by departure time at first stop (ascending order)
- Delta encoding: Per trip row, first value is absolute, subsequent values are deltas
- Row-major order: All stop times for trip 1, then trip 2, etc.
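To illustrate the decoding step, here is a small sketch that turns a flat, delta-encoded array back into per-trip absolute times. It assumes each delta is relative to the previous stop in the same trip (the usual meaning of delta encoding) and uses seconds since midnight purely as an example unit, since the unit is not stated here; `decode_stop_times` is a hypothetical name.

```python
from itertools import accumulate


def decode_stop_times(flat: list[int], trip_count: int, stop_count: int) -> list[list[int]]:
    """Split a row-major flat array into trip rows and undo the delta encoding."""
    if len(flat) != trip_count * stop_count:
        raise ValueError("flat array length does not match trip_count * stop_count")
    rows = []
    for t in range(trip_count):
        row = flat[t * stop_count:(t + 1) * stop_count]
        # Running sum: first value is absolute, the rest are deltas.
        rows.append(list(accumulate(row)))
    return rows


# Two trips over four stops: 08:00 and 08:30 departures, 5 minutes between stops.
flat = [28800, 300, 300, 300, 30600, 300, 300, 300]
print(decode_stop_times(flat, trip_count=2, stop_count=4))
# [[28800, 29100, 29400, 29700], [30600, 30900, 31200, 31500]]
```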
Example Structure
# Conceptual representation
Route1 = {
"id": 1,
"name": "Bakerloo Line",
"stop_ids": [101, 102, 103, 104],
"trip_ids": [1001, 1002],
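"stop_times": [
# Decoded (absolute) times, flattened in row-major order:
# Trip 1001: [08:00, 08:05, 08:10, 08:15]
# Trip 1002: [08:30, 08:35, 08:40, 08:45]
08:00, 08:05, 08:10, 08:15, 08:30, 08:35, 08:40, 08:45
# On disk, only the first value of each trip row is absolute; the rest are deltas
]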
}
stops.bin (v2 Format)
Header Structure
Magic: b"RST2" (4 bytes)
Schema Version: uint16 (2 bytes) = 2
Stop Count: uint32 (4 bytes)
Stop Data Structure
For each stop:
Stop ID: uint32 (4 bytes)
Name Length: uint16 (2 bytes)
Name: UTF-8 bytes (variable)
Latitude: float64 (8 bytes)
Longitude: float64 (8 bytes)
Route Reference Count: uint32 (4 bytes)
Route IDs: route_ref_count × uint32 (4 bytes each)
Transfer Count: uint32 (4 bytes)
Transfers: transfer_count × {
Target Stop ID: uint32 (4 bytes)
Walk Time: int32 (4 bytes)
}
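A stop record can be read the same way as a route record. The sketch below is one possible reader, assuming little-endian fields with no padding and that each transfer is an 8-byte (uint32, int32) pair; `read_stop` is an illustrative name, and the sample values mirror the conceptual example in this section.

```python
import struct


def read_stop(buf: bytes, offset: int = 0) -> tuple[dict, int]:
    """Parse one stop record at `offset`; return (stop, next_offset)."""
    stop_id, name_len = struct.unpack_from("<IH", buf, offset)
    offset += 6
    name = buf[offset:offset + name_len].decode("utf-8")
    offset += name_len
    lat, lon, route_ref_count = struct.unpack_from("<ddI", buf, offset)
    offset += 20
    route_ids = list(struct.unpack_from(f"<{route_ref_count}I", buf, offset))
    offset += 4 * route_ref_count
    (transfer_count,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    transfers = []
    for _ in range(transfer_count):
        target, walk_time = struct.unpack_from("<Ii", buf, offset)
        offset += 8
        transfers.append({"target_stop_id": target, "walk_time": walk_time})
    stop = {
        "id": stop_id,
        "name": name,
        "lat": lat,
        "lon": lon,
        "route_ids": route_ids,
        "transfers": transfers,
    }
    return stop, offset


# Hand-built sample record (illustrative values only).
name_bytes = "Victoria Station".encode("utf-8")
record = (
    struct.pack("<IH", 101, len(name_bytes)) + name_bytes
    + struct.pack("<ddI", 51.4967, -0.1433, 3)
    + struct.pack("<3I", 1, 2, 3)
    + struct.pack("<I", 2)
    + struct.pack("<Ii", 102, 120)
    + struct.pack("<Ii", 103, 180)
)
stop, end = read_stop(record)
print(stop["name"], stop["route_ids"], stop["transfers"])
```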
Example Structure
# Conceptual representation
Stop1 = {
"id": 101,
"name": "Victoria Station",
"lat": 51.4967,
"lon": -0.1433,
"route_ids": [1, 2, 3], # Routes serving this stop
"transfers": [
{"target_stop_id": 102, "walk_time": 120}, # 2 minutes
{"target_stop_id": 103, "walk_time": 180} # 3 minutes
]
}
index.bin Format
Header Structure
Magic: b"RIDX" (4 bytes)
Schema Version: uint16 (2 bytes)
Index Data Structure
# Stop-to-Routes Index
Pairs Count: uint32 (4 bytes)
For each pair:
Stop ID: uint32 (4 bytes)
Route Count: uint32 (4 bytes)
Route IDs: route_count × uint32 (4 bytes each)
# Route Offsets
Count: uint32 (4 bytes)
For each route:
Route ID: uint32 (4 bytes)
Offset: uint64 (8 bytes)
# Stop Offsets
Count: uint32 (4 bytes)
For each stop:
Stop ID: uint32 (4 bytes)
Offset: uint64 (8 bytes)
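The three index sections can be loaded into dictionaries for constant-time lookup. The sketch below assumes the sections appear in the order listed, directly after the 6-byte header, with little-endian fields and no padding. `read_index` is a hypothetical helper, and the sample schema version and offsets are made-up values.

```python
import struct


def read_index(buf: bytes) -> dict:
    """Parse index.bin contents into three dictionaries."""
    magic, version = struct.unpack_from("<4sH", buf, 0)
    if magic != b"RIDX":
        raise ValueError(f"unexpected magic: {magic!r}")
    pos = 6

    # Stop-to-routes index: stop ID -> list of route IDs.
    (pairs,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    stop_routes = {}
    for _ in range(pairs):
        stop_id, n = struct.unpack_from("<II", buf, pos)
        pos += 8
        stop_routes[stop_id] = list(struct.unpack_from(f"<{n}I", buf, pos))
        pos += 4 * n

    def read_offsets(start: int) -> tuple[dict, int]:
        """Read a count followed by (uint32 ID, uint64 offset) entries."""
        (count,) = struct.unpack_from("<I", buf, start)
        start += 4
        table = {}
        for _ in range(count):
            key, off = struct.unpack_from("<IQ", buf, start)
            start += 12
            table[key] = off
        return table, start

    route_offsets, pos = read_offsets(pos)
    stop_offsets, pos = read_offsets(pos)
    return {
        "version": version,
        "stop_routes": stop_routes,
        "route_offsets": route_offsets,
        "stop_offsets": stop_offsets,
    }


# Hand-built sample: one stop served by three routes, one offset per table.
sample = (
    struct.pack("<4sH", b"RIDX", 2)
    + struct.pack("<I", 1) + struct.pack("<II3I", 101, 3, 1, 2, 3)
    + struct.pack("<I", 1) + struct.pack("<IQ", 1, 10)
    + struct.pack("<I", 1) + struct.pack("<IQ", 101, 10)
)
index = read_index(sample)
print(index["stop_routes"], index["route_offsets"], index["stop_offsets"])
```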
manifest.json
The manifest file contains metadata, checksums, and statistics about the conversion process.
Example manifest.json
{
"schema_version": 2,
"tool_version": "0.1.0",
"created_at": "2024-12-06T14:30:00.123456",
"inputs": {
"gtfs_path": "england_gtfs.zip",
"gtfs_stats": {
"stops": 1234,
"routes": 56,
"trips": 7890,
"stop_times": 123456
}
},
"outputs": {
"routes.bin": {
"sha256": "a1b2c3...",
"size": 1234567
},
"stops.bin": {
"sha256": "d4e5f6...",
"size": 890123
},
"index.bin": {
"sha256": "789abc...",
"size": 456789
}
},
"stats": {
"stops": 1234,
"routes": 56,
"trips": 7890,
"stop_times": 123456,
"transfers": 4567
},
"build": {
"python": "3.11.0",
"platform": "Linux-5.15.0-x86_64",
"timestamp": 1701878600.123456
},
"config": {
"compression": true,
"split_by_periods": false,
"gen_transfers": false,
"speed_walk": 1.33,
"transfer_cutoff": 500
}
}
Binary Format Benefits
Size Efficiency
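- Compact representation: Fixed-width integers and length-prefixed strings avoid the field names and text-encoded numbers that inflate JSON
- Delta encoding: Stop times after the first in each trip row are stored as small differences rather than full absolute values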
- Efficient indexing: Optimized for RAPTOR algorithm access patterns
Performance
- Fast loading: Binary data loads quickly into memory
- Cache-friendly: Data structures designed for CPU cache efficiency
- Direct access: index.bin records an offset for every route and stop ID, so a reader that loads these tables into a hash map can locate a record in O(1)
Data Integrity
- Checksums: SHA-256 hashes for all binary files
- Versioning: Schema version tracking
- Metadata: Complete conversion history and statistics
Reading Binary Files
The binary format is designed for programmatic access by routing algorithms, but you can also generate JSON debug files alongside the binary output to inspect the data:
# Convert to JSON for inspection
python -m raptor_pipeline.cli convert \
--input path/to/gtfs \
--output ./inspect_data \
--format both \
--debug-json true
# This will generate both binary and JSON files for comparison
Format Evolution
The current format is version 2. Key improvements over version 1:
- Better compression: Enhanced delta encoding
- Improved indexing: More efficient route and stop lookups
- Enhanced metadata: Detailed statistics and checksums
- Transfer support: Built-in transfer information
Technical Notes
Endianness
All integers use little-endian byte order, the native order on x86/x64 architectures.
String Encoding
All text fields use UTF-8 encoding.
Floating Point
Coordinates use IEEE 754 double precision (64-bit) floating point format.
File Integrity
The manifest.json file contains SHA-256 checksums for verifying file integrity after transfer or storage.
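A verification pass could look like the sketch below. It assumes each `sha256` value in the manifest's `outputs` section is the hex digest of the whole file and that `size` is the file length in bytes; `entry_matches` and `verify_outputs` are illustrative names, not part of the pipeline.

```python
import hashlib
import json
from pathlib import Path


def entry_matches(data: bytes, entry: dict) -> bool:
    """Check file contents against one manifest 'outputs' entry."""
    digest = hashlib.sha256(data).hexdigest()
    return len(data) == entry["size"] and digest == entry["sha256"].lower()


def verify_outputs(directory: str) -> dict[str, bool]:
    """Return {file name: True/False} for every file listed under 'outputs'."""
    root = Path(directory)
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))
    return {
        name: entry_matches((root / name).read_bytes(), entry)
        for name, entry in manifest["outputs"].items()
    }
```

Calling `verify_outputs("output_directory")` would then report which of routes.bin, stops.bin, and index.bin still match their recorded checksums.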