Transitland Datasets: File Formats and Data Dictionary

Data Files and Formats

Each Dataset package includes data in the following standardized formats:

  • stops.csv - Tabular data ready to view as a spreadsheet, or import into GIS using latitude and longitude columns
  • stops.geojsonl - Geospatial vector data ready to load into GIS or data-science tools
  • routes.csv - Tabular data ready to view as a spreadsheet (no geometries)
  • routes.geojsonl - Geospatial vector data including each route's shape

The GeoJSON Lines (GeoJSONL) format stores one feature per line, with features separated by line breaks. Unlike standard GeoJSON, which typically has to be loaded into memory in full, GeoJSONL can be parsed as a stream. Many GIS and data-science tools can work with GeoJSONL, since it is a variant of newline-delimited JSON. For more information, see Interline's blog posts about GeoJSONL.
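
As a rough illustration of the streaming property described above, the following Python sketch reads GeoJSONL one feature at a time using only the standard library. The helper name `iter_features` and the two sample features are made up for this example; they are not part of the dataset or of any Transitland tooling.

```python
import io
import json
from typing import Iterator, TextIO


def iter_features(lines: TextIO) -> Iterator[dict]:
    """Yield one GeoJSON feature per non-empty line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Tiny in-memory sample standing in for a real stops.geojsonl file.
sample = io.StringIO(
    '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-122.68, 45.52]}, '
    '"properties": {"stop_name": "Example Stop A"}}\n'
    '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-122.67, 45.53]}, '
    '"properties": {"stop_name": "Example Stop B"}}\n'
)

# Only one feature is held in memory at a time while iterating.
stop_names = [feature["properties"]["stop_name"] for feature in iter_features(sample)]
print(stop_names)
```

With a real file, the same generator can be passed an open file handle, for example `open("stops.geojsonl", encoding="utf-8")`.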

Data Dictionary

Each Dataset zip file includes:

  • Full data dictionary in human-readable Markdown format
  • Frictionless DataPackage v2 as machine-readable JSON
  • Complete field descriptions and examples

The latest data dictionary is also printed below for reference.

Route-Oriented Dataset Schema

The routes.csv file includes four main sets of columns and the routes.geojsonl file includes four main sets of matching properties:

  1. Route record and metadata - Basic route information
  2. Agency record and metadata - Agency serving the route
  3. Route headway data - Quantitative summary of route frequency
  4. Feed metadata - Source feed information

Route Record and Metadata

  • route_onestop_id - Transitland's unique identifier for the route. Can be used to construct a link to https://www.transit.land/routes/<route_onestop_id>
  • route_short_name - Rider-facing short name for the route
  • route_long_name - Rider-facing longer, descriptive name
  • route_desc - Rider-facing description
  • route_type - Vehicle type enum (see Route Types below)
  • route_id - GTFS ID used within the source feed. This will not be unique across other feeds.
  • route_color - Hex RGB value for map representation
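
A minimal Python sketch of how two of these fields might be used: building the Transitland link from route_onestop_id, and turning route_color into an RGB tuple for mapping. The helper names and the sample values are invented for illustration, and the sketch assumes route_color is a six-digit hex string that may or may not carry a leading "#".

```python
from urllib.parse import quote


def route_url(route_onestop_id: str) -> str:
    """Build the transit.land link for a route (hypothetical helper)."""
    return f"https://www.transit.land/routes/{quote(route_onestop_id, safe='')}"


def hex_to_rgb(route_color: str) -> tuple:
    """Convert a six-digit hex color such as 'FF6600' to an (r, g, b) tuple."""
    value = route_color.strip().lstrip("#")  # assumption: '#' prefix is optional
    if len(value) != 6:
        raise ValueError(f"expected 6 hex digits, got {route_color!r}")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


# Made-up identifier and color, for illustration only.
print(route_url("r-9q9-example"))
print(hex_to_rgb("FF6600"))
```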

Agency Information

  • agency_id - GTFS ID from source feed for the agency; may not be unique across different feeds
  • agency_name - Rider-facing agency name

Route Headway Data

Headway data is calculated for each route direction and day-of-week category (Monday-Friday, Saturday, Sunday). Headways are the number of seconds between departing trips within specific time windows, summarized as both a mean and a median.

Time windows:

  • 7am-9am (peak morning)
  • 9am-4pm (midday)
  • 4pm-6pm (peak evening)
  • 6pm-7am (overnight)

Columns include:

  • <prefix>_selected_service_date - Transitland's algorithm selected this as a representative service date. The following data is all calculated from this service day's schedules.
  • <prefix>_departure_times - Space-separated list of departure times
  • <prefix>_headway_<time>_mean - Average headway in seconds
  • <prefix>_headway_<time>_median - Median headway in seconds
  • <prefix>_selected_stop_id - GTFS ID from source feed of the stop used to calculate headways
  • <prefix>_selected_stop_intid - Internal stop ID within Transitland data; guaranteed to be globally unique
  • <prefix>_selected_stop_name - Rider-facing stop name

Prefixes:

  • hw_best - Most departures
  • hw_dow1_dir0/1 - Monday-Friday, direction 0/1
  • hw_dow6_dir0/1 - Saturday, direction 0/1
  • hw_dow7_dir0/1 - Sunday, direction 0/1
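
To make the relationship between the departure times and the headway columns concrete, here is a small Python sketch that derives mean and median headways from a space-separated list of departures. It is not Transitland's algorithm. It assumes times are formatted as HH:MM:SS and that a time window includes its start hour and excludes its end hour; the sample times are made up.

```python
from statistics import mean, median


def to_seconds(hms: str) -> int:
    """Convert 'HH:MM:SS' to seconds since midnight (assumed time format)."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def headways(departure_times: str, start_hour: int, end_hour: int) -> list:
    """Return gaps in seconds between consecutive departures within a window."""
    departures = sorted(to_seconds(t) for t in departure_times.split())
    in_window = [s for s in departures if start_hour * 3600 <= s < end_hour * 3600]
    return [later - earlier for earlier, later in zip(in_window, in_window[1:])]


# Made-up departures; the 09:15:00 trip falls outside the 7am-9am window.
sample = "07:00:00 07:10:00 07:20:00 07:40:00 08:30:00 09:15:00"
gaps = headways(sample, 7, 9)
print(gaps, mean(gaps), median(gaps))
```

In this sample the single long gap pulls the mean well above the median, which is one reason to compare both columns when judging how frequent a route feels to riders.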

Route Types

Route types are defined by the GTFS specification and extended by Transitland. The complete list also includes extended codes (100-1700+).

Standard GTFS route types:

  • 0: Tram, Streetcar, Light rail
  • 1: Subway, Metro
  • 2: Rail
  • 3: Bus
  • 4: Ferry
  • 5: Cable tram
  • 6: Aerial lift
  • 7: Funicular
  • 11: Trolleybus
  • 12: Monorail
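
When working with the CSV files, route_type values arrive as text. The Python sketch below maps them to the labels listed above. The dictionary and function names are illustrative, and any code outside the standard list is reported as extended or unknown rather than decoded.

```python
# Labels copied from the standard GTFS route types listed above.
ROUTE_TYPES = {
    0: "Tram, Streetcar, Light rail",
    1: "Subway, Metro",
    2: "Rail",
    3: "Bus",
    4: "Ferry",
    5: "Cable tram",
    6: "Aerial lift",
    7: "Funicular",
    11: "Trolleybus",
    12: "Monorail",
}


def route_type_label(route_type) -> str:
    """Return a readable label for a route_type given as int or numeric string."""
    code = int(route_type)
    return ROUTE_TYPES.get(code, f"Extended or unknown route type ({code})")


print(route_type_label("3"))
print(route_type_label(700))
```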

Stop-Oriented Dataset Schema

The stops.csv file includes five main sets of columns and the stops.geojsonl file includes five sets of properties on each feature:

  1. Stop record and metadata - Basic stop information
  2. Administrative boundaries - Geographic and political boundaries
  3. Route records and metadata - Up to 5 routes serving the stop
  4. Transit service summary - Frequency of arrivals/departures
  5. Feed metadata - Source feed information

Stop Record and Metadata

  • stop_onestop_id - Transitland's unique identifier for the stop location
  • stop_id - GTFS ID from source feed for the stop (may not be unique across different feeds)
  • stop_name - Name provided by transit operator
  • stop_desc - Optional description
  • stop_lon - Longitude coordinate
  • stop_lat - Latitude coordinate

Administrative Boundaries

  • adm0_name - Country name (e.g., "United States of America")
  • adm0_iso - Country ISO code following ISO 3166-1 standard (e.g., "US")
  • adm1_name - State/province name (e.g., "Oregon")
  • adm1_iso - State/province ISO code following ISO 3166-2 standard (e.g., "US-OR")

Route Records and Metadata

Each stop can be served by up to 5 different routes. For each route (1-5), the following information is provided:

  • agency_id_N - ID of the transit operator
  • agency_name_N - Name of the transit operator
  • route_id_N - ID of the route
  • route_short_name_N - Short name of the route
  • route_long_name_N - Long name of the route
  • route_type_N - Vehicle type of the route (see Route Types above)
  • route_color_N - Hex RGB value for map representation
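
Because the routes are spread across numbered columns, it is often convenient to gather them back into a list. The Python sketch below does this for one row read from stops.csv. It assumes that N in each column name is the literal digit 1 through 5 and that unused slots are left blank; the helper name and the sample row are made up.

```python
ROUTE_FIELDS = (
    "agency_id",
    "agency_name",
    "route_id",
    "route_short_name",
    "route_long_name",
    "route_type",
    "route_color",
)


def routes_for_stop(row: dict) -> list:
    """Collect the populated route slots (1-5) of a stop row into a list of dicts."""
    routes = []
    for n in range(1, 6):
        # Missing or empty columns are normalized to an empty string.
        route = {field: (row.get(f"{field}_{n}") or "") for field in ROUTE_FIELDS}
        if route["route_id"]:  # assumption: blank route_id means an unused slot
            routes.append(route)
    return routes


# Made-up row with two populated route slots.
sample_row = {
    "stop_name": "Example Stop",
    "route_id_1": "10", "route_short_name_1": "10", "route_type_1": "3",
    "route_id_2": "22", "route_short_name_2": "22", "route_type_2": "3",
    "route_id_3": "",
}
served = routes_for_stop(sample_row)
print([route["route_short_name"] for route in served])
```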

Transit Service Summary

Service frequency is calculated from scheduled departures for a designated "target week" near the dataset delivery date. The summary includes:

  • departure_count_dowN - Total trips departing on day N (both directions)
  • departure_count_dowN_dir0 - Trips departing on day N (inbound)
  • departure_count_dowN_dir1 - Trips departing on day N (outbound)

Where N ranges from 1 (Monday) to 7 (Sunday).
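
As one way to use these counts, the following Python sketch totals weekday and weekend departures for a single stop row. It assumes the column names follow the pattern above with N replaced by a digit and that missing or blank values count as zero; the function name and sample numbers are illustrative.

```python
def weekly_departures(row: dict) -> dict:
    """Sum departure_count_dowN into weekday (1-5), weekend (6-7), and total."""
    def count(n: int) -> int:
        # Blank or missing cells are treated as zero (assumption).
        return int(row.get(f"departure_count_dow{n}") or 0)

    weekday = sum(count(n) for n in range(1, 6))
    weekend = sum(count(n) for n in (6, 7))
    return {"weekday": weekday, "weekend": weekend, "total": weekday + weekend}


# Made-up counts: 100 departures each weekday, 60 on Saturday, 40 on Sunday.
sample_row = {f"departure_count_dow{n}": "100" for n in range(1, 6)}
sample_row["departure_count_dow6"] = "60"
sample_row["departure_count_dow7"] = "40"

summary = weekly_departures(sample_row)
print(summary)
```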

Feed Metadata

Each stop or route record includes the following columns/properties, which describe the lineage of the source data:

  • feed_id - Transitland's Onestop ID for the source feed. Can be used to construct links to https://www.transit.land/feeds/<feed_id>
  • feed_version_sha1 - Feed version identifier. Can be used to construct links to https://www.transit.land/feed-versions/<feed_version_sha1>
  • feed_version_fetched_at - Timestamp for when Transitland fetched the feed version from which this stop or route came. Can be used to evaluate data freshness.
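
A short Python sketch of a freshness check based on feed_version_fetched_at. The timestamp format is not specified here, so the sketch assumes an ISO 8601 string and treats values without a timezone as UTC; the function names and the sample timestamp are made up.

```python
from datetime import datetime, timezone


def parse_fetched_at(value: str) -> datetime:
    """Parse an ISO 8601 timestamp (assumed format) into an aware datetime."""
    # Replace a trailing 'Z' so older Python versions can parse the value.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed


def age_in_days(fetched_at: str, as_of: datetime) -> int:
    """Return the number of whole days between the fetch time and as_of."""
    return (as_of - parse_fetched_at(fetched_at)).days


# Made-up timestamp and reference date.
as_of = datetime(2024, 6, 1, tzinfo=timezone.utc)
age = age_in_days("2024-05-01T12:00:00Z", as_of)
print(age)
```

A fixed `as_of` keeps the example reproducible; in practice `datetime.now(timezone.utc)` would be the usual reference point.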