From Raw GPS to GTFS: A Real-World Open Dataset for Bus Travel Time Prediction


Mansurova A., Mussina A., Aubakirov S., Nugumanova A., Yedilkhan D.
August 2025, Multidisciplinary Digital Publishing Institute (MDPI)

Data
2025, Volume 10, Issue 8

This data descriptor introduces an open, high-resolution dataset of real-world bus operations in Astana, Kazakhstan, captured from GPS trajectories between July and September 2024. The data cover three high-frequency routes and have been processed into GTFS format, enabling direct use with existing transit modeling tools. Unlike typical static GTFS feeds, this dataset provides empirically observed dwell times, run times, and travel times, offering a detailed snapshot of operational variability in urban bus systems. The dataset supports applications in machine learning–based travel time prediction, timetable optimization, and transit reliability analysis, especially in settings where live feeds are unavailable. By releasing this dataset publicly, we aim to promote transparent, data-driven transport research in emerging urban contexts. Dataset: https://doi.org/10.5281/zenodo.15769359. Dataset License: Creative Commons Attribution 4.0 International.
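As a rough illustration of the three quantities named in the abstract, the sketch below derives dwell and run times from records shaped like a standard GTFS stop_times.txt file. The column names come from the GTFS Schedule specification. The sample rows, the function names, and the definitions used here (dwell as departure minus arrival at a stop, run time as arrival at the next stop minus departure from the previous one) are assumptions for illustration, not the dataset's documented layout or the authors' method.

```python
import csv
import io

# Hypothetical sample in the standard GTFS stop_times.txt layout.
SAMPLE = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:30,A,1
T1,08:04:10,08:04:40,B,2
T1,08:09:00,08:09:20,C,3
"""


def parse_gtfs_time(value: str) -> int:
    """Convert a GTFS HH:MM:SS string to seconds (hours may exceed 23)."""
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def dwell_and_run_times(stop_times_csv: str):
    """Return (dwell, run) lists of per-stop and per-segment durations in seconds."""
    rows = list(csv.DictReader(io.StringIO(stop_times_csv)))
    # Order stops within each trip by their sequence number.
    rows.sort(key=lambda r: (r["trip_id"], int(r["stop_sequence"])))

    dwell, run = [], []
    previous = None
    for row in rows:
        arrival = parse_gtfs_time(row["arrival_time"])
        departure = parse_gtfs_time(row["departure_time"])
        # Assumed definition: dwell = time spent standing at the stop.
        dwell.append((row["trip_id"], row["stop_id"], departure - arrival))
        # Assumed definition: run = time in motion between consecutive stops.
        if previous is not None and previous["trip_id"] == row["trip_id"]:
            prev_departure = parse_gtfs_time(previous["departure_time"])
            run.append((row["trip_id"], previous["stop_id"], row["stop_id"],
                        arrival - prev_departure))
        previous = row
    return dwell, run


if __name__ == "__main__":
    dwell, run = dwell_and_run_times(SAMPLE)
    print(dwell)
    print(run)
```

Under these definitions, the travel time between two stops on a trip is the sum of the run times between them plus the dwell times at the intermediate stops.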

Keywords: bus operations, bus travel time prediction, GPS, GTFS, open data, public transportation, smart city, transit analytics


Big Data and Blockchain Technologies Research and Innovation Center, Astana IT University, Astana, 020000, Kazakhstan
Department of Computer Science, Al-Farabi Kazakh National University, 71 al-Farabi Avenue, Almaty, 050040, Kazakhstan
Smart City Research and Innovation Center, Astana IT University, Astana, 020000, Kazakhstan

