Introduction

This post will go through how to process a GTFS file in order to view the results as a standard time table. We want a result that is similar to standard time tables like those found here MVV Time Tables. By processing GTFS results in this manor produces a table that is both easy to read and easy to check for any issues with the GTFS feed. This can be very helpful when trying to debug or check GTFS for accuracy and let non-technical users easily access the time tables within a GTFS file.

Step 1: Read in GTFS feed

Any GTFS feed can be used, but in this example I will be using the Germany wide GTFS feed produced by DELFI (GTFS Germany) and extracting the data that intersects Berlin. For the Berlin boarder shape I downloaded the NUTS dataset. Warning for a large GTFS file it can take up to a few minutes to read in the GTFS feed.

library(sf)
library(dplyr)
library(leaflet)
library(data.table)

# read in gtfs feed
gtfs <- tidytransit::read_gtfs(path = "../../Datasets/20230904_fahrplaene_gesamtdeutschland_gtfs.zip")

# read in NUTS polylines and filter to berlin
NUTS <- st_read(dsn = "../../Datasets/NUTS_RG_01M_2021_4326.shp", 
                layer = "NUTS_RG_01M_2021_4326") %>% 
  filter(LEVL_CODE == 3) %>% 
  filter(CNTR_CODE == "DE") %>% 
  filter(NUTS_NAME == "Berlin") %>%
  st_transform(crs = 3035) %>% # convert to projected meter system covering germany
  st_buffer(dist = 5000) %>% # extract extra area round berlin by 5 km
  st_transform(crs = "WGS84") # convert back to WGS84
# take a look at the area we want to extract
leaflet() %>%
  addTiles() %>%
  addPolygons(data = NUTS)

Step 2: Filter for Exact Date and Berlin Area

When working with GTFS, I found it generally more useful to filter for an exact date rather then just a generic day during the week. The main reason for this is that a route can have minimal or drastic changes depending on planned construction projects. By selecting an exact date we can evaluate the route to see if it is going on the attended “normal” path or if it is being deviated for some reason. This can be especially important if the GTFS feed is being using with routing software like that in the R5R package. The R5R package requires a single day to run the routing analysis, so we want to ensure that that day chose aligns with what we expect travelers to use or we want them to use. This can lead to some odd coding decisions in this case, in which I will switch between two different GTFS processing packages (gtfsools) and (tidytransit). In the first step I used tidytransit to read in the dataset so that I could use the filter by exact day function. This causes the initial load time to be longer as tidytransit generally takes longer to read in GTFS files as gtfstools. However, after we finish the filter by day, we want to convert to the gtftools data type for the rest of the analysis. Luckily, both packages come with converters making it easy to switch back and forth between the two. Generally I would also advise to work with smaller feeds due to the time cost of filtering larger feeds, but when not possible its best to filter out as much data as possible before performing the spatial filter as I do here. In this example the size of the dataset decreases drastically from each filter with a starting size of 2.9 gb and ending at about 104 mb.

# filter by date
gtfs <- tidytransit::filter_feed_by_date(gtfs_obj = gtfs, 
                                         extract_date = "2023-10-17")

# convert back to gtfstools data type
gtfs <- gtfstools::as_dt_gtfs(gtfs)

# filter area to just Berlin
gtfs <- gtfstools::filter_by_sf(gtfs = gtfs, 
                                geom = st_geometry(NUTS)) # here I used st_geometry to reduce the sf datset to a sfc, or just the geometry

Step 3: Extract Route for Processing

Now that we have a GTFS feed filtered for an exact date lets select a single route to use. In this example I will use Bus line 222 that runs between Tegelort and Alt-Lübars. In order to get the correct data to create the time table we will have to perform some filters to extract the datasets we need. I will do this step by step.

# find the route id for bus line 222
route_id <- gtfs$routes[route_short_name %like% 222]$route_id

# use gtfstools filter by route id function
route_222 <- gtfstools::filter_by_route_id(gtfs = gtfs, route_id = route_id)

# Extract trip ids per direction
trip_ids_0 <- route_222$trips[direction_id == 0]$trip_id
trip_ids_1 <- route_222$trips[direction_id == 1]$trip_id

# extract all stop ids associated with the route in each direction
stop_ids_0 <- route_222$stop_times[trip_id %in% trip_ids_0]$stop_id
stop_ids_1 <- route_222$stop_times[trip_id %in% trip_ids_1]$stop_id

# find the stop times associated with each trip in each direction
stop_times_0 <- route_222$stop_times[trip_id %in% trip_ids_0]
stop_times_1 <- route_222$stop_times[trip_id %in% trip_ids_1]

# parent stations do not extis in the GTFS feed so we will create ourselves
# GTFS feed uses a standard german naming system for stops making it easy to obtain the parent station values
stop_parent_ids_0 <- 
  data.table("stop_id" = stop_ids_0,
             "parent_id" = lapply(X = stop_ids_0, 
                                  FUN = function(x) {
                                    stringr::str_split_1(string = x, 
                                                         pattern = "::")}) %>% 
                                  do.call(rbind, .) %>% 
                                  as.data.frame() %>% 
                                  select(1)) %>%
  distinct()

stop_parent_ids_1 <- 
  data.table("stop_id" = stop_ids_1,
             "parent_id" = lapply(X = stop_ids_1, 
                                  FUN = function(x) {
                                    stringr::str_split_1(string = x, 
                                                         pattern = "::")}) %>% 
                                  do.call(rbind, .) %>% 
                                  as.data.frame() %>% 
                                  select(1)) %>%
  distinct()

Step 4: Produce the Final Time Table

Now that we have the time table data components extracted, lets put them together to produce our final time table.

Step 1: Order Stops by Earliest Arrival Time

This section of code finds the earliest arrival time for every stop and then orders it from the earliest time. We then add a new column with the order of the earliest to latest stop arrival. The major assumption is that all trips will use the same stop locations and start at the same stop location.

stop_order_0 <- stop_times_0[,.("arrival_time" = min(arrival_time)), stop_id]
# merge in parent ids and reorder only using parent ids
stop_order_0 <- merge(stop_order_0, stop_parent_ids_0, by = "stop_id")
stop_order_0 <- stop_order_0 %>% 
  group_by(parent_id.V1) %>% 
  filter(arrival_time == min(arrival_time)) %>% 
  arrange(arrival_time) %>%
  as.data.table()
stop_order_0$stop_order <- 1:nrow(stop_order_0)

# table for direction 1
stop_order_1 <- stop_times_1[,.("arrival_time" = min(arrival_time)), stop_id][order(arrival_time)]
# merge in parent ids and reorder only using parent ids
stop_order_1 <- merge(stop_order_1, stop_parent_ids_1, by = "stop_id")
stop_order_1 <- stop_order_1 %>% 
  group_by(parent_id.V1) %>% 
  filter(arrival_time == min(arrival_time)) %>% 
  arrange(arrival_time) %>%
  as.data.table()
stop_order_1$stop_order <- 1:nrow(stop_order_1)

Step 2: Order Trips by Earliest Arrival Time

This section of code is very similar to the previous except this time we are ordering the trips by earliest arrival time. The major assumption is that all trips use the same stop locations for each trip, therefore ordering my earliest arrival time should put them in the correct order.

trip_order_0 <- stop_times_0[,.("arrival_time" = min(arrival_time)), trip_id][order(arrival_time)]
trip_order_0$trip_order <- 1:nrow(trip_order_0)
trip_order_1 <- stop_times_1[,.("arrival_time" = min(arrival_time)), trip_id][order(arrival_time)]
trip_order_1$trip_order <- 1:nrow(trip_order_1)

Step 3: Merge in Stop Times

This step merges back the stop times with the stop and trip order columns. We will also merge in the stop table that contains the name of each stop to use in the final table.

stop_times_0 <- merge(stop_times_0, 
                      y = stop_order_0[,.(stop_id, stop_order)], 
                      by = "stop_id") %>%
  merge(y = trip_order_0[,.(trip_id, trip_order)], by = "trip_id") %>%
  merge(y = route_222$stops[,.(stop_id, stop_name)], by = "stop_id")

stop_times_1 <- merge(stop_times_1, 
                      y = stop_order_1[,.(stop_id, stop_order)], 
                      by = "stop_id") %>%
  merge(y = trip_order_1[,.(trip_id, trip_order)], by = "trip_id") %>%
  merge(y = route_222$stops[,.(stop_id, stop_name)], by = "stop_id")

Step 4: Convert the Stop Times into a Wide Format

# create table for direction 0 by casting to wide format
stop_times_0_wide <- data.table::dcast.data.table(data = stop_times_0, formula = stop_order+stop_name+stop_id ~ trip_order, value.var = "arrival_time")
# clean colnames
colnames(stop_times_0_wide) <- c("Stop Order", "Stop Name","Stop_ID", 
                            paste0("Trip_", 1:(length(stop_times_0_wide)-3)))

# create table for direction 0 by casting to wide format
stop_times_1_wide <- data.table::dcast.data.table(data = stop_times_1, formula = stop_order+stop_name+stop_id ~ trip_order, value.var = "arrival_time")
# clean colnames
colnames(stop_times_1_wide) <- c("Stop Order", "Stop Name","Stop_ID", 
                            paste0("Trip_", 1:(length(stop_times_1_wide)-3)))

Step 5: Display Time Table

Here we will produce the first results of the table. This will not be our final clean version, but we can use it to see if our assumptions about ordering by earliest stop and trip worked.

library(reactable)
library(reactablefmtr)

# view the results in a nice table
reactable(data = stop_times_0_wide, pagination = F, height = 800, 
          theme = reactableTheme(color = "black")) %>%
  add_title("Bus Line 222 Time Table") %>%
  add_subtitle("Direction 0")

Bus Line 222 Time Table

Direction 0

# view the results in a nice table
reactable(data = stop_times_1_wide, pagination = F, height = 800, 
          theme = reactableTheme(color = "black")) %>%
  add_title("Bus Line 222 Time Table") %>%
  add_subtitle("Direction 1")

Bus Line 222 Time Table

Direction 1

Looking at the schedules they do appear to be in the correct order or at least in most cases. However, it is difficult to catch any errors unless we go line by line. So instead we will use some automated tests to make sure that all times are displayed correctly.

Step 5: Check Results

Now that we have created the time table lets perform a series of automated checks to see if the results we have produced are correct. We will perform two basic checks that will ensure that the table is in the correct order. The first will check each column and makes sure that the value in the next row is always greater than the previous value, as we should always be increasing in time when going to the next stop. The second check will look at each row and perform the same test but looking at each trip (column) should start at a later time as the previous trip. From here on out I will only focus on the time table in direction 0, as the process would be the same for both directions.

Check 1: Stop Order

This check will produce a vector of time values that will exclude all NA values and then perform a comparison that will check of the previous time value is greater than the next time value. If all time values are in the correct order we should receive a TRUE value.

# create a function to check all values in a column
column_check_fun <- function(time_values) {
  
  # remove any nas from vector (effectivly removes stops not being stopped at)
  time_values <- time_values[!is.na(time_values)]
  
  # lag vector
  lag_vect <- time_values[1:(length(time_values)-1)]
  # lead vector
  lead_vector <- time_values[2:length(time_values)]
  # check if the lead value is greater than the lag value
  test <- lead_vector > lag_vect
  return(all(test))
}

# run checks on columns
col_checks <- apply(X = stop_times_0_wide[,c(-1,-2,-3)], MARGIN = 2, FUN = column_check_fun)

# see if all values are true or not
all(col_checks)
## [1] FALSE

As we can see from the printout we received a FALSE value, so we do have time values out of order indicating that our stop order was not correct. Lets extract the stops that we are having issues with and do some exploration to see if we can come up with a solution to fix the issue.

# extract the trips with an error and perform a more indepth analysis 
trip_errors <- which(col_checks == F)

# go through each trip and return the lead stop(s) that is causing the issue
trip_errors <- lapply(trip_errors, FUN = function(x) {
  
  # subset for trip, order by stop order, and find diff in stop sequence
  temp <- stop_times_0[trip_order == x][order(stop_order), diff(x = stop_sequence)]
  
  # find vlue that does not equal 1 (diff in the sequence should always be 1)
  hold <- (which(temp != 1):length(temp)) + 1 # add 1 to get the actual stop with the issue, we want all values after this as well
  
  # return the stop information with the issue
  stop_times_0[trip_order == x][order(stop_order)][hold]
  
}) %>%
  do.call(rbind, .)

# find which stops are causing the issue
stops_to_change <- trip_errors[,unique(stop_id)]
stops_to_change
## [1] "de:11000:900090191::1" "de:11000:900090104::2"

We found two stops that appear to have an issue. Let us take a look at one of the trips and see if we can find the exact issue in one case.

# extract the trip order numbers from the trip_errors table
trip_order_numb <- unique(trip_errors$trip_order)
trip_order_numb
## [1] 69 71
# lets view the schedule of the trips with the errors and see if we can spot the issue
# note becuase the way the table is set up I can simply add 3 to the trip order number to get the correct column number
reactable(data = stop_times_0_wide[,c(1:3, trip_order_numb+3), with = F], 
          pagination = F, 
          height = 800, 
          theme = reactableTheme(color = "black")) %>%
  add_title("Bus Line 222 Time Table") %>%
  add_subtitle("Direction 0")

Bus Line 222 Time Table

Direction 0

If we scroll to the bottom of the table we will find the two stops. They both have time values that should be much earlier in the table. What appears to have occurred is that these two trips may be a bit unique and those two stops only appear a few times, so ended up much lower in the stop order than expected. Our initial assumption was that the stop order for all trips should be about the same, but in this case it does not appear to be. For a solution, the simplest method would be to move the stop order up until it is correct. I have created while loop that will move the stop order up by one location, check if the time order is correct and then either break out of the loop and move onto the next stop, or continue moving the stop order until it is correct.

# assume that placing these stops at some higger stop order should produce the correct results, will loop through each attempt until all times are correct or fail after going through the complete stop order
lapply(stops_to_change, FUN = function(stops_change) {
  
  # create new stop order by starting by placing the change stop in the first postion and working down until the times are in the correct order
  i <- 1
  while (i < nrow(stop_times_0_wide)) {
    #print(i)
    # find current stop order
    cur_stop_order <- stop_times_0_wide[Stop_ID == stops_change]$`Stop Order`
    # make desired stop order one value higher
     stop_times_0_wide[`Stop Order` == cur_stop_order - 1]$`Stop Order` <- cur_stop_order
    # change the stop order postion
    stop_times_0_wide[Stop_ID == stops_change]$`Stop Order` <- cur_stop_order - 1
    # save current version of stop times 0 wide to global env
    stop_times_0_wide <<- stop_times_0_wide[order(`Stop Order`)]
    # rerun col checks and see if all values are true
    col_checks <- apply(X = stop_times_0_wide[,c(-1,-2,-3)], MARGIN = 2, 
                        FUN = column_check_fun)
    # increase postion
    i <- i + 1
    # break if all values are true, if not repeat with the next position
    if (all(col_checks)) {
      break
    }
    
  }
})
## [[1]]
## NULL
## 
## [[2]]
## NULL
# remake stop order starting from 1
stop_times_0_wide$`Stop Order` <- 1:nrow(stop_times_0_wide)

Recheck and make sure that fixed the issue or not.

# run checks on columns
col_checks <- apply(X = stop_times_0_wide[,c(-1,-2,-3)], MARGIN = 2, FUN = column_check_fun)

# see if all values are true or not
all(col_checks)
## [1] TRUE

It appears that our method worked and that all stops are in the correct order. Now lets run a similar check on the rows.

Check 2: Trip Order

Like the previous check we will produce a single vector of time values by row in order of the trips. We will check that all previous times are less than the following time. This check function is slightly different in that I have opted to return the full test data.frame instead of running the all check and returning just a TRUE or FALSE value. The reason for this is to save time in extracting potential trips with errors, I could also adjust the column check to do the same, but thought it is also good to show different solutions.

# create a function to check all values by row
row_check_fun <- function(time_values) {
  
  if (all(is.na(time_values))) {
    return(NA)
  }
  
  # remove any nas from vector (effectivly removes stops not being stopped at)
  time_values <- time_values[!is.na(time_values)]
  
  # check if time value only has 1 value then return NA
  if (length(time_values) == 1) {
    return(NA)
  }
  
  # lag vector
  lag_vect <- time_values[1:(length(time_values)-1)]
  # lead vector
  lead_vector <- time_values[2:length(time_values)]
  # check if the lead value is greater than the lag value
  test <- lead_vector > lag_vect
  return(test)
}

# run checks on columns
row_checks <- apply(X = stop_times_0_wide[,c(-1,-2,-3)], MARGIN = 1, FUN = row_check_fun, simplify = F)

# see if all values are true or not
all(row_checks %>% unlist(), na.rm = T)
## [1] FALSE

Once again we return FALSE indicating that there are issues with the trip order. Lets extract which trips are the issue when checking by row.

# find trip numbers that are the issue
row_checks <- lapply(row_checks, function(x) {
  
  # find all trips that failed
  temp <- which(x != T)
  
  # get the names of the trips
  names(temp)
  
  }) %>%
  unlist() %>%
  unique() %>%
  gtools::mixedsort()

row_checks
##  [1] "Trip_6"  "Trip_27" "Trip_29" "Trip_31" "Trip_33" "Trip_35" "Trip_37"
##  [8] "Trip_39" "Trip_41" "Trip_43" "Trip_45" "Trip_47" "Trip_71"

We have 13 trips in which the time values are earlier than expected. Once again lets take a look at an example to see the issue ourselves.

# column numbers 8 and 9 are trip_5 and trip_6
reactable(data = stop_times_0_wide[,c(1:3,8:9), with = F], 
          pagination = F, 
          height = 800, 
          theme = reactableTheme(color = "black")) %>%
  add_title("Bus Line 222 Time Table") %>%
  add_subtitle("Direction 0")

Bus Line 222 Time Table

Direction 0

Scrolling down to stop 34, we can see the issue. We go from 6:30 to 6:01. Once again the solution appears to move the trip down, check if that fixed the issue, repeat if not or break if it does fix it. We will do this for all trips and this should hopefully correct the issues. Our initial assumption that the trip order could be sorted by the earliest trip fails here. As trip 5 technically starts earlier, but trip 6 is unique and doesn’t start its trip until stop 34 causing the times by row to fail.

# move each trip earlier in the schedule and check if that fixes the issue or not. assumption is that they would have to occur earlier to work.
lapply(row_checks, FUN = function(col_name) {
  
  # create new stop order by starting by placing the change stop in the first postion and working down until the times are in the correct order
  
  # find current stop order
  cur_trip_number <- gsub(pattern = "Trip_", replacement = "", x = col_name) %>%
      as.integer()
  i <- cur_trip_number
  while (i > 0) {
    # extract vector of time table data
    cur_trip_column <- stop_times_0_wide[,paste0("Trip_", i), with = F] %>%
      unlist()
    # previous trip col time table data
    prev_trip_column <- stop_times_0_wide[,paste0("Trip_", i - 1), with = F] %>%
      unlist()
    # replace cur column with prev column
    stop_times_0_wide[,paste0("Trip_", i)] <- prev_trip_column
    # replace cur trip column with prev column location
    stop_times_0_wide[,paste0("Trip_", i-1)] <- cur_trip_column
    # save current version of stop times 0 wide to global env
    stop_times_0_wide <<- stop_times_0_wide
    # run row check on the two change rows and see if that fixed the issue or not
    row_checks <- apply(X = stop_times_0_wide[,4:(i+3)], MARGIN = 1, 
                        FUN = row_check_fun)
    # increase postion
    i <- i - 1
    # break if all values are true, if not repeat with the next position
    if (all(row_checks %>% unlist(), na.rm = T)) {
      break
    }
  }
})
## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
## NULL
## 
## [[5]]
## NULL
## 
## [[6]]
## NULL
## 
## [[7]]
## NULL
## 
## [[8]]
## NULL
## 
## [[9]]
## NULL
## 
## [[10]]
## NULL
## 
## [[11]]
## NULL
## 
## [[12]]
## NULL
## 
## [[13]]
## NULL

Lets do a final check to see if all rows are correct now.

row_checks <- apply(X = stop_times_0_wide[,c(-1,-2,-3)], MARGIN = 1, FUN = row_check_fun, simplify = F)

all(row_checks %>% unlist(), na.rm = T)
## [1] TRUE

The code worked and everything appears to be in the correct stop and trip order. Let us now clean up the tables into both a digital format that can easily be read in an html and in a pdf document.

Step 6: Clean Up Table for Final Version

We are going to clean up a bit more and then produce two versions.

  • webpage friendly version that is easy to read and scroll through.

  • print friendly version that can saved as a pdf.

Web version

For the web version we will use the reactable package and and a few customizations.

# clean up trip column  names
colnames(stop_times_0_wide)[4:length(stop_times_0_wide)] <- gsub(pattern = "_",
                                                                 replacement = " ",
                                                                 x = colnames(stop_times_0_wide)[4:length(stop_times_0_wide)])
# remove the stop id and stop order values
reactable(data = stop_times_0_wide[,c(-1,-3), with = F], 
          pagination = F, 
          height = 800,
          defaultColDef = colDef(
            align = "center"
          ),
          columns = list(
            `Stop Name` = colDef(align = 'left', 
                                  width = 250)
          ),
          theme = reactableTheme(color = "black")) %>%
  add_title("Bus Line 222 Time Table") %>%
  add_subtitle("Direction 0")

Bus Line 222 Time Table

Direction 0

PDF version

For the pdf version we will use gridExtra and with some upfront calculations to produce an automated pdf table. For the final size I had to experiment and find how many rows and columns would fit within my selected page size of 11x8.5 inches. Because the size of the table is large, we cannot show all stops on a single page. So we will produce a series of tables that show the first set of stops and all trips, followed by the rest of the stops and all trips.

library(gridExtra)
## 
## Attaching package: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine
library(grid)

page_width <- 11
page_height <- 8.5
page_margin <- 0.5

# create a loop to determine how many columns (trips) can be selected for each page
table_width <- 0
trips_per_page <- 1
while (page_width - page_margin > table_width) {
  
  # calculate page width starting with one column at a time
  table_width <- tableGrob(d = stop_times_0_wide[,c(-1,-3)][,1:trips_per_page], 
                          rows = NULL, 
                          theme = ttheme_default(base_size = 8))
  
  table_width <- grid::convertWidth(x = sum(table_width$widths), 
                                   unitTo = "inches", 
                                   valueOnly = T)
  
  trips_per_page <- trips_per_page + 1
  table_width <<- table_width
  
}
# subtract 1 from trips per page so we dont exceed the page size
trips_per_page <- trips_per_page - 1

# calculate number of stops per page
table_height <- 0
stops_per_page <- 1
while (page_height - page_margin > table_height) {
  
  # calculate page width starting with one column at a time
  table_height <- tableGrob(d = stop_times_0_wide[1:stops_per_page,c(-1,-3)], 
                          rows = NULL, 
                          theme = ttheme_default(base_size = 8))
  
  table_height <- grid::convertWidth(x = sum(table_height$heights), 
                                   unitTo = "inches", 
                                   valueOnly = T)
  
  stops_per_page <- stops_per_page + 1
  table_height <<- table_height
  
}
# subtract 1 from stops per page so we dont exceed the page size
stops_per_page <- stops_per_page - 1

# calculate the seq of trips per page
trips_seq_start <- seq(1,length(stop_times_0_wide)-2,trips_per_page)
trips_seq_end <- c(trips_seq_start[-1] - 1, length(stop_times_0_wide)-2)

# calculate the seq of stops per page
stops_seq_start <- seq(1,nrow(stop_times_0_wide),stops_per_page)
stops_seq_end <- c(stops_seq_start[-1] - 1, nrow(stop_times_0_wide))

# create two loops, outer loop for stops and and inner loop for trips.
pdf("table.pdf", width = page_width, height = page_height)
# outer loop for stops
mapply(start_stops = stops_seq_start, end_stops = stops_seq_end, 
       FUN = function(start_stops, end_stops) {
         # inner loop for trips
         mapply(trips_start = trips_seq_start, trips_end = trips_seq_end, 
                FUN = function(trips_start, trips_end) {
                  grid.arrange(
                    tableGrob(d = stop_times_0_wide[start_stops:end_stops,c(-1,-3)][,c(trips_start:trips_end), with = F], 
                              rows = NULL, 
                              theme = ttheme_default(base_size = 8))
                  )
                })
         
})
##      [,1]      [,2]     
## [1,] gtable,12 gtable,12
## [2,] gtable,12 gtable,12
## [3,] gtable,12 gtable,12
## [4,] gtable,12 gtable,12
## [5,] gtable,12 gtable,12
## [6,] gtable,12 gtable,12
## [7,] gtable,12 gtable,12
dev.off()
## png 
##   2

I have embedded the final pdf results below.

Check 3: Manual Time Table Checks

Now that we have a clean table that is easy to read, lets us do one more check by looking at the printed time table created by the BVG (PDF Time Table) compared to ours. Its best to do some spot checking here to ensure that nothing is critically wrong, but may not be worth the effort to go line by line.

Future Plans

While this post only showed the results for a single line, we can convert all this logic into a shiny app that can load in any GTFS feed, filter the feed by given time and date, and then select any of the routes and produce the time table and map for each one. Putting this code into a shiny app will allow users to quickly check the time tables of each route and ensure that they are running as expected. Once the shiny app is complete I will create a new post going over the details of the app and sharing the app itself. In addition, while the code presented here works to fix bus line 222, it has not been checked for edge cases or other lines at this time. So using the same code in the shiny app will also provide a means to check if this solution can work in general for all GTFS feeds, or if there will be special situations when it will fail.