Real-time network assessment and updating using vehicle-locating data

GEORGIA DOT RESEARCH PROJECT 20-01 FINAL REPORT
REAL-TIME NETWORK ASSESSMENT AND UPDATING USING
VEHICLE-LOCATING DATA
OFFICE OF PERFORMANCE-BASED MANAGEMENT AND RESEARCH
600 WEST PEACHTREE ST. NW ATLANTA, GA 30308 MARCH 2022

1. Report No.: FHWA-GA-22-2001
2. Government Accession No.: N/A
3. Recipient's Catalog No.: N/A
4. Title and Subtitle: Real-Time Network Assessment and Updating Using Vehicle-Locating Data
5. Report Date: March 2022
6. Performing Organization Code: N/A
7. Author(s): Iris Tien, Ph.D. (https://orcid.org/0000-0002-1410-632X), Zachary Roberts, Kaixin Chen
8. Performing Organization Report No.: N/A
9. Performing Organization Name and Address: Georgia Tech Research Corporation, Office of Sponsored Programs, 926 Dalney Street NW, Atlanta, GA 30332-0420
10. Work Unit No.: N/A
11. Contract or Grant No.: Project No. 0016970
12. Sponsoring Agency Name and Address: Georgia Department of Transportation, 600 W. Peachtree St. NW, Atlanta, GA 30308
13. Type of Report and Period Covered: Final Report (July 2020–March 2022)
14. Sponsoring Agency Code: N/A

15. Supplementary Notes
Conducted in cooperation with the U.S. Department of Transportation, Federal Highway Administration.
16. Abstract This project explores the ability to use vehicle-locating data to assess the state of the road network, including identifying road blockages along different segments of the transportation system. The project utilizes the mobile sources of Georgia Department of Transportation (GDOT) vehicles and their associated vehicle-tracking information to infer the state of the road network and perform transportation network assessment. We develop and implement multiple data trimming and processing methods using ArcGIS-specific Python algorithms to transform an initially large dataset into a usable format for network assessment. To utilize the vehicle-locating data in particular, we create a workflow to enable comparison of the vehicle routes with optimal routes to detect suboptimal routing decisions that may be indicative of blockages in the road network. We use the resulting datasets as inputs and create machine learning models with multiple variables to detect the presence of a road blockage. We explore both regression-based and classification-based models, and find that the classification model performs particularly well for this task. Specifically, the decision tree classification model is able to detect road blockages with high accuracy, with results showing up to 92.0 percent recall and 92.4 percent precision. In addition, the accuracy for the no-blockage class is up to 99.3 percent. In this project, through the development and use of multiple data processing and data analysis methods combined with machine learning approaches, we show how the vehicle-locating data can be used to perform network assessment and accurate detection of blockages in the road network.

17. Key Words Vehicle Data, Network Assessment, Route, Machine Learning, Geographic Information System (GIS), Roads, Traffic, Geospatial Data

18. Distribution Statement No restrictions. This document is available through the National Technical Information Service, Springfield, VA 22161.

19. Security Classification (of this report) Unclassified

20. Security Classification (of this page) Unclassified

21. No. of Pages 88

22. Price Free

Form DOT F 1700.7 (8-72)

Reproduction of completed page authorized

GDOT Research Project No. 20-01 Final Report
REAL-TIME NETWORK ASSESSMENT AND UPDATING USING VEHICLE-LOCATING DATA
By
Iris Tien, Ph.D., Professor, School of Civil and Environmental Engineering
Zachary Roberts, Graduate Research Assistant
Kaixin Chen, Graduate Research Assistant
Georgia Institute of Technology
Contract with Georgia Department of Transportation
In cooperation with U.S. Department of Transportation Federal Highway Administration
March 2022
The contents of this report reflect the views of the authors, who are responsible for the facts and accuracy of the data presented herein. The contents do not necessarily reflect the official views of the Georgia Department of Transportation or the Federal Highway Administration. This report does not constitute a standard, specification, or regulation.

SI* (MODERN METRIC) CONVERSION FACTORS

APPROXIMATE CONVERSIONS TO SI UNITS

Symbol | When You Know | Multiply By | To Find | Symbol

LENGTH
in | inches | 25.4 | millimeters | mm
ft | feet | 0.305 | meters | m
yd | yards | 0.914 | meters | m
mi | miles | 1.61 | kilometers | km

AREA
in2 | square inches | 645.2 | square millimeters | mm2
ft2 | square feet | 0.093 | square meters | m2
yd2 | square yards | 0.836 | square meters | m2
ac | acres | 0.405 | hectares | ha
mi2 | square miles | 2.59 | square kilometers | km2

VOLUME
fl oz | fluid ounces | 29.57 | milliliters | mL
gal | gallons | 3.785 | liters | L
ft3 | cubic feet | 0.028 | cubic meters | m3
yd3 | cubic yards | 0.765 | cubic meters | m3
NOTE: volumes greater than 1000 L shall be shown in m3

MASS
oz | ounces | 28.35 | grams | g
lb | pounds | 0.454 | kilograms | kg
T | short tons (2000 lb) | 0.907 | megagrams (or "metric ton") | Mg (or "t")

TEMPERATURE (exact degrees)
oF | Fahrenheit | 5(F-32)/9 or (F-32)/1.8 | Celsius | oC

ILLUMINATION
fc | foot-candles | 10.76 | lux | lx
fl | foot-Lamberts | 3.426 | candela/m2 | cd/m2

FORCE and PRESSURE or STRESS
lbf | poundforce | 4.45 | newtons | N
lbf/in2 | poundforce per square inch | 6.89 | kilopascals | kPa

APPROXIMATE CONVERSIONS FROM SI UNITS

Symbol | When You Know | Multiply By | To Find | Symbol

LENGTH
mm | millimeters | 0.039 | inches | in
m | meters | 3.28 | feet | ft
m | meters | 1.09 | yards | yd
km | kilometers | 0.621 | miles | mi

AREA
mm2 | square millimeters | 0.0016 | square inches | in2
m2 | square meters | 10.764 | square feet | ft2
m2 | square meters | 1.195 | square yards | yd2
ha | hectares | 2.47 | acres | ac
km2 | square kilometers | 0.386 | square miles | mi2

VOLUME
mL | milliliters | 0.034 | fluid ounces | fl oz
L | liters | 0.264 | gallons | gal
m3 | cubic meters | 35.314 | cubic feet | ft3
m3 | cubic meters | 1.307 | cubic yards | yd3

MASS
g | grams | 0.035 | ounces | oz
kg | kilograms | 2.202 | pounds | lb
Mg (or "t") | megagrams (or "metric ton") | 1.103 | short tons (2000 lb) | T

TEMPERATURE (exact degrees)
oC | Celsius | 1.8C+32 | Fahrenheit | oF

ILLUMINATION
lx | lux | 0.0929 | foot-candles | fc
cd/m2 | candela/m2 | 0.2919 | foot-Lamberts | fl

FORCE and PRESSURE or STRESS
N | newtons | 0.225 | poundforce | lbf
kPa | kilopascals | 0.145 | poundforce per square inch | lbf/in2

*SI is the symbol for the International System of Units. Appropriate rounding should be made to comply with Section 4 of ASTM E380. (Revised March 2003)


TABLE OF CONTENTS
Executive Summary
Chapter 1. Introduction
    Literature Review and Related Work
    Project Objectives
    ArcGIS Definitions and Data Analysis Terminology
Chapter 2. Data Inputs
    Georgia Road Network Shapefile
    WebEOC Executive Report
    Verizon Network Fleet Geodatabase
Chapter 3. Preprocessing and Data Modification
Chapter 4. Data Processing
    Valid.py Workflow
Chapter 5. Machine Learning Models and Results
    Model Variables
        Average Annual Daily Traffic (AADT) (Independent Variable)
        Fulton County Weather Data (Independent Variable)
        Optimal Route Length Difference (Independent Variable)
        WebEOC Incident Presence (Dependent Variable)
    Model Results
Chapter 6. Discussion and Conclusions
Chapter 7. Recommendations
Appendix
    ArcGIS/ArcPy Python Glossary
    User Code
        GetSegments Code
        mxFindRoutes Code
        Valid.py Code
Acknowledgements
References

LIST OF FIGURES
Figure 1. Screenshot. LRSN_GDOT ArcGIS layer (left). Includes road segments spanning across the state of Georgia. Information is tagged to each segment in the form of a Shapefile Attribute Table with identifying information (right).
Figure 2. Map. Shown in cyan is Ferst Drive on the Georgia Tech campus in Atlanta, GA. Selected feature displaying information on the polyline shape length, county, road identification code, and direction (increasing/decreasing).
Figure 3. WebEOC Spreadsheet Data. Important features include Incident Type, Time of Occurrence, and Geolocation (latitude/longitude).
Figure 4. Spreadsheet data. Verizon Network Fleet Attribute Table displaying vehicle-locating point identifying information. The "Ignition" column shows the state of the vehicle being turned on or off. This will indicate when to cease a vehicle route segment for Valid.py, as explained in chapter 4. Vehicles are grouped and identified using the data in the "VIN" column, representing the vehicle identification number.
Figure 5. Screenshot. VLP1 (left) transformed into VLP_FC1 (center) (FC = Fulton County) via ArcGIS Clip function by Fulton County layer. Clip function input and output shown on right.
Figure 6. Screenshot. Vehicle-locating points utilized in machine learning regression and classification models. Blue data points representing VLP19 Feature Class, red data points representing VIN 1FTBF2B69HEE49969 of VLP18. Data trimmed to Fulton County for analysis.
Figure 7. Screenshot. ArcGIS XY Table To Point function. Outputs feature class of georeferenced points using table row information.
Figure 8. Diagram. Demonstration of vehicle-locating points removed by the ArcGIS Buffer function in the data-trimming step. Redundant data points (outlined in red) most likely represent stationary vehicles due to their proximity to each other and distance from the buffered route segment (shown in green).
Figure 9. Screenshot. GetSegments.py output.
Figure 10. Screenshot. Visualization of vehicle-locating points to be connected as nodes to one polyline (representing a vehicle routing segment) in Valid.py.
Figure 11. Screenshot. Individual vehicle ID numbers.
Figure 12. Screenshot. Output segments from the Valid.py function pre- (left) and post- (right) segmentation by vehicle ID.
Figure 13. Screenshot. Convert Time Field.
Figure 14. Screenshot. Summary Statistics.
Figure 15. Screenshot. Environments.
Figure 16. Code. Repetitive segment removal.
Figure 17. Screenshot. Select By Attributes.
Figure 18. Screenshot. GDOT Road and Traffic Data. The most recent numbers from 2019 are used, as 2021 data have yet to be published. Under the Traffic Data Type, the Spatial Geodatabase is used for this project.
Figure 19. Screenshot. Visualization of Traffic Location Data (left) with georeferenced information (pop-up on right). AADT is used for this study. For our example here, 4,530 vehicles is the AADT for 2019. Attribute information is later linked based on the FID closest to a vehicle route segment (representing the road the vehicle is on) using the ArcGIS Merge function.
Figure 20. Screenshot. ArcGIS Near function. Inputs a feature class and target feature class to find the closest feature of the target feature class. The output is the Feature ID (FID) of the closest feature in the Traffic Data into the VLP Attribute Table (indicated as route_demo in this example). The search radius is set at 100 ft to avoid incorrect closest features from being linked to the data. In the case that the closest feature is greater than 100 ft, FID and AADT for the Vehicle Locating Point are linked to the value "-1". This is later reassigned to "0", identified using Select By Attributes function with the SQL "FID = -1", with the checked "New Selection".
Figure 21. Screenshot. Weather Underground Data for Fulton County, June 2021. Daily Temperature, Dew Point, Precipitation, and Wind Speed values are included in the data table.
Figure 22. Screenshots. Visualization of suboptimal routing between the start and end point. As displayed on Google Maps (for demonstration purposes; ArcGIS is used in the project), the left shows the quickest route, which involves traveling due north 2 miles; actual GDOT vehicle route is shown on the right.
Figure 23. Screenshot. ArcGIS Solve function output. Each purple line, with one highlighted in cyan as a demonstration, represents an optimal path. 1's represent the starting point and 2's represent the ending point of a vehicle route segment.
Figure 24. Screenshot. ArcGIS Feature Vertices To Points function.
Figure 25. Screenshot. Feature Vertices To Points outputting more than two points for the Start and End Point Layer. Selected points in the attribute table (left) are shown geographically (right).
Figure 26. Code. Code to delete extraneous points.
Figure 27. Screenshot. Stops Attribute Table.
Figure 28. Screenshot. Add Join.
Figure 29. Screenshots. WebEOC incidents within 1 week of data timestamps (left, in green) and WebEOC incidents within 1 month (right, in pink).
Figure 30. Screenshot. Classification decision tree visualized in MATLAB. Each node represents a predictive decision made by the model to arrive at an estimate for whether or not a route blockage is present. End nodes (leaves) represent these binary predictions. In this model, x1 = Optimal Route Length Difference, x2 = Daily High Temperature, x3 = Precipitation, x4 = AADT.
Figure 31. Screenshots. Confusion matrices for 1-week and 1-month classification models, with row summaries (right of each matrix) also shown.

LIST OF TABLES

Table 1. Regression model results for WebEOC 1 week.
Table 2. Regression model results for WebEOC 1 month.

EXECUTIVE SUMMARY
This project explores the ability to use vehicle-locating data to assess the state of the road network, including identifying road blockages along different segments of the transportation system. Compared to prior work using stationary data sources, such as loop detectors, traffic cameras, or traffic monitoring stations, or human-collected data gathered either directly or through third-party sources, this project utilizes the mobile sources of Georgia Department of Transportation (GDOT) vehicles and their associated vehicle-tracking information to infer the state of the road network and perform transportation network assessment. These data are already being collected, demonstrating the utility of these data in performing road network assessment without the need to invest in new technologies, dedicate additional resources, or implement new instrumentation or infrastructure.
The raw dataset of vehicle-locating data is large and, in many cases, messy. In this project, we develop and implement multiple data trimming and processing methods using ArcGIS-specific Python algorithms to transform this initially large dataset into a usable format for network assessment. To utilize the vehicle-locating data in particular, we create a workflow to enable comparison of the vehicle routes with optimal routes to detect suboptimal routing decisions that may be indicative of blockages in the road network. This workflow includes the creation of vehicle route segments based on the individual vehicle-locating data points, the linking of segments into routes, the identification of optimal routes between these points, and the comparison of distances between the actual taken routes and the optimal routes to detect the degree of suboptimal routing and its association with the likelihood of the presence of a road blockage.
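At its core, the comparison described above reduces to measuring how much longer the driven route is than the optimal route between the same endpoints. The project performs this with ArcGIS tooling; purely as an illustration of the quantity involved, the sketch below approximates a route's length from its ordered vehicle-locating points using the haversine formula and returns the excess over a given optimal length. The function names, the miles unit, and the point format are assumptions for this sketch, not the project's code.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_miles(p, q):
    """Great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * asin(sqrt(a))

def route_length(points):
    """Length of a route traced by an ordered list of (lat, lon)
    vehicle-locating points, summed leg by leg."""
    return sum(haversine_miles(a, b) for a, b in zip(points, points[1:]))

def detour_excess(actual_points, optimal_length_miles):
    """Actual route length minus optimal route length: the kind of
    'optimal route length difference' used as a model input. Large
    positive values suggest suboptimal routing, possibly a blockage."""
    return route_length(actual_points) - optimal_length_miles
```

A route whose excess is near zero matches the optimal path; increasingly positive values indicate increasingly suboptimal routing.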

We use the resulting datasets as inputs and create machine learning models with multiple variables to detect the presence of a road blockage. We explore both regression-based and classification-based models, and find that the classification model performs particularly well for this task. Specifically, the decision tree classification model is able to detect road blockages with high accuracy, with results showing up to 92.0 percent recall and 92.4 percent precision. In addition, the accuracy for the no-blockage class is up to 99.3 percent. In this project, through the use of multiple data processing and data analysis methods combined with machine learning approaches, we show how the vehicle-locating data can be used to perform network assessment and accurate detection of blockages in the road network.
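The recall, precision, and per-class accuracy figures quoted above follow the standard confusion-matrix definitions. As a reminder of how they are computed (the counts below are hypothetical, chosen only to make the arithmetic visible; they are not the project's results):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall for the positive (blockage) class, computed
    from confusion-matrix counts: true positives, false positives, and
    false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts for illustration only.
p, r = precision_recall(tp=92, fp=8, fn=8)
```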

CHAPTER 1. INTRODUCTION
Road infrastructure makes up a crucial component of Georgia's asset network. Throughout the state, connections link different areas to each other, providing access to employment, social, and health services, thereby supporting state activities and stimulating economic development. These services are interrupted, however, by the presence of road blockages, including those due to vehicular accidents, debris, and flooding, among other factors, which limit and can prohibit travel along certain routes. Providing real-time information on the state of the transportation network is a way for state agencies to understand the state of the network at any point in time, deploy resources as needed to resolve any road blockages, and prioritize specific areas of the road network for recovery.
An increasing number of data sources are available to potentially provide such information on the state of the road network. However, these often require significant resources to implement, including installing infrastructure or hardware to collect data, or changing specific practices by the public or individual workers to ensure reliable data collection. These challenges potentially limit the utility of these data sources for road network assessment, both in the amount of data that can be collected and in how accurate or reliable these data turn out to be.
In this project, rather than using these types of data sources (such as data collected from fixed infrastructure installations, or individual human-collected data) with their accompanying challenges and limitations, we (the research team) use data that are already being collected by the Georgia Department of Transportation (GDOT). Specifically, we use data collected through hardware already installed on GDOT vehicles that tracks GDOT vehicle locations as they
travel over the network to infer the state of the road network and perform continuous network assessment and updating. The idea is that as GDOT vehicles travel over the network, they are continuously collecting data on the state of the network in the road segments they are traveling over. For example, if a GDOT vehicle travels over a certain route, it can be inferred from the vehicle-locating information that the particular route that the vehicle traveled over is unblocked, and open for passage. In contrast, if a vehicle makes an unexpected detour around a certain part of the network, there is some likelihood that the vehicle was avoiding a blocked part of the network, indicating a potential road blockage in the area avoided.
Thus, the GDOT vehicles provide valuable information on the real-time network state. The benefit of using these vehicle-locating data for the network assessment is that these data are already being collected by GDOT assets, so no additional investment in assets or infrastructure needs to be implemented to perform the network assessment. In addition, the GDOT equipment that collects the vehicle-locating data is implemented passively rather than actively, meaning that it will collect these data without the need for operators to turn on certain instruments or capabilities. The result is that there is less risk that something will occur to disrupt data collection and that there is increased reliability that the data will be continuously collected. The GDOT vehicle-locating data are also being collected continuously, enabling the network assessments that are made based on the data to be continuously updated as new information is recorded and received about the locations and routes of GDOT vehicles across the network. Finally, because the data are being collected by GDOT rather than by a third party or by the public, and they are being used for GDOT purposes, there are no issues regarding data security or privacy in order to collect or use the data. Also, GDOT has control over how the data are
collected moving forward, rather than relying on potential changing data collection strategies, rules, and regulations from third-party owners.
LITERATURE REVIEW AND RELATED WORK

Previous research includes work in the area of using new technologies to facilitate evacuation decisions after a disaster (Iliopoulou et al. 2020); however, this project focuses on transportation network assessment rather than evacuation routing. While many previous studies focus on traffic estimation and prediction (e.g., Mena-Yedra et al. 2018), this project focuses on real-time network assessments with outcomes facilitating resource allocation and network recovery through identification and detection of road blockages. In terms of specific technologies, previous research often utilizes fixed data-collection sources, such as loop detectors and traffic monitoring stations providing traffic count information (Singh et al. 2018). Compared to that work utilizing stationary data sources (i.e., loop detectors, traffic cameras, traffic monitoring stations) for transportation network analysis, this study focuses on the mobile sources of GDOT vehicles and their associated vehicle-tracking information, which are wider reaching, have lower operational costs, and offer the other benefits previously described.
Recently, there has been movement toward increased use of mobile data sources (e.g., Meng et al. 2017). However, that work focuses on traffic flow modeling rather than actual network assessment, which is the focus of this project. Finally, regarding the use of mobile data for post-disaster network assessment, much of the recent work uses crowdsourced information for infrastructure assessment (Basu et al. 2016, Astarita et al. 2020). Compared to crowdsourced data, the mobile vehicle-locating data utilized in this project represent a more trustworthy, detailed, and accurate geolocated data source for transportation network assessment.

To date, there have been no studies utilizing vehicle-locating data to perform real-time transportation network assessment; this project represents the first time such data are investigated for this purpose. The anticipated benefit is that data already collected by agencies, such as GDOT, can be leveraged for transportation network assessment and updating as vehicles, routes, and network conditions change.
PROJECT OBJECTIVES

The objective of this project is to create, and investigate the feasibility of, a system that is able to utilize currently collected GDOT vehicle-locating data to provide real-time assessment and updating of the state of the transportation network. Doing so will provide GDOT with important information to support increased situational awareness of the state of the network, as well as support resource allocation, hazard mitigation, and network recovery operations to resolve any road blockages across the transportation network.
To accomplish this, in this project, we utilize data sources provided by GDOT as inputs into the system. The main data inputs are vehicle routing information and traffic incident data representing road blockage information. These data inputs are described further in chapter 2. Next, we perform a series of preprocessing, data-modification, and data-processing operations in order to make the data usable and consistent for the full data-processing system. It is noted that the datasets investigated are large, and require several transformations to enable operational viability and provision of use as geographic information system (GIS) intelligence. A workflow has been developed to efficiently create and utilize the vehicle-locating points (VLPs). This includes the processing of the large vehicle-locating datasets using data trimming and buffering methods, as well as the identification and connection of specific vehicle-locating data points into
individual vehicle route segments. These operations refine and process the datasets, and are described in more detail in chapter 3 and chapter 4.
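The connection of vehicle-locating points into individual route segments can be sketched in plain Python: points are grouped by VIN, ordered by timestamp, and a segment ends whenever the vehicle's ignition turns off, mirroring the segmentation rule described for Valid.py in chapter 4. The record keys ('vin', 'timestamp', 'ignition', 'lat', 'lon') are assumptions for this illustration, not the actual geodatabase schema.

```python
from itertools import groupby
from operator import itemgetter

def build_route_segments(points):
    """Group vehicle-locating point records into per-vehicle route
    segments. A new segment starts after an ignition-off record;
    single-point 'segments' are discarded as uninformative."""
    segments = []
    pts = sorted(points, key=itemgetter('vin', 'timestamp'))
    for vin, group in groupby(pts, key=itemgetter('vin')):
        current = []
        for p in group:
            current.append((p['lat'], p['lon']))
            if p['ignition'] == 'off':  # segment ends when the vehicle turns off
                if len(current) > 1:
                    segments.append((vin, current))
                current = []
        if len(current) > 1:            # trailing segment with no ignition-off record
            segments.append((vin, current))
    return segments
```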
Next, the goal is to create a model that is able to use the vehicle-locating data as inputs to detect road blockages in the transportation network. We utilize machine learning methods, which involve building the models and then training and testing them on the datasets. The training step enables us to understand how traffic conditions and vehicle-routing information interact with each other, so that the presence of a road blockage can be inferred from the vehicle-locating information. The goal is then to apply the trained model in a real-time detection system with the necessary processing capabilities. The specific machine learning methods investigated include ordinary least squares (OLS) linear regression and decision tree classification, both of which are explored to learn the trends of traffic across the network based on the vehicle-locating information and to accurately predict the likelihoods of road blockages. The resulting model provides intelligence and learning about how two large datasets, containing VLPs and georeferenced traffic incident data, interact with one another over the time scope of the study. The created models and model results are described in chapter 5.
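A decision tree classifier arrives at a prediction through a sequence of threshold splits on the input variables. The simplest version, a single split on one variable (a "decision stump"), can be fit by exhaustive search over candidate thresholds; the sketch below is a simplified stand-in to show the idea, not the multi-variable model trained in this project, and its toy data are hypothetical.

```python
def fit_stump(xs, ys):
    """Fit a one-variable decision stump: choose the threshold t that
    minimizes training misclassifications when predicting 'blockage'
    (label 1) for x > t and 'no blockage' (label 0) otherwise."""
    best_t, best_errs = None, len(ys) + 1
    for t in sorted(set(xs)):
        errs = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if errs < best_errs:
            best_t, best_errs = t, errs
    return best_t

# Toy data: label 1 when the route-length difference is large.
threshold = fit_stump([0.1, 0.2, 2.0, 3.0], [0, 0, 1, 1])
```

A full decision tree applies this splitting step recursively, partitioning the data at each node and choosing a new variable and threshold for each sub-partition.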
Chapter 6 provides discussion and conclusions on the results of this work. Chapter 7 provides further recommendations for how the system can be used for real-time network assessment and updating, including descriptions of expanding the scope of the project results and outcomes. The results of this study demonstrate the novelty and utility of a mobile detection system across a broad network utilizing currently collected GDOT vehicle-locating data to provide information about the state of the transportation network as it changes over time.

ARCGIS DEFINITIONS AND DATA ANALYSIS TERMINOLOGY

To conduct the data processing and analysis activities in this project, we use ArcGIS, a software tool that enables geolocated information to be processed, integrated, and analyzed. Below are definitions related to ArcGIS (ESRI 2021) and to data-analysis terminology (Bailey 2005, Yale University 2021) that are used in this report. A further glossary of terms for varying ArcGIS functions is provided in the appendix.
Shapefile A vector data storage format for storing the location, shape, and attributes of geographic features. A shapefile is stored in a set of related files and contains one feature class.
Network Dataset
A collection of topologically connected network elements (edges, junctions, and turns) that are derived from network sources, typically used to represent a linear network, such as a road or subway system. Each network element is associated with a collection of network attributes. Network datasets are typically used to model undirected flow systems.
Polyline
A shape defined by one or more paths, in which a path is a series of connected segments. If a polyline has more than one path (a multipart polyline), the paths may either branch or be discontinuous.
ArcGIS Geodatabase Feature Class
A collection of geographic features with the same geometry type (such as point, line, or polygon), the same attributes, and the same spatial reference. Feature classes are stored in geodatabases, shapefiles, coverages, or other data formats. Feature classes allow homogeneous features to be grouped into a single unit for data storage purposes. For example, highways, primary roads, and secondary roads can be grouped into a line feature class named "roads." In a geodatabase, feature classes can also store annotation and dimensions.

Linear Regression
A model that analyzes the relationship between two or more variables by fitting a linear equation to observed data. One variable is considered to be a dependent variable, and the other(s) are considered to be explanatory variable(s). The goal is typically to predict the dependent variable based on the explanatory variable(s).
o Ordinary Least Squares (OLS) Regression
A regression method that calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line, i.e., by minimizing the sum of the squared errors between the predicted and observed data values. Because the deviations are first squared and then summed, there are no cancellations between positive and negative values.
o Correlation Coefficient
An index number between -1 and 1 indicating the strength of the linear association between two variables.
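As an illustration of the OLS calculation defined above, the following is a minimal sketch on hypothetical data (not project data), fitting y = b0 + b1*x in closed form:

```python
# Minimal OLS illustration on hypothetical data (not project data):
# fit y = b0 + b1*x by minimizing the sum of squared vertical deviations.

def ols_fit(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form OLS slope and intercept
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
         sum((x - mean_x) ** 2 for x in xs)
    b0 = mean_y - b1 * mean_x
    return b0, b1

b0, b1 = ols_fit([1, 2, 3, 4], [2, 4, 6, 8])
# For perfectly linear data the fit recovers the line exactly: b0 = 0, b1 = 2
```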
Classification
The generic process for grouping entities by similarity. A classification model is similar to a regression model, taking input variable(s) and predicting a dependent variable, except that the result is a binary output based on class prediction for a binary classification problem.

o Decision Tree
A supervised machine learning algorithm that splits data into branches until it achieves a threshold value. A branch represents a classifying decision, which relies on a variety of factors as determined by the classification model input. Leaves represent the ends of strings of decision branches; they are the terminal nodes that predict a classifying outcome. Branches of the tree are determined based on the input variable(s) and corresponding classes by points in the dataset.

o Recall
The ability of a classification model to identify all data points in a relevant class. Recall is a measure used to evaluate the accuracy of a classification model.

o Precision
The ability of a classification model to return only the data points in a class. Precision is a measure used to evaluate the accuracy of a classification model.
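The recall and precision measures defined above can be illustrated with a short sketch (the labels below are hypothetical, not project results):

```python
# Hypothetical illustration of recall and precision for a binary
# (blockage / no-blockage) classifier, computed from label pairs.

def recall_precision(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    recall = tp / (tp + fn)       # share of actual blockages detected
    precision = tp / (tp + fp)    # share of predicted blockages that are real
    return recall, precision

r, p = recall_precision([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
# tp=2, fn=1, fp=1, so recall = 2/3 and precision = 2/3
```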
CHAPTER 2. DATA INPUTS
This chapter describes the multiple data inputs utilized in the developed data processing and analysis pipeline for this project.

GEORGIA ROAD NETWORK SHAPEFILE

To assess the geospatial relations of all the utilized datasets related to the transportation network, it is necessary to have a base file of the road network. The first data input is the shapefile of the State of Georgia's road network. This shapefile consists of a series of interconnected polylines representing the centerlines of Georgia roads. The shapefile includes the identification code, geolocation, and width of 205,351 road segments. Figure 1 shows the full ArcGIS layer of the road network shapefile, which includes road segments spanning the state of Georgia. In addition to the visual representation, information is tagged to each segment in the form of a Shapefile Attribute Table with corresponding identifying information for each segment. Figure 2 shows an example of one road segment in the shapefile. Selected for illustrative purposes is Ferst Drive on the Georgia Institute of Technology campus in Atlanta, Georgia. Information about the selected segment includes the polyline shape length, county, road identification code, and direction (increasing/decreasing).
Figure 1. Screenshot. LRSN_GDOT ArcGIS layer (left). Includes road segments spanning across the state of Georgia. Information is tagged to each segment in the form of a Shapefile Attribute Table with identifying information (right).
Figure 2. Map. Shown in cyan is Ferst Drive on the Georgia Tech campus in Atlanta, GA. Selected feature displaying information on the polyline shape length, county, road identification code, and direction (increasing/decreasing).

WEBEOC EXECUTIVE REPORT

To match the vehicle-locating data with identified incidents on the road network leading to potential road blockages, it is necessary to know where and when the road incidents occurred. This information is obtained through the WebEOC Executive Report, which is exported as a spreadsheet. Figure 3 shows an Excel spreadsheet representing historical traffic incident data from January 2016 to September 2021. The report includes state route location, incident description, direction, and the number of lanes passable. For this study, there are approximately 4,000 incidents reported within the Fulton County boundaries.
Figure 3. WebEOC Spreadsheet Data. Important features include Incident Type, Time of Occurrence, and Geolocation (latitude/longitude).
VERIZON NETWORK FLEET GEODATABASE

The vehicle-locating data utilized in the project, providing tracking information for the GDOT-owned vehicles as they travel over the road network, come from the Verizon Network Fleet system installed and operational on the vehicles. These data are output as a Verizon Network Fleet Attribute Table displaying identifying information for each vehicle-locating point. An example of the converted Excel spreadsheet of the GDOT vehicle-tracking information from the Verizon Network Fleet geodatabase is shown in figure 4, which includes vehicle ID, location, time, and ignition status of the vehicle (On/Off). The location and time information are used to create the vehicle tracks over the network. The "Ignition" column indicates whether the vehicle is turned on or off. These data are used to cut the large dataset into individual vehicle route segments, indicating when to end a vehicle route segment in the created function Valid.py. This function is explained further in chapter 4. Vehicles are grouped and identified using the data in the "VIN" column, representing the vehicle identification number. This enables us to identify and locate individual vehicles over the network. Vehicles are tracked at a frequency of one point every 2 minutes until the ignition of the car is turned off.
Figure 4. Spreadsheet data. Verizon Network Fleet Attribute Table displaying vehicle locating point identifying information. The "Ignition" column shows the state of the vehicle being turned on or off. This will indicate when to cease a vehicle route segment for Valid.py, as explained in chapter 4. Vehicles are grouped and identified using the data in the "VIN"
column, representing the vehicle identification number.
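The ignition-based segmentation described above can be sketched as follows. The field names ("VIN", "FixTime", "Ignition") mirror the Network Fleet columns, while the function itself is a simplified stand-in for the project's Valid.py logic, not the actual implementation:

```python
# Simplified sketch of how a stream of vehicle-locating points could be cut
# into route segments: group by VIN, then end a segment whenever the
# ignition turns off. Field names mirror the Network Fleet columns.

def split_into_segments(points):
    """points: list of dicts with 'VIN', 'FixTime', 'Ignition' ('On'/'Off')."""
    segments = {}
    current = {}
    for pt in sorted(points, key=lambda p: (p["VIN"], p["FixTime"])):
        vin = pt["VIN"]
        current.setdefault(vin, []).append(pt)
        if pt["Ignition"] == "Off":           # ignition off closes the segment
            segments.setdefault(vin, []).append(current.pop(vin))
    for vin, pts in current.items():          # flush any still-open segments
        segments.setdefault(vin, []).append(pts)
    return segments

demo = [
    {"VIN": "A", "FixTime": 1, "Ignition": "On"},
    {"VIN": "A", "FixTime": 2, "Ignition": "On"},
    {"VIN": "A", "FixTime": 3, "Ignition": "Off"},
    {"VIN": "A", "FixTime": 4, "Ignition": "On"},
]
segs = split_into_segments(demo)
# VIN "A" yields two segments: three points up to the ignition-off, then one
```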
Vehicle-locating points are restricted to Fulton County and subdivided into 19 separate ArcGIS feature classes. The Verizon Network Fleet data are subdivided to improve the processing time of our user-developed Valid.py function, which creates the vehicle route segments. For the study, the smallest of the feature classes, VLP19, and the most-represented vehicle in VLP18 (VIN 1FTBF2B69HEE49969) are used. Further data could not be included due to computational processing time restrictions. For the study, approximately 3,500 data points are represented.
As the Verizon Network Fleet data span across Georgia, the data need to be extracted for Fulton County, which is chosen for its centrality of vehicle traffic in the state. To initiate this process, a feature class named Counties.gdb containing the shapes of all 159 counties in the state of Georgia is used. To extract the Fulton County shape, the attribute is selected in the Feature Class Attribute Table, and, moving over to the layers in the ArcGIS project, we create a layer via the "Make Layer From Selected Features" function. The selected feature class is then named "Fulton County". The ArcGIS function Clip is then used to create new feature classes containing only the vehicle-locating points from Fulton County. The inputs are the Vehicle Locating Point feature class and the Fulton County selection feature class, with the output named VLP_FC[number of data subdivision]. Here, VLP stands for vehicle-locating points, and FC stands for Fulton County. Figure 5 shows the vehicle-locating points VLP1 transformed into VLP_FC1 using the ArcGIS Clip function by the Fulton County layer. The Clip function input and output are also shown. Figure 6 shows the VLPs that are utilized in the machine learning regression and classification models. The blue data points represent the VLP19 Feature Class, and the red data points represent VIN 1FTBF2B69HEE49969 of VLP18. Figure 6 shows the data that are trimmed to Fulton County for analysis.
Figure 5. Screenshot. VLP1 (left) transformed into VLP_FC1 (center) (FC = Fulton County) via ArcGIS Clip function by Fulton County layer. Clip function input and output
shown on right.
Figure 6. Screenshot. Vehicle-locating points utilized in machine learning regression and classification models. Blue data points representing
VLP19 Feature Class, red data points representing VIN 1FTBF2B69HEE49969 of VLP18. Data trimmed to Fulton County for analysis.
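Conceptually, the Clip step keeps only points that fall inside the county polygon. A minimal ray-casting version of that inside/outside test is sketched below, using a stand-in square rather than the actual Fulton County boundary:

```python
# Sketch of the point-in-polygon test underlying a Clip-style operation.
# The square below is a stand-in for the Fulton County boundary polygon.

def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray extending to the right
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

county = [(0, 0), (10, 0), (10, 10), (0, 10)]   # stand-in boundary
flags = [point_in_polygon(x, y, county) for x, y in [(5, 5), (12, 5)]]
# (5, 5) is inside the polygon; (12, 5) is outside
```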
CHAPTER 3. PREPROCESSING AND DATA MODIFICATION
With the set of data inputs described in chapter 2, certain preprocessing and data-modification operations need to be conducted to properly prepare the data for processing. The purpose of this preprocessing stage is to prepare the workspace within ArcGIS for the later route segmentation and analysis stages. The first step in the data preprocessing is to convert the Verizon Network Fleet Excel files into ArcGIS geopoints. This is done through the function XY Table To Point, where the X field specifies longitude and the Y field specifies latitude. This latitude and longitude information is included in the Network Fleet .csv table. All other columned information is transferred into and associated with each VLP. Figure 7 shows the XY Table To Point function used. The output of this function is a feature class of georeferenced points corresponding with the vehicle-locating data points. Next, a network analysis layer that identifies the Georgia road network must also be set up. This is completed through the creation of a new Network Dataset (ND), which takes as input the LRSN_GDOT layer (i.e., the Georgia Road Network Shapefile). This input, as described in chapter 2, is composed of all the center points of the Georgia road network strung together as separate polylines.
Figure 7. Screenshot. ArcGIS XY Table To Point function. Outputs feature class of georeferenced points using table row information.
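Outside of ArcGIS, the XY Table To Point conversion can be sketched in plain Python. The column names follow the Network Fleet table; the record layout is illustrative, not the actual feature class schema:

```python
# Pure-Python sketch of the XY Table To Point idea: each CSV row with
# Longitude (X) and Latitude (Y) columns becomes a georeferenced point
# record carrying all remaining columns as attributes.

import csv
import io

def xy_table_to_points(csv_text, x_field="Longitude", y_field="Latitude"):
    points = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        x = float(row.pop(x_field))   # X field = longitude
        y = float(row.pop(y_field))   # Y field = latitude
        points.append({"x": x, "y": y, "attributes": row})
    return points

table = "VIN,Longitude,Latitude,Ignition\nA,-84.40,33.78,On\n"
pts = xy_table_to_points(table)
# One point at (-84.40, 33.78) with VIN and Ignition carried as attributes
```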
Additionally, given the large size of the datasets, the data points are trimmed to reduce processing time and remove redundant information. In this preprocessing step, the data are trimmed based on the location of the points relative to the locations of the road segments. Here, we trim the data based on proximity to the road network using the ArcGIS Buffer analysis function. Vehicle-locating data points sufficiently far from the road network indicate that the vehicle is not actually on the road or traveling along a road segment.
A buffer of 30 ft (which covers large highways) is chosen from the center of the road network layer to cover all data points within the road network. Parking lot areas and driveways are examples of data points that occur outside of the buffer; these are trimmed and not included in the analysis. Such cases can be neglected given our objective of vehicle routing and vehicle tracking along road segments. Once points are converted into vehicle routes, additional trimming is conducted, as described in chapter 4. Figure 8 shows a demonstration of the VLPs that are removed by the Buffer function in this data-trimming step. The redundant data points (outlined in red) most likely represent stationary vehicles, given their proximity to each other and distance from the buffered route segment (shown in green). The redundant data points are removed as they do not represent information about vehicles traveling on the road network and, therefore, are not of use in the vehicle route-tracking analysis process to detect blockages along the road network for this project.
Figure 8. Diagram. Demonstration of vehicle-locating points removed by the ArcGIS Buffer function in the data-trimming step.
Redundant data points (outlined in red) most likely represent stationary vehicles due to their proximity to each other and distance from the buffered route segment (shown in green).
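The buffer-based trimming can be sketched as a point-to-segment distance test. Planar coordinates in feet are assumed (the project data would be projected before measuring distances), and the coordinates and road geometry below are hypothetical:

```python
# Sketch of the buffer-based trimming: keep only points within 30 ft of the
# nearest road centerline segment. Planar coordinates in feet are assumed.

def point_segment_distance(px, py, ax, ay, bx, by):
    """Distance from point (px, py) to segment (ax, ay)-(bx, by)."""
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))                 # clamp to the segment
    cx, cy = ax + t * dx, ay + t * dy         # closest point on the segment
    return ((px - cx) ** 2 + (py - cy) ** 2) ** 0.5

def trim_to_buffer(points, road_segments, buffer_ft=30.0):
    return [
        p for p in points
        if any(point_segment_distance(p[0], p[1], *seg) <= buffer_ft
               for seg in road_segments)
    ]

roads = [(0, 0, 1000, 0)]                     # one straight centerline
kept = trim_to_buffer([(500, 10), (500, 200)], roads)
# Only the point 10 ft from the centerline survives the 30-ft buffer
```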
CHAPTER 4. DATA PROCESSING

With the data preprocessed, this chapter describes the functions developed as part of this project to process the data in the created data analysis and processing pipeline. These functions are written in Python to facilitate the interoperability of datasets and use with ArcGIS for the geolocated data. There are two main processing steps, each with an associated Python function. The first is to obtain the desired vehicle routing and incident segments such that they can be overlaid for analysis; this function is called GetSegments.py. The second is to ensure that the vehicle-locating data points are valid and to connect consecutive valid points as nodes to create individual vehicle routing segments; this function is called Valid.py. The first step of processing involves coding a function to retrieve the vehicle routing and WebEOC incident segments for further analysis. This function, GetSegments.py, is enabled by the ArcPy function Segment Along Line. GetSegments.py takes the WebEOC dataset and the Road Network layer as inputs and outputs incident segments. Figure 9 shows the output of this function.
Figure 9. Screenshot. GetSegments.py output.
The second function, Valid.py, utilizes the ArcGIS Network Analyst feature and ArcPy to find connected vehicle-locating points by matching Vehicle ID (VIN) and FixTime to create vehicle segments across the Georgia road network. The function ensures that the VLPs used in the analysis are all valid points belonging to individual GDOT vehicle routes. Valid.py also utilizes a function file named mxFindRoutes that finds the VLPs to be connected and linked together through the Valid.py function. The VLPs are sequential points for a single vehicle as it travels over the network. Each point is separated by a 2-minute time interval. As such, a certain number of points in sequence are needed to construct the full detailed routes of the vehicles. The objective here is to use the individual VLPs to construct a continuous route of the vehicle as it travels over the network. In doing so, there is a tradeoff between the number of consecutive points used (to construct a continuous route) and the computational cost of storing all the points in an increasingly large dataset for processing. Therefore, from an investigation of the data and the typical distance covered between points, up to six points are connected at a time to create a vehicle route segment. mxFindRoutes links these points, up to six in number, to input into Solve and create a vehicle segment with as many nodes. Figure 10 shows an example of six of these VLPs to be connected as nodes to form one polyline (representing a vehicle routing segment) in the Valid.py function. This set of six points can be thought of as a moving window as the vehicle continues its route: the window of six points moves forward along the direction of the vehicle to create new segments as more data points are continuously collected.
Figure 10. Screenshot. Visualization of vehicle-locating points to be connected as nodes to one polyline (representing a vehicle routing segment) in Valid.py.
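The six-point moving window can be sketched as follows (the numeric "points" are placeholders for sequential VLPs):

```python
# Sketch of the six-point moving window used to build route segments:
# as new points arrive for a vehicle, each window of up to six consecutive
# points becomes one candidate polyline segment.

def moving_windows(points, size=6):
    """Yield successive windows of `size` consecutive points; the window
    advances one point at a time as the vehicle's track grows."""
    if len(points) <= size:
        yield list(points)
        return
    for i in range(len(points) - size + 1):
        yield points[i:i + size]

track = list(range(8))              # stand-in for 8 sequential VLPs
windows = list(moving_windows(track))
# 8 points with a 6-point window give 3 overlapping candidate segments
```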
Given the breadth and number of vehicle tracks collected as part of the datasets, Valid.py unintentionally creates certain extraneous segments, including multiple route segments on the same path, some routes contained within others, and segments with no length. Such artifacts are often found when working with field-collected datasets and need to be addressed to ensure that the resulting data points used for analysis are representative of the data intended to be collected, and are accurate and reliable in reflecting vehicle routing actions in the field.
To detect and filter the data, and in particular remove the extraneous routing segments, a custom program is developed in MATLAB to identify such segments (specifically, those where multiple route segments are identified on the same path and those of zero length). Once identified, the rows containing extraneous segments are selected with the Select By Attributes tool and deleted to eliminate their effect on the subsequently developed machine learning model. The specific user codes developed to perform these operations are provided in the appendix.
Finally, given the density and amount of data collected by the vehicle-locating equipment, significant computational time is required to process the datasets. An initial processing and analysis was conducted through the Valid.py function. The function was run for approximately 40 hours, which processed 2,000 data points; essentially, the computer was left over two separate nights to process the dataset. Even with this time, however, the function errored out before completion, and this magnitude of computation time is not feasible for long-term use and analysis. Therefore, to reduce computational times, we created an additional step in the data processing and analysis, which is to iterate by vehicle number to match the correct vehicle segments and complete the code. A list of VINs is compiled, and the input track points to Valid.py are single-vehicle selections (using ArcGIS Make Layer From Selected Features), repeated for each vehicle. Figure 11 shows example VINs in the vehicle-locating data.
Figure 11. Screenshot. Individual vehicle ID numbers.
Because the code iterates through each VIN, a queue of Valid.py runs is made, one per vehicle, which significantly improves the data processing times. For example, the longest of these runs, with 98 vehicle-locating data points, takes 22 minutes to complete. In this case, the code is repeated 68 times, which is the number of vehicles in VLP19 with more than three points recorded in the dataset, plus one additional execution for VIN 1FTBF2B69HEE49969 of VLP18. For the extreme case with the most vehicle-locating data points, the data processing and analysis take approximately 32 hours to successfully iterate through and make route segments from 1,500 data points. The segmentation by vehicle ID both reduces processing time and results in more accurate route segmentations. An example of the vehicle routing output without (left) and with (right) processing by vehicle ID is shown in figure 12.
Figure 12. Screenshot. Output segments from the Valid.py function pre- (left) and post- (right) segmentation by vehicle ID.
To better understand the multiple processes conducted as part of the Valid.py function, and to provide a step-by-step demonstration of its operations, a workflow for Valid.py is provided below.

VALID.PY WORKFLOW
1. Open VLP Feature Class.
2. Run Convert Time Field, using the formatting shown in figure 13. The FixTime column is input as a Text field and must be converted to a Date field, which is the format Valid.py requires.
Figure 13. Screenshot. Convert Time Field.

3. Divide the VLP Feature Class into different layers based on Vehicle ID Number (VIN). A list of the VINs can be found by right-clicking the "VIN" column in the VLP Attribute Table and selecting Summary Statistics. Note: Set the Statistic Type to "Unique" as shown in figure 14 to obtain a count of every vehicle and to create separate routes for each vehicle.
Figure 14. Screenshot. Summary Statistics.

a. Make Layer From Selected Features based on the VIN. Save as "VLP_FC Selection [VIN]".

4. Run Valid.py on each Selected Features layer.
a. Keep output feature class and table names consistent to allow for easy copy-and-paste in the data-trimming step. In the study, we used "route_demo_[VIN]" for the feature class name and "table_demo_[VIN]" for the table name.
5. Convert Feature Class To Shapefile for MATLAB data trimming.
a. Under Environments, as shown in figure 15, ensure that the Output M and Z values are disabled so the object remains two-dimensional. MATLAB cannot process data with M and Z values.
Figure 15. Screenshot. Environments.

6. Run the Repetitive Segment Removal code in MATLAB. The algorithm used is shown in figure 16 below. The objective is to reduce the amount of data stored per route segment such that repetitive segments are removed and only the most relevant route segments are kept. Valid.py uses the FirstStopID value to connect the VLPs into route segments, so we can see where each segment begins and ends. With this information, we can then find when two or more segments have the same FirstStopID, in which case we want to keep the segment with the most points included (in our Valid.py code, six data points). An iterative i,j loop is run over the length of the shapefile to pinpoint when two segments share the same FirstStopID; if StopCount(i) is less than StopCount(j), where StopCount is the number of points captured by the vehicle routing segment, the ith segment is marked to be deleted. Thus, repetitive segments are removed. The code snippet to perform this operation is shown in figure 16.
delete_row = zeros(size(cell,1),1);  % find indices of delete-able rows
for i = 1:size(cell,1)
    for j = 1:size(cell,1)
        if startID(i) == startID(j)          % for startID being the same
            if stopcount(i) < stopcount(j)   % keep the longer segment
                delete_row(i) = 1;
            end                              % else delete_row(i) stays 0
        end
    end
end

RepetitiveSegment_Removal.m

Figure 16. Code. Repetitive segment removal.
a. Copy the output list of deletable indices to be put back into ArcGIS.

7. In ArcGIS, use Select By Attributes.

a. Paste the output of the MATLAB code into the SQL box as a "New Expression". The resulting SQL should appear as shown in figure 17, with "OBJECTID = 1 OR OBJECTID = 2", etc., as the MATLAB function outputs. Once all deletable features from MATLAB are selected in ArcGIS, click Delete Selection.
Figure 17. Screenshot. Select By Attributes.

8. Repeat steps 2–7 for the remaining Vehicle IDs. In the field-collected data, some of the vehicles have very few data points, indicating very short routes or malfunctions in the vehicle location-recording equipment. Vehicles with fewer than four VLPs are omitted, as this is an indication that the data are of lesser quality (with missing data, etc.) and provide less useful information in the vehicle-routing analysis.
9. Merge all route_demo_[VIN] output feature classes from step 8. Use the ArcGIS function Merge.
10. With the merged data, look through the data and delete segments with zero length or excessive missing data using the Select By Attributes function. The result is the full vehicle-locating dataset used for model building and analysis.
CHAPTER 5. MACHINE LEARNING MODELS AND RESULTS
This chapter describes the machine learning models developed in the project, including the variables included in the models, the types of models explored, and the results from creating and running the models with the project datasets.
MODEL VARIABLES

Multiple independent variables are included in the models to account for varying factors affecting potential road network and road blockage incidents, as described below. In addition, to fully utilize the GDOT vehicle-locating data, a detailed comparison of the taken routes (analyzed based on the vehicle-locating data) with optimal routing scenarios is conducted. The process and workflow to analyze and conduct this comparison are also provided.
Independent Variables

Average Annual Daily Traffic (AADT)

The first independent variable included is the average annual daily traffic (AADT) on specific road segments. The amount of traffic on a given road segment affects the likelihood of a potential incident on that segment. GDOT provided road and traffic data that are publicly available for use in this project, such as a shapefile of traffic counts along the Georgia road network (GDOT 2021). Included in these data are geolocated AADT values for given road segments. Figure 18 shows the interface for downloading the GDOT road and traffic data. The most recent numbers, from 2019, are used, as 2021-2022 data have yet to be published. Under the Traffic Data Type, the Spatial Geodatabase is used for this project. Figure 19 shows a visualization of the traffic location data, along with the georeferenced information, including AADT, provided as part of the database and used in this project. Attribute information is later linked based on the Feature ID (FID) closest to a vehicle route segment (representing the road the vehicle is on) using the ArcGIS Merge function. The ArcGIS Near function is used to link the VLPs with this geolocated traffic information. Figure 20 shows the implementation of the ArcGIS Near function. Its inputs are a feature class and a target feature class, and it finds the closest feature of the target feature class. The output written into the VLP Attribute Table is the FID of the closest feature in the traffic data. A search radius of 100 ft is set, which is intended to avoid incorrect closest features being linked to the data. In the case that the closest feature is farther than 100 ft, the FID and AADT for the vehicle-locating point are set to "-1". This is later reassigned to "0", identified using the Select By Attributes function with the SQL "FID = -1" and "New Selection" checked.
Figure 18. Screenshot. GDOT Road and Traffic Data. The most recent numbers from 2019 are used, as 2021 data have yet to be published. Under the Traffic Data Type, the Spatial
Geodatabase is used for this project.
Figure 19. Screenshot. Visualization of Traffic Location Data (left) with georeferenced information (pop-up on right). AADT is used for this study. For our example here,
4,530 vehicles is the AADT for 2019. Attribute information is later linked based on the FID closest to a vehicle route segment (representing the road the vehicle is on) using the ArcGIS Merge function.
Figure 20. Screenshot. ArcGIS Near function. Inputs a feature class and target feature class to find the closest feature of the target feature class. The output is the Feature ID (FID) of
the closest feature in the Traffic Data into the VLP Attribute Table (indicated as route_demo in this example). The search radius is set at 100 ft to avoid incorrect closest features from being linked to the data. In the case that the closest feature is greater than 100 ft, FID and AADT for the Vehicle Locating Point are linked to the value "-1". This is
later reassigned to "0", identified using Select By Attributes function with the SQL "FID = -1", with the checked "New Selection".
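The Near-style linking, including the 100-ft search radius and the "-1" out-of-range value, can be sketched as follows (the feature coordinates and FIDs are hypothetical, and planar distances stand in for ArcGIS's geodesic options):

```python
# Sketch of the Near-style linking: attach each vehicle-locating point to
# the FID (and AADT) of the closest traffic-count feature within a 100-ft
# search radius; points with no feature in range get -1, later reset to 0.

def link_nearest(vlp, traffic_features, radius_ft=100.0):
    """vlp: (x, y); traffic_features: list of dicts with 'FID', 'AADT', 'xy'."""
    best = None
    best_dist = None
    for feat in traffic_features:
        fx, fy = feat["xy"]
        d = ((vlp[0] - fx) ** 2 + (vlp[1] - fy) ** 2) ** 0.5
        if d <= radius_ft and (best_dist is None or d < best_dist):
            best, best_dist = feat, d
    if best is None:
        return {"FID": -1, "AADT": -1}   # out of range; reassigned to 0 later
    return {"FID": best["FID"], "AADT": best["AADT"]}

features = [{"FID": 7, "AADT": 4530, "xy": (0, 0)}]
near = link_nearest((30, 40), features)    # 50 ft away, so it links
far = link_nearest((300, 400), features)   # 500 ft away, so no match
```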
Fulton County Weather Data
As weather conditions often have a significant impact on the likelihood of road incidents and potential road blockages, weather data are included in the analysis. Fulton County-specific weather data are utilized in creating the machine learning models for this study. These data are provided in downloadable format by Weather Underground (2021). Precipitation (inches) and temperature are both included as variables in the model. Figure 21 shows the weather-related data provided by Weather Underground via a location-specific data table.
Figure 21. Screenshot. Weather Underground Data for Fulton County, June 2021. Daily Temperature, Dew Point, Precipitation, and Wind Speed values are included in the data table.
Optimal Route Length Difference

Key to the investigations performed in this project is the detailed analysis of the vehicle routes identified from the vehicle-locating data collected from the GDOT vehicles traveling over the road network. We consider the difference between the actual routes taken by the GDOT vehicles (identified by the vehicle-locating data) and the determined optimal routes between points in the network, where optimality is measured by the shortest route length. We analyze these differences by identified vehicle route segment. To compare the taken routes and optimal routes, we first determine the optimal routes; then, we examine the difference in length between the two routes.
To identify and create the optimal route segments, the start and end points are taken for each vehicle segment and input into the Network Analyst as "Stops" using the Import Stops function. We then use the function Solve to create the optimal vehicle segments. As the ArcGIS Feature Compare function cannot be applied to polylines, only points, we use the length difference between the optimal route and the taken route as the model variable. Figure 22 shows the difference between an optimal route and an actual route between a given start point and end point. In the example, a case of suboptimal routing is observed, i.e., the optimal (left) and actual (right) routes are different. We consider that the vehicle may be taking a suboptimal route due to a road blockage on the optimal route, thus requiring a rerouting decision for the vehicle. The amount of rerouting may also be associated with the likelihood of road blockage; therefore, a larger difference in routes should correlate with a higher probability of road blockage. Figure 23 shows an example of the Solve function output in ArcGIS. Many vehicle routes and vehicle segments are shown. Each purple line represents an optimal path, with the 1's representing the starting points and the 2's representing the ending points of each vehicle route segment. For illustration, one specific optimal path for one particular vehicle route segment is highlighted in cyan.
Figure 22. Screenshots. Visualization of suboptimal routing between the start and end point. As displayed on Google Maps (for demonstration purposes; ArcGIS is used in the
project), the left shows the quickest route, which involves traveling due north 2 miles; actual GDOT vehicle route is shown on the right.
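The route-length comparison can be sketched with a toy shortest-path computation. Dijkstra's algorithm here stands in for the ArcGIS Solve step, and the graph, edge lengths, and taken-route length are hypothetical:

```python
# Sketch of the optimal-route comparison: compute the shortest-path length
# between a segment's start and end nodes (Dijkstra on a toy graph stands
# in for ArcGIS Network Analyst Solve), then subtract it from the length
# of the route the vehicle actually took.

import heapq

def shortest_path_length(graph, start, end):
    """graph: {node: [(neighbor, edge_length), ...]}."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == end:
            return d
        if d > dist.get(node, float("inf")):
            continue
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return float("inf")

roads = {"A": [("B", 1.0), ("C", 3.0)], "B": [("C", 1.0)], "C": []}
taken_length = 3.0                               # vehicle drove the direct A-C edge
optimal = shortest_path_length(roads, "A", "C")  # 2.0, via B
length_diff = taken_length - optimal
# A positive difference (1.0 here) flags a suboptimal, possibly rerouted, trip
```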
Figure 23. Screenshot. ArcGIS Solve function output. Each purple line, with one highlighted in cyan as a demonstration, represents an optimal path. 1's represent the starting point and 2's represent the ending point of a vehicle route segment.

As identification and comparison with the optimal route is key to the analysis of the vehicle route segments and the use of the vehicle-locating data, the optimal route workflow is now described in more detail. To create the optimal route from the Valid.py vehicle-locating points layer, we must first make a separate layer of the start and end points for each route segment. With this separate layer, we can then run Solve similar to Valid.py, except we use the built-in ArcGIS code rather than our developed code to find the output. Once this step is completed, we compare the actual taken route and the optimal route by using the taken route segments (route_demo and merge_vlp19) and the optimal segments (from Solve) to create an Optimal Route Length Difference field. The output is the length difference between the taken route and the optimal route.

Optimal Route Workflow
1. Create a layer of start and end points for each route segment.
   a. Use the ArcGIS function Feature Vertices To Points.
      i. Select "Both start and end vertex" for Point Type, as shown in figure 24.
Figure 24. Screenshot. ArcGIS Feature Vertices To Points function.
2. Delete extraneous points created from Feature Vertices To Points.
   a. Extraneous points are created for route segments with more than two vehicle-locating points present, as the segments themselves are the merging of multiple vehicle route segments. Figure 25 shows the display of selected points from this function.
Figure 25. Screenshot. Feature Vertices To Points outputting more than two points for the Start and End Point Layer. Selected points in the attribute table (left) are shown geographically (right).
   b. Convert the Start/End Point Layer to a shapefile for MATLAB identification of extraneous points. MATLAB can read shapefiles, but not geodatabase feature classes.
      i. Use the function Feature Class To Shapefile.
   c. Identify points using the user-developed MATLAB algorithm to keep the sequential first and last points of the start and end points for each route segment. This way, only one optimal route will be created between the start and end points. The code to perform this operation is shown in figure 26.
function [msg] = startend_removal(filename)
% read shapefile
S = shaperead(filename); % demo: 'merge_vlp19_startend.shp'
% transpose the cell array to keep formatting consistent from ArcGIS to MATLAB
C = transpose(struct2cell(S));
id = C(:,5);
route_id = cell2mat(id);
% initialize indices of deletable rows
delete_row_less = zeros(size(C,1),1);
delete_row_more = zeros(size(C,1),1);
% mark rows whose index is less than that of another row with the same route ID
for i = 1:size(C,1)
    for j = 1:size(C,1)
        if route_id(i) == route_id(j) % same route
            if i < j
                delete_row_less(i) = 1;
            end
        end
    end
end
% mark rows whose index is greater than that of another row with the same route ID
for i = 1:size(C,1)
    for j = 1:size(C,1)
        if route_id(i) == route_id(j) % same route
            if i > j
                delete_row_more(i) = 1;
            end
        end
    end
end
indices = zeros(size(C,1),1);
% intersect the two masks to get the in-between vertices
for k = 1:size(C,1)
    if delete_row_less(k) == 1 && delete_row_more(k) == 1
        indices(k) = k;
    end
end
indices(indices==0) = []; % remove all zeros from the array
indices = string(indices);
% format the function output to be pasted into the ArcGIS "Select By Attributes" SQL box
SQL_msg = 'OBJECTID = ' + indices + ' OR ';
msg = sprintf('%s', string(SQL_msg));
end

MATLAB startend_removal.m
Figure 26. Code. Code to delete extraneous points.
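For readers without MATLAB, the same first/last filtering can be sketched in plain Python. This is an illustration only, under the assumption (as in the workflow above) that points are listed sequentially per route ID; the function name and example IDs are hypothetical.

```python
# Keep only the first and last point for each route ID, so that a single
# optimal route is created between them; all in-between points are flagged
# for deletion.

def startend_indices_to_delete(route_ids):
    """Return 1-based indices of in-between points (neither the first
    nor the last occurrence of their route ID)."""
    first, last = {}, {}
    for i, rid in enumerate(route_ids, start=1):
        first.setdefault(rid, i)
        last[rid] = i
    return [i for i, rid in enumerate(route_ids, start=1)
            if first[rid] != i and last[rid] != i]

# Hypothetical route IDs for five sequential points of two segments.
ids = [7, 7, 7, 9, 9]
to_delete = startend_indices_to_delete(ids)
print(to_delete)  # [2] -> only the middle point of route 7 is extraneous

# The SQL string pasted into Select By Attributes can then be built as:
sql = " OR ".join("OBJECTID = {}".format(i) for i in to_delete)
```

Unlike the O(n^2) double loop in the MATLAB script, this version makes a single pass over the points, which matters for larger datasets.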
   d. Copy and paste the startend_removal.m output into the SQL box of the ArcGIS Select By Attributes function. This output will look similar to that of step 7 of the Valid.py Workflow. Once all extraneous features are selected, use the Delete Selection function. Make sure all points are properly selected to avoid deleting important data in the dataset.
3. Run Make Route Analysis Layer to access the Network Analyst toolbox (in this case, it is named route_demo).
4. Import Stops, where the stops are your edited Start/End Point Layer.
5. Connect the points so that the Solve function can create separate route segments.
   a. When Stops are imported initially, they are not given a value in the "Route Name" field. This is the identifying field that helps the Solve algorithm connect points when creating a route segment through the Road Network Dataset (LRSN_GDOT). Since the points are listed sequentially and every route now has two points representing the start and end point, we must fill in the "Route Name" field so that each pair of points has a separate and unique value for every route. The result is that Object IDs 1 and 2 will be linked, Object IDs 3 and 4 will be linked, Object IDs 5 and 6 will be linked, etc.
      i. On the Attribute Table, click Calculate Field. Input the function RouteName = math.ceil(!Sequence!/2), where !Sequence! represents the ObjectID of the Stops Attribute Table. An example of the Stops Attribute Table is shown in figure 27. math.ceil is a Python function that rounds a decimal up to the nearest integer. For example, Sequence = 3 and Sequence = 4 both yield RouteName = 2, since math.ceil(3/2) and math.ceil(4/2) both equal 2. This step links the sequence of points into specific vehicle routes.
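The pairing rule can be checked in plain Python:

```python
import math

# Sequential stop IDs are collapsed two-at-a-time so that each
# start/end pair of points shares one route name.
def route_name(sequence):
    return math.ceil(sequence / 2)

print([route_name(s) for s in range(1, 7)])  # [1, 1, 2, 2, 3, 3]
```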
Figure 27. Screenshot. Stops Attribute Table.
6. Execute Solve or Run in Network Analyst. In route_demo, the "Routes" layer will be filled in. "Sort Ascending" based on "FirstStopID" and export to another Feature Class to continue processing on the routes; the function that enables this is Feature Class To Feature Class. Name the resulting feature class "optimal_route_vlp[ ]".
7. Create matching count arrays in both the Optimal Route and VLP Route layers. We name this field "num"; it is a sequential numbering of each object in the attribute table, 1:(last ObjectID of the table). The output is a sequence of numbers up to the number of objects, i.e., 1, 2, 3, 4, 5, ..., [num. objects].
   a. In each attribute table, enter Calculate Field. Enter this as the Field Type: Short (small integer).
      i. For the Optimal Route Layer, script: Num = !Name!
      ii. For the VLP Route Layer, script:
Num = autoIncrement()
Code Block:
rec = 0
def autoIncrement():
    global rec
    pStart = 1
    pInterval = 1
    if rec == 0:
        rec = pStart
    else:
        rec += pInterval
    return rec
8. Join the Optimal and VLP Route Attribute Tables via the Add Join function, as shown in figure 28. Important: use the Validate Join button to ensure all records are matched together, as unmatched rows could be deleted if not checked.
   a. Input Table: VLP Route [ ]
   b. Join Table: optimal_route_vlp[ ]
   c. Input Join Field/Join Table Field = num
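The logic of this join step can be sketched in plain Python. The field names and values below are hypothetical; ArcGIS Add Join is the actual mechanism used in the workflow.

```python
# Match each VLP route record to its optimal-route record on the shared
# sequential "num" key, so both route lengths end up in the same row.

vlp_routes = [
    {"num": 1, "vlp_length": 5321.7},
    {"num": 2, "vlp_length": 2104.3},
]
optimal_routes = [
    {"num": 1, "optimal_length": 4890.2},
    {"num": 2, "optimal_length": 2104.3},
]

optimal_by_num = {r["num"]: r for r in optimal_routes}
# Unmatched rows are dropped here, which is why the text recommends the
# Validate Join check before running Add Join.
joined = [{**v, **optimal_by_num[v["num"]]}
          for v in vlp_routes if v["num"] in optimal_by_num]
print(joined[0])
```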
Figure 28. Screenshot. Add Join.
9. To complete the optimal route workflow, as we now have the information for the taken route and the optimal route in the same row, we can create a field in the VLP table that compares the two route segments. In our approach, we use the difference of segment lengths as an indication of how suboptimal a route is. For example, for a vehicle route segment that has taken an optimal path, the two routes should be identical and therefore length_difference should equal 0. A suboptimal route should differ significantly in length from the optimal route. The result is the difference in length between the taken route and the optimal route. To calculate the difference between the two routes:
   a. In the attribute table, add a new field named length_difference, as a double.
   b. Use the function Calculate Field:
length_difference = math.fabs(![VLP Shape Length]! - ![Optimal Shape Length]!)
      i. math.fabs is a Python function that returns the absolute value of an input.
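As a quick illustration of the calculation with made-up segment lengths:

```python
import math

# Hypothetical segment lengths (values invented for illustration).
vlp_shape_length = 5321.7       # length of the taken route
optimal_shape_length = 4890.2   # length of the solver's optimal route

# Same expression as the Calculate Field script above.
length_difference = math.fabs(vlp_shape_length - optimal_shape_length)
print(length_difference)  # approximately 431.5
```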
Dependent Variable: WebEOC Incident Presence

Finally, the objective is to use the independent variables described above to predict road blockages in the network, in this case measured by road incidents as recorded by WebEOC. Thus, we use the WebEOC incident data as our model output, with the presence of an incident being the binary dependent variable for prediction. We build regression and classification models for the datasets, split on the binary dependent variable between two time frames: 1 week and 1 month. These time frames are selected to ensure sufficient data (i.e., sufficient numbers of individual data points) for training and testing the models. If a vehicle route segment is within 100 ft of a WebEOC incident within the time frame, a "1" is given for the presence of the traffic incident. A "0" represents the vehicle route segment not being in the presence of a traffic incident under the same constraints. The WebEOC incidents in the dataset within these time frames are shown in figure 29. Note that with an increase in data and processing capabilities, the model should be trained on the incident being within an hour or less of the vehicle driving by, to be able to pinpoint when a traffic incident has occurred and to support application as a real-time monitoring system. This study focuses on a particular subset of the data, with the scope focused on Fulton County and the longer time frames, to demonstrate the model's feasibility and applicability.
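The labeling rule can be sketched in plain Python. The helper function, timestamps, and distances below are hypothetical; in the project the spatial test is performed in ArcGIS against the WebEOC incident layer.

```python
from datetime import datetime, timedelta

def label_segment(seg_time, incidents, window):
    """Label a vehicle route segment 1 if any incident is within 100 ft
    and within the time window, else 0.

    incidents: list of (incident_time, distance_ft_from_segment).
    """
    return int(any(
        abs((seg_time - t).total_seconds()) <= window.total_seconds()
        and dist_ft <= 100.0
        for t, dist_ft in incidents
    ))

# Made-up example data for illustration.
seg_time = datetime(2019, 7, 4, 12, 0)
incidents = [
    (datetime(2019, 7, 2, 9, 30), 80.0),    # 80 ft away, ~2 days earlier
    (datetime(2019, 7, 3, 15, 0), 250.0),   # close in time but too far away
]

print(label_segment(seg_time, incidents, timedelta(weeks=1)))  # 1
print(label_segment(seg_time, incidents, timedelta(hours=1)))  # 0
```

Shrinking the window (e.g., to an hour) is exactly the refinement the text proposes for a real-time system.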
Figure 29. Screenshots. WebEOC incidents within 1 week of data timestamps (left, in green) and WebEOC incidents within 1 month (right, in pink).
MODEL RESULTS

We explore the creation and performance of two types of machine learning models for this work: a regression model and a classification model. In particular, to analyze the results of our models, we use and compare two specific methods: (1) ordinary least squares (OLS) regression (Generalized Linear Regression, Continuous model type) implemented in ArcGIS, and (2) a classification decision tree implemented in MATLAB using the Statistics and Machine Learning Toolbox. Fit is assessed using the R2 measure for the regression model and using a confusion matrix giving the number of correctly and incorrectly classified instances for the classification model. Table 1 and table 2 show the model results for the WebEOC incidents within 1 week and 1 month, respectively, for the regression model.
Table 1. Regression model results for WebEOC 1 week.

Variable             Coefficients   Standard error   t-Statistic   Probability
Length_Difference    0              0                27.0804       0
Temperature          0.0008         653.339          0             1
Precipitation        0.014          26617.5168       0             1
AADT                 0              0                8.2446        0
Intercept            -0.0129        0.0034           -3.7491       0.0002

GLR Diagnostics

Property                                           Value
Multiple R-Squared                                 0.194
Adjusted R-Squared                                 0.1936
Akaike's Information Criterion (AIC)               -2102.1726
Akaike's Information Criterion corrected (AICc)    -2102.1726

Table 2. Regression model results for WebEOC 1 month.

Variable             Coefficients   Standard error   t-Statistic   Probability
Length_Difference    0              0                31.8693       0
Temperature          0.003          829.6213         0             1
Precipitation        -0.002         33799.3864       0             1
AADT                 0              0                12.4488       0
Intercept            -0.0248        0.0044           -5.6609       0

GLR Diagnostics

Property                                           Value
Multiple R-Squared                                 0.3499
Adjusted R-Squared                                 0.3496
Akaike's Information Criterion (AIC)               1402.5739
Akaike's Information Criterion corrected (AICc)    1402.5739

Our 1-month regression model yields a larger R2 value than that of the 1-week model, indicating better fit and predictive power. Approximately 35 percent of the variation in diagnosing the presence of a traffic incident can be explained by our model's four independent variables. The correlation coefficient of 0.59 represents a medium-to-strong correlation between the dependent and independent variables. The large t-statistics of AADT and Optimal Route Length Difference indicate that these variables are significant in predicting the outcome of the dependent variable, i.e., in predicting the presence of a road blockage. The weather variables, Precipitation and Temperature, are shown not to have a significant effect on the dependent variable, which can be attributed to a lack of diversification in the data (as only three days out of the year are represented in the vehicle-locating dataset), statistical insignificance, or a combination of the two. For the significant variables, the significance of the Optimal Route Length Difference variable indicates that the vehicle-locating data can be used as a significant predictor of road blockages in the road network. This is a promising result for the objectives of this study to investigate the utility and use of GDOT vehicle-locating data to perform the assessment of the state of the road network, including the detection of potential road blockages.
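As a consistency check, the 0.59 correlation coefficient cited here follows from the 1-month model's multiple R-squared of 0.3499 in table 2, since the multiple correlation coefficient is the square root of R2:

```python
import math

# Multiple correlation coefficient from the 1-month model's R-squared.
r = math.sqrt(0.3499)
print(round(r, 2))  # 0.59
```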
Next, results from the decision tree classification model are shown. Figure 30 shows the resulting classification decision tree from the built model. Each branch in the model is shown, with the tree branching by nodes. Each node represents a predictive decision made by the model to arrive at an estimate for whether or not a route blockage is present. End nodes (leaves) represent these binary predictions. In this model, x1 = Optimal Route Length Difference, x2 = Daily High Temperature, x3 = Precipitation, and x4 = AADT.
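For intuition, a classification decision tree of this form reduces to nested threshold tests on x1 through x4. The sketch below is illustrative only: the thresholds are invented and do not reproduce the tree that MATLAB learned in figure 30.

```python
# Hand-rolled two-level decision tree of the same form as figure 30.
# All thresholds are made up for illustration.

def predict_blockage(x1, x2, x3, x4):
    """x1: optimal route length difference (ft), x2: daily high temp (F),
    x3: precipitation (in), x4: AADT. Returns 1 (no blockage) or 2 (blockage)."""
    if x1 < 500.0:        # routing is near-optimal
        return 1          # class 1: no blockage
    if x4 >= 20000:       # large detour on a high-volume road
        return 2          # class 2: blockage
    return 1

print(predict_blockage(x1=50.0, x2=85, x3=0.0, x4=15000))    # 1
print(predict_blockage(x1=2000.0, x2=85, x3=0.0, x4=30000))  # 2
```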
Figure 30. Screenshot. Classification decision tree visualized in MATLAB. Each node represents a predictive decision made by the model to arrive at an estimate for whether or not a route blockage is present. End nodes (leaves) represent these binary predictions. In
this model, x1 = Optimal Route Length Difference, x2 = Daily High Temperature, x3 = Precipitation, x4 = AADT.
With the classification decision tree model built, we can arrive at results for accuracy in the classification prediction. Figure 31 shows the confusion matrix results for the 1-week and 1-month classification models. In the confusion matrices, both the True Class from the datasets and the Predicted Class from the models are shown. Class 1 indicates no-blockage is present; Class 2 indicates a route blockage is present.
Figure 31. Screenshots. Confusion matrices for 1-week and 1-month classification models, with row summaries (right of each matrix) also shown.
In figure 31, the number of data points falling in each category based on True Class and Predicted Class is shown. If we take the presence of a route blockage as the "positive" class, and no route blockage present as the "negative" class, then True Class = 1 and Predicted Class = 1 indicates a true negative (TN), upper left in the confusion matrix; True Class = 1 and Predicted Class = 2 indicates a false positive (FP), upper right in the confusion matrix; True Class = 2 and Predicted Class = 2 indicates a true positive (TP), lower right in the confusion matrix; and True Class = 2 and Predicted Class = 1 indicates a false negative (FN), lower left in the confusion matrix.
From these values, we can calculate the accuracy and performance of the classification model. In particular, we are interested in the recall and precision of the models. Recall indicates the ability of a classification model to identify the data points in a relevant class and is calculated as TP / (TP + FN). Precision, on the other hand, indicates the ability of a classification model to return only the data points in a class and is calculated as TP / (TP + FP).
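These definitions can be written directly in Python. The confusion-matrix counts below are made up for illustration; the study's actual counts are shown in figure 31.

```python
# Recall and precision from confusion-matrix counts, with the presence of
# a route blockage taken as the positive class.

def recall(tp, fn):
    return tp / (tp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

tp, fn, fp = 92, 8, 8  # illustrative counts only
print(round(100 * recall(tp, fn), 1))     # 92.0
print(round(100 * precision(tp, fp), 1))  # 92.0
```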
In this case, for the 1-week model, Recall = 82.1 percent and Precision = 88.3 percent. The 1-month model yields stronger results, with Recall = 92.0 percent and Precision = 92.4 percent. Precision is particularly important in this analysis, as the interpretation of the results is in correctly deducing that there is a blockage present. Both of the classification decision tree models perform well in this area, in terms of using the data to arrive at a classification decision with high precision. Finally, we can assess the accuracy of the models by looking at the percentage of correctly classified instances in the no-blockage class. For this class, the models perform even better, with 99.3 percent accuracy for the 1-week model and 98.9 percent accuracy for the 1-month model in detecting cases with no blockages. The results of this analysis are equally important for knowing which routes are passable across the road network.
CHAPTER 6. DISCUSSION AND CONCLUSIONS
This project explores the ability to use GDOT vehicle-locating data to assess the state of the road network in Georgia, including identifying road blockages along different segments of the transportation system. The goal is to determine if we are able to utilize data that are currently being collected to perform this network assessment. The novelty is in using a different data source than has been used or explored in the past, specifically mobile vehicle-locating data collected from GDOT-owned vehicles, rather than using stationary data sources such as loop detectors, traffic cameras, or traffic monitoring stations; or public crowdsourced data sources that rely on third parties for data collection and curation.
Through the course of the project, we made several discoveries. First, the data are crucial to the ability to create such a system. The raw dataset of vehicle-locating data is large and, in many cases, messy, with cases of missing data, zero-length data segments, and redundant route segments. Through multiple data trimming and processing methods developed and implemented using ArcGIS-specific Python algorithms, this initially large dataset is made into a usable format to run machine learning models to see the importance of multiple variables, including the vehicle-locating data and associated routing decisions, on the likelihood of road blockage detection. The steps for transforming the data that have been established as part of this project are described in detail and are reliable and repeatable methods that can be implemented with new datasets.
Second, to utilize the vehicle-locating data, we create a workflow to enable comparison of the vehicle routes with optimal routes to detect suboptimal routing decisions that may be indicative
of blockages in the road network. This requires multiple steps in the workflow, including the creation of vehicle route segments based on the individual vehicle-locating data points, the linking of segments into routes, the identification of optimal routes between these points, and then the comparison of distances between the actual taken routes (processed from the vehicle-locating data points) and the optimal routes to detect the degree of suboptimal routing and its association with the likelihood of the presence of a road blockage.
Finally, to use this vehicle routing information to assess the state of the road network, we create machine learning models with multiple variables as input to detect the presence of a road blockage. While both regression-based and classification-based models are explored, the classification model, in particular, performs well for this task. We demonstrate that the created models are able to detect road blockages with high accuracy. Specifically, the decision tree classification model is able to detect road blockages with up to 92.0 percent recall and 92.4 percent precision. In addition, the accuracy for the no-blockage class is up to 99.3 percent, indicating utility of the system for distinguishing between blockage and no-blockage cases using a combination of weather, traffic volume, and--of particular interest for this project--vehicle-locating information.
As different transportation-related data sources emerge, there is the opportunity to leverage these data sources for monitoring of the conditions of a transportation network. Rather than relying on external third-party data, this project explores the use of GDOT-collected data for this purpose. In addition, it focuses on the use of data that are already currently being collected, demonstrating the utility of these data in performing road network assessment without the need to invest in new technologies, dedicate additional resources, or implement new instrumentation or infrastructure.
Through the use of multiple data processing methods combined with machine learning approaches, we show how the vehicle-locating data can be used to perform network assessment and detection of blockages in the road network.
CHAPTER 7. RECOMMENDATIONS
With the results from this project, we offer several recommendations for future work and for expanding the scope of the study outcomes.
The first recommendation relates to data processing and computational times. Given the density and amount of data processed, significant computational times are required to process the datasets, and particularly the vehicle-locating data. Datasets on the order of thousands of points take many hours to process, analyze, and complete. To address this, a subset of the data was used for analysis. Specifically, one of 19 subdivisions of the vehicle-locating data was used, and the study focused on data in Fulton County, chosen for its centrality of vehicle traffic in the state and the number of incidents reported in the area. In addition, multiple analysis tools were written to trim, process, and reduce the amount of data identified as relevant for use in the analysis. Even for this dataset, the computational times required were significant. While the detailed level of analysis conducted in this study, and therefore the required large computational times, is desired for this work--to be able to look at individual vehicle tracks at the micro level--further computational solutions will enable more vehicle-locating points to be used for the analysis.
The second recommendation is to expand the analysis to increased datasets. Such analyses and tests will increase confidence in the model outcomes and system results for use and integration with other GDOT platforms, such as through the Transportation Management Center, to increase situational awareness for GDOT of the state of the network at any point in time. The structure of the current system developed through this project that uses existing systems and datasets such as WebEOC will facilitate this system integration. It is noted that while the processing of large
historical datasets is computationally intensive, as previously described, the processing of any two consecutive vehicle-locating data points (separated by 2 minutes) can be accomplished in less than 2 minutes, supporting the desired real-time outcomes of the project. The results shown in this study support potential real-time implementation of this system, provided sufficient processing capability is present. ArcGIS would work through a large subset of data for training, but if capable of turning real-time data into points strung together through the Valid.py and Solve algorithms, the model could learn the signs that point toward the presence of a road blockage. The classification model will be most important for learning these indicators as seen through a decision tree, while the OLS regression will be important for assessing the strength of the model and the regression coefficient weighting. The number of independent variables input to the model can also be expanded to improve model predictability and performance. All of these activities can build on the framework created and developed through this project and discussed in this report.
APPENDIX
ARCGIS/ARCPY PYTHON GLOSSARY

Below are the definitions of the various functions used in ArcGIS and ArcPy as part of the ArcGIS and Python functions and codes that have been developed in this project.
a) Add Join
arcpy.management.AddJoin(in_layer_or_view, in_field, join_table, join_field, {join_type}, {index_join_fields})
b) Buffer
arcpy.analysis.Buffer(in_features, out_feature_class, buffer_distance_or_field, {line_side}, {line_end_type}, {dissolve_option}, {dissolve_field}, {method})
c) Calculate Field
arcpy.management.CalculateField(in_table, field, expression, {expression_type}, {code_block}, {field_type}, {enforce_domains})
d) Clip
arcpy.analysis.Clip(in_features, clip_features, out_feature_class, {cluster_tolerance})
e) Convert Time Field
arcpy.management.ConvertTimeField(in_table, input_time_field, {input_time_format}, output_time_field, {output_time_type}, {output_time_format})
f) Delete Selection
arcpy.management.DeleteFeatures(in_features)
g) Feature Compare
arcpy.management.FeatureCompare(in_base_features, in_test_features, sort_field, {compare_type}, {ignore_options}, {xy_tolerance}, {m_tolerance}, {z_tolerance}, {attribute_tolerances}, {omit_field}, {continue_compare}, {out_compare_file})
h) Feature Class To Feature Class
arcpy.conversion.FeatureClassToFeatureClass(in_features, out_path, out_name, {where_clause}, {field_mapping}, {config_keyword})
i) Feature To Point
arcpy.management.FeatureToPoint(in_features, out_feature_class, {point_location})
j) Feature Class To Shapefile
arcpy.conversion.FeatureClassToShapefile(Input_Features, Output_Folder)
k) Generalized Linear Regression (GLR)
arcpy.stats.GeneralizedLinearRegression(in_features, dependent_variable, model_type, output_features, explanatory_variables, {distance_features}, {prediction_locations}, {explanatory_variables_to_match}, {explanatory_distance_matching}, {output_predicted_features})
l) Make Route Analysis Layer (Import Stops, Run, Routes)
arcpy.na.MakeRouteAnalysisLayer(network_data_source, {layer_name}, {travel_mode}, {sequence}, {time_of_day}, {time_zone}, {line_shape}, {accumulate_attributes}, {generate_directions_on_solve}, {time_zone_for_time_fields}, {ignore_invalid_locations})
m) Merge
arcpy.management.Merge(inputs, output, {field_mappings}, {add_source})
n) Segment Along Line
arcpy.segmentAlongLine(start_measure, end_measure, {use_percentage})
o) Select By Attributes
arcpy.management.SelectLayerByAttribute(in_layer_or_view, {selection_type}, {where_clause}, {invert_where_clause})
p) Solve
arcpy.na.Solve(in_network_analysis_layer, {ignore_invalids}, {terminate_on_solve_error}, {simplification_tolerance}, {overrides})
q) Summary Statistics
arcpy.analysis.Statistics(in_table, out_table, {statistics_fields}, {case_field})
r) XY Table To Point
arcpy.management.XYTableToPoint(in_table, out_feature_class, x_field, y_field, {z_field}, {coordinate_system})
USER CODE
The below functions and descriptions apply to the user-generated processing codes. Included are the GetSegments, mxFindRoutes (as part of the Valid.py Function File), and Valid.py codes.
GetSegments Code
import arcpy
import pandas as pd

# get parameters from the toolbox interface
tbl_Segments = arcpy.GetParameterAsText(0)
field_route_name_tbl = arcpy.GetParameterAsText(1)
field_start_position = arcpy.GetParameterAsText(2)
field_end_position = arcpy.GetParameterAsText(3)
field_last_updated = arcpy.GetParameterAsText(4)
lyr_route = arcpy.GetParameterAsText(5)
field_route_name_lyr = arcpy.GetParameterAsText(6)
workspace_output = arcpy.GetParameterAsText(7)
result_route_name = arcpy.GetParameterAsText(8)

# read segment table from the csv file
df_segment_position = pd.read_csv(tbl_Segments)

# detect the workspace type
dec_workspace = arcpy.Describe(workspace_output)
type_workspace = dec_workspace.workspaceType

# if workspace is a folder, export a shapefile (.shp)
if type_workspace == "FileSystem":
    result_route_name = result_route_name + ".shp"

result_route_layer_path = workspace_output + "\\" + result_route_name

# get spatial reference of the route layer
spRf = arcpy.Describe(lyr_route).spatialReference

# create segments feature class
arcpy.management.CreateFeatureclass(workspace_output, result_route_name,
                                    geometry_type="POLYLINE", spatial_reference=spRf)
arcpy.management.AddField(result_route_layer_path, "Route_name", "TEXT", None, None, 100)
arcpy.management.AddField(result_route_layer_path, "S_Position", "DOUBLE")
arcpy.management.AddField(result_route_layer_path, "E_Position", "DOUBLE")
arcpy.management.AddField(result_route_layer_path, "LastUpdated", "Date")

# insert segments into segments feature class
in_Cur = arcpy.da.InsertCursor(result_route_layer_path,
                               ["SHAPE@", "Route_name", "S_Position",
                                "E_Position", "LastUpdated"])

# no. of routes for counting and defining the progressor
n_route = df_segment_position.count()[0]
arcpy.SetProgressor("Step", "processing....", 0, n_route, 1)
i = 0

# loop over rows in the segments table; get each segment from the road network
for index, row in df_segment_position.iterrows():
    route_name = row[field_route_name_tbl]
    start_position = row[field_start_position]
    end_position = row[field_end_position]
    last_updated = row[field_last_updated]
    i = i + 1
    arcpy.SetProgressorPosition()
    arcpy.SetProgressorLabel("processing " + route_name + "...... " +
                             "{}/{}, {:.1f}%".format(i, n_route, i * 100.0 / n_route))
    s_cur_route = arcpy.da.SearchCursor(lyr_route, ["Shape@", field_route_name_lyr],
                                        "{0} = '{1}'".format(field_route_name_lyr,
                                                             route_name))
    try:
        s_c_route = s_cur_route.next()
        segments_shp = s_c_route[0].segmentAlongLine(start_position, end_position)
        in_Cur.insertRow((segments_shp, route_name, start_position,
                          end_position, last_updated))
        arcpy.AddMessage("{}: start from {}, end at {}, last updated {}, OK".format(
            route_name, start_position, end_position, last_updated))
    # if the route in the csv file has no corresponding name in the route
    # network layer, report "NotOK"
    except StopIteration:
        arcpy.AddMessage("{}: start from {}, end at {}, last updated {}, NotOK".format(
            route_name, start_position, end_position, last_updated))
        continue

del in_Cur
arcpy.ResetProgressor()
GetSegments Code
mxFindRoutes Code
import arcpy
import os
# to export direction information using the "AddLocation" method, xml format should be used
import xml.dom.minidom as xmld

class FindRoutes():

    def __init__(self, track_points, track_points_name, order_field, ignition_field,
                 network_NAlyr, output_db_path, output_routes_name,
                 output_statistical_tbl_name):
        self.track_points = track_points
        self.track_points_name = track_points_name
        self.order_field = order_field
        self.ignition_field = ignition_field
        self.network_NAlyr = network_NAlyr
        self.output_db_path = output_db_path
        self.output_routes_name = output_routes_name
        self.output_statistical_tbl_name = output_statistical_tbl_name
        self.lyr_tem = arcpy.MakeFeatureLayer_management(self.track_points, "layer_tem")

        self.dic_points_name = {}
        self.dic_points_ignition = {}
        self.dic_points_counter = {}
        self.dic_points_fixtime = {}

        self.result_routes = os.path.join(output_db_path, output_routes_name)
        self.result_statistical_table = os.path.join(output_db_path, output_routes_name)

        # add counter field to layer
        # counter is defined to account for individual travels (judging from Ignition status)
        arcpy.AddField_management(track_points, "Counter", "SHORT")

        # assign counters
        i = 1
        up_cur = arcpy.da.UpdateCursor(track_points, [order_field, ignition_field, "Counter"],
                                       sql_clause=(None, "ORDER BY " + order_field))
        for up_c in up_cur:
            up_c[2] = i
            if up_c[1] == "Off":
                i = 0
            i = i + 1
            up_cur.updateRow(up_c)

        # create the statistical table
        self.result_table = arcpy.CreateTable_management(output_db_path,
                                                         output_statistical_tbl_name)
        self.statis_tbl_fields = ["Track_Point_Name",
                                  "Before2p", "Before2p_dif",
                                  "Before3p", "Before3p_dif",
                                  "Before4p", "Before4p_dif",
                                  "Before5p", "Before5p_dif",
                                  "Ignition", "Counter", "FixTime"]

        # define the format of each field using the AddField function
        for add_f in self.statis_tbl_fields:
            if add_f == "Track_Point_Name" or add_f == "Ignition":
                arcpy.AddField_management(self.result_table, add_f, "TEXT")
            elif add_f == "FixTime":
                arcpy.AddField_management(self.result_table, add_f, "Date")
            else:
                arcpy.AddField_management(self.result_table, add_f, "DOUBLE")

        # scan the track points
        s_cur = arcpy.da.SearchCursor(track_points,
                                      ["shape@", track_points_name, order_field,
                                       ignition_field, "Counter"],
                                      sql_clause=(None, "ORDER BY " + order_field))
        n = 0
        for s_c in s_cur:
            n = n + 1
            self.dic_points_name[n] = s_c[1]
            self.dic_points_fixtime[n] = s_c[2]
            self.dic_points_ignition[n] = s_c[3]
            self.dic_points_counter[n] = s_c[4]

        RouteSubLayer = arcpy.na.GetNAClassNames(network_NAlyr)
        self.routeSlyr_stops = RouteSubLayer["Stops"]
        self.routeSlyr_route = RouteSubLayer["Routes"]

        arcpy.na.AddFieldToAnalysisLayer(network_NAlyr, self.routeSlyr_route,
                                         "Direction", "TEXT", field_length=100000)
        arcpy.na.AddFieldToAnalysisLayer(network_NAlyr, self.routeSlyr_route,
                                         "FixTime_Start", "Date", field_length=1000)
        arcpy.na.AddFieldToAnalysisLayer(network_NAlyr, self.routeSlyr_route,
                                         "FixTime_Current", "Date", field_length=1000)
    # function to insert data into the statistical table
    def insert_statis_table(self, Track_point_name,
                            Before2p=None, Before2p_dif=None,
                            Before3p=None, Before3p_dif=None,
                            Before4p=None, Before4p_dif=None,
                            Before5p=None, Before5p_dif=None,
                            Ignition=None, Counter=None, FixTime=None):
        inst_val = [Track_point_name, Before2p, Before2p_dif, Before3p, Before3p_dif,
                    Before4p, Before4p_dif, Before5p, Before5p_dif,
                    Ignition, Counter, FixTime]
        inst_statis_tbl = arcpy.da.InsertCursor(self.result_table, self.statis_tbl_fields)
        inst_statis_tbl.insertRow(inst_val)
        del inst_statis_tbl
    # function to set up stops for "AddLocations" and "Solve", and to build the messages
    def stopsSetup(self, current_point=3, checkBack=2):
        stops1 = None
        stops2 = None
        message1 = None
        message2 = None
        if current_point >= 3 and checkBack == 2:
            stops1 = [self.dic_points_name[current_point - 2],
                      self.dic_points_name[current_point]]
            message1 = "{0}<--{1}, ignition:{2}, counter:{3}".format(
                self.dic_points_name[current_point],
                self.dic_points_name[current_point - 2],
                self.dic_points_ignition[current_point],
                self.dic_points_counter[current_point])
            stops2 = [self.dic_points_name[current_point - 2],
                      self.dic_points_name[current_point - 1],
                      self.dic_points_name[current_point]]
            message2 = "{0}<--{1}<--{2}, ignition:{3}, counter:{4}".format(
                self.dic_points_name[current_point],
                self.dic_points_name[current_point - 1],
                self.dic_points_name[current_point - 2],
                self.dic_points_ignition[current_point],
                self.dic_points_counter[current_point])
        if current_point >= 4 and checkBack == 3:
            stops1 = [self.dic_points_name[current_point - 3],
                      self.dic_points_name[current_point]]
            message1 = "{0}<--{1}, ignition:{2}, counter:{3}".format(
                self.dic_points_name[current_point],
                self.dic_points_name[current_point - 3],
                self.dic_points_ignition[current_point],
                self.dic_points_counter[current_point])
            stops2 = [self.dic_points_name[current_point - 3],
                      self.dic_points_name[current_point - 2],
                      self.dic_points_name[current_point - 1],
                      self.dic_points_name[current_point]]
            message2 = "{0}<--{1}<--{2}<--{3}, ignition:{4}, counter:{5}".format(
                self.dic_points_name[current_point],
                self.dic_points_name[current_point - 1],
                self.dic_points_name[current_point - 2],
                self.dic_points_name[current_point - 3],
                self.dic_points_ignition[current_point],
                self.dic_points_counter[current_point])
        if current_point >= 5 and checkBack == 4:
            stops1 = [self.dic_points_name[current_point - 4],
                      self.dic_points_name[current_point]]
            message1 = "{0}<--{1}, ignition:{2}, counter:{3}".format(
                self.dic_points_name[current_point],
                self.dic_points_name[current_point - 4],
                self.dic_points_ignition[current_point],
                self.dic_points_counter[current_point])
            stops2 = [self.dic_points_name[current_point - 4],
                      self.dic_points_name[current_point - 3],
                      self.dic_points_name[current_point - 2],
                      self.dic_points_name[current_point - 1],
                      self.dic_points_name[current_point]]
            message2 = "{0}<--{1}<--{2}<--{3}<--{4}, ignition:{5}, counter:{6}".format(
                self.dic_points_name[current_point],
                self.dic_points_name[current_point - 1],
                self.dic_points_name[current_point - 2],
                self.dic_points_name[current_point - 3],
                self.dic_points_name[current_point - 4],
                self.dic_points_ignition[current_point],
                self.dic_points_counter[current_point])
        if current_point >= 6 and checkBack == 5:
            stops1 = [self.dic_points_name[current_point - 5],
                      self.dic_points_name[current_point]]
            message1 = "{0}<--{1}, ignition:{2}, counter:{3}".format(
                self.dic_points_name[current_point],
                self.dic_points_name[current_point - 5],
                self.dic_points_ignition[current_point],
                self.dic_points_counter[current_point])
            stops2 = [self.dic_points_name[current_point - 5],
                      self.dic_points_name[current_point - 4],
                      self.dic_points_name[current_point - 3],
                      self.dic_points_name[current_point - 2],
                      self.dic_points_name[current_point - 1],
                      self.dic_points_name[current_point]]
            message2 = "{0}<--{1}<--{2}<--{3}<--{4}<--{5}, ignition:{6}, counter:{7}".format(
                self.dic_points_name[current_point],
                self.dic_points_name[current_point - 1],
                self.dic_points_name[current_point - 2],
                self.dic_points_name[current_point - 3],
                self.dic_points_name[current_point - 4],
                self.dic_points_name[current_point - 5],
                self.dic_points_ignition[current_point],
                self.dic_points_counter[current_point])
        return stops1, stops2, message1, message2
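The four `checkBack` branches in `stopsSetup` repeat the same pattern for look-back depths 2 through 5. A minimal standalone sketch (not part of the original tool) shows how the same stop lists and messages could be built for any depth with one loop; the `names`, `ignition`, and `counter` parameters are assumptions standing in for the class dictionaries.

```python
# Hypothetical generalization of the repeated checkBack branches above.
# names/ignition/counter are plain dicts keyed by point index (assumed inputs).
def stops_setup(names, ignition, counter, current_point, check_back):
    """Build the direct (stops1) and via-intermediate (stops2) stop lists
    plus their log messages for an arbitrary look-back depth."""
    if current_point < check_back + 1:
        return None, None, None, None
    # direct route: from the point check_back steps back straight to current
    stops1 = [names[current_point - check_back], names[current_point]]
    # full route: through every intermediate recorded point, in travel order
    stops2 = [names[current_point - k] for k in range(check_back, -1, -1)]
    tail = ", ignition:{0}, counter:{1}".format(ignition[current_point],
                                                counter[current_point])
    message1 = "{0}<--{1}{2}".format(names[current_point],
                                     names[current_point - check_back], tail)
    # message2 lists the points from current back to the start of the window
    chain = "<--".join(names[current_point - k] for k in range(0, check_back + 1))
    message2 = chain + tail
    return stops1, stops2, message1, message2
```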
    # XML format is adopted to extract the direction information
    def getDirection(self):
        dr = arcpy.na.Directions(self.network_NAlyr, "XML").getOutput(0)
        dr_list = []
        dom = xmld.parse(dr)
        root = dom.documentElement
        rs = root.getElementsByTagName("STRING")
        for r in rs:
            if r.getAttribute("style") in ["depart", "normal", "arrive"]:
                dr_list.append(r.getAttribute("text"))
        dr_str0 = " --> ".join(dr_list)
        return dr_str0
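The extraction step in `getDirection` can be exercised without ArcGIS: the sketch below parses a made-up directions snippet (the sample XML content is invented for illustration; the `STRING` element and its `style`/`text` attributes follow the code above) using the same `xml.dom.minidom` calls.

```python
# Standalone sketch of the XML extraction in getDirection(), using an
# invented sample document in place of the ArcGIS directions file.
import xml.dom.minidom as xmld

sample = """<DIRECTIONS>
  <STRING style="depart" text="Start at Point 1"/>
  <STRING style="normal" text="Turn right on Main St"/>
  <STRING style="wayPoint" text="internal marker"/>
  <STRING style="arrive" text="Arrive at Point 2"/>
</DIRECTIONS>"""

dom = xmld.parseString(sample)  # getDirection() uses parse() on a file path
root = dom.documentElement
# keep only the user-facing direction strings, as in the tool
dr_list = [r.getAttribute("text")
           for r in root.getElementsByTagName("STRING")
           if r.getAttribute("style") in ["depart", "normal", "arrive"]]
dr_str = " --> ".join(dr_list)
print(dr_str)  # Start at Point 1 --> Turn right on Main St --> Arrive at Point 2
```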
    # function to find the route from point 1 to point 2
    def findRoutesBack1p(self, current_point=2):
        stops = [self.dic_points_name[current_point - 1],
                 self.dic_points_name[current_point]]
        # In ArcGIS Pro, the FindRoutes function can only be completed through the
        # online routing service, whose result is not extractable. We therefore use
        # AddLocations and Solve to solve for the routes between points.
        sql = str(stops).replace("[", "").replace("]", "").replace("u", "")
        arcpy.management.SelectLayerByAttribute(self.lyr_tem, "NEW_SELECTION",
                                                "{} in({})".format(self.track_points_name, sql))
        arcpy.na.AddLocations(self.network_NAlyr, self.routeSlyr_stops, self.lyr_tem,
                              "Name {} #".format(self.track_points_name),
                              sort_field=self.order_field, append="CLEAR")
        arcpy.na.Solve(self.network_NAlyr)
        result_route = self.routeSlyr_route
        dr_str = self.getDirection()
        message = "{0}<--{1}, ignition:{2}, counter:{3}".format(
            self.dic_points_name[current_point],
            self.dic_points_name[current_point - 1],
            self.dic_points_ignition[current_point],
            self.dic_points_counter[current_point])
        up_cur = arcpy.da.UpdateCursor(result_route,
                                       ["shape@", "Name", "Direction", "FixTime_Start", "FixTime_Current"])
        route_shp = None
        for up_c in up_cur:
            up_c[1] = message
            up_c[2] = dr_str
            up_c[3] = self.dic_points_fixtime[current_point - 1]
            up_c[4] = self.dic_points_fixtime[current_point]
            route_shp = up_c[0]
            up_cur.updateRow(up_c)
        del dr_str
        if not arcpy.Exists(self.result_routes):
            arcpy.CopyFeatures_management(result_route, self.result_routes)
        else:
            arcpy.Append_management(result_route, self.result_routes)
        return route_shp, message
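The where clause above is built by string-replacing `str(stops)`; the `.replace("u", "")` (meant to strip Python 2 unicode `u` prefixes) also deletes the letter "u" inside any point name. A join-based sketch (a hypothetical helper, not part of the original tool) builds the same `in(...)` clause without that risk:

```python
# Hypothetical helper: build the SQL in(...) clause from a list of point
# names by quoting and joining, instead of string-replacing str(stops).
def in_clause(field, values):
    """Return a where clause of the form: field in('v1', 'v2', ...)."""
    quoted = ", ".join("'{}'".format(v) for v in values)
    return "{} in({})".format(field, quoted)

# e.g. in_clause("PointName", ["Truck_004", "Truck_005"])
# -> "PointName in('Truck_004', 'Truck_005')"
```

For comparison, the original construction turns `["Truck_004"]` into `'Trck_004'`, since the `u` in "Truck" is stripped along with the unicode prefix.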
    # function to find routes when the current point index is >= 3
    def findRoutesBack2p(self, current_point, backward, checkIndex):
        stops = None
        message = None
        stops1, stops2, message1, message2 = self.stopsSetup(current_point, backward)
        if checkIndex == 1:
            stops = stops1
            message = message1
        elif checkIndex == 2:
            stops = stops2
            message = message2
        sql = str(stops).replace("[", "").replace("]", "").replace("u", "")
        arcpy.management.SelectLayerByAttribute(self.lyr_tem, "NEW_SELECTION",
                                                "{} in({})".format(self.track_points_name, sql))
        arcpy.na.AddLocations(self.network_NAlyr, self.routeSlyr_stops, self.lyr_tem,
                              "Name {} #".format(self.track_points_name),
                              sort_field=self.order_field, append="CLEAR")
        arcpy.na.Solve(self.network_NAlyr)  # route-solving algorithm (ArcGIS function)
        result_route = self.routeSlyr_route
        dr_str = self.getDirection()
        up_cur = arcpy.da.UpdateCursor(result_route,
                                       ["shape@", "Name", "Direction", "FixTime_Start", "FixTime_Current"])
        route_shp = None
        for up_c in up_cur:  # record the route attributes
            up_c[1] = message
            up_c[2] = dr_str
            up_c[3] = self.dic_points_fixtime[current_point - backward]
            up_c[4] = self.dic_points_fixtime[current_point]
            route_shp = up_c[0]
            up_cur.updateRow(up_c)
        del dr_str
        arcpy.Append_management(result_route, self.result_routes)
        return route_shp, message
mxFindRoutes Code (Valid.py Function File)
Valid.py Code
import arcpy
import mxFindRoutes_pro

# get parameters from the tool
track_points = arcpy.GetParameterAsText(0)
track_points_name = arcpy.GetParameterAsText(1)
order_field = arcpy.GetParameterAsText(2)
ignition_field = arcpy.GetParameterAsText(3)
NDS = arcpy.GetParameterAsText(4)
output_db = arcpy.GetParameterAsText(5)
result_routes_name = arcpy.GetParameterAsText(6)
result_table_name = arcpy.GetParameterAsText(7)

# allow workspace outputs to be overwritten (the setting is restored at the end)
overw = arcpy.env.overwriteOutput
arcpy.env.overwriteOutput = True

# beginning of the tool
myFindRoute = mxFindRoutes_pro.FindRoutes(track_points, track_points_name, order_field,
                                          ignition_field, NDS, output_db,
                                          result_routes_name, result_table_name)

# get the number of track points to set up the progressor
n = len(myFindRoute.dic_points_name)
arcpy.SetProgressor("step", "calculating route...", 0, n - 1, 1)
for i in range(1, n + 1):
    point_name = myFindRoute.dic_points_name[i]
    arcpy.SetProgressorPosition()
    arcpy.SetProgressorLabel("calculating route of point " + point_name + "......")
    bf2p = bf2p_dif = bf3p = bf3p_dif = bf4p = bf4p_dif = bf5p = bf5p_dif = None
    ignition = myFindRoute.dic_points_ignition[i]
    counter = myFindRoute.dic_points_counter[i]
    fixtime = myFindRoute.dic_points_fixtime[i]
    # 1st point of the travel
    if counter == 1:
        arcpy.AddMessage("p{0}:ignition:{1}, counter:{2}".format(str(i), ignition, counter))
    # 2nd point of the travel
    if counter == 2:
        r_shp, msg = myFindRoute.findRoutesBack1p(i)
        arcpy.AddMessage("p" + str(i) + ":" + msg)
    # 3rd point of the travel onward
    if counter >= 3:
        r_shp1_1, msg = myFindRoute.findRoutesBack2p(i, 2, 1)
        arcpy.AddMessage("p" + str(i) + ":" + msg)
        shp_route1_1 = r_shp1_1
        r_shp1_2, msg = myFindRoute.findRoutesBack2p(i, 2, 2)
        arcpy.AddMessage("p" + str(i) + ":" + msg)
        shp_route1_2 = r_shp1_2
        bf2p = 1
        bf2p_dif = 0
        # Sometimes the recorded points are too close together to produce a route.
        # If left unhandled, the None result (no route returned) cannot be compared,
        # which causes an error that terminates the program. The following checks
        # handle that case (the same pattern is used below):
        if shp_route1_2 is None and shp_route1_1 is not None:
            bf2p = 0
            bf2p_dif = 0 - shp_route1_1.length
        elif shp_route1_2 is not None and shp_route1_1 is None:
            bf2p = 0
            bf2p_dif = 0 - shp_route1_2.length
        elif shp_route1_2 is None and shp_route1_1 is None:
            bf2p = 1
            bf2p_dif = 0
        elif not shp_route1_2.equals(shp_route1_1):
            bf2p = 0
            bf2p_dif = shp_route1_2.length - shp_route1_1.length
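The same comparison is repeated for each look-back depth below. A condensed standalone sketch captures the decision rule; here routes are summarized by a length and an equality flag (assumed inputs), whereas the tool compares arcpy geometry objects with `.equals()`.

```python
# Condensed sketch of the repeated route-comparison logic. direct_len is the
# length of the route solved straight between the endpoints; full_len is the
# length of the route through the intermediate recorded points; same_route
# says whether the two geometries are equal. None means no route was solved.
def compare_routes(direct_len, full_len, same_route):
    """Return (flag, length_difference): flag=1 means no detour detected."""
    if full_len is None and direct_len is not None:
        return 0, 0 - direct_len
    if full_len is not None and direct_len is None:
        return 0, 0 - full_len
    if full_len is None and direct_len is None:
        return 1, 0  # points too close to route; treated as no detour
    if not same_route:
        return 0, full_len - direct_len
    return 1, 0
```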
    if counter >= 4:
        # check route 4-1 against the route through points 2 or 3
        # --route 4-1
        r_shp2_1, msg = myFindRoute.findRoutesBack2p(i, 3, 1)
        arcpy.AddMessage("p" + str(i) + ":" + msg)
        shp_route2_1 = r_shp2_1
        r_shp2_2, msg = myFindRoute.findRoutesBack2p(i, 3, 2)
        arcpy.AddMessage("p" + str(i) + ":" + msg)
        shp_route2_2 = r_shp2_2
        bf3p = 1
        bf3p_dif = 0
        if shp_route2_2 is None and shp_route2_1 is not None:
            bf3p = 0
            bf3p_dif = 0 - shp_route2_1.length
        elif shp_route2_2 is not None and shp_route2_1 is None:
            bf3p = 0
            bf3p_dif = 0 - shp_route2_2.length
        elif shp_route2_2 is None and shp_route2_1 is None:
            bf3p = 1
            bf3p_dif = 0
        elif not shp_route2_2.equals(shp_route2_1):
            bf3p = 0
            bf3p_dif = shp_route2_2.length - shp_route2_1.length
    if counter >= 5:
        # check route 5-1 against the route through points 2, 3, or 4
        # --route 5-1
        r_shp3_1, msg = myFindRoute.findRoutesBack2p(i, 4, 1)
        arcpy.AddMessage("p" + str(i) + ":" + msg)
        shp_route3_1 = r_shp3_1
        r_shp3_2, msg = myFindRoute.findRoutesBack2p(i, 4, 2)
        arcpy.AddMessage("p" + str(i) + ":" + msg)
        shp_route3_2 = r_shp3_2
        bf4p = 1
        bf4p_dif = 0
        if shp_route3_2 is None and shp_route3_1 is not None:
            bf4p = 0
            bf4p_dif = 0 - shp_route3_1.length
        elif shp_route3_2 is not None and shp_route3_1 is None:
            bf4p = 0
            bf4p_dif = 0 - shp_route3_2.length
        elif shp_route3_2 is None and shp_route3_1 is None:
            bf4p = 1
            bf4p_dif = 0
        elif not shp_route3_2.equals(shp_route3_1):
            bf4p = 0
            bf4p_dif = shp_route3_2.length - shp_route3_1.length
    if counter >= 6:
        # check route 6-1 against the route through points 2, 3, 4, or 5
        # --route 6-1
        r_shp4_1, msg = myFindRoute.findRoutesBack2p(i, 5, 1)
        arcpy.AddMessage("p" + str(i) + ":" + msg)
        shp_route4_1 = r_shp4_1
        r_shp4_2, msg = myFindRoute.findRoutesBack2p(i, 5, 2)
        arcpy.AddMessage("p" + str(i) + ":" + msg)
        shp_route4_2 = r_shp4_2
        bf5p = 1
        bf5p_dif = 0
        if shp_route4_2 is None and shp_route4_1 is not None:
            bf5p = 0
            bf5p_dif = 0 - shp_route4_1.length
        elif shp_route4_2 is not None and shp_route4_1 is None:
            bf5p = 0
            bf5p_dif = 0 - shp_route4_2.length
        elif shp_route4_2 is None and shp_route4_1 is None:
            bf5p = 1
            bf5p_dif = 0
        elif not shp_route4_2.equals(shp_route4_1):
            bf5p = 0
            bf5p_dif = shp_route4_2.length - shp_route4_1.length
    myFindRoute.insert_statis_table(point_name, bf2p, bf2p_dif, bf3p, bf3p_dif,
                                    bf4p, bf4p_dif, bf5p, bf5p_dif,
                                    ignition, counter, fixtime)
arcpy.env.overwriteOutput = overw
arcpy.ResetProgressor()
arcpy.SelectLayerByAttribute_management(track_points, "CLEAR_SELECTION")
# arcpy.RefreshActiveView()
ACKNOWLEDGEMENTS

Support for this project from the Georgia Department of Transportation (GDOT) through Award No. RP 20-01 is acknowledged. Discussions with GDOT staff, including John Hibbard, Larry Barnes, and Emily Fish, regarding project scope are acknowledged. Data provided by GDOT staff, including Teague Buchanan, Hong Liang, and their respective team members, are also acknowledged.
REFERENCES
Astarita, V., Giofrè, V.P., Guido, G., Stefano, G., and Vitale, A. (2020). "Mobile Computing for Disaster Emergency Management: Empirical Requirements Analysis for a Cooperative Crowdsourced System for Emergency Management Operation." Smart Cities, 3(1), pp. 31–47. Available online: https://doi.org/10.3390/smartcities3010003.
Bailey, K.D. (2005). "Typology Construction, Methods and Issues." In Kempf-Leonard, K. (Ed.), Encyclopedia of Social Measurement, Elsevier, London, 3, pp. 889–898.
Basu, M., Bandyopadhyay, S., and Ghosh, S. (2016). "Post Disaster Situation Awareness and Decision Support Through Interactive Crowdsourcing." Procedia Engineering, 159(2016), pp. 167–173. Available online: http://dx.doi.org/10.1016/j.proeng.2016.08.151.
ESRI. (2021). GIS Dictionary. (website) Available online: http://webhelp.esri.com/arcgisserver/9.3/java/geodatabases/definition_frame.htm, last accessed May 1, 2021.
Georgia Department of Transportation (GDOT). (2021). "Road and Traffic Data." (website) Atlanta, GA. Available online: http://www.dot.ga.gov/ds/data#tab-4, last accessed September 1, 2021.
Iliopoulou, C., Konstantinidou, M.A., Kepaptsoglou, K.L., and Stathopoulos, A. (2020). "ITS Technologies for Decision Making During Evacuation Operations: A Review." Journal of Transportation Engineering, Part A: Systems, 146(4), 04020010. Available online: https://doi.org/10.1061/JTEPBS.0000329.
Mena-Yedra, R., Casas, J., and Gavaldà, R. (2018). "Assessing Spatiotemporal Correlations from Data for Short-term Traffic Prediction Using Multi-task Learning." Transportation Research Procedia, 34, pp. 155–162. Available online: http://dx.doi.org/10.1016/j.trpro.2018.11.027.
Meng, F., Wong, S.C., Wong, W., and Li, Y.C. (2017). "Estimation of Scaling Factors for Traffic Counts Based on Stationary and Mobile Sources of Data." International Journal of Intelligent Transportation Systems Research, 15(3), pp. 180–191. Available online: http://dx.doi.org/10.1007/s13177-016-0131-1.
Singh, N.K., Vanajakashi, L., and Tangirala, A.K. (2018). "Segmentation of Vehicle Signatures from Inductive Loop Detector (ILD) Data for Real-time Traffic Monitoring." 2018 10th International Conference on Communication Systems & Networks (COMSNETS), January, pp. 601–606. Available online: https://doi.org/10.1109/COMSNETS.2018.8328281.
Weather Underground. (2021). "Atlanta, GA Weather History." TWC Product and Technology, Brookhaven, GA. Available online: https://www.wunderground.com/history/monthly/us/ga/atlanta/KATL/date/2021-6, last accessed September 1, 2021.
Yale University. (2021). "Linear Regression." (website) Available online: http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm, last accessed September 1, 2021.