GEORGIA DOT RESEARCH PROJECT 21-07 Final Report
FORECASTING TSPLOST REVENUES IN GEORGIA
Office of Performance-based Management and Research 600 West Peachtree NW Atlanta, GA 30308
July 2023
TECHNICAL REPORT DOCUMENTATION PAGE
1. Report No.: FHWA-GA-23-2107
2. Government Accession No.: N/A
3. Recipient's Catalog No.: N/A
4. Title and Subtitle: Forecasting TSPLOST Revenues in Georgia
5. Report Date: October 2023
6. Performing Organization Code: N/A
7. Author(s): Peter Bluestone, Ph.D. (https://orcid.org/0000-0002-6177-3491); Benedict Jimenez, Ph.D. (https://orcid.org/0000-0002-0228-7768); John Gomez; Kshitiz Shrestha, Ph.D.; Nicholas Warner
8. Performing Organization Report No.: 21-07
9. Performing Organization Name and Address: Georgia State University, Center for State and Local Finance, 55 Park Place Ste 774, Atlanta, GA 30303. Phone: (404) 413-0264. Email: pbluestone@gsu.edu
10. Work Unit No.: N/A
11. Contract or Grant No.: PI# 0018101
12. Sponsoring Agency Name and Address: Georgia Department of Transportation, Office of Performance-based Management and Research, 600 West Peachtree St. NW, Atlanta, GA 30308
13. Type of Report and Period Covered: Final; March 2022 to October 2023
14. Sponsoring Agency Code: N/A
15. Supplementary Notes:
Prepared in cooperation with the U.S. Department of Transportation, Federal Highway Administration.
16. Abstract: Due to the long-term planning involved in the design and construction of transportation infrastructure projects, transportation planners need reliable long-run forecasts and stable sources of revenue over the project timeline. Funding streams typically vary from period to period due to inherent volatility in the funding source or unforeseen events, or both. Additional understanding of the economy and improved forecasting techniques can reduce some of the uncertainty of volatile revenue sources. While improved forecasting techniques are not capable of anticipating unexpected events, such as a hurricane or wildfire, budget strategies do exist that can mitigate volatility in revenues arising from unforeseen events. These strategies include the use of rainy day funds and revenue diversification, among others. This research focuses on three areas of improvement: (1) documenting the economic and demographic factors that influence sales tax collections; (2) reviewing the available forecasting models and determining which types best suit the TSPLOST regions of Central Savannah, River Valley, Heart of Georgia, and Southern Georgia; and (3) analyzing best practices in budgeting for transportation sales taxes using a case study approach.
17. Keywords: TSPLOST
18. Distribution Statement: No Restriction
19. Security Classif. (of this report): Unclassified
20. Security Classif. (of this page): Unclassified
21. No. of Pages: 116
22. Price: Free
Form DOT 1700.7 (8-69)
GDOT Research Project 21-07 Draft Report
FORECASTING TSPLOST REVENUES IN GEORGIA
By
Peter Bluestone, Ph.D., Associate Director
John Gomez, Graduate Research Assistant
Benedict Jimenez, Ph.D., Affiliated Research Faculty
Kshitiz Shrestha, Ph.D., Senior Research Associate
and
Nicholas Warner, Senior Research Associate
Center for State and Local Finance, Georgia State University
Contract with
Georgia Department of Transportation
In cooperation with
U.S. Department of Transportation, Federal Highway Administration
July 2023
The contents of this report reflect the views of the authors, who are responsible for the facts and accuracy of the data presented herein. The contents do not necessarily reflect the official views or policies of the Georgia Department of Transportation or the Federal Highway Administration. This report does not constitute a standard, specification, or regulation.
TABLE OF CONTENTS
EXECUTIVE SUMMARY
CHAPTER 1. INTRODUCTION
CHAPTER 2. LITERATURE REVIEW
  FORECASTING METHODS
  DATA QUALITY AND QUANTITY
  ECONOMIC CYCLES AND UNCERTAINTY
  POLITICAL FACTORS
  INTENTIONAL UNDER- AND OVERESTIMATION
CHAPTER 3. ECONOMIES OF THE TSPLOST REGIONS
  RELATIVE SIZE OF TSPLOST REGIONAL ECONOMIES
  CHANGES IN SIZE OF REGIONAL ECONOMIES OVER TIME
CHAPTER 4. CORRELATIONS
  STATE-LEVEL VARIABLES, CORRELATIONS, AND FORECASTING
CHAPTER 5. MODEL COMPARISON
  MACHINE LEARNING
  FORECAST METHODS FOR TSPLOST REVENUES
CHAPTER 6. CASE STUDIES ON BUDGETING IN TSPLOST REGIONS
  MEASURE M, LOS ANGELES COUNTY
    Sales Tax Levy
    Use of Sales Tax
    Project Selection and Delivery
    Budgeting
  TRANSNET, SAN DIEGO COUNTY
    Sales Tax Levy
    Use of Sales Tax
    Project Selection and Delivery
    Budgeting
  CENTRAL VIRGINIA TRANSPORTATION AUTHORITY
    Sales Tax Levy
    Use of Sales Tax
    Project Selection and Delivery
    Budgeting
  FASTRACKS PROJECT, DENVER RTD
    Sales Tax Levy
    Use of Sales Tax
    Project Selection and Delivery
    Budgeting
  CAPITAL METROPOLITAN TRANSPORTATION AUTHORITY, TEXAS
    Sales Tax Levy
    Use of Sales Tax
    Project Selection and Delivery
    Budgeting
  UTAH TRANSIT AUTHORITY
    Sales Tax Levy
    Use of Sales Tax
    Project Selection and Delivery
    Budgeting
CHAPTER 7. CONCLUSIONS
APPENDIX A: STEPS TO CREATE SEASONAL ARIMA MODEL
APPENDIX B: MACHINE LEARNING
  TESTING FOR TSPLOST REVENUES FORECASTS
REFERENCES
  REVENUE FORECASTING AS PART OF GOVERNANCE
  REVENUE FORECASTING AS PART OF CAPITAL PLANNING
  BUDGETING
LIST OF FIGURES
Figure E1. Graph. Examples of preferred models.
Figure 1. Map. Share of state GSP by region (percentage in nominal dollars).
Figure 2. Graph. TSPLOST regional share of Georgia GSP, 2001–2020.
Figure 3. Graphs. Nominal GRP, employment, and personal income for TSPLOST regions, 2001–2020.
Figure 4. Map. Share of GRP by county (percentage in nominal dollars).
Figure 5. Graphs. Nominal GRP composition by sector for TSPLOST regions, 2001–2020.
Figure 6. Graphs. Sectoral composition of regional economic output per capita, employment, and employment compensation per employee.
Figure 7. Graph. Value of nominal GRP of tax base sectors by regions, 2001–2020.
Figure 8. Graphs. Revenue from three sales tax generating sectors, 2001–2020.
Figure 9. Graph. The economic output and sales tax collections in total for: (1) retail trade; (2) accommodation and food services; (3) arts, entertainment, and recreation.
Figure 10. Graph. Sector and sales tax correlations for Georgia regions and state.
Figure 11. Graphs. State and regional ARIMA models (continued on next page).
Figure 12. Graphs. Predictions of the proposed models by region.
Figure 13. Graph. Sample quarterly forecast graph with conservative and optimistic estimates.
Figure 14. Map. Measure M projects map.
Figure 15. Chart. Los Angeles County Metropolitan Transportation Authority management organizational chart.
Figure 16. Chart. SANDAG organizational chart.
Figure 17. Map. CVTA localities.
Figure 18. Chart. RTD organizational chart.
Figure 19. Map. CapMetro service area.
Figure 20. Chart. CapMetro organizational chart.
Figure 21. Chart. UTA organizational chart.
Figure 22. Graph. Annual regional sales tax collection, actual and predicted.
Figure 23. Graphs. Quarterly regional sales tax collection, actual and predicted.
Figure 24. Chart. Cross-validation in time series.
Figure 25. Graphs. Predictions of the proposed models by region.
LIST OF TABLES
Table 1. Forecasted values and sales tax correlations for regions and state.
Table 2. UTA contributions from other governments.
Table 3. RMSE of the proposed models by TSPLOST region.
EXECUTIVE SUMMARY
Due to the long-term planning involved in the design and construction of transportation infrastructure projects, transportation planners need reliable long-run forecasts and stable sources of revenue over the project timeline. In the absence of stable revenue streams and reliable revenue forecasts, planners may be reluctant to engage in more costly and complex projects. Conversely, if such projects are undertaken, they often may be more expensive than projects produced with a more stable funding mechanism.
Funding streams typically vary from period to period due to inherent volatility in the funding source or unforeseen events, or both. Additional understanding of the economy and improved forecasting techniques can reduce some of the uncertainty of volatile revenue sources. For instance, improved information and well-chosen forecasting techniques are likely capable of reducing the uncertainty regarding future receipts associated with seasonal effects, changes in income, and population, though they will not eliminate the inherent volatility in a funding source. Smaller taxing jurisdictions rely on narrow tax bases or relatively few taxpayers. Such a situation results in a revenue source that will be more volatile compared to a larger jurisdiction with a more diverse set of industries and larger population.
Furthermore, improved forecasting techniques are not capable of anticipating unexpected events, such as a hurricane or wildfire, but budget strategies exist that can mitigate volatility in revenues arising from unforeseen events. These strategies include the use of rainy day funds and revenue diversification, among others. Use of such strategies could serve to reduce uncertainty of future funding streams and, coupled with improvements in information regarding the Transportation Special Purpose Local Option Sales Tax (TSPLOST) economies as well as forecasting practices, could result in a more stable budget and project schedule for transportation planners.
This research focuses on three areas of improvement: (1) documenting the economic and demographic factors that influence sales tax collections; (2) reviewing the available forecasting models and determining which types best suit the TSPLOST regions of Central Savannah (CS), River Valley (RV), Heart of Georgia (HOG), and Southern Georgia (SGA); and (3) analyzing best practices in budgeting for transportation sales taxes using a case study approach.
First, we explore the economic and demographic factors that influence the sales tax receipts generated in each of the four TSPLOST regions in Georgia. The four TSPLOST regions represent a relatively small share of the state economy--ranging from 1 to 4 percent of the gross state product (GSP). Despite being smaller economies, they do share certain similarities with the state. Most economic activity in the four regions occurs in the largest regional urban center. In two regions, roughly 74 percent of gross regional product (GRP) is generated in the two counties that comprise the largest cities: Chattahoochee and Muscogee counties in RV (city of Columbus) and Augusta-Richmond and Columbia counties in CS (city of Augusta-Richmond). In SGA and HOG, on the other hand, the two largest counties in terms of GRP account for 50 percent and 36 percent of regional economic activity, respectively.
In all four regions, the largest economic sector in terms of output is services, including business, health, and education. In three regions--CS, RV, and SGA--services have grown substantially during the years 2001–2020, nearly doubling the GRP generated by the sector. In contrast, growth in other sectors has been modest in these three regions. In HOG, services have grown; however, manufacturing has grown in its share of GRP as well.
Still, there are important differences between the economies of the state and the four regions. For instance, in the TSPLOST regions, the manufacturing sector is a larger share of GRP on average than for the state as a whole. For manufacturing, HOG and SGA have higher shares of sector employment (at roughly 15 percent) than the state average. HOG and SGA also have a higher share in agriculture than the state average. These differences in regional economies compared to the state have implications for the forecasting of TSPLOST revenue in the regions.
Roughly half the sales tax collected in the four TSPLOST regions comes from three sectors: retail trade, accommodations and food services, and entertainment services. Yet these three sectors account for a much smaller share of regional output, about 10 percent in 2021, suggesting that the other large sectors of the regional economies also play an indirect role through taxable purchases or changes in the levels of worker income.
At the state level, the largest industry sector by revenue is strongly and positively correlated with state sales tax collections. Thus, at the state level, economic activity such as business services moves in the same direction, and with the same magnitude, as sales taxes. For the four TSPLOST regions, these correlations are still positive but weaker. Only in the largest regional economy, CS, are the correlations between large industry sectors and sales tax collections similar to those of the state. These lower correlations suggest that although firms in the large sectors of business, education, and healthcare services can be used as a leading indicator of future sales tax receipts, caution should be used because the magnitude of the change is influenced by additional factors.
To inform what additional data may be useful in the quarterly forecasts, we examine the correlations of sales tax collections with statewide forecasted economic variables for GSP, Georgia retail sales, and Georgia personal income. Here again, the relationship between the economic variables and sales tax is strong at the state level but weaker for the four TSPLOST regions. This suggests that several approaches to forecasting sales tax should be examined, including models that incorporate the relevant economic data and those that use only historical regional sales tax collections.
Many different forecasting models are used for these types of data. As a second task, we review the basic models and their properties, and we analyze machine learning forecasting models that have been recently adapted for this purpose. Through statistical testing and analysis, as well as assessing the tractability of the models, we determined that autoregressive integrated moving average (ARIMA)-type models were best suited for this task.
ARIMA models rely on the premise that actual past sales tax collections are a good predictor of future sales tax collections. The models have internal parameters that can be used to adjust for seasonality and trends in the data. These models can account for annual events that recur at roughly the same time each year (e.g., the Masters golf tournament). Single-variable ARIMA models are frequently used by forecasters and are reliable and tractable for policymakers. These models are best suited for a shorter term forecast. To improve the forecast for longer time periods, we include economic variables forecasted by Moody's Analytics for Georgia retail sales and the consumer price index, a measure of anticipated inflation. We show through a variety of measures, including statistical tests using historical data and graphical analysis, that this family of models is preferred going forward for the quarterly forecasts. Figure E1 is a graphical depiction of the preferred models for each region. The figure illustrates actual collections from 2019–2022. After 2022, the conservative and optimistic model specifications diverge to account for larger standard errors as uncertainty grows over longer time periods.
[Figure E1 plot: Regional annual TSPLOST collections, actual and forecasted, 2019–2032 (millions of dollars). Vertical axis: $0 to $140 million. Series: conservative (Con) and optimistic (Opt) forecasts for CS7, RV8, HG9, and South11.]
Figure E1. Graph. Examples of preferred models.
Third, this study surveys best practices in budgeting for transportation sales taxes by reviewing the extant literature. Subnational governments' revenue volatility is not a simple function of changes in the national economy. For example, the revenue volatility for states has not always corresponded to economic changes, and the extent of tax volatility is, on average, wider than that of economic volatility. Factors that contribute to this wider volatility include revenue source fluctuations, economic downturns, and forecasting errors. Best practices to manage this volatility are limited to rainy day funds (also known as budget stabilization funds) and revenue diversification. To better understand how revenue volatility is managed, we include six case studies from a mix of large, mid-sized, and smaller TSPLOST initiatives located across five states: California, Colorado, Virginia, Texas, and Utah. The case studies focus on the following subjects:
• Laws authorizing levy of sales tax for transportation-related infrastructure.
• Methods for regulating sales tax revenue spending.
• Methods for selecting and successfully delivering projects.
• Budgeting practices and procedures for the projects and revenues.
The regional TSPLOST has been adopted by 64 counties in four transportation regions across the state. As of January 2020, approximately $1.1 billion has been raised from this revenue source and almost 700 projects have been completed. Because the tax is scheduled to expire in three of the four regions by the end of 2022, voters will decide whether to extend the tax in the near term. The provision of accurate forecasts will provide transparency and better inform voters' and transportation planners' decisions. It is not possible to quantify the benefits stemming from this research for the purposes of constructing a benefit-cost ratio. On the other hand, the cost of this research project is computed to be less than 0.02 percent of the total revenue raised over the 10-year period of the TSPLOST during the years 2013–2022.
CHAPTER 1. INTRODUCTION
Due to the long-term planning involved in the design and construction of transportation infrastructure projects, transportation planners need reliable long-run forecasts and stable sources of revenue over the project timeline. In the absence of stable revenue streams and reliable revenue forecasts, planners may be reluctant to engage in more costly and complex projects. Conversely, if such projects are undertaken, they often may be more expensive than projects that are produced with a more stable funding mechanism.
Funding streams typically vary from period to period due to inherent volatility in the funding source or unforeseen events, or both. For instance, revenue streams associated with a geographic area reliant on only a few industries or employers are likely to exhibit more revenue volatility than those associated with a more diverse economy. Uncertainty in revenue further stems from unplanned exogenous events, such as a natural disaster or global pandemic.
Additional understanding of the economy and improved forecasting techniques can reduce some of the uncertainty of volatile revenue sources. Improvements using well-chosen forecasting techniques are likely capable of reducing uncertainty with regard to future receipts associated with seasonal effects, changes in income and population, or changes in age. However, such improved data and models will not eliminate the inherent volatility in a funding source. Smaller taxing jurisdictions rely on narrow tax bases or relatively few taxpayers, resulting in a revenue source that will be more volatile compared to a larger jurisdiction with a more diverse set of industries and larger population.
Improved forecasting techniques are not capable of anticipating unknowable events, such as a hurricane or wildfire; however, budget strategies exist that can mitigate volatility in revenues arising from unforeseen events. These strategies include the use of rainy day funds, pooling of resources across jurisdictions, and securitizing revenue streams, among others. Use of these strategies could serve to reduce uncertainty of future funding streams and, coupled with improvements in information regarding the Transportation Special Purpose Local Option Sales Tax (TSPLOST) economies as well as forecasting practices, could result in a more stable budget and project schedule for transportation planners.
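To illustrate how a rainy day fund can smooth spending when collections fluctuate, the following is a minimal sketch under stated assumptions. The function name `smooth_with_reserve`, the fixed planned spending level, and the revenue figures are all hypothetical and are not drawn from the report's data or from any jurisdiction's actual reserve policy.

```python
def smooth_with_reserve(revenues, planned_spend, start_balance=0.0):
    """Apply a simple rainy day fund rule to a revenue series.

    Each period, revenue above the planned spending level is deposited in
    the reserve; a shortfall is covered from the reserve as far as the
    balance allows. Returns (spending, balances), one entry per period.
    """
    balance = start_balance
    spending, balances = [], []
    for revenue in revenues:
        if revenue >= planned_spend:
            balance += revenue - planned_spend  # bank the surplus
            spent = planned_spend
        else:
            # Cover the gap, but never draw more than the reserve holds.
            draw = min(balance, planned_spend - revenue)
            balance -= draw
            spent = revenue + draw
        spending.append(spent)
        balances.append(balance)
    return spending, balances


# Hypothetical annual collections in millions of dollars (not report data).
revenues = [100.0, 110.0, 85.0, 105.0, 90.0]
spending, balances = smooth_with_reserve(revenues, planned_spend=98.0)
print(spending)
print(balances)
```

In this sketch, spending stays at the planned level as long as earlier surpluses are large enough to cover later shortfalls; once the reserve is exhausted, spending falls back to whatever revenue arrives.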
This research focuses on three areas of improvement: (1) documenting the economic and demographic factors that influence sales tax collections; (2) reviewing the available forecasting models and determining which types best suit the TSPLOST regions in Georgia; and (3) analyzing best practices in budgeting for transportation sales taxes using a case study approach.
First, we explore the economic and demographic factors that influence the sales tax receipts generated in each of the four TSPLOST regions in Georgia. The four TSPLOST regions--Central Savannah (CS), River Valley (RV), Heart of Georgia (HOG), and Southern Georgia (SGA)--represent a relatively small share of the state economy, ranging from 1 to 4 percent of the gross state product (GSP). Despite being smaller economies, they do share certain similarities with the state. Most economic activity in the four regions occurs in the largest regional urban center. The four regions' largest economic sector in terms of output is services, including business, health, and education.
Important differences also exist between the state economy and those of the regions. For instance, in the four TSPLOST regions, the manufacturing sector is a larger share of gross regional product (GRP) on average than for the state. In two of the regions, HOG and SGA, agriculture also plays a larger economic role than the state as a whole. These differences in regional economies compared to the state have implications for the forecasting of TSPLOST revenue in the regions.
At the state level, the largest industry sector by revenue is strongly and positively correlated with state sales tax collections. Economic activity such as business services therefore moves in the same direction, and with the same magnitude, as sales taxes: if business services revenue increases by 1 percent, a similar increase in state sales tax collections would also be expected. For the four TSPLOST regions, these correlations are still positive but weaker. Only in the largest regional economy, CS, are the correlations between large industry sectors and sales tax collections similar to those of the state. In the other three regions, for example, a 1 percent increase in business services is expected to be associated with increased sales tax collections, but the increase is likely to be considerably less than 1 percent. These lower correlations suggest that although firms in the large sectors of business, education, and healthcare services can be used as a leading indicator of future sales tax receipts, caution should be used because the magnitude of the change is influenced by additional factors.
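The distinction between direction and magnitude can be made concrete with a small sketch. The Pearson correlation measures how consistently two series move together, while the slope of a simple regression of sales tax growth on sector growth measures how large the response is. The function `pearson_and_slope` and all growth rates below are hypothetical illustrations, not values from the report.

```python
from math import sqrt


def pearson_and_slope(x, y):
    """Return (Pearson r, OLS slope of y on x) for two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy), sxy / sxx


# Hypothetical annual growth rates in percent (illustrative, not report data).
sector_growth = [1.0, 2.0, 3.0, 4.0, 5.0]
state_tax_growth = [1.1, 1.9, 3.2, 3.9, 5.0]    # tracks the sector closely
region_tax_growth = [1.5, 0.5, 2.5, 1.0, 2.5]   # noisier, smaller response

for label, series in [("state", state_tax_growth), ("region", region_tax_growth)]:
    r, slope = pearson_and_slope(sector_growth, series)
    print(f"{label}: r = {r:.2f}, slope = {slope:.2f}")
```

With these made-up numbers, the state series has a correlation near 1 and a slope near 1, while the regional series has a weaker positive correlation and a slope well below 1, which mirrors the pattern described above.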
To inform what additional data may be useful in the quarterly forecasts, we examine the correlations of sales tax collections with statewide forecasted economic variables for GSP, Georgia retail sales, and Georgia personal income. Here again, the relationship between the economic variables and sales tax is strong at the state level but weaker for the four TSPLOST regions. This suggests that several approaches to forecasting sales tax should be examined, including models that incorporate the relevant economic data and those that use only historical regional sales tax collections.
Many different forecasting models are used for these types of data. As a second task, we review the basic models and their properties. We also analyze machine learning (ML) forecasting models that have been recently adapted for this purpose. Through statistical testing and analysis, as well as assessing the tractability of the models, we determined that autoregressive integrated moving average (ARIMA)-type models were best suited for this task.
ARIMA models rely on the premise that actual past sales tax collections are a good predictor of future sales tax collections. The models have internal parameters that can be used to adjust for seasonality and trends in the data. These models can account for annual events that recur at roughly the same time each year (e.g., the Masters golf tournament). Single-variable ARIMA models are frequently used by forecasters and are reliable and tractable for policymakers. These models are best suited for a shorter term forecast. To improve the forecast for longer time periods, we include economic variables forecasted by Moody's Analytics for Georgia retail sales and the consumer price index (CPI), a measure of anticipated inflation. We show through a variety of measures, including statistical tests using historical data and graphical analysis, that this family of models is preferred going forward for the quarterly forecasts.
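As a rough illustration of the premise that past collections predict future collections, the sketch below implements a seasonal random walk with drift, which is the simplest member of the seasonal ARIMA family (often written ARIMA(0,0,0)(0,1,0) with a constant, at a quarterly period of 4). It is not the model fitted in this report, which also includes autoregressive and moving average terms and external economic variables. The function name `seasonal_drift_forecast`, the band multiplier `z`, and the quarterly figures are hypothetical.

```python
from statistics import mean, stdev


def seasonal_drift_forecast(history, horizon, season=4, z=1.0):
    """Forecast with a seasonal random walk plus drift.

    Model: y[t] = y[t - season] + drift + error[t].
    Returns a list of (low, point, high) tuples, one per future period.
    """
    if len(history) < season + 2:
        raise ValueError("need at least season + 2 observations")
    # Year-over-year changes: their mean is the drift, their spread the noise.
    diffs = [history[t] - history[t - season] for t in range(season, len(history))]
    drift = mean(diffs)
    sigma = stdev(diffs)

    path = list(history)
    forecasts = []
    for h in range(1, horizon + 1):
        point = path[-season] + drift  # same quarter last year, plus drift
        path.append(point)
        # Errors accumulate once per seasonal cycle, so the band widens
        # with the number of years ahead rather than the number of quarters.
        years_ahead = (h - 1) // season + 1
        half_width = z * sigma * years_ahead ** 0.5
        forecasts.append((point - half_width, point, point + half_width))
    return forecasts


# Hypothetical quarterly collections in millions of dollars (not report data).
quarterly = [10, 12, 11, 15, 11, 14, 12, 16, 13, 15, 13, 18]
for low, point, high in seasonal_drift_forecast(quarterly, horizon=8):
    print(f"{low:6.2f} {point:6.2f} {high:6.2f}")
```

The low and high paths widen with the forecast horizon, which is one simple way to represent growing uncertainty in longer-term forecasts. A full seasonal ARIMA model with external regressors would normally be fitted with a statistical package rather than written by hand.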
Third, this study surveys best practices in budgeting for transportation sales taxes by reviewing the extant literature and conducting six case studies of regional or county TSPLOSTs. The case studies include a mix of large, mid-sized, and smaller TSPLOST initiatives located across five states: California, Colorado, Virginia, Texas, and Utah. The case studies focus on the following subjects:
• Laws authorizing levy of sales tax for transportation-related infrastructure.
• Methods for regulating sales tax revenue spending.
• Methods for selecting and successfully delivering projects.
• Budgeting practices and procedures for the projects and revenues.
The regional TSPLOST has been adopted by 64 counties in four transportation regions across the state. As of January 2020, approximately $1.1 billion had been raised from this revenue source, and almost 700 projects had been completed. Because the tax is scheduled to expire in three of the four regions by the end of 2022, voters will decide whether to extend the tax in the near term. The provision of accurate forecasts will provide transparency and better inform voters' and transportation planners' decisions. It is not possible to quantify the benefits stemming from this research for the purposes of constructing a benefit-cost ratio. On the other hand, the cost of this research project is computed to be less than 0.02 percent of the total revenue raised over the 10-year period of the TSPLOST during the years 2013-2022.
CHAPTER 2. LITERATURE REVIEW
Many areas of good governance and public budgeting include forecasting revenues and expenditures. Political and policy decisions are about events in the future, and informed decisions require expectations of funds and spending under current or proposed law. Planning transportation infrastructure projects requires prior knowledge or an expectation of the amount of funding available. Policymakers and analysts rely on forecasting models to aid them in this planning. Historically, transportation projects have been funded through state and federal motor fuel taxes or directly through user fees. In this chapter, we discuss recent developments in transportation infrastructure finance using local sales taxes and the methods used to forecast this revenue source. Because of the limited literature specifically focusing on transportation-related sales tax forecasting, we also survey the broader literature on revenue forecasting.
Starting in the mid-1980s, California began enacting locally approved sales taxes earmarked directly for transportation infrastructure projects. The benefits of this funding source include direct voter approval, fundraising and expenditure within a county, expiration dates, and explicit earmarking for infrastructure (Crabbe et al. 2005). California's program was found to expand funding for transportation projects, including significant investment in mass-transit projects. Sales taxes are, however, inherently more uncertain than per-gallon fuel taxes and toll revenues, which introduces greater uncertainty into the planning process (Afonso 2013).
Georgia's history of local option sales taxes (LOSTs) started in 1975 with the creation of a LOST that makes funds available to local governments for any use when passed by referendum. In 1985, the Special Purpose Local Option Sales Tax (SPLOST) law was intended to directly fund county capital projects; the county referenda would include the duration of the tax and the projects to be funded by it. If generated sales taxes reached the stated budget before the expiration date, the tax would terminate. The Transportation Act of 2010 allowed for a region-based special purpose local option sales tax for transportation projects (i.e., TSPLOST). Forecasting region-based LOST revenues became part of transportation infrastructure planning at that time.
Local governments use many different tools to forecast local revenues and sales taxes. When done well, the forecast process utilizes the best available data and input from officials familiar with the data-generation process, tests multiple forecast models, and prefers the models that predict the observed data well. We focus our review on factors that influence the accuracy of revenue forecasts including: (1) forecasting methods, (2) data quality and quantity, (3) economic cycles and uncertainty, (4) political factors, and (5) intentional under/overestimation.
FORECASTING METHODS
Researchers have been particularly interested in the methods that yield the most accurate revenue forecasts. Although extensive research has been conducted to date, the results are inconclusive (Franklin et al. 2019).
Revenue forecasting methods can be classified as either judgmental or quantitative. Whereas judgmental forecasting is based on the judgment or intuition of forecast experts, quantitative forecasting utilizes jurisdictions' historical data and statistical models (Kavanagh and Williams 2014). Quantitative forecasting can also be divided into time series and causal models (Chung et al. 2022, Kavanagh and Williams 2014, Mikesell and Ross 2014, Williams and Calabrese 2019). Time series methods rely on the autoregressive nature of historical data. Causal models use statistical specifications and known independent variables (mostly macroeconomic) to predict future revenues (Williams and Calabrese 2016).
Specifically, time series methods use revenue observations that repeat over time. The time series are usually autoregressive, meaning previous values can predict future values. This autocorrelation creates the theoretical framework for time series models and requires only historical data for the revenues being forecasted. Frank and Zhao (2009) found that most quantitative forecasting by local governments uses moving averages or trend analysis, which are both simple time series methods. Smoothing techniques (e.g., Holt-Winters smoothing) attempt to measure and account for trend and seasonality factors and to establish stationarity in the time series. The statistical properties of a stationary time series do not depend on the time period observed: it has the same mean, variance, and covariance structure regardless of the time period or season selected. In practice, smoothing techniques separate out predictable seasonality and trends and forecast the stationary time series. The seasonality and trend are then reapplied in the forecast period. Box et al. (1970) established a widely used complex time series forecast model, now known by the acronym ARIMA (i.e., autoregressive integrated moving average). This addition to smoothing techniques solves for the optimal seasonality, autoregressive structure, and trend for a time series. Makridakis et al. (1982 and 1993), Makridakis and Hibon (2000), and Williams and Kavanagh
(2016)1 found that simple time series models frequently outperform more complex models.
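The simple time series methods named above can be written in a few lines each. The sketch below is illustrative only; the function names, the window length, and the smoothing weight `alpha` are assumptions chosen for the example rather than values taken from the literature reviewed here.

```python
def moving_average_forecast(series, window=4):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)


def exponential_smoothing_forecast(series, alpha=0.3):
    """Simple exponential smoothing: each observation updates a running level,
    with weight `alpha` on the newest value and (1 - alpha) on the old level."""
    level = series[0]
    for obs in series[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level


def first_difference(series):
    """Period-to-period changes; differencing is one common way to remove a
    trend so that the remaining series is closer to stationary."""
    return [b - a for a, b in zip(series, series[1:])]


# Hypothetical annual revenues, invented for illustration only.
revenue = [100.0, 104.0, 109.0, 113.0, 118.0, 122.0]
print(moving_average_forecast(revenue, window=3))
print(exponential_smoothing_forecast(revenue, alpha=0.5))
print(first_difference(revenue))
```

Neither forecast function models trend or seasonality explicitly; methods such as Holt-Winters smoothing extend the same updating idea with separate trend and seasonal components.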
Econometric models attempt to forecast revenue by estimating regression models that include variables that predict revenue levels. These are classified as causal-like forecast models because they attempt to model the revenue changes caused by changes in the broader economy (e.g., Williams and Calabrese 2016). Establishing the causal relationship between economic variables and changes in sales taxes requires complex modeling that may not be necessary for forecasting. Prediction accuracy is the primary concern when selecting forecasting models. Linear and nonlinear regression forecast models can be utilized without strict causality assumptions if they predict well. The models require future values for the independent variables, creating greater data burdens.
The complex, multivariate, and interrelated processes within a local economy can be difficult to measure and model with data, but they do result, eventually, in the generation of local sales taxes. This economic complexity could be better modeled, in certain instances, by previous values of revenue rather than by other economic variables modeled as causing or driving revenue changes.
The rise of machine learning applications has affected forecasting, and these applications are being used by governments (Chung et al. 2022). ML and artificial intelligence techniques were first used for forecasting weather in 1964 (Hu and Root 1964), and these tools are now available to governments because of the data now available. Various studies have attempted to compare the ability of ML models to outperform existing techniques, with mixed results (Chung et al. 2022). To date, this new family of forecasting models appears to be an interesting addition to the existing forecasting techniques rather than a revolutionary improvement.
1 From Williams 2016 (Kavanagh and Williams 2016). These ad hoc techniques may be appealing to forecasters with moderate sophistication because of ease in learning how to use these techniques; however, they are generally inaccurate and should be avoided (Armstrong 2001, Kavanagh and Williams 2016). Further discussion of exponential smoothing methods can be found in Gardner (1985), Gardner (2006), De Gooijer and Hyndman (2006), and Hyndman et al. (2002).
Local governments tend to use judgmental forecasts rather than quantitative methods despite their higher errors (Beckett-Camarata 2006, Kong 2007, Sun 2005). Kong (2007) shows that small counties rely primarily on judgmental forecasts, even though causal methods (econometric forecasts) perform better than other methods regardless of county size. Long-term forecasting is also rarely carried out by smaller jurisdictions. However, Kavanagh and Williams (2014) suggest that documenting forecasts, having multiple forecasters, being aware of cognitive biases and preparing to mitigate them, and establishing algorithms for forecasting can reduce the errors in judgmental forecasting.
The limited use of sophisticated methods might be because small governments lack resources, adequately trained personnel, and long-term data (Cirincione et al. 1999, Kong 2007, Williams and Kavanagh 2016). When municipalities use causal revenue forecasting techniques, they rely less on expert judgment. Those officials trained through workshops or seminars on forecasting are more likely to pursue more advanced techniques (Beckett-Camarata 2006).
A critical question is whether quantitative techniques actually produce more accurate forecasts. Studies find no single quantitative approach that produces the most accurate forecasts (Williams and Kavanagh 2016). However, many studies report the efficacy of damped trend methods, ARIMA, and exponential smoothing methods. Analyzing data from a city in Florida from 1984 to 1990, Gianakis and Frank (1993) find that no dominant technique predicted revenues more accurately; however, the Winters, Holt, and ARIMA techniques were potentially the most accurate. Damped trend and exponential smoothing methods perform best with monthly and quarterly data (Williams and Kavanagh 2016), and exponential smoothing models are generally the most accurate for analyzing seasonal patterns (Cirincione et al. 1999). For complex models, recent literature recommends the damped trend and modified Holt exponential smoothing for yielding the best prediction (Williams and Kavanagh 2016). Many states and large cities also employ multiple forecasting methods for individual or total revenues. States using multiple methods tend to predict their revenue more accurately than states using a single method (Rubin et al. 1999, Willoughby and Guo 2008).
Research suggests that combined forecasts help increase accuracy. In the case of forecasting sales tax revenues in Idaho, the econometric predictions are more accurate than the time series extrapolations. Still, the composite forecasts outperform both the econometric models and the ARIMA model. The errors from composite forecasts range from -2.6 to 2.6 percent of actual revenues (Fullerton 1989). Grizzle and Klay (1994) also find that combining different methods for revenue forecasts is more accurate than any single analytical method or the state's own forecasts. Combining different quantitative methods, or combining forecasts based on different information (such as an extrapolative method with the state's own forecasts), would enhance forecast accuracy. However, when estimating more volatile revenues with no trend, the moving average is expected to diminish forecast errors the most. Also, some studies suggest that averaging estimates from different methods could be better than combining forecasts (Williams and Calabrese 2016).
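A composite forecast of the kind discussed above can be sketched as a weighted average of forecasts produced by different methods. The example is a minimal illustration: the function names, the weights, and all dollar figures are hypothetical and are not drawn from Fullerton (1989) or the other studies cited.

```python
def combine_forecasts(forecasts, weights=None):
    """Weighted average of forecasts from several methods.
    With no weights given, every method counts equally."""
    if weights is None:
        weights = [1.0 / len(forecasts)] * len(forecasts)
    if len(weights) != len(forecasts):
        raise ValueError("one weight per forecast is required")
    total = sum(weights)
    return sum(w * f for w, f in zip(weights, forecasts)) / total


def percent_error(forecast, actual):
    """Signed error as a percentage of actual revenue (negative = underforecast)."""
    return 100.0 * (forecast - actual) / actual


# Hypothetical forecasts of one year's revenue (millions of dollars).
econometric, time_series = 103.0, 96.0
actual = 100.0

composite = combine_forecasts([econometric, time_series])
for name, value in [("econometric", econometric),
                    ("time series", time_series),
                    ("composite", composite)]:
    print(name, round(percent_error(value, actual), 2))
```

When the individual methods err in opposite directions, as in this invented case, the errors partly cancel in the composite; when they err in the same direction, combining offers less benefit.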
On the other hand, complex methods do not always guarantee higher accuracy than simple methods (Willoughby and Guo 2008). Simple models generally forecast as well as complex ones, although more complex methods may still improve forecast accuracy (Gianakis and Frank 1993). A combination of multiple regression models and qualitative methods can be a more accurate approach. Time series techniques and simultaneous equation modeling can reduce forecast accuracy (Bretschneider et al. 1989).
Most recently, Chung et al. (2022) discuss the possibility of using ML models in budget forecasting. By testing 20 years of data from local governments, they found ML algorithms can be adapted to revenue forecasting in the public sector. However, traditional statistical methods (i.e., moving averages, exponential smoothing, ARIMA) are more accurate than ML methods overall. ARIMA models are consistently superior to other forecasting methods, whereas k-nearest neighbors (KNN) is the best predictor among ML algorithms. The accuracy of prediction varies by type of revenue; traditional methods are better for sales tax, and KNN is most effective for predicting property taxes.
Williams and Calabrese (2016) claim that simple methods of forecasting work as well or better when compared with complex methods, especially for local governments in the absence of skilled forecasters or forecast software. Simple moving averages or trend analysis seem more appropriate among simple time series methods. In Connecticut, simple models are as effective as complex models, and linear models work better than nonlinear models to predict revenues (Cirincione et al. 1999). Effectiveness also depends on revenue sources: simple methods can be more accurate than complex methods for some revenue sources and vice versa (Beckett-Camarata 2006). For moderately skilled forecasters, using forecasting software is also appropriate (Williams and Kavanagh 2016). However, local governments are reluctant to use software despite its wide availability and reasonable costs (Frank and Zhao 2009).
DATA QUALITY AND QUANTITY
For revenue forecasting, it is essential to improve the quality and quantity of data, along with employing appropriate forecasting methods to minimize errors. Studies favor data measured at shorter intervals over annual data to increase forecasting accuracy. Williams and Kavanagh (2016) find that monthly or quarterly data are better than annualized data for increasing forecast accuracy. In addition, Cirincione et al. (1999) find that using bimonthly data yields more accurate forecasts than monthly and quarterly data. However, Gianakis and Frank (1993) assert that the level of aggregation or the length of the data stream seems to have a limited impact on increasing forecast accuracy.
Additionally, some quantitative methods work better with specific data types. With annual data, naive methods perform better than exponential smoothing methods, and real-dollar conversion performs better than nominal dollars. However, real-dollar conversion has no advantage for monthly data (Williams and Kavanagh 2016). Also, using more than three years' worth of data can decrease forecasting errors (Cirincione et al. 1999).
ECONOMIC CYCLES AND UNCERTAINTY
Forecasting accuracy can also be affected by economic circumstances. First, economic cycles can reduce forecasting accuracy. Even though revenue forecasts can be fairly accurate in the long term, the accuracy can deteriorate during recession periods (Mikesell 2018). Studies report that forecasting errors are larger during and after recessions, and variations in errors across states also increase in and after recessions (Boyd and Dadayan 2014). For example, the Great Recession led to the largest overestimates in revenue forecasts because there were major declines in personal income, corporate income, and sales taxes compared to earlier recessions. Errors in revenue forecasts during the 2001 and 2007 recessions were much worse than in the recessions of the early 1990s (Pew Charitable Trusts and Nelson A. Rockefeller Institute of Government 2011).
Governments have more or less difficulty forecasting depending on revenue sources and structure. For example, forecasting errors are larger for corporate income tax than for individual income and general sales tax revenues, which means corporate income tax is more difficult to forecast than other tax sources. Reitano (2018) finds that revenue diversification and volatility tend to decrease forecast accuracy, whereas greater expenditure on administrative operations and greater local fiscal capacity (measured by millage rates) increase accuracy. Also, smaller jurisdictions that are dependent on a few sectors of the economy tend to have larger errors (Boyd and Dadayan 2014, Pew Charitable Trusts 2015, Willoughby and Guo 2008).
Uncertainty caused by adopting a new policy may cause a deterioration in revenue forecasts. Kavanagh and Williams (2017) examine how extremely uncertain situations, such as regulating recreational marijuana, may affect revenue forecasting. Despite the uncertainty, the City of Boulder, Colorado, made fairly accurate revenue forecasts. They conclude that the success of the City of Boulder can be attributed to finding a pertinent reference case for comparison, especially when there are no historical data. It is also important to engage experts in forecasting, organize information into a model, disaggregate the analysis, aggregate the forecast, acknowledge the uncertainty in the forecast presentation, design the public forum appropriately, and establish an environment for good decisions (Kavanagh and Williams 2017, pp. 16-17).
POLITICAL FACTORS
Because revenue forecasting is always part of a political process, Mikesell and Ross (2014) indicate that it is also important to consider how well policymakers understand the process. Existing studies primarily focus on the forecasting process, such as executive, separate, and consensus forecasts (Franklin et al. 2019). Executive forecasts are produced only by an executive body of a government, such as the governor and staff. In contrast, consensus forecasts are made by both executive and legislative branches in cooperation. Separate forecasts are produced by the executive and legislative bodies acting independently of each other. Those processes sometimes include external researchers, agencies, or citizens (McNichol 2014, Willoughby and Guo 2008). McNichol (2014) suggested that the forecast and its assumptions should be published and accessible online to the public. Forecasting meetings and the process should be open to public oversight.
Most studies recommend consensus forecasts because they offer greater accuracy, transparency, and political acceptance than other processes (Boyd and Dadayan 2014, Franklin et al. 2019, McNichol 2014, Mikesell 2018, Mikesell and Ross 2014, Williams and Calabrese 2016, Willoughby and Guo 2008). Mikesell and Ross (2014) focus on political impacts on forecasting accuracy and biases in Indiana cases. They argue that forecast acceptance is as important as accurate forecast estimates. Consensus between actors from the executive and legislative branches not only outperforms other forecasting processes but also plays a vital role in preventing unnecessary conflicts and rejection over the final forecast estimates. Boyd and Dadayan (2014) also argue that consensus forecasting helps to avoid undue political influences; however, they do not find evidence that it increases forecasting accuracy. Reitano (2018) also finds that political factors affect accuracy. While Republican or unanimous control over a school district may improve revenue forecasting in general, Republican unanimous control tends to reduce the accuracy.
Along with the consensus process, the inclusion of various participants in the process has been found to reduce forecasting errors significantly and smooth the budget process. Consensus revenue estimates were made by the executive and legislative branches in about half the states. More than one-third of states include nonpolitical experts in forecasting (McNichol 2014, Pew Charitable Trusts 2015). Shkurti and Winefordner (1989) also highlight that a conscious effort by elected leaders to minimize bias, use of economic advisors and outside experts, cooperation with an active and professional legislative budget office, and a coordinated revisions schedule would contribute to preventing partisan manipulation of the forecast process.
Forecast accuracy can be improved by relying on government experts and by having independent executive and legislative forecast agencies that have cooperative relationships and are bound to formal procedures for revisions (Bretschneider and Gorr 1987, Bretschneider et al. 1989, Shkurti and Winefordner 1989). However, the efficacy of having either experts outside government or citizens in the process is unclear. Bretschneider et al. (1989) observe that whereas forecasting by experts within government increases the accuracy of forecasts, relying on outside experts decreases the accuracy. In a case study of Ohio, outside experts did not improve forecast accuracy but improved the likelihood of acceptance (Shkurti and Winefordner 1989). Moreover, the presence of a council of economic advisors and democratic control reduces forecast accuracy (Bretschneider and Gorr 1987).
Formalizing the reevaluation process for forecasts is critical in enhancing revenue forecasting accuracy. According to a survey, better-performing states reevaluate their revenues quarterly or monthly. This is because reevaluation provides additional information that helps adjust their estimates and identify unexpected threats more quickly (Willoughby and Guo 2008). Thus, it is recommended that the estimates be revised midsession at least once (McNichol 2014). Timing of forecast and reevaluation is also important. As the forecast is made closer to the time of budget adoption, the estimates will be more accurate, and forecast bias will increase over longer horizons (Pew Charitable Trusts 2015, Williams 2012). Therefore, revenue forecast updates approaching the fiscal year's start will increase accuracy the most (Boyd and Dadayan 2014).
INTENTIONAL UNDER- AND OVERESTIMATION
Williams and Calabrese (2016) suggest that most U.S. subnational governments use conservatively biased revenue and expenditure forecasts. These governments tend to underestimate revenues and overestimate expenditures intentionally (Bretschneider and Schroeder 1985, Rodgers and Joyce 1996, Williams 2012). Underforecasting revenues can sometimes be justified where it serves as a buffer against sudden revenue shortfalls and helps governments deal with budget uncertainty. For example, underforecasting revenues and overforecasting expenditures seems to increase school districts' unassigned fund balances (UFB) (Barrett et al. 2019). However, the practice of underestimation can still be a major source of revenue forecasting errors, and excessive underestimation could lead to undesirable consequences such as limiting expenditure growth and reserve ratcheting. In the case of New York City's property tax, lawmakers have been found to underestimate property tax revenues, which could lead to excessive spending in multi-year budgeting (Propheter 2019). Especially when taxes are particularly difficult to forecast, governments tend to underforecast revenues (Boyd and Dadayan 2014). Frank and Zhao (2009) find that almost all surveyed cities in their sample (90 percent) underestimated revenues by 17 percent annually.
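Systematic underestimation of the kind described above is a matter of bias rather than of overall error size, and the two can be measured separately. The sketch below is a hedged illustration; the function names and the revenue figures are hypothetical and do not reproduce any study cited in this chapter.

```python
def mean_percent_error(forecasts, actuals):
    """Average signed error as a percentage of actual revenue.
    Negative values indicate systematic underforecasting (a conservative bias)."""
    errors = [100.0 * (f - a) / a for f, a in zip(forecasts, actuals)]
    return sum(errors) / len(errors)


def mean_absolute_percent_error(forecasts, actuals):
    """Average error size regardless of direction (overall accuracy)."""
    errors = [abs(100.0 * (f - a) / a) for f, a in zip(forecasts, actuals)]
    return sum(errors) / len(errors)


# Hypothetical budgeted and realized revenues for five fiscal years.
budgeted = [92.0, 95.0, 99.0, 101.0, 104.0]
realized = [100.0, 103.0, 105.0, 110.0, 112.0]

print(round(mean_percent_error(budgeted, realized), 2))
print(round(mean_absolute_percent_error(budgeted, realized), 2))
```

When every forecast misses in the same direction, the two measures have the same magnitude; a forecaster with large but offsetting errors would instead show a signed mean near zero and a large absolute mean.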
Improving citizen awareness of revenue forecasting and removing incentives for underforecasting revenues, for example by establishing independent, nonpartisan forecasting agencies, can help address systematic underforecasting bias (Propheter 2019). Political incentives are particularly important in this matter. For instance, revenue forecasts from legislative or agency budget offices would be more effective in preventing underestimation as they are more interested in funding expenditures (Bretschneider et al. 1988). Similarly, Rose and Smith (2012) found that underestimation bias tends to diminish when a state has budget stabilization funds (BSFs) with deposit and withdrawal rules. This suggests that rule-bound BSFs help government officials avoid underestimating revenues caused by political pressures.
CHAPTER 3. ECONOMIES OF THE TSPLOST REGIONS
This chapter describes the economies of the regions to better understand the underlying dynamics of sales tax generation. We describe and analyze traditional economic indicators such as output, employment, and personal income. We also examine the distribution of economic activity in the different sectors and industries to show the main economic activities in each region and their link to TSPLOST revenues.
The data in this section are from two sources. First, we use IMPLAN data by region for 2019 to detail the economic composition of the TSPLOST regions.2 Second, to explore the temporal dynamics, we use data from the Bureau of Economic Analysis (BEA) on regional and sectoral GRP from 2001 to 2020. The BEA data are also used to compare the economic size and performance of the regional economies with those from other regions in Georgia.
RELATIVE SIZE OF TSPLOST REGIONAL ECONOMIES
Figure 1 and figure 2 compare the size of the economies with respect to other regions in Georgia. As expected, most of the economic value of the state is concentrated in the Atlanta metropolitan area. As shown in figure 1, for 2019 the contributions of the TSPLOST regions are small. For that year, Central Savannah (CS) contributed 3.4 percent of Georgia's economy, followed by River Valley (RV) with 2.6 percent, South Georgia (SGA) with 2.3 percent, and Heart of Georgia (HOG) with 1.3 percent.
2 IMPLAN is a computer input-output model commonly used in economic impact analyses. For more details on IMPLAN, visit https://implan.com/.
Source: BEA and authors' calculations; GDP = gross domestic product.
Figure 1. Map. Share of state GSP by region (percentage in nominal dollars).
Furthermore, figure 2 shows that during the last decade (2011-2020) their contributions have been decreasing, suggesting that a larger share of the post-Great Recession recovery has been concentrated in the other regions.
Source: BEA and authors' calculations
Figure 2. Graph. TSPLOST regional share of Georgia GSP, 2001-2020.
CHANGES IN SIZE OF REGIONAL ECONOMIES OVER TIME
Figure 3 presents the data from 2001 to 2020 for the nominal GRP, employment, and personal income in each of the TSPLOST regions. During this period, the relative size of the regions remained unchanged, with CS the largest and HOG the smallest in terms of GRP. The nominal GRP and personal income graphs show how the regions tended to grow similarly up to 2013. After that, CS has been growing faster than the other regions. This pattern is visible in the other variables of employment and personal income (PI), as well.
Source: BEA and authors' calculations; Bill = billions, Tho. = thousands.
Figure 3. Graphs. Nominal GRP, employment, and personal income for TSPLOST regions, 2001-2020.
Source: BEA and authors' calculations
Figure 4. Map. Share of GRP by county (percentage in nominal dollars).
Figure 4 shows where the economic activity is located in each region; it is generally concentrated in counties that contain major cities.3 In two regions, about 74 percent of regional GRP is generated in the two counties that contain the largest cities: Chattahoochee and Muscogee counties in RV (city of Columbus), and Augusta-Richmond and Columbia counties in CS (city of Augusta-Richmond). In SGA, Tift County (city of Tifton) and Lowndes County (city of Valdosta) account for roughly 50 percent of the regional GRP. In the HOG region, Laurens County (city of Dublin) and Appling County (city of Baxley) account for 36 percent of regional GRP. The HOG region has the most evenly distributed economic activity of the four regions.
Figure 5 shows the top GRP-generating sectors of each region's economy from 2001 to 2020. The dominant two sectors in all four regions were services and manufacturing in 2001.4 Over the next 20 years, services grew dramatically in CS, RV, and SGA, whereas manufacturing remained at a similar level as in 2001. Most of the other sectors experienced modest growth, with various sectors leading the way in different regions. We highlight sector growth in each region next.
3 Figure 4 data are from 2019 to be consistent with later analyses using IMPLAN data. Note that GRP is not the same as the output data from IMPLAN. However, both measures are highly correlated with each other and are good ways to measure levels of economic activity.
4 For a list of BEA sectors and included industries, see https://www.bea.gov/.
Source: BEA and authors' calculations; Bill = billions.
Figure 5. Graphs. Nominal GRP composition by sector for TSPLOST regions, 2001-2020.
The CS region is home to Plant Vogtle, the nuclear power plant in Burke County.5 Thus, utilities contribute a meaningful share to regional GRP, growing from less than $1 billion in 2001 to just over $2 billion by 2020. The commerce sector also experienced modest growth in the period.6 In the RV region, GRP growth was buoyed only by the services sector. All the remaining sectors remained flat or experienced a modest decline in their contributions to GRP. In the SGA region, growth in GRP was also mostly due to services. However, commerce also grew from 2001 to 2020. In 2020, it generated a slightly larger share of GRP than the manufacturing sector, which is historically the region's second-ranked GRP contributor. The remaining sectors experienced slight growth from 2001 to 2020. The HOG region is the smallest in terms of GRP. Whereas services are the top contributing sector to regional GRP, manufacturing has remained a close second, even contributing to regional GRP growth from 2015 to 2020. The HOG region is also home to the Hatch Nuclear Plant in Appling County, which accounts for the large contribution of the utility sector to the regional GRP. In all four regions, agriculture and construction have a limited contribution to GRP. These low shares have been consistent throughout the 20-year period.
5 Note that in Georgia the fuel purchased by power plants to generate electricity is subject to sales tax.
6 In the BEA data, commerce includes both wholesale and retail trade.
Source: IMPLAN and authors' calculations; Tho. = thousands.
Figure 6. Graphs. Sectoral composition of regional economic output per capita, employment, and employment compensation per employee.
Figure 6 illustrates how the four regions compare to the Georgia average on various measures that help to illustrate aspects of sector productivity. The measures are output per capita, sector employment as a share of total employment, and sector compensation per employee. The output data are derived from IMPLAN and are a proxy for sector revenue or sales. Employee compensation is also generated from IMPLAN and is a proxy for the total value of compensation, which includes wages and benefits. These three scaled metrics are used because they facilitate comparisons across the regions and with the state. The three charts illustrate how the regions compare to each other and to the state in output, employment, and wages for these six leading industry sectors.
In output per capita in figure 6, the four regions generally lag behind the state average. Examining services, the leading sector in the four regions, shows that the regions lag substantially behind the state average in output per capita. In the manufacturing sector, the second-leading regional sector, the four regions are about equal to or slightly outperform the state average in output per capita. This suggests that the manufacturing firms in the region are generally as efficient as those throughout the state. In the commerce sector, the four regions also trail the state average in output per capita by a small amount. The construction sector also lags the state in output per capita in three regions, with the exception being CS.
The second graph in figure 6 examines the share of total employment for industry sectors. For CS and RV, the share of employment in services is similar to the state at about 60 percent of total employment, whereas the other two regions are at roughly 50 percent. For manufacturing, HOG and SGA have higher shares of sector employment (approximately 15 percent) than the state average (around 10 percent). HOG and SGA also have a higher share in agriculture than the state average.
The last graph in figure 6 shows compensation per employee. Generally, the three regions trail the state average in services. This is in keeping with the lower output per capita, shown previously. Manufacturing compensation per employee also trails the state average in HOG and SGA. Note the outlier in wages per employee is utilities, but this is generally a small share of each region's economy.
Figure 7 shows that the regions have experienced growth in sectors responsible for most direct sales tax generation (i.e., retail trade; accommodation and food services; and arts, entertainment, and
recreation). CS has outpaced the other three regions in growth of these sectors, followed by SGA. The RV region has experienced only modest growth and is the third-ranked region. HOG has had growth, but the sectors are a small share within the economy (see figure 5).
Source: BEA and authors' calculations; Bill. = billions.
Figure 7. Graph. Value of nominal GRP of tax base sectors by regions, 2001–2020.
Figure 8 illustrates how each sector in the region has changed from 2001 to 2020. Retail trade has experienced growth in all regions except RV. The second-contributing sector, accommodations and food service, has grown modestly in the four regions. HOG has seen the least growth of all regions in this sector. The smallest sector of these three, entertainment, has had very limited growth in all four regions.
Source: BEA and authors' calculations; Bill. = billions.
Figure 8. Graphs. Revenue from three sales tax generating sectors, 2001–2020.
Figure 9 shows the share of sales tax collected in the four regions from the three sectors of interest in 2021. It also shows the output of these three sectors (retail trade, accommodations and food service, and entertainment services) as a share of total output. In all four regions, the sectors account for roughly 50 percent of all sales tax collections and about 10 percent of the regional output.
Source: IMPLAN, Georgia Department of Revenue (DOR), and authors' calculations.
Figure 9. Graph. The economic output and sales tax collections in total for: (1) retail trade; (2) accommodation and food services; (3) arts, entertainment, and recreation.
In summary, these regions are relatively small in terms of economic activity, compared to the other regions in the state and particularly to the Atlanta region. Economies of these regions also differ from those of the state on average. Whereas services are the largest economic sector in these regions, manufacturing still has a greater economic role in the four regions than in the state broadly. In addition, the types of services or service firms in the four regions generate lower output per capita than the state average. Wages are also lower in the four regions broadly when compared to the state average. One-half of the sales tax is generated from three sectors that account for a small share of the
economy. That said, most of the economic activity generated in the region occurs in cities and urban areas, which is similar to the state. In the next chapter, we focus on how other sectors of the economy, as well as other economic variables, are linked to sales tax collections.
CHAPTER 4. CORRELATIONS
As was shown in the previous chapter, some of the leading sales tax collecting sectors of the economy only account for about 10 percent of the regional economic activity, yet these sectors account for half of all sales tax. In this chapter, we examine additional sectors of the economy that generally do not collect sales tax but can provide the economic activity that generates income that leads to spending in the sectors that do collect sales tax--or through nonexempt firm purchases that generate sales tax. These sectors can provide the necessary second-order income that drives sales tax collections. We also examine how sales tax collections correlate with the statewide economic variables for which forecasts are available from Moody's Analytics: Georgia GSP, retail sales, and personal income. Changes in population also have economic impacts at the state level and are included. The sales tax values used in this chapter span 2001–2021 and are sourced from Education Local Option Sales Tax (ELOST) and SPLOST data, with missing data points imputed (the creation of these data is discussed in more detail in chapter 5). The economic variables and estimated sales tax collections examined here are in nominal dollars unless otherwise noted.
Figure 10 shows the correlations of regional sales tax collections with the main economic sectors in the regions as well as that of the state.7
7 Generally, for a variable to be considered highly correlated, it should have a correlation coefficient of roughly 0.9 or greater. This means that economic activity in a particular sector has a strong positive relationship to sales tax collections. If economic activity in the sector increases, we would expect sales tax collections to increase in a similar manner. Note that showing correlation is not the same as showing causality. However, due to the economic linkages in the regions between the sectors that generate income for workers to spend on taxable goods and services, it is a reasonable assumption.
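As an illustration of the calculation behind these correlations, the following minimal sketch computes a Pearson correlation coefficient and applies the 0.9 threshold mentioned in the footnote. The data values and the names `pearson`, `sector_output`, and `sales_tax` are hypothetical and are not taken from the report; the report does not state which software was used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical annual values (illustrative only, not report data).
sector_output = [100.0, 104.0, 109.0, 107.0, 115.0, 121.0]
sales_tax = [10.0, 10.3, 10.9, 10.8, 11.4, 12.1]

r = pearson(sector_output, sales_tax)
highly_correlated = r >= 0.9  # threshold described in the footnote
```

A coefficient near 1 indicates that the two series move together; as the footnote notes, it does not by itself establish causality.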
[Figure 10 bar chart. Vertical axis: correlation coefficient, 0 to 1. Groups: CS region 7, RV region 8, HOG region 9, SGA region 11, and Georgia. Series: professional and business services; education, health, and social assistance; manufacturing; construction; agriculture; utilities.]
Figure 10. Graph. Sector and sales tax correlations for Georgia regions and state.
At the state level, all the large industry sectors are highly correlated with sales tax collections (with values of 0.8 or greater). Those sectors that have higher wages (as shown in figure 6; i.e., the two service sectors, professional and business services and education and health services, as well as utilities) are strongly correlated with sales tax collections. Manufacturing and construction sectors at the state level also have a relatively strong correlation to sales tax collections. The sector least correlated with sales tax collections is agriculture. This is likely due to the low wages paid and the relatively low number of workers in the sector when compared to the others.8
Because the regional economies differ in their industry makeup and size, we examine each one separately. The CS region is the largest region economically, and the relationships the sectors have to sales tax collections in this region are similar to the state correlations. The largest economic sectors--professional business services, and education and healthcare services--are highly correlated with sales tax collections. The construction and utility sectors are also strongly correlated with sales tax collections in this region, whereas the agriculture and manufacturing sectors are weakly correlated with state sales tax collections.
In the RV region, only the two service sectors have correlations that could be considered moderately strong, both around 0.7. None of the other correlation coefficients are high enough to reliably link sector growth to sales tax growth. Note that in this region, the positive relationships between sector revenue and sales tax collections are very weak in agriculture, construction, and manufacturing, suggesting that growth in these sectors is associated with very low levels of sales tax growth, if at all.
In the HOG region, only the construction sector has a correlation coefficient over 0.7. All the other sectors have modest coefficients around 0.6. This includes both service sectors, a somewhat surprising result, given the sectors' high correlations with sales tax collections in the other regions and at the state level.
8 Poultry processing is another large industry in the state and in some of the regions. It is classified as manufacturing. We tested this industry's relation to sales tax collections using IMPLAN and found it had a small positive impact more similar to the agriculture sector rather than the manufacturing sector.
In the SGA region, only the two service sectors have moderately high correlations with sales tax collections, around 0.7. Here again, the agriculture and manufacturing sectors are very weakly correlated with sales tax collections.
For regions that are interested in trying to gauge changes in sales tax collections based on regional economic conditions, figure 10 offers some guidance.9 For three of the regions (CS, RV, and SGA), monitoring large firms in the business and professional services sectors as well as healthcare and education services will provide some insight into future sales tax collections. If firms in these sectors are experiencing growth year over year, then sales tax receipts are likely to grow as well. In the HOG region, large firms in the construction sector could be used as a barometer for future sales tax collections.
STATE-LEVEL VARIABLES, CORRELATIONS, AND FORECASTING
Figure 10 shows which economic sectors are more correlated to regional and state sales tax collections than others. However, the industry sector sales data have limited use for forecasting future sales tax collections, as reliable forecasts for future revenues at the state and regional level are not readily available.
9 The regional forecasts of sales tax receipts are generated quarterly by Georgia State University. These forecasts are an ideal guide to future regional sales tax collections. Thus, the benefit of tracking other variables to gain insight on future sales tax collections might have limited value.
Instead, the forecasting team at Georgia State University relies on data from Moody's Analytics, which includes forecasts of Georgia GDP, personal income, and retail sales. In addition, the BEA population projections for the state and region are available. As figure 10 shows, the correlations for industry sectors vary across regions as well as when compared to the state. The Moody's economic forecast data are only available statewide. Table 1 illustrates the variation in correlations in the Moody's data, comparing the state to the four regions. The table also illustrates the role the Metro Atlanta region has in the relationship of these economic variables to sales tax collections.
Table 1. Forecasted values and sales tax correlations for regions and state.

Variable                  Georgia         Georgia       CS     RV     HOG    SGA
                          (all regions)   (excl. ATL)
Georgia Retail Sales      0.93            0.88          0.89   0.69   0.67   0.75
Georgia Personal Income   0.90            0.86          0.90   0.69   0.66   0.74
Georgia GDP               0.90            0.85          0.88   0.66   0.64   0.72
Population                0.84            0.82          0.86   0.47   0.56   0.71
Examining the first column showing the correlations of the Moody's forecast values with state sales tax collections, all the coefficients are 0.9 or greater. Population is also strongly correlated at 0.84. When the Metro Atlanta region is excluded (second column), the correlation coefficients drop slightly, but all remain highly correlated. However, a slight decline in the coefficient values does suggest that the Moody's economic variables have a stronger relationship with sales tax collections in larger urban areas, such as the city of Atlanta. For CS, the correlation coefficients are almost the same as those for Georgia excluding Atlanta. As was shown previously, the economic activity for the CS region occurs predominantly in the urban area around Augusta–Richmond, one of the largest urban areas in the state.
For the two economically small regions, RV and HOG, the correlations are smaller, all less than 0.7, and the population correlations are smaller still. This suggests that the economic relationships that make statewide forecasts a good proxy in the larger regions do not hold as well in the smaller regions. The table illustrates the need for a multipronged approach in choosing the forecasting models and variables. The Moody's statewide forecast of economic variables offers potential predictive value for the larger TSPLOST regions, but less so for the smaller regions. We discuss these approaches in the next chapter.
CHAPTER 5. MODEL COMPARISON
In this chapter we discuss how to compare various forecasting models using historical data and examine the complex machine learning models that have begun to be used as well for various forecasting tasks. We also discuss some basic forecasting models that are commonly used for revenue forecasting.
Forecast models can be constructively compared by splitting a long time series of observed data into (1) an earlier time period as training data and (2) a later time period used to test the models' predictions. The training data can be used to calibrate the models, and the later period can be used to see how well these models predict actual observations of revenues that occurred. Using this method one can compare various forecasting models graphically.
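The train/test comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: the data are hypothetical, the placeholder "model" simply repeats the last training year, and the mean absolute percentage error (MAPE) is one possible accuracy measure (the report compares models graphically and does not name a metric). The function names `split_by_period` and `mape` are illustrative.

```python
def split_by_period(series, first_test_period):
    """Split {(year, quarter): value} into training and test dictionaries."""
    train = {k: v for k, v in series.items() if k < first_test_period}
    test = {k: v for k, v in series.items() if k >= first_test_period}
    return train, test

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical quarterly revenues in $ millions (illustrative only).
revenue = {(y, q): 20.0 + 0.5 * (y - 2012) + (1.5 if q == 4 else 0.0)
           for y in range(2012, 2018) for q in range(1, 5)}

train, test = split_by_period(revenue, first_test_period=(2016, 1))

# Placeholder model: repeat the same quarter of the last training year.
last_train_year = max(y for y, _ in train)
predicted = [train[(last_train_year, q)] for (_, q) in sorted(test)]
actual = [test[k] for k in sorted(test)]
holdout_mape = mape(actual, predicted)
```

Any candidate model can be substituted for the placeholder, as long as it is calibrated only on `train` and scored only on `test`.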
The next section of the report makes use of Georgia Department of Community Affairs (DCA) region quarterly LOST data, dating back to the first quarter of 2000. These data are unique in this report because they predate the regional TSPLOSTs and instead are used to forecast a generic 1 percent LOST. To accomplish this, SPLOST and ELOST county data were collected and used to impute a stable 1 percent sales tax collection in every county for every month.10
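The imputation rule described in the accompanying footnote (use ELOST when a full month is available; otherwise use SPLOST divided by 0.97) might be sketched as follows. The function name, the use of `None` to flag a missing or partial distribution, and the example values are assumptions made for illustration; the report does not show its implementation.

```python
SPLOST_TO_ELOST = 0.97  # average SPLOST/ELOST ratio reported in the footnote

def impute_one_percent_lost(elost, splost):
    """Return a full-month 1 percent LOST value, or None if neither tax is usable.

    `elost` and `splost` are monthly distributions, or None when the tax was
    not in place or the distribution looked partial (judged upstream).
    """
    if elost is not None:
        return elost
    if splost is not None:
        return splost / SPLOST_TO_ELOST
    return None  # the report imputes these rare cases from economic variables

# Hypothetical county-months (illustrative only).
months = [(1_000_000.0, 970_000.0), (None, 485_000.0), (None, None)]
imputed = [impute_one_percent_lost(e, s) for e, s in months]
```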
These county-level, 1 percent tax revenues were aggregated to quarterly data for each of the 12 DCA regions from the first quarter of 2000 through the fourth quarter of 2020. The forecast
10 To accomplish this, we analyzed the monthly ELOST and SPLOST distributions for all counties and identified months that appeared to represent a full month of economic activity for each tax. The tax base for each is essentially the same, with a few exceptions, and on average SPLOST was 97 percent of ELOST when both were present and represented an entire month. SPLOST divided by 0.97, when present, replaced ELOST if ELOST was not in place or appeared to represent a partial distribution. For 44 observations, the county had neither tax. In those instances, representing 0.1 percent of our data, economic variables were used to impute expected ELOST.
was based on 2000–2015 revenues, which were used to predict the first quarter of 2016 through the end of 2020. Four models for the DCA regions, along with statewide aggregate revenues, were used to evaluate how each model's predictions compared to the revenues that were collected in 2016 through 2020 for each region. This analysis includes the data from the onset of the Coronavirus Disease 2019 (COVID-19) pandemic in the United States in March 2020, during which time sales tax receipts plunged dramatically. However, with the onset of federal COVID-19 relief funds, sales tax receipts recovered in an equally dramatic fashion. These data points are used in our models but are not the type of events that can be forecasted.
Some models incorporate per capita income, as income is linked to spending on goods and services that generate sales tax. Where these models are run as of 2015, a forecast of statewide per capita income from Moody's Analytics, produced in January 2016, was used.
Holt–Winters multiplicative smoothing, a univariate time series model, is the first model considered. It primarily uses the relationship among lagged revenues to predict subsequent revenues. This is considered the most basic model employed by forecasters but can, in certain instances, be the best performing. Predictions from this model are heavily influenced by the most recent changes in the revenues. If current conditions alone are strong predictors and revenues lack underlying structure that other models could exploit, then this model may perform well. For the TSPLOST regions and Georgia, this model understates the revenues that came in, and it only performed well in the early forecast periods, as seen in the red lines in figure 11.
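For readers unfamiliar with the method, the following is a minimal sketch of multiplicative Holt–Winters smoothing with an additive trend, using the textbook recursions. The initialization from the first two seasons, the smoothing weights, and the data are all assumptions chosen for illustration; the report does not state the settings it used, and in practice the weights are usually estimated from the data.

```python
def holt_winters_multiplicative(y, m, alpha, beta, gamma, horizon):
    """Forecast `horizon` steps with multiplicative-seasonal Holt-Winters.

    y: observed series (at least 2*m points); m: season length (4 for
    quarterly data); alpha, beta, gamma: smoothing weights for the level,
    trend, and seasonal components.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Simple starting values taken from the first two seasons (an assumption).
    first_mean = sum(y[:m]) / m
    second_mean = sum(y[m:2 * m]) / m
    level = first_mean
    trend = (second_mean - first_mean) / m
    seasonal = [y[i] / first_mean for i in range(m)]

    for t in range(m, len(y)):
        s_prev = seasonal[t % m]
        prev_level, prev_trend = level, trend
        level = alpha * (y[t] / s_prev) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        seasonal[t % m] = (gamma * (y[t] / (prev_level + prev_trend))
                           + (1 - gamma) * s_prev)

    n = len(y)
    return [(level + h * trend) * seasonal[(n + h - 1) % m]
            for h in range(1, horizon + 1)]

# Hypothetical quarterly revenues in $ millions (illustrative only).
quarterly = [20.0, 21.0, 20.5, 24.0, 21.0, 22.1, 21.4, 25.3,
             22.1, 23.0, 22.6, 26.4]
forecast = holt_winters_multiplicative(quarterly, m=4, alpha=0.3,
                                       beta=0.1, gamma=0.2, horizon=8)
```

Because each update weights the newest observation by `alpha`, `beta`, or `gamma`, larger weights make the forecast react more strongly to the most recent changes in revenues.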
The second model considered is a basic ARIMA, discussed in the literature review in chapter 2, with three lagged values as predictors (p), one difference in the underlying data (d), and one moving average lag (q) included. Slight adjustments to this model have been incorporated as a conservative prediction in recent TSPLOST forecasts. These predictions are shown in the light green lines in figure 11; they typically match earlier periods better and represent low but reasonable expectations in the later years.
ARMAX11 is a variant of the ARIMA model family with additional explanatory variables. Here we use an ARMAX model with the same p, d, q parameters as the ARIMA forecast model, also incorporating per capita income. The inclusion of per capita income, along with its expected growth from 2016 to 2020, generates higher expected revenues when compared to the second model (i.e., basic ARIMA). It overstated revenues in the earlier periods for most regions and statewide and predicted the later forecast years much more accurately. Derivations of these two models represent the current strategy for TSPLOST forecasts. In each model, we use the standard error of the estimates and build an interval around the predicted value. Moreover, we include a time-varying volatility component to account for increasing uncertainty over the forecast horizon. Thus, the confidence interval becomes wider as the forecast period increases. The predictions from this model are shown in orange in figure 11. Adjustments to this model have been incorporated as more optimistic predictions within recent TSPLOST forecasts.
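In conventional notation, the two specifications described above can be written as a single equation. This is a sketch: the symbols are assumptions introduced here, with y_t denoting revenue, x_t per capita income, and ε_t the error term, and the report does not state whether income enters in levels or in differences.

```latex
\Delta y_t = c + \sum_{i=1}^{3} \phi_i \, \Delta y_{t-i} + \beta x_t
             + \theta_1 \varepsilon_{t-1} + \varepsilon_t ,
\qquad \Delta y_t = y_t - y_{t-1}
```

Setting β = 0 gives the basic ARIMA model with p = 3, d = 1, and q = 1; a nonzero β gives the ARMAX variant.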
The last basic model employed for this comparison is an ordinary least squares regression model (OLS). It models the relationship between per capita income and sales tax revenue for the years
11 Autoregressive–moving-average model with exogenous inputs.
2000–2015, controlling for basic time trend effects. The coefficients of the regression, along with Moody's per capita income prediction from 2016, are the basis for its 2016 through 2020 predictions. Shown in light green in figure 11, this model overstates revenues in all regions compared to other predictions, in all periods except the latest time periods.
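A regression of this form (revenue on per capita income plus a linear time trend) can be sketched with the normal equations. The data are hypothetical and constructed so that the true coefficients are known; the helper names `solve_linear_system` and `fit_ols` are illustrative, and the report's exact specification may differ.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients for y = X b, via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical annual data (illustrative only).
income = [30.0, 31.5, 32.0, 34.5, 35.0, 38.0]   # per capita income, $ thousands
trend = list(range(len(income)))                # 0, 1, 2, ... time trend
revenue = [2.0 + 0.5 * inc + 0.1 * t for inc, t in zip(income, trend)]

X = [[1.0, inc, float(t)] for inc, t in zip(income, trend)]
intercept, b_income, b_trend = fit_ols(X, revenue)

# One-year-ahead forecast from an outside income projection (hypothetical).
projected_income = 39.0
forecast = intercept + b_income * projected_income + b_trend * len(income)
```

The forecast depends directly on the outside income projection, which is why the quality of that projection matters for this type of model.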
In summary, comparing these four basic forecast models to realized revenues creates a valuable opportunity to consider the pros and cons of various forecasting strategies. First, univariate time series models appear to have matched short-term forecasting tasks much better than models that included economic predictors. See basic ARIMA compared to ARMAX models, where the only difference is the inclusion of per capita income as an economic driver of local sales taxes.
Second, and conversely, the inclusion of per capita income improved model performance in longer-term forecasting tasks. OLS regressions dramatically overstated revenues in all regions in the earlier forecast periods but were among the highest performing during 2020. Inclusion of these data, when performing a forecast, however, requires an expectation of future values. Good practice is to obtain outside expectations for these data; thus, Moody's forecasts circa January 2016 were used here.
Finally, it is appropriate to consider the strengths and weaknesses of each of these basic models, and a forecast can combine the strengths from each to produce composite forecasts. Derivations of the ARIMA and ARMAX models described here, accounting for each's strengths, have been combined to produce recent TSPLOST forecasts. The models discussed above are traditional statistical models and have been used in forecasting for many years. In the next section, we discuss a relatively new tool for revenue forecasting, machine learning.
Figure 11. Graphs. State and regional ARIMA models: (a) Georgia; (b) DCA Region 7 Central Savannah River Area; (c) DCA Region 8 River Valley; (d) DCA Region 9 Heart of Georgia Altamaha; (e) DCA Region 11 Southern Georgia.
MACHINE LEARNING
During the last two decades, forecasters have seen significant additions to their set of statistical and computational tools. Increasing computational power has allowed modelers to increase the complexity of their models and explore different specifications--although usually at a cost in terms of interpretability and tractability, or with limitations on the size of the data. In this section, we explore a method in the nonstructural and univariate time series forecasting literature involving automated ARIMA and three models associated with machine learning (ML): the least absolute shrinkage and selection operator (LASSO), k-nearest neighbors (KNN), and the random forest (RF) algorithm. We focus on how these methods perform in predicting revenues from the sales tax in
the TSPLOST regions. More details about the methods and how they are related to the more traditional ARIMA models can be found in Appendix B.
Again, we will be evaluating these four algorithms on their ability to predict observable data. In this case, we have quarterly sales tax revenues for 1998–2022 for each of the TSPLOST regions. All models will be tested on forecasting the last eight observable periods, from the first quarter of 2021 to the fourth quarter of 2022, while using the rest of the data to estimate or train the model. We adjust the data for each of the models' required formats for estimation. Whereas the data layouts differ, all the values of the data remain the same, including the historical values of sales tax and the explanatory variables, such as Moody's estimates of the CPI, Georgia personal income, and Georgia retail sales. We will measure the performance of each of the methods assessed, based on how well their predictions fit the real and observed values for this timeframe. Our results are shown in figure 12 and explained below.
Figure 12. Graphs. Predictions of the proposed models by region.
The first model is the automated ARIMA. Recall from the Forecasting Methods section that the ARIMA model is a leading statistical model for forecasting univariate time series and is widely used by local governments to forecast sales tax revenues. By using past data and standard assumptions, ARIMA models can identify patterns in how a variable tends to fluctuate throughout a given period. Given the data, the ARIMA model is fully described by three parameters: p represents the number of lagged values of the variable, q represents the number of lagged errors, and the differencing parameter (d) describes the transformation of the variable of interest before estimation. However, there is no standard way to choose the values of these parameters, making the specification of ARIMA models time consuming and susceptible to researcher judgment, and in most cases intractable.
Hyndman and Khandakar (2008) proposed an automated strategy to select the model parameters of an ARIMA model in three stages (Hyndman and Athanasopoulos 2023).12 The algorithm chooses the model through a series of tests and transformations, so that the model with the best fit, in this case measured by the Akaike information criterion (AIC), is chosen. Hyndman and Khandakar's automated algorithm offers an opportunity for a more extensive evaluation of the forecast properties of the ARIMA model. By explicitly setting a fitting criterion, it allows the model to be consistently defined. The results of this algorithm and our predictions are shown by the red lines in figure 12.
The remaining three models/methods are part of the ML literature. ML has become an essential tool in different disciplines (e.g., economics, finance, marketing), given its performance in estimation, model selection, and forecasting tasks (Medeiros 2022). Most of these models are not new; however, it is the recent ability to run more complex and larger versions, repeatedly and rapidly, that has made these techniques so popular during the last decade. In fact, most of these statistical models on their own are highly susceptible to overfitting and therefore have poor forecasting properties. However, in each of these models, the flexibility is achieved usually by one or two parameters, known as tuning parameters. The idea is that through repeated estimation of these models, we can "learn" from the data the optimal values for these tuning parameters through a process known as cross-validation (CV).
12 To be more precise, their algorithm identifies a seasonal ARIMA (SARIMA) model. These correspond to the variation of the ARIMA model that allows for simple seasonality in the data. See Appendix B for more detail on the algorithm and its estimations.
Nevertheless, there are some challenges when applying ML to time series. As shown in the literature review in chapter 2, ML algorithms' performance in time series tasks and forecasting is mixed. Although it is difficult to pinpoint a specific reason for this, a significant issue is that these algorithms were not designed for time series data. To apply ML to time series, we must modify the preprocessing and analysis accordingly. More details on this adjustment are discussed in Appendix B.
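One common adjustment when cross-validating on time series is to keep every training window strictly earlier than its test window, so that the model is never tuned on future data. The sketch below shows an expanding-window scheme; it is offered as a general illustration, not as the procedure in Appendix B, and the function name and fold sizes are assumptions.

```python
def expanding_window_splits(n_obs, n_folds, test_size):
    """Yield (train_indices, test_indices) pairs that never train on the future."""
    for fold in range(n_folds):
        test_end = n_obs - (n_folds - 1 - fold) * test_size
        test_start = test_end - test_size
        yield list(range(test_start)), list(range(test_start, test_end))

# Example: 40 quarters, three folds, each tested on the following four quarters.
splits = list(expanding_window_splits(n_obs=40, n_folds=3, test_size=4))
```

A tuning parameter would then be chosen by averaging the forecast error over the folds and keeping the value with the lowest average.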
For the three ML methods, we first difference the series. We include as explanatory variables eight lags of the variables, the average and standard deviation for the last year (last four quarters) and two years (last eight quarters), and the three explanatory variables discussed above (CPI, Georgia personal income, and Georgia retail sales). The details of the CV of the methods, the tuning parameter of each method, and other adjustments on the algorithm or the data preprocessing are considered in Appendix B.
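The feature construction described above can be sketched as follows for the revenue series alone. The three external explanatory variables (CPI, personal income, retail sales) would be appended in the same way and are omitted here; the data, the function names, and the use of the sample standard deviation are assumptions made for illustration.

```python
from statistics import mean, stdev

def difference(series):
    """First difference: change from the previous quarter."""
    return [b - a for a, b in zip(series, series[1:])]

def build_features(diffed, t, n_lags=8):
    """Features for predicting diffed[t], using only values before t."""
    if t < n_lags:
        raise ValueError("not enough history for the requested lags")
    history = diffed[:t]
    lags = [history[-k] for k in range(1, n_lags + 1)]   # lag 1 ... lag 8
    last_year, last_two_years = history[-4:], history[-8:]
    return lags + [mean(last_year), stdev(last_year),
                   mean(last_two_years), stdev(last_two_years)]

# Hypothetical quarterly revenues (illustrative only).
revenue = [20.0 + 0.3 * i + (1.0 if i % 4 == 3 else 0.0) for i in range(16)]
diffed = difference(revenue)
t = 10
features = build_features(diffed, t)   # 8 lags + 4 rolling statistics
target = diffed[t]
```

Each row of the training data pairs one such feature vector with its target, which lets methods designed for cross-sectional data be applied to a time series.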
There are multiple models in the ML literature. In this report we will discuss three: LASSO, KNN, and RF. The LASSO is a parametric model, equivalent to a regression model with a penalty on the sum of the absolute values of the coefficients, which shrinks some coefficients to exactly zero. KNN and RF are instead nonparametric models. The predictions under LASSO (blue), KNN (green), and RF (brown) are shown in figure 12.
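For reference, the standard LASSO estimator can be written as a penalized least-squares problem. The notation is an assumption introduced here: y_t is the (differenced) revenue, x_t the vector of p explanatory variables, and λ ≥ 0 the tuning parameter selected by cross-validation.

```latex
\hat{\beta} = \arg\min_{\beta}
\left\{ \sum_{t=1}^{T} \left( y_t - x_t^{\top} \beta \right)^2
        + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}
```

With λ = 0 the estimator reduces to ordinary least squares; larger values of λ set more coefficients exactly to zero, which is how the method selects variables.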
Each of the predictions is compared with the real value for the forecasting period (2021q1 to 2022q4), shown in black in figure 12. As expected, no single algorithm outperformed all the others. For example, the ARIMA model outperformed at least one of the algorithms in three of the four regions. This is consistent with the literature review, where under some circumstances, the traditional models outperformed the ML algorithms. Nevertheless, ML can provide improvement in the forecast. By diversifying the portfolio of methodologies and models we consider in our forecast, we have shown there is room for improvement. How these new methods are put to use is discussed in the following sections.
FORECAST METHODS FOR TSPLOST REVENUES
This section reviews the methods used to specify and test the forecast models for the TSPLOST revenues in the four regions. As a result of this process we selected two different forecasting models from a more extensive set, based on their practical and theoretical performance. An additional criterion was the model's tractability, meaning we can follow and explain its methods satisfactorily to policymakers.
Both models use sales tax collections data from the Georgia DOR. The monthly data were first adjusted so the observed amount coincides with the month of the sale that generates it, given that revenues reported are generated from the sales of the previous month (e.g., December DOR reported revenue was generated from actual November retail sales).
Data are aggregated at the quarterly level to capture unexpected movements of resources from one month to another (e.g., unexpected delays in payments) and use exogenous predictors of the current and predicted level of the state's economy, recorded at this time interval, coming mainly from the Moody's Corporation. After the forecast is completed, the monthly forecasts are generated based on the average weight of each month within its corresponding quarter.
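The data preparation steps above (shifting reported revenue to the month of sale, aggregating to quarters, and splitting a quarterly forecast back into months by average weight) can be sketched as follows. All names and values are hypothetical, and the sketch assumes the history covers only complete quarters.

```python
from collections import defaultdict

def shift_to_sale_month(reported):
    """Map {(year, month): amount} reported by DOR to the month of the sale."""
    shifted = {}
    for (year, month), amount in reported.items():
        sale = (year, month - 1) if month > 1 else (year - 1, 12)
        shifted[sale] = amount
    return shifted

def quarter_of(month):
    return (month - 1) // 3 + 1

def to_quarterly(monthly):
    """Sum monthly amounts into {(year, quarter): total}."""
    totals = defaultdict(float)
    for (year, month), amount in monthly.items():
        totals[(year, quarter_of(month))] += amount
    return dict(totals)

def average_month_weights(monthly):
    """Average share of each calendar month within its quarter."""
    quarterly = to_quarterly(monthly)
    shares = defaultdict(list)
    for (year, month), amount in monthly.items():
        shares[month].append(amount / quarterly[(year, quarter_of(month))])
    return {m: sum(v) / len(v) for m, v in shares.items()}

# Hypothetical reports from Feb 2021 through Jan 2023 (illustrative only):
# December sales are higher than those of other months.
reported = {}
for i in range(24):
    sale_month = i % 12 + 1
    report_year, report_month = 2021 + (i + 1) // 12, (i + 1) % 12 + 1
    reported[(report_year, report_month)] = 12.0 if sale_month == 12 else 9.0

sales = shift_to_sale_month(reported)
weights = average_month_weights(sales)

# Split a hypothetical fourth-quarter forecast into months.
q4_forecast = 33.0
monthly_forecast = {m: q4_forecast * weights[m] for m in (10, 11, 12)}
```

Because the weights within each quarter sum to one, the monthly forecasts add back up to the quarterly forecast.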
The conservative model uses a standard ARIMA approach. We center the variables before we start our data analysis, which entails removing the mean from each value in a variable so that the resulting variable has a mean of zero. Any persistent trend in the data is eliminated, which makes it easier to identify the underlying time series patterns. The next step is to create a seasonality index, which measures the contribution of each season to the overall pattern of the series. For example, if we notice that the sales trend is higher during the holiday season, the seasonality index can account for this trend. The impact of an annual event like the Masters golf tournament
is also accounted for with this technique. We then run various statistical tests to help determine the number of autoregressive (AR) and moving average (MA) terms, respectively. We then check to see if the models selected fit the data well. For a more detailed description of these tests, see Appendix A.
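Centering and a seasonality index can be illustrated with a short sketch. The ratio-to-mean index used here is one simple way to measure each quarter's contribution and is an assumption; the report does not specify its formula, and the data are hypothetical.

```python
def center(series):
    """Subtract the mean so the centered series averages zero."""
    avg = sum(series) / len(series)
    return [v - avg for v in series]

def seasonal_index(series, period=4):
    """Ratio of each season's average to the overall average (1.0 = typical)."""
    overall = sum(series) / len(series)
    index = []
    for s in range(period):
        season_values = series[s::period]
        index.append((sum(season_values) / len(season_values)) / overall)
    return index

# Hypothetical quarterly revenues with a strong fourth quarter (illustrative only).
revenue = [18.0, 20.0, 20.0, 22.0] * 3
centered = center(revenue)
index = seasonal_index(revenue)
deseasonalized = [v / index[i % 4] for i, v in enumerate(revenue)]
```

In this example the fourth-quarter index is above 1, reflecting higher holiday-season sales; dividing by the index removes that recurring pattern before the AR and MA terms are chosen.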
The optimistic model is intended to introduce exogenous data to the model. ARIMA models are known to perform well in short-run forecasts, but their performance is expected to decrease as more periods are predicted. This is because as time passes, the model starts to depend more on predicted values than on the original dataset. This is a critical issue for this application because we want to predict more than 10 years of data, at least 40 quarters. The proposed model uses quarterly data, observed and predicted, on Georgia retail sales and the consumer price index using data from Moody's Analytics. As was shown in the previous section and tables, Georgia retail sales are highly correlated with sales tax collections. Other variables such as personal income and GSP are also highly correlated with sales tax collections. However, all these variables are also highly correlated with each other, including the population variable. Due to these correlations between similar variables, the models behave best if only one is included. Retail sales is selected because it has the most direct link to sales tax collections.
In particular, we assume a linear model with SARIMA errors (the ARIMA model version for seasonal data). For this model, we use an automated algorithm, available in the fable package in the R language, to define the parameters of the SARIMA component. These algorithms select the specification for the model that best fits the data based on traditional information criteria (e.g., AIC and Bayesian information criterion [BIC]). We contrast the results of the automated algorithm with alternative specifications, coming from the autocorrelations and partial autocorrelations, and perform traditional tests on the model's residuals (e.g., Portmanteau tests).
Using the proposed model, we can make the corresponding predictions, with the advantage that the model now captures exogenous variables that are expected to perform better for long-term forecasts. Based on the conservative and optimistic models, we calibrate upper and lower bounds for each of the predictions and combine them to provide a unique forecast with its corresponding bounds. In each model, we use the standard error of the estimates and build an interval around the predicted value. Moreover, we include a time-varying volatility component to account for increasing uncertainty over the forecast horizon. Thus, the confidence interval becomes wider as the forecast period increases. In the final model, the upper bound considers the anticipated impact of the structural variables of the optimistic model. In the conservative estimate reflecting the lower bound, no change in the structural component of the forecast is included. We believe this method provides a reasonable view of uncertainty and variability inherent in forecasting exercises. We illustrate the outcome of these methods in figure 13 from the second quarter 2023 forecast.
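The idea of intervals that widen with the forecast horizon, and of combining a conservative lower bound with an optimistic upper bound, can be sketched as follows. The square-root growth of the interval width, the 1.96 multiplier, and the midpoint used as the combined forecast are all assumptions chosen for illustration; the report does not give its exact formulas, and the data are hypothetical.

```python
from math import sqrt

def widening_interval(point_forecasts, sigma, z=1.96):
    """Intervals whose half-width grows with the square root of the horizon."""
    bounds = []
    for h, point in enumerate(point_forecasts, start=1):
        half_width = z * sigma * sqrt(h)
        bounds.append((point - half_width, point + half_width))
    return bounds

def combine(conservative, optimistic, sigma):
    """Lower bound from the conservative path, upper from the optimistic one."""
    low = [lo for lo, _ in widening_interval(conservative, sigma)]
    high = [hi for _, hi in widening_interval(optimistic, sigma)]
    mid = [(c + o) / 2 for c, o in zip(conservative, optimistic)]
    return list(zip(low, mid, high))

# Hypothetical quarterly forecasts in $ millions (illustrative only).
conservative = [25.0, 25.2, 25.4, 25.6]
optimistic = [25.5, 26.0, 26.5, 27.0]
bands = combine(conservative, optimistic, sigma=0.5)
widths = [high - low for low, _, high in bands]
```

The band is narrowest for the first forecast quarter and widens with each additional quarter, reflecting the growing uncertainty over the forecast horizon.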
[Chart: Regional Annual TSPLOST Collections, Actual and Forecasted, for 2019 to 2032 (millions $). Vertical axis: $0.0 to $140.0. Horizontal axis: 2019 to 2032, with the forecasted period marked. Series: conservative (Con) and optimistic (Opt) estimates for CS7, RV8, HG9, and South11.]
Figure 13. Graph. Sample quarterly forecast graph with conservative and optimistic estimates.
CHAPTER 6. CASE STUDIES ON BUDGETING IN TSPLOST REGIONS
In this chapter we review the best practices in budgeting for transportation sales taxes by conducting six case studies of regional or county TSPLOSTs. The case studies include a mix of large, mid-sized, and smaller TSPLOST initiatives located across five states, i.e., California (Measure M in Los Angeles County and TransNet in San Diego County), Colorado (FasTracks project, Denver Regional Transportation District [RTD]), Virginia (Central Virginia Transportation Authority [CVTA]), Texas (CapMetro, Texas), and Utah (Utah Transit Authority [UTA]). The case studies gathered information through an internet search on the following:
The law authorizing levy of sales tax for transportation-related infrastructure (including information on sales tax rate, effectivity or until when the tax is imposed, collection of sales tax and process of disbursement, and local governments that benefit from the sales tax)
Use of sales tax revenue (i.e., How are the proceeds from the sales tax spent? What can the revenue be spent for? Are revenues dedicated for regional transportation projects or projects that cross jurisdictions? Is there a local component or funding for projects within a specific jurisdiction? What is the share for the regional component? What is the share for local component?)
Project selection and delivery (How is the list of transportation projects funded through the sales tax developed? Who is responsible for managing the delivery of projects in the list? Who is responsible for the design and construction of projects?)
Budgeting (What are the revenue forecasting practices that the agencies employ to estimate revenues from the sales tax? How often do they forecast revenues? Is there information about the accuracy of forecasts? Is there a budget reserve in case actual revenues are less than projected revenues? What are the policies for the deposit and withdrawal management of the reserves?)
MEASURE M, LOS ANGELES COUNTY
Sales Tax Levy
The Los Angeles County Traffic Improvement Plan, or Ordinance No. 16-01 (Measure M), is the no-sunset sales tax for funding Los Angeles County transportation services such as new projects, maintenance, and subsidies. Measure M imposes an additional 0.5 percent sales tax. With 71 percent of voters voting for the measure in a referendum in 2016, Measure M became effective on January 1, 2017, and covers Los Angeles County and nine subregions (Arroyo Verdugo, Las Virgenes Malibu, Central City Area, San Gabriel Valley, North County, South Bay, Westside, Gateway Cities, San Fernando Valley) (see figure 14).13 The tax rate will increase to 1 percent when the existing sunset sales tax of 0.5 percent (Measure R) expires on July 1, 2039.13 In addition, Measure M requires a local contribution of 3 percent of the total costs of major rail transit projects.14 The sales taxes are collected by the Board of Equalization and remitted to the Measure M fund and subfunds.
13 Los Angeles County, California. Proposed Ordinance #16-01 Measure M Los Angeles County Traffic Improvement Plan (2016). 14 Metro (2017). Measure M Final Guidelines.
Use of Sales Tax
The Los Angeles County Metropolitan Transportation Authority (Metro) manages the fund and subfunds for the Measure M sales taxes. The ordinance restricts the use of the Measure M fund balance. Measure M revenues must be allocated as follows:
Metro rail operations (5 percent).
Transit operations (Metro and municipal providers) (20 percent).
ADA Paratransit for the disabled and Metro discounts for seniors and students (2 percent).
Transit construction (35 percent).
Metro State of Good Repair projects (2 percent).
Highway construction (17 percent).
Metro active transportation program (2 percent).
Local return for local projects and transit services (16 percent) and for regional rail (1 percent).15
All major projects are expected to remain within their approved budget and planned schedule. The local return must be distributed to municipalities16 based on population shares and must not be used for nontransportation purposes. The share of local funds will increase to 20 percent effective July 1, 2039.17
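The allocation percentages listed above can be sketched as a simple split of annual revenue, followed by a population-based distribution of the local return. This is an illustrative sketch only: the revenue figure, the city names, and the function names are hypothetical and are not taken from Metro's procedures.

```python
# Allocation shares as listed in the text (percent of Measure M revenue).
MEASURE_M_SHARES = {
    "Metro rail operations": 5,
    "Transit operations": 20,
    "ADA paratransit and senior/student discounts": 2,
    "Transit construction": 35,
    "Metro State of Good Repair": 2,
    "Highway construction": 17,
    "Metro active transportation": 2,
    "Local return": 16,
    "Regional rail": 1,
}


def allocate(revenue, shares):
    """Split revenue across categories according to percentage shares."""
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    return {name: revenue * pct / 100 for name, pct in shares.items()}


def local_return_by_population(local_return, populations):
    """Distribute the local return in proportion to population."""
    total = sum(populations.values())
    return {city: local_return * pop / total for city, pop in populations.items()}


# Hypothetical annual revenue of $1,000 million (not a figure from the report).
allocation = allocate(1_000.0, MEASURE_M_SHARES)

# Hypothetical populations for two placeholder municipalities.
city_shares = local_return_by_population(
    allocation["Local return"], {"City A": 300_000, "City B": 100_000}
)
```

The check that the shares sum to 100 percent is a guard one might add; it confirms that the categories listed in the text account for all Measure M revenue.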
15 Metro (2022). Annual Report on Fiscal Year 2021 Audits. 16 88 cities and the county of Los Angeles 17 Metro (2017). Measure M Final Guidelines.
Source: Hymon, S. (2016, September 21). "Measure M: Projects and Programs in Central Los Angeles." The Source. Retrieved December 1, 2022, from https://thesource.metro.net/2016/09/15/measure-m-projects-and-programs-in-central-los-angeles/
Figure 14. Map. Measure M projects map.
Project Selection and Delivery
Metro is responsible for the planning, design, operation, and maintenance of the Measure M projects. Metro is governed by a 14-member board of directors and also has an Independent Taxpayer Oversight Committee to oversee Metro's compliance with the ordinance (see figure 15).18 Starting in 2022 and every five years afterward, the Metro board reviews and adopts a Five-Year Comprehensive Program Assessment to evaluate every project of Measure M. Beginning in 2027, Metro assesses the progress of projects or programs ("Expenditure Plan Major Project" and "Multi-Year Subregional Program") every 10 years and adds an expenditure plan.19 The Metro board of directors may make amendments, which require approval by a two-thirds vote once the assessments are reviewed by the Measure M Independent Taxpayer Oversight Committee and the public.19
18 Metro (2022). Annual Report on Fiscal Year 2021 Audits. 19 Los Angeles County, California. Proposed Ordinance #16-01 Measure M Los Angeles County Traffic Improvement Plan (2016).
Source: Metro (2021). Annual Comprehensive Financial Report FY 2021
Figure 15. Chart. Los Angeles County Metropolitan Transportation Authority management organizational chart.
Budgeting
The Metro board of directors must adopt a final annual budget by June 30 every year. The budget should be prepared based on estimates and assumptions about budgeted revenues and expenditures. Metro forecasts a five-year cash flow, including estimated Measure M tax revenues, every year. Metro Planning, the Office of Management and Budget (OMB), and the State Treasurer's Office work together to prepare the revenue forecast in October. Metro Planning forecasts revenues based on the projection for the first year (a base-year estimate) and sales tax growth rates. Planning, OMB, and the Treasury decide each year's sales tax growth rate.
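The base-year-plus-growth-rate approach described above can be sketched as a compounding calculation. This is a minimal illustration, assuming each year's revenue equals the prior year's revenue times one plus that year's growth rate; the base-year value, the growth rates, and the function name are hypothetical.

```python
def project_revenue(base_year_estimate, growth_rates):
    """Compound a base-year estimate forward by annual growth rates.

    Returns the base year followed by one value per growth rate.
    """
    revenues = [base_year_estimate]
    for rate in growth_rates:
        revenues.append(revenues[-1] * (1 + rate))
    return revenues


# Hypothetical inputs: $1,000 million base year and four annual growth
# rates, giving a five-year revenue path (not figures from the report).
forecast = project_revenue(1_000.0, [0.03, 0.03, 0.025, 0.025])

for year, value in enumerate(forecast, start=1):
    print(f"Year {year}: ${value:,.1f} million")
```

Because later years build on earlier ones, an error in the base-year estimate carries through the whole path, which is one reason a forecast of this kind is typically revisited every year.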
Cashflow surpluses can go to the Highway Contingency Subfund and Transit Contingency Subfund or to other purposes consistent with the Ordinance or other long-term plans. Those contingency subfunds are to be used to adjust for inflation for eligible future projects, including more than two-thirds of the current funding programs scheduled after FY 2026, or to pay interest on bonds issued for the projects.20
TRANSNET, SAN DIEGO COUNTY
Sales Tax Levy
TransNet is the 0.5 percent, county-wide sales tax program to fund transportation projects in San Diego County. TransNet was initially approved as a 20-year program in 1987 and planned to operate from 1988 to 2008. In 2004, however, San Diego voters approved the program's extension until 2048. Sales taxes are collected by the California Board of Equalization, which then remits revenues to the San Diego Association of Governments (SANDAG). The tax revenue should be deposited to a special fund and distributed under the San Diego County Regional Transportation Commission Ordinance. A city or county could also receive reimbursement for approved projects if the costs were incurred before the fund was accessible.21 Eighteen cities22 in San Diego County and the San Diego County government benefit from the TransNet program.
Use of Sales Tax
The sales taxes from TransNet are used for the following:
20 Metro (2017). Measure M Administrative Procedures. 21 SANDAG (2022). TransNet Ordinance and Expenditure Plan Rules. 22 Carlsbad, Chula Vista, Coronado, Del Mar, El Cajon, Encinitas, Escondido, Imperial Beach, La Mesa, Lemon Grove, National City, Oceanside, Poway, San Diego, San Marcos, Santee, Solana Beach, and Vista.
1. Congestion Relief Program--major transportation corridor improvements; transit system service improvements and related programs; local system improvements and related programs.
2. Transportation Project Environmental Mitigation.
3. Bicycle, Pedestrian and Neighborhood Safety Program.
4. Administration and Independent Taxpayer Oversight Committee.
Project Selection and Delivery
The SANDAG board of directors approves projects to be funded by TransNet based on the Regional Transportation Plan (RTP) (see figure 16). The projects are selected through extensive assessments using surveys and focus group methods by various decision-makers such as "residents, businesses, environmental and community leaders as well as elected officials from the 18 cities and county government."23 A five-year and biennial program of projects is developed and submitted by local agencies and should be approved by the San Diego County Regional Transportation Commission for funding.
23 TransNet (2008). Frequently Asked Questions.
Source: https://www.sandag.org/
Figure 16. Chart. SANDAG organizational chart.
Budgeting
SANDAG's Plan of Finance (POF) for TransNet estimates future costs and revenues (primarily for the next 5 to 7 years), including sales tax revenues. The forecasts are based on conservative assumptions about population, income growth, and inflation rates reflecting historical trends and economic business cycles.24 The forecast models are revised to reflect important changes in the assumptions over 40 years. The POF is updated annually based on the most recent revenue forecasts.25
24 TransNet (2009). TransNet Extension Program--Plan of Finance.
The annual growth rate for taxable sales is forecasted by the SANDAG Demographic and Economic Forecast Model (DEFM). In 2016, SANDAG found significant data errors that had led to an overestimation of revenue forecasts (on average a -22.6 percent difference between forecasted and actual sales tax collections during 2009 to 2016). Since then, SANDAG has used widely accepted national and local data.
Proposition A for TransNet requires a contingency fund ("Project Reserve Fund") of about 5 percent of the funds to weather sudden changes in revenue collection. This contingency fund can be used for "protection for future facilities, project studies, and environmental studies, assessments, and related work, including a two-phase study of trolley extensions in the South Bay and local matching funds."26 Using the funds for other programs requires approval from the commission.
CENTRAL VIRGINIA TRANSPORTATION AUTHORITY
Sales Tax Levy
CVTA was established by Virginia House Bill 1541 (HB1541) to provide an additional transportation system throughout the nine localities in Planning District 15, i.e., the Town of Ashland, City of Richmond, and the counties of Charles City, Chesterfield, Goochland, Hanover, Henrico, New Kent, and Powhatan (see figure 17).27 HB1541 imposes an additional sales and use tax of 0.7 percent (effective October 2020) as well as a gas tax of 7.6 cents per gallon of gasoline and 7.7 cents per gallon of diesel fuel (effective July 2020) in the localities.28 The collected taxes are deposited monthly to the Central Virginia Transportation Fund in the state treasury and distributed to CVTA. The members of CVTA can be reimbursed for their expenses according to the law when they are approved by CVTA.
25 Sjoberg Evashenk Consulting Inc. (2018). TransNet Extension Ordinance: 10-Year Look-Back. San Diego County Regional Transportation Commission. 26 The San Diego Association of Governments (1987). Proposition A: San Diego Transportation Improvement Program.
Source: https://planrva.org/home/our-localities/
Figure 17. Map. CVTA localities.
27 Central Virginia Transportation Authority: Plan RVA. Plan RVA | Central Virginia Transportation Authority. (2022, May 28). Retrieved November 21, 2022, from https://planrva.org/transportation/cvta/. 28 Virginia General Assembly. HB 1541 Central Virginia Transportation Authority (January 10, 2020).
Use of Sales Tax
CVTA is required to use 35 percent of the Central Virginia Transportation Fund for regional projects. The regional revenue can be used to fund capital reserve, debt services, and other regional projects that follow the criteria of CVTA and are approved by the Authority. CVTA also distributes 15 percent of the fund to the Greater Richmond Transit Company (GRTC) as GRTC's revenue to provide transportation services in the region as described in the Regional Public Transportation Plan.
Every month, CVTA distributes 50 percent of the fund to the member localities for their local transportation needs, based on the share of each locality's tax contribution.29 The local component of the fund is deposited into a separate fund and used according to the CVTA Act and bylaws. GRTC and localities granted the funds are required to submit a quarterly report and an annual report to the Authority.
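The 35/15/50 split described above, with the local half returned in proportion to each locality's tax contribution, can be sketched as follows. This is an illustrative sketch under stated assumptions: the monthly fund total and the contribution amounts are hypothetical, only three of the nine localities are shown, and the function name is introduced here for illustration.

```python
REGIONAL_SHARE = 0.35  # regional projects
GRTC_SHARE = 0.15      # Greater Richmond Transit Company
LOCAL_SHARE = 0.50     # returned to member localities


def distribute_cvta(fund_total, locality_contributions):
    """Split the fund into regional, GRTC, and local components.

    The local pool is divided in proportion to each locality's
    tax contribution.
    """
    total_contrib = sum(locality_contributions.values())
    local_pool = fund_total * LOCAL_SHARE
    return {
        "regional": fund_total * REGIONAL_SHARE,
        "grtc": fund_total * GRTC_SHARE,
        "local": {
            name: local_pool * contrib / total_contrib
            for name, contrib in locality_contributions.items()
        },
    }


# Hypothetical monthly collections in millions of dollars (illustrative only).
contributions = {"Richmond": 4.0, "Henrico": 4.0, "Hanover": 2.0}
result = distribute_cvta(sum(contributions.values()), contributions)
```

Under this rule a locality that generates 20 percent of the tax receives 20 percent of the local pool, or 10 percent of total collections.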
Project Selection and Delivery
The CVTA regional funding can only be allocated to the transportation projects of "Highway, Bike/Pedestrian, Transit, Multimodal, Bridge, Studies, and Preliminary Engineering (PE)" through the selection process on an annual basis. The nine constituent local governments of the planning district of CVTA are eligible to submit applications for regional funding. The maximum number of applications is two times the weighted votes for each locality, which is proportional to its population. The proposed projects are evaluated and ranked based on cost-benefit analysis. CVTA staff and the CVTA Technical Advisory Committee (TAC) select projects and the allocations of 4 to 6 years of their funding. Members and regional transportation partners of CVTA also participate in the process. The projects that are considered valuable to, and needed by, the region--and are less likely to be funded otherwise--are prioritized.30 GRTC delivers buses, paratransit vans, and specialized transit services in the CVTA area, mainly for the City of Richmond and parts of Chesterfield and Henrico counties. Bay Transit also serves some rural parts of the CVTA jurisdictions.
Budgeting
The CVTA has to approve an annual budget by May 15 for the annual transfer to the Special Revenue Fund, as well as general and administrative operations.31 The Virginia Department of Taxation provides the estimated revenue on which annual budgets are based. The frequency or the accuracy of the forecasts is still being assessed. The Finance Committee is required to review the financial status of CVTA quarterly, and the Authority needs to do so annually or whenever needed. CVTA decides the disposition of general fund balances at the end of every year. Although the CVTA Administrative and Operating Expense Budget FY 2023 includes a line item for contingency reserves, no such reserve is found in the Financial Policies and Procedures documents.32
29 Department of Planning and Budget (2020). Fiscal Impact Statement.
30 CVTA (2021). CVTA Regional Project Selection and Allocation Framework. 31 CVTA (2021). Results of the Audit. 32 CVTA (2021). Financial Policies and Procedures.
FASTRACKS PROJECT, DENVER RTD
Sales Tax Levy
FasTracks is a transit expansion program of the RTD applied to the Denver metropolitan area. In 2004, voters approved a 0.4 percent sales tax increase for FasTracks, bringing the total sales tax rate to 1 percent (the base sales and use tax has been 0.6 percent since May 1, 1983); the increase has been effective since January 1, 2005. Also, through voters' approval in 2004, RTD is exempt from Colorado's Taxpayer's Bill of Rights (TABOR) Amendment, which restricts taxes, revenue, and spending for state and local governments.33
Use of Sales Tax
The original 2004 FasTracks program mostly focuses on rapid transit corridors, bus service enhancement, transit facilities, and transit amenities.34 With the Central Platte Valley and the Southeast Corridor project, RTD requires local contributions for the costs of the projects (about 2 percent of total costs). However, the source and form of local contributions can be decided by each locality, e.g., "right-of-way dedications, permit fee waivers, cash contributions, corridor utility relocations, as well as any other direct, project-related corridor contributions."35 A portion of the costs from major projects is to be paid with local contributions of about 2 percent of total project costs (about $95 million) from local governments that directly benefit from the projects.36
Project Selection and Delivery
The regional projects were developed by RTD, which is governed by the 15 elected members of the board of directors. RTD, the Colorado Department of Transportation (CDOT), and local governments cooperate on planning and environmental studies, cost estimates, and public outreach (see figure 18).37 RTD is responsible for managing and providing the FasTracks programs identified. RTD also began to facilitate a public-private partnership (P3) for implementing FasTracks projects such as the Eagle Project. According to a Senate Bill (privatization legislation), up to 58 percent of the RTD vehicular services shall be operated by private contractors through competitive bids.38
33 Regional Transportation District (RTD) (2022). Annual Comprehensive Financial Report FY 2021. 34 RTD (2018). Fastracks Program Overview--Executive Summary. "the key elements included in the 2004 FasTracks Plan, including: 119 miles of light rail and commuter rail: construction of new rapid transit in six corridors and existing rapid transit enhancements and extensions in three corridors; 57 new rail/BRT stations; 18 miles of bus rapid transit (BRT); enhanced bus network and transit hubs (FastConnects); 31 new Park-n-Rides and expansions to nine Park-n-Rides (adding more than 21,000 parking spaces); the renovation of Denver Union Station into a major multimodal center providing access to nearly every rapid transit line and regional buses, local circulators and intercity rail/bus service; transit facilities and amenities to improve safety, convenience and use of the transit system; and opportunities for transit-oriented development (TOD)." 35 RTD (2018). Fastracks Program Overview--Executive Summary.
Source: RTD (2022). Annual Comprehensive Financial Report, Fiscal Year 2021
Figure 18. Chart. RTD organizational chart.
36 RTD (2004). 2004 FasTracks Plan. 37 RTD (2004). 2004 FasTracks Plan. 38 RTD (2022). Annual Comprehensive Financial Report FY 2021.
Budgeting
The General Manager submits a proposed annual budget (operating and capital) to the board of directors by October 15. The board must adopt the proposed budget before the fiscal year commences (i.e., January 1). RTD forecasts sales tax revenues based on the sales tax growth rates estimated by the consulting firm AECOM from 2010 through 2025 using the data from the Center for Business and Economic Forecasting (CBEF).39 RTD estimates short, medium, and long-term sales tax collections. The model contains national and state economic and demographic indicators.40 Short-term predictions are primarily based on monthly data to produce forecasts that are aggregated into quarterly or yearly data. The average error rate is -0.8 percent for the short-term forecast.41 To increase the accuracy of estimates on sales tax revenues, RTD adopts the Annual Program Evaluation (APE) for FasTracks, has its forecasting methodology reviewed by experts, and involves the University of Colorado Leeds School of Business to prepare future forecasts. RTD is required to provide revenue forecasts at least every six years and update them on an annual basis.
For budget reserves, RTD has a board-appropriated fund, a capital replacement fund, and an unrestricted operating reserve. The board-appropriated fund and the unrestricted operating reserve are used to prevent cashflow disruptions due to unanticipated revenue shortfalls or sudden expenditure increases. The capital replacement fund is for capital purchases, including replacing major vehicles. The use of the board-appropriated fund or the capital replacement fund needs the approval of the board, and the replenishment of these funds should occur as promptly as possible. RTD is also under an obligation to maintain an Emergency Reserve (3 percent of non-Federal revenues) following Colorado's Restricted TABOR.42 The reserve can only be used for declared emergencies such as natural disasters or pandemics and not for mere economic downturns and revenue deficits.43 Also, the reserve must be replenished in the year following its use.44
39 RTD (2004). Regional Transportation District FasTracks Financial Plan. 40 RTD (2021). Board Briefing Documents--October 2021. "Forecasts of the national indicators that are needed to drive the state and district forecasts come from Moody's Analytics. Population projections that enter the equations of the long-term model come from Moody's Analytics and the Colorado Demography Office. Colorado economy forecasts are derived from updated specifications of the Colorado Economy Model." 41 Business Research Division (BRD) Leeds School of Business, University of Colorado (2021). RTD Forecast Model Short, Medium, and Long-Term Econometric Forecasts of RTD Sales and Use Tax Revenues: September 2021 Update.
For FasTracks, the sum of the Board Appropriated Fund and unrestricted fund balance needs to equal three months of FasTracks operating expenses. Also, RTD established the FasTracks Internal Savings Account (FISA) to save excess revenues for financing unfinished FasTracks projects. The board determines the source of funding and use of FISA. FasTracks funds may not be used for non-FasTracks expenditures.43
CAPITAL METROPOLITAN TRANSPORTATION AUTHORITY, TEXAS
Sales Tax Levy
CapMetro provides regional public transportation services in Central Texas. CapMetro was created in 1985 by Chapter 451 of the Texas Transportation Code. CapMetro is funded by revenues from a 1 percent sales and use tax in its member jurisdictions: Cities of Austin, Jonestown, Lago Vista, Leander, Manor, Point Venture, San Leanna; counties of Travis (Precinct Two) and Williamson (Anderson Mill area) (see figure 19). The authority has also made interlocal agreements with municipalities outside the member governments to provide services to those regions.45 The additional tax rate was once decreased to 0.75 percent by the CapMetro board of directors. However, the tax rate has returned to 1 percent since October 1, 1995.46 Sales tax revenues of the service areas are recorded, and the receipts are provided monthly by the Texas Comptroller of Public Accounts. Sales taxes comprised 60.7 percent of the total revenues of CapMetro in FY2020.47
42 RTD (2022). Annual Comprehensive Financial Report FY 2021. 43 RTD (2020). Fiscal Policy Statement. 44 What is Tabor? | Jefferson County, CO. (n.d.). Retrieved November 12, 2022, from https://www.jeffco.us/3994/Whatis-TABOR
Source: CapMetro (2021). CapMetro Approved FY2023 Operating and Capital Budget & Five-Year Capital Improvement Plan.
Figure 19. Map. CapMetro service area.
45 Cities of Georgetown, Pflugerville, Round Rock, and Buda, and Travis County. 46 Capital Metropolitan Transportation Authority [CapMetro]. (n.d.). Capital Metro's life story. About Capital Metro News and Info Capital Metro Transit Austin, Texas. Retrieved November 12, 2022, from https://web.archive.org/web/20101227055655/http://capmetro.org/InsideMetro/history.asp 47 CapMetro (2020). Comprehensive Annual Financial Report FY 2020.
Use of Sales Tax
CapMetro provides a wide range of transportation services such as MetroBus, MetroExpress, MetroRapid, MetroRail, Night Owl, E-Bus, Pickup, University of Texas Shuttles, MetroAccess, MetroRideShare, Freight rail, on-demand pickup services, and parallel door-to-door service for people with disabilities.48
Project Selection and Delivery
The Capital Metro board of directors oversees and governs CapMetro (see figure 20). The eight members of the board are responsible for deciding the organization's general policies regarding its operation, oversight, and management. The board of directors and CapMetro's executive management team are jointly responsible for developing the Strategic Plan, which sets service goals and tasks. They are also in charge of the delivery of projects. CapMetro also contracts with private companies delivering transit services such as "fixed-route, rail, and paratransit service."48
48 CapMetro (2020). Comprehensive Annual Financial Report FY 2020.
Source: CapMetro (2020). Comprehensive Annual Financial Report FY 2020.
Figure 20. Chart. CapMetro organizational chart.
Budgeting
State law requires the board of directors to adopt an annual budget before every fiscal year begins (i.e., October 1) and to approve a five-year capital improvement plan. Thus, forecasts for revenues and expenditures for the next five years are also prepared annually. The longer term financial forecasts are prepared at least every five years, or more often if needed. CapMetro estimates revenue and expenditures for 10 years under different economic scenarios, using growth assumptions based on historical growth, inflation, and contractual obligations.49
CapMetro is obliged to maintain "the reserve and budgetary contingency balances" separate from cash balances.50 The balances and changes in the balances are reported monthly to the board of directors and annually for the budget and long-term planning. CapMetro must have "a statutory operating reserve (at least two months of the prior fiscal year's operating expenses), a capital project reserve (no less than $2 million and 10 percent of the average capital expenditure), a budgetary operating contingency account (at least 2 percent of the prior fiscal year's operating expenses), a self-insurance reserve (at least 25 percent of the prior fiscal year's actual claim payments), and a budget stabilization reserve or 'rainy day fund' (one month of annual average operating expenses)."51 According to the FY 2020 Comprehensive Annual Financial Report, the balance of the statutory operating reserve was $39.7 million, the budget stabilization reserve was $21.5 million, and the self-insurance reserve was $1.3 million.52 The withdrawal of the reserves is at the discretion of the board (or the chief executive officer) for financing unexpected temporary budget deficits or emergencies.
49 CapMetro (2021). CapMetro Approved FY2023 Operating and Capital Budget & Five-Year Capital Improvement Plan. 50 Senate Bill 650. 51 CapMetro (2021). CapMetro Approved FY2023 Operating and Capital Budget & Five-Year Capital Improvement Plan. 52 CapMetro (2020). Comprehensive Annual Financial Report FY 2020.
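The CapMetro reserve minimums quoted above can be sketched as a small calculation. This is an illustration only, with two assumptions labeled in the code: the capital project reserve is read as the larger of $2 million and 10 percent of average capital expenditure, and "two months" or "one month" of expenses is read as that fraction of an annual total. All input figures and the function name are hypothetical.

```python
def capmetro_reserve_minimums(prior_opex, avg_capex, prior_claims, avg_annual_opex):
    """Minimum reserve levels implied by the quoted policy (dollars)."""
    return {
        # Two months of the prior fiscal year's operating expenses
        # (assumption: 2/12 of the annual total).
        "statutory_operating": prior_opex * 2 / 12,
        # Assumption: the larger of $2 million and 10 percent of
        # average capital expenditure.
        "capital_project": max(2_000_000, 0.10 * avg_capex),
        # 2 percent of the prior fiscal year's operating expenses.
        "operating_contingency": 0.02 * prior_opex,
        # 25 percent of the prior fiscal year's actual claim payments.
        "self_insurance": 0.25 * prior_claims,
        # One month of annual average operating expenses
        # (assumption: 1/12 of the annual average).
        "budget_stabilization": avg_annual_opex / 12,
    }


# Hypothetical inputs in dollars (not CapMetro's actual figures).
minimums = capmetro_reserve_minimums(
    prior_opex=240_000_000,
    avg_capex=50_000_000,
    prior_claims=4_000_000,
    avg_annual_opex=240_000_000,
)

for name, amount in minimums.items():
    print(f"{name}: ${amount:,.0f}")
```

Comparing such minimums against reported balances is one way a reader could check whether reserves meet the stated policy.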
UTAH TRANSIT AUTHORITY
Sales Tax Levy
UTA was established in 1970 to provide mass transportation services to the Utah area. UTA is governed by a board of trustees consisting of three appointed members. UTA's operating funding comes mostly from a portion of local sales taxes. Sales tax revenue is expected to account for 78.8 percent of the total operating revenues in 2023. The local option sales taxes for UTA are imposed by six counties and some of their cities that benefit from the mass transit services UTA provides (see table 2). Each county's sales tax rate as of 2021 is: Salt Lake, 0.7875 percent; Davis, 0.65 percent; Weber, 0.65 percent; Box Elder, 0.55 percent; Utah, 0.6260 percent; and Tooele, 0.4 percent.53 Sales taxes are accrued and collected monthly by the Utah State Tax Commission. UTA's Comptroller is required to report monthly sales tax receipts to Corporate Staff. UTA receives them about 60 days after the tax collection.
Use of Sales Tax
UTA mainly provides bus, bus rapid transit, paratransit, light rail (TRAX rail), commuter rail, streetcar, and Innovative Mobility Solutions.54 Sales tax revenues are also used for administration, operations support and maintenance, debt servicing, reserves, and capital projects support. The use of some portion of local sales taxes is sometimes restricted. For example, according to Proposition 3, the additional 0.25 percent sales tax in Utah County should be mostly spent on building commuter rail and highway construction. Only 5 percent of the increased fund can go to other transit projects, such as bus rapid transit; in Salt Lake County, at least 25 percent of the additional tax must be spent for preserving routes for future roads.55
53 UTA (2022). Tentative Budget 2023. 54 UTA (2019). Utah Transit Authority Fast Facts
Table 2. UTA contributions from other governments.
Source: FY 2021 Annual Comprehensive Financial Report.
Project Selection and Delivery
The three members of the board represent localities and are appointed by the governor (figure 21). The board of trustees governs and sets general policies and long-term missions for the UTA. The board also provides internal audit and fiscal oversight. The Office of the Executive Director manages daily operations and establishes the annual plans for the organization and partners.56
55 Deseret News. (2006, November 8). Transit measures approved. Deseret News. Retrieved December 1, 2022, from https://www.deseret.com/2006/11/8/19984540/transit-measures-approved.
Source: FY 2021 Annual Comprehensive Financial Report.
Figure 21. Chart. UTA organizational chart.
56 UTA (2022). Tentative Budget 2023.
Budgeting
The executive director and chief financial officer (CFO) prepare an annual budget, including a Five-Year Capital Plan, every year. The board of trustees approves and adopts the annual budgets in December of the prior year. Since 2021 (the 2022 Budget), UTA has had a contract for forecasting sales tax growth with the Economic Development Unit (EDU) at the University of Utah, on which annual budgets and five-year plans are based. The forecasted growth rate was 6.1 percent for 2022 over the sales tax revenue of 2021. The sales tax revenue forecast is prepared using the sales tax data of past years and econometric models. The forecasts are expected to be updated to support the final budget documents in November.57
UTA is required to maintain reserves such as general operating reserves, service stabilization reserves, bond reserves, capital replacement reserves, self-insurance/catastrophic reserves, and debt reduction reserves. The minimum amount of general operating reserves is 12 percent of the budgeted operating expense; service stabilization reserve at 3 percent of the budgeted operating expense; bond reserves at a level demanded by bond covenants; capital replacement reserve at 1 percent of the property, facilities, and equipment cost. The board of trustees could also create other reserves.
General operating reserves can be used upon the decision of the Treasurer. Service stabilization, bond, or capital replacement reserves and debt reduction reserves can only be used upon the approval of the board of trustees. The board also decides on the full reimbursement schedule. Reserves must start to be replenished within 24 months after the first use of reserves.57
57 UTA (2022). Tentative Budget 2023.
CHAPTER 7. CONCLUSIONS
Due to the long-term planning involved in the design and construction of transportation infrastructure projects, transportation planners need reliable long-run forecasts and stable sources of revenue over the project timeline. In the absence of stable revenue streams and reliable revenue forecasts, planners may be reluctant to engage in more costly and complex projects. Conversely, if such projects are undertaken, they may often be more expensive than projects produced with a more stable funding mechanism.
Funding streams typically vary from period to period due to inherent volatility in the funding source or unforeseen events, or both. Additional understanding of the economy and improved forecasting techniques can reduce some of the uncertainty of volatile revenue sources. For instance, improved information and well-chosen forecasting techniques are likely capable of reducing the uncertainty regarding future receipts associated with seasonal effects, changes in income, and population, though they will not eliminate the inherent volatility in a funding source. Smaller taxing jurisdictions rely on narrow tax bases or relatively few taxpayers. Such a situation results in a revenue source that will be more volatile compared to a larger jurisdiction with a more diverse set of industries and larger population.
Furthermore, improved forecasting techniques are not capable of anticipating unexpected events, such as a hurricane or wildfire, but budget strategies exist that can mitigate volatility in revenues arising from unknowable events. These strategies include the use of rainy day funds and revenue diversification, among others. Use of these strategies could serve to reduce uncertainty of future funding streams, and, coupled with improved information regarding the TSPLOST economies and improvements in forecasting practices, could result in a more stable budget and project schedule for transportation planners.
The four TSPLOST regions represent a relatively small share of the state economy, ranging from 1 to 4 percent of the GSP. Despite being smaller economies, they do share certain similarities with the state. Most economic activity in the four regions occurs in the largest regional urban center. In two regions, roughly 74 percent of regional GRP is generated in the two counties that contain the largest cities: Chattahoochee and Muscogee counties in RV (city of Columbus) and Augusta-Richmond and Columbia counties in CS (city of Augusta-Richmond). In SGA and HOG, on the other hand, the two largest counties in terms of GRP account for 50 percent and 36 percent of regional economic activity, respectively.
In all four regions, the largest economic sector in terms of output is services, including business, health, and education. In three regions (CS, RV, and SGA), services grew substantially from 2001 to 2020, nearly doubling the GRP generated by the sector. In contrast, growth in other sectors has been modest in these three regions. In HOG, services have grown; however, manufacturing has grown in its share of GRP as well.
Still, there are important differences between the economies of the state and the four regions. For instance, in the TSPLOST regions, the manufacturing sector is a larger share of GRP on average than for the state as a whole. For manufacturing, HOG and SGA have higher shares of sector employment (at roughly 15 percent) than the state average. HOG and SGA also have a higher share in agriculture than the state average. These differences in regional economies compared to the state have implications for the forecasting of TSPLOST revenue in the regions.
Roughly half the sales tax collected in the four TSPLOST regions comes from three sectors: retail trade, accommodations and food service, and entertainment services. Yet these three sectors account for a much smaller share of regional output, about 10 percent in 2021, suggesting that the other large sectors of the regional economies also play an indirect role through taxable purchases or changes in the levels of worker income.
At the state level, the largest industry sector by revenue is strongly and positively correlated to state sales tax collections. Thus, at the state level, economic activity such as business services moves together in the same direction, and with the same magnitude, as sales taxes. However, these correlations for the four TSPLOST regions, while still positive, are weaker. Only in the largest economic region, CS, are the correlations between large industry sectors and sales tax collections similar to those of the state; for the other three regions, the correlations are weaker. These lower valued correlations suggest that whereas firms in the large sectors of business, education, and healthcare services can be used as a leading indicator of future sales tax receipts, caution should be used because the magnitude of the change is influenced by additional factors.
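The comparison above rests on correlation coefficients between sector output and sales tax collections. As a minimal sketch of that calculation, the following computes a Pearson correlation for two series; all figures are hypothetical and are constructed only so that the "state" series tracks sector output closely while the "regional" series tracks it more loosely.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical annual figures (millions of dollars), not taken from the report.
sector_output = [100, 104, 109, 115, 118, 126]
state_sales_tax = [50, 52, 54.5, 57.5, 59, 63]        # exactly proportional to output
regional_sales_tax = [5.0, 5.4, 5.1, 5.9, 5.6, 6.2]   # positive but noisier relationship

r_state = pearson(sector_output, state_sales_tax)
r_region = pearson(sector_output, regional_sales_tax)
print(f"state: {r_state:.3f}, region: {r_region:.3f}")
```

A positive but lower coefficient, as in the hypothetical regional series, signals that the sector is informative about the direction of collections but a less reliable guide to the size of the change.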
To inform what additional data may be useful in the quarterly forecasts, we examined the correlations of sales tax collections with statewide forecasted economic variables for GSP, Georgia retail sales, and Georgia personal income. Here again, the relationship between the economic variables and sales tax is strong at the state level but weaker for the four TSPLOST regions. This suggests that several approaches to forecasting sales tax should be examined, including models that incorporate the relevant economic data and those that use only historical regional sales tax collections.
Many different forecasting models are used for these types of data. Through statistical testing and analysis, as well as assessing the tractability of the models, we determined that ARIMA-type models were best suited for this task. ARIMA models rely on the premise that actual past sales tax collections are a good predictor of future sales tax collections. The models have internal parameters that can be used to adjust for seasonality and trends in the data. These models can account for annual events that recur at roughly the same time each year (e.g., the Masters golf tournament). Single-variable ARIMA models are frequently used by forecasters and are reliable and tractable for policymakers. These models are best suited for a shorter term forecast. To improve the forecast for longer time periods, we included economic variables forecasted by Moody's Analytics for Georgia retail sales and the consumer price index, a measure of anticipated inflation. We show through a variety of measures, including statistical tests using historical data and graphical analysis, that this family of models is preferred going forward for the quarterly forecasts. Figure 22 is a graphical depiction of the preferred models for each region for the regional sales tax using annual collections data. The figure illustrates the conservative and optimistic specifications as compared to historical data from 2010 to 2022. After 2022, the models' specifications diverge to account for larger standard errors as uncertainty grows with the longer time periods.
[Figure 22: two line-chart panels of annual collections in dollars, 2010 to 2027. One panel plots actual collections and the conservative (Cons) and optimistic (Opt) forecasts for CS and SGA; the other plots the same series for HOG and RV.]
Figure 22. Graph. Annual regional sales tax collection, actual and predicted.
Note the large drop in actual sales tax collections in all regions from 2012 to 2013. This was the result of a change in how cars were treated for sales tax purposes. In 2013, the Georgia Title Ad Valorem Tax (TAVT) was implemented, removing car sales from the sales tax base. Due to this legislative change, regional sales tax collections declined by 20 percent or more from 2012 to 2013. It is not possible for forecasting models to predict these one-off events, but the ARIMA models do react to the change and use the historical data to make future predictions. Figure 23 uses the same sales tax collections data as figure 22 but plots the quarterly collections. Figure 23 illustrates how quarterly volatility in actual sales tax collections contributes to the volatility of the quarterly forecast. Figure 22 shows fairly consistent annual sales tax collections from 2013 to 2018, with a small upward trend. In contrast, the quarterly collections data in figure 23 are much more volatile, and thus the trend is harder to parse. This volatility also shows up in the forecasts for the conservative and optimistic models. The quarterly forecasts appear to be less accurate than the annual ones, but this is merely a reflection of the underlying variance in the actual collections data. Again, this variation is greatly reduced when summed to an annual collection amount. Beginning in 2019, there is a higher growth rate in actual annual collections. The impact of COVID-19 and the accompanying federal funds in 2020 further supported growth in collections. Again, due to these one-off events, the models lag somewhat behind in their predictions. This is most apparent when looking at the quarterly data in figure 23.
[Figure 23: two line-chart panels of quarterly collections in dollars, 2010q1 to 2027q2. One panel plots actual collections and the conservative (Cons) and optimistic (Opt) forecasts for CS and SGA; the other plots the same series for RV and HOG.]
Figure 23. Graphs. Quarterly regional sales tax collection, actual and predicted.
In comparing the conservative model and the optimistic model to actual collections, there is no need to make adjustments, as the standard errors remain the same for the forecasting period of interest. Thus, the two forecasting models track each other fairly closely. After the actual data series ends in 2022, an adjustment to the models' standard errors is needed, as uncertainty grows with the longer time periods that are forecasted. These adjustments to the standard errors create two distinct forecasts. The conservative forecast shows collections slowing and returning to a pattern more similar to that of 2013 to 2018. The optimistic forecast maintains the greater growth that started in 2019 and then slowly tapers off in the out-years after 2025.
Lastly, the best practices in budgeting for transportation sales taxes are reviewed in the literature. Subnational governments' revenue volatility is not a simple function of changes in the national economy. For example, the revenue volatility for states has not always corresponded to economic changes, and the extent of tax volatility is, on average, wider than that of economic volatility. Factors that contribute to this wider volatility include revenue source fluctuations, economic downturns, and forecasting errors. Best practices to manage this volatility are limited to rainy day funds (also known as budget stabilization funds) and revenue diversification.
To better understand how revenue volatility is managed, we used six case studies from a mix of large, mid-sized, and smaller TSPLOST initiatives located across five states: California, Colorado, Virginia, Texas, and Utah. The case studies illustrate the following topics:
• Laws authorizing levy of sales tax for transportation-related infrastructure.
• Methods for regulating sales tax revenue spending.
• Methods for selecting and successfully delivering projects.
• Budgeting practices and procedures for the projects and revenues.
The regional TSPLOST has been adopted by 64 counties in four transportation regions across the state. As of January 2020, approximately $1.1 billion has been raised from this revenue source and almost 700 projects have been completed. Because the tax is scheduled to expire in three of the four regions by the end of 2022, voters will decide whether to extend the tax in the near term. The provision of accurate forecasts will provide transparency and better inform voters' and transportation planners' decisions. It is not possible to quantify the benefits stemming from this research for the purposes of constructing a benefit-cost ratio. On the other hand, the cost of this research project is computed to be less than 0.02 percent of the total revenue raised over the 10-year period of the TSPLOST, from 2013 to 2022.
APPENDIX A: STEPS TO CREATE SEASONAL ARIMA MODEL
To start, we consider multiple forecast methods, test their ability to predict historical revenues, and assess their explainability and complexity. Recent forecasts have settled on the combination of two separately modeled seasonal ARIMA models.
ARIMA is the leading statistical model for forecasting, and it remains one of the fundamental tools in the forecasting community. It can be particularly useful for tax revenue forecasting, where past trends can provide important insights into future revenue collection. By using past revenue data, ARIMA models can identify patterns in how tax revenue tends to fluctuate throughout the year, such as higher collections during certain months or quarters. Nevertheless, there are several ways to specify them. The forecast for each of the regions is a combination of two of these models, each selected to perform better under different contexts and time frames.
The data used for the analysis come from two primary sources: (1) monthly sales tax revenue distribution data from the Georgia DOR and (2) Moody's forecast data, used both to introduce exogenous economic factors into the forecast and to predict future economic conditions. Lastly, we gather county economic data from the Federal BEA to better understand local economic conditions.
The monthly data are first adjusted so the observed amount coincides with the month of the sale that generates it, given that revenues for a specified month represent sales during the previous months. The data are then aggregated to the quarterly level to absorb unexpected movements of resources from one month to another and to match the time interval used for our exogenous economic variables.
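The timing adjustment and quarterly aggregation described above can be sketched as follows. This is an illustration only: the one-month lag between sale and distribution is an assumption made for the example (the report does not state the exact lag here), and the function names and dollar amounts are hypothetical.

```python
from collections import defaultdict


def shift_to_sale_month(year, month):
    """Map a distribution month to the month of sale (assumed one-month lag)."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def quarterly_totals(distributions):
    """Aggregate {(year, month): amount}, keyed by distribution month,
    into {(year, quarter): total}, keyed by the quarter of the sale."""
    totals = defaultdict(float)
    for (year, month), amount in distributions.items():
        sale_year, sale_month = shift_to_sale_month(year, month)
        quarter = (sale_month - 1) // 3 + 1
        totals[(sale_year, quarter)] += amount
    return dict(totals)


# Hypothetical monthly distributions (millions of dollars).
distributions = {(2021, 2): 10.0, (2021, 3): 11.0, (2021, 4): 12.0, (2021, 5): 9.0}
by_quarter = quarterly_totals(distributions)
print(by_quarter)
```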
Quarterly data are preferred because monthly data are likely to have more white noise due to short-term fluctuations caused by factors such as seasonality, weather, etc. Quarterly data are usually more stable and less subject to short-term fluctuations, which can result in less white noise. (Annual data would have even less white noise and fluctuations, but there would be fewer data periods available.) Revenues are received by counties monthly, so forecasts are typically delivered based on the timing with which the revenues will be received and budgeted. As a final step, our quarterly forecasts were converted into monthly forecasts based on the average share of each month within its corresponding quarter.
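The final conversion step can be illustrated with a short sketch. It assumes that each month's share is computed within each historical quarter and then averaged across years; the report does not spell out the exact averaging, and all figures are hypothetical.

```python
def monthly_shares(history):
    """history: list of (month1, month2, month3) amounts for past occurrences
    of the same quarter. Returns the average share of each month."""
    shares = [[m / sum(quarter) for m in quarter] for quarter in history]
    n = len(shares)
    return [sum(s[i] for s in shares) / n for i in range(3)]


def split_quarter(quarter_forecast, shares):
    """Allocate a quarterly forecast to its three months using the shares."""
    return [quarter_forecast * share for share in shares]


# Hypothetical fourth-quarter history for two prior years (millions of dollars).
history_q4 = [(30.0, 30.0, 40.0), (20.0, 30.0, 50.0)]
shares = monthly_shares(history_q4)
monthly_forecast = split_quarter(200.0, shares)
print(shares, monthly_forecast)
```

Because the shares within a quarter sum to one, the monthly forecasts add back up to the quarterly forecast.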
The Portmanteau test is used to determine whether a time series is white noise (wntestq in Stata). When the null hypothesis (time series is white noise) is rejected, it is recommended to use higher orders of autoregressive or moving average terms to capture additional correlation in the data. Adding exogenous variables can also help reduce noise in residuals. Model selection for monthly forecasting considers the following criteria:
1. Number of statistically significant coefficients, for both the autoregressive and moving average components; more significant coefficients are preferred.
2. AIC and BIC, which are used to compare the goodness of fit of the models; lower values are preferred.
3. Log likelihood, which is also used as a measure of how well the model fits the data; higher values are preferred. This value is used to evaluate and compare forecast models.
4. Sigma, the estimated standard deviation of the residuals in the model; lower values are preferred. This value is also used to evaluate and compare forecast models.
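As an illustration of criterion 2, the sketch below applies the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of estimated parameters, n the number of observations, and ln L the log likelihood. The candidate models and their log likelihoods are hypothetical and do not come from the report's estimates.

```python
from math import log


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: lower is better, penalizes parameters more."""
    return n_params * log(n_obs) - 2 * log_likelihood


# Hypothetical candidates fitted to the same 48 quarterly observations.
n_obs = 48
candidates = {
    "ARIMA(1,1,1)": {"loglik": -210.0, "k": 3},
    "ARIMA(2,1,2)": {"loglik": -209.0, "k": 5},
}
scores = {
    name: (aic(c["loglik"], c["k"]), bic(c["loglik"], c["k"], n_obs))
    for name, c in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(scores, best_by_aic, best_by_bic)
```

In this hypothetical case the larger model has a slightly higher log likelihood, but not by enough to offset the penalty for its two extra parameters, so both criteria favor the simpler model.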
The above tests are all used to ascertain which model most accurately captures the underlying data-generating process. The AIC and the BIC are two often-used measures for choosing the optimal model. They provide a way to balance the goodness of fit of a model against its complexity to determine the simplest model that adequately captures the data. Both measures are used to select the model that best fits the data. Additionally, the Portmanteau test is used to check the adequacy of the fitted models. Sigma and log likelihood are also used to compare and evaluate models. These steps provide a thorough process for evaluating the performance of a forecast model to predict historical data and help us choose the ultimate forecast model or models.
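To make the Portmanteau check concrete, the sketch below computes the Ljung-Box form of the Q statistic, Q = n(n + 2) Σ r_k² / (n - k), summed over lags k = 1, ..., h, where r_k is the lag-k sample autocorrelation. Using the Ljung-Box variant is an assumption for illustration, and the input series is artificial. Under the white noise null, Q is compared with a chi-square distribution with h degrees of freedom (the 5 percent critical value for h = 4 is about 9.49).

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic using lags 1 through max_lag."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Artificial residuals that alternate in sign, so they are clearly not white noise.
residuals = [1.0, -1.0] * 10
q_stat = ljung_box_q(residuals, max_lag=4)
print(q_stat)  # far above 9.49, so white noise would be rejected
```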
For our recent TSPLOST forecast, the conservative model uses a standard ARIMA approach. We center the variables before we start our data analysis. This is done by adjusting the variables so that they have a mean of zero. Any persistent trend in the data is eliminated, which makes it easier to identify the underlying time series patterns. The next step is to create a seasonality index, which measures the contribution of each season to the overall pattern of the series. For example, if we notice that the sales tax collections are higher during the holiday season, the seasonality index can account for this pattern. The Augmented Dickey-Fuller (ADF) test is used to determine whether the data are stationary. The test involves estimating an autoregressive model of the data and then determining whether the series has a unit root (i.e., is nonstationary). Next, we use the autocorrelation correlogram and the partial autocorrelation correlogram to determine the number of MA and AR terms, respectively.
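The centering and seasonality index steps can be sketched as below. The report does not specify how its index is constructed, so the ratio-to-overall-mean definition used here is an assumption, and the quarterly figures are hypothetical.

```python
def center(series):
    """Subtract the mean so the series has a mean of zero."""
    mean = sum(series) / len(series)
    return [v - mean for v in series]


def seasonal_index(series, period=4):
    """Ratio of each season's average to the overall average.
    A value of 1.0 marks a typical season; above 1.0 marks a strong one."""
    overall = sum(series) / len(series)
    index = []
    for season in range(period):
        values = series[season::period]  # every observation for this season
        index.append((sum(values) / len(values)) / overall)
    return index


# Hypothetical quarterly collections for two years (millions of dollars),
# with a stronger fourth quarter to mimic a holiday season effect.
quarterly = [90, 100, 100, 110, 90, 100, 100, 110]
centered = center(quarterly)
index = seasonal_index(quarterly)
print(index)
```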
The second model is more optimistic and is intended to introduce exogenous data to the model. ARIMA models are known to perform well in short-run forecasts, but their performance is expected to decrease as more periods are predicted. This is because the model starts to depend more on predicted values than on the original dataset in more temporally distant periods. This is a critical issue for this application because we want to predict more than 10 years of data, at least 40 quarters. The proposed model uses quarterly data, observed and predicted, on retail sales and CPI for Georgia. In particular, we assume a linear model with SARIMA errors, the ARIMA model version for seasonal data.
For this model, we use an automated algorithm, implemented in the fable package for the R language, to define the parameters of the SARIMA component. The package implements the strategy proposed by Hyndman and Khandakar (2008), by which the model parameters are selected in three stages (Hyndman and Athanasopoulos 2018). First, the nonseasonal differencing and the seasonal differencing parameters are defined based on successive KPSS unit-root tests and a Canova-Hansen (CH) test, respectively.
Second, the model is estimated for a series of predefined values for the autoregressive and the moving average parameters. By default, the method picks the best of these models based on a predefined fit criterion, such as AIC. Finally, the algorithm makes predefined changes to the parameters, looking for other possible parameters that can improve the fit. The algorithm uses current computational power and approximations to reduce the time that it takes to run, which has allowed for its popularization in frameworks where a variety of time series are considered. We contrast the results of the automated algorithm with alternative specifications, coming from the analysis of the autocorrelations and partial autocorrelations, and perform traditional tests on the model's residuals (e.g., Portmanteau test). Using the proposed model, we can make the corresponding predictions, with the advantage that the model now captures exogenous variables and is therefore expected to perform better for long-term forecasts.
APPENDIX B: MACHINE LEARNING
During the last two decades, forecasters have seen significant additions to their set of statistical and computational tools. Increasing computational power has allowed modelers to increase the complexity of the models and explore their different specifications. In this section, we explore a series of new algorithms and methods in the nonstructural and univariate time series forecasting literature. Furthermore, we explore how these methods perform in the prediction of revenues from the sales tax in the TSPLOST regions. We divide this section into two parts. First, we provide a short review of these recent additions. In particular, we focus on automated ARIMA algorithms and three ML methods: the LASSO model, KNN, and RF. Second, we compare the performance of these four algorithms with the traditional ARIMA model and discuss the opportunities and challenges to improve our forecasts.
ARIMA, as covered in the literature review, is a leading statistical model for forecasting univariate time series and is widely used by local governments to forecast sales tax revenues. By using past data, ARIMA models can identify patterns in how a given variable tends to fluctuate throughout a given period. In the simplest ARIMA model, where the differencing parameter (d) is set to zero, a given variable can be described as a function of its own lagged values and the model's past errors. That is:
$$y_t = c + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} \tag{1}$$
As before, $p$ represents the number of lagged variables, $q$ represents the number of lagged errors, and the differencing parameter ($d$) describes the transformation of the variable of interest before estimation.
How well the model can forecast depends on how closely a series of assumptions are satisfied. One such assumption is that the variable follows a covariance stationary process. Under covariance stationarity, a variable's changes through time are sufficiently explained by the correlation of the values in the present with those in the past, including the unexpected shocks. If this is not satisfied, one solution is to difference the variable of interest; thus, instead of using the value for $y_t$ in equation 1, we use $y_t^{d=1} = y_t - y_{t-1}$. The parameter $d$ represents the number of times we must difference the variable so it achieves covariance stationarity, evaluated with a series of statistical tests. In its more general form, an ARIMA($p$, $d$, $q$) model is represented as:
$$y_t^{d} - \phi_1 y_{t-1}^{d} - \dots - \phi_p y_{t-p}^{d} = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} \tag{2}$$
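The differencing transformation behind the $d$ parameter can be shown with a small sketch. The series is artificial: a pure linear trend becomes constant after one difference, and forecasts made on the differenced scale are converted back to levels by cumulating them from the last observed value.

```python
def difference(series, d=1):
    """Apply first differencing d times: y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def undifference_once(last_level, forecast_differences):
    """Turn forecasts of first differences back into forecasts of levels."""
    levels = []
    level = last_level
    for change in forecast_differences:
        level += change
        levels.append(level)
    return levels


# Artificial trending series: the level is nonstationary, the difference is constant.
levels = [10, 12, 14, 16, 18]
first_diff = difference(levels, d=1)
second_diff = difference(levels, d=2)
level_forecast = undifference_once(levels[-1], [2, 2])
print(first_diff, second_diff, level_forecast)
```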
There are multiple ways to specify an ARIMA model, each resulting in different predictions. The diversity between ARIMA models comes not only from the specification of the time correlation component (i.e., the $p$, $d$, $q$ parameters) but also from the pre-estimation processing and specification as well as the estimation of the trend and seasonality component. In practice, this means that different specifications come from, for example, using different filters to extract seasonality and trends or allowing further seasonality adjustments in the time correlation component. Furthermore, including exogenous variables adds another level of diversity between models, coming from which variables to include and how to include them. Different specifications come with varying forecast performance and theoretical reliability.
Because there is no observable way to verify the parameters for the ARIMA model, the selection of the time correlation components has been part of the forecasters' process. The Box-Jenkins methodology, which provides a series of steps for using ARIMA models, assumes that the researcher draws candidates for the $p$, $d$, $q$ parameters based on, for example, autocorrelation and partial autocorrelation plots (Gujarati and Porter 2009). The model estimated with the proposed values is then evaluated under a series of statistical tests. If the model fails these tests, the researcher should reevaluate the proposed parameters. This process implies that the estimation of multiple ARIMA models might be highly cumbersome because the researcher would have to expend a significant amount of time on each of them, usually applying subjective criteria that may make the process intractable.
In contrast, current forecasting tasks require automated processes that allow the application of forecasting models to multiple variables in a consistent way. Hyndman and Khandakar (2008) proposed an automated strategy to select the model parameters of a SARIMA model in three stages (Hyndman and Athanasopoulos 2018).58 First, the nonseasonal differencing (d) and the seasonal differencing (D) parameters are defined based on successive Kwiatkowski-Phillips-Schmidt-Shin (KPSS) unit-root tests and a CH test, respectively.
Second, the model is estimated for a series of predefined values for the autoregressive and the moving average parameters (i.e., p, q, P, and Q). By default, the method picks the best of these models based on a predefined fit criterion, such as AIC. Finally, the algorithm makes predefined changes to the parameters, looking for other possible parameters that can improve the fit. The algorithm uses current computational power and approximations to reduce the time that it takes to run, which has allowed for its popularization in frameworks where a variety of time series are considered.
58 SARIMA models correspond to the variation of the ARIMA model that allows for simple seasonality in the data. For an assumed size of the seasonal cycle, the variable can also depend on lagged values of the variable or the model error at the same stage of the previous cycle. Three new parameters are added to represent the number of seasonal lagged values of the variable (P), the number of seasonal lagged errors (Q), and seasonal differencing (D).
Hyndman and Khandakar's (2008) automated algorithms offered an opportunity for a more extensive evaluation of the forecast properties of the ARIMA model. Explicitly setting a fitting criterion allows the model to be consistently defined. But most importantly to our purpose, it allows the comparison with other automated algorithms such as those proposed by the ML literature.
Machine learning has become an essential tool in different disciplines (e.g., economics, finance, marketing), given its performance in estimation, model selection, and forecasting tasks (Medeiros 2022). ML is the use and development of computer algorithms intended to classify or predict data that learn and adapt without following explicit instructions. In the case of forecasting, we are particularly interested in a subfield of ML known as supervised ML. Following Medeiros (2022), this corresponds to the set of models/methods combined with automated computer algorithms that seek to learn hidden patterns of a specified target variable given a set of explanatory variables.
More pragmatically, ML is a combination of statistical models with computer science. Most of these models are not new; however, it is the recent ability to run more complex and larger versions of these models repeatedly and rapidly that has made these techniques so popular during the last decade. These computational abilities are at the heart of the learning component of these algorithms. Most of these statistical models on their own are highly susceptible to overfitting, so their individual forecasting properties are usually poor. To prevent this, one of the model's parameters, usually known as the tuning parameter, is selected by contrasting the forecast's performance in a subsample of the data. That is, the data are divided between the training and evaluation subsamples. With the training data, we estimate many models, each under a different tuning parameter value. These models are used to predict the target variable for the observations in the evaluation subsample, which lets us evaluate the performance of each tuning value. This process is at the heart of the cross-validation strategy, which is the root of the learning component of ML. We allow the model to "learn" from the data the best value for the tuning parameter. Furthermore, we can repeat or refine this process as we find convenient.
ML has been highly recognized for its prediction abilities in multiple contexts. Nevertheless, its use for forecasting is still a matter of discussion. As we stated in the literature review in chapter 2, ML algorithms' performance in time series tasks and forecasting is mixed. Although it is difficult to pinpoint a unique reason for this, a significant issue is that these algorithms were not designed for time series data. Most of the algorithms were intended for cross-sectional data, where there is neither change in time nor dependence between observations. The dependence structure of time series, where the present depends on the past but not on the future, is what distinguishes the time series field in statistics. In contrast, most of the ML literature depends on the assumption of independent observations.
To apply ML to time series, we must modify the preprocessing and analysis accordingly. As with the ARIMA model, before applying any algorithm it is important that the data are already covariance stationary so we have some level of independence once we condition on the history of the variable. Manani (2022) proposes three key elements to consider and adjust.
First, in supervised ML we seek to predict an outcome/dependent variable ($y$) using the information in a set of independent variables/feature variables ($X = [x_1, \dots, x_k]$). In contrast, in univariate time series, we have information only from the variable we wish to forecast ($y_t$). However, we can see from equation 1 that, in this case, the lagged values could be considered the feature variables. Manani (2022) argues that in addition to the lagged values, we can consider functions of the lagged variables. For example, for forecasting the price of a given commodity for a given month, we could include the past values of the price, but we might find it useful to add the price variation during the last year. This manipulation of the database is known as feature engineering, and its purpose is to create and add to the model features that better capture the patterns in the data.
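The feature engineering idea can be sketched by turning a single series into a table of lagged values plus one derived feature. The function name, the number of lags, and the specific derived feature (the change across the lag window) are illustrative assumptions, not the specification used in the report.

```python
def make_features(series, n_lags=4):
    """Build (features, targets) from one series.
    Each feature row holds y_{t-1}, ..., y_{t-n_lags} plus the change
    over the lag window; the target is y_t."""
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        lags = [series[t - k] for k in range(1, n_lags + 1)]
        change_over_window = series[t - 1] - series[t - n_lags]  # engineered feature
        rows.append(lags + [change_over_window])
        targets.append(series[t])
    return rows, targets


# Artificial quarterly series; with four lags the window spans one year.
series = [1, 2, 3, 4, 5, 6]
features, targets = make_features(series, n_lags=4)
print(features, targets)
```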
Second, the sampling for the cross-validation (CV) of the models needs to be adjusted for time series. For cross-sectional data, the training and evaluation data are drawn as a random sample from the original database. In time series data, however, a random selection no longer represents our task because it ignores the time structure of the data. In this case, we want a set of CV samples that respect the sequential nature of the data. Figure 24 represents the CV structure proposed by Manani (2022). Given the task of predicting $h$ periods (i.e., the size of the forecasting period), we divide our observed data (i.e., our estimation periods) into training and evaluation periods. The evaluation window should include at least $h$ periods. Each cross-validation sample corresponds to a training data set and an evaluation data set of $h$ or fewer consecutive periods. Some CV samples have the full $h$ periods of evaluation, and the rest have fewer than $h$ periods.
Figure 24. Chart. Cross-validation in time series.
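One way to generate CV samples that respect time order is sketched below. It is an expanding-window reading of the structure described above and is not confirmed to match Manani's (2022) exact scheme: each sample trains on all periods before a forecast origin and evaluates on up to `horizon` periods after it, so the last samples have shortened evaluation sets. The function name and argument names are hypothetical.

```python
def time_series_cv_splits(n_obs, n_eval, horizon):
    """Return a list of (train_indices, eval_indices) pairs.
    The last n_eval periods form the evaluation window; each period in it
    serves once as a forecast origin."""
    splits = []
    first_origin = n_obs - n_eval
    for origin in range(first_origin, n_obs):
        train = list(range(0, origin))                              # strictly before origin
        evaluation = list(range(origin, min(origin + horizon, n_obs)))
        splits.append((train, evaluation))
    return splits


# Twelve observed periods, an evaluation window of five, a three-period horizon.
splits = time_series_cv_splits(n_obs=12, n_eval=5, horizon=3)
for train, evaluation in splits:
    print(len(train), evaluation)
```

With these inputs, three samples have a full three-period evaluation set and the last two have fewer, and no training set ever contains a period later than its evaluation set.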
Finally, when the forecasting task implies to predict for more than one period, we should define if we will be forecasting directly or recursively. Any forecast is done with data of a set of periods (1,,), used to predict the following periods. denotes the size of the forecasting window and ( + ..., + ) corresponds to the set of forecasted periods. Direct forecasting means that using the same data, we will build and train as many models as the size of the forecasting window. Therefore, each model predicts one of the periods in the set of forecasted periods.
In contrast, recursive forecasting implies that the researcher will train only one model and use it to predict for every forecasted period. In this case, using the original data (1,.., ) we predict the first of the forecasted periods ( + 1). After each prediction, we add the forecast to our data and use it to predict the next period. So, for example, using the original data and the first forecasted period, i.e., ( 1,...,, + 1), we predict the value for the second forecasted period ( + 2). We update our data and make new predictions as many times as the size of the forecasting window. This distinction is not usually necessary in classic statistical methods, such as ARIMA, because these are by construction recursive.
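The difference between the two strategies can be sketched with a deliberately simple model: a one-lag linear regression fitted by least squares, standing in for whatever ML model is used. The recursive version fits one model and feeds its own forecasts back in; the direct version fits a separate model for each step ahead. The series is artificial and follows an exact one-lag linear rule, so the two strategies coincide here; on noisy data they generally would not.

```python
def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return mean_y - slope * mean_x, slope


def recursive_forecast(series, horizon):
    """One model (y_t on y_{t-1}); each forecast becomes the next input."""
    intercept, slope = fit_line(series[:-1], series[1:])
    last, forecasts = series[-1], []
    for _ in range(horizon):
        last = intercept + slope * last
        forecasts.append(last)
    return forecasts


def direct_forecast(series, horizon):
    """One model per step: y_{t+step} on y_t, all using observed data only."""
    forecasts = []
    for step in range(1, horizon + 1):
        intercept, slope = fit_line(series[:-step], series[step:])
        forecasts.append(intercept + slope * series[-1])
    return forecasts


# Artificial series generated by y_t = 10 + 0.5 * y_{t-1}.
series = [100.0, 60.0, 40.0, 30.0, 25.0, 22.5, 21.25]
rec = recursive_forecast(series, horizon=3)
direct = direct_forecast(series, horizon=3)
print(rec, direct)
```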
Each of the adjustments that we need to make to the process represents a possible caveat to extrapolating the success of ML algorithms in cross-sectional data to time series. There are, however, others we might want to consider. The amount of data, for example, can be problematic in some applications. As a result of the adaptations required in the CV stage, discussed above, these algorithms do not have as much data to work with as in traditional cross-sectional applications. Furthermore, increasing the number of subsamples (by increasing the size of the evaluation period in figure 24) implies relying increasingly on past relations to predict the future. These relations might no longer hold in the forecasting periods, even though most forecasting models and methods traditionally assume that they do.
There are multiple models in the ML literature. In this report we discuss three: LASSO, KNN, and RF. All three were discussed alongside the time series literature in chapter 2. In the next section, we apply them to the TSPLOST case. For now, we introduce each in turn.
The first model is LASSO, one of the many shrinkage estimators known in the literature. The idea behind this type of estimator is to shrink the size of the parameters in a traditional linear regression model. The estimators differ in the type of penalty used to shrink the parameters, described by an expression known as the regularizer. These techniques have been found useful for cases that have many potential predictors, most of which are expected to be irrelevant. This is at the heart of the challenge of estimation with Big Data, which is why LASSO and the other shrinkage methods have been so popular in the ML literature (Chan and Mátyás 2022). The tuning parameter in this model measures the weight given to the regularizer in the estimation procedure and is selected by cross-validation.
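For reference, the LASSO estimator is commonly written as the penalized least-squares problem below. The notation is introduced here for illustration and is not taken from the report: y_t is the outcome, x_t the vector of p predictors, N the number of training observations, and lambda (nonnegative) the tuning parameter. The 1/(2N) scaling follows the convention of the glmnet documentation; other texts scale the first term differently.

```latex
\hat{\beta}^{\text{LASSO}}
  = \arg\min_{\beta}
    \left\{ \frac{1}{2N} \sum_{t=1}^{N} \left( y_t - x_t^{\top} \beta \right)^{2}
            + \lambda \sum_{j=1}^{p} \left| \beta_j \right| \right\}
```

Setting lambda to zero recovers ordinary least squares. Replacing the absolute values in the regularizer with squares gives ridge regression, another member of the shrinkage family.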
KNN is also one of the most popular methods in the ML literature (James et al. 2021). In contrast to LASSO, it is a nonparametric approach, in which we do not assume any particular form for the expected value of our forecast. It is also known to be highly flexible; for example, it can be used for both continuous and discrete variables. In the time series context, and for a given number of neighbors (k), the prediction of the outcome variable for an observation in the evaluation period is the average of the outcomes of the k periods in the training period whose features are closest to those of that observation. As we allow the algorithm to consider more neighbors when calculating this average, our estimate becomes smoother (as it depends less on each training observation), but at the same time it becomes less precise because the forecast depends more on observations that are farther away. Therefore, the number of neighbors (k) is the tuning parameter, to be selected by cross-validation.
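A minimal Python sketch of the KNN prediction rule follows. It assumes Euclidean distance between feature vectors, which the report does not specify, and the function name and toy data are hypothetical.

```python
import math


def knn_predict(train_features, train_targets, query, k):
    """Average the targets of the k training rows closest to `query`.

    Assumption: closeness is measured with Euclidean distance.
    """
    if not 1 <= k <= len(train_features):
        raise ValueError("k must be between 1 and the number of training rows")
    distances = [
        (math.dist(row, query), target)
        for row, target in zip(train_features, train_targets)
    ]
    distances.sort(key=lambda pair: pair[0])  # nearest rows first
    nearest = distances[:k]
    return sum(target for _, target in nearest) / k


# Toy data: one feature per training period and the outcome observed in it.
features = [[1.0], [2.0], [3.0], [10.0]]
targets = [10.0, 20.0, 30.0, 100.0]
for k in (1, 2, 4):
    print(k, knn_predict(features, targets, [2.2], k))
```

With k = 1 the prediction copies the single closest period. With k = 4 it is the average of all training outcomes, including the distant fourth period, which illustrates the trade-off between smoothness and precision described above.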
Finally, random forest is a tree-based algorithm. The best way to understand an RF is as an aggregation of other models, known as classification and regression trees (CART). Chan et al. (2022) argue that most people think of a CART as similar to a decision tree, but it is sometimes more useful to think of it as a subdivision of the feature space. The forecast for an observation in the evaluation sample is the average of the training observations in the subdivision where the observation falls. More subdivisions mean more but smaller groups, which increases precision but also increases variance. Tree-based methods can be considered nonlinear, nonparametric methods, making them highly flexible (Chan et al. 2022). Nevertheless, individual trees are known to be highly unstable, so a group of algorithms aggregates, usually by averaging, the results of multiple trees. This results in a forecast with less variance. RF is a particular form of aggregation in which multiple trees are estimated, each on a random sample of the training data and of the features/independent variables. In this case, the tuning parameters are usually a measure of the size of the tree (i.e., the number of subdivisions associated with each tree) and the number of trees to be aggregated.
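The aggregation idea can be illustrated with a deliberately small Python sketch. It is not a full random forest: each "tree" is a single-split regression tree (a stump) on one feature, and only the bootstrap resampling of the training data is shown, so the random sampling of features that an RF also performs is omitted. All names and the toy data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Find the one split of x that minimizes squared error (a depth-1 tree)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest x cannot be a split
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x is identical: fall back to the overall mean
        mean = sum(ys) / len(ys)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    """Return the average outcome of the subdivision where x falls."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagged_stumps(xs, ys, n_trees, seed=0):
    """Fit each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Aggregate by averaging the predictions of all stumps."""
    return sum(predict_stump(stump, x) for stump in forest) / len(forest)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10, 10, 10, 10, 50, 50, 50, 50]
forest = fit_bagged_stumps(xs, ys, n_trees=25)
print(fit_stump(xs, ys), predict_forest(forest, 1), predict_forest(forest, 8))
```

In this sketch the two tuning parameters mentioned above correspond to the depth of each tree (fixed at one split here) and `n_trees`.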
The superiority of any alternative method, whether automated ARIMA or ML models, is still an empirical question. In the next section, we propose a typical sales tax revenue forecasting scenario for TSPLOST regions to evaluate the performance of these models.
TESTING FOR TSPLOST REVENUES FORECASTS
To test the performance of the automated ARIMA and the three proposed ML algorithms, we use quarterly sales tax revenue data from 1998 to 2022. All models are tested on forecasting eight periods, from the first quarter of 2021 to the fourth quarter of 2022. In particular, we measure how well the forecast of each of the four methods fits the observed data by calculating the root mean squared error (RMSE). Finally, we include in our models explanatory variables that have been used previously in some of our revenue forecasting: state-level information on CPI, personal income, and retail sales.
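As a reminder of how the accuracy measure is computed, the RMSE is the square root of the average squared difference between observed and forecasted values. A minimal Python sketch follows; the function name and the toy numbers are illustrative only.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between observed and forecasted values."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy example: errors of 3 and 4 give sqrt((9 + 16) / 2).
print(rmse([0.0, 0.0], [3.0, 4.0]))
```

Because the errors are squared before averaging, a few large misses weigh more heavily than many small ones. The RMSE is expressed in the same units as the forecasted series.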
For the automated ARIMA, we use the ARIMA function of the fable package in R (O'Hara-Wild et al. 2023). To make it comparable with the ML techniques, we abstain from major preprocessing. Instead, we allow the algorithm to choose all the ARIMA parameters, including the order of differencing and the seasonal parameters, so that the AIC is minimized. It is important to highlight that we are not using CV for the ARIMA models. Recent literature has suggested using CV to select the parameters in ARIMA specifications; in this case, however, selection is done through the automated algorithm discussed above. Because we include the price index, personal income, and retail sales variables, the model is estimated as a regression with ARIMA errors.
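A regression with ARIMA errors is usually written in the general form below. The notation is introduced here for illustration and is not taken from the report: x_{1,t}, x_{2,t}, and x_{3,t} stand for the three explanatory variables, B is the backshift operator, phi(B) and theta(B) are the autoregressive and moving-average polynomials, d is the order of differencing, and epsilon_t is white noise. Seasonal terms are omitted for brevity.

```latex
y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \eta_t,
\qquad
\phi(B)\,(1 - B)^{d}\,\eta_t = \theta(B)\,\varepsilon_t
```

The regression captures the relation with the explanatory variables, and the ARIMA structure on the error term captures the remaining serial correlation.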
In the case of the ML algorithms, we difference the series before applying each of the methods. In addition to eight lags of the forecasted variable, we add as features its average and standard deviation over the last year (last four quarters) and the last two years (last eight quarters), as well as the three explanatory variables discussed above. We forecast recursively: after each quarterly prediction, we update the features and use the predicted value to estimate the next value. We follow the CV strategy suggested by Manani (2022) and sketched in figure 24. In particular, we set 16 periods as the evaluation period and test on samples of eight or fewer periods in each CV sample.
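The feature construction can be sketched in Python as follows. This is an illustration under assumptions, not the report's code: the function names are hypothetical, the sample standard deviation is assumed (the report does not say whether the sample or population version was used), and the three explanatory variables are left out to keep the example short.

```python
import statistics

N_LAGS = 8  # eight lags of the forecasted variable, as described in the text


def difference(series):
    """First difference: the change from one quarter to the next."""
    return [b - a for a, b in zip(series, series[1:])]


def build_features(diffed):
    """Build one feature row per period from the eight preceding values."""
    rows, targets = [], []
    for t in range(N_LAGS, len(diffed)):
        window = diffed[t - N_LAGS:t]      # the eight values before period t
        lags = window[::-1]                # lag 1 first, lag 8 last
        last_year = window[-4:]            # four most recent quarters
        rows.append(lags + [
            statistics.mean(last_year),
            statistics.stdev(last_year),   # assumption: sample standard deviation
            statistics.mean(window),
            statistics.stdev(window),
        ])
        targets.append(diffed[t])
    return rows, targets


series = [t * t for t in range(12)]        # toy series, not revenue data
rows, targets = build_features(difference(series))
print(len(rows), len(rows[0]), targets)
```

In a recursive forecast, each predicted value would be appended to the differenced series and the feature row rebuilt before predicting the next quarter.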
In the case of the LASSO model, we use the glmnet function of the glmnet package in R (Friedman et al. 2010), and we select the regularization parameter through CV. For the KNN algorithm, we use the knnreg function of the caret package in R (Kuhn 2008), and we select the number of neighbors (k) by CV. Finally, in the case of the RF model, we estimate with the ranger function of the ranger package in R (Wright and Ziegler 2017), and we select the combination of number of trees and minimum node size of each tree through CV.
The main results are shown in figure 25 and table 3. Figure 25 presents the data for all TSPLOST regions from the first quarter of 2019 to the last quarter of 2022. Starting from the first quarter of 2021, the figure also presents the predictions of each of the proposed models (automated ARIMA, LASSO, KNN, and RF). To quantify the fit, we calculate the RMSE, which measures how far the forecasted values are from the observed values over the forecasting period (2021q1 to 2022q4).
As expected, no single algorithm outperformed the others in every region. For example, the ARIMA outperformed at least one of the ML algorithms in three of the four regions. This is consistent with the literature review in chapter 2, where under some circumstances the traditional models outperformed the ML algorithms. Nevertheless, ML can improve the forecast. For example, RF provides the best forecast in three of the four regions, and KNN has the second-lowest average RMSE of the four algorithms. In conclusion, diversifying the portfolio of methodologies and models we consider leaves room to improve our forecasts. Although no single model outperforms in every case, a broader set of options gives us more opportunities to find a model that forecasts the data well.
Figure 25. Graphs. Predictions of the proposed models by region.
Table 3. RMSE of the proposed models by TSPLOST region.
Model                  CS         HGA          RV         SGA
Auto. ARIMA     2,399,338   1,096,428   3,263,004   6,220,827
LASSO           5,778,236   3,170,091   3,299,587   4,180,616
KNN             3,909,701     485,782   1,754,060   4,240,413
RF              2,378,478   1,110,175   1,223,129   1,660,177
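The comparisons drawn from table 3 can be reproduced with a short Python sketch. The RMSE values are copied from the table; the variable names are ours and purely illustrative.

```python
# RMSE values copied from table 3 (rows: models, columns: TSPLOST regions).
rmse_table = {
    "Auto. ARIMA": {"CS": 2399338, "HGA": 1096428, "RV": 3263004, "SGA": 6220827},
    "LASSO":       {"CS": 5778236, "HGA": 3170091, "RV": 3299587, "SGA": 4180616},
    "KNN":         {"CS": 3909701, "HGA": 485782,  "RV": 1754060, "SGA": 4240413},
    "RF":          {"CS": 2378478, "HGA": 1110175, "RV": 1223129, "SGA": 1660177},
}
regions = ["CS", "HGA", "RV", "SGA"]

# Model with the lowest RMSE in each region.
best_by_region = {
    region: min(rmse_table, key=lambda model: rmse_table[model][region])
    for region in regions
}

# Average RMSE of each model across the four regions, lowest first.
average_rmse = {
    model: sum(values.values()) / len(values)
    for model, values in rmse_table.items()
}
ranking = sorted(average_rmse, key=average_rmse.get)

# Regions where the automated ARIMA beats at least one ML algorithm.
arima_beats_some = [
    region for region in regions
    if any(
        rmse_table["Auto. ARIMA"][region] < rmse_table[model][region]
        for model in rmse_table if model != "Auto. ARIMA"
    )
]

print(best_by_region)
print(ranking)
print(arima_beats_some)
```

The output confirms the statements in the text: RF has the lowest RMSE in CS, RV, and SGA, KNN has the lowest in HGA, the average RMSE ranks RF first and KNN second, and the automated ARIMA beats at least one ML algorithm in CS, HGA, and RV.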
REFERENCES
Afonso, W.B. (2013). "Diversification Toward Stability? The Effect of Local Sales Taxes on Own Source Revenue." Journal of Budgeting, Accounting & Financial Management, 25(4), 649-674. https://www.emerald.com/insight/content/doi/10.1108/JPBAFM-25-04-2013-B004/full/html.
Armstrong, J.S. (2001). Principles of Forecasting: A Handbook for Researchers and Practitioners. Springer.
Box, G.E., Jenkins, G.M., and Reinsel, G. (1970). Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.
Central Virginia Transportation Authority. (year) Annual Report FY 2021-22. Available online: https://planrva.org/wp-content/uploads/Annual_Report_DIGITAL-USE-1-1.pdf
Chan, F., Harris, M.N., Singh, R.B., and Yeo, W. (Ben) E. (2022). "Nonlinear Econometric Models with Machine Learning." In Econometrics with Machine Learning (pp. 41-78). Springer.
Chan, F. and Mátyás, L. (2022). "Linear Econometric Models with Machine Learning." In Econometrics with Machine Learning (pp. 1-39). Springer.
De Gooijer, J.G. and Hyndman, R.J. (2006). "25 Years of Time Series Forecasting." International Journal of Forecasting, 22(3), 443-473. https://doi.org/10.1016/j.ijforecast.2006.01.001.
Friedman, J., Tibshirani, R., and Hastie, T. (2010). "Regularization Paths for Generalized Linear Models via Coordinate Descent." Journal of Statistical Software, 33(1), 1-22. https://doi.org/10.18637/jss.v033.i01.
Gardner, Jr., E.S. (1985). "Exponential Smoothing: The State of the Art." Journal of Forecasting, 4(1), 1-28. https://doi.org/10.1002/for.3980040103.
Gardner, Jr., E.S. (2006). "Exponential Smoothing: The State of the Art--Part II." International Journal of Forecasting, 22(4), 637-666. https://doi.org/10.1016/j.ijforecast.2006.03.005.
Gujarati, D. and Porter, D. (2009). Basic Econometrics (5th ed.). McGraw-Hill/Irwin.
Hu, M.J.C. and Root, H.E. (1964). "An Adaptive Data Processing System for Weather Forecasting." Journal of Applied Meteorology and Climatology, 3(5), 513-523. https://doi.org/10.1175/1520-0450(1964)003<0513:AADPSF>2.0.CO;2.
Hyndman, R. and Athanasopoulos, G. (2023). Forecasting: Principles and Practice (3rd ed). https://otexts.com/fpp3/.
Hyndman, R.J. and Athanasopoulos, G. (2018). Forecasting: Principles and Practice. OTexts.
Hyndman, R.J. and Khandakar, Y. (2008). "Automatic Time Series Forecasting: The Forecast Package for R." Journal of Statistical Software, 27(3), 1-22. https://doi.org/10.18637/JSS.V027.I03.
Hyndman, R.J., Koehler, A.B., Snyder, R.D., and Grose, S. (2002). "A State Space Framework for Automatic Forecasting Using Exponential Smoothing Methods." International Journal of Forecasting, 18(3), 439-454. https://doi.org/10.1016/S0169-2070(01)00110-8.
James, G., Witten, D., Hastie, T., and Tibshirani, R. (2021). An Introduction to Statistical Learning with Applications in R (2nd ed.). Springer. http://www.springer.com/series/417.
Kuhn, M. (2008). "Building Predictive Models in R Using the caret Package." Journal of Statistical Software, 28(5), 1-26. https://doi.org/10.18637/jss.v028.i05.
Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., et al. (1982). "The Accuracy of Extrapolation (Time Series) Methods: Results of a Forecasting Competition." Journal of Forecasting, 1(2), 111-153. https://doi.org/10.1002/for.3980010202.
Makridakis, S., Chatfield, C., Hibon, M., Lawrence, M., Mills, T., Ord, K., and Simmons, L.F. (1993). "The M2-Competition: A Real-time Judgmentally Based Forecasting Study." International Journal of Forecasting, 9(1), 5-22. https://doi.org/10.1016/0169-2070(93)90044-N.
Makridakis, S., Spiliotis, E., and Assimakopoulos, V. (2018). "Statistical and Machine Learning Forecasting Methods: Concerns and Ways Forward." PLoS ONE, 13(3). https://doi.org/10.1371/journal.pone.0194889.
Makridakis, S. and Hibon, M. (2000). "The M3-Competition: Results, Conclusions and Implications." International Journal of Forecasting, 16(4), 451-476. https://doi.org/10.1016/S0169-2070(00)00057-1.
Manani, K. (2022, July 10). Feature Engineering for Time Series Forecasting. PyData. https://www.youtube.com/watch?v=9QtL7m3YS9I&t=454s.
Medeiros, M. (2022). "Forecasting with Machine Learning Methods." In F. Chan and L. Mátyás (Eds.), Econometrics with Machine Learning (Vol. 53).
O'Hara-Wild, M., Hyndman, R., and Wang, E. (2023). fable: Forecasting Models for Tidy Time Series. https://CRAN.R-project.org/package=fable.
Regional Transportation District, Annual Comprehensive Financial Report FY 2021. Available online: https://www.rtd-denver.com/sites/default/files/files/2022-06/2021_ACFR.pdf.
Williams, L.V. and Reade, J.J. (2016). "Forecasting Elections." Journal of Forecasting, 35(4), 308-328. https://doi.org/10.1002/for.2377.
Wright, M.N. and Ziegler, A. (2017). "ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R." Journal of Statistical Software, 77(1), 1-17. https://doi.org/10.18637/jss.v077.i01.
REVENUE FORECASTING AS PART OF GOVERNANCE
Kavanagh, S.C. and Williams, D.W. (2016). Informed Decision-making Through Forecasting: A Practitioner's Guide to Government Revenue Analysis. Government Finance Officers Association.
Williams, D.W. and Calabrese, T.D. (2016). "The Status of Budget Forecasting." Journal of Public and Nonprofit Affairs, 2(2), 127-160.
REVENUE FORECASTING AS PART OF CAPITAL PLANNING
Crabbe, A.E., Hiatt, R., Poliwka, S.D., and Wachs, M. (2005). "Local Transportation Sales Taxes: California's Experiment in Transportation Finance." Public Budgeting & Finance, 25(3), 91-121.
BUDGETING
Barrett, N., Fowles, J., Jones, P., and Reitano, V. (2019). "Forecast Bias and Fiscal Slack Accumulation in School Districts." American Review of Public Administration, 49(5), 601-613. doi:10.1177/0275074018804671.
Beckett-Camarata, J. (2006). "Revenue Forecasting Accuracy in Ohio Local Governments." Journal of Public Budgeting, Accounting & Financial Management, 18(1), 77-99. doi:10.1108/JPBAFM-18-01-2006-B004.
Boyd, D.J. and Dadayan, L. (2014). State Tax Revenue Forecasting Accuracy: Technical Report. Rockefeller Institute of Government, University of New York, Albany.
Bretschneider, S., Straussman, J.J., and Mullins, D. (1988). "Do revenue forecasts influence budget setting? A small group experiment." Policy Sciences, 21(4), 305-325.
Bretschneider, S.I. and Gorr, W.L. (1987). "State and Local Government Revenue Forecasting." In The Handbook of Forecasting, 118-134.
Bretschneider, S.I., Gorr, W.L., Grizzle, G., and Klay, E. (1989). "Political and Organizational Influences on the Accuracy of Forecasting State Government Revenues." International Journal of Forecasting, 5(3), 307-319. doi:10.1016/0169-2070(89)90035-6.
Bretschneider, S.I. and Schroeder, L. (1985). "Revenue Forecasting, Budget Setting and Risk." Socio-Economic Planning Sciences, 19(6), 431-439. https://doi.org/10.1016/0038-0121(85)90017-5.
Chung, I.H., Williams, D.W., and Do, M.R. (2022). "For Better or Worse? Revenue Forecasting with Machine Learning Approaches." Public Performance & Management Review, 1-22. doi:10.1080/15309576.2022.2073551.
Cirincione, C., Gurrieri, G.A., and Van De Sande, B. (1999). "Municipal Government Revenue Forecasting: Issues of Method and Data." Public Budgeting & Finance, 19(1), 26-46. doi:10.1046/j.0275-1100.1999.01155.x.
Frank, H.A. and Zhao, Y. (2009). "Determinants of Local Government Revenue Forecasting Practice: Empirical Evidence from Florida." Journal of Public Budgeting, Accounting & Financial Management, 21(1), 17-35. doi:10.1108/JPBAFM-21-01-2009-B002.
Franklin, E., Bourdeaux, C., and Hathaway, A. (2019). "State Revenue Forecasting Practices: Accuracy, Transparency, and Political Participation." In The Palgrave Handbook of Government Budget Forecasting (pp. 155-175): Springer.
Fullerton, T.M. (1989). "A Composite Approach to Forecasting State Government Revenues: Case Study of the Idaho Sales Tax." International Journal of Forecasting, 5(3), 373-380. doi:10.1016/0169-2070(89)90040-X.
Gianakis, G.A. and Frank, H.A. (1993). "Implementing Time Series Forecasting Models: Considerations for Local Governments." State & Local Government Review, 25(2), 130-144. Retrieved from https://search.ebscohost.com/login.aspx?direct=true&AuthType=ip,shib&db=edsjsr&AN=edsjsr.4355064&site=eds-live&scope=site&custid=gsu1.
Grizzle, G.A. and Klay, W.E. (1994). "Forecasting State Sales Tax Revenues: Comparing the Accuracy of Different Methods." State & Local Government Review, 26(3), 142-152. Retrieved from https://search.ebscohost.com/login.aspx?direct=true&AuthType=ip,shib&db=edsjsr&AN=edsjsr.4355099&site=eds-live&scope=site&custid=gsu1.
Kavanagh, S.C. and Williams, D.W. (2014). "Making the Best Use of Judgmental Forecasting." (cover story). Government Finance Review, 30(6), 8-16. Retrieved from https://search.ebscohost.com/login.aspx?direct=true&AuthType=ip,shib&db=bth&AN=100297349&site=eds-live&scope=site&custid=gsu1.
Kavanagh, S.C. and Williams, D.W. (2017). "The City of Boulder and Recreational Marijuana: Forecasting Under Extreme Uncertainty." National Civic Review, 106(2), 10-17.
Kong, D. (2007). "Local Government Revenue Forecasting: The California County Experience." Journal of Public Budgeting, Accounting & Financial Management, 19(2), 178-199. doi:10.1108/JPBAFM-19-02-2007-B003.
McNichol, E. (2014). "Improving State Revenue Forecasting: Best Practices for a More Trusted and Reliable Revenue Estimate." Center on Budget and Policy Priorities-CBPP, set.
Mikesell, J.L. (2018). "Often Wrong, Never Uncertain: Lessons from 40 Years of State Revenue Forecasting." Public Administration Review, 78(5), 795-802. doi:10.1111/puar.12954.
Mikesell, J.L. and Ross, J.M. (2014). "State Revenue Forecasts and Political Acceptance: The Value of Consensus Forecasting in the Budget Process." Public Administration Review, 74(2), 188-203. doi:10.1111/puar.12166.
Pew Charitable Trusts. (2015). Managing Volatile Tax Collections in State Revenue Forecasts. Retrieved from
Pew Charitable Trusts & Nelson A. Rockefeller Institute of Government. (2011). States' Revenue Estimating: Cracks in the Crystal Ball. Retrieved from Philadelphia, PA, and Albany, NY.
Propheter, G. (2019). "Excessive Revenue Underforecasting: Evidence and Implications from New York City's Property Tax." In The Palgrave Handbook of Government Budget Forecasting (pp. 217-240): Springer.
Reitano, V. (2018). "An Open Systems Model of Local Government Forecasting." American Review of Public Administration, 48(5), 476-489. doi:10.1177/0275074017692876.
Rodgers, R. and Joyce, P. (1996). "The Effect of Underforecasting on the Accuracy of Revenue Forecasts by State Governments." Public Administration Review, 56(1), 48-56. doi:10.2307/3110053.
Rose, S. and Smith, D.L. (2012). "Budget Slack, Institutions, and Transparency." Public Administration Review, 72(2), 187-195.
Rubin, M., Mantell, N., and Pagano, M.A. (1999). "Approaches to Revenue Forecasting by State and Local Governments." Proceedings. Annual Conference on Taxation and Minutes of the Annual Meeting of the National Tax Association, 92, 205-221. Retrieved from http://www.jstor.org/stable/41954655.
Shkurti, W.J. and Winefordner, D. (1989). "The Politics of State Revenue Forecasting in Ohio, 1984-1987: A Case Study and Research Implications." International Journal of Forecasting, 5(3), 361-371. doi:10.1016/0169-2070(89)90039-3.
Sun, J. (2005). "The Dynamics of Government Revenue Forecasting from an Organizational Perspective: A Review of the Literature." Journal of Public Budgeting, Accounting & Financial Management, 17(4), 527-556. doi:10.1108/JPBAFM-17-04-2005-B006.
Williams, D.W. (2012). "The Politics of Forecast Bias: Forecaster Effect and Other Effects in New York City Revenue Forecasting." Public Budgeting & Finance, 32(4), 1-18. doi:10.1111/j.1540-5850.2012.01021.x.
Williams, D.W. and Calabrese, T. (2016). "The Status of Budget Forecasting." Journal of Public and Nonprofit Affairs, 2(2), 127-160. doi:10.20899/jpna.2.2.127-160.
Williams, D.W. and Calabrese, T. (2019). "Current Midyear Municipal Budget Forecast Accuracy." In The Palgrave Handbook of Government Budget Forecasting (pp. 257-272): Springer.
Williams, D.W. and Kavanagh, S.C. (2016). "Local Government Revenue Forecasting Methods: Competition and Comparison." Journal of Public Budgeting, Accounting & Financial Management, 28(4), 488-526.
Willoughby, K.G. and Guo, H. (2008). "The State of the Art: Revenue Forecasting in U.S. State Governments." In Government Budget Forecasting Theory and Practice (pp. 28-42): CRC Press: Taylor & Francis Publishers.