/* This program estimates the Deterministic Trend / Deterministic Seasonal
   model for the Plano Sales Tax Revenue data set. Proc Autoreg is used
   because the errors of the regression are autocorrelated, and statistical
   inference on any of the coefficients cannot be conducted without proper
   adjustment for this autocorrelation. At the same time, the program tests
   for the presence of seasonality while adjusting for autocorrelation in
   the errors. We also find that a quadratic term is not needed to model the
   trend in the data. Note that this is an outdated model, because the trend
   in the data is more likely to be stochastic than deterministic. */

data Plano;
   input month $ 1-3 yr 4-5 rev;
   title 'Plano Sales Tax Revenue Data';
   title2 'By Month';
   datalines;
Feb90 2068592
Mar90 867387
Apr90 791878
May90 1731316
Jun90 911839
Jul90 909258
Aug90 1826999
Sep90 964868
Oct90 1020941
Nov90 1881435
Dec90 1075607
Jan91 964977
Feb91 2699324
Mar91 884494
Apr91 1035007
May91 1930143
Jun91 1124814
Jul91 1098136
Aug91 1812798
Sep91 1095294
Oct91 1163039
Nov91 1920424
Dec91 1000743
Jan92 1075763
Feb92 2341127
Mar92 1062449
Apr92 1120898
May92 1939866
Jun92 1316907
Jul92 1284888
Aug92 2098891
Sep92 1375423
Oct92 1201251
Nov92 2165295
Dec92 1301110
Jan93 1251165
Feb93 2986796
Mar93 1271028
Apr93 1228055
May93 2349629
Jun93 1385267
Jul93 1537452
Aug93 2576586
Sep93 1642938
Oct93 1577049
Nov93 2765401
Dec93 1940847
Jan94 1640531
Feb94 3271545
Mar94 1383909
Apr94 1495825
May94 2772734
Jun94 1592051
Jul94 1560732
Aug94 2773904
Sep94 1523255
Oct94 2013622
Nov94 2957306
Dec94 1789103
Jan95 1848972
Feb95 3507801
Mar95 1821378
Apr95 1930585
May95 2823010
Jun95 1970356
Jul95 1970534
Aug95 2982305
Sep95 1795240
Oct95 2145180
Nov95 3021075
Dec95 1908781
Jan96 1957956
Feb96 3955970
Mar96 2119970
Apr96 2208176
May96 3063504
Jun96 2190613
Jul96 2197082
Aug96 3085586
Sep96 2642591
Oct96 2550586
Nov96 3230872
Dec96 2482466
Jan97 2315274
Feb97 4388396
Mar97 2335249
Apr97 1956240
May97 3183566
Jun97 2421722
Jul97 1879301
Aug97 3094563
Sep97 2599894
Oct97 2320012
Nov97 3518486
Dec97 2407487
Jan98 2291118
Feb98 4813948
Mar98 2380134
Apr98 2223477
May98 3378416
Jun98 2876314
Jul98 2650942
Aug98 3788448
Sep98 2651506
Oct98 2450710
Nov98 4118992
Dec98 2434040
Jan99 2763878
Feb99 5227962
Mar99 2762093
Apr99 2528931
May99 4040412
Jun99 2883152
Jul99 3100274
Aug99 4149743
Sep99 3061236
Oct99 2805394
Nov99 3962285
Dec99 3197688
Jan00 3149649
Feb00 5401137
Mar00 3393528
Apr00 2852524
May00 4708691
Jun00 3567883
Jul00 3405732
Aug00 4885709
Sep00 4142396
Oct00 3564755
Nov00 4794159
Dec00 3459785
Jan01 3600702
Feb01 5789400
Mar01 3283596
Apr01 3411052
May01 4783941
Jun01 3706871
Jul01 3756080
Aug01 4318154
Sep01 3201376
Oct01 3502712
Nov01 4864603
Dec01 3108517
Jan02 3357796
Feb02 5904823
Mar02 2951480
Apr02 3185525
May02 4729624
Jun02 3282329
Jul02 3271971
Aug02 4559047
Sep02 3350292
Oct02 3286394
Nov02 4566940
Dec02 2863028
Jan03 3049842
Feb03 5780438
Mar03 3286533
Apr03 3016081
May03 4533575
Jun03 3296881
Jul03 3535071
Aug03 5290070
Sep03 3323063
Oct03 3318144
Nov03 5206490
Dec03 3240679
Jan04 3673046
Feb04 6166054
Mar04 3573983
Apr04 2999256
May04 5177550
Jun04 3845943
Jul04 3492933
Aug04 4975878
Sep04 3531498
Oct04 3611446
Nov04 5145814
Dec04 3260597
Jan05 3715755
Feb05 6239931
Mar05 3730730
Apr05 3431157
May05 5404423
Jun05 4049371
Jul05 3648390
Aug05 5394527
Sep05 3968853
Oct05 3970771
Nov05 5384216
;

DATA plano;
   SET plano;
   t = _n_;     /* linear time trend: 1, 2, 3, ... */
   t2 = t*t;    /* quadratic time trend */
   d1 = (month='Jan');
   d2 = (month='Feb');
   d3 = (month='Mar');
   d4 = (month='Apr');
   d5 = (month='May');
   d6 = (month='Jun');
   d7 = (month='Jul');
   d8 = (month='Aug');
   d9 = (month='Sep');
   d10 = (month='Oct');
   d11 = (month='Nov');
   d12 = (month='Dec');
run;

/* Here we estimate the "Relative to January" deterministic time trend model
   while assuming that the errors are not autocorrelated, that is, we are
   ASSUMING that there is no cyclical movement in the data.
   The F-test for seasonality is reported, but it should not be used for
   statistical inference if the errors are autocorrelated. */

proc reg data = plano;
   model rev = t t2 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 / dwprob;
   test d2, d3, d4, d5, d6, d7, d8, d9, d10, d11, d12;
run;

/* The previous OLS regression indicates that there is substantial
   autocorrelation in the errors of the model. The Durbin-Watson statistic
   applied to the OLS residuals is low, and the corresponding p-value
   (Pr < DW) is very small, indicating the presence of positive
   autocorrelation in the OLS residuals. Therefore, we leave "Proc Reg" and
   go to "Proc Autoreg" to do our work. Since we don't know the extent of the
   autocorrelation in the data, we let Proc Autoreg choose the AR(p) model
   for the errors that works best. The order of the AR(p) model is chosen by
   a backward elimination search (backstep) in which a lag is retained only
   if it is significant at the 0.05 level (slstay=0.05). We start from
   p = 12 (nlag = 12) because the seasonal dummies might not pick up all of
   the seasonality in the data, leaving some dependence on the same month
   12 months earlier. */

proc autoreg data = plano;
   model rev = t t2 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 /
         nlag = 12 DW=4 DWPROB method=ml backstep slstay=0.05;
run;

/* Given the above run of Proc Autoreg, it appears that, after adjusting for
   autocorrelation, the residuals of the estimated model are now white
   noise. Looking at the t2 variable in this context, we can see that its
   coefficient is not statistically significant at conventional levels;
   therefore we conclude that the quadratic t2 term is not needed. So let's
   reexamine the model without the quadratic (i.e. curvature) term.
*/

proc autoreg data = plano;
   model rev = t d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 /
         nlag = 12 DW=4 DWPROB method=ml backstep slstay=0.05;
run;

/* Given the results immediately above, it appears that the
   AR(1,3,5,8,10,12) model is appropriate to explain the cyclical behavior
   in the Plano Sales Tax Revenue data, apart from the linear trend and the
   seasonal effects, some of which are highly significant. */

/* Now, in the code below, let us once again look at the F-test for
   seasonality, but this time adjust for the autocorrelation in the
   residuals of the model. */

proc autoreg data = plano outest = coeff;
   model rev = t d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 /
         nlag = 12 method=ml backstep slstay=0.05;
   test d2, d3, d4, d5, d6, d7, d8, d9, d10, d11, d12;
run;

/* Note that the p-value of the F-statistic for testing H0: no seasonality
   versus H1: seasonality is present is less than 0.05 (F = 36.02 with
   p < 0.0001). Thus, we conclude that the data have seasonal variation,
   which therefore needs to be modeled through the inclusion of seasonal
   dummy variables. */

/* In this next section of the program we create "standardized" versions of
   the seasonal coefficients so that we can compare them with each other in
   sign and magnitude. If a standardized coefficient is positive, it
   represents a "strong" month, and the larger the coefficient, the stronger
   the month's seasonal effect. In contrast, if a standardized coefficient
   is negative, it represents a "weak" month, and the more negative the
   coefficient, the weaker the month's seasonal effect. */

/* Remember that in the "Relative to January" parametrization, the intercept
   is the January intercept, while the February coefficient is the INCREMENT
   to January's intercept. Therefore, the February intercept is equal to the
   SUM of the intercept estimate and the February coefficient.
*/

data coeff;
   set coeff;
   seasonsum = (12*intercept + d2 + d3 + d4 + d5 + d6 + d7 + d8 + d9 + d10 + d11 + d12);
   seasonave = seasonsum/12;
   d1a = (intercept - seasonave)/seasonave;
   d2a = (intercept + d2 - seasonave)/seasonave;
   d3a = (intercept + d3 - seasonave)/seasonave;
   d4a = (intercept + d4 - seasonave)/seasonave;
   d5a = (intercept + d5 - seasonave)/seasonave;
   d6a = (intercept + d6 - seasonave)/seasonave;
   d7a = (intercept + d7 - seasonave)/seasonave;
   d8a = (intercept + d8 - seasonave)/seasonave;
   d9a = (intercept + d9 - seasonave)/seasonave;
   d10a = (intercept + d10 - seasonave)/seasonave;
   d11a = (intercept + d11 - seasonave)/seasonave;
   d12a = (intercept + d12 - seasonave)/seasonave;
   sum = d1a + d2a + d3a + d4a + d5a + d6a + d7a + d8a + d9a + d10a + d11a + d12a;
run;

title1 'Standardized seasonal effects by month. They sum to zero.';
title2 'Strong months are positive and weak months are negative.';
title3 'Their magnitudes can be compared.';

proc print data = coeff;
   var sum d1a d2a d3a d4a d5a d6a d7a d8a d9a d10a d11a d12a;
run;
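
/* A minimal sketch, not part of the original analysis: the step below
   reshapes the twelve standardized seasonal effects into one row per month
   and sorts them, so that the strongest and weakest months can be read off
   directly. The data set name seasonal_long and the variable names m,
   monthname and effect are hypothetical and chosen only for illustration.
   The sketch assumes that the coeff data set created above holds a single
   row of parameter estimates; if it holds more rows, each row is expanded
   separately. */
data seasonal_long;
   set coeff;
   length monthname $ 3;
   /* standardized effects computed in the previous data step */
   array eff{12} d1a d2a d3a d4a d5a d6a d7a d8a d9a d10a d11a d12a;
   /* month labels in the same order as the effects */
   array mname{12} $ 3 _temporary_
         ('Jan' 'Feb' 'Mar' 'Apr' 'May' 'Jun' 'Jul' 'Aug' 'Sep' 'Oct' 'Nov' 'Dec');
   do m = 1 to 12;
      monthname = mname{m};
      effect = eff{m};
      output;
   end;
   keep m monthname effect;
run;

proc sort data = seasonal_long;
   by descending effect;
run;

title1 'Standardized seasonal effects, strongest month first';

proc print data = seasonal_long noobs;
   var monthname effect;
run;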