/* The following data were obtained from the website for the Stock and Watson textbook
   Introduction to Econometrics (Addison-Wesley, 2007, second edition). They are used for the
   Chapter 1 presentation of the empirical question "Does Reducing Class Size Improve
   Elementary School Education?" The data cover 420 California school districts in 1998,
   so this is a cross-section data set.

   We first analyze the data using a one-way analysis of variance (i.e., a test of the
   difference of means that assumes the two populations are normally distributed with equal
   variances). We then use multiple regression analysis, first with OLS and then adjusting
   for heteroskedasticity by using White's heteroskedasticity-consistent standard errors for
   the OLS coefficient estimates.

   Variables:
     obs          = observation number
     score        = district average test score (fifth grade)
     st_ratio     = average student-to-teacher ratio in the fifth-grade classes in the district
     expend_pupil = expenditure per fifth-grade pupil in the district
     english      = average percentage of fifth-grade students learning English in the district
*/ Data Combined; input obs score st_ratio expend_pupil english; datalines; 1 690.8 17.89 6385 0.0 2 661.2 21.52 5099 4.6 3 643.6 18.70 5502 30.0 4 647.7 17.36 7102 0.0 5 640.8 18.67 5236 13.9 6 605.6 21.41 5580 12.4 7 606.8 19.50 5253 68.7 8 609.0 20.89 4566 47.0 9 612.5 19.95 5356 30.1 10 612.7 20.81 5036 40.3 11 615.8 21.24 4548 52.9 12 616.3 21.00 5447 54.6 13 616.3 20.60 6567 42.7 14 616.3 20.01 4819 20.5 15 616.5 18.03 5621 80.1 16 617.3 20.25 6026 49.4 17 618.1 16.98 6723 85.5 18 618.3 16.51 5590 58.9 19 619.8 22.70 5065 77.0 20 620.3 19.91 5434 49.8 21 620.5 18.33 5726 40.7 22 621.4 22.62 4542 16.2 23 621.8 19.45 5107 45.1 24 622.1 25.05 4660 39.1 25 622.6 20.68 4555 76.7 26 623.1 18.68 5415 40.5 27 623.2 22.85 4998 73.7 28 623.5 19.27 5224 70.0 29 623.6 19.25 5139 56.0 30 624.2 20.55 4614 11.1 31 624.6 20.61 5342 80.4 32 625.0 21.07 5347 63.1 33 625.3 21.54 5036 65.1 34 625.8 19.90 5117 53.4 35 626.1 21.19 5117 49.8 36 626.8 21.87 5272 35.5 37 626.9 18.33 5226 56.1 38 627.1 16.23 6517 32.4 39 627.3 19.18 4559 65.5 40 627.3 20.28 5119 53.1 41 628.3 22.99 5338 49.6 42 628.4 20.44 5090 45.1 43 628.6 19.82 5485 30.3 44 628.7 23.21 4793 52.2 45 628.8 19.27 5093 36.8 46 629.8 23.30 4360 30.3 47 630.3 21.19 5645 49.9 48 630.4 20.87 4518 13.8 49 630.5 19.02 5864 28.9 50 630.5 21.92 5258 52.8 51 631.1 20.10 5017 44.1 52 631.4 21.48 4720 35.3 53 631.8 20.07 5471 37.5 54 631.9 20.38 5615 50.4 55 632.0 22.45 5245 31.1 56 632.0 22.90 4838 18.3 57 632.2 20.50 5368 34.7 58 632.3 20.00 5526 33.3 59 632.4 22.26 4353 33.5 60 632.8 21.56 5034 38.2 61 633.0 19.48 4692 36.9 62 633.0 17.67 5607 33.0 63 633.2 21.95 4969 58.2 64 633.7 21.78 4676 17.0 65 633.9 19.14 5306 17.7 66 634.0 18.11 5694 7.3 67 634.1 20.68 5182 31.2 68 634.1 22.62 5229 16.6 69 634.1 21.79 5339 58.1 70 634.2 18.58 6056 55.9 71 634.2 21.55 4846 5.5 72 634.4 21.15 4827 14.4 73 634.5 16.63 5367 22.8 74 634.7 21.14 5743 38.8 75 634.9 19.78 4136 64.2 76 634.9 18.98 5268 25.2 77 635.0 17.67 5238 5.4 78 635.2 
17.75 5463 6.1 79 635.5 15.27 6313 0.0 80 635.6 14.00 6653 0.0 81 635.6 20.60 5533 34.9 82 635.8 16.31 6119 13.1 83 636.0 21.13 5099 36.0 84 636.1 17.49 5653 34.7 85 636.5 17.89 5329 28.7 86 636.6 19.31 5930 1.8 87 636.7 20.89 4897 30.6 88 636.9 21.29 5100 59.0 89 637.0 20.20 4826 13.6 90 637.0 24.95 4079 5.0 91 637.1 18.13 5349 1.0 92 637.3 20.00 5869 0.7 93 637.7 18.73 6462 38.5 94 637.9 18.25 6232 0.0 95 638.0 18.99 4994 17.2 96 638.0 19.89 4664 9.9 97 638.2 19.38 6107 19.4 98 638.3 20.46 5324 36.5 99 638.3 22.29 5221 39.4 100 638.3 20.70 5158 28.7 101 638.5 19.06 5131 13.7 102 638.7 20.23 5279 11.3 103 639.3 19.69 4704 3.4 104 639.3 20.36 6090 15.4 105 639.3 19.75 5357 18.0 106 639.5 19.38 5145 22.7 107 639.8 22.92 4906 18.4 108 639.8 19.37 5490 16.2 109 639.8 19.16 5719 2.0 110 639.9 21.30 4436 9.6 111 640.1 18.30 4895 41.5 112 640.2 21.08 5159 9.9 113 640.5 18.79 5491 16.1 114 640.8 19.63 5172 43.5 115 640.9 19.59 4442 8.8 116 641.1 20.87 5220 39.0 117 641.4 21.12 5253 53.9 118 641.4 20.08 5519 41.1 119 641.5 19.91 5609 1.4 120 641.8 17.81 4945 35.4 121 642.2 18.13 5223 8.6 122 642.2 19.22 4757 15.3 123 642.4 18.66 5522 19.9 124 642.8 19.60 6088 3.1 125 643.0 19.28 4961 9.9 126 643.2 22.82 4880 16.1 127 643.3 18.81 6538 43.5 128 643.4 21.37 4926 45.0 129 643.4 20.02 5205 18.2 130 643.5 21.50 4907 15.5 131 643.5 15.43 5923 0.9 132 643.7 22.40 4742 7.6 133 643.7 20.13 4954 29.1 134 644.2 19.04 5500 0.1 135 644.2 17.34 5361 50.9 136 644.4 17.02 5686 11.5 137 644.5 20.80 5594 13.5 138 644.5 21.15 4693 4.7 139 644.5 18.46 5085 21.4 140 644.5 19.14 5456 12.4 141 644.7 19.41 5105 30.1 142 645.0 19.57 5520 15.9 143 645.1 21.50 5066 43.8 144 645.3 17.53 5182 0.0 145 645.5 16.43 5960 0.0 146 645.6 19.80 5125 16.7 147 645.6 17.19 5655 3.2 148 645.8 17.62 5812 0.0 149 645.8 20.13 5468 48.5 150 646.0 22.17 5213 0.8 151 646.2 19.96 4613 1.9 152 646.3 19.04 4887 13.5 153 646.4 15.22 6455 0.0 154 646.5 21.14 5125 16.7 155 646.5 19.64 5194 5.9 156 646.7 21.05 5003 36.2 157 
646.9 20.18 5338 45.0 158 646.9 21.39 5161 16.7 159 647.0 20.01 5159 15.2 160 647.3 20.29 5123 34.3 161 647.3 17.67 5783 0.0 162 647.6 18.22 4843 13.1 163 647.6 20.27 4219 4.3 164 648.0 20.20 5081 39.6 165 648.2 21.38 5145 21.7 166 648.3 20.97 4674 3.3 167 648.3 20.00 4830 7.9 168 648.7 17.15 5622 39.6 169 648.9 22.35 4949 10.1 170 649.2 22.17 5101 27.5 171 649.3 18.18 5133 14.0 172 649.5 18.96 5359 8.8 173 649.7 19.75 5149 6.4 174 649.8 16.43 5373 2.4 175 650.4 16.63 6485 6.0 176 650.5 16.38 5504 8.2 177 650.6 20.07 5106 15.5 178 650.7 18.00 5635 0.0 179 650.9 19.39 4980 0.0 180 650.9 16.43 6114 0.0 181 651.2 16.73 5850 31.8 182 651.2 24.41 4548 12.8 183 651.3 18.26 5012 0.0 184 651.4 18.96 5261 3.8 185 651.5 21.04 4276 2.5 186 651.8 20.74 4566 10.7 187 651.8 18.10 6049 0.0 188 651.9 19.85 4974 1.9 189 652.0 21.60 4432 4.6 190 652.1 22.44 4925 5.3 191 652.1 23.01 4604 27.5 192 652.3 17.75 5974 15.0 193 652.3 18.29 5216 47.9 194 652.3 19.27 4882 0.0 195 652.4 22.67 4146 1.8 196 652.4 19.29 7542 0.0 197 652.5 17.36 5247 0.0 198 652.8 19.82 5170 7.3 199 653.1 20.43 5951 17.0 200 653.4 21.04 4747 10.4 201 653.5 19.92 4944 2.3 202 653.5 19.01 6306 28.2 203 653.6 23.82 4260 8.8 204 653.7 19.37 4718 7.5 205 653.8 19.83 4751 9.8 206 653.8 15.26 5653 12.5 207 653.9 17.16 5920 0.0 208 654.1 21.81 4826 31.2 209 654.2 19.07 5533 1.4 210 654.2 25.79 3926 9.6 211 654.3 18.21 5806 0.0 212 654.6 18.17 4899 24.3 213 654.8 16.97 4885 9.6 214 654.8 21.50 5140 5.9 215 654.9 20.60 5249 1.0 216 655.0 16.99 5399 18.3 217 655.1 20.78 4965 5.0 218 655.1 15.51 6210 4.3 219 655.2 19.89 4798 3.4 220 655.3 21.40 5397 32.7 221 655.3 20.50 5079 9.5 222 655.3 19.36 4734 6.3 223 655.4 17.66 4963 13.3 224 655.5 21.02 5431 3.2 225 655.7 19.06 5382 17.0 226 655.8 22.54 5724 0.9 227 655.8 21.11 4843 6.6 228 656.4 20.05 5730 0.1 229 656.5 14.20 5636 0.0 230 656.6 18.48 5672 6.9 231 656.7 18.64 7071 9.6 232 656.7 20.95 5097 0.9 233 656.8 21.09 5483 1.2 234 656.8 18.69 5643 40.1 235 657.0 20.87 5280 
16.7 236 657.0 19.83 5433 22.0 237 657.2 19.75 4529 8.9 238 657.4 19.50 6040 0.9 239 657.5 18.39 6168 0.0 240 657.6 18.79 5775 3.9 241 657.7 19.77 4807 11.9 242 657.8 19.33 4841 4.0 243 657.8 21.46 4954 22.9 244 657.9 23.08 4024 2.5 245 658.0 21.06 5179 9.9 246 658.3 18.69 4385 0.5 247 658.6 20.77 4778 9.0 248 658.8 19.31 4863 6.5 249 659.1 20.13 5486 20.7 250 659.2 20.67 5486 0.2 251 659.3 22.28 4631 14.0 252 659.4 20.60 5190 8.7 253 659.4 20.83 4601 20.0 254 659.8 19.22 6736 1.4 255 659.9 17.65 5475 2.1 256 660.1 17.00 5991 28.6 257 660.1 16.50 7203 0.2 258 660.2 19.78 4778 13.7 259 660.3 22.30 4303 0.0 260 660.8 17.73 5630 0.2 261 660.9 20.45 4709 3.1 262 661.3 20.37 5630 47.4 263 661.5 20.16 4891 22.7 264 661.6 21.62 4929 29.1 265 661.6 20.56 4905 4.4 266 661.8 19.96 5157 4.8 267 661.8 21.18 4942 5.2 268 661.8 18.81 5409 1.9 269 661.9 20.58 4981 8.7 270 661.9 18.32 4877 10.5 271 662.0 18.82 5112 1.2 272 662.4 20.82 5202 0.0 273 662.4 20.00 4138 9.4 274 662.5 19.68 5744 1.4 275 662.5 19.39 5703 35.3 276 662.6 20.93 4802 9.6 277 662.6 19.94 5442 2.2 278 662.7 20.79 4522 0.4 279 662.7 19.20 4966 0.6 280 662.8 19.02 5358 0.0 281 662.9 17.62 5785 0.9 282 663.3 20.24 5073 5.5 283 663.4 19.29 5556 0.0 284 663.5 18.83 5190 0.0 285 663.8 20.34 5211 2.4 286 663.8 19.23 5284 20.8 287 663.9 17.89 5332 2.2 288 664.0 19.52 5600 13.6 289 664.0 19.08 4882 13.3 290 664.2 19.94 4700 1.9 291 664.2 18.87 5650 23.1 292 664.3 20.14 5318 0.6 293 664.4 23.56 4709 3.3 294 664.4 21.46 5143 0.0 295 664.7 19.19 5133 2.2 296 664.8 20.13 5081 18.6 297 664.9 25.80 4016 6.2 298 665.0 18.78 5374 18.5 299 665.1 19.11 5428 32.1 300 665.2 19.70 5180 4.0 301 665.3 18.62 5400 15.3 302 665.7 21.00 5726 27.7 303 665.9 20.00 5004 12.5 304 666.0 20.98 4970 12.7 305 666.0 21.64 4747 6.9 306 666.1 20.03 4543 15.0 307 666.1 19.81 5230 11.3 308 666.2 18.00 4507 9.7 309 666.2 19.36 5303 3.3 310 666.4 20.18 5546 0.7 311 666.6 21.12 4616 5.1 312 666.6 23.39 5398 5.0 313 666.7 22.18 5118 7.0 314 666.7 19.94 
5008 8.0 315 666.7 17.79 4906 0.5 316 666.8 14.71 6870 2.5 317 666.8 19.04 5302 3.8 318 667.2 20.89 5182 1.3 319 667.2 19.84 4999 4.9 320 667.5 19.52 6732 0.0 321 667.5 20.69 4613 0.2 322 667.6 18.18 5479 0.0 323 668.0 18.89 5667 17.9 324 668.1 24.89 4743 4.5 325 668.4 18.58 4907 9.5 326 668.6 18.04 5700 4.0 327 668.7 17.73 5192 20.7 328 668.8 21.45 4451 7.6 329 668.9 19.92 5628 27.5 330 669.0 20.34 4923 10.9 331 669.1 22.55 4969 5.0 332 669.3 21.10 4830 14.2 333 669.3 18.20 6717 0.2 334 669.3 20.11 5206 2.1 335 669.3 19.16 4358 8.6 336 669.8 19.55 5196 0.0 337 669.8 20.89 6002 4.8 338 670.0 18.39 5091 4.0 339 670.0 19.18 5112 1.6 340 670.7 19.40 5287 3.6 341 671.3 21.68 4889 8.7 342 671.3 19.29 5118 0.0 343 671.6 20.35 5119 0.3 344 671.6 20.96 4792 1.4 345 671.7 19.46 5450 0.1 346 671.7 19.29 5211 0.0 347 671.8 20.92 4715 3.7 348 671.9 20.90 4632 0.3 349 671.9 20.60 5029 0.0 350 671.9 19.38 5695 4.5 351 672.0 19.95 4504 2.4 352 672.1 18.85 5156 0.0 353 672.3 18.12 5434 0.0 354 672.3 19.18 5255 5.4 355 672.5 22.00 4593 0.0 356 672.6 21.58 4321 6.4 357 672.7 20.39 4921 6.5 358 673.0 16.29 6906 14.3 359 673.3 18.28 5730 3.7 360 673.3 19.37 4825 0.5 361 673.5 18.91 5388 0.0 362 673.5 16.41 5134 22.7 363 673.9 15.59 5346 0.0 364 674.3 18.71 4789 0.1 365 675.4 18.33 5330 2.7 366 675.7 17.90 5371 1.7 367 676.2 18.91 5771 0.0 368 676.5 20.32 4994 0.1 369 676.6 20.02 4854 0.0 370 676.8 24.00 4393 11.0 371 676.9 17.61 5884 0.0 372 677.3 19.35 4274 0.0 373 678.0 19.68 5147 0.1 374 678.1 18.73 5352 0.0 375 678.4 15.88 7668 0.0 376 678.8 20.05 5282 15.1 377 679.4 17.99 5369 1.5 378 679.5 16.97 6429 2.6 379 679.7 19.24 5387 1.5 380 679.8 19.20 5236 4.5 381 679.8 19.60 4830 0.3 382 680.1 20.54 5089 2.5 383 680.5 18.59 5390 0.5 384 681.3 15.60 6588 11.6 385 681.3 15.29 6197 11.4 386 681.6 17.66 4645 0.0 387 681.9 17.58 6490 1.4 388 682.2 22.33 5095 2.2 389 682.5 18.75 4842 5.3 390 682.5 18.10 6315 2.8 391 682.7 20.26 5399 0.0 392 683.3 18.80 5429 17.7 393 683.4 18.77 5644 0.0 394 
684.3 20.41 6060 0.0 395 684.3 18.65 5320 0.4 396 684.8 20.71 4820 6.1 397 685.0 22.00 5208 10.1 398 686.1 17.70 5860 3.4 399 686.7 21.48 4963 8.6 400 687.5 16.70 7614 12.3 401 689.1 19.58 5566 1.6 402 691.0 17.26 7040 1.5 403 691.3 17.38 6604 10.4 404 691.9 17.35 6180 6.1 405 694.0 16.26 6461 2.3 406 694.3 17.70 6415 2.5 407 694.8 20.13 5231 0.9 408 695.2 18.27 5838 2.4 409 695.3 14.54 7712 3.8 410 696.6 19.15 5593 2.0 411 698.2 17.37 5933 0.6 412 698.3 15.14 7593 2.8 413 698.4 17.84 6500 1.4 414 699.1 15.41 7217 1.2 415 700.3 18.87 5393 2.1 416 704.3 16.47 7290 6.0 417 706.8 17.86 5741 4.7 418 645.0 21.89 4403 24.3 419 672.2 20.20 4776 3.0 420 655.8 19.04 5993 5.0
;

/* Split the districts into large-class (st_ratio >= 20) and small-class (st_ratio < 20) subsamples. */
data large;
   set combined;
   if st_ratio >= 20;
run;

data small;
   set combined;
   if st_ratio < 20;
run;

/* Calculate the means and standard deviations for all of the data. */
title 'Summary Statistics for all of the data';
proc means data = combined;
run;

/* Calculate the means and standard deviations for the large classrooms. */
title 'Summary Statistics for the Large Classrooms where st_ratio >= 20';
proc means data = large;
run;

/* Calculate the means and standard deviations for the small classrooms. */
title 'Summary Statistics for the Small Classrooms where st_ratio < 20';
proc means data = small;
run;

/* Some simple plotting of the data. */
proc plot data = combined;
   title 'Plot of score versus student-to-teacher ratio';
   plot score*st_ratio;
run;

proc plot data = combined;
   title 'Plot of score versus expenditure per pupil';
   plot score*expend_pupil;
run;

proc plot data = combined;
   title 'Plot of score versus percentage of students learning English';
   plot score*english;
run;

/* Create a dummy variable named "treat" as the treatment variable for class size
   (treat = 1 for small classes with st_ratio < 20, treat = 0 otherwise).
*/
data combined;
   set combined;
   if st_ratio < 20 then treat = 1;
   else treat = 0;
run;

title 'Printing out the data to make sure the treatment variable is created correctly';
proc print data = combined;
run;

/* Here we run the one-way analysis of variance. Regressing score on the 0/1 dummy "treat"
   is equivalent to the equal-variance test of a difference in means: the coefficient on
   treat estimates the difference between the small-class and large-class mean scores. */
title 'One-way Analysis of Variance Test of Class Size Hypothesis';
proc reg data = combined;
   model score = treat;
run;

/* Here we examine the effect of class size using multiple regression, controlling for the
   additional variables "expend_pupil" and "english". We also retrieve the least squares
   residuals ("resid") and store them in the "residuals" data set. */
title 'Multiple Regression Test of Class Size Hypothesis Using OLS';
proc reg data = combined;
   model score = st_ratio expend_pupil english;
   output r = resid out = residuals;
run;

/* Now we merge the residuals with the variables so that we can do some residual plots to
   see if there is any heteroskedasticity in the data. If there is, the usual OLS standard
   errors are invalid, and so are our regression test results. */
data together;
   /* One-to-one merge: both data sets hold the same 420 observations in the same order. */
   merge combined residuals;
run;

title 'Data with residuals';
proc print data = together;
run;

/* Here we produce some diagnostic residual plots. A RUN statement follows each PLOT
   statement so that each plot is printed under its own title. */
proc plot data = together;
   title 'Residual plot of residuals vs. student-to-teacher ratio';
   plot resid*st_ratio;
run;
   title 'Residual plot of residuals vs. expenditure per pupil';
   plot resid*expend_pupil;
run;
   title 'Residual plot of residuals vs. percentage of students learning English';
   plot resid*english;
run;
quit;

/* Now we do White's test for heteroskedasticity. First we construct the squared residuals
   and the squares of the explanatory variables. */
data together;
   set together;
   r2 = resid**2;
   st_ratio2 = st_ratio**2;
   expend_pupil2 = expend_pupil**2;
   english2 = english**2;
run;

/* Some more diagnostic residual plots, this time of the squared residuals. */
proc plot data = together;
   title 'Plot of squared residuals vs. student-to-teacher ratio';
   plot r2*st_ratio;
run;
   title 'Plot of squared residuals vs. expenditure per pupil';
   plot r2*expend_pupil;
run;
   title 'Plot of squared residuals vs. percentage of students learning English';
   plot r2*english;
run;
quit;

/* We use the overall F-statistic of the heteroskedasticity test equation below to test for
   heteroskedasticity. The null hypothesis is no heteroskedasticity, while the alternative
   hypothesis is heteroskedasticity. */
title "White's Test for Heteroskedasticity";
proc reg data = together;
   model r2 = st_ratio st_ratio2 expend_pupil expend_pupil2 english english2;
   model r2 = english english2;
   model r2 = st_ratio st_ratio2;
   model r2 = expend_pupil expend_pupil2;
run;

/* Here we adjust our multiple regression results using White's heteroskedasticity-consistent
   standard errors and, in particular, use the Davidson-MacKinnon Method 1 modified version
   of White's standard errors. */
title1 "Robust Regression Results with White's Heteroskedasticity-Consistent Standard Errors";
title2 'Using the Davidson-MacKinnon Modified Version 1 (HCCME = 1)';
proc model data = together;
   /* We use the previous OLS estimates as the starting values. */
   parms const 649.58 b1 -0.286 b2 0.00387 b3 -0.655;
   score = const + b1*st_ratio + b2*expend_pupil + b3*english;
   fit score / white hccme=1;
run;

/* Finally, let's see if our functional form is correct. We will try adding some additional
   explanatory variables, including squared and interaction terms of the original variables.
*/
data together;
   set together;
   st_expend = st_ratio*expend_pupil;
   st_english = st_ratio*english;
   expend_english = expend_pupil*english;
run;

proc model data = together;
   parms const b1 b2 b3 b4 b5 b6 b7 b8 b9;
   score = const + b1*st_ratio + b2*expend_pupil + b3*english
         + b4*st_ratio2 + b5*expend_pupil2 + b6*english2
         + b7*st_expend + b8*st_english + b9*expend_english;
   fit score / white hccme = 1;
run;

/* As it turns out, two of the quadratic terms and one of the interaction terms are
   statistically significant, implying that the student-to-teacher ratio has first an
   increasing and then a decreasing effect with size. The model below keeps only those
   terms (st_ratio2, english2, and st_expend), so only their parameters are declared. */
proc model data = together;
   parms const b1 b2 b3 b4 b6 b7;
   score = const + b1*st_ratio + b2*expend_pupil + b3*english
         + b4*st_ratio2 + b6*english2 + b7*st_expend;
   fit score / white hccme = 1;
run;
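
/* Optional cross-checks: a minimal sketch, not part of the original analysis.
   Assumptions: SAS/STAT 9.2 or later is available, and the data set "combined" already
   contains the dummy variable "treat" created earlier in this program.

   1. PROC TTEST compares the mean score of the two class-size groups. It reports both the
      pooled (equal-variance) and the Satterthwaite (unequal-variance) t tests, along with
      a folded F test for equality of variances, so it gives a rough check on the
      equal-variance assumption behind the one-way analysis of variance. The pooled t test
      should agree with the t test on "treat" in the regression of score on treat.

   2. In PROC REG, the SPEC option requests White's specification test, and HCC with
      HCCMETHOD = 1 requests heteroskedasticity-consistent standard errors. These should be
      comparable to the PROC MODEL results with HCCME = 1, though the printed output is
      laid out differently. */
title1 'Cross-check: two-sample t test of score by class size';
title2;
proc ttest data = combined;
   class treat;
   var score;
run;

title1 'Cross-check: OLS with heteroskedasticity-consistent standard errors from PROC REG';
proc reg data = combined;
   model score = st_ratio expend_pupil english / spec hcc hccmethod = 1;
run;
quit;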