* Unordered Multinomial Choice problem
* Source: Greene and Hensher (1997), downloaded from William Greene, Econometric Analysis website
* Time = terminal waiting time, 0 for car
* Invc = In-vehicle cost component
* Invt = the amount of time spent traveling
* GC = Generalized cost measure
* Hinc = Household income
* Psize = Party size in mode chosen
* Data is in "long form" with characteristics being different for each mode that the individual faces
* First Mode = air, Second Mode = train, Third Mode = bus, Fourth Mode = car
* Individual-specific variables: hinc, psize.
* Choice-specific variables: gc, invc, invt, time.
* 210 individuals and 840 observations.

clear all
set more off

insheet using "C:\data\travel.csv"

* Use "car" as base alternative

* A conditional logit model with only choice-specific variables (gc and time)
asclogit mode gc time, case(id) alternatives(travelmode) basealternative(car) nolog

* A conditional logit model with both choice-specific variables (gc and time) and an individual-specific variable (hinc)
asclogit mode gc time, case(id) alternatives(travelmode) casevars(hinc) basealternative(car) nolog

* A conditional logit model with both choice-specific variables (gc and time) and individual-specific variables (hinc and psize)
asclogit mode gc time, case(id) alternatives(travelmode) casevars(hinc psize) basealternative(car) nolog

* Predict probabilities of choice of each mode and compare to actual frequencies
predict pasclogit, pr
table travelmode, contents(mean mode mean pasclogit sd pasclogit) cellwidth(15)
drop pasclogit

* Reshape the dataset into wide form.
reshape wide mode gc time invc invt, i(id) j(travelmode air train bus car) string

* Build a single categorical outcome (1 = air, 2 = train, 3 = bus, 4 = car)
gen mode = .
replace mode = 1 if (modeair == 1)
replace mode = 2 if (modetrain == 1)
replace mode = 3 if (modebus == 1)
replace mode = 4 if (modecar == 1)

* Tabulate the dependent variable (mode)
tabulate mode

* Table of household income by travel mode
table mode, contents(N hinc mean hinc sd hinc)

* Table of terminal time by travel mode
table mode, contents(mean timeair mean timetrain mean timebus mean timecar)

* Multinomial logit with base outcome alternative 4 (car)
mlogit mode hinc, baseoutcome(4)
mlogit mode hinc psize, baseoutcome(4)

* Relative-risk ratio estimates - Multinomial logit with base outcome alternative 4 (car)
mlogit mode hinc, rrr baseoutcome(4)

* Predict probabilities of choice of each mode and compare to actual frequencies
quietly mlogit mode hinc, baseoutcome(4)
predict pmlogit1 pmlogit2 pmlogit3 pmlogit4, pr
summarize pmlogit* modeair modetrain modebus modecar, separator(4)
list pmlogit* in 1/10

* Create Classification Table and get accuracy rate
* The predicted choice is the mode with the highest predicted probability
egen pred_max = rowmax(pmlogit*)
generate pred_choice = .
forv i=1/4 {
    replace pred_choice = `i' if (pred_max == pmlogit`i')
}
local mode_label: value label mode
label values pred_choice `mode_label'
tabulate pred_choice mode

* From the above tabulation we see that the only choices predicted
* were classes 2 and 4 (train = 2 and car = 4).
* Accuracy rate = (46 + 40)/210 = 0.4095
* In comparison, the accuracy rate that one would expect from naively classifying
* every case as the majority class (train = class 2) would be 30.0% on average.
* See the previous tabulation result for the dependent variable - mode.
* Thus, the current mlogit classifier is providing a LIFT of 40.95/30.0 = 1.365.
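
* Optional sketch (not part of the original analysis): compute the accuracy
* rate, the majority-class baseline, and the lift from the data instead of
* reading them off the tabulation by hand.
* Assumptions: mode and pred_choice are coded 1-4 as created above; the names
* correct, acc_rate, and base_rate are hypothetical and introduced only here.
generate correct = (pred_choice == mode)
quietly summarize correct
scalar acc_rate = r(mean)

* Baseline = share of the most frequent observed mode
scalar base_rate = 0
forvalues i = 1/4 {
    quietly count if mode == `i'
    scalar base_rate = max(base_rate, r(N)/_N)
}

display "Accuracy rate = " %6.4f acc_rate
display "Baseline rate = " %6.4f base_rate
display "Lift          = " %6.3f (acc_rate/base_rate)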