*** Table 5.1 in W&B book: MULTINOMIAL LOGIT EXAMPLE: CHOICE OF SECONDARY SCHOOL

* This is a multinomial problem. This program reproduces Table 5.1 in the
* W&B textbook. The regressors in this problem are case-specific only.
* If the regressors were alternative-specific only, the model would be a
* conditional logit model. If the regressors of a multinomial model carry BOTH
* case-specific and alternative-specific information, it is called a generalized
* (or hybrid) multinomial model.
*
* The data here are in WIDE FORM: each row holds one individual's multinomial
* choice (the variable school) plus the characteristics of the individual making
* that choice. Conditional and generalized multinomial logit models instead
* require LONG FORM: each case (individual) contributes L vertically "stacked"
* observations, one per alternative, each carrying that alternative's information
* plus, in the generalized model, the case-specific information.

* Read in dataset and describe dependent variable and regressors.
use c:\data\school.dta, clear
describe

* Summarize dependent variable and regressors
summarize, separator(0)

* Tabulate the dependent variable
tabulate school

* Table of log of income by school
table school, contents(N linc mean linc sd linc)

* Table of mother's education by school
table school, contents(N motheduc mean motheduc sd motheduc)

********** Table 5.1 in W&B: MULTINOMIAL LOGIT MODEL OF SCHOOL CHOICE

* Create year dummies
generate d95 = (year == 1995)
generate d96 = (year == 1996)
generate d97 = (year == 1997)
generate d98 = (year == 1998)
generate d99 = (year == 1999)
generate d00 = (year == 2000)
generate d01 = (year == 2001)
generate d02 = (year == 2002)

* Create full-time dummy for mother's employment: 1 = full-time employment,
* 0 = otherwise.
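* Sketch: check the coding of mothemp before building the employment dummies.
* The dummies appear to assume 1 = full-time, 2 = part-time, 3 (or higher) = not
* employed; this coding is inferred from (mothemp == 1) and (mothemp < 3), not
* documented here. Stata treats a missing value as larger than any number, so a
* missing mothemp would be coded 0 (not missing) by both expressions.
tabulate mothemp, missing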
* Create work dummy for whether the mother works at all: 1 = employed
* (either full-time or part-time), 0 = otherwise.
generate mothftime = (mothemp == 1)
generate mothwork = (mothemp < 3)

* Multinomial logit with base outcome alternative 1
mlogit school motheduc mothwork linc lsize parity d95 d96 d97 d98 d99 d00 d01 d02, baseoutcome(1) nolog

* The relative-risk ratio option (rrr) reports exp(b) rather than b
mlogit school motheduc mothwork linc lsize parity d95 d96 d97 d98 d99 d00 d01 d02, rrr baseoutcome(1) nolog

* The convenient listcoef command (part of the user-written SPost package by
* Long and Freese; install it first if needed, e.g. via search spost13_ado)
* lists factor changes in the odds (relative risk ratios) between alternatives.
listcoef motheduc mothwork linc lsize parity, help

* Store the estimates under the name MNL for later use
estimates store MNL

* Marginal effects at the means of the explanatory variables for outcome 3
mfx, predict(pr outcome(3))

* Average marginal effects of the explanatory variables for outcome 3
margins, dydx(*) predict(outcome(3)) noatlegend

* Predict the probability of choosing each school type and compare with the
* actual frequencies
predict pmlogit1 pmlogit2 pmlogit3, pr
summarize pmlogit*, separator(3)

* List predicted probabilities of the alternatives for the first 10 observations
list pmlogit* in 1/10

* Create classification table and compute the accuracy rate: each case is
* assigned the alternative with the highest predicted probability. The
* !missing() guard is needed because missing == missing is true in Stata.
egen pred_max = rowmax(pmlogit*)
generate pred_choice = .
forvalues i = 1/3 {
    replace pred_choice = `i' if (pred_max == pmlogit`i') & !missing(pred_max)
}
local school_label : value label school
label values pred_choice `school_label'
tabulate pred_choice school

* Accuracy rate = (113 + 49 + 208)/675 = 370/675 = 0.548, i.e. the diagonal of
* the classification table (correctly classified cases) over all cases.
* In comparison, naively assigning every case to the majority class (Gymnasium)
* would classify 41.04% of cases correctly; see the earlier tabulation of the
* dependent variable, school.
* Thus, the mlogit classifier provides a LIFT of 54.8/41.04 = 1.335 over that baseline.
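*** Sketch: converting the WIDE-form data to LONG form
* A minimal sketch of the stacking described in the header, assuming school is
* coded 1, 2, 3 (as the forvalues loop above also assumes), so L = 3 rows per case.
* caseid, alt and chosen are hypothetical names introduced for illustration.
* preserve/restore leaves the wide data in memory unchanged.
preserve
generate long caseid = _n
expand 3
bysort caseid: generate alt = _n
generate chosen = (school == alt)
list caseid alt school chosen in 1/6, sepby(caseid)
* In this layout, alternative-specific regressors would be merged in by
* (caseid, alt) and a conditional logit fitted with, e.g., clogit chosen ..., group(caseid)
restore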
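*** Sketch: version-robust descriptive tables
* The table ..., contents() syntax used above is the pre-Stata 17 syntax; Stata 17
* redesigned table, and the old syntax runs there under version control
* (e.g., version 16: table ...). tabstat reports the same count, mean and
* standard deviation by school without depending on the table syntax.
tabstat linc motheduc, by(school) statistics(n mean sd)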
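*** Sketch: the same model with factor-variable notation for the year dummies
* Assumption: the data contain exactly one year outside 1995-2002 and it is the
* earliest year, so that i.year (whose default base is the lowest year) omits the
* same base category as d95-d02. Check with tabulate year before relying on this.
* MNL_fv is a hypothetical name; MNL is restored afterwards so later commands
* see the original estimates.
tabulate year
mlogit school motheduc mothwork linc lsize parity i.year, baseoutcome(1) nolog
estimates store MNL_fv
estimates restore MNL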
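*** Sketch: marginal effects at the means with margins instead of mfx
* mfx is an older command that margins superseded in Stata 11. The atmeans option
* evaluates the derivatives at the means of the regressors, as mfx does above.
* One difference: mfx by default reports discrete 0-to-1 changes for dummy
* regressors (mothwork, d95-d02), whereas margins treats them as continuous unless
* they are entered as factor variables (e.g., i.mothwork), so those rows may differ.
estimates restore MNL
margins, dydx(*) predict(outcome(3)) atmeans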
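*** Sketch: comparing average predicted probabilities with actual frequencies
* With a constant in every equation, the maximum-likelihood fit of a multinomial
* logit makes the average predicted probability of each outcome equal its observed
* share in the estimation sample (up to convergence tolerance), so the means of
* pmlogit1-pmlogit3 over e(sample) should match the percentages from tabulate.
* psum_check is a hypothetical name; it should be 1 (up to float rounding)
* wherever predictions exist.
estimates restore MNL
tabulate school if e(sample)
summarize pmlogit1 pmlogit2 pmlogit3 if e(sample), separator(3)
egen double psum_check = rowtotal(pmlogit1 pmlogit2 pmlogit3) if !missing(pmlogit1)
summarize psum_check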
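*** Sketch: computing the accuracy rate and lift directly from the data
* Instead of reading counts off the classification table, count the correctly
* classified cases and the majority-class count among cases with a prediction.
* The local and matrix names (n_correct, n_total, n_major, freq) are hypothetical.
* If the classified sample matches the full sample, this should reproduce the
* 0.548 accuracy, 41.04% baseline and lift of about 1.335 quoted above.
count if pred_choice == school & !missing(pred_max, school)
local n_correct = r(N)
count if !missing(pred_max, school)
local n_total = r(N)
quietly tabulate school if !missing(pred_max), matcell(freq)
local n_major = 0
forvalues j = 1/`=rowsof(freq)' {
    local n_major = max(`n_major', freq[`j', 1])
}
display "Accuracy rate       = " %6.4f `n_correct'/`n_total'
display "Majority-class rate = " %6.4f `n_major'/`n_total'
display "Lift                = " %6.4f `n_correct'/`n_major'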