Tuesday, May 5, 2020

MANAGERIAL REPORT Essay Example For Students

INTRODUCTION

The purpose of this analysis was to develop a regression model to predict mortality. Researchers at General Motors collected data on 60 U.S. Standard Metropolitan Statistical Areas (SMSAs) in a study of whether air pollution contributes to mortality. This data was obtained and randomly sorted into two even groups of 30 cities. A regression model to predict mortality was built from the first set of data and validated against the second set.

BODY

The following variables were found to be the key drivers in the model:

• Mean July temperature in the city (degrees F)
• Mean relative humidity of the city
• Median education
• Percent of white collar workers
• Median income
• Sulfur dioxide pollution potential

The objective in this analysis was to find the line, using the variables mentioned above, for which the sum of the squared deviations between the observed and predicted values of mortality is smaller than for any other straight line model, assuming the differences between the observed and predicted values of mortality average to zero. Once found, this "Least Squares Line" can be used to estimate mean mortality, or to predict mortality, for any given values of the variables above.

Each of the key variables was checked for a bell-shaped symmetry about the mean, for the linear (straight line) nature of the data when graphed, and for equal squared deviations of measurements about the mean (variance). After determining whether to exclude data points, the following model was determined to be the best model:

-3276.108 + 862.93551 25.375822 + 0.5992133 + 0.02396484 + 0.018949075 41.165296 + 0.31470587 +

See the list of independent variables on TAB #1.

This model was validated against the second set of data, where it was determined that, with 95% confidence, there is significant evidence to conclude that the model is useful for predicting mortality.
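The fit-then-validate procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic, the two predictors and their coefficients are made up, and the helper names `fit_least_squares` and `predict` are hypothetical. It is not the report's actual data or software.

```python
import random

def fit_least_squares(X, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals.

    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan
    elimination, where A is X with a leading column of ones (intercept).
    """
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting: move the largest remaining entry onto the diagonal
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]

def predict(coefs, row):
    """Predicted response for one observation."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))

# Synthetic stand-in for the 60 cities (assumption: two invented predictors)
rng = random.Random(0)
X = [[rng.uniform(60, 90), rng.uniform(9, 13)] for _ in range(60)]
y = [900 + 2.0 * t - 15.0 * e + rng.gauss(0, 5) for t, e in X]

# Randomly sort the observations into two even groups of 30
order = list(range(60))
rng.shuffle(order)
train, valid = order[:30], order[30:]

# Build the model on the first group, then check it on the second
coefs = fit_least_squares([X[i] for i in train], [y[i] for i in train])
abs_errors = [abs(y[i] - predict(coefs, X[i])) for i in valid]
mean_abs_error = sum(abs_errors) / len(abs_errors)
```

The mean absolute error on the held-out group plays the same role here as the validation residuals referenced in the report's tabs.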
Although this model, when validated, is deemed suitable for estimation and prediction, as noted by the 5% error ratio (TAB #2), there are significant concerns about the model. First, although the percent of sample variability that can be explained by the model, as noted by the R² value on TAB #3, is 53.1%, after adjusting this value for the number of parameters in the model, the percent of explained variability is reduced to 38.2% (TAB #3). The remaining variability is due to random error. Second, it appears that some of the independent variables are contributing redundant information because they are correlated with other independent variables, a condition known as multicollinearity. Third, it was determined that an outlying observation (a value lying more than three standard deviations from the mean) was influencing the estimated coefficients.

In addition to the observed problems above, it is unknown how the sample data was obtained. It is assumed that the values of the independent variables were uncontrolled, indicating observational data. With observational data, a statistically significant relationship between a response y and a predictor variable x does not necessarily imply a cause and effect relationship. This is why a designed experiment would produce optimum results. With a designed experiment, we could, for instance, control the time period that the data corresponds to. Data covering a longer period of time would improve the consistency of the data and would reduce the effect of any extreme or unusual data for the current time period.

Also, we do not know how the cities were selected. Assuming that the percentage of white collar workers is negatively correlated with pollution, the optimal selection of cities would include an equal number of white collar cities and non-white collar cities. Furthermore, assuming a correlation between high temperature and mortality, an optimal selection of cities would include an equal number of northern cities and southern cities.
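The drop from 53.1% to 38.2% follows from the standard adjusted R² formula, which penalizes R² for the number of parameters. A minimal sketch, assuming n = 30 (the size of the model-building group) and k = 7 predictors (the divisor used in the report's F test); the function name `adjusted_r2` is hypothetical:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (excluding intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - (k + 1))

# R^2 from TAB #3; n and k are assumptions inferred from the report
r2_adj = adjusted_r2(0.531026, n=30, k=7)
print(round(r2_adj, 3))
```

Under these assumptions the result rounds to 0.382, which agrees with the 38.2% figure quoted above.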
MODEL TESTING

The model was validated for predicting and estimating mortality with the following hypothesis test:

H0: All coefficients in the model are equal to zero (β1 = β2 = ... = βk = 0).
Ha: At least one of the coefficients is not equal to zero.

Rejection Region: F > Fα, where the distribution of F depends on k numerator df and n − (k + 1) denominator df.

Test Statistic: F = (Mean Square for model) / (Mean Square for error) = (R²/k) / ((1 − R²) / [n − (k + 1)])
where n = number of observations and k = number of parameters (excluding intercept).

Substitution (TAB #3): F = (.531026 / 7) / ((1 − .531026) / [30 − (7 + 1)]) = 3.5587

Decision: Reject H0.

Conclusion: There is sufficient evidence to conclude that at least one of the variables is useful for estimating mortality.

Confidence Interval: ȳ ± tα/2 × sȳ, where sȳ = s/√n and tα/2 is a t value based on (n − 1) degrees of freedom.

Substitution (TAB #8): 50.53793 ± 2.074 × 6.334616 = (37.39993642, 63.67592358)
Substitution (TAB #2): 5.316607 ± 2.074 × 0.6332737 = (4.003197346, 6.630016654)

Conclusion: The mean absolute value of the residuals is 50.5 and the mean percentage of error is 5.3%. Therefore, with 95% confidence, we can say that the mean absolute error falls between 37 and 64 deaths, with an error ratio of between 4% and 7%.

CONCLUSIONS AND RECOMMENDATIONS

Although there seem to be several problems, including a low R², severe multicollinearity, influential observations and problems with linearity and variability, the model is deemed to be a good estimator/predictor of mortality. Improvements such as better data collection (through a controlled experiment), a larger sample size, multicollinearity analysis (inclusion and exclusion of different variables) and data transformation analysis could result in better model prediction. However, analysis of this type is extremely time consuming and is recommended only if additional funds can be generated.
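The arithmetic in the model testing section can be checked with a short Python sketch. The inputs are the values quoted from the tabs; n = 30 and k = 7 are assumptions inferred from the report, the t value 2.074 is taken from the text rather than computed, and the function names are hypothetical.

```python
def f_statistic(r2, n, k):
    """Global F statistic: (R^2 / k) / ((1 - R^2) / (n - (k + 1)))."""
    return (r2 / k) / ((1.0 - r2) / (n - (k + 1)))

def confidence_interval(mean, t_value, std_error):
    """Interval mean +/- t_value * std_error, returned as (lower, upper)."""
    half_width = t_value * std_error
    return mean - half_width, mean + half_width

# Global F test (TAB #3 values; n and k are assumed)
f_value = f_statistic(0.531026, n=30, k=7)

# 95% intervals for mean absolute error (TAB #8) and percent error (TAB #2)
mae_interval = confidence_interval(50.53793, 2.074, 6.334616)
pct_interval = confidence_interval(5.316607, 2.074, 0.6332737)

print(round(f_value, 4), mae_interval, pct_interval)
```

With these inputs the F statistic comes out at about 3.5587 and the two intervals match the substitutions shown in the report, which suggests the test used 22 denominator degrees of freedom.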
