## Predicting Auditor Changes With Financial Distress Variables: Discriminant Analysis And Problems With Data Mining Approaches

Journal of Applied Business Research (via Acquire Media NewsEdge)

### ABSTRACT

Our study extends previous research that uses financial distress factors in predicting auditor changes by evaluating the effectiveness of the traditional discriminant analysis model, not used in previous auditor change studies, and by highlighting the importance of evaluating the likelihood that data mining approach classification results occurred by chance. Significance of individual predictor variables, as well as of the full set of 13 financial variables, can be tested using discriminant analysis. Kwak et al. (2011) document overall classification accuracy rates ranging from 61 to 63.5 percent for the four data mining models they compared but did not address whether these rates occurred by chance. Using Kwak et al.'s (2011) data set of firms changing auditors in 2007 or 2008 and matching non-auditor change firms, our discriminant analysis test results show overall accuracy rates of less than 56 percent and true positive rates over 85 percent, but these rates are influenced by a disproportionate number of non-auditor change firms being classified as auditor change firms. Individual predictor variables that are important in the discriminant equation based on standardized canonical coefficients include losses (LOSS) and no payment of dividends (DIV) in the year prior to the auditor change, retained earnings as a percent of total assets (RE/TA), and earnings before interest and taxes as a percent of total assets (EBIT/TA). The Kappa statistic and AUC metrics for all 13 data mining algorithms we used indicate that classifications using these algorithms are no better than random classifications.

Keywords: Auditor Change; Discriminant Analysis; Data Mining; Financial Distress

### 1. INTRODUCTION

Auditor change prediction is an interesting issue because auditor changes may give warnings to investors, regulators, or other financial statement users about the audited firm's financial condition. Several previous bankruptcy prediction studies document a positive association between bankruptcy and auditor changes, but most prior research studies on auditor change prediction fail to include a portfolio of financial distress variables (see discussion in Section 2). Kwak et al. (2011) use multiple criteria linear programming (MCLP) and three other data mining approaches for predicting auditor changes with 13 financial distress variables, and they document overall accuracy rates of around 60 percent. Results from the application of these and other data mining approaches, however, provide limited information on the usefulness of specific predictor variables, and Peng et al. (2009) raise concerns about the lack of consistency in prediction results across various data mining algorithms and performance measures. The objectives of our current study are to gain further insights into the usefulness of specific financial distress variables for predicting auditor changes by using discriminant analysis with Kwak et al.'s (2011) sample and to more carefully evaluate the effectiveness of various data mining algorithms based on other performance measures in addition to accuracy rates.

Auditor changes may be initiated by the audited firm or by the auditors. The fact that this event must be reported to the Securities and Exchange Commission by public companies indicates the importance of this event to investors.
In light of the positive association documented in prior research between bankruptcy and auditor changes, investors and potential successor auditors could benefit from having a reliable mechanism and set of indicators for predicting or anticipating auditor change decisions, particularly if those change decisions are motivated by financial distress of the audited firm.

We extend Kwak et al. (2011) by applying discriminant analysis with their 13 financial distress variables and their data set of firms that changed auditors during 2007 or 2008 and a matched set of non-auditor change firms. Although our discriminant analysis results identify a subset of these financial distress variables that are important in predicting auditor changes, the overall accuracy rates are lower than the accuracy rates for the data mining models used by Kwak et al. (2011). Based on comparing accuracy rates, one might conclude that the data mining approaches are more reliable than discriminant analysis in classifying or predicting auditor change firms. However, based on the Kappa statistics and AUC metrics from our application of 13 data mining approaches with our 13 financial distress predictor variables, we determine that the classification of firms as auditor change or non-auditor change firms is no better than random classification.

Thus, our current study makes three important contributions to the auditor change prediction literature. First, our results suggest that auditor change prediction studies should include financial distress variables because we document that some financial distress variables are important in distinguishing between auditor change and non-auditor change firms. Second, our relatively low prediction accuracy rates using only financial distress variables as predictors indicate that a more robust set of predictor variables is needed to capture the various drivers of auditor change decisions.
Third, we highlight the importance of using additional metrics beyond accuracy rates to interpret the results of data mining approaches to classification and prediction.

Our paper proceeds in the following order. The next section discusses relevant prior research and presents the variables we use in our auditor change prediction models. The third section describes discriminant analysis, data mining methods, and the performance metrics for evaluating the results of these methods. Section four presents sample selection procedures, data, and empirical results. The last section summarizes and discusses the conclusions of our study and identifies further research avenues.

### 2. PRIOR RESEARCH AND VARIABLES USED FOR AUDITOR CHANGE PREDICTION

Our current study focuses on predicting auditor changes using financial distress variables, so bankruptcy studies and auditor change studies are both relevant. Because our current study is an extension of Kwak et al. (2011), most of this section of our paper presents the same prior research and explanation of variables found in Kwak et al. (2011). The primary purpose of Kwak et al. (2011) was to "analyze the predictive nature of financial distress variables for predicting auditor changes using multiple criteria linear programming (MCLP) and other data mining methods." Using 13 financial statement variables with a sample of firms that changed auditors in 2007 or 2008 and a non-auditor change sample matched on size and industry, Kwak et al. (2011) document the following overall classification accuracy rates for their four models: 61.85 percent for MCLP, 60.64 percent for BayesNet, 63.50 percent for classification and regression tree (CART), and 60.17 percent for logistic regression.

Three prior bankruptcy studies that are pertinent to our current research are Schwartz and Menon (1985), Chen et al. (2004), and Chen et al. (2009).
Schwartz and Menon (1985) document a significant association between bankruptcy and auditor changes using Chi-square tests for their sample of 132 bankruptcy firms (35 of which changed auditors prior to filing bankruptcy) and matched sample of 132 non-bankruptcy firms (13 of which changed auditors). In addition to confirming this significant association between bankruptcy and auditor changes using Chi-square tests for their sample of 472 bankruptcy firms and 424 matched non-bankruptcy firms, Chen et al. (2004) present logistic regression results that document statistical significance of auditor changes and five (of six) financial distress variables in predicting bankruptcy. The six financial statement ratio variables used by Chen et al. (2004) are Cash-to-Total assets, Current assets-to-Current liabilities, Current assets-to-Sales (not significant in the regression results), Current assets-to-Total assets, Long-term debt-to-Total assets, and Net income-to-Total assets. In a similar study, Chen et al. (2009) use a logistic regression model for bankruptcy prediction with a small sample of bankruptcy and non-bankruptcy firms listed on the Taiwanese Stock Exchange, and the resulting coefficients on their auditor change variable and on their financial distress index variable are statistically significant.

Based on these prior studies, there is a positive association between bankruptcy and auditor changes. However, we are interested in predicting auditor changes, not bankruptcy. Although firms experiencing financial distress may end up in bankruptcy, not all do. According to Lau's (1987) five-state financial condition classification, the severity of a firm's financial distress increases as it moves from financial stability (state 0) to omitting dividend payments (state 1) to default on loan payments (state 2) to protection under Chapter X or XI of the Bankruptcy Act (state 3) and finally to bankruptcy and liquidation (state 4).
Hudaib and Cooke (2005) hypothesize that financial distress may influence a firm's decision to change auditors either directly or indirectly (by influencing the auditor's opinion). Using a financial ratio-based index or Z-score variable to capture financial health (distress) in their multivariate logistic regression, Hudaib and Cooke (2005) find that the probability of switching auditors increases as financial health declines. To avoid the potential loss of information from incorporating financial condition ratios into a single index or Z-score variable, we use 13 financial condition variables (identified later in this section) as predictor variables in our analyses. Extant literature includes a variety of papers that examine aspects of auditor changes, but many of these use samples that include only auditor changes and do not incorporate a portfolio of financial condition variables. Calderón and Ofobike (2008) use CART methodology to evaluate factors (none of which are financial statement ratios) that influence whether auditor changes are client-initiated or auditor-initiated. Francis and Wilson (1988) test whether agency costs influence companies to change from a non-Big Eight to a Big Eight audit firm or vice versa, and debt-to-total assets is the only financial statement ratio they include in their explanatory variables. Davidson et al. (2006) test for effects of earnings management on the direction of auditor changes (Big-to-Small, Big-to-Big, etc.) and control for financial distress using the Altman Z-score; the coefficients for the Altman Z-score in the full models are not statistically significant. Landsman et al. (2009) focus on client risk (both financial and audit) and client misalignment characteristics of audit client portfolio management decisions by the top-tier (Big N) accounting firms in pre- and post-Enron periods. 
They include five financial statement ratios (Return on assets (ROA), Loss (equals one if ROA is negative), Debt-to-Assets, Cash-to-Assets, and (Inventory plus Receivables)-to-Assets) in the set of client risk measures for their multinomial logistic regression model, and the coefficients on all of these variables except for Debt-to-Assets are statistically significant in at least one of the four scenarios (combinations of pre- and post-Enron and lateral/upward and downward switches).

Our current study is an extension of Kwak et al. (2011), so we use the same variables as they did to capture financial distress. Because bankruptcy is the extreme form of financial distress, most of our variables are those used by Altman (1968) and Ohlson (1980) in their classic bankruptcy prediction studies and by other studies mentioned above. We also include a dummy variable (DIV) to capture Lau's (1987) State 1 - Dividend Omission, an early state of financial distress. Our 13 financial statement variables are as follows:

* TL/TA = Total Liabilities ÷ Total Assets (Ohlson (1980), Francis and Wilson (1988), Chen et al. (2004), and Landsman et al. (2009))
* WCA/TA = Working Capital ÷ Total Assets (Altman (1968) and Ohlson (1980))
* CL/CA = Total Current Liabilities ÷ Total Current Assets (Ohlson (1980) and Chen et al. (2004))
* NI/TA = Net Income ÷ Total Assets (Ohlson (1980), Chen et al. (2004), and Landsman et al. (2009))
* FU/TL = Funds from Operations ÷ Total Liabilities (Ohlson (1980))
* LOSS = 1 if a firm has a loss in previous years; else LOSS = 0 (similar to Ohlson (1980) and Landsman et al. (2009))
* DIV = 1 if a firm did not pay a dividend in a previous year; else DIV = 0 (Lau (1987))
* CREIN/TA = Change in the ratio of receivables plus inventories to total assets (similar to Landsman et al. (2009))
* RE/TA = Retained Earnings ÷ Total Assets (Altman (1968))
* EBIT/TA = Earnings before Interest and Taxes ÷ Total Assets (Altman (1968))
* MKV/TD = Market Value of Equity ÷ Book Value of Total Debt (Altman (1968))
* SALE/TA = Sales ÷ Total Assets (Altman (1968))
* SIZE = Log of Total Assets (similar to Ohlson (1980))

### 3. MODELS - DISCRIMINANT ANALYSIS VERSUS DATA MINING

For four data mining methods, Kwak et al. (2011) document overall accuracy rates of around 60 percent in predicting auditor changes using a sample of 790 auditor-change firms and 1,126 matched non-auditor change firms during the sample period of 2007 and 2008. Peng et al. (2009) evaluate 13 different data mining classifiers or algorithms using 11 different software defect datasets and conclude that, for a given dataset, the identification of the best predictive algorithm depends on the performance measure used and that no single algorithm appears to be the best across datasets. Although data mining methods are much less restrictive than are parametric methods such as discriminant analysis, these methods focus on method performance metrics such as overall accuracy and do not provide much guidance on the usefulness of specific predictor variables in the classification or prediction of the characteristic or decision of interest. Thus, the objectives of our current study are to gain further insights into the usefulness of specific financial distress variables for predicting auditor changes by applying discriminant analysis to Kwak et al.'s (2011) sample and to more carefully evaluate the effectiveness of various data mining algorithms based on other performance measures in addition to accuracy rates.

Our study is an extension of Kwak et al. (2011) and differs from that study in two key respects. First, we are evaluating the effectiveness of discriminant analysis, a parametric method not applied in Kwak et al.
(2011), in classifying firms as auditor-change or non-auditor change firms, and the discriminant analysis allows us to evaluate the relative usefulness of specific predictor variables in the set of 13 financial distress variables included in our models. Second, we apply the same 13 data mining algorithms used by Peng et al. (2009) to our sample of auditor change and non-auditor change firms and evaluate two additional performance metrics (used in Peng et al. (2009)) that indicate the likelihood that the accuracy rates occurred by chance. To link our paper to one of Kwak et al.'s (2011) and Peng et al.'s (2009) data mining methods, we also perform a separate logistic regression analysis. In this section of our paper, we discuss the discriminant analysis model, Peng et al.'s (2009) classifiers, and performance metrics.

Discriminant analysis (DA) has been widely used in bankruptcy classification and prediction studies (such as Altman (1968), Gepp et al. (2010), Muller et al. (2009), and Sung et al. (1999)), but our literature search did not identify auditor change studies that have used DA. Fok et al. (1995) provide a useful description of the purpose, assumptions, application, and limitations of DA. For a two-group DA (such as bankruptcy/non-bankruptcy or auditor change/no auditor change), sample data are used to identify the (usually linear) function that best discriminates between the two groups using multiple independent or predictor variables. This multivariate statistical method requires two potentially limiting assumptions: the independent variables are normally and independently distributed and the variance-covariance matrices are equal. One can evaluate the classification accuracy by applying the estimated discriminant coefficients to the original sample, and one can evaluate the predictive accuracy by applying the estimated coefficients to a new or holdout sample.
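To make the mechanics of a two-group DA concrete, the following Python sketch fits a linear discriminant function with a pooled covariance matrix and sample-proportion priors, then reports accuracy on the original sample and under leave-one-out reclassification. It is a minimal illustration under stated assumptions: it handles only two predictors, the toy observations and all function names are hypothetical, and it is not the SAS procedure used in this study.

```python
import math

def fit_lda(X, y):
    """Two-group linear discriminant for two predictors.

    Priors are the sample proportions; the covariance matrix is pooled
    across groups (the equal-covariance assumption of linear DA).
    """
    groups = {g: [x for x, lab in zip(X, y) if lab == g] for g in (0, 1)}
    means = {g: (sum(p[0] for p in pts) / len(pts),
                 sum(p[1] for p in pts) / len(pts))
             for g, pts in groups.items()}
    sxx = sxy = syy = 0.0
    for g, pts in groups.items():
        mx, my = means[g]
        for px, py in pts:
            sxx += (px - mx) ** 2
            sxy += (px - mx) * (py - my)
            syy += (py - my) ** 2
    dof = len(X) - 2  # n minus number of groups
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy * sxy
    dx = means[1][0] - means[0][0]
    dy = means[1][1] - means[0][1]
    # w = S^{-1} (mean_1 - mean_0), using the closed-form 2x2 inverse.
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
    mid = ((means[0][0] + means[1][0]) / 2, (means[0][1] + means[1][1]) / 2)
    log_prior_ratio = math.log(len(groups[1]) / len(groups[0]))
    c = -(w[0] * mid[0] + w[1] * mid[1]) + log_prior_ratio
    return w, c

def predict(model, x):
    """Return 1 if the discriminant score is positive, else 0."""
    w, c = model
    return 1 if w[0] * x[0] + w[1] * x[1] + c > 0 else 0

def resubstitution_accuracy(X, y):
    """Classification accuracy on the sample used to estimate the function."""
    model = fit_lda(X, y)
    return sum(predict(model, x) == lab for x, lab in zip(X, y)) / len(y)

def leave_one_out_accuracy(X, y):
    """Classify each case with a function estimated from all other cases."""
    hits = 0
    for i in range(len(y)):
        model = fit_lda(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict(model, X[i]) == y[i]
    return hits / len(y)

# Hypothetical toy data: two ratio-style predictors per firm, 1 = auditor change.
X_demo = [(-0.30, -0.50), (-0.10, 0.10), (0.05, -0.20), (-0.20, 0.30), (0.02, 0.25),
          (0.10, 0.40), (0.08, -0.05), (0.15, 0.20), (-0.05, 0.35), (0.12, 0.30),
          (0.03, 0.10)]
y_demo = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print("original sample:", resubstitution_accuracy(X_demo, y_demo))
print("leave-one-out:  ", leave_one_out_accuracy(X_demo, y_demo))
```

Accuracy on the original sample is typically the more optimistic of the two figures because each case helps to estimate the function that classifies it.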
Another prediction analysis alternative is cross-validation in which each case is classified by the discriminant function derived from all cases other than that case. Discriminant analysis can be performed using several different statistical software packages, but we have used SAS to conduct the discriminant analysis in this study.

The primary performance measures used to evaluate the model's effectiveness are overall accuracy and sensitivity (or true positive rate). The overall accuracy rate is the percentage of the total sample that is correctly classified (in this study, actual auditor changes classified as auditor changes and actual non-auditor changes classified as non-auditor changes). The sensitivity, or true positive rate, is the percentage of actual auditor change firms that are correctly classified. The sensitivity measure may be more important than the overall accuracy rate if the costs of misclassification or prediction errors are higher for auditor changes than for non-auditor changes. For evaluating the importance of individual predictor variables in the discriminant analysis, we focus on the standardized canonical coefficients. The magnitude of these coefficients indicates the relative importance of the predictor variables in the discriminant function.

As in Peng et al. (2009), we use WEKA (see Witten 2005) to implement 13 data mining algorithms. Peng et al. (2009) group these algorithms into five categories as follows: trees (classification and regression tree (CART), Naïve Bayes tree, and C4.5), functions (linear logistic regression, radial basis function (RBF) network, sequential minimal optimization (SMO), Support Vector Machine (SVM), and Neural Networks), Bayesian classifiers (Bayesian network and Naïve Bayes), lazy classifiers (k-nearest-neighbor), and rules (decision table and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) rule induction). According to Sung et al.
(1999), data mining classifiers such as Neural Networks are "black boxes" (p. 68) that do not provide interpretable rules while others such as decision trees "generate understandable rules" but have "trouble with nonrectangular regions" (p. 69). Our primary interest in implementing these data mining algorithms is to more carefully analyze the performance results using metrics that incorporate the likelihood of random classification being reflected in the five prediction accuracy rates used in Kwak et al. (2011). The five prediction accuracy rates used in Kwak et al. (2011) are Overall Accuracy (which reflects the percentage of correctly classified companies), True and False Positive Rates, and True and False Negative Rates. The Kappa statistic and AUC (area under the receiver operating characteristic curve) are two additional performance metrics evaluated by Peng et al. (2009), and these two metrics indicate the likelihood of random classification. A Kappa statistic of zero percent and an AUC of 50 percent indicate that the classification occurred by chance.

As stated earlier in this section, our current study focuses on the occurrence of a change in a company's auditor, and we have not been able to find previous research on auditor changes that either uses discriminant analysis or compares DA results with results using data mining techniques. However, in their business failure prediction study, Gepp et al. (2010) conclude that the predictive ability of decision trees (DTs) using See5 software is better than that of DA. Sung et al. (1999) also document better overall prediction accuracy and bankruptcy prediction (sensitivity) rates using DTs versus DA. The authors of both of these bankruptcy studies indicate that it is important to consider the prediction decision context when comparing prediction models. Neither of these studies discusses the Kappa statistic or AUC metrics as part of its accuracy analyses.
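Both chance-corrected metrics can be computed from quantities a classifier already produces. The Python sketch below is illustrative only: the function names and inputs are our assumptions, not WEKA's implementation. Kappa compares observed agreement with the agreement expected from the row and column totals of the confusion matrix, and AUC is computed as the share of (positive, negative) pairs in which the positive case receives the higher score.

```python
def cohen_kappa(tp, fn, fp, tn):
    """Kappa = (p_observed - p_expected) / (1 - p_expected).

    p_expected is the agreement implied by the marginal totals alone,
    so a value near zero means the classifier adds nothing beyond chance.
    """
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    return (observed - expected) / (1 - expected)

def auc(scores_pos, scores_neg):
    """Probability that a random positive outranks a random negative.

    Ties count one half, so identical scores for every case give 0.5.
    """
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical inputs, for illustration only.
print(cohen_kappa(tp=25, fn=25, fp=25, tn=25))   # coin-flip classifier
print(auc([0.9, 0.8], [0.1, 0.2]))               # perfectly ranked scores
```

A classifier that assigns every firm to the larger group can post a high accuracy rate, yet its Kappa is zero because observed and expected agreement coincide.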
Thus, because our study is in the context of an auditor change decision and incorporates additional metrics that evaluate the random nature of the classification results and accuracy rates, we are extending prior literature on comparative prediction models.

### 4. SAMPLE AND EMPIRICAL RESULTS

Our initial sample is the same used by Kwak et al. (2011) and includes a sample of companies that changed auditors in 2007 and 2008 and a sample of companies that did not change auditors, matched with auditor-change companies based on size (using total assets) and industry (using two-digit SIC codes). Over 790 firms were identified as auditor change firms in 2007 and 2008 using CompuStat's "Auditor" (AU) variable. As stated in Kwak et al. (2011), the "study period of 2007 and 2008 is based on the following discussion of Sarbanes-Oxley Act of 2002 (SOX) implementation. As a result of SOX, the SEC (2003) amended Rule 2-02 of Regulation S-X to require the accountants auditing the annual financial statements to also attest to and report on management's assessment of its internal control effectiveness. The SEC (2004) required this initial attestation report be included with audited financial statements for fiscal years ending on or after November 15, 2004 for accelerated filers and for fiscal years ending on or after July 15, 2005 for non-accelerated filers. In 2005, the SEC (2005) extended the initial compliance date for non-accelerated filers to fiscal years ending on or after July 15, 2006. Fusco (2006) discusses the impact of SOX on trends in auditor changes and reports the following numbers of auditor changes each year during the period from 2002 through 2005: 1,224 in 2002, 1,467 in 2003, 1,736 in 2004, and 1,673 in 2005. To exclude the potential effects of the initial implementation of the SOX attestation requirement on auditor change decisions, our study period includes the post-SOX implementation years of 2007 and 2008."
After eliminating auditor-change firms that had multiple auditor changes within the test period, Kwak et al.'s (2011) sample included 790 auditor-change (experimental) and 1,132 non-auditor change (matching control) firm-year observations. For our current study, we have chosen to exclude all firm observations with missing values for the independent variables in the four years prior to the auditor change year and with illogical (such as divided by zero) or zero calculated ratios. This exclusion results in a total sample size of 513 firms for the years 2007 and 2008, which includes 169 firms that changed auditors in the two-year time frame and 344 matching non-auditor change firms. We matched the control firms with the auditor change firms using size and industry. Therefore, there is no statistically significant difference in size, as expected (see Table 1).

For sensitivity analysis, we split the data between 2007 and 2008. In 2007, there are 117 auditor change firms, which is more than double the number of auditor change firms in 2008 (52 auditor change firms). This disproportionate number of auditor change firms in 2007 compared to 2008 could be because 2007 is the year before the financial crash in the U.S. However, the 2008 t-test results are similar to the t-test results for 2007 and for both years combined.

Table 1 shows the descriptive statistics for the 169 auditor change firms and the 344 matching control firms for the financial statement variables in the year prior to the year of auditor change, i.e., 2006 for 2007 changes and 2007 for 2008 changes. The table presents descriptive statistics for both years combined (2007 and 2008) in Panels A1 and A2, for the year 2007 in Panels B1 and B2, and for the year 2008 in Panels C1 and C2. The t-test statistics for tests of differences between means for the two groups are included in Panels A2, B2 and C2 of Table 1.
Based on these t-test results, LOSS, DIV, and EBIT/TA are the only three (of the 13) variables that differ between auditor change and non-auditor change firms. More auditor change firms reported losses in the year prior to the change than did non-auditor change firms as indicated by the mean LOSS variable being significantly greater for auditor change firms than for non-auditor change firms (0.47 vs. 0.31 for 2007 and 2008 combined; 0.48 vs. 0.35 for 2007; and 0.44 vs. 0.25 for 2008). More auditor change firms paid no dividends than did non-auditor change firms based on the mean DIV variable being significantly greater for both years combined (0.78 vs. 0.69) and for 2007 alone (0.81 vs. 0.68). All three panels show negative mean EBIT/TA for auditor change firms and positive mean EBIT/TA for non-auditor change firms with statistically different means for both years combined and for 2008 alone.

#### Discriminant Analysis

We conducted a discriminant analysis to evaluate the effectiveness of the 13 financial statement variables discussed in Section 2 in predicting auditor change. We used SAS to conduct a direct discriminant analysis using data for one year prior to the year of auditor change for each of the 13 financial statement variables as predictors of membership in two groups, auditor change and non-auditor change. Of the original 513 firms, 78 firms were identified as multivariate outliers and were deleted. For the remaining 435 firms (122 auditor change and 313 non-auditor change), evaluations of the assumptions of linearity, normality, and multicollinearity or singularity were satisfactory. We did find statistically significant heterogeneity of the variance-covariance matrices, and therefore, a quadratic procedure was used by SAS PROC DISCRIM for the analysis (Tabachnick and Fidell 2007).
The elimination of outliers and the use of a quadratic instead of a linear procedure clearly show the impact of the restrictive assumptions that must be applied for discriminant analysis. Table 2 presents the results of the discriminant analysis for both years (2007 and 2008) combined (Panel A), for the year 2007 (Panel B) and for the year 2008 (Panel C). We verified the stability of the classification procedure and our model with a cross-validation run. The standard jackknifed classification or Leave-One-Out validation process was applied to discriminant analysis. This procedure removes the bias that arises from classifying each case with a function estimated from that same case.

As presented in Table 2 (Panel A for both 2007 and 2008 combined), the overall accuracy rate of discriminant analysis is 41.38 percent for correct classification of the original observations and 38.85 percent when applying cross-validation procedures. The true positive or sensitivity rate, which indicates the percentage of auditor change firms correctly classified, is 89.34 percent in the cross-validation summary. The overall accuracy and true positive rates are influenced by the classification of a disproportionate number of cases as auditor change firms. Although 28 percent of the firms in the sample actually changed auditors, the classification scheme, using sample proportions as prior probabilities, classified 83.22 percent of the firms as auditor change firms [(362/435) from cross-validation summary, Table 2, Panel A]. This means that for the years 2007 and 2008, auditor change firms are more likely to be correctly classified, but non-auditor change firms are likely to be misclassified as auditor change firms.

Panel B of Table 2 presents the classification results for the year 2007, which are similar to the overall results, where the overall accuracy rates are 49.65 percent and 46.50 percent for the original and the cross-validation classifications, respectively.
Also in this 2007 sub-sample, a disproportionate number of cases are classified as auditor change firms. Twenty-six percent of the firms (74 of 286 firms) actually changed auditors in 2007, but the classification scheme, using sample proportions as prior probabilities, classified 71.68 percent of the firms as auditor change firms [(205/286) from cross-validation summary, Table 2, Panel B]. The resulting true positive rate for 2007 is 85.14 percent in the cross-validation results. Consistent with the results for the combined years, the 2007 overall accuracy and true positive rates suggest that auditor change firms are more likely to be correctly classified when the 13 predictors are used in the model.

The classification results for the year 2008 presented in Panel C, Table 2, show overall accuracy rates of 73.68 percent and 55.64 percent for the original and cross-validation procedures, respectively. Even in 2008, a disproportionate number of firms are classified as auditor change firms (38.35 percent (51/133) compared to the actual 25.56 percent of firms in the sample (34/133)). However, the true positive rate of 38.24 percent in the cross-validation results is much lower than that for both years combined and for 2007 alone. Overall, the cross-validated prediction rates of the discriminant analysis are not strong.

In addition to classification accuracy rates, we want to evaluate the contribution of specific variables as predictors in the discriminant functions. The untabulated univariate test statistics for testing equality of class means for each predictor variable from our discriminant analyses show differences for LOSS and DIV, and this is consistent with the tests of differences presented in Table 1. However, the standardized canonical coefficients from our discriminant analyses, as presented in Table 3, provide inconsistent results when comparing both years combined, 2007, and 2008.
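The cross-validation figures quoted above for Table 2 can be checked against one another. The Python sketch below is a reader's reconstruction, not a reproduction of Table 2: it takes the sample size, the number of actual auditor change firms, the number of firms classified as auditor change firms, and the reported true positive rate for each panel, and backs out the implied confusion-matrix cells. The function name and the rounding step are our assumptions.

```python
# (n_firms, n_actual_change, n_classified_as_change, reported_true_positive_rate)
panels = {
    "both years": (435, 122, 362, 0.8934),
    "2007": (286, 74, 205, 0.8514),
    "2008": (133, 34, 51, 0.3824),
}

def reconstruct(n, actual_pos, predicted_pos, tpr):
    """Back out confusion-matrix cells implied by the reported rates."""
    tp = round(tpr * actual_pos)      # auditor change firms correctly classified
    fn = actual_pos - tp              # auditor change firms missed
    fp = predicted_pos - tp           # non-change firms classified as change
    tn = n - actual_pos - fp          # non-change firms correctly classified
    accuracy = (tp + tn) / n
    return tp, fn, fp, tn, accuracy

for name, args in panels.items():
    tp, fn, fp, tn, acc = reconstruct(*args)
    print(f"{name}: TP={tp} FN={fn} FP={fp} TN={tn} accuracy={acc:.2%}")
# both years: TP=109 FN=13 FP=253 TN=60 accuracy=38.85%
# 2007: TP=63 FN=11 FP=142 TN=70 accuracy=46.50%
# 2008: TP=13 FN=21 FP=38 TN=61 accuracy=55.64%
```

The implied accuracy rates match the reported cross-validation rates of 38.85, 46.50, and 55.64 percent, and the implied cells make the pattern explicit: for both years combined, 253 of the 313 non-auditor change firms fall into the auditor change group.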
The standardized coefficients indicate the relative importance of the 13 financial statement variables in the discriminant functions, and we use 0.30 as the generally accepted cut-off between important and less important variables. DIV is the only variable that is an important predictor in all three groups (2007, 2008, and both years combined). Three other variables - LOSS, RE/TA, and EBIT/TA - are important for both years combined and for either 2007 or 2008, but not for both years separately.

#### Data Mining Approaches

The WEKA software we used to implement 13 data mining algorithms includes logistic regression as a data mining approach even though logistic regression includes parametric assumptions. Because logistic regression assumptions are less restrictive than those of discriminant analysis and because logistic regression has been used in many prior bankruptcy studies and allows us to evaluate the significance of individual predictor variables in addition to the set of variables as a whole, we applied this approach using SAS software to our current auditor change prediction study before completing our analysis of data mining results. The less restrictive assumptions of logistic regression resulted in the removal of only two outliers from our initial sample so that our data set included 511 observations (168 auditor change and 343 non-auditor change firms).

Summary results for the logistic regression analysis are presented in Table 4. Overall accuracy rates were 67.3 percent, 66.4 percent, and 65.0 percent for both years combined, 2007, and 2008, respectively. The likelihood ratio statistics exceed 24.5 for both years combined, 2007, and 2008, indicating that the 13 financial variables, as a set, reliably distinguish between auditor change and non-auditor change firms. However, true positive rates (the percentage of auditor change firms correctly classified) were very low, ranging from 4.8 percent for both years combined to 15.7 percent for 2008.
The only variables with significant coefficients based on the Wald statistics (results not tabulated) are LOSS (both years combined and 2008), DIV (2007), RE/TA (2007), and WCA/TA (2008).

The results of our discriminant and logistic regression analyses indicate that LOSS, DIV, and RE/TA are significant predictors in distinguishing between auditor change and non-auditor change firms. However, in the cross-validation discriminant analyses, the overall accuracy rates ranged from 38.85 percent (both years combined) to 55.64 percent (2008), and the true positive rates were driven by disproportionate numbers of non-auditor change firms being classified as auditor change firms. In the logistic regression analyses, overall accuracy rates were around 66 percent, but true positive rates were below 16 percent.

The second objective of our study involves using the Kappa statistic and AUC metrics to more carefully evaluate the results of applying 13 data mining algorithms, including logistic regression, to our sample data. Although the overall accuracy rates for 11 of the 13 algorithms are between 64 percent and 67 percent (with lower rates for the other two algorithms), Kappa statistics for all 13 algorithms range from -0.04 to 0.07, and AUC measures range from 0.48 to 0.56. These Kappa statistics and AUC measures indicate that the classifications of auditor change and non-auditor change firms using the 13 data mining algorithms are no better than random classifications. Even when we include values for our 13 financial distress variables from all three years prior to the change year and from the change year, the Kappa statistics and AUC measures indicate random classifications. Because these results are consistent across algorithms, we have not tabulated them.

5. SUMMARY AND CONCLUSIONS

In this study, we have applied discriminant analysis to evaluate the effectiveness of 13 financial distress variables in predicting auditor changes, and we have examined the results of applying 13 data mining algorithms in predicting auditor changes and whether these results occurred by chance. Our study extends previous research by using the traditional discriminant analysis model because this model has not been used in previous auditor change studies. Discriminant analysis also allows us to evaluate the significance of individual predictor variables in addition to the set of financial distress variables used for classification. Our study also extends prior research by highlighting the importance of evaluating the likelihood that data mining approach classification results occurred by chance.

Using Kwak et al.'s (2011) data set of firms changing auditors in 2007 or 2008 and matching non-auditor change firms, our discriminant analysis test results show overall accuracy rates ranging from 38.85 percent (for both years combined) to 55.64 percent (2008 only) and true positive rates over 85 percent, but these rates are influenced by a disproportionate number of non-auditor change firms being classified as auditor change firms. Individual predictor variables that are important in the discriminant equation based on standardized canonical coefficients include losses (LOSS) and no payment of dividends (DIV) in the year prior to the auditor change, retained earnings as a percent of total assets (RE/TA), and earnings before interest and taxes as a percent of total assets (EBIT/TA). We applied logistic regression, a parametric data mining method, for comparison with discriminant analysis, and our results show overall accuracy rates of around 66 percent, true positive rates less than 16 percent, and LOSS, DIV, RE/TA, and WCA/TA as significant individual predictors.
However, the Kappa statistic and AUC metrics for logistic regression and the other 12 data mining algorithms we used indicate that classifications using these algorithms are no better than random classifications.

Investors are interested in reasons for auditor change decisions because these may negatively impact stock prices. Audit firms would benefit from a reliable auditor change prediction model because they stand to lose future revenues and some of their start-up and negotiation costs if they incorrectly price audit services for new clients or accept clients that fail or that change auditors again in the near future. Thus, current and future research to improve auditor change prediction is valuable.

Our current study makes three important contributions to the auditor change prediction literature. First, our results suggest that auditor change prediction studies should include financial distress variables because we document that some financial distress variables are important in distinguishing between auditor change and non-auditor change firms. Second, our relatively low prediction accuracy rates using only financial distress variables as predictors indicate that a more robust set of predictor variables is needed to capture the various drivers of auditor change decisions. Third, we highlight the importance of using additional metrics beyond accuracy rates to interpret the results of data mining approaches to classification and prediction.

One limitation of our current study is the time period used in our sample. The years 2007 and 2008 are at the beginning of the economic recession in the United States, so our results may not be generalizable to periods with different economic conditions. A potential extension of our study could be to expand the period used in our sample and control for general economic conditions.
Future research could also incorporate additional firm characteristic variables and specific event indicators in order to better understand auditor change motivations and improve prediction accuracy.

REFERENCES

1. Altman, E. 1968. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, The Journal of Finance 23(3), 589-609.
2. Calderón, T. and E. Ofobike, "Determinants of client-initiated and auditor-initiated auditor changes," Managerial Auditing Journal, 23(1), 2008, 4-25.
3. Chen, C., G. Yen, and F. Chang. 2009. Strategic auditor switch and financial distress prediction-empirical findings from the TSE-listed firms, Applied Financial Economics 19(1), 59.
4. Chen, Y., A. Gupta, and D. L. Senteney. 2004. Predicting impending bankruptcy using audit firm changes, Journal of American Academy of Business 4(1/2), 423-433.
5. Davidson, W., P. Jiraporn, and P. DaDalt, Causes and Consequences of Audit Shopping: An Analysis of Auditor Opinions, Earnings Management, and Auditor Changes, Quarterly Journal of Business and Economics (Winter 2006), Vol. 45, Nos. 1/2, 69-87.
6. Fok, L., J. Angeldis, . Ibrahim, and W. Fok. 1995. A decision tree approach to the interpretation of multivariate statistical techniques, Journal of Education for Business 71(2), 110-118.
7. Francis, J., and E. Wilson. 1988. Auditor changes: A joint test of theories relating to agency costs and auditor differentiation, The Accounting Review 63(4), 663-682.
8. Fusco, C., "Is It Time To Revise 8-K Rules on Auditor Changes?" Financial Executive (March 2006), 49-51.
9. Gepp, S., K. Kumar, and S. Bhattacharya. 2010. Business failure prediction using decision trees, Journal of Forecasting 29, 536-555.
10. Hudaib, M. and T. Cooke, "The Impact of Managing Director Changes and Financial Distress on Audit Qualification and Auditor Switching," Journal of Business Finance & Accounting, 32(9) & (10), November/December 2005, 1703-1739.
11. Kwak, W., Y. Shi and J. Cheh. 2006. Firm bankruptcy prediction using multiple criteria linear programming data mining approach. Advances in Investment Analysis and Portfolio Management 2, 27-49.
12. Kwak, W., S. Eldridge, Y. Shi, and G. Kou. 2011. Predicting Auditor Changes Using Financial Distress Variables and the Multiple Criteria Linear Programming (MCLP) and Other Data Mining Approaches, Journal of Applied Business Research, 27(5), 73-84.
13. Landsman, W., K. Nelson, and B. Rountree, "Auditor Switches in the Pre- and Post-Enron Eras: Risk or Realignment?" The Accounting Review, Vol. 84, No. 2 (March 2009), 531-558.
14. Lau, A. 1987. "A Five-State Financial Distress Prediction Model," Journal of Accounting Research, Vol. 25, No. 1, 127-138.
15. Müller, G., . Steyn-Bruwer, and W. Hamman. 2009. Predicting financial distress of companies listed on the JSE - A comparison of techniques, South African Journal of Business Management 40(1), 21-32.
16. Ohlson, J. 1980. Financial ratios and the probabilistic prediction of bankruptcy, Journal of Accounting Research 18(1), 109-131.
17. Peng, Y., G. Kou, G. Wang, H. Wang, and F. Ko. 2009. Empirical evaluation of classifiers for software risk management, International Journal of Information Technology & Decision Making 8(4), 749-767.
18. Quinlan, J. See5.0. 2004. (available at: http://www.rulequest.com/see5-info.html).
19. Sarbanes-Oxley Act (SOX). 2002. Public Law No. 107-204. Washington, D.C.: Government Printing Office.
20. Schwartz, K., and K. Menon. 1985. Auditor switches by failing firms, The Accounting Review, 248-261.
21. Securities and Exchange Commission (SEC). Final Rule: Management's Reports on Internal Control Over Financial Reporting and Certification of Disclosure in Exchange Act Periodic Reports. Release No. 33-8238. Washington, D.C.: SEC, 2003. Available at: http://www.sec.gov/rules/final/33-8238.htm
22. Securities and Exchange Commission (SEC). Final Rule: Management's Reports on Internal Control Over Financial Reporting and Certification of Disclosure in Exchange Act Periodic Reports. Release No. 33-8392. Washington, D.C.: SEC, 2004. Available at: http://www.sec.gov/rules/final/33-8392.htm
23. Securities and Exchange Commission (SEC). Final Rule: Management's Reports on Internal Control Over Financial Reporting and Certification of Disclosure in Exchange Act Periodic Reports of Non-accelerated Filers and Foreign Private Issuers. Release No. 33-8545. Washington, D.C.: SEC, 2005. Available at: http://www.sec.gov/rules/final/33-8545.htm
24. Senteney, D., Y. Chen, and A. Gupta. 2006a. Predicting impending bankruptcy from auditor qualified opinions and audit firm changes, Journal of Applied Business Research 22(1), 41-56.
25. Senteney, D. L., M.S. Bazaz, and A. Ahmadopur. 2006b. Tests of the incremental explanatory power of auditor qualified opinion and audit firm changes in predicting impending bankruptcy, International Journal of Accounting, Auditing and Performance Evaluation 3(4), 434-451.
26. Sung, T., N. Chang, and G. Lee. 1999. Dynamics of modeling in data mining: Interpretive approach to bankruptcy prediction, Journal of Management Information Systems 16(1), 63-85.
27. Tabachnick, B., and Fidell, L. 2007. Using Multivariate Statistics, 5th ed., Pearson Education, Inc.
28. Witten, I.H. and E. Frank. 2005. Data mining: Practical machine learning tools and techniques, 2nd edition, Morgan Kaufmann, San Francisco.

Susan Eldridge, University of Nebraska at Omaha, USA
Wikil Kwak, University of Nebraska at Omaha, USA
Roopa Venkatesh, University of Nebraska at Omaha, USA
Yong Shi, University of Nebraska at Omaha, USA
Gang Kou, University of Electronic Science and Technology of China, China

AUTHOR INFORMATION

Susan W. Eldridge is an Associate Professor of Accounting and Accounting Department Chair at the University of Nebraska at Omaha. She received her PhD from the University of North Carolina at Chapel Hill.
Her research and teaching interests are in the financial accounting area. Her research has been published in the Journal of Applied Business Research, Journal of Accounting, Ethics and Public Policy, Bank Accounting & Finance, and Review of Pacific Basin Financial Markets and Policies. She is a Certified Public Accountant and a member of the American Accounting Association and the American Institute of CPAs.

Wikil Kwak is a Professor of Accounting at the University of Nebraska at Omaha. He received his Ph.D. in Accounting from the University of Nebraska in Lincoln. Dr. Kwak's research interests include the areas of mathematical programming approaches in capital budgeting, transfer pricing, performance evaluation and Japanese capital market studies. He has published in the Engineering Economist, Abacus, Contemporary Accounting Research, Review of Quantitative Finance and Accounting, Management Accountant, Journal of Petroleum Accounting and Financial Management, Business Intelligence and Data Mining, Review of Pacific Basin Financial Markets and Policies, and Multinational Business Review. E-mail: wkwak@mail.unomaha.edu (Corresponding author)

Dr. Venkatesh is an assistant professor at the University of Nebraska at Omaha. Professor Venkatesh's teaching interests are managerial accounting and financial accounting. Her research interests lie in the areas of managerial accounting and auditing. Her research interests also include examining the effect of XBRL on the audit process and auditors' judgments. E-mail: rvenkatesh@unomaha.edu

Dr. Yong Shi, a Senior Member of IEEE, has served as the Executive Deputy Director, Chinese Academy of Sciences Research Center on Fictitious Economy & Data Science, China, since 2007. He has been the Union Pacific Chair of Information Science and Technology, College of Information Science and Technology, University of Nebraska at Omaha, USA. Dr. Shi's research interests include business intelligence, data mining, and multiple criteria decision making. He has published more than 17 books and over 200 papers in various journals and conference proceedings. He is the Editor-in-Chief of the International Journal of Information Technology and Decision Making (SCI).

Dr. Gang Kou is a professor in the School of Management and Economics, University of Electronic Science and Technology of China, and managing editor of the International Journal of Information Technology & Decision Making. Previously, he was a research scientist in Thomson Co., R&D. He received his Ph.D. in Information Technology from the College of Information Science & Technology, Univ. of Nebraska at Omaha; his Master's degree from the Dept of Computer Science, Univ. of Nebraska at Omaha; and his B.S. degree from the Department of Physics, Tsinghua University, Beijing, China. He has published more than eighty papers in various peer-reviewed journals and conferences, and his published journal articles have received more than 300 citations as shown in the Science Citation Index (SCI) database.

(c) 2012 Clute Institute for Academic Research