
Institute of Transportation, MOTC


Master's and Doctoral Theses

Thesis title: A VIF-Based Matrix Perturbation Method for Solving the Collinearity Problem in Linear Models (VIF式矩陣擾動法解決線性模式的共線性問題)
Year: 104 (ROC calendar; 2015)
Degree: Doctoral
School/Department: Department of Transportation and Logistics Management, National Chiao Tung University
Author: 黃建嘉
Advisors: 卓訓榮, 周幼珍
Abstract: Regression modeling has been one of the most useful and important tools in statistical research. Among the branches of regression modeling, linear regression refers to building a linear relationship between a set of explanatory variables and a response variable, enabling researchers to examine how the explanatory variables interact with one another and affect the response. For instance, in econometrics, holding all other factors fixed, one can perform a ceteris paribus analysis to examine how a unit change in a given factor affects the outcome simply by looking at the associated coefficient. Other fields, such as biology, physics, and environmental science, benefit in a similar way. Hence, its wide applicability renders regression modeling an important tool for scientific studies.

Linear regression is favorable in the sense that the estimation can be done using ordinary least squares (OLS). The theory of OLS is so well founded that it provides a systematic way for researchers to obtain an estimate almost automatically. Meanwhile, supported by a rich body of statistical inference, OLS has become the core of regression analysis.
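As a concrete illustration of the OLS estimation mentioned above, the following minimal Python/NumPy sketch (hypothetical data and variable names, not taken from the dissertation) computes the closed-form estimate beta_hat = (X'X)^(-1) X'y and shows the ceteris paribus reading of a coefficient.

```python
import numpy as np

# Hypothetical data: n observations, p explanatory variables.
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Add an intercept column and solve the least-squares problem,
# equivalent to beta_hat = (X'X)^{-1} X'y.
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Ceteris paribus reading: beta_hat[1] estimates the change in y for a
# one-unit change in the first explanatory variable, other variables fixed.
print(beta_hat)
```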

Whereas explanatory variables are implicitly assumed independent in most, if not all, linear models, in practice the analysis proceeds with data given exogenously. Usually, data are collected from a vast, unknown population and carry a certain degree of stochasticity and unpredictability, and problems result. For instance, an erroneously entered data point may yield completely different estimates; missing data points can be an obstacle to the subsequent analysis; and high similarity (dependency) in the collected data may lead to unsatisfactory results. Problems of this sort are referred to as data problems.

Of all data problems in linear regression, we are most concerned with near-linear dependency among factors that the model specification implicitly assumes to be independent. Numerically, high dependency directly causes a rank-deficiency problem in the OLS estimation. The resulting symptoms, such as high variance, low statistical significance, and even incorrect coefficient signs, can frustrate researchers attempting to model a problem with regression.
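The variance inflation factor (VIF) on which this work builds can be read off the inverse of the correlation matrix of the standardized predictors. The sketch below (a generic illustration with a deliberately collinear toy design, not code from the dissertation) computes the VIFs and the condition number that diagnose such near rank deficiency.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # nearly a copy of x1 -> collinearity
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Standardize columns so that Z'Z / (n - 1) is the correlation matrix R.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = (Z.T @ Z) / (n - 1)

# VIF_j is the j-th diagonal entry of R^{-1}, i.e., 1 / (1 - R_j^2).
vif = np.diag(np.linalg.inv(R))
cond = np.linalg.cond(R)              # large condition number <=> near rank deficiency
print("VIF:", vif)
print("condition number of R:", cond)
```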

Biased regression has been one of the approaches devised in the literature for solving data collinearity problems in linear regression, and the dominant method in this category is ridge regression. Ridge regression owes its popularity to its simple implementation in practice and its generally good results. Yet ridge regression has certain problems. First, collinearity by definition stems from dependency between at least two covariates in the data matrix; instead of attacking the problem at its root, the ridge method breaks the intrinsic structure of the correlation matrix, under the assumption that the data matrix is normalized. Second, breaking that intrinsic structure yields results that are only superficially good; in particular, from the perspective of the VIF, the results can even be infeasible, which renders the hypothesis tests, and in turn subsequent inference, infeasible as well. Third, the ridge method requires a one-dimensional parameter; finding a good value potentially means solving a hard optimization problem that is NP-hard in nature. Hence, even finding a local optimum demands considerable computational effort, let alone searching for the global optimum, and a mere local optimum cannot guarantee the performance of the resulting ridge estimate.
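For concreteness, the ridge estimate adds a constant k to the diagonal of the standardized cross-product matrix, beta_hat(k) = (X'X + kI)^(-1) X'y, which shifts every eigenvalue upward by k; the choice of k is exactly the one-dimensional parameter discussed above. The sketch below (hypothetical data, a hand-picked k) illustrates this.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)            # collinear pair
X = np.column_stack([x1, x2, rng.normal(size=n)])
y = X @ np.array([1.0, 1.0, -1.0]) + rng.normal(scale=0.1, size=n)

# Standardize predictors so Z'Z is (n-1) times the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
yc = y - y.mean()

k = 5.0                                        # ridge parameter, chosen by hand here
p = Z.shape[1]
beta_ols   = np.linalg.solve(Z.T @ Z,                  Z.T @ yc)
beta_ridge = np.linalg.solve(Z.T @ Z + k * np.eye(p),  Z.T @ yc)

# Adding k*I lifts the near-zero eigenvalue caused by collinearity,
# but it also distorts the original correlation structure.
print("OLS:  ", beta_ols)
print("ridge:", beta_ridge)
```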

Motivated by these frustrations, we aim to develop an alternative, useful method, based on a reliable diagnostic tool, to solve this commonly seen problem in the context of linear regression. The developed method is useful because it inherits the merits of matrix theory and mitigates the data collinearity problem by improving not only the values of the underlying diagnostic but also the intrinsic eigenstructure of the correlation matrix itself. The close connection between the chosen diagnostic and the correlation matrix is what makes our method successful in dealing with the problem.
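The abstract does not spell out the proposed perturbation rule, so the sketch below is only a generic illustration of the underlying idea, namely perturbing the eigenstructure of the correlation matrix and monitoring the VIFs; it should not be read as the dissertation's algorithm, and the eigenvalue floor and renormalization step are assumptions made purely for illustration.

```python
import numpy as np

def inflate_small_eigenvalues(R, floor=0.05):
    """Generic illustration: lift the near-zero eigenvalues of a correlation
    matrix R up to `floor`, then restore the unit diagonal. This is NOT the
    dissertation's perturbation rule, only a stand-in for the idea."""
    w, V = np.linalg.eigh(R)
    w_adj = np.maximum(w, floor)                 # perturb the eigenvalues only
    R_adj = V @ np.diag(w_adj) @ V.T
    d = np.sqrt(np.diag(R_adj))
    return R_adj / np.outer(d, d)                # renormalize to a correlation matrix

def vif(R):
    return np.diag(np.linalg.inv(R))

# Toy correlation matrix with a strongly dependent pair of variables.
R = np.array([[1.00, 0.98, 0.10],
              [0.98, 1.00, 0.10],
              [0.10, 0.10, 1.00]])
print("VIF before:", vif(R))
print("VIF after: ", vif(inflate_small_eigenvalues(R)))
```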

We carry out both real-world applications (when such data are accessible) and random-instance experiments (when they are not) to validate the developed method. The success of the real-world applications shows that our method can handle realistic datasets troubled by data collinearity. Moreover, the results of the random-instance experiments show that our method is also capable of handling arbitrarily generated datasets (with certain sources of variability well controlled).
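Random-instance experiments of this kind are commonly set up by drawing predictors from a multivariate normal distribution with a controlled correlation level; the sketch below follows that common recipe as an assumption for illustration, and the dissertation's actual generator may differ.

```python
import numpy as np

def random_collinear_instance(n, p, rho, seed=None):
    """Draw an n-by-p design whose columns share pairwise correlation rho
    (equicorrelated Gaussian predictors) plus a response with known
    coefficients. Illustrative only."""
    rng = np.random.default_rng(seed)
    Sigma = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    beta = np.ones(p)
    y = X @ beta + rng.normal(scale=0.5, size=n)
    return X, y

# Example: a strongly collinear instance with controlled correlation 0.95.
X, y = random_collinear_instance(n=100, p=5, rho=0.95, seed=3)
```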

Attachments: electronic file available for download after 107-08-31 (ROC calendar; 2018-08-31)

