A multiple linear regression model with multiplicative log-normal error term for atmospheric concentration data. Liao, K., Park, E. S., Zhang, J., Cheng, L., Ji, D., Ying, Q., & Yu, J. Z. SCIENCE OF THE TOTAL ENVIRONMENT, ELSEVIER, RADARWEG 29, 1043 NX AMSTERDAM, NETHERLANDS, MAY 1, 2021.
The homoscedasticity assumption (the variance of the error term is the same across all the observations) is a key assumption in the ordinary least squares (OLS) solution of a linear regression model. The validity of this assumption is examined for a multiple linear regression model used to determine the source contributions to the observed black carbon concentrations at 12 background monitoring sites across China using a hybrid modeling approach. Residual analysis from the traditional OLS method, which assumes that the error term is additive and normally distributed with a mean of zero, shows pronounced heteroscedasticity based on the Breusch-Pagan test for 11 datasets. Noticing that the atmospheric black carbon data are log-normally distributed, we make a new assumption that the error terms are multiplicative and log-normally distributed. When the coefficients of the multilinear regression model are determined using maximum likelihood estimation (MLE), the distribution of the residuals in 8 out of the 12 datasets is in good accordance with the revised assumption. Furthermore, the MLE computation under this novel assumption can be proved mathematically identical to minimizing a log-scale objective function, which considerably reduces the complexity of the MLE calculation. The new method is further demonstrated to have clear advantages in numerical simulation experiments of a 5-variable multiple linear regression model using synthesized data with prescribed coefficients and lognormally distributed multiplicative errors. Under all 9 simulation scenarios, the new method yields the most accurate estimations of the regression coefficients and has significantly higher coverage probability (on average, 95% for all five coefficients) than OLS (79%) and weighted least squares (WLS, 72%) methods. (C) 2020 Elsevier B.V. All rights reserved.
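The equivalence mentioned in the abstract can be illustrated with a minimal sketch. It assumes the model takes the form y_i = (x_i · β) · ε_i with log ε_i ~ N(0, σ²), so that the MLE of β minimizes Σ (log y_i − log(x_i · β))². The abstract does not spell out the objective function, so this form, the function name `fit_lognormal_mlr`, the Gauss-Newton solver, and the synthetic data settings are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def fit_lognormal_mlr(X, y, n_iter=50, tol=1e-10):
    """Fit y = (X @ beta) * eps, log(eps) ~ N(0, sigma^2), by minimizing
    sum((log y - log(X @ beta))**2) with a damped Gauss-Newton iteration.
    Hypothetical helper: assumes y > 0 and non-negative predictors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    log_y = np.log(y)

    def objective(b):
        mu = X @ b
        if np.any(mu <= 0):            # log is undefined: treat as infeasible
            return np.inf
        return np.sum((log_y - np.log(mu)) ** 2)

    # Start from the OLS solution; fall back to a flat positive guess.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    if not np.isfinite(objective(beta)):
        beta = np.full(X.shape[1], y.mean() / X.sum(axis=1).mean())
    f = objective(beta)

    for _ in range(n_iter):
        mu = X @ beta
        r = log_y - np.log(mu)         # log-scale residuals
        J = -X / mu[:, None]           # Jacobian of r with respect to beta
        step = np.linalg.lstsq(J, -r, rcond=None)[0]
        t = 1.0                        # halve the step until it does not worsen f
        while t > 1e-8 and objective(beta + t * step) > f:
            t *= 0.5
        new_beta = beta + t * step
        f_new = objective(new_beta)
        if f_new > f:                  # no acceptable step found
            break
        converged = f - f_new < tol * (1.0 + f)
        beta, f = new_beta, f_new
        if converged:
            break

    sigma2 = f / len(y)                # MLE of the log-scale error variance
    return beta, sigma2

# Synthetic 5-variable example with prescribed coefficients (values are illustrative).
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 5.0, size=(5000, 5))
beta_true = np.array([1.0, 2.0, 0.5, 3.0, 1.5])
y = (X @ beta_true) * np.exp(rng.normal(0.0, 0.3, size=5000))

beta_hat, sigma2_hat = fit_lognormal_mlr(X, y)
print("estimated coefficients:", np.round(beta_hat, 3))
print("estimated log-scale variance:", round(float(sigma2_hat), 4))
```

Gauss-Newton is used here only to keep the sketch dependent on NumPy alone; any nonlinear least-squares routine applied to the log-scale residuals would serve the same purpose.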
@article{ WOS:000617681100015,
Author = {Liao, Kezheng and Park, Eun Sug and Zhang, Jie and Cheng, Linjun and Ji,
   Dongsheng and Ying, Qi and Yu, Jian Zhen},
Title = {{A multiple linear regression model with multiplicative log-normal error
   term for atmospheric concentration data}},
Journal = {{SCIENCE OF THE TOTAL ENVIRONMENT}},
Year = {{2021}},
Volume = {{767}},
Month = {{MAY 1}},
Abstract = {{The homoscedasticity assumption (the variance of the error term is the
   same across all the observations) is a key assumption in the ordinary
   least squares (OLS) solution of a linear regression model. The
   validity of this assumption is examined for a multiple linear regression
   model used to determine the source contributions to the observed black
   carbon concentrations at 12 background monitoring sites across China
   using a hybrid modeling approach. Residual analysis from the traditional
   OLS method, which assumes that the error term is additive and normally
   distributed with a mean of zero, shows pronounced heteroscedasticity
   based on the Breusch-Pagan test for 11 datasets. Noticing that the
   atmospheric black carbon data are log-normally distributed, we make a
   new assumption that the error terms are multiplicative and log-normally
   distributed. When the coefficients of the multilinear regression model
   are determined using the maximum likelihood estimation (MLE), the
   distribution of the residuals in 8 out of the 12 datasets is in good
   accordance with the revised assumption. Furthermore, the MLE computation
   under this novel assumption could be proved mathematically identical to
   minimizing a log-scale objective function, which considerably reduces
   the complexity in the MLE calculation. The new method is further
   demonstrated to have clear advantages in numerical simulation experiments
   of a 5-variable multiple linear regression model using synthesized data
   with prescribed coefficients and lognormally distributed multiplicative
   errors. Under all 9 simulation scenarios, the new method yields the most
   accurate estimations of the regression coefficients and has
   significantly higher coverage probability (on average, 95\% for all five
   coefficients) than OLS (79\%) and weighted least squares (WLS, 72\%)
   methods. (C) 2020 Elsevier B.V. All rights reserved.}},
Publisher = {{ELSEVIER}},
Address = {{RADARWEG 29, 1043 NX AMSTERDAM, NETHERLANDS}},
Type = {{Article}},
Language = {{English}},
Affiliation = {{Yu, JZ (Corresponding Author), Hong Kong Univ Sci \& Technol, Dept Chem, Kowloon, Clear Water Bay, Hong Kong, Peoples R China.
   Ying, Q (Corresponding Author), Texas A\&M Univ, Zachry Dept Civil \& Environm Engn, College Stn, TX 77843 USA.
   Liao, Kezheng; Yu, Jian Zhen, Hong Kong Univ Sci \& Technol, Dept Chem, Kowloon, Clear Water Bay, Hong Kong, Peoples R China.
   Park, Eun Sug, Texas A\&M Univ, Texas A\&M Transportat Inst, College Stn, TX 77843 USA.
   Zhang, Jie; Ying, Qi, Texas A\&M Univ, Zachry Dept Civil \& Environm Engn, College Stn, TX 77843 USA.
   Cheng, Linjun, China Natl Environm Monitoring Ctr, Beijing 100012, Peoples R China.
   Ji, Dongsheng, Chinese Acad Sci, Inst Atmospher Phys, State Key Lab Atmospher Boundary Layer Phys \& Atm, Beijing 100191, Peoples R China.
   Ji, Dongsheng, Chinese Acad Sci, Ctr Excellence Reg Atmospher Environm, Inst Urban Environm, Xiamen 361021, Peoples R China.}},
DOI = {{10.1016/j.scitotenv.2020.144282}},
Article-Number = {{144282}},
ISSN = {{0048-9697}},
EISSN = {{1879-1026}},
Keywords = {{Log-normal distribution; Multilinear regression; Maximum likelihood
   estimation; Residual; Source attribution}},
Research-Areas = {{Environmental Sciences \& Ecology}},
Web-of-Science-Categories  = {{Environmental Sciences}},
Author-Email = {{qying@civil.tamu.edu
   chjianyu@ust.hk}},
ResearcherID-Numbers = {{Yu, Jian Zhen/A-9669-2008
   Ji, Dongsheng/E-3807-2018}},
ORCID-Numbers = {{Yu, Jian Zhen/0000-0002-6165-6500
   Ji, Dongsheng/0000-0002-7889-4417}},
Funding-Acknowledgement = {{Hong Kong Research Grants Council {[}R6011-18];
   National Institutes of Health (NIH) - USA {[}R01 ES029509]; Hong Kong
   PhD Fellowship}},
Funding-Text = {{This work was partially supported by the Hong Kong Research Grant
   Council (R6011-18) to J.Z. Yu and a Hong Kong PhD Fellowship to K.Z.
   Liao. Q. Ying, E.S. Park and J. Zhang are partially supported by a grant
   from the National Institutes of Health (R01 ES029509). The CMAQ model
   simulations were performed using the computer clusters at the Texas A\&M
   High Performance Research Computing (https://hprc.tamu.edu/).}},
Number-of-Cited-References = {{36}},
Times-Cited = {{3}},
Usage-Count-Last-180-days = {{6}},
Usage-Count-Since-2013 = {{21}},
Journal-ISO = {{Sci. Total Environ.}},
Doc-Delivery-Number = {{QG6GE}},
Unique-ID = {{WOS:000617681100015}},
DA = {{2021-12-02}},
}
