Parallel PLS Algorithm Using MapReduce and Its Application in Spectral Modeling
YANG Hui-hua1, DU Ling-ling2, LI Ling-qiao2, TANG Tian-biao2, GUO Tuo2, LIANG Qiong-lin3, WANG Yi-ming3, LUO Guo-an3
1. School of Electric Engineering and Automation, Guilin University of Electronic Technology, Guilin 541004, China 2. School of Computer Science and Engineering, Guilin University of Electronic Technology, Guilin 541004, China 3. Analysis Center, Tsinghua University, Beijing 100084, China
Abstract: Partial least squares (PLS) is widely used in spectral analysis and modeling, but it is computationally intensive and time-consuming when applied to massive data sets. To address this problem, a novel parallel PLS algorithm based on MapReduce is proposed. It consists of two procedures: parallel data standardization and parallel principal component computation. Taking NIR spectral modeling as an example, experiments were conducted on a Hadoop cluster built from ordinary computers. The experimental results demonstrate that the proposed parallel PLS algorithm can handle massive spectral data, significantly reduces modeling time, achieves a nearly linear speedup, and scales up easily.
Key words: Parallel partial least squares; Near infrared spectrum; MapReduce; Parallel computing; Hadoop; Cloud computing
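The two procedures named in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' Hadoop implementation, which the abstract does not detail. It assumes the spectra are split into row blocks, that standardization uses per-column sums and sums of squares merged in a reduce step, and that the first PLS weight vector is taken as the normalized cross-product X^T y, as in the common NIPALS formulation. All function names are hypothetical.

```python
from functools import reduce
from math import sqrt


def map_partial_sums(block):
    """Map step: per-column row count, sum, and sum of squares for one row block."""
    n_cols = len(block[0])
    sums = [0.0] * n_cols
    sq_sums = [0.0] * n_cols
    for row in block:
        for j, value in enumerate(row):
            sums[j] += value
            sq_sums[j] += value * value
    return len(block), sums, sq_sums


def reduce_partial_sums(a, b):
    """Reduce step: merge two partial results element-wise."""
    return (a[0] + b[0],
            [x + y for x, y in zip(a[1], b[1])],
            [x + y for x, y in zip(a[2], b[2])])


def column_stats(blocks):
    """Column means and sample standard deviations (n - 1 denominator).

    The sum-of-squares formula is simple to merge across blocks but is
    less numerically stable than a two-pass computation.
    """
    n, sums, sq_sums = reduce(reduce_partial_sums, map(map_partial_sums, blocks))
    means = [s / n for s in sums]
    stds = [sqrt(max(q - n * m * m, 0.0) / (n - 1)) for q, m in zip(sq_sums, means)]
    return means, stds


def standardize(block, means, stds):
    """Second map pass: centre and scale each row; constant columns are only centred."""
    return [[(v - m) / (s if s > 0 else 1.0) for v, m, s in zip(row, means, stds)]
            for row in block]


def map_xty(x_block, y_block):
    """Map step: this block's contribution to the cross-product X^T y."""
    acc = [0.0] * len(x_block[0])
    for row, y in zip(x_block, y_block):
        for j, value in enumerate(row):
            acc[j] += value * y
    return acc


def first_weight_vector(x_blocks, y_blocks):
    """Reduce the block cross-products and normalize to unit length."""
    total = reduce(lambda a, b: [p + q for p, q in zip(a, b)],
                   map(map_xty, x_blocks, y_blocks))
    norm = sqrt(sum(v * v for v in total))
    return [v / norm for v in total]


if __name__ == "__main__":
    # Toy data: three "spectra" with two variables, split into two blocks.
    blocks = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
    y_blocks = [[-1.0, 0.0], [1.0]]  # response, already centred
    means, stds = column_stats(blocks)
    scaled = [standardize(b, means, stds) for b in blocks]
    print(means, stds)
    print(first_weight_vector(scaled, y_blocks))
```

Because each map call touches only its own block and the reduce step is associative, the blocks could in principle be processed on separate nodes, which is the property a MapReduce formulation relies on.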
YANG Hui-hua1, DU Ling-ling2, LI Ling-qiao2, TANG Tian-biao2, GUO Tuo2, LIANG Qiong-lin3, WANG Yi-ming3, LUO Guo-an3. Parallel PLS Algorithm Using MapReduce and Its Application in Spectral Modeling [J]. SPECTROSCOPY AND SPECTRAL ANALYSIS, 2012, 32(09): 2399-2404.