1. Key Laboratory of Grain Information Processing and Control, Henan University of Technology, Ministry of Education, Zhengzhou 450001, China
2. Henan Provincial Key Laboratory of Grain Photoelectric Detection and Control, Henan University of Technology, Zhengzhou 450001, China
3. School of Software, Henan University of Engineering, Zhengzhou 450001, China
4. School of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China
5. School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China
Abstract: As a principal storage carbohydrate, starch is a major source of energy in the human diet, providing more than 50% of the body's energy needs. The starch industry and its deep-processing sector are also fundamental to the national economy and people's livelihoods. However, because starch types are diverse and highly similar in appearance, they are difficult to distinguish directly, and unscrupulous merchants sometimes package lower-priced starches as higher-priced ones to increase profits. Consequently, the classification of starch types has significant practical relevance for food processing and industrial production in China. Terahertz (THz) technology is an effective non-destructive, non-contact, and label-free optical approach: it produces no harmful ionizing radiation when interacting with materials, and it yields optical parameters of a sample, such as the absorption coefficient, in a single measurement. It offers a high signal-to-noise ratio and detection sensitivity, and many scholars have applied it to the quality detection of agricultural products. To achieve rapid, non-destructive identification of starch, five of the most common starch varieties were selected from cereal and rhizome starches. Their spectral information was acquired by terahertz time-domain spectroscopy (THz-TDS), and the absorption coefficient of each variety in the range of 0.2–1.2 THz was calculated from the experimental data. The original spectra were then processed with three preprocessing methods: Savitzky-Golay (S-G) smoothing, multiplicative scatter correction (MSC), and standard normal variate (SNV). Principal component analysis (PCA) was used to extract features; with a cumulative contribution rate exceeding 95% as the criterion, the first three principal components were retained. A multi-class classification model was established using the support vector machine (SVM) method.
Three kernel functions (linear, polynomial, and radial basis function) were evaluated for identifying the different starch varieties. The results showed that the polynomial-kernel PCA-SVM model combined with S-G smoothing achieved the best performance for starch variety classification, with an average accuracy of 0.9419 on the test set, a Kappa coefficient of 0.933, and an F1 score of 0.9417. This method was further compared with logistic regression (LR), decision tree (DT), and random forest (RF). The results indicated that PCA-SVM outperformed the other methods, demonstrating the feasibility of THz technology for starch variety identification and its practical value for the modernization of the food processing industry and the development of starch-based products.
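The workflow summarized in the abstract (S-G smoothing, PCA retaining the components needed to exceed 95% cumulative contribution, then SVM with three kernels) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the spectra are synthetic stand-ins generated in the code, and the S-G window length, polynomial order, train/test split, and SVM hyperparameters are assumptions not given in the abstract, so the scores it produces are not comparable to the reported results.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for absorption-coefficient spectra over 0.2-1.2 THz.
# These curves are invented for illustration and are NOT the paper's data.
n_classes, n_per_class, n_points = 5, 40, 100
freq = np.linspace(0.2, 1.2, n_points)
X_parts, y_parts = [], []
for k in range(n_classes):
    base = (10.0 + 2.0 * k) * freq**1.5 + np.sin(2.0 * np.pi * (k + 1) * freq)
    noise = rng.normal(scale=0.5, size=(n_per_class, n_points))
    X_parts.append(base + noise)
    y_parts.append(np.full(n_per_class, k))
X = np.vstack(X_parts)
y = np.concatenate(y_parts)


def sg_smooth(spectra):
    """Savitzky-Golay smoothing along the frequency axis (assumed settings)."""
    return savgol_filter(spectra, window_length=11, polyorder=2, axis=1)


X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

results = {}
for kernel in ("linear", "poly", "rbf"):
    model = make_pipeline(
        FunctionTransformer(sg_smooth),
        # A float n_components keeps the fewest components whose cumulative
        # explained variance exceeds that fraction.
        PCA(n_components=0.95, svd_solver="full"),
        SVC(kernel=kernel),  # default hyperparameters, an assumption
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    pca = model.named_steps["pca"]
    results[kernel] = {
        "n_components": int(pca.n_components_),
        "cumulative_variance": float(pca.explained_variance_ratio_.sum()),
        "accuracy": accuracy_score(y_test, pred),
        "kappa": cohen_kappa_score(y_test, pred),
        "f1_macro": f1_score(y_test, pred, average="macro"),
    }

for kernel, scores in results.items():
    print(kernel, scores)
```

Replacing `sg_smooth` with an MSC or SNV transform, or `SVC` with a logistic regression, decision tree, or random forest classifier, would give the other preprocessing and baseline comparisons the abstract mentions.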