ISSN 1001-3806 CN 51-1125/TN

Volume 47 Issue 5
Sep.  2023
Variable selection combined with model updating to improve soluble solids content detection in apples

  • Corresponding author: OUYANG Aiguo, ouyang1968711@163.com
  • Received Date: 2022-08-09
    Accepted Date: 2022-10-25
  • Abstract: To obtain a robust near-infrared (NIR) spectral model, a method combining variable selection and model updating was adopted. NIR diffuse transmission spectra and soluble solids content (SSC) data were collected from 240 Red Fuji apples, and a partial least squares (PLS) regression model was developed to predict apple SSC. The modelling variables were selected by backward interval partial least squares (BIPLS) and competitive adaptive reweighted sampling (CARS). The model was updated by adding some samples from the new batch to the old batch and recalibrating. The results indicate that variable selection improves model performance: the prediction coefficient of determination increased to 0.7915, the root mean square error of prediction (RMSEP) decreased to 0.5810 and the prediction bias decreased to 0.2627. Combined with the model updating strategy, the RMSEP and the prediction bias were further reduced. Updating the model with only 20 samples already improved performance markedly, with the prediction coefficient of determination rising to 0.8506, the RMSEP falling to 0.4358 and the prediction bias falling to 0.1045. These results are useful for robust NIR spectroscopy modelling of a wide range of fruits.
  • [1] FAN H A, XUE Sh L, DU N, et al. Research progress on nutritional value and processing technology of applepear[J]. Food Research and Development, 2020, 41(22): 205-212 (in Chinese).
    [2] LI L Sh, LIU Y D, HU J, et al. Application of near infrared nondestructive testing technology in fruit maturity discrimination[J]. Journal of East China Jiaotong University, 2021, 38(6): 95-105 (in Chinese).
    [3] FAN S, GUO Z, ZHANG B, et al. Using Vis/NIR diffuse transmittance spectroscopy and multivariate analysis to predicate soluble solids content of apple[J]. Food Analytical Methods, 2016, 9(5): 1333-1343. doi: 10.1007/s12161-015-0313-5
    [4] ZHANG J L, XIN M, FAN L L, et al. Monitoring systems for skin flap transplantation based on near infrared spectroscopy[J]. Laser Technology, 2020, 44(1): 91-95 (in Chinese).
    [5] SCHMUTZLER M, HUCK C W. Simultaneous detection of total antioxidant capacity and total soluble solids content by Fourier transform near-infrared (FT-NIR) spectroscopy: A quick and sensitive method for on-site analyses of apples[J]. Food Control, 2016, 66: 27-37. doi: 10.1016/j.foodcont.2016.01.026
    [6] LIU C, YANG S X, DENG L. Determination of internal qualities of Newhall navel oranges based on NIR spectroscopy using machine learning[J]. Journal of Food Engineering, 2015, 161: 16-23. doi: 10.1016/j.jfoodeng.2015.03.022
    [7] SÁNCHEZ M T, de la HABA M J, PEREZ-MARIN D. Internal and external quality assessment of mandarins on-tree and at harvest using a portable NIR spectrophotometer[J]. Computers and Electronics in Agriculture, 2013, 92: 66-74. doi: 10.1016/j.compag.2013.01.004
    [8] TEH S L, COGGINS J L, KOSTICK S A, et al. Location, year, and tree age impact NIR-based postharvest prediction of dry matter concentration for 58 apple accessions[J]. Postharvest Biology and Technology, 2020, 166: 111125. doi: 10.1016/j.postharvbio.2020.111125
    [9] NØRGAARD L, SAUDLAND A, WAGNER J, et al. Interval partial least-squares regression (iPLS): A comparative chemometric study with an example from near-infrared spectroscopy[J]. Applied Spectroscopy, 2000, 54(3): 413-419. doi: 10.1366/0003702001949500
    [10] MEHMOOD T, SÆBØ S, LILAND K H. Comparison of variable selection methods in partial least squares regression[J]. Journal of Chemometrics, 2020, 34(6): e3226.
    [11] ZHANG L X, YANG C F, CHEN J, et al. Near-infrared detection method of soluble solids content in apple by BiPLS combined with SPA[J]. Journal of Tarim University, 2021, 33(4): 78-86 (in Chinese).
    [12] XU O, LIU J, FU Y, et al. Dual updating strategy for moving-window partial least-squares based on model performance assessment[J]. Industrial & Engineering Chemistry Research, 2015, 54(19): 5273-5284.
    [13] PEIRS A, TIRRY J, VERLINDEN B, et al. Effect of biological variability on the robustness of NIR models for soluble solids content of apples[J]. Postharvest Biology and Technology, 2003, 28(2): 269-280. doi: 10.1016/S0925-5214(02)00196-5
    [14] LOUW E D, THERON K I. Robust prediction models for quality parameters in Japanese plums using NIR spectroscopy[J]. Postharvest Biology and Technology, 2010, 58(3): 176-184. doi: 10.1016/j.postharvbio.2010.07.001
    [15] TANG J Y, HUANG M, ZHU Q B. Purity detection model update of maize seeds based on active learning[J]. Spectroscopy and Spectral Analysis, 2015, 35(8): 2136-2140 (in Chinese).
    [16] HUANG M, TANG J, YANG B, et al. Classification of maize seeds of different years based on hyperspectral imaging and model updating[J]. Computers and Electronics in Agriculture, 2016, 122: 139-145.
    [17] NASCIMENTO P A M, de CARVALHO L C, JÚNIOR L C C, et al. Robust PLS models for soluble solids content and firmness determination in low chilling peach using near-infrared spectroscopy (NIR)[J]. Postharvest Biology and Technology, 2016, 111: 345-351.
    [18] LIU Y D, XU H, SUN X D, et al. Development of multi-cultivar universal model for soluble solid content of apple online using near infrared spectroscopy[J]. Spectroscopy and Spectral Analysis, 2020, 40(3): 922-928 (in Chinese).
    [19] ZUDE-SASSE M, TRUPPEL I, HEROLD B. An approach to non-destructive apple fruit chlorophyll determination[J]. Postharvest Biology and Technology, 2002, 25(2): 123-133.
    [20] MCDEVITT R M, GAVIN A J, ANDRÉS S, et al. The ability of visible and near-infrared reflectance spectroscopy (NIRS) to predict the chemical composition of ground chicken carcasses and to discriminate between carcasses from different genotypes[J]. Journal of Near Infrared Spectroscopy, 2005, 13: 109-117.
  • School of Intelligent Electromechanical Equipment Innovation Research Institute, East China Jiaotong University, Nanchang 330013, China


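Introduction
  • Apples are highly nutritious and easily absorbed, and are widely regarded as one of the most nutritious fruits[1]. Among apple-growing countries, China ranks first in both production and consumption, and consumers' expectations of apple quality continue to rise. Soluble solids content (SSC), expressed in °Brix (grams of sugar per 100 g of mixture), is an important indicator of apple quality: it provides a reference for evaluating taste and nutritional value and is an important basis for judging maturity. Measuring SSC helps consumers choose high-quality fruit and helps growers time the harvest more accurately[2]. At present, SSC is mostly measured in practice with a refractometer[3-4]. The main drawback of this method is that it is destructive: the sample must be destroyed, which makes it hard to meet production needs. Near-infrared (NIR) spectroscopy is non-destructive, requires no sample pretreatment, is fast, causes little pollution and is inexpensive, and it has become a research focus in recent years. Studies have already been reported for apples, navel oranges, mandarins and other fruits[5-7].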

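    However, models developed from NIR spectral data are usually valid only for a single batch of fruit and perform poorly when tested on a different batch[8]. One possible reason is that the existing model is not optimal. Because NIR data are a mixture of many overlapping peaks, useful information is sometimes difficult to extract, which can lead to a suboptimal model[9]. Advanced variable selection methods complement partial least squares (PLS) modelling and can further optimize PLS-based models[10]. Variable selection removes uninformative variables, speeds up computation and yields a more robust model. In chemometrics there are two main types of variable selection: band (interval) selection and wavelength selection. Band selection picks the sub-regions of the signal that best predict the response variable; band-based methods are useful when the variables are continuous, as in NIR spectra[11]. Wavelength selection uses various search strategies and variable-importance parameters to obtain the best subset of individual variables.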

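    When a model performs poorly on a new batch, one remedy is model updating. In model updating, representative samples are selected from the new prediction set and added to the original calibration set, and the recalibrated model is used to predict the new batch, which makes the updated model more robust[12]. PEIRS et al.[13] added apples from different years to an apple SSC prediction model and clearly improved its accuracy: the root mean square error of prediction (RMSEP) fell from 2.92 to 0.95. LOUW et al.[14] found that adding samples from several cultivars made plum quality prediction models more general; the model for total soluble solids reached a coefficient of determination R² = 0.959 and an RMSEP of 0.453. TANG et al.[15] proposed updating a maize seed purity detection model by active learning, and the results showed that the method improved prediction accuracy for new samples. HUANG et al.[16] used incremental support vector data description to update a least squares support vector machine (LS-SVM) model online, and the results showed that hyperspectral imaging combined with model updating can effectively identify seeds from different years.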

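    In general, model updating improves model performance by reducing bias and error. Its main drawback is that it requires new samples, and it is unclear how many new samples are needed. This paper proposes an optimization strategy for apple SSC prediction models. When developing a robust model, the first goal should be to optimize the PLS model[17] so that it can be used on different batches with acceptable performance, without model updating or additional measurements. Second, if the optimized model still performs poorly, the model should be updated to improve its performance. This paper also examines the combination of model updating and variable selection in order to determine the minimum number of samples needed for model updating.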

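1.   Materials and methods
  • The apples used in the experiment were picked from six Red Fuji apple farms in six producing areas: Yantai (Shandong), Qixia (Shandong), Jingning (Gansu), Liquan (Shaanxi), Luochuan (Shaanxi) and Aksu (Xinjiang). Forty apples were taken from each area, 240 in total, and they were divided equally into two batches. Data were collected from the first batch first; the second batch was measured after a period of cold storage. After arriving at the laboratory, the apples were kept for 24 h, and four positions spaced evenly around the equator of each sample were numbered. Spectra and SSC were collected at the numbered positions.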

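  • Apple spectra were acquired with dynamic online fruit sorting equipment[18], whose structure is shown in Figure 1. The light source consists of ten 100 W/12 V halogen lamps arranged on two sides. The spectrometer is a QE65Pro (Ocean Optics, USA) with a wavelength range of 350 nm~1150 nm. The operating parameters were an integration time of 100 ms and a detection speed of 0.5 m/s. The fruit is placed on a fruit cup; light from the halogen lamps inside the instrument passes through the apple and is received by a fiber-optic probe beneath the cup, and the internal information of the sample is stored on a computer for later analysis. In the figure, PLC stands for programmable logic controller.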

    Figure 1.  Fruit dynamic online sorting equipment

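  • After the NIR spectrum of each apple was acquired, a 1 cm thick slice was cut along the equator and divided into four equal parts according to the numbering. The SSC of the extracted juice was measured with a PAL-1 digital refractometer (ATAGO, Japan). Each measurement was repeated three times and the average was taken as the reference SSC value.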

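  • The apple spectra were preprocessed in the Unscrambler software. Multiplicative scatter correction (MSC) and Savitzky-Golay (S-G) convolution smoothing were used to remove scattering effects caused by uneven particle distribution and particle size, to suppress noise and to raise the signal-to-noise ratio. As Figure 2 shows, preprocessing markedly reduces the differences in spectral intensity between samples and removes interference from other background effects. The apple SSC detection model was then built by partial least squares.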

    Figure 2.  Apple spectrum of diffuse transmission
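    The two preprocessing steps can be sketched in a few lines of Python. This is only an illustration, not the Unscrambler implementation used in the paper. It assumes that MSC regresses each spectrum on the mean spectrum of the set, and that the S-G filter uses 3 points with a polynomial order of 0 or 1, in which case it reduces to a 3-point moving average. The polynomial order and the handling of the end points are assumptions; the function names `msc` and `smooth3` are hypothetical.

```python
from statistics import mean


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    spectra: list of equal-length lists of intensities.
    """
    n_points = len(spectra[0])
    ref = [mean(s[j] for s in spectra) for j in range(n_points)]
    ref_mean = mean(ref)
    ref_var = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit of the spectrum: s ~ intercept + slope * ref
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_var
        intercept = s_mean - slope * ref_mean
        # Remove the additive and multiplicative scatter terms
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


def smooth3(spectrum):
    """3-point moving average; the two end points are left unchanged (assumption)."""
    out = list(spectrum)
    for j in range(1, len(spectrum) - 1):
        out[j] = (spectrum[j - 1] + spectrum[j] + spectrum[j + 1]) / 3.0
    return out
```

    Two spectra that differ only by an offset and a scale factor map to the same corrected spectrum, which is why the intensity differences between samples shrink after MSC.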

    To improve the speed and accuracy of the model, selecting spectral variables is necessary. Backward interval partial least squares (BIPLS) and competitive adaptive reweighted sampling (CARS) were used to select spectral variables before building the PLS models.

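    To improve prediction performance on different batches of apples, the model was updated. Updating consisted of adding some samples from the new batch to the old batch and recalibrating. The Kennard-Stone (K-S) algorithm was used to pick 5, 10, 15 and 20 samples from the new batch for model updating, in order to find the minimum number of samples sufficient to improve model performance.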
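    A minimal sketch of Kennard-Stone selection is given below. It follows the textbook rule: start from the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbour is farthest away. Using Euclidean distance on the raw feature vectors is an assumption, since the paper does not state which distance or feature space was used, and the function name `kennard_stone` is hypothetical.

```python
import math


def kennard_stone(X, n_select):
    """Return the indices of n_select rows of X chosen by the Kennard-Stone rule."""
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    dist = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Start with the two most distant samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: dist[p[0]][p[1]],
    )
    selected = [i0, j0]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select:
        # Pick the candidate farthest from its nearest already-selected sample.
        best = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

    For model updating, the returned indices would point at new-batch spectra; those spectra and their reference SSC values are appended to the calibration set before the PLS model is refitted.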

    The models were evaluated with the RMSEP, the prediction coefficient of determination Rp² and the bias B.
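    The paper does not spell out the formulas, so the sketch below uses common definitions as an assumption: RMSEP is the root of the mean squared prediction error, B is the mean of (predicted minus measured), and Rp² is one minus the ratio of the residual to the total sum of squares. Some authors define bias with the opposite sign or report the squared correlation coefficient instead. The function name `prediction_metrics` is hypothetical.

```python
import math


def prediction_metrics(y_true, y_pred):
    """Return (Rp2, RMSEP, bias) for measured and predicted values."""
    n = len(y_true)
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    rmsep = math.sqrt(ss_res / n)
    bias = sum(residuals) / n  # positive when the model over-predicts
    y_mean = sum(y_true) / n
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return r2, rmsep, bias
```

    A model that over-predicts every sample by 0.5 °Brix has both an RMSEP and a bias of 0.5, which shows how a constant batch offset inflates both statistics at once.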

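2.   Results and analysis
  • Table 1 lists the measured SSC of the two batches, which were measured by the same method. SSC ranged from 7.80 °Brix to 15.10 °Brix in batch 1 and from 8.70 °Brix to 16.10 °Brix in batch 2. Batch 1 was used as the calibration set and batch 2 as the prediction set.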

    batch   minimum/°Brix   maximum/°Brix   average value/°Brix   standard deviation
    1       7.80            15.10           12.59                 1.34
    2       8.70            16.10           13.07                 1.18

    Table 1.  Statistical results of apple SSC

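  • Because the sample selectively absorbs NIR light of different frequencies, the light transmitted through the sample is weakened in certain wavelength ranges. The front end (350 nm~600 nm) and the back end (850 nm~1150 nm) of the spectrum contain noise and little useful information, so the effective wavelength range was set to 600 nm~850 nm. The spectra of the two batches are similar and differ only in intensity. The peak at 645 nm is related to peel color, the valley at 675 nm is due to chlorophyll[19], and the valley at 758 nm is due to overtone absorption of the O-H stretching vibration[20]. MSC combined with S-G convolution smoothing (3 smoothing points) was used as the preprocessing method to remove interference from other background effects. Figure 2a and Figure 2b show the raw diffuse transmission spectra and the preprocessed spectra of the two batches, respectively. Preprocessing removes scattering effects and noise, markedly reduces the differences between spectra and reduces interference from external information.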

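  • Within 600 nm~850 nm, the SSC prediction model was built after preprocessing; the results are shown in Figure 3. The calibration coefficient of determination Rc² and the prediction coefficient of determination Rp² were 0.8989 and 0.7151, respectively. The RMSEP (0.6281) was clearly higher than the root mean square error of calibration (RMSEC), and there was a large prediction bias of 0.3649, which indicates that the model trained on batch 1 is not suitable for batch 2. A likely cause is the difference in storage time, which changes the internal physicochemical properties of the apples.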

    Figure 3.  Scatter plot of PLS modeling prediction results for full spectrum

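  • BIPLS divides the spectral range into equally spaced subintervals and builds PLS regression models on them. The number of intervals was varied from 10 to 25, and the subinterval combination with the smallest RMSEC was selected. Table 2 lists the BIPLS selection results for each number of intervals. The RMSEC was smallest with 14 intervals.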

    number of intervals   RMSEC    number of selected subintervals   number of variables
    10                    0.4618   6                                 334
    11                    0.4737   9                                 273
    12                    0.4707   7                                 194
    13                    0.4673   7                                 179
    14                    0.4480   7                                 166
    15                    0.4656   12                                266
    16                    0.4593   7                                 145
    17                    0.4584   10                                197
    18                    0.4577   10                                184
    19                    0.4691   15                                264
    20                    0.4532   11                                183
    21                    0.4603   15                                238
    22                    0.4647   14                                211
    23                    0.4565   16                                232
    24                    0.4532   11                                152
    25                    0.4590   11                                147

    Table 2.  BIPLS selection results for different total numbers of intervals

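    The model is first built on all subintervals, and the worst-performing subintervals are then removed one at a time. As Table 2 shows, the best RMSEC was 0.4480, obtained with seven subintervals. The selected subintervals were numbers 3, 4, 8, 9, 11, 13 and 14, corresponding to the wavelength ranges 637.1 nm~672.7 nm, 727.6 nm~762.8 nm, 781.4 nm~798.5 nm and 817 nm~850.2 nm, 166 variables in total. The model built on the selected variables is shown in Figure 4: Rc² = 0.8802, Rp² = 0.7788, RMSEC = 0.4649, RMSEP = 0.5984 and B = -0.3341. Compared with the PLS model without variable selection, Rp² increased and RMSEP and B decreased, so model performance improved.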

    Figure 4.  Prediction results of BIPLS model
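    The backward elimination described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: it uses NumPy, a compact PLS1 (NIPALS) fit, a fixed number of latent variables, and k-fold cross-validated RMSE as the selection criterion, whereas the paper reports RMSEC. All function names and default values are hypothetical.

```python
import numpy as np


def pls1_fit_predict(X_train, y_train, X_test, n_components):
    """Fit PLS1 by NIPALS on the training data and predict the test data."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    X, y = X_train - x_mean, y_train - y_mean
    W, P, q = [], [], []
    for _ in range(min(n_components, X.shape[1])):
        w = X.T @ y
        norm = np.linalg.norm(w)
        if norm < 1e-12:
            break
        w = w / norm
        t = X @ w                      # scores
        tt = t @ t
        p = X.T @ t / tt               # X loadings
        c = (y @ t) / tt               # y loading
        X = X - np.outer(t, p)         # deflate X
        y = y - c * t                  # deflate y
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.full(len(X_test), y_mean)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # regression coefficients
    return (X_test - x_mean) @ beta + y_mean


def cv_rmse(X, y, n_components, n_folds=5):
    """k-fold cross-validated RMSE with contiguous folds."""
    sq_err = 0.0
    for test_idx in np.array_split(np.arange(len(y)), n_folds):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        pred = pls1_fit_predict(X[train], y[train], X[test_idx], n_components)
        sq_err += np.sum((pred - y[test_idx]) ** 2)
    return float(np.sqrt(sq_err / len(y)))


def bipls(X, y, n_intervals, n_components=3):
    """Backward interval PLS: return (kept interval indices, best CV RMSE)."""
    intervals = np.array_split(np.arange(X.shape[1]), n_intervals)
    active = list(range(n_intervals))
    best_err, best_set = cv_rmse(X, y, n_components), list(active)
    while len(active) > 1:
        # Try dropping each remaining interval; keep the drop that hurts least.
        trials = []
        for k in active:
            cols = np.concatenate([intervals[i] for i in active if i != k])
            trials.append((cv_rmse(X[:, cols], y, n_components), k))
        err, worst = min(trials)
        active.remove(worst)
        if err < best_err:
            best_err, best_set = err, list(active)
    return best_set, best_err
```

    Repeating the call for each interval count from 10 to 25 and keeping the run with the lowest error mirrors the search summarized in Table 2. Interval indices here start at 0, while the paper numbers subintervals from 1.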

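  • Figure 5 shows the variable selection process with CARS. The number of selected variables decreases gradually as the number of sampling runs increases. At 36 sampling runs the RMSEC reaches its minimum of 0.4113 with 55 variables; as sampling continues, the RMSEC increases.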

    Figure 5.  Selection results of CARS variables of SSC of apple samples
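    The steady fall in the number of retained variables comes from the exponentially decreasing function used in CARS. The sketch below shows only that enforced reduction step: the ratio of variables kept at run i is r_i = a·exp(-k·i), with a and k chosen so that all variables are kept at the first run and two at the last. A full CARS run also includes Monte Carlo sampling of calibration samples and adaptive reweighted sampling based on PLS regression coefficients, which are omitted here. The values 332 and 100 in the usage line are illustrative assumptions, not settings taken from the paper.

```python
import math


def cars_keep_counts(n_variables, n_runs):
    """Number of variables retained by the exponential decay at each sampling run."""
    a = (n_variables / 2) ** (1 / (n_runs - 1))
    k = math.log(n_variables / 2) / (n_runs - 1)
    return [
        max(2, round(n_variables * a * math.exp(-k * i)))
        for i in range(1, n_runs + 1)
    ]


# Hypothetical settings: 332 candidate variables, 100 sampling runs.
counts = cars_keep_counts(332, 100)
```

    The decay is fast in the early runs and slow in the later ones, so most uninformative variables are discarded quickly and the final runs refine a small subset.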

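    The model built on the selected variables is shown in Figure 6. Rp² increased to 0.7915, RMSEP decreased to 0.5810 and B decreased to 0.2627. Variable selection with CARS removed redundant information from the spectra, simplified the model and improved its performance.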

    Figure 6.  PLS model results based on 55 CARS preferred variables

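  • The K-S algorithm was used to pick 5, 10, 15 and 20 apples from batch 2 for model updating. Updating improved the BIPLS model overall; the results are given in Table 3. Performance improved as the number of updating samples increased, and updating with 20 samples gave the best performance: Rp² rose from 0.7788 to 0.8169, RMSEP fell from 0.5984 to 0.4866, and B changed from -0.3841 to 0.1146, a smaller bias in magnitude.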

    BIPLS                     Rp²      RMSEP    B
    no new sample             0.7788   0.5984   -0.3841
    5 samples from batch 2    0.7809   0.5610   0.2368
    10 samples from batch 2   0.7975   0.5333   0.1779
    15 samples from batch 2   0.8131   0.5079   0.1218
    20 samples from batch 2   0.8169   0.4866   0.1146

    Table 3.  Results of BIPLS combined with model update

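    As with BIPLS, updating improved the predictions of the CARS model, as shown in Table 4. Updating with 20 samples gave the best performance. Compared with the SSC prediction model before updating, Rp² rose from 0.7915 to 0.8506, RMSEP fell from 0.5810 to 0.4358, and B fell from 0.2627 to 0.1045.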

    CARS                      Rp²      RMSEP    B
    no new sample             0.7915   0.5810   0.2627
    5 samples from batch 2    0.8361   0.4672   0.1828
    10 samples from batch 2   0.8457   0.4583   0.1602
    15 samples from batch 2   0.8501   0.4358   0.1159
    20 samples from batch 2   0.8506   0.4358   0.1045

    Table 4.  Results of CARS combined with model update
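    To put the two tables on a common scale, the drop in RMSEP can be expressed as a percentage of the value before updating. The short calculation below uses only the RMSEP values reported in Tables 3 and 4; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


# RMSEP with no new sample versus 20 samples from batch 2 (Tables 3 and 4).
bipls_gain = relative_reduction(0.5984, 0.4866)  # about 18.7 %
cars_gain = relative_reduction(0.5810, 0.4358)   # about 25.0 %
```

    On these figures, updating with 20 samples lowers the RMSEP by roughly a fifth for the BIPLS model and a quarter for the CARS model.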

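3.   Conclusions
  • NIR spectral models for fresh fruit lack robustness when applied to a new batch of fruit, and there is still no definitive solution to this problem. Variable selection is already widely used to build robust NIR models; on that basis, updating the model with new samples in combination with variable selection further improved model performance. The results show that models developed on selected regions or specific wavelengths perform better. In addition, combining variable selection with model updating based on a small number of new samples further reduces the RMSEP and the bias B.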

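    For samples from a single batch, the model should first be optimized by variable selection so that it can be used on other batches with acceptable performance, without model updating or additional measurements. If variable selection does not improve performance on a new batch, some samples from the new batch should be used to update the model. The results in this paper show that variable selection combined with model updating yields a robust NIR model for apples that performs well across batches. CARS improved model performance more than BIPLS, raising Rp² to 0.7915 and lowering RMSEP to 0.5810 and B to 0.2627. Furthermore, the models updated with K-S-selected samples performed clearly better than the models before updating, and prediction accuracy on new samples rose gradually as more samples were added; updating with only 20 samples greatly reduced the RMSEP and B of the SSC prediction model. Improving apple SSC prediction models through variable selection and model updating is therefore feasible, and the proposed method can be used to build robust NIR models for other fruits.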
