DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
[0029] FIG. 1 illustrates the architecture of the computer system for providing an apparatus for modeling and controlling a process. The hybrid analyzer of FIG. 1 preferably operates on a general purpose computer system such as an Alpha workstation, available from Digital Equipment Corporation. The Alpha workstation is in turn connected to appropriate sensors and output drivers. These sensors and output drivers are strategically positioned in an operating plant to collect data as well as to control the plant. The collected data is archived in a data file 110 ( FIG. 2 ) for training purposes. The data collected varies according to the type of product being produced. For illustrative purposes, FIG. 1 shows the architecture of the computer supporting the process control apparatus of the present invention and its relationship to various sensors and output drivers in a representative plant. In the embodiment disclosed here, the representative plant is a refinery or a chemical processing plant having a number of process variables such as temperature and flow rate variables. These variables are sensed by various instruments. It should be understood that the present invention may be used in a wide variety of other types of technological processes or equipment in the useful arts.
[0030] In FIG. 1 , the collected data include various disturbance variables such as a feed stream flow rate as measured by a flow meter 32 , a feed stream temperature as measured by a temperature sensor 38 , component feed concentrations as determined by an analyzer 30 , and a reflux stream temperature in a pipe as measured by a temperature sensor 71 . The collected data also include controlled process variables such as the concentration of produced materials, as measured by analyzers 48 and 66 . The collected data further include manipulated variables such as the reflux flow rate as set by a valve 80 and determined by a flow meter 78 , a reboil steam flow rate as set by a valve 60 and measured by a flow meter 58 and the pressure in a tank as controlled by a valve 86 .
[0031] These sampled data reflect the condition in various locations of the representative plant during a particular sampling period. However, as finite delays are encountered during the manufacturing process, the sampled data reflect a continuum of the changes in the process control. For instance, in the event that a valve is opened upstream, a predetermined time is required for the effect of the valve opening to be reflected in the collected variables further downstream of the valve. To properly associate the measurements with particular process control steps, the collected data may need to be delayed, or time-shifted, to account for timings of the manufacturing process. According to the present invention, this is done in a manner set forth below.
[0032] The measured data collected from analyzers 30 , 48 , 58 , 66 , and 78 and sensors 32 , 38 and 71 are communicated over a communications network 91 to an instrumentation and control computer 90 . The measured data can further be transferred from the instrumentation computer 90 to another process control workstation computer 92 via a second communication network 87 . The instrumentation computer 90 is connected to a large disk 82 or other suitable high capacity data storage devices for storing the historical data file 110 ( FIG. 2 ), as collected using the previously described sensors and output drivers. Further, the process control workstation computer 92 is connected to a large storage disk 80 to store data. In addition to storing data, the disks 80 and 82 also store executable files which, upon execution, provide the process control capability.
[0033] The computers 90 and 92 are preferably high performance workstations such as the Digital Equipment Alpha workstations or SPARC workstations, available from Sun Microsystems, or high performance personal computers such as Pentium-Pro based IBM compatible personal computers. Further, the computer 90 may be a single-board computer with a basic operating system such as the board in the WDPF II DPU Series 32 , available from Westinghouse Corporation. Additionally, each one of the computers 90 and 92 may operate the hybrid analyzer of the present invention alone, or both computers 90 and 92 may operate as distributed processors to contribute to the real-time operation of the hybrid analyzer of the present invention.
[0034] In FIG. 1 , the workstation computer 92 can be configured to store the historical data acquired by the instrumentation computer 90 into a data file 110 ( FIG. 2 ) on the disk 80 and further to execute a hybrid run-time model 122 of FIG. 2 for process control purposes. The output values generated by the hybrid run-time analyzer 122 on the process control workstation computer 92 are provided to the instrumentation computer 90 over the network 87 . The instrumentation computer 90 then sends the necessary control commands over the communications network 91 to one or more valve controllers 32 , 60 and 80 to turn on and off the valves appropriately to cause various process changes. Alternatively, the instrumentation computer 90 can store the historical data file 110 on its disk drive 82 and further execute the hybrid run-time analyzer 122 in a stand-alone mode. Collectively, the computer 90 , the disk 82 , and various sensors and output drivers form a distributed control system (DCS) 124 , as shown in FIG. 2 .
[0035] Turning now to FIG. 2, a diagram showing the development and deployment of the hybrid analyzers or models 114 and 122 is shown. It is to be noted that the hybrid analyzers, or hybrid models 114 and 122 , are preferably implemented as software which is executed on the computer 90 individually, the computer 92 individually, or a combination of computers 90 and 92 . Further, although the disclosed embodiments are implemented as software routines, the present invention contemplates that the analyzers can also be implemented in hardware using discrete components, application specific integrated circuits, or field programmable gate array devices.
[0036] In the analyzer of FIG. 2 , historical data from sensors and output drivers 30 , 32 , 38 , 48 , 58 , 60 , 66 , 71 , 78 , 80 and 86 are stored in the data file 110 on the disk 82 . The data file 110 preferably contains three types of variables: manipulated variables (MVs), disturbance variables (DVs), and controlled variables (CVs). Manipulated variables are variables which a plant operator can manipulate to control and effect changes in the process. Disturbance variables are variables, such as those arising from unexpected changes, which are beyond the operator's control and which may be outputs of prior processes. Controlled variables are the variables that the process control is trying to control, such as a certain product consistency, feed temperature, or feed level, among others. The historical data stored in the data file 110 is preferably collected from various sampling points in an operational plant, with the MVs, DVs and CVs as the basic data elements for training the hybrid analyzer or model 100 for process control purposes. The data file 110 is preferably archived in a large capacity data storage device such as the disk 80 in the process control workstation computer 92 and/or the disk 82 of the instrumentation computer 90 .
[0037] In FIG. 2 , the MVs and DVs are provided to a delay and variable selection module 112 . The module 112 delays, or offsets, certain input variables in time to emphasize that the sampling of certain variable measurements can occur at different points in the process to be controlled. The delays asserted in the module 112 compensate for the differentials caused by having a measurement upstream of another measurement, as previously discussed. The output of the delay and variable selection module 112 is provided to a hybrid development analyzer or model 114 .
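By way of illustration only, the time-shifting performed by the module 112 can be sketched as follows. The sketch assumes that the data are held as a NumPy array with one row per sampling period and that the delay of each variable is already known as a whole number of sampling periods. The function name `apply_delays` and the example delays are hypothetical and are not part of the disclosure.

```python
import numpy as np

def apply_delays(data, delays):
    """Shift column j of `data` by delays[j] samples and keep only the
    rows for which every column still has a value."""
    data = np.asarray(data, dtype=float)
    max_delay = max(delays)
    n_rows = data.shape[0] - max_delay
    shifted = np.empty((n_rows, data.shape[1]))
    for j, d in enumerate(delays):
        # Row i of the result pairs the value sampled d periods earlier
        # with the undelayed values at time (max_delay + i), so an
        # upstream reading lines up with the downstream reading it affects.
        start = max_delay - d
        shifted[:, j] = data[start:start + n_rows, j]
    return shifted

# Hypothetical example: six samples of two variables, where the first
# variable is measured two sampling periods upstream of the second.
raw = np.column_stack([np.arange(6), 10 * np.arange(6)])
aligned = apply_delays(raw, delays=[2, 0])
```

In this example the first aligned row pairs the upstream sample taken at period 0 with the downstream sample taken at period 2.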
[0038] The hybrid development analyzer or model 114 receives input variables 113 as well as target output variables 115 , including the CVs. The data variables 113 and 115 may further be suitably screened by a data selection apparatus such as that discussed in a co-pending patent application having application Ser. No. ______, entitled “APPARATUS AND METHOD FOR SELECTING A WORKING DATA SET FOR MODEL DEVELOPMENT” and commonly assigned to the assignee of the present invention, hereby incorporated by reference. The hybrid development analyzer or model 114 in FIG. 2 has two analyzers or models operating in parallel, a primary analyzer or model 132 ( FIG. 3 ) and an error correction analyzer or model 136 ( FIG. 3 ), both receiving the same set of data variables. The hybrid development analyzer or model 114 is trained using the procedures discussed below. The output of the hybrid development analyzer or model 114 is provided to a model parameter module 118 for embodying the parameters derived during the training process to be used by a hybrid run-time analyzer or model 122 . The output from the delay and variable selection module 112 is also provided to a second delay and variable settings module 116 which embodies the delays and variable adjustments made during training. Thus, the modules 116 and 118 embody the knowledge gained from the training process in setting the run-time model variables.
[0039] From the delay and variable settings module 116 , data is provided to a run-time delay and variable selection module 120 . Further, from the model parameter module 118 , the data is provided to the hybrid run time analyzer or model 122 . The output of the hybrid run time analyzer or model 122 is provided to a distributed control system (DCS) 124 . The DCS system 124 supervises the control and data acquisition process in the plant. Typically, the DCS system 124 provides distributed processing units, control units, a console for interacting with the DCS components, and a data historian, or data repository, which provides for data collection and archival storage of plant historical data. Typical data archived by the data repository include various process variable status, alarm messages, operator event messages, sequence of events data, laboratory data, file data, and pre- and post-event data. The collected data are stored typically in a temporary file before they are transferred to a permanent storage device such as an optical disk, a removable magnetic disk, or magnetic tapes. Data is thus collected and archived by the distributed control system 124 and forwarded to the run-time delay and variable selection module 120 which delays, or shifts, certain data before it is presented to the run time analyzer or model 122 . The output of the run-time analyzer 122 may be all or a portion of final or intermediate process variables which are selected or defined by the user.
[0040] In FIG. 2 , the analyzer training or development is performed by the delay and variable selection module 112 and the hybrid development analyzer or model 114 . The selection module 112 performs the variable selection process where some or all of the variables are picked. Further, the picked variables may be time-shifted to account for the delays encountered during the manufacturing process, as discussed earlier. Additionally, the selection module 112 can sample the data variables on a continuous basis, or it can sample the data variables after each predetermined delay time period. The sampling delay period can either be user selectable, or it can be automatically determined. In one embodiment, the delay period is determined using a genetic algorithm of the type known to those skilled in the art. A suitable genetic algorithm, for example, is generally discussed in an article by applicant Casimir C. Klimasauskas in “Developing a Multiple MACD Market Timing System”, Advanced Technology for Developers, Vol. 2, pp. 3-10 (1993).
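By way of illustration only, automatic determination of a delay can be sketched with a much simpler stand-in than the genetic algorithm mentioned above: an exhaustive search for the lag that maximizes the correlation between an upstream and a downstream signal. The function name `estimate_delay`, the synthetic signals and the search range are assumptions made for this sketch.

```python
import numpy as np

def estimate_delay(upstream, downstream, max_delay):
    """Return the lag, in samples, that maximizes the absolute correlation
    between the upstream signal and the later downstream signal.
    This is an exhaustive search, not a genetic algorithm."""
    best_lag, best_score = 0, -1.0
    for lag in range(max_delay + 1):
        a = upstream[:len(upstream) - lag]   # earlier upstream samples
        b = downstream[lag:]                 # later downstream samples
        score = abs(np.corrcoef(a, b)[0, 1])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic example: the downstream signal repeats the upstream signal
# five sampling periods later.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
true_delay = 5
y = np.roll(x, true_delay)
y[:true_delay] = 0.0
found = estimate_delay(x, y, max_delay=20)
```

A genetic algorithm becomes attractive when many delays must be chosen jointly, because the number of combinations then grows too quickly for an exhaustive search.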
[0041] After processing training data stored in the data file 110 , the delay and variable selection module 112 stores delay and variable settings in the module 116 . Similarly, the model parameter module 118 stores configuration data of the hybrid analyzer based on the training of the hybrid development analyzer 114 .
[0042] During the operation of the process control system, the data stored in the delay and variable settings module 116 are loaded into the delay and variable selection module 120 . Similarly, the data from the model parameter module 118 are loaded into the hybrid run-time analyzer or model 122 . Once the configuration data and parameters have been loaded into modules 120 and 122 , the process control is ready to accept data from the DCS 124 .
[0043] Turning now to FIG. 3 , the hybrid development analyzer or model 114 includes a primary analyzer or model 132 and an error correction analyzer or model 136 . In FIG. 3 , some or all input variables 113 and target output variables 115 , from the data file 110 are selected and provided to the primary analyzer 132 and the error correction analyzer or model 136 . An output 133 of the primary analyzer or model 132 and the target output variable 115 are provided to a subtractor 140 to compute the residual, or difference, between the output of the primary analyzer 132 and the target output variable 115 .
[0044] The output of the subtractor 140 is provided as the target output of the error correction analyzer or model 136 . An output 137 of the error correction analyzer or model 136 is provided to one input of an adder 138 . The other input of the adder 138 is connected to the output 133 of the primary analyzer or model 132 . The adder 138 generates a corrected output 139 by summing the primary output 133 with the error correction model output 137 . The parameters estimated in the models 132 and 136 are provided to the model parameter module 118 of FIG. 2 . The data stored in the model parameter module 118 is subsequently provided as the parameters of the hybrid run-time analyzer or model 122 to provide process control during the run-time mode of the system.
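By way of illustration only, the training arrangement of FIG. 3 can be sketched as follows. The sketch makes several assumptions that are not part of the disclosure: the data are synthetic, the primary model is an ordinary least-squares fit rather than a PLS model, and the error correction model is a stand-in network with a fixed random tanh hidden layer whose output weights are solved by least squares rather than by back-propagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the data file 110: a mostly linear process with
# a mild nonlinearity that a linear model cannot capture.
X = rng.uniform(-1.0, 1.0, size=(200, 3))                          # inputs 113
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 * np.sin(3.0 * X[:, 0])   # target 115

def add_bias(A):
    """Append a constant column so each fit includes an offset term."""
    return np.column_stack([A, np.ones(len(A))])

# Primary model 132 (assumption: plain least squares instead of PLS).
primary_coef, *_ = np.linalg.lstsq(add_bias(X), y, rcond=None)
primary_out = add_bias(X) @ primary_coef                           # output 133

# Subtractor 140: the residual becomes the target of the correction model.
residual = y - primary_out

# Error correction model 136 (assumption: random fixed hidden weights).
hidden_w = rng.normal(size=(X.shape[1] + 1, 20))

def hidden(A):
    return np.tanh(add_bias(A) @ hidden_w)

correction_coef, *_ = np.linalg.lstsq(add_bias(hidden(X)), residual, rcond=None)
correction_out = add_bias(hidden(X)) @ correction_coef             # output 137

# Adder 138: corrected output 139.
corrected_out = primary_out + correction_out

primary_rmse = np.sqrt(np.mean((y - primary_out) ** 2))
corrected_rmse = np.sqrt(np.mean((y - corrected_out) ** 2))
```

The quantities `primary_coef`, `hidden_w` and `correction_coef` play the role of the parameters that would be passed to the model parameter module 118 .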
[0045] Turning now to FIG. 4 , the details of the hybrid run-time analyzer or model 122 are disclosed. Similar to the hybrid development analyzer or model 114 , the hybrid run-time analyzer or model 122 includes a primary analyzer or model 130 and an error correction analyzer or model 131 . The internal configuration and parameter settings of the primary analyzer or model 130 and the error correction analyzer or model 131 are provided by the model parameter module 118 . The output of the run-time primary analyzer 130 and the output of the run-time error correction analyzer 131 are provided to an adder 134 . The adder 134 generates a corrected output by summing the output of the primary run-time analyzer 130 with the output of the run-time error correction analyzer 131 . The output of the adder 134 is provided as the input to the DCS 124 of FIG. 2 .
[0046] FIG. 4A shows an alternate embodiment of FIG. 4 . In FIG. 4A, a number of elements are common to those of FIG. 4 . Thus, identically numbered elements in FIGS. 4 and 4 A bear the same description and need not be discussed. In FIG. 4 A, the output 105 of the adder 134 is presented to an adaptive filter 310 to adjust the composite model output from the adder 134 to account for measurement offsets. A number of conventional adaptive filters may be used, including a Kalman filter as known to those skilled in the art and disclosed in G. V. Puskorius and L. A. Feldkamp, “Decoupled Extended Kalman Filter Training of Feedforward Layered Networks”, IEEE Journal, (1991). The adaptive filter 310 also receives as input the controlled variables, among others. Additionally, the output 105 is further presented to a scaling and offset module 312 . The module 312 performs a multiply and cumulate operation on the output of the adaptive filter 310 and the output 105 to generate a corrected, filtered output which more accurately reflects the process dynamics.
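By way of illustration only, the idea of adapting the composite output to a measurement offset can be sketched with a scalar Kalman filter that tracks a slowly drifting offset between the model output and the measured controlled variable. This is a simplified stand-in, not the decoupled extended Kalman filter of the cited reference, and it omits the scaling performed by the module 312 . The function name `track_offset` and the noise variances `q` and `r` are assumptions made for this sketch.

```python
import numpy as np

def track_offset(model_out, measured, q=1e-4, r=1e-2):
    """Scalar Kalman filter for an offset b with measured ~ model_out + b.
    q is the assumed random-walk variance of the offset and r is the
    assumed measurement noise variance."""
    offset, p = 0.0, 1.0                 # offset estimate and its variance
    corrected = np.empty(len(model_out))
    for k in range(len(model_out)):
        p += q                           # predict: offset may have drifted
        gain = p / (p + r)               # Kalman gain
        offset += gain * ((measured[k] - model_out[k]) - offset)
        p *= 1.0 - gain                  # update the estimate variance
        corrected[k] = model_out[k] + offset
    return corrected, offset

# Synthetic example: the measurement carries a constant offset of 2.0.
rng = np.random.default_rng(0)
model_out = np.sin(np.linspace(0.0, 6.0, 300))
measured = model_out + 2.0 + 0.1 * rng.normal(size=300)
corrected, offset = track_offset(model_out, measured)
```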
[0047] The details of the primary analyzer or model 132 will be discussed next. The primary analyzer 132 is preferably a data derived linear analyzer or model. The linear model is advantageous in that process engineers can quantify the relationship between the input variables and the output variables. Thus, process engineers can extrapolate the input data. Further, the primary analyzer or model 132 is data derived such that no prior knowledge of a first principle is necessary. Preferably, the primary analyzer or model 132 is a partial least squares model.
[0048] In chemometrics, partial least squares (PLS) regression has become an established tool for modeling linear relations between multi-variate measurements. As described in Paul Geladi and Bruce R. Kowalski, “Partial Least-Squares Regression: A Tutorial”, Analytica Chimica Acta, Vol. 185, pp. 1-17 (1986), the PLS approach typically uses a linear regression model which relates the model inputs to the outputs through a set of latent variables. These latent variables are calculated iteratively and they are orthogonal to each other. As a result, compared to other linear regression models, the PLS model works well for the cases where input variables are correlated and the data are sparse.
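[0049] In the PLS model, the regression method compresses the predictor data matrix, which contains the values of the predictors for a particular number of samples, into a set of latent variable or factor scores. By running a calibration on one set of data (the calibration set), a regression model is made that is later used for prediction on all subsequent data samples. To perform the PLS regression, input and output data are formulated as data matrices X and Y respectively:
$$X = \begin{bmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & \ddots & \vdots \\ x_{N1} & \cdots & x_{Nm} \end{bmatrix}, \qquad Y = \begin{bmatrix} y_{11} & \cdots & y_{1p} \\ \vdots & \ddots & \vdots \\ y_{N1} & \cdots & y_{Np} \end{bmatrix}$$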
[0050] where each row is composed of one set of observations and N is the number of sets of observations. The PLS model is built on a basis of data transformation and decomposition through latent variables. The input data block X is decomposed as a sum of bilinear products of two vectors, $t_h$ and $p'_h$, in addition to a residual matrix E:
$$X = TP' + E = \sum_{h} t_h p'_h + E$$
[0051] where P′ is made up of the p′ as rows and T of the t as columns. Similarly, the output data block Y is decomposed as
$$Y = UQ' + F = \sum_{h} u_h q'_h + F$$
[0052] where Q′ is made up of the q′ as rows and U of the u as columns, in addition to a residual matrix F. Further, $t_h$ and $u_h$ are called score vectors of the h-th factor, and $p_h$ and $q_h$ are called loading vectors corresponding to these factors. These vectors are computed such that the residual matrices E and F are minimized.
[0053] The PLS model builds a simplified regression model between the scores T and U via an inner relation:
$$u_h = b_h t_h + e$$
[0054] where $b_h$ is a coefficient which is determined by minimizing the residual e. In that case, the regression model is
$$y' = x' W (P'W)^{-1} B Q'$$
[0055] where W is a weighting matrix used to create orthogonal scores and B is a diagonal matrix containing the regression coefficients $b_h$.
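[0056] Turning now to FIG. 5 , the routine to train or develop the PLS primary analyzer or model 132 is disclosed. In step 200 , the input variables are scaled such that the input data X and the output data Y are preferably mean-centered and scaled to unit variance as follows:
$$x_{ij} \leftarrow \frac{x_{ij} - \bar{x}_j}{s_{x_j}}$$
[0057] where
$$\bar{x}_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij}, \qquad s_{x_j} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_{ij} - \bar{x}_j\right)^2}$$
[0058] and
$$y_{ik} \leftarrow \frac{y_{ik} - \bar{y}_k}{s_{y_k}}$$
[0059] with
$$\bar{y}_k = \frac{1}{N}\sum_{i=1}^{N} y_{ik}, \qquad s_{y_k} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(y_{ik} - \bar{y}_k\right)^2}$$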
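By way of illustration only, the scaling of step 200 can be sketched as follows. The sketch assumes the sample standard deviation (divisor N − 1); the function name `autoscale` and the example values are hypothetical. The mean and standard deviation are returned so that the same scaling can be applied to run-time data.

```python
import numpy as np

def autoscale(A):
    """Mean-center each column of A and scale it to unit variance."""
    A = np.asarray(A, dtype=float)
    mean = A.mean(axis=0)
    std = A.std(axis=0, ddof=1)      # assumption: sample standard deviation
    return (A - mean) / std, mean, std

# Hypothetical example: four samples of two variables on different scales.
X_raw = np.array([[1.0, 200.0], [2.0, 220.0], [3.0, 260.0], [4.0, 240.0]])
X, x_mean, x_std = autoscale(X_raw)
```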
[0060] Next, the variables E, F, and h are initialized in step 202 by setting $E_0 = X$, $F_0 = Y$, and h=1. Further, the processing of each latent component h is performed in steps 206 - 226 .
[0061] In step 206 , one column of Y is used as a starting vector for u such that $u_h = y_j$. Next, in the X block, the value of w′ is calculated in step 208 as:
$$w'_h = \frac{u'_h E_{h-1}}{\lVert u'_h E_{h-1} \rVert}$$
[0062] In step 210 , $t_h$ is calculated from $E_{h-1}$ and $w_h$:
$$t_h = E_{h-1} w_h$$
[0063] Next, in the Y block, $q_h$ is calculated from $F_{h-1}$ and $t_h$ in step 212 as follows:
$$q'_h = \frac{t'_h F_{h-1}}{\lVert t'_h F_{h-1} \rVert}$$
[0064] In step 214 , $u_h$ is updated by the following equation:
$$u_h = F_{h-1} q_h$$
[0065] Next, in step 216 , the routine checks for convergence by examining whether the current $t_h$ is equal to the previous $t_h$, within a certain predetermined rounding error. If not, the routine loops back to step 206 to continue the calculations. Alternatively, from step 216 , if the current $t_h$ is equal to the previous $t_h$, the routine calculates the X loadings and obtains the orthogonal X block scores in step 218 . The X loading is computed as follows:
$$p'_h = \frac{t'_h E_{h-1}}{t'_h t_h}$$
[0066] $p_h$ is then normalized, with $t_h$ and $w_h$ rescaled so that the product $t_h p'_h$ is unchanged:
$$p'_{h,\mathrm{new}} = \frac{p'_{h,\mathrm{old}}}{\lVert p'_{h,\mathrm{old}} \rVert}$$
$$t_{h,\mathrm{new}} = t_{h,\mathrm{old}} \, \lVert p'_{h,\mathrm{old}} \rVert$$
$$w'_{h,\mathrm{new}} = w'_{h,\mathrm{old}} \, \lVert p'_{h,\mathrm{old}} \rVert$$
[0067] where $p'_h$, $q'_h$ and $w'_h$ are the PLS model parameters that are saved for prediction by the run-time model; $t_h$ and $u_h$ are scores that are saved for diagnostic and/or classification purposes.
[0068] Next, in step 220 , the routine finds the regression coefficient b for the inner relation:
$$b_h = \frac{u'_h t_h}{t'_h t_h}$$
[0069] Further, the routine of FIG. 5 calculates the residuals in step 222 . In step 222 , for the h component of the X block, the outer relation is computed as:
$$E_h = E_{h-1} - t_h p'_h; \qquad E_0 = X$$
[0070] Further, in step 222 , for the h component of the Y block, the mixed relation is subject to:
$$F_h = F_{h-1} - b_h t_h q'_h; \qquad F_0 = Y$$
[0071] Next, the h component is incremented in step 224 . In step 226 , the routine checks to see if all h components, or latent variables, have been computed. If not, the routine loops back to step 206 to continue the computation. Alternatively, from step 226 , if all h components have been computed, the routine exits. In this manner, regression is used to compress the predictor data matrix, which contains the values of the predictors for a particular number of samples, into a set of latent variable or factor scores. Further, by running a calibration on one set of data (the calibration set), a regression model is made that is later used for prediction on all subsequent samples.
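By way of illustration only, the routine of FIG. 5 can be sketched in NumPy as follows. The sketch rests on several assumptions: the data are taken to be already scaled as in step 200 , the starting vector is taken from the first column of the current Y residual, the inner loop re-enters at the weight calculation using the updated u, and after the loading is normalized the score and weight are rescaled so that the product $t_h p'_h$ is unchanged. The function name `pls_nipals`, the tolerance and the synthetic data are hypothetical.

```python
import numpy as np

def pls_nipals(X, Y, n_components, tol=1e-10, max_iter=500):
    """Sketch of the PLS training routine (steps 202-226)."""
    E, F = X.copy(), Y.copy()                        # step 202
    W, P, Q, b = [], [], [], []
    for _ in range(n_components):                    # steps 206-226
        u = F[:, 0].copy()                           # step 206 (assumed column)
        t_old = np.zeros(X.shape[0])
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)                   # step 208
            t = E @ w                                # step 210
            q = F.T @ t
            q /= np.linalg.norm(q)                   # step 212
            u = F @ q                                # step 214
            if np.linalg.norm(t - t_old) <= tol * np.linalg.norm(t):
                break                                # step 216: converged
            t_old = t
        p = E.T @ t / (t @ t)                        # step 218: X loading
        p_norm = np.linalg.norm(p)
        p, t, w = p / p_norm, t * p_norm, w * p_norm # keeps t p' unchanged
        b_h = (u @ t) / (t @ t)                      # step 220
        E = E - np.outer(t, p)                       # step 222: X residual
        F = F - b_h * np.outer(t, q)                 # step 222: Y residual
        W.append(w); P.append(p); Q.append(q); b.append(b_h)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.column_stack(Q)
    B = np.diag(b)
    # Regression matrix so that y' = x' W (P'W)^-1 B Q'.
    coef = W @ np.linalg.inv(P.T @ W) @ B @ Q.T
    return coef, (W, P, Q, B)

# Synthetic example: four inputs, two outputs, nearly linear relation.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
Y = X @ rng.normal(size=(4, 2)) + 0.01 * rng.normal(size=(60, 2))
coef, (W, P, Q, B) = pls_nipals(X, Y, n_components=4)
Y_hat = X @ coef
```

When as many latent variables are extracted as there are independent inputs, as in this example, the PLS coefficients coincide with an ordinary least-squares fit; the benefit of PLS for correlated or sparse data comes from retaining fewer components.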
[0072] The thus described process of FIG. 5 builds a PLS regression model between the scores t and u via an inner relation
$$u_h = b_h t_h + e$$
[0073] where $b_h$ is a coefficient which is determined by minimizing the residual e. In that case, the regression model is
$$y' = x' W (P'W)^{-1} B Q'$$
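By way of illustration only, the factor $W(P'W)^{-1}$ in the regression model can be motivated as follows, assuming that W and P collect the vectors $w_h$ and $p_h$ as columns and that each score is computed from the deflated input block as $t_h = E_{h-1} w_h$ with $E_{h-1} = X - \sum_{k<h} t_k p'_k$:

$$X w_h = t_h + \sum_{k<h} t_k \,(p'_k w_h)$$

Under this deflation, $p'_h w_h = 1$ and $p'_k w_h = 0$ for $k > h$, so the relations for all factors can be stacked as

$$XW = T\,(P'W) \quad\Longrightarrow\quad T = XW\,(P'W)^{-1}$$

Substituting the scores into the inner relation, with the predicted output block given by $\hat{Y} = TBQ'$, yields for a single observation

$$y' = x'\,W\,(P'W)^{-1}\,B\,Q'$$

so that run-time prediction requires only the stored matrices W, P, B and Q and no iteration.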
[0074] Upon completion of the process shown in FIG. 5 , the parameters are stored in the model parameter module 118 ( FIG. 2 ) for subsequent utilization by the run-time primary analyzer or model 130 ( FIG. 4 ).
[0075] In addition to the aforementioned, the present invention contemplates that the PLS analyzer further accepts filtered variables which better reflect the process dynamics. Additionally, the present invention also contemplates that the primary analyzer or model 132 can compute the derivative of the output 133 and then provide the derivative output to an integrator which outputs second predicted variables. Further, it is also contemplated that the primary analyzer or model 132 can apply splines to map the latent variables to the output variables. In certain applications, the primary analyzer may also accept prior values of the predicted values as inputs, or prior errors between the predicted and target outputs as additional inputs.
[0076] Attention is now directed to the error correction analyzer or model 136 which captures the residual between the primary analyzer or model 132 output and the target output. In the present invention, the neural network serves as a compensator rather than a whole process model for prediction and other purposes. The same architecture is used for the error correction analyzers 131 and 136 . Thus, the description of the neural network applies to both error correction analyzers 131 and 136 . In the embodiment of FIG. 6, a back-propagation neural network is used as the error correction analyzer or model 131 . In certain applications, the error correction analyzer may also accept prior values of the predicted values as inputs, or prior errors between the predicted and target outputs as additional inputs.
[0077] In the embodiment of FIGS. 7 - 8 , a neural network PLS model is used as the error correction analyzer or model 131 . As the error correction analyzers or models 131 and 136 are structurally identical, the description of the neural network PLS error correction analyzer or model 131 applies equally to the description of the neural network PLS error correction analyzer or model 136 .
[0078] FIG. 6 illustrates in more detail a conventional multi-layer, feedforward neural network which is used in one embodiment of the present invention as the error correction analyzer for capturing the residuals between the primary analyzer or model 132 output and the target output 115 . The neural network of FIG. 6 has three layers: an input layer 139 , a hidden layer 147 and an output layer 157 . The input layer 139 has a plurality of input neurons 140 , 142 and 144 . The data provided to the input layer 139 of the neural network model are the same as that supplied to the primary analyzer or model 132 , including the MVs and DVs.
[0079] Although the identical variables provided to the PLS analyzer of FIG. 3 can be used, the present invention contemplates that the input variables may be filtered using techniques such as that disclosed in U.S. Pat. No. 5,477,444, entitled “CONTROL SYSTEM USING AN ADAPTIVE NEURAL NETWORK FOR TARGET AND PATH OPTIMIZATION FOR A MULTIVARIABLE, NONLINEAR PROCESS.” Alternatively, a portion of the variables provided to the primary analyzer 132 is provided to the input layer 139 . Additionally, certain latent variables generated by the primary analyzer 132 can be provided to the input layer 139 . The latent variables can further be filtered, as previously discussed. The error correction analyzer may also use additional process variables which are available, but not used in the primary analyzer. These variables may be used directly or they may further be filtered to capture the process dynamics.
[0080] Correspondingly, the hidden layer 147 has a plurality of hidden neurons 148 , 150 , 152 , and 154 , while the output layer 157 has a plurality of output layer neurons 158 , 160 and 162 . The output of each input neuron 140 , 142 or 144 is provided to the input of each of the hidden neurons 148 , 150 , 152 , and 154 . Further, an input layer bias neuron 146 is connected to each of the hidden layer neurons 148 , 150 , 152 and 154 . Similarly, the output of each of the hidden layer neurons 148 , 150 , 152 and 154 is provided to the input of each of the output layer neurons 158 , 160 and 162 . Further, a hidden layer bias neuron 156 generates outputs which are individually provided to the input of each of the output layer neurons 158 , 160 and 162 . The outputs of the neural network of FIG. 6 are trained to predict the residuals or errors between the output of the primary model and the target variables. Additionally, the input neurons 140 , 142 and 144 may be connected to each of the output units 158 , 160 and 162 .
[0081] The neural network of FIG. 6 is preferably developed using matrix mathematical techniques commonly used in programmed neural networks. Input vectors presented to neurons 140 , 142 and 144 are multiplied by a weighting matrix for each of the layers, the values in the weighting matrix representing the weightings or coefficients of the particular input to the result being provided by the related neuron. The output vector presented to neurons 148 , 150 , 152 and 154 and propagated forward is formed from the sums of these matrix multiplications. Thus, the input layer 139 of FIG. 6 uses the inputs to the neurons 140 , 142 and 144 , along with the value of the bias neuron 146 , as the input vector and produces an output vector which is then used as the input vector for the hidden layer 147 . The outputs of the hidden layer neurons 148 , 150 , 152 and 154 , as well as the bias neuron 156 , are further used to produce an output vector which is used as the values in neurons of the output layer 157 . Preferably, the neurons in the neural network use a hyperbolic tangent transfer function such as $(e^{x} - e^{-x}) / (e^{x} + e^{-x})$ for x values in the range of minus infinity to positive infinity.
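By way of illustration only, the forward propagation through the network of FIG. 6 can be sketched as two matrix multiplications. The sketch assumes three inputs, four hidden neurons and three outputs as drawn, a hyperbolic tangent on the hidden layer and a linear output layer; the choice of a linear output layer, the function name `forward` and the random weights are assumptions, not part of the disclosure.

```python
import numpy as np

def forward(x, w_hidden, w_output):
    """Forward pass of a three-layer network with bias neurons.
    w_hidden has shape (n_inputs + 1, n_hidden) and
    w_output has shape (n_hidden + 1, n_outputs)."""
    x_b = np.append(x, 1.0)             # inputs plus bias neuron 146
    hidden = np.tanh(x_b @ w_hidden)    # hidden neurons 148-154
    h_b = np.append(hidden, 1.0)        # hidden outputs plus bias neuron 156
    return h_b @ w_output               # output neurons 158-162 (assumed linear)

rng = np.random.default_rng(0)
w_hidden = rng.normal(size=(4, 4))      # 3 inputs + bias -> 4 hidden neurons
w_output = rng.normal(size=(5, 3))      # 4 hidden + bias -> 3 output neurons
x = np.array([0.2, -0.5, 1.0])
residual_prediction = forward(x, w_hidden, w_output)
```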
[0082] The neural network of FIG. 6 may be trained through conventional learning algorithms well known to those skilled in the art such as the back-propagation, radial basis functions, or generalized regression neural networks. The neural network is trained to predict the difference between the primary model predictions and the target variables. The outputs are obtained by running the primary model over all available data and calculating the difference between the outputs of the primary model and the target variables for each data point using the neural network training process. Thus, the neural network of FIG. 6 learns how to bias the primary model to produce accurate predictions.
[0083] Further, in the event that the primary analyzer 132 deploys a derivative calculator at the output 133 , the neural network of the error correction analyzer 136 can be trained to predict the error in the derivative of the output 133 of the primary analyzer 132 . Similarly, if the primary analyzer 132 further deploys an integrator to integrate the output of the derivative calculator, the neural network of the error correction analyzer 136 can be further trained to predict the error in the integrated value of the derivative of the output 133 .
[0084] FIG. 7 shows an alternative to the neural network analyzer or model of FIG. 6 , called a neural network partial least squares (NNPLS) error correction analyzer or model. Although highly adaptable, a high-dimensional conventional neural network such as that of FIG. 6 becomes difficult to train when the numbers of inputs and outputs increase. To address the training issue, the NNPLS model does not directly use the input and output data to train the neural network. Rather, the training data are processed by a number of PLS outer transforms 170 , 180 and 190 . These transforms decompose a multivariate regression problem into a number of univariate regressors. Each regressor is implemented by a small neural network in this method. The NNPLS of FIG. 7 can typically be trained more quickly than a conventional multilayer feedforward neural network. Further, the NNPLS reduction of the number of weights to be computed reduces the ill-conditioning or over-parameterization problem. Finally, the NNPLS faces fewer local minima owing to the use of a smaller size network and thus can converge to a solution more quickly than the equivalent multilayer neural network.
[0085] Turning now to FIG. 7 , the schematic illustration of the NNPLS model is shown in more detail. As the error correction analyzers or models 131 and 136 are structurally identical, the description of the neural network PLS error correction analyzer or model 131 applies equally to the description of the neural network PLS error correction analyzer or model 136 . In FIG. 7, a PLS outer analyzer or model 170 is used in conjunction with a neural network 172 for solving the first factor. Thus, in the combination of the PLS 170 and the neural network 172 , the PLS outer analyzer or model 170 generates score variables from the X and Y matrices. The scores are used to train the inner network analyzer or model 172 . The neural network 172 can be multilayer feed forward networks, radial basis functions, or recurrent networks. The output of the neural network 172 is applied to the respective variables X and Y using the summing devices 176 and 174 respectively. The outputs from the summer 174 , F 1 , and 176 , E 1 , are provided into the next stage for solving the second factor solution.
[0086] In the analyzer or model of FIG. 7 , the outputs of the first PLS outer model 170 and the neural network 172 , F 1 and E 1 , are provided to a second combination including a PLS outer model 180 and a neural network 182 . The PLS outer model 180 receives F 1 and E 1 as inputs. The outputs from the PLS outer model 180 are provided to train the neural network 182 . Further, the outputs of the neural network 182 are provided to summers 184 and 186 to generate outputs F 2 and E 2 , respectively. Further, a number of additional identical stages can be cascaded in a similar manner. At the end of the network of FIG. 7 , the outputs from the summers generating Fi and Ei are provided to a final PLS outer model 190 . The output of the final PLS outer model 190 is used to train a final neural network 192 .
[0087] As shown, in each stage of the NNPLS of FIG. 7 , original data are projected factor by factor to latent variables by outer PLS models before they are presented to inner neural networks which learn the inner relations. Using such a plurality of stages, only one inner neural network is trained at a time, which simplifies training and reduces the training time normally associated with conventional neural networks. Further, the number of weights to be determined is much smaller than that in an m-input/p-output problem when the direct network approach is used. By reducing the number of weights, the ill-conditioning or over-parameterization problem is circumvented. Also, the number of local minima is expected to be smaller owing to the use of a smaller network. Additionally, as the NNPLS is equivalent to a multilayer neural network such as the neural network of FIG. 6 , the NNPLS model captures the non-linearity and keeps the PLS projection capability to attain a robust generalization property.
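The reduction in the number of weights can be illustrated with a short count. This is a minimal sketch assuming fully connected networks with one hidden layer and one bias per hidden and output neuron; the function names and the problem size are hypothetical.

```python
def direct_network_weights(m: int, p: int, k: int) -> int:
    """Weights and biases in one m-input, p-output network with k hidden neurons."""
    # (m inputs + 1 bias) feed each of k hidden neurons,
    # (k hidden + 1 bias) feed each of p output neurons.
    return (m + 1) * k + (k + 1) * p


def siso_network_weights(k: int) -> int:
    """Weights and biases in one single-input, single-output network (3k + 1)."""
    return direct_network_weights(1, 1, k)


if __name__ == "__main__":
    m, p, k = 20, 5, 10  # hypothetical problem size, for illustration only
    print("direct network:", direct_network_weights(m, p, k))
    print("one inner SISO network (3 hidden):", siso_network_weights(3))
```

Because only one inner network is trained at a time, each optimization in the staged approach involves the smaller count, rather than all weights of the direct network at once.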
[0088] Referring now to FIG. 8 , an inner single input single output (SISO) neural network representative of each of the neural networks 172 , 182 and 192 ( FIG. 7 ) is shown in greater detail. Preferably, a three layer feed forward neural network with one hidden layer should be used as the inner SISO nonlinear model. Each neuron in the hidden layer of the neural network preferably exhibits a sigmoidal function such as the following centered tanh function:
$$\sigma(z) = \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$
[0089] with which a zero input leads to a zero output. This is consistent with the following specific properties of the PLS inner model:
$$\sum_{i} u_{hi} = 0, \qquad \sum_{i} t_{hi} = 0$$
[0090] where u hi and t hi are the ith elements of u h and t h , respectively.
[0091] In FIG. 8 , the input data are presented to an input neuron 272 . The input neuron 272 further stores a weighting factor matrix ω 1 . Also, at the input layer level, an input bias neuron 270 stores a weighting factor matrix β 1 .
[0092] The SISO network of FIG. 8 has a hidden layer having a plurality of hidden neurons 282 , 284 , 286 and 288 . Each of the hidden neurons receives as inputs the summed value of the data presented to the input neuron 272 , as vector-multiplied with the weighting factor matrix ω 1 . Further, each of the hidden neurons receives as an input the value stored in the bias neuron 270 , as vector-multiplied with the weighting factor matrix β 1 . In general, the number of hidden neurons is associated with the complexity of the functional mapping from the input to the output. Too few hidden neurons would under-parameterize the problem, or cause the model to fail to learn all conditions presented to it. Alternatively, too many hidden neurons would result in an over-parameterized model where the neural network is overtrained and suffers from over-memorization of its training data. In the preferred embodiment, a cross-validation or train/test scheme is used to determine the optimal number of hidden neurons.
[0093] Finally, the SISO network of FIG. 8 has an output layer having one output neuron 290 . The output neuron 290 receives as inputs the summed value of the data stored in the hidden neurons 282 - 288 , as vector-multiplied with the weighting factor matrix ω 2 . Further, the output neuron 290 receives as input the value stored in the bias neuron 280 , as vector-multiplied with the weighting factor matrix β 2 . The output of the neuron 290 is Û h . The SISO network of FIG. 8 is thus smaller than the conventional neural network and can be trained more quickly.
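The forward pass through the SISO network described above can be sketched as follows. This is a minimal illustration that assumes a tanh hidden activation and a linear output neuron; the function and parameter names (`siso_forward`, `w1`, `b1`, `w2`, `b2`, standing for ω 1 , β 1 , ω 2 and β 2 ) are hypothetical.

```python
import math
from typing import Sequence


def siso_forward(t: float,
                 w1: Sequence[float], b1: Sequence[float],
                 w2: Sequence[float], b2: float) -> float:
    """Return the predicted output score for a single input score t.

    w1, b1: input weight and bias seen by each hidden neuron.
    w2, b2: output weights on the hidden neurons and the output bias.
    """
    # Each hidden neuron applies tanh to its weighted input plus bias.
    hidden = [math.tanh(w * t + b) for w, b in zip(w1, b1)]
    # The output neuron (assumed linear) sums the weighted hidden activations.
    return sum(v * h for v, h in zip(w2, hidden)) + b2


if __name__ == "__main__":
    # With zero biases, a zero input gives a zero output.
    print(siso_forward(0.0, [0.5, -1.0], [0.0, 0.0], [2.0, 0.3], 0.0))
```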
[0094] Due to its small size, the SISO neural network can be trained quickly using a variety of training processes, including the widely used back-propagation training technique. Preferably, the SISO network of FIG. 8 uses a conjugate gradient learning algorithm, because it learns much faster than the back-propagation approach and because its learning rate is calculated automatically and adaptively, so that it need not be specified before training.
[0095] Prior to training, the SISO network needs to be initialized. When using the preferred conjugate gradient training process, the SISO network will seek the nearest local minimum from a given initial point. Thus, rather than using the conventional random-valued network weight initialization, the preferred embodiment initializes the SISO network using the linear PLS process, which takes the best linear model between u h and t h to initialize the first hidden node of the network. Additional hidden nodes are then initialized with small random numbers.
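One way such an initialization might look is sketched below. The text does not say how the linear model is mapped onto a tanh node; this sketch assumes a small input weight `scale` so that tanh is nearly linear, with the output weight set to the slope divided by `scale`. All names, the `scale` and `noise` values, and the no-intercept least-squares fit (which relies on mean-centered scores) are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple

Weights = Tuple[List[float], List[float], List[float], float]


def linear_inner_slope(t: Sequence[float], u: Sequence[float]) -> float:
    """Least-squares slope b of u ~ b * t, without intercept."""
    return sum(ti * ui for ti, ui in zip(t, u)) / sum(ti * ti for ti in t)


def init_siso_weights(t: Sequence[float], u: Sequence[float], n_hidden: int,
                      scale: float = 0.1, noise: float = 0.01,
                      seed: int = 0) -> Weights:
    """Return (w1, b1, w2, b2) with the first hidden node set from the linear fit."""
    rng = random.Random(seed)
    slope = linear_inner_slope(t, u)

    def small() -> float:
        return rng.uniform(-noise, noise)

    extra = n_hidden - 1
    # First node: (slope / scale) * tanh(scale * t) ~ slope * t for small scale * t.
    w1 = [scale] + [small() for _ in range(extra)]
    b1 = [0.0] + [small() for _ in range(extra)]
    w2 = [slope / scale] + [small() for _ in range(extra)]
    return w1, b1, w2, 0.0


if __name__ == "__main__":
    print(init_siso_weights([-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], n_hidden=3))
```

Starting from the linear solution means that training begins at a point at least as good as the linear PLS inner model, which matters when the optimizer only finds the nearest local minimum.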
[0096] Turning now to FIG. 9 , the routine for selecting the number of hidden neurons of FIG. 8 is shown. In the preferred training scheme, the available data for modeling are divided into two sets, training data and testing data, which are then transformed into corresponding score variables {t h } and {u h }. The inner network training starts with one hidden node in step 292 , and the network is trained using the training data set in step 294 . Next, the training routine tests whether more hidden neurons are required, based on the prediction error on the test data set, in step 296 . Additional hidden neurons can be added in step 298 , and the efficacy of the additional neurons can be tested by checking the deviation of the SISO network from the expected results. From step 298 , the routine loops back to step 294 to check the efficacy of the new arrangement. The routine stops adding hidden neurons when the best prediction error for the test data set has been obtained from step 296 . By adding a sufficient number of hidden neurons, but not too many, to the network, the optimal number of hidden neurons is achieved.
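The selection loop of FIG. 9 can be sketched as follows. The callback `test_error(n)` is hypothetical: it stands for training a network with n hidden neurons on the training scores and returning its prediction error on the test scores. Stopping at the first non-improving step, and the `max_hidden` cap, are assumptions; the text only says that the routine stops once the best test error has been obtained.

```python
from typing import Callable


def select_hidden_neurons(test_error: Callable[[int], float],
                          max_hidden: int = 20) -> int:
    """Grow the hidden layer one neuron at a time while test error improves."""
    best_n, best_err = 1, test_error(1)      # start with one hidden node
    for n in range(2, max_hidden + 1):       # add one hidden neuron
        err = test_error(n)                  # retrain, then check on test data
        if err >= best_err:                  # no improvement: stop adding
            break
        best_n, best_err = n, err
    return best_n


if __name__ == "__main__":
    # Stub error curve standing in for real training runs.
    errors = {1: 0.50, 2: 0.30, 3: 0.20, 4: 0.25, 5: 0.10}
    print(select_hidden_neurons(lambda n: errors[n], max_hidden=5))
```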
[0097] Turning now to FIG. 10 , the process for training the NNPLS model of FIG. 7 is shown. The NNPLS model is trained within a framework similar to that of the PLS model described previously. In step 230 , the input data X and the output data Y are scaled, preferably by mean-centering and scaling to unit variance, as follows:
$$x_{ij} = \frac{x_{ij} - \bar{x}_j}{S_j^{x}}$$
[0098] where
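$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad S_j^{x} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^{2}}$$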
[0099] and
$$y_{ij} = \frac{y_{ij} - \bar{y}_j}{S_j^{y}}$$
[0100] with
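$$\bar{y}_j = \frac{1}{n}\sum_{i=1}^{n} y_{ij}, \qquad S_j^{y} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(y_{ij} - \bar{y}_j\right)^{2}}$$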
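The scaling of step 230 can be sketched in a few lines. This is a minimal illustration that assumes the sample standard deviation (n − 1 denominator) and data stored as a list of rows; the function name `autoscale` is hypothetical. A constant column would give a zero standard deviation and is not handled here.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def autoscale(rows: Sequence[Sequence[float]]
              ) -> Tuple[List[List[float]], List[float], List[float]]:
    """Mean-center each column and scale it to unit variance.

    Returns the scaled rows together with the column means and standard
    deviations, which are needed to scale new data the same way.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]   # sample standard deviation
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)]
              for row in rows]
    return scaled, means, stds


if __name__ == "__main__":
    X = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]  # made-up sample data
    X_scaled, x_mean, x_std = autoscale(X)
    print(X_scaled)
```

The same function would be applied separately to X and to Y before the residual matrices are initialized.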
[0101] Next, the variables E, F, and H are initialized in step 232 by setting E 0 =X, F 0 =Y, and h=1. Further, the processing of each latent component h is performed in steps 234 - 252 .
[0102] In step 234 , one column of Y is used as a st