[0001] The present invention relates to a computer-implemented system and method to process one or more time-series signals into representative piece-wise discrete trends. Applications include processing prices from financial data such as stocks, options, futures, commodities, currencies, indices, foreign government and municipal securities, bonds, treasury bills, foreign exchange rates, and interest rates; identifying features on a digital electrocardiogram (EKG) output such as anomalous trend lengths, amplitudes, shapes, or coefficients indicative of a problem with the patient; and identifying features from digital signals measured from machinery to investigate and identify problems with the machinery.
[0002] The prior art includes computational procedures for filtering noisy data. Kostelich and Yorke considered the motion of points around a phase space attractor. (Kostelich, Eric. J. and James A. Yorke, 1990, “Noise reduction: finding the simplest dynamical system consistent with the data”,
[0003] U.S. Pat. No. 5,537,368 issued Jul. 16, 1996 to O'Brien Jr. et al., considered the problem of detecting signals from an underwater target. They derived a signal processing system consisting of a series of Kalman filters in parallel to process an incoming signal. Each of the Kalman filters fit a polynomial trend of order i (i=0, 1, 2, . . . ) to a data stream. The coefficients of the polynomial, the data stream, and fit values representing the degree of fit of the polynomial to the data stream were exported to a target motion system to determine the velocity and position of a target.
[0004] U.S. Pat. No. 5,956,702 issued Sep. 21, 1999 to Matsuoka and Golea, noted the problems with using a Kalman filter to process discontinuous changes in the time series trend, and taught a neural network to estimate a discontinuously changing time series trend. Wolberg discussed using non-linear kernel regression to estimate prices in the future. (Wolberg, John R.,
[0005] The regressions are performed over a window that can either move or stay fixed as new points are added. He introduced the concept of a “growing option”, where the values of new points are included in the regression for predicting the next point. After each point is predicted, it is then included in the prediction of the next point. Wolberg did not discuss discontinuities in the regressions, but did point out that regression models that are created in one period of time may not be useful in the future.
[0006] An embodiment of this invention relates to processing discrete financial time series signals such as stock prices, options prices, futures prices, commodities prices, currency exchange rates, economic and financial indices, foreign government and municipal security prices, bond prices, treasury bill prices, foreign exchange rates, and interest rates. The invention is not restricted to financial applications. In this application, the term “time series” refers to both its literal definition of a string of values, representative of some process continuing through time, each separated by a constant time interval, dt; and a vector of numbers separated by a constant interval or a pair of vectors where one of the vectors contains the location of the data elements in the second vector. In this application, the elements of a time series may be single data points or sets of data points. The vector's data elements need not be equally spaced. For example, a pair of vectors could include the land elevations and locations of those elevations along a trajectory that extends from Dallas to San Francisco.
[0007] Over a period of time, these types of time series exhibit fluctuations, which can be characterized by a plurality of discrete “trends”, where each trend represents a sequence of data elements when the data is generally increasing or decreasing at a certain linear or non-linear rate. Typically the data is “noisy” in that the data points may each deviate from the values defined by the trend. Typically, these trends do not last indefinitely, and it is of great interest to financial managers and others to determine when a trend has ended.
[0008] The current invention addresses the analysis of time series data in a manner that provides useful information about the trends for a particular data set and how accurately those trends track the actual data. The method is based on the observation that while many different sets of discrete trends can be computed for a particular data set, a few of those trend sets are much better at tracking subsequent data than others. In general, the method produces a relatively large number of discrete trend sets for a particular data set; evaluates how well each set tracks the time series over which it was tested; and selects a few of those trend sets that are most effective in that tracking. The method then employs those more effective trend sets to evaluate subsequent data.
[0009] In one embodiment a particular set or vector of time series data may be input to a processor for analysis.
[0010] In an alternative embodiment, the processor may access and analyze multiple sets of time series data to determine one or more sets which meet a set of selection criteria. Typically these selection criteria are related to relatively strong “trending” over a period of time, and the detection can often be made by a simple first-order, linear regression fit to the data. In that case, the time series datasets whose regressions have larger slopes and higher correlations with the time series frequently garner more interest.
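The screening described above can be sketched as follows. This is a minimal illustration and not the claimed procedure: the function names, the `min_abs_r` cutoff, and the sample data are hypothetical, and the slope is left unnormalized, so series on different price scales would need to be normalized before their slopes are compared.

```python
from math import sqrt

def slope_and_correlation(y):
    """Least-squares slope and Pearson correlation of y against its index 0..n-1."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    syy = sum((v - y_mean) ** 2 for v in y)
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    slope = sxy / sxx
    r = sxy / sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, r

def rank_by_trend(datasets, min_abs_r=0.8):
    """Keep data sets with |r| >= min_abs_r (hypothetical cutoff) and
    return their names ordered by descending absolute slope."""
    scored = []
    for name, y in datasets.items():
        slope, r = slope_and_correlation(y)
        if abs(r) >= min_abs_r:
            scored.append((abs(slope), name))
    return [name for _, name in sorted(scored, reverse=True)]

# Hypothetical sample data: a steady rise, a noisy flat series, a steady decline.
datasets = {
    "A": [10, 11, 12, 13, 14, 15],
    "B": [10, 12, 9, 11, 10, 12],
    "C": [20, 18, 16, 14, 12, 10],
}
print(rank_by_trend(datasets))
```

In this sample, series B is rejected because its correlation with the fitted line is weak, and C ranks ahead of A because its absolute slope is larger.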
[0011] The selected time series is fed into the processor to derive the optimum parameters for processing the time series. A subset of the time series is used as a “training” set. Inherent in deriving the parameters is the actual processing of the time series into discrete trends. One goal is to find the parameters that minimize the error between the time series and trend values, while maximizing the average length of the trends. Another goal is to find the trends that generate the greatest return from the beginning to the end of the trend. The two goals are not necessarily the same.
[0012] Once the parameters have been selected, they are applied to the time series. In this step, an initial window size (m
[0013] Once the trends have been identified, the trends, the original time series, and other results are exported for further optional processing and to calculate values such as the average length of the trends, the beginning and ending times and/or dates, the beginning and ending data values, and the average of the first derivatives of the trends.
[0014] In applying this tool to multiple financial time series, such as stock prices and interest rate indices, it is apparent that the optimum trend determination parameters and the resulting trends differ from stock to stock. Some stocks are strongly trending, while others display other recognizable characteristics. Some stocks seem to display no apparent patterns after processing with this tool with the options available at this time. This tool has the ability to extract patterns from some seemingly random time series. While the patterns aren't deterministic, there seems to be a character or “personality” in regards to the length and shape of the patterns for many of the stocks processed so far. The trends resulting from different sets of trend determination parameters for a given model illustrate different aspects of that “personality”.
[0015] Specific trends either move up or down or are flat. These different trends identify different “states” of the process. It is reasonable to assume that within a trend some mathematical methods of prediction or trading might be realized, and that a method of prediction or trading would not necessarily work once the trend had ended. Certain trading systems that work within a flat trend might not work for a trend moving up or down.
[0016] While individual models, each based on a single set of trend determination parameters, may do moderately well in defining when trends begin and end, better models may be achieved by combining the trends from different models to produce a composite model. For example, frequently after a single trend has ended, several very short trends will follow. Some of the trend algorithms described in this application do not perform well while prices are oscillating within a lateral range. They will produce numerous short trends in these models, which would not be wise to trade. Therefore, several models could be combined so that, once a trend ended, the next trend selected would be the one that had lasted the longest, that is, had the most data elements, up to that point.
[0017] The methods described in this application only take into account the previous values of a time series or vector to evaluate a new data element to determine whether it is within a trend or not. These methods can be improved to include other information that might affect the prices of a security, including other financial information. One way of doing this is to extend the regressions described in this application to multiple regression with additional independent variables.
[0018] One of the outputs of this process is a set of normalized deviations from the dynamic trend. This time series oscillates between positive and negative values and could be used with thresholds as a trading indicator.
[0019] If this method is able to extract a deterministic component of price movement from securities that can be modeled and simulated, then this has ramifications in regards to not only being able to make predictions about the future value of security prices, or to define different “states” in which to trade, but also to how options are priced.
[0020] The above-mentioned and other objects and features of the present invention will become apparent from a reading of the following detailed description with reference to the accompanying drawings, in which:
[0021]
[0022]
[0023]
[0024]
[0025]
[0026]
[0027]
[0028]
[0029]
[0030]
[0031]
[0032]
[0033] Selecting a Time Series Data Set
[0034] In this embodiment, a plurality of time series data sets is analyzed in order to identify one or more data sets for further analysis. Referring now to
[0035] Referring now to
[0036] In step
[0037] In step
[0038] In step
[0039] In step
[0040] In step
[0041] Alternate methods of data set selection may be used such as analyzing both the trend and positive or negative fundamental information. For example, a stock that has both an increasing trend, and a low price to earnings ratio may be a more desirable candidate than an overpriced stock whose trend is just as strong. Similarly overpriced stocks may become good candidates for selling short once they start trending down.
[0042] Generating the Trend Determination Parameters
[0043] Referring again to
[0044] Each set of trend determination parameters in combination with the particular data fitting procedure, such as polynomial regression, produces a unique set of trends for a time series data set. At this step of the method, many different sets of trend determination parameters preferably are used to produce corresponding sets of trends. These multiple sets of trends are then evaluated to determine which of the sets are better at matching the current data from the time series data set.
[0045] The trend determination parameters can be selected in several ways. In this embodiment, a subset of the time series, the first M
[0046] Referring now to
[0047] M1MIN=3
[0048] M1MAX=70
[0049] M1INC=1
[0050] LSMIN=1
[0051] LSMAX=20
[0052] LSINC=0.5
[0053] The trend determination minimum, maximum, and increment parameters are determined through experience. If either of the selected trend determination parameter values is equal to M1MAX or LSMAX for a selected trend set, then this would indicate the need to increase the LSMAX or M1MAX values. The M1INC and LSINC are selected with a tradeoff between accuracy and time. M1INC cannot be smaller than 1. The smaller the values for the increments, M1INC and LSINC, the longer it takes to process all of the trends. However, with smaller increments, there is an increased likelihood of obtaining trend parameters that produce more accurate trends. The lower bound Initial Window Size, M1MIN, is limited by the order of the polynomial that one is using for the trend. For a linear trend (a line) one needs at least two points to fit a line. For second-order polynomials, one needs three points to fit a curve to the data points.
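The parameter sweep implied by the minimum, maximum, and increment values above can be sketched as follows. The helper names are hypothetical, and the sketch only enumerates the candidate parameter pairs; it does not process any trends.

```python
def frange(start, stop, step):
    """Inclusive range; counts steps as integers to avoid floating-point drift."""
    count = int(round((stop - start) / step)) + 1
    return [start + k * step for k in range(count)]

# Values as listed in the example above.
M1MIN, M1MAX, M1INC = 3, 70, 1
LSMIN, LSMAX, LSINC = 1, 20, 0.5

def min_window_for_order(order):
    """A polynomial of a given order needs at least order + 1 points."""
    return order + 1

def needs_wider_bounds(m1, ls):
    """True when a selected parameter sits on the upper bound of the sweep,
    which suggests that M1MAX or LSMAX should be increased."""
    return m1 == M1MAX or ls == LSMAX

m1_values = frange(M1MIN, M1MAX, M1INC)
ls_values = frange(LSMIN, LSMAX, LSINC)
grid = [(m1, ls) for m1 in m1_values for ls in ls_values]
print(len(m1_values), len(ls_values), len(grid))
```

With the listed values the sweep covers 68 window sizes and 39 LS values, or 2652 parameter sets in total, which illustrates why smaller increments lengthen processing time.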
[0054] In step
[0055] the number of trends in the subset of the time series (N
[0056] the RMS Error between the input data values and trend values;
[0057] the average length of the trends;
[0058] the average trend length/m
[0059] the average return (%) of the trends;
[0060] the cumulative return (%) of the trends;
[0061] the fraction of correct predictions, where the initial trend was increasing and the ending price was higher than the initial price, or the initial trend was decreasing and the ending price was lower than the initial price at the beginning of the trend;
[0062] the fraction of incorrect predictions, where the initial trend was increasing and the last price was lower than the initial price, or the initial trend was decreasing and the last price was higher than the initial price;
[0063] the RMS Error/(Average length of the trends/m
[0064] and the RMS error *L
[0065] the efficiency of the trends (%), where efficiency is defined as the average return of the trends (%)/average length of the trends;
[0066] compounded return of the trends (%)
[0067] Each of these values is stored in the computer with its corresponding set of trend determination parameters (m
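Several of the evaluation values listed above can be sketched as follows. This is an illustration under stated assumptions: the per-trend record layout (`start`, `end`, `length`, `rising`) is hypothetical, the return of a trend is assumed to be the percentage change from its first to its last data value, and the compounded return is assumed to be the product of the per-trend growth factors.

```python
from math import sqrt

def rms_error(data, trend):
    """Root-mean-square error between input data values and trend values."""
    return sqrt(sum((d - t) ** 2 for d, t in zip(data, trend)) / len(data))

def trend_set_metrics(trends):
    """Summarize one trend set. Each trend is a dict with hypothetical keys:
    start, end (data values), length (points), rising (initial direction)."""
    n = len(trends)
    returns = [(t["end"] - t["start"]) / t["start"] * 100 for t in trends]
    avg_length = sum(t["length"] for t in trends) / n
    avg_return = sum(returns) / n
    growth = 1.0
    for r in returns:
        growth *= 1 + r / 100          # assumed definition of compounding
    correct = sum(
        1 for t in trends
        if (t["rising"] and t["end"] > t["start"])
        or (not t["rising"] and t["end"] < t["start"])
    )
    incorrect = sum(
        1 for t in trends
        if (t["rising"] and t["end"] < t["start"])
        or (not t["rising"] and t["end"] > t["start"])
    )
    return {
        "number_of_trends": n,
        "average_length": avg_length,
        "average_return_pct": avg_return,
        "compounded_return_pct": (growth - 1) * 100,
        "fraction_correct": correct / n,
        "fraction_incorrect": incorrect / n,
        # Efficiency as defined above: average return / average length.
        "efficiency": avg_return / avg_length,
    }

# Hypothetical trend set for illustration only.
trends = [
    {"start": 100, "end": 110, "length": 10, "rising": True},
    {"start": 110, "end": 99, "length": 5, "rising": False},
    {"start": 100, "end": 95, "length": 5, "rising": True},
]
metrics = trend_set_metrics(trends)
print(metrics)
```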
[0068] Selecting a Set of Useful Trend Determination Parameters
[0069] Referring again to
[0070] In a semi-automated method of this embodiment, the RMS error is crossplotted against the average trend length and a subset of the more effective parameter sets is identified from that plot. Referring now to
[0071] In other embodiments, other types of error measures such as other λ
[0072] Processing the Time Series with Trend Determination Parameter Sets
[0073] Referring again to
[0074]
[0075] A new trend is defined to have begun when one of the following occurs:
[0076] The derivative of the trend changes from positive to negative or vice versa; or
[0077] The number of deviations that the difference between the last closing price or sequential data element and the predicted trend value meets or exceeds L
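The two conditions above can be sketched as a single test. This is a minimal sketch under assumptions: the threshold symbol is truncated in the text above, so the parameter is simply called `threshold`; the deviation is assumed to be normalized by dividing by a spread measure such as the standard deviation of the residuals; and the comparison is assumed to apply to the absolute value of the normalized deviation.

```python
def trend_changed(prev_slope, slope, actual, predicted, spread, threshold):
    """Return True when either trend-change condition holds.

    prev_slope, slope: first derivative of the trend before and after the
        newest data element (hypothetical argument names).
    actual, predicted: newest data element and the trend's predicted value.
    spread: spread of the residuals around the trend (assumed divisor).
    threshold: allowed number of deviations (assumed to be compared against
        the absolute normalized deviation).
    """
    sign_flip = prev_slope * slope < 0
    normalized = (actual - predicted) / spread if spread > 0 else 0.0
    return sign_flip or abs(normalized) >= threshold

# A point 3 spreads below an uptrend, tested with a threshold of 2.5.
print(trend_changed(0.4, 0.3, actual=94.0, predicted=100.0, spread=2.0, threshold=2.5))
```

A slope of exactly zero is not treated as a sign change in this sketch; whether a flat trend should count as a change of direction is left open here.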
[0078] Referring to
[0079] In step
[0080] Initialize the trend determination parameters in step
[0081] Set j, the time series index, to m
[0082] Set n, the number of points in the current trend to 0 in step
[0083] For each trend, check to see if the current value, y
[0084] For the First Trend
[0085] In step
[0086] In step
[0087] In step
[0088] at the point x
[0089] In step
[0090] Calculate the residuals, d
[0091] Calculate the standard deviation, s
[0092] Calculate the deviation between the predicted trend value and the actual value, y
[0093] Normalize the deviation, d
[0094] Test to determine whether the trend has changed at step
[0095] The trend also changes if the normalized deviation, d′
[0096] If the trend does not change at step
[0097] If the trend does change at step
[0098] Increment j at step
[0099] At step
[0100] For Subsequent Trends
[0101] Select m
[0102] In step
[0103] at the point x
[0104] In step
[0105] Calculate the residuals, d
[0106] Calculate the standard deviation, s
[0107] Calculate the deviation between the predicted trend value at step
[0108] Normalize the deviation at step
[0109] At step
[0110] The procedure preferably outputs the following arrays for each trend parameter set. There are N
[0111] Date or index array
[0112] Data array (y
[0113] Data value change=ln(y
[0114] Trend codes array
[0115] d′, the array of normalized deviations from the trend.
[0116] Array of minimum closing prices allowed based on trend
[0117] Array of maximum closing prices allowed based on trend
[0118] Array of the first derivatives of the dynamic trend
[0119] Array of the first derivatives of the normalized dynamic trend. The data values at the beginning of the trends are normalized to one, so that the initial trend data value is one for all trends. The derivative of the trend fit to the normalized data values can be compared to the derivatives of the other trends for filtering purposes. Alternate methods of normalization may be used as described previously. These can include: a) dividing the time series values in the trend by their mean or median; b) ranking the values in the trend and using the ranks instead of the time series values themselves in the regression; or c) calculating the Fourier transform of the time series values in the trend, zeroing the zero-th frequency contribution, and back-transforming the series to the time domain.
[0120] Array of the first derivatives of the final trend.
[0121] Array of correlation coefficients (R) of the final trend line through the data values for each index i.
[0122] The array of final trend line values once a trend has been finalized.
[0123] An array of the differences between the data values and the trend values.
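The normalization alternatives named in the description of the normalized dynamic trend above can be sketched as follows. The function names are hypothetical, ties in the ranking are broken by position (an assumption), and a naive discrete Fourier transform is used in place of a fast transform to keep the sketch dependency-free. Zeroing the zero-th frequency term and back-transforming is equivalent to subtracting the mean of the values.

```python
import cmath
from statistics import mean, median

def normalize_by_first(values):
    """Scale so the first value in the trend equals one."""
    return [v / values[0] for v in values]

def normalize_by_center(values, center=mean):
    """Divide by the mean (default) or the median of the trend's values."""
    c = center(values)
    return [v / c for v in values]

def normalize_by_rank(values):
    """Replace each value by its rank, 1 = smallest (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

def remove_zero_frequency(values):
    """Forward DFT, zero the zero-th frequency term, inverse DFT."""
    n = len(values)
    spectrum = [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    spectrum[0] = 0
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]

segment = [50.0, 55.0, 45.0, 50.0]   # hypothetical trend segment
print(normalize_by_first(segment))
print(normalize_by_center(segment, median))
print(normalize_by_rank([30, 10, 20]))
print(remove_zero_frequency([1.0, 2.0, 3.0, 6.0]))
```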
[0124]
[0125] Determining Attributes of Each Trend
[0126] Referring again to
[0127] Referring now to
[0128] Determining the beginning date or index for each trend at step
[0129] Determining the ending date or index for each trend at step
[0130] Determining the beginning data value in each trend at step
[0131] Determining the ending data value in each trend at step
[0132] Determining the trend length at step
[0133] Determining the average trend correlation at step
[0134] Setting a flag at step
[0135] Determining the fraction of changes in data values that were increasing relative to the previous data value for each trend at step
[0136] Determining the fraction of changes in data values that were decreasing relative to the previous data value for each trend at step
[0137] Determining the average slopes or first derivatives of each trend at step
[0138] In this embodiment, a series of arrays is output, each containing N
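The attribute calculations listed above can be sketched as follows. This is an illustration only: the layout of the trend codes array is not specified in the text, so the sketch assumes one integer trend identifier per data element, and the fractions of increasing and decreasing changes are computed from changes within each trend only.

```python
from itertools import groupby

def trend_attributes(values, trend_ids):
    """Per-trend attributes from data values and an assumed per-element trend id."""
    attributes = []
    start = 0
    for _, group in groupby(trend_ids):
        length = len(list(group))
        segment = values[start:start + length]
        changes = [b - a for a, b in zip(segment, segment[1:])]
        n = len(changes)
        attributes.append({
            "begin_index": start,
            "end_index": start + length - 1,
            "begin_value": segment[0],
            "end_value": segment[-1],
            "length": length,
            "fraction_increasing": sum(c > 0 for c in changes) / n if n else 0.0,
            "fraction_decreasing": sum(c < 0 for c in changes) / n if n else 0.0,
        })
        start += length
    return attributes

# Hypothetical data: a four-point trend followed by a three-point trend.
values = [10.0, 11.0, 10.5, 12.0, 11.0, 9.0, 8.0]
trend_ids = [0, 0, 0, 0, 1, 1, 1]
for row in trend_attributes(values, trend_ids):
    print(row)
```

Each returned record corresponds to one trend, so collecting a single key across the records yields one array with one entry per trend.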
[0139] Implications of the End of a Trend
[0140] One of the uses of this invention is to provide an unbiased method for assessing whether the fluctuations in the prices of a financial instrument are normal fluctuations around a trend, or whether they are indicative of a change in the trend itself. In the first case, negative deviations from the trend provide buying opportunities if the trend is positive. In the second case, where the tool indicates that the trend itself is changing, the change provides an opportunity to reassess the potential direction of prices for a financial instrument. Other tools, such as fundamental analysis of a stock's potential, could be used to determine whether a position in a stock should be changed. For example, assume that a stock is in an uptrend. A downward change in the trend would provide a signal that the trend has changed and that the position needs to be re-evaluated. In another example, if the stock has a positive fundamental position, but the stock is in a downtrend, this tool would be used to assess when the downtrend has ended and signal an opportunity to purchase the stock at an even cheaper price.
[0141] This alternate embodiment relates to steps
[0142] For example, (x
[0143] Selecting a Vector Data Set
[0144] In this embodiment, as in the first embodiment, a plurality of vector data sets is analyzed in step
[0145] As in the first embodiment, the normalized derivatives and correlation coefficients are recorded. Upon such analysis of each of the data sets, a data set is selected for further analysis. When identifying financial instruments to trade, a vector that has a large increase or decrease in normalized derivative with a correspondingly large absolute value of the correlation coefficient would be a likely candidate for subsequent processing. Vector data sets that have a small slope would generally be avoided.
[0146] Referring now to
[0147] In step
[0148] In step
[0149] In step
[0150] In step
[0151] In step
[0152] After each data set has been analyzed in steps
[0153] Generating the Trend Determination Parameters
[0154] Referring again to
[0155] Each set of trend determination parameters in combination with the particular data fitting procedure, such as polynomial regression, produces a unique set of trends for a vector data set. At this step of the method, many different sets of trend determination parameters are used to produce corresponding sets of trends. These multiple sets of trends are then evaluated to determine which of the sets are better at matching the current data from the vector data set.
[0156] The trend determination parameters can be selected in several ways. In this embodiment, a subset of the vector, the first M
[0157] Referring now to
[0158] Selecting a Set of Useful Trend Determination Parameters
[0159] Referring again to
[0160] Processing the Vector Data Set with Trend Determination Parameter Sets
[0161] Referring again to
[0162]
[0163] A new trend is defined to have begun as described in the first embodiment.
[0164] Referring now to
[0165] In step
[0166] Initialize the trend determination parameters in step
[0167] Set j to Δm+1 in step
[0168] Set n, the number of points in the current trend to 0 in step
[0169] For each trend, check to see if the current value, (x
[0170] For the First Trend
[0171] In step
[0172] In step
[0173] Calculate the standard deviation, s
[0174] where {overscore (d)} is the average of the d
[0175] The weighted population standard deviation is used in this embodiment; however, this is only one of many measures of the spread of a distribution. This patent includes, but is not limited to, other measures of spread, such as the sample standard deviation, the population or sample variance, range, inter-quartile range, differences between any quantiles of a distribution, or a measure of spread from a neural network.
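Several of the spread measures named above can be sketched as follows. The weights used by this embodiment are not given in the text, so the `weights` argument is hypothetical; with equal weights the weighted form reduces to the ordinary population standard deviation. The neural-network measure is not illustrated.

```python
from math import sqrt
from statistics import pstdev, pvariance, quantiles, stdev, variance

def weighted_population_std(residuals, weights):
    """Weighted population standard deviation about the weighted mean."""
    total = sum(weights)
    d_bar = sum(w * d for w, d in zip(weights, residuals)) / total
    return sqrt(sum(w * (d - d_bar) ** 2 for w, d in zip(weights, residuals)) / total)

def spread_measures(residuals):
    """A few alternative measures of spread for a list of residuals."""
    q1, _, q3 = quantiles(residuals, n=4)   # quartile cut points
    return {
        "population_std": pstdev(residuals),
        "sample_std": stdev(residuals),
        "population_variance": pvariance(residuals),
        "sample_variance": variance(residuals),
        "range": max(residuals) - min(residuals),
        "interquartile_range": q3 - q1,
    }

residuals = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical values
print(weighted_population_std(residuals, [1.0] * len(residuals)))
print(spread_measures(residuals))
```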
[0176] Steps
[0177] For Subsequent Trends
[0178] Select Δm+1 values at step
[0179] Calculate the best-fit curve, f
[0180] In step
[0181] at the point x
[0182] In step
[0183] Calculate the residuals, d
[0184] Calculate the standard deviation, s
[0185] Steps
[0186] The procedure preferably outputs the same arrays for each trend parameter set as described in the first embodiment. The only difference is that instead of exporting the data array, (y
[0187] Determining Attributes of Each Trend
[0188] As in the first embodiment, a number of attributes of each of the trends are calculated or updated in step
[0189] Referring now to
[0190] Determining the beginning time or location for each trend at step
[0191] Determining the ending time or location for each trend at step
[0192] Determining the average trend correlation at step
[0193] Determining the average slopes or first derivatives of each trend at step
[0194] 9
[0195] This embodiment refers to step
[0196] The difference is that instead of considering data elements to be individual points: (x