With more than one explanatory variable (multiple regression) we have a regression plane. Such models are the oldest, most important, and single most widely used form of predictive model. One reason for this is their evident simplicity: a simple weighted sum is very easy both to compute and to understand. Another, more compelling, reason is that they often perform very well, even in circumstances where one knows enough to be confident that the true relationship between the explanatory and response variables cannot be linear. This is not altogether surprising: when one expands a continuous mathematical function in a Taylor series, one often finds that the lowest order terms, the linear terms, are the most important, so that the best simple approximation is obtained by using a linear model.
It would be an extremely rare situation in which the chosen model was exactly right. This is especially true in data mining situations, where one's model is generally empirical rather than iconic (see Chapter 4XXX), so that it is not based on any underlying theory. The model may not include all of the explanatory variables which would be needed for perfect prediction (many may not have been measured, or may not even be measurable). It may not include certain functions of the explanatory variables (maybe x² is needed as well as x, or maybe products of the explanatory variables are needed because they interact in their effect on y). And, in any case, no measurement is perfect: the y variable will have errors associated with it, so that each vector x of explanatory measurements will be associated with a distribution of possible y values, as we have noted above.
All of this means that the actual y values in a sample will differ from the predicted values. The differences between observed and predicted values are called residuals, and we denote them by e:
e(i) = y(i) − ŷ(i),   i = 1, ..., n
In matrix terms, if we denote the observed y measurements on the n objects in the design sample by the vector y and the measurements of the explanatory variables by the n × (p+1) matrix X, we can express the relationship between the observed response and explanatory measurements, in terms of our model, as
y = Xa + e (9.2)
In this equation, a represents the vector of parameter values, and the vector e contains the residuals. The coefficient a_0, providing the intercept of the model, has been included here, so that each row of the matrix X includes a first term which is always 1. Clearly we want to choose the parameters in our model, the values in the (p+1)-vector a, so as to yield predictions which are as accurate as possible. Put another way, we must find estimates for the a_j which minimise the discrepancies e in some way. To do this, we combine the elements of e to yield a single numerical measure which we can minimise.
Various ways of combining the e(i) have been proposed, but by far the most popular is to sum their squares. That is, we seek the values of the parameter vector a which minimise
S = Σ_{i=1}^{n} (y(i) − a'x(i))²   (9.3)
In this expression, y(i) is the observed y value for the ith design sample point and x(i) is the vector of explanatory variables for this point. For obvious reasons, this method is known as the least squares method. For simplicity, we will denote the parameter vector which minimises this simply by a. (It would be more correct, of course, to use some notation indicating that it is an estimate, such as â.)
In matrix terms, the values of the parameters which minimise (9.3) are given by
a = (X'X)⁻¹X'y   (9.4)
In linear regression in general, the a parameters are called regression coefficients. Once the parameters have been estimated, they are used in (9.1) to yield predictions. The predicted value of y, ŷ, for a vector of explanatory variables x, is given by ŷ = a'x.
Solution (9.4) requires that the matrix X'X be invertible. Problems will arise if the sample size n is small (rare in data mining situations) or if there are linear dependencies between the measured values of the explanatory variables (not so rare). In the latter case, modern software normally issues warnings and appropriate action can be taken, such as dropping some of the explanatory variables.
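As a concrete illustration of solution (9.4), the sketch below (Python with NumPy; the data values are invented for illustration) builds the design matrix with its leading column of 1s and solves the normal equations directly:

```python
import numpy as np

# Hypothetical small design sample: n = 5 points, one explanatory variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Build the n-by-(p+1) design matrix X with a leading column of 1s
# for the intercept term a_0.
X = np.column_stack([np.ones_like(x), x])

# Least squares solution (9.4): a = (X'X)^{-1} X'y.
# In practice np.linalg.lstsq is numerically preferable to an explicit
# inverse, but the normal-equations form mirrors the text.
a = np.linalg.inv(X.T @ X) @ (X.T @ y)

residuals = y - X @ a
print(a)                # [intercept, slope]
print(residuals.sum())  # residuals sum to ~0 when an intercept is included
```

On this tiny example the estimated intercept and slope can be checked by hand against the usual single-variable formulas.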
A rather more subtle problem arises when the measured values of the explanatory variables are not exactly linearly dependent, but are almost so. Now the matrix can be inverted, but the solution will be unstable. This means that slight alterations to the observed X values would lead to substantial differences in the estimated values of a. Different measurement errors, or a slightly different design sample, would have led to different parameter estimates. This problem is termed multicollinearity. The instability in the estimated parameters is a problem if these values are the focus of interest, for example if one wants to know which of the variables is most important in the model. However, it will not normally be a problem as far as predictive accuracy is concerned: although substantially different a vectors may be produced by slight variations of the data, all of these vectors will lead to similar predictions for most x vectors.
In Chapter 4XXX we remarked that the additive nature of the regression model could be retained, while permitting more flexible model forms, by replacing the raw x_j by transformations of them. Figure XXX1 shows a plot of data collected in an experiment in which a subject performed a physical task at a gradually increasing level of difficulty. The vertical axis shows a measure on the gases expired from the lungs, while the horizontal axis shows the oxygen uptake. The nonlinearity of the relationship between these two variables is quite clear from the plot. A straight line y = a_0 + a_1 x provides a poor fit, as is shown in the figure. The predicted values from this model would only be accurate for x (oxygen uptake) values just above 1000 and just below 4000. (Having said that, the model is not grossly inaccurate; the point made above about models linear in x providing reasonable approximations is clearly true.) However, the model y = a_0 + a_1 x + a_2 x² gives the fitted line shown in Figure XXX2. This model is still linear in the parameters, so that these can easily be estimated using the standard matrix manipulation shown above. It is clear that the predictions obtained from this model are about as good as one can do. The remaining inaccuracy is the irreducible measurement error associated with the variance of y about its mean at each value of x.
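The ventilation data themselves are not reproduced here, but the idea can be sketched on synthetic data of the same general shape: adding an x² column to the design matrix leaves the model linear in its parameters, so ordinary least squares still applies unchanged. All values below are invented for illustration:

```python
import numpy as np

# Synthetic data following a quadratic trend, standing in for the
# ventilation / oxygen-uptake example (the real data are not reproduced).
rng = np.random.default_rng(0)
x = np.linspace(1.0, 4.0, 40)
y = 0.5 + 0.2 * x + 0.8 * x**2 + rng.normal(0.0, 0.1, x.size)

# y = a0 + a1*x + a2*x^2 is nonlinear in x but linear in the parameters,
# so least squares applies: x^2 is just an extra column of X.
X_lin = np.column_stack([np.ones_like(x), x])
X_quad = np.column_stack([np.ones_like(x), x, x**2])

a_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)
a_quad, *_ = np.linalg.lstsq(X_quad, y, rcond=None)

sse_lin = np.sum((y - X_lin @ a_lin) ** 2)
sse_quad = np.sum((y - X_quad @ a_quad) ** 2)
print(sse_lin, sse_quad)  # the quadratic model fits far better
```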
Figure XXX1: Expired ventilation plotted against oxygen uptake in a series of trials, with fitted straight line.
Figure XXX2: The data from Figure XXX1 with a model which includes a term in x².
The informal data analytic route described above allows us to fit a regression model to any data involving a response variable and a set of explanatory variables, and to obtain a vector of estimated regression coefficients. If our aim were merely to produce a convenient summary of the design data (as, very occasionally, it is) then we could stop there. However, this chapter is concerned with predictive models. Our aim is to go beyond the design data to predict y values for other objects. Goodness of fit to the given data is all very well, but we are really interested in fit to future data arising from the same process, so that our future predictions are as accurate as possible. In order to explore this, we need to embed the above model-building process in a more formal inferential context. To do this, we suppose that each observed value y(i) is produced as a sum of weighted explanatory variables, Σ_j a_j x_j(i), and a random term ε(i) which follows an N(0, σ²) distribution, independently of the other values. The random vector Y thus takes the form Y = Xa + ε. The observed y vector in (9.2) is a realisation from this distribution. The components of ε are often called errors. Note that they are different from the residuals, e. An error is a random realisation from a given distribution, whereas a residual is the difference between an observed y value and the fitted model.
It turns out that, within this framework, the least squares estimate a above is also the maximum likelihood estimate. Furthermore, the covariance matrix of a is σ²(X'X)⁻¹. In the case of a single explanatory variable, this gives σ²(1/n + x̄²/Σ_i(x(i) − x̄)²) for the variance of the intercept term and σ²/Σ_i(x(i) − x̄)² for the variance of the slope. Here x̄ is the sample mean of the single explanatory variable. The diagonal elements of the covariance matrix of a give the variances of the regression coefficients, which can be used to test whether the individual regression coefficients are significantly different from zero: if v_jj is the jth diagonal element of (X'X)⁻¹, then the ratio a_j/√(s²v_jj), where s² is the residual mean square, can be compared with a t(n−p−1) distribution to see if the jth regression coefficient is zero. However, as we discuss below, this test only makes sense in the context of the other variables included in the model, and alternative methods, also discussed below, are available for more elaborate model-building exercises.
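These calculations are easy to reproduce numerically. The sketch below (simulated data, invented coefficient values) forms the covariance matrix s²(X'X)⁻¹, takes square roots of its diagonal for standard errors, and computes the t ratios described above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
# Illustrative true coefficients: the first variable matters, the second does not.
y = 1.0 + 2.0 * X[:, 1] + 0.0 * X[:, 2] + rng.normal(0.0, 1.0, n)

a, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ a
s2 = resid @ resid / (n - p - 1)       # residual mean square, estimates sigma^2
cov_a = s2 * np.linalg.inv(X.T @ X)    # covariance matrix of a
se = np.sqrt(np.diag(cov_a))           # standard errors of the coefficients
t_ratios = a / se                      # compare with t(n - p - 1)
print(t_ratios)
```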
If x is the vector of explanatory variables for a new object, with predicted y value ŷ = a'x, then the variance of ŷ is σ²x'(X'X)⁻¹x. With one explanatory variable, this is σ²(1/n + (x − x̄)²/Σ_i(x(i) − x̄)²). Note that this variance is greater the further x is from the mean of the design sample: the least accurate predictions, in terms of variance, are those in the tails of the distribution of the explanatory variables. Note also that confidence intervals (see Chapter 5XXX) based on this variance are confidence intervals for the predicted value of y. We might also be interested in (what are somewhat confusingly called) prediction intervals, telling us a range of plausible values for the observed y at a given value of x, not a range of plausible values for the predicted value. Prediction intervals must include the uncertainty arising from our prediction and also that arising from the variability of y about our predicted value. This means that the variance above is increased by an extra term σ², yielding σ²(1 + 1/n + (x − x̄)²/Σ_i(x(i) − x̄)²).
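For the single-variable case, the two variances can be written as small functions; the design points and the assumed known σ² below are purely illustrative:

```python
import numpy as np

# One explanatory variable; variances of a predicted value at a new x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
n = x.size
x_bar = x.mean()
sxx = np.sum((x - x_bar) ** 2)
sigma2 = 1.0  # error variance assumed known, for illustration only

def conf_var(x_new, sigma2=sigma2):
    # Variance of the *predicted mean* at x_new (confidence interval).
    return sigma2 * (1.0 / n + (x_new - x_bar) ** 2 / sxx)

def pred_var(x_new, sigma2=sigma2):
    # Variance for a *new observation* at x_new (prediction interval):
    # the extra sigma^2 term reflects y's own variability about its mean.
    return sigma2 * (1.0 + 1.0 / n + (x_new - x_bar) ** 2 / sxx)

# The variance grows as x_new moves away from the design-sample mean.
print(conf_var(x_bar), conf_var(x_bar + 3.0))
```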
DISPLAY 9.1XXX
The most important special case of linear regression arises when there is just one predictor variable. Figure XXX3 shows a plot of Relative performance on the vertical axis against Estimated relative performance on the horizontal axis for 209 computer CPUs. We can use regression to predict Relative performance from Estimated relative performance. A simple regression of the data as shown in Figure XXX3 gives an estimated intercept value of 1.1091 and an estimated regression coefficient of 0.9300. Most modern data analytic packages will give the associated standard errors of the estimates, along with significance tests of the null hypotheses that the true parameters which led to the data are zero. In this case, the standard errors are 3.2989 and 0.0172, respectively, yielding significance probabilities of 0.7371 and 0.0000. From this we would conclude that there is strong evidence that the positive linear relationship is real, but no evidence of a nonzero intercept.
The plot in Figure XXX3 shows marked skewness in both variables. It is clear that the position of the regression line will be much more sensitive to the precise positions of points to the right of the figure than to those of points to the left. Points which can have a big effect on the conclusion are called points of high leverage: here they are the points at extreme values of Estimated relative performance in Figure XXX3. Points which actually do have a big effect are called influential points. For example, if the rightmost point in Figure XXX3 had a Relative performance value of 200 (while still having an Estimated relative performance value around 1200), it would clearly have a big effect on the regression line. The asymmetry of the leverage of the points in the figure might be regarded as undesirable. One might try to overcome this by reducing the skewness, for example by log transforming both variables before fitting the regression line.
Figure XXX3: A plot of relative performance against estimated relative performance for 209 computer CPUs.
The coefficients in a multiple regression model can be interpreted as follows. If the jth explanatory variable, x_j, is increased by one unit, while all the other explanatory variables are kept fixed, then the response variable y will increase by a_j. The regression coefficients thus tell us the conditional effect of each explanatory variable, conditional on keeping the other explanatory variables constant. This is an important aspect of the interpretation. In particular, the size of the regression coefficient associated with the jth variable will depend on which other variables are in the model. This is clearly especially important if one is constructing models in some kind of sequential way: add another variable and the coefficients of those already in the model will change. (There is an exception to this. If the explanatory variables are orthogonal, then the regression coefficients are unaffected by the presence or absence of the others in the model. However, this situation is most common in designed experiments, and is rare in the kinds of secondary data analyses encountered in data mining.) The sizes of the regression coefficients tell us the relative importance of the variables, in the sense that one can compare the effects of unit changes. Note also that the size of the effects depends on the chosen units of measurement for the explanatory variables. If one measures x_j in kilometres instead of millimetres, then its associated regression coefficient will be multiplied by a million. This can make comparisons between variables difficult, so people often work with standardised variables, measuring each explanatory variable relative to its standard deviation.
We used the sum of squared errors between the predictions and the observed y values as a criterion through which to choose the values of the parameters in the model. This is the residual sum of squares, or sum of squared residuals, Σ_i(y(i) − ŷ(i))². In a sense, the worst model would be obtained if we simply predicted all of the y values by ȳ, the mean of the sample of y values. The total sum of squares is defined as the sum of squared errors for this worst model, Σ_i(y(i) − ȳ)². The difference between the residual sum of squares for a model and the total sum of squares is the sum of squares which can be attributed to the regression for that model: the regression sum of squares. This is the sum of squared differences of the predicted values from the overall mean, Σ_i(ŷ(i) − ȳ)². The symbol R² is often used for the square of the multiple correlation coefficient, the ratio of the regression sum of squares to the total sum of squares. A value near 1 tells us that the model explains most of the y variation in the data.
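This decomposition is easy to verify numerically; the simulated data below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = 3.0 * x + rng.normal(0.0, 0.5, 50)

X = np.column_stack([np.ones_like(x), x])
a, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ a

ss_total = np.sum((y - y.mean()) ** 2)   # "worst model": predict by the mean
ss_resid = np.sum((y - y_hat) ** 2)      # residual sum of squares
ss_reg = np.sum((y_hat - y.mean()) ** 2) # regression sum of squares

# For least squares with an intercept, SS(total) = SS(reg) + SS(resid) exactly.
r_squared = ss_reg / ss_total
print(r_squared)
```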
The number of independent components contributing to each sum of squares is called the number of degrees of freedom for that sum of squares. The degrees of freedom for the total sum of squares is n−1 (one less than the sample size, since the components are all calculated relative to the mean). The degrees of freedom for the residual sum of squares is n−p−1 (although there are n terms in the summation, p+1 regression coefficients have been estimated). The degrees of freedom for the regression sum of squares is p, the difference between the total and residual degrees of freedom.
These sums of squares and their associated degrees of freedom are usefully put together in an analysis of variance table, as in Table XXX1, summarising the decomposition of the totals into components. The meaning of the final column is described below.
Table XXX1: The analysis of variance decomposition table for a regression
Source of variation    Sum of squares                  Degrees of freedom    Mean square
Regression             SS(Reg) = Σ_i (ŷ(i) − ȳ)²       p                     SS(Reg)/p
Residual               SS(Res) = Σ_i (y(i) − ŷ(i))²    n − p − 1             SS(Res)/(n − p − 1)
Total                  SS(T) = Σ_i (y(i) − ȳ)²         n − 1
We have already noted that our real aim in this chapter is one of inference: we want to make statements (predictions) about objects for which we do not know the y values. This means that goodness of fit to the design data is not our real objective. In particular, merely because one has obtained nonzero estimated regression coefficients does not necessarily mean that the variables are related: it could merely be that one's model has captured chance idiosyncrasies of the design sample. As explained in Chapter 5, we need some way to test the model, to see how easily the observed data could have arisen by chance, even if there were no structure in the population the data were collected from. In this case, we need to test whether the population regression coefficients are really zero. (Of course, this is not the only test one might be interested in, but it is the one most often required.) It can be shown that if the values of a_1, ..., a_p are all zero (and still making the assumption that the ε(i) are independently distributed as N(0, σ²)), then
F = [SS(Reg)/p] / [SS(Res)/(n − p − 1)]
has an F(p, n−p−1) distribution. This is just the ratio of the two mean squares given in Table 1. The test is carried out by comparing the value of this ratio with the upper critical value of the F(p, n−p−1) distribution. If the ratio exceeds this value, the test is significant, and we would conclude that there is a linear relationship between y and the x variables (or that a very unlikely event has occurred). If the ratio is less than the critical value, we have no evidence to reject the null hypothesis that the population regression coefficients are all zero. Alternatively, many data analysis packages give the probability that a regression sum of squares this large or larger would be observed by chance, if there really were no relationship between y and the x variables in the population (a p-value; this use of p is not to be confused with its use as the number of explanatory variables).
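A sketch of the overall F test on simulated data (all values invented; the critical value quoted in the comment is approximate):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
a_true = np.array([0.5, 1.0, -1.0, 0.0])   # illustrative coefficients
y = X @ a_true + rng.normal(0.0, 1.0, n)

a, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ a
ss_reg = np.sum((y_hat - y.mean()) ** 2)
ss_res = np.sum((y - y_hat) ** 2)

# Ratio of the two mean squares from the analysis of variance table;
# under H0 (all slopes zero) it follows F(p, n - p - 1).
f_stat = (ss_reg / p) / (ss_res / (n - p - 1))
print(f_stat)
# The 5% upper critical value of F(3, 56) is roughly 2.8, so a value far
# above that is strong evidence of a real linear relationship.
```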
We have described an overall test to see if the regression coefficients in a given model are all zero. However, we are more often involved in a situation of model building, in which we examine a sequence of models to find one which is best in some sense. In particular, we often need to examine the effect of adding a set of explanatory variables to a set we have already included. Note that this includes the special case of adding just one extra variable, and also, by applying the idea in reverse, can handle the situation of removing variables from a model.
In order to compare models we need a measure of goodness of fit. Once again, the obvious one is the sum of squared errors between the predictions and the observed y values. Suppose we are comparing two models: a model with p explanatory variables (model M) and the largest model we are prepared to contemplate, with say q variables (these will include all the untransformed explanatory variables we think might be relevant, along with any transformations of them we think might be relevant), model M*. Each of these models will have an associated residual sum of squares, and the difference between them will tell us how much better the larger model fits the data than the smaller one. (Equivalently, we could calculate the difference between the regression sums of squares. Since the residual and regression sums of squares add up to the total sum of squares, which is the same for both models, the two calculations yield the same result.) The degrees of freedom associated with the difference between the residual sums of squares of the two models is q−p, the extra number of regression coefficients computed in fitting the larger model, M*. The ratio of the difference of the residual sums of squares to the difference of degrees of freedom again gives us a mean square, now a mean square for the difference between the two models. Comparison of this with the residual mean square for model M* gives us an F test of whether the difference between the models is real or not. Table XXX2 illustrates this extension. From this table, the ratio [(SS(M*) − SS(M))/(q − p)] / [(SS(T) − SS(M*))/(n − q − 1)] is compared with the critical value of an F(q−p, n−q−1) distribution.
Table XXX2: The analysis of variance decomposition table for model building
Source of variation          Sum of squares     Degrees of freedom   Mean square
Regression (model M)         SS(M)              p                    SS(M)/p
Regression (full model M*)   SS(M*)             q                    SS(M*)/q
Difference                   SS(M*) − SS(M)     q − p                (SS(M*) − SS(M))/(q − p)
Residual                     SS(T) − SS(M*)     n − q − 1            (SS(T) − SS(M*))/(n − q − 1)
Total                        SS(T)              n − 1
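The nested-model comparison of Table XXX2 can likewise be sketched in code; here model M contains one genuine variable, and M* adds one genuine and one irrelevant variable (all data simulated):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 80
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)                  # irrelevant variable
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(0.0, 1.0, n)

def rss(X, y):
    # Residual sum of squares for a least squares fit of y on X.
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ a) ** 2)

ones = np.ones(n)
X_m = np.column_stack([ones, x1])             # model M: p = 1
X_full = np.column_stack([ones, x1, x2, x3])  # model M*: q = 3

rss_m, rss_full = rss(X_m, y), rss(X_full, y)
p_, q_ = 1, 3
# Partial F ratio for the extra variables, to compare with F(q-p, n-q-1).
f_change = ((rss_m - rss_full) / (q_ - p_)) / (rss_full / (n - q_ - 1))
print(f_change)
```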
This is fine if one has just a few models to compare, but data mining problems are such that one often needs to rely on automatic model building processes. Such methods are available in most modern data mining computer packages. There are various strategies which may be adopted. A basic form is the forward selection method, in which variables are added one at a time to an existing model. At each step, that variable is chosen from the set of potential variables which leads to the greatest increase in predictive power (measured in terms of the reduction in the sum of squared residuals), provided the increase exceeds some specified threshold. In principle, the addition would be made only if the increase in predictive power were statistically significant, but in practice this is problematic: the variable selection process necessarily involves carrying out many tests, not all independent, so that computing correct significance values is a nontrivial exercise. The simple significance level based on Table XXX2 does not apply when multiple dependent tests are made.
The opposite strategy to forward selection is backwards elimination. One begins with the most complex model one might contemplate (the largest model, M*, above) and progressively eliminates variables, at each step removing the variable whose elimination leads to the smallest increase in the sum of squared residuals (again, subject to some threshold). Other variants combine forward selection and backwards elimination. For example, one might add two variables, eliminate one, add two, remove one, and so on.
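A minimal forward selection procedure might look as follows; the threshold value and the data are hypothetical, and a production implementation would use a properly calibrated stopping rule rather than a raw cut-off on the sum of squares:

```python
import numpy as np

def rss(X, y):
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ a) ** 2)

def forward_select(X_cand, y, threshold=1.0):
    # Greedily add the candidate column that most reduces the residual
    # sum of squares; stop when the best improvement falls below threshold.
    n = y.size
    selected = []
    remaining = list(range(X_cand.shape[1]))
    current = np.ones((n, 1))              # start with intercept only
    current_rss = rss(current, y)
    while remaining:
        trials = [(current_rss - rss(np.column_stack([current, X_cand[:, j]]), y), j)
                  for j in remaining]
        gain, best = max(trials)
        if gain < threshold:
            break
        selected.append(best)
        remaining.remove(best)
        current = np.column_stack([current, X_cand[:, best]])
        current_rss -= gain
    return selected

rng = np.random.default_rng(5)
X_cand = rng.normal(size=(100, 5))
y = 2.0 * X_cand[:, 1] - 1.0 * X_cand[:, 3] + rng.normal(0.0, 0.5, 100)
print(forward_select(X_cand, y))  # should pick up columns 1 and 3 first
```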
All of these stepwise methods are attempts to restrict the search of the space of all possible subsets of explanatory variables, so that the search is manageable. But by restricting the search, it is possible that some highly effective combination of variables may be overlooked. Very occasionally (if the set of potential explanatory variables is small), one can examine all possible subsets of variables (although, with p variables, there are 2^p − 1 nonempty subsets). The range over which all subsets can feasibly be examined has been expanded by the use of strategies such as branch and bound, which rely on the monotonicity of the residual sum of squares criterion (it can only decrease when new variables are added).
A couple of cautionary comments are worth making here. First, recall that the coefficients of variables already in the model will change as new variables are added. A variable which is important for one model may become less so when the model is extended. Secondly, if too elaborate a search is carried out, there is a high chance of overfitting the design set: obtaining a model which provides a good fit to the design set (a small residual sum of squares) but does not predict new data very well. Such issues are discussed in Chapter XXX.
Although multiple regression is a very powerful and widely used technique, some of its assumptions might be regarded as restrictive. The assumption that the variance of the Y distribution is the same at each vector x is often inappropriate. (The assumption of equal variances is called homoscedasticity; the converse is heteroscedasticity.) For example, Figure XXX4 shows the normal average January minimum temperature (in °F) plotted against the latitude (°N) for 56 cities in the United States. There is evidence that the variance of the temperature increases with increasing latitude (while the mean temperature decreases). We can still apply the standard least squares algorithm above to estimate the parameters in this new situation, and the resulting estimates would still be unbiased, but one could do better, in the sense that it is possible to find estimators with smaller variance. To do this we need to modify the basic method above. Essentially, we need to arrange things so that those values of x associated with y values of larger variance are weighted less heavily in the model fitting process. Formally, this idea leads to a modification of solution (9.4). Suppose that the covariance matrix of the random vector ε is σ²V (previously we took V = I). The case of unequal variances means that V is diagonal with terms which are not all equal. Now it is possible (see books on linear algebra) to find a nonsingular matrix P such that PP' = V. We can use this to define a new random vector f = P⁻¹ε, and it is easy to show that the covariance matrix of f is σ²I. Using this idea, we form a new model by premultiplying the old one by P⁻¹:
P⁻¹Y = P⁻¹Xa + P⁻¹ε
or
Z = Wa + f,
now of the form required to apply the standard least squares algorithm. If we do this, and then convert the solution back into the original variables Y, we obtain:
a = (X'V⁻¹X)⁻¹X'V⁻¹y   (9.5)
a weighted least squares solution. The covariance matrix of this estimated parameter vector a is σ²(X'V⁻¹X)⁻¹.
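Solution (9.5) is easy to compute when V is diagonal; the heteroscedastic data below are simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x = np.linspace(1.0, 10.0, n)
# Heteroscedastic errors: the standard deviation grows with x.
sd = 0.2 * x
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, n) * sd

X = np.column_stack([np.ones(n), x])
V_inv = np.diag(1.0 / sd**2)   # V is diagonal; only its inverse is needed

# Weighted least squares solution (9.5): a = (X'V^{-1}X)^{-1} X'V^{-1}y.
a_wls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
# Ordinary least squares for comparison: still unbiased, but with
# higher variance under heteroscedasticity.
a_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(a_wls, a_ols)
```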
Figure XXX4: Temperature (°F) against latitude (°N) for 56 cities in the United States.
Unequal variances of the y distributions for different x vectors is one way in which the assumptions of basic multiple regression can break down. There are others. What we really need are ways to explore the quality of the model and tools which will enable us to detect where and why the model deviates from the assumptions. That is, we require diagnostic tools.
In simple regression, where there is only one explanatory variable, one can see the quality of the model from a plot of y against x. Figure XXX1 provides an illustration. More generally, however, when there is more than one explanatory variable such a simple plot is not possible, and more sophisticated methods are needed. In general, the key quantities for examining the quality of a regression model are the residuals, the components of the vector e = y − ŷ. If there is a pattern to these, it tells us that the model is failing to explain the distribution of the data. Various plots involving the residuals are used, including plotting the residuals against the fitted values, plotting standardised residuals (obtained by dividing the residuals by their standard errors) against the fitted values, and plotting the standardised residuals against standard normal quantiles. (The latter are normal probability plots. If the residuals are approximately normally distributed, then the points in this plot should lie roughly on a straight line.) Of course, interpreting some of these diagnostic plots requires practice and experience.
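One common form of standardised residual divides each residual by its standard error, which involves the leverages h_ii, the diagonal elements of the hat matrix H = X(X'X)⁻¹X'. A sketch on simulated data (all values invented):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 50, 1
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])

# Hat matrix H = X(X'X)^{-1}X'; fitted values are Hy, and the leverage
# of point i is the diagonal element h_ii.
H = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = H @ y
resid = y - y_hat

s2 = resid @ resid / (n - p - 1)
h = np.diag(H)
# Standardised residuals: each residual divided by its standard error.
std_resid = resid / np.sqrt(s2 * (1.0 - h))
print(std_resid[:5])
# These are the quantities to plot against fitted values or against
# normal quantiles when checking the model assumptions.
```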
At this point it is convenient to make a general cautionary comment, which applies to all predictive models. Such models are only valid within the bounds of the data. It can be very risky to extrapolate beyond the data. A very simple example is given in Figure XXX5. This shows a plot of the tensile strength of paper against the percentage of hardwood in the pulp from which the paper was made. But suppose only those samples with pulp values between 1 and 9 had been measured. The figure shows that a straight line would provide quite a good fit to this subset of the data. For new samples of paper, with pulp values lying between 1 and 9, quite good prediction of the strength could legitimately be expected. But the figure also shows, strikingly clearly, that our model would produce predictions which were seriously amiss if we used it to predict the strength of paper with pulp values greater than 9. Only within the bounds of our data is the model trustworthy. Figure XXX1 in Chapter 7XXX gives another example. Here the graph shows the number of credit cards in circulation each year. A straight line fitted to years 1985 to 1990 would provide a good fit, but if predictions beyond those years were based on this model, disaster would follow.
These examples are particularly clear  but they involve just a few data points and a single explanatory variable. In data mining applications, with large data sets and many variables, things may not be so clear. Caution needs to be exercised when making predictions.
Figure XXX5: A plot of tensile strength of paper against the percentage of hardwood in the pulp.
9.3 Generalised linear models
Section 9.2 described the linear model, in which the random response variable was decomposed into two parts: a weighted sum of the explanatory variables and a random component, Y(i) = Σ_j a_j x_j(i) + ε(i). For inferential purposes we also assumed that the ε(i) were independently distributed as N(0, σ²). We can write this in another way, which permits convenient generalisation, by splitting the description of the model into three parts:
(i) The Y(i) are independent random variables, with distribution N(μ(i), σ²).
(ii) The parameters enter the model through the linear combination η(i) = Σ_j a_j x_j(i).
(iii) The μ(i) and η(i) are linked by μ(i) = η(i).
This permits two immediate generalisations, while retaining the advantages of the linear combination of the parameters. Firstly, in (i) we can relax the requirement that the random variables follow a normal distribution. Secondly, we can generalise the link expressed in (iii), so that some other link function g relates the parameter μ(i) of the distribution to the linear term η(i), via g(μ(i)) = η(i). These extensions result in what are called generalised linear models. They are one of the most important advances in data analysis of the last two decades. As we shall see, such models can also be regarded as fundamental components of feedforward neural networks.
To illustrate, one of the most important kinds of generalised linear model for data mining is logistic regression. In many situations the response variable is not continuous, as we assumed in Section 9.2, but is a proportion: the number of flies from a given sample which die when exposed to an insecticide, the proportion of questions people get correct in a test, the proportion of oranges in a carton which are rotten. The extreme of this arises when the proportion is out of 1, so that the observed response is binary: whether or not an individual insect dies, whether or not a person gets a particular question right, whether or not an individual orange is rotten. We shall deal with this binary case, since the resulting model may also be applied to situations where the data have been grouped.
So, we are dealing with a binary response variable, with the random variable Y(i) taking values 0 or 1 corresponding to the two possible outcomes. We shall assume that the probability that the ith individual yields the value 1 is p(i), and that the responses of different individuals are independent. This means that the response for the ith individual follows a Bernoulli distribution:
EMBED Equation.2
For logistic regression, this is the generalisation of (i) above.
Our aim is to formulate a model for the probability that an object with explanatory vector $\mathbf{x}$ will take value 1. That is, we want a model for the mean value of the response, the probability $p(\mathbf{x})$. We could use a linear model, a weighted sum of the explanatory variables. However, this would not be ideal. Most obviously, a linear model can take values less than 0 and greater than 1. This suggests that we need to modify the model to include a nonlinear aspect. We achieve this by transforming the probability, nonlinearly, so that it can be modelled by a linear combination. That is, we use a nonlinear link function in (iii).
A suitable function (not the only one) is the logistic (or logit) link function, in which $\ln\bigl(p/(1-p)\bigr) = \sum_j \beta_j x_j$. As $p$ varies from 0 to 1, $\ln\bigl(p/(1-p)\bigr)$ clearly varies from $-\infty$ to $+\infty$, matching the potential range of the linear combination $\sum_j \beta_j x_j$.
One of the advantages of the logistic link function over alternatives (for example, the probit function, which was widely used in the past) is that it permits convenient interpretation. For example:
• The ratio $p/(1-p)$ in the transformation is the familiar odds that a 1 will be observed; $\ln\bigl(p/(1-p)\bigr)$ is the log odds.
• Given a new vector of explanatory variables $\mathbf{x}$, the predicted probability of observing a 1 is derived from $\ln\bigl(p/(1-p)\bigr) = \sum_j \beta_j x_j$. The effect on this of changing the $j$th explanatory variable by one unit is simply $\beta_j$. Thus the coefficients tell us the difference in log odds, or, equivalently, the log odds ratio, resulting from the two values. From this it is easy to see that $e^{\beta_j}$ is the factor by which the odds change when the $j$th explanatory variable changes by one unit (cf. the discussion of the effect of a unit change of one variable in the multiple regression case discussed in Section 9.2).
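This odds-ratio interpretation is easy to verify numerically. The sketch below uses hypothetical coefficient values, purely for illustration:

```python
import math

def p_from_logodds(eta):
    # inverse logit: converts log odds back into a probability
    return 1.0 / (1.0 + math.exp(-eta))

# hypothetical intercept and slope, not fitted values
b0, b1 = -1.0, 0.7

p_low = p_from_logodds(b0 + b1 * 2.0)   # probability when x_j = 2
p_high = p_from_logodds(b0 + b1 * 3.0)  # probability when x_j = 3, one unit more

odds_ratio = (p_high / (1 - p_high)) / (p_low / (1 - p_low))
# odds_ratio equals exp(b1), whatever the starting value of x_j
```

The same calculation at any other starting value of the variable gives the same ratio: on the odds scale the effect of a unit change is multiplicative and constant.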
Display XXX
Seventy-three seconds into its flight on January 28th, 1986, the space shuttle Challenger exploded, killing everyone on board. The two booster rockets for the shuttle are made of several pieces, with each of three joints sealed with a rubber O-ring, making six rings in total. It was known that these O-rings were sensitive to temperature. Records of the proportion of O-rings damaged in previous flights were available, along with the temperatures on those days. The lowest previous temperature was 53°F. On the day of the flight the temperature was 31°F, so there was much discussion about whether the flight should go ahead. One argument was based on an analysis of the seven previous flights which had resulted in damage to at least one O-ring. A logistic regression to predict the probability of failure from temperature led to a slope estimate of 0.0014 with a standard error of 0.0498. From this, the predicted logit of the probability of failure at 31°F is −1.3466, yielding predicted probability 0.206. The slope in this model is positive, suggesting that, if anything, the probability of failure is lower at low temperatures. However, this slope is not significantly different from zero, suggesting that there is no relationship between failure probability and temperature.
This analysis is far from ideal. First, 31°F is far below 53°F, so one is extrapolating beyond the data, a practice we warned against above. Second, there is valuable information in the 16 flights which had not resulted in O-ring damage. This is immediately obvious from a comparison of Fig XXX6(a), which shows the numbers damaged for the seven flights above (vertical axis) against temperature (horizontal axis), and Fig XXX6(b), which shows the numbers for all 23 flights. These 16 flights all took place at higher temperatures. The second figure suggests that the relationship might, in fact, have a negative slope. A logistic model fitted to the data in Figure XXX6(b) gave a slope estimate of −0.1156, with a standard error of 2.46 (and an intercept estimate of 5.08 with standard error of 3.05). From this the predicted probability at 31°F is 0.817. This gives a rather different picture, one which could have been deduced before the flight if all the data had been studied.
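The predicted probability quoted for the all-23-flights model can be recovered from the fitted intercept and (negative) slope by inverting the logit:

```python
import math

# intercept and slope quoted above for the model fitted to all 23 flights
intercept, slope = 5.08, -0.1156

eta = intercept + slope * 31.0       # predicted logit at 31 degrees F
p = 1.0 / (1.0 + math.exp(-eta))     # invert the logit link
# p comes out at roughly 0.82, the high failure probability quoted above
```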
Figure XXX6(a): Number of O-rings damaged (vertical axis) against temperature (horizontal axis), for the seven flights with at least one damaged O-ring.
Figure XXX6(b): Number of O-rings damaged against temperature, for all 23 flights.
Generalised linear models thus have three features:
(i) The $Y_i$ ($i = 1, \ldots, n$) are independent random variables, with the same exponential family distribution (see below).
(ii) The explanatory variables are combined in a form $\eta_i = \sum_j \beta_j x_{ij}$, called the linear predictor.
(iii) The mean $\mu_i$ of the distribution for a given explanatory vector $\mathbf{x}_i$ is related to the linear predictor in (ii) through the link function $g$: $g(\mu_i) = \eta_i$.
The exponential family of distributions is an important family which includes the normal, the Poisson, the Bernoulli, and the binomial distributions. Members of this family can be expressed in the general form
$$f(y; \theta, \phi) = \exp\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right\}$$
If $\phi$ is known, then $\theta$ is called the natural or canonical parameter. When, as is often the case, $a(\phi) = \phi$, $\phi$ is called the dispersion or scale parameter. A little algebra reveals that the mean of this distribution is given by $b'(\theta)$ and the variance by $b''(\theta)a(\phi)$. Note that the variance is thus related to the mean $\mu = b'(\theta)$; this relationship, expressed in the form $\mathrm{var}(Y) = V(\mu)\,a(\phi)$, defines what is sometimes called the variance function $V(\mu)$. In the model as described in (i) to (iii) above, there are no restrictions on the link function. However (and this is where the exponential family comes in), things simplify if the link function is chosen to be the function expressing the canonical parameter of the distribution being used as a linear sum. For multiple regression this canonical link is simply the identity function, and for logistic regression it is the logistic transformation presented above. For Poisson regression, in which the distribution in (i) is the Poisson distribution, the canonical link is the log link, $g(\mu) = \log(\mu)$. Prediction from a generalised linear model requires inversion of the relationship $g(\mu) = \eta$, that is, computing $\mu = g^{-1}(\eta)$.
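To make the notation concrete: the Bernoulli distribution fits this form with canonical parameter $\theta = \ln\bigl(p/(1-p)\bigr)$, $b(\theta) = \ln(1 + e^{\theta})$ and $a(\phi) = 1$. The sketch below checks numerically that $b'(\theta)$ and $b''(\theta)$ recover the familiar Bernoulli mean $p$ and variance $p(1-p)$:

```python
import math

def b(theta):
    # cumulant function b(theta) for the Bernoulli distribution,
    # written in exponential family form with theta = log odds
    return math.log(1.0 + math.exp(theta))

theta, h = 0.4, 1e-4
mean = (b(theta + h) - b(theta - h)) / (2 * h)                # numerical b'(theta)
var = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h ** 2   # numerical b''(theta)

p = 1.0 / (1.0 + math.exp(-theta))
# mean is approximately p, and var approximately p * (1 - p)
```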
The algorithms for least squares estimation were very straightforward, essentially involving only matrix inversion. For generalised linear models, however, things are more complicated: the nonlinearity means that an iterative scheme has to be adopted. We will not go into the details of the mathematics here, but it is not difficult to show that the maximum likelihood solution is given by solving the equations
$$\sum_{i=1}^{n} \frac{(y_i - \mu_i)}{V_i}\,\frac{\partial \mu_i}{\partial \beta_j} = 0, \qquad j = 1, \ldots, p \qquad (9.6)$$
where the $i$ subscripts on $\mu_i$ and $V_i$ are in recognition of the fact that these vary from data point to data point. Standard application of the Newton–Raphson method leads to iteration of the equations
$$\beta^{(s+1)} = \beta^{(s)} - M^{-1}u$$
where $\beta^{(s)}$ represents the vector of values of $\beta$ at the $s$th iteration, $u$ is the vector of first derivatives of the log likelihood, evaluated at $\beta^{(s)}$, and $M$ is the matrix of second derivatives of the log likelihood, again evaluated at $\beta^{(s)}$.
An alternative method, the method of scoring, replaces $M$ by the matrix of expected second derivatives. The iterative steps of this method can be expressed in a form similar to the weighted version (9.5) of the standard least squares matrix solution (9.4):
$$\beta^{(s+1)} = (X^T W X)^{-1} X^T W z \qquad (9.7)$$
where $W$ is a diagonal matrix with $i$th diagonal element $(\partial \mu_i / \partial \eta_i)^2 / V_i$ evaluated at $\beta^{(s)}$, and $z$ is a vector with $i$th element $\eta_i + (y_i - \mu_i)\,\partial \eta_i / \partial \mu_i$, again evaluated at $\beta^{(s)}$. Given the similarity of this to (9.5), it will hardly be surprising to learn that this method is called iteratively weighted least squares.
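For the logistic case the canonical link makes the weights and the adjusted response particularly simple, since $(\partial\mu_i/\partial\eta_i)^2 / V_i = \mu_i(1-\mu_i)$. The iteration can then be sketched as follows; this is an illustrative sketch, not production code:

```python
import numpy as np

def irls_logistic(X, y, n_iter=25):
    # Iteratively weighted least squares for logistic regression:
    # at each step, solve a weighted least squares problem for an
    # adjusted response z, as in equation (9.7).
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))   # means under the current fit
        w = mu * (1.0 - mu)               # weights; for the canonical logit
                                          # link these are simply mu(1 - mu)
        z = eta + (y - mu) / w            # adjusted response
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta
```

At convergence the estimating equations (9.6) are satisfied: with the canonical link they reduce to $X^T(y - \mu) = 0$.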
We need a measure of the goodness of fit of a generalised linear model, analogous to the sum of squares used for linear regression. Such a measure is the deviance of a model. In fact, the sum of squares is the special case of deviance when applied to linear models. We introduced the deviance for a model $M$ in Chapter 5XXX as $D(M) = -2\bigl[\log L(M) - \log L(M^*)\bigr]$, essentially the difference between the log likelihood of model $M$ and the log likelihood of the largest model we are prepared to contemplate, $M^*$. Deviance can be decomposed like the sum of squares to permit exploration of classes of models.
Display XXX2
In a study of ear infections in swimmers, 287 swimmers were asked if they were frequent ocean swimmers, whether they preferred beach or non-beach swimming, their age, their sex, and also the number of self-diagnosed ear infections they had had in a given period. The last variable here is the response variable, and a predictive model is sought, in which this number can be predicted from the other variables. Clearly linear regression would be inappropriate: the response variable is discrete and, being a count, is unlikely to look remotely like a normal distribution. Likewise, since it is not a proportion, bounded between 0 and 1, it would be inappropriate to model it using logistic regression. Instead, it is reasonable to assume that the response variable follows a Poisson distribution, with parameter depending on the values of the predictor variables. Fitting a generalised linear model to predict the number of infections from the other variables, with the response following a Poisson distribution and using a log function for the link, led to the analysis of deviance table shown below.
              d.f.   deviance   mean deviance   deviance ratio
Regression       4       1.67          0.4166             0.42
Residual       282      47.11          0.1671
Total          286      48.78          0.1706
Change           4       1.67          0.4166             0.42
To test the null hypothesis of no predictive relationship between the response variable and the predictors, we compare the value of the regression deviance (1.67, from the top of the second column of numbers) with the chi-squared distribution with 4 degrees of freedom (the degrees of freedom given at the top of the first column of numbers). This gives a p-value of 0.7962. This is far from small, suggesting that there is little evidence that the response variable is related to the predictor variables. Not all data necessarily lead to a model which gives accurate predictions!
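The quoted p-value is easy to check: for 4 degrees of freedom the upper tail of the chi-squared distribution has a simple closed form.

```python
import math

# For a chi-squared variable with 4 degrees of freedom, the upper tail
# probability is exp(-x/2) * (1 + x/2), so the p-value for the
# regression deviance can be computed directly.
deviance = 1.67
p_value = math.exp(-deviance / 2.0) * (1.0 + deviance / 2.0)
# p_value is approximately 0.796, far from significant
```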
Before leaving this section, it is worth noting a property of equations (9.6). Although these were derived on the assumption that the random variables followed an exponential family distribution, examination reveals that the estimating equations make use only of the means $\mu_i$ and variances $V_i$, together with the link function and the data values. Nothing about any other aspect of the distributions is involved. This means that even if we are not prepared to make tighter distributional assumptions, we can still estimate the parameters in the linear predictor $\sum_j \beta_j x_{ij}$. Because no full likelihood has to be formulated in this approach, it is termed quasi-likelihood estimation. Once again, of course, iterative algorithms are needed.
9.4 Artificial neural networks
Artificial neural networks (ANNs) are one of a class of highly parameterised statistical models which have attracted considerable attention in recent years (other such models are outlined in later sections). In the present context, we will only be concerned with feed-forward neural networks or multilayer perceptrons. In this section, we can barely scratch the surface of this topic, and suitable further reading is suggested below. The fact that ANNs are highly parameterised makes them very flexible, so that they can accurately model relatively small irregularities in functions. On the other hand (see Chapter XXX) this flexibility means that there is a serious danger of overfitting. Indeed, early work (by which is meant work during the 1980s) was characterised by inflated claims, when such networks were overfitted to design sets and predictions of future performance were based on the design set performance. In recent years strategies have been developed for overcoming this problem, resulting in a very powerful class of predictive models.
To set ANNs in context, recall that the generalised linear models of the previous section formed a linear combination of the explanatory variables, and transformed this via a nonlinear transformation. Feedforward ANNs adopt this as the basic element. However, instead of using just one such element, they use layers of many. The outputs from one layer  the transformed linear combinations from each basic element  serve as inputs to the next layer. In this next layer the inputs are combined in exactly the same way  each element forms a weighted sum, which is then nonlinearly transformed.
Mathematically, for a network with just one layer of transformations between the input variables $x$ and the final transformation $f$ yielding the output $y$ (one hidden layer), we have
$$y = f\left(\sum_j w_j\, f_j\left(\sum_k w_{jk} x_k\right)\right)$$
Here the w are the weights in the linear combinations and the f are the nonlinear transformations. The nonlinearity of these transformations is essential, since otherwise the model reduces to a nested series of linear combinations of linear combinations  which is simply a linear combination. The term network derives from a graphical representation of this structure in which the explanatory variables and each weighted sum are nodes, with edges connecting the terms in the summation to the node.
There is no limit to the number of layers which can be used, though it can be proven that a single hidden layer (with enough nodes in that layer) is sufficient to model any continuous function. Of course, the practicality of this will depend on the available data, and it might be convenient for other reasons (such as interpretability) to use more than one hidden layer. There are also generalisations in which layers are skipped, with inputs to a node coming not only from the layer immediately below but also from other lower layers.
The earliest forms of ANN used threshold logic units as the nonlinear transformations: the output was 0 if the weighted sum of inputs was below some threshold and 1 otherwise. However, there are mathematical advantages to be gained by adopting differentiable forms for these functions. In applications, the two most common forms seem to be the logistic $f(v) = 1/(1 + e^{-v})$ and hyperbolic tangent $f(v) = \tanh(v)$ transformations of the weighted sums.
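A forward pass through such a network, with tanh hidden units and a logistic output, can be sketched in a few lines. The weights below are arbitrary illustrative values, not fitted ones:

```python
import numpy as np

def forward(x, W1, b1, w2, b2):
    # One hidden layer: tanh transforms of weighted sums of the inputs,
    # followed by a logistic transform of a weighted sum of the hidden
    # outputs, matching the form described above.
    h = np.tanh(W1 @ x + b1)
    return 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))

x = np.array([1.0, 2.0])
W1 = np.array([[1.0, -1.0], [0.5, 2.0]])   # input-to-hidden weights
b1 = np.zeros(2)
w2 = np.array([1.0, 1.0])                  # hidden-to-output weights
b2 = 0.0
y_hat = forward(x, W1, b1, w2, b2)         # always lies strictly in (0, 1)
```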
We saw, when we moved from simple linear models to generalised linear models, that estimating the parameters became more complicated. A further level of complication occurs when we move from generalised linear models to ANNs. This will probably not come as a surprise, given the number of parameters (these now being the weights in the linear combinations) in the model and the fundamental nonlinearity of the transformations. As a consequence, neural network models can be slow to estimate. This can limit their applicability in data mining problems involving large data sets. (But slow estimation and convergence is not all bad. There are stories within the ANN folklore relating how severe overfitting by a flexible model has been avoided by accident, simply because the estimation procedure was stopped early.)
Various estimation algorithms have been proposed. A popular approach is to minimise the sum of squared deviations (again!) between the observed and predicted output values by steepest descent on the weight parameters. This can be expressed as a sequence of steps in which the weights are updated, working from the output node(s) back to the input nodes. For this reason, the method is called backpropagation.
Other criteria have also been used. When $y$ takes only two values (so that the problem is really one of supervised classification) the sum of squared deviations is rather unnatural (being equivalent to the log-likelihood for normal distributions), and a more natural criterion, based on the log-likelihood for Bernoulli data, is
$$-\sum_i \bigl[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\bigr]$$
In practical applications, with reasonably sized data sets, the precise choice of criterion seems to make little difference.
The vast amount of work on neural networks in recent years, by a diverse range of intellectual communities, has led to the rediscovery of many concepts and phenomena already well-known and understood in other areas. It has also led to the introduction of unnecessary new terminology.
9.5 Other highly parameterised models
The characterising feature of neural networks is that they provide a very flexible model with which to approximate functions. Partly because of this power and flexibility, but also partly because of the appeal of their name and its implied promise, they have attracted a great deal of media attention. However, they are not the only class of flexible models. Others, in some cases also able to approximate any continuous function on compacta, have also been developed. Some of these have advantages as far as interpretation and estimation go. In this section we briefly outline two of the more important classes of flexible model. Others are mentioned in Section (XXXother methods).
9.5.1 Generalised additive models
We have seen how the generalised linear model extends the ideas of linear models. Yet further extension arises in the form of generalised additive models. These replace the simple weighted sums of the explanatory variables by weighted sums of transformed versions of the explanatory variables. To achieve greater flexibility, the relationships between the response variable and the explanatory variables are estimated nonparametrically (for example, by kernel or spline smoothing, as outlined in Chapter 4XXX), so that the generalised linear model form $g(\mu) = \sum_j \beta_j x_j$ becomes $g(\mu) = \sum_j f_j(x_j)$. The right-hand side here is sometimes termed the additive predictor. Such models take to the nonparametric limit the idea, mentioned in Section 9.2XXX, of extending the scope of linear models by transforming the explanatory variables. Generalised additive models of this form retain the merits of linear and generalised linear models. In particular, how $g(\mu)$ changes with any particular explanatory variable does not depend on how the other explanatory variables change: interpretation is eased. Of course, this is at the cost of assuming that such an additive form does provide a good approximation to the true surface. The model can readily be generalised by including multiple explanatory variables within individual $f$ components of the sum, but this is at the cost of the simple additive interpretation. The additive form also means that we can examine each smoothed explanatory variable separately, to see how well it fits the data.
In the special case when $g$ is the identity function (so that we are merely discussing an additive model, rather than a generalised additive model) appropriate smoothing functions can be found by a backfitting algorithm. If the additive model $y = \alpha + \sum_j f_j(x_j) + \varepsilon$ is correct, then $E\bigl[\,y - \alpha - \sum_{j \neq k} f_j(x_j) \mid x_k\,\bigr] = f_k(x_k)$. This leads to an iterative algorithm in which, at each step, the partial residuals $y - \hat{\alpha} - \sum_{j \neq k} \hat{f}_j(x_j)$ for the $k$th explanatory variable are smoothed, cycling through the explanatory variables until the smoothed functions do not change. The precise details will, of course, depend on the choice of smoothing method: kernel, spline, or whatever.
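The backfitting cycle can be sketched as follows, with a crude moving-average smoother standing in for the kernel or spline smoother of Chapter 4XXX (the smoother and bandwidth here are illustrative assumptions):

```python
import numpy as np

def smooth(x, r, bandwidth):
    # crude moving-average smoother of residuals r against x; a stand-in
    # for a proper kernel or spline smoother
    out = np.empty_like(r)
    for i, xi in enumerate(x):
        near = np.abs(x - xi) <= bandwidth
        out[i] = r[near].mean()
    return out

def backfit(X, y, n_iter=20, bandwidth=0.5):
    # Backfitting for the additive model y = alpha + sum_j f_j(x_j):
    # cycle through the variables, smoothing the partial residuals for
    # each in turn, until the component functions settle down.
    n, p = X.shape
    alpha = y.mean()
    F = np.zeros((n, p))                  # current estimates of f_j(x_ij)
    for _ in range(n_iter):
        for j in range(p):
            partial = y - alpha - F.sum(axis=1) + F[:, j]  # partial residuals
            F[:, j] = smooth(X[:, j], partial, bandwidth)
            F[:, j] -= F[:, j].mean()     # centre each component for identifiability
    return alpha, F
```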
To extend this from additive to generalised additive models, we make the same extension as in Section 9.3XXX when we extended the ideas from linear to generalised linear models. In Section 9.3XXX we outlined the iteratively weighted least squares algorithm for fitting generalised linear models. Equation (9.7) showed that this was essentially an iteration of a weighted least squares solution applied to an adjusted response variable, defined by $z_i = \eta_i + (y_i - \mu_i)\,\partial \eta_i / \partial \mu_i$. For generalised additive models, instead of the weighted linear regression in equation (9.7), we adopt an algorithm for fitting a weighted additive model.
Display XXX: Sometimes blood pressure is deliberately lowered during surgery, using drugs. Once the operation is completed, and the administration of the drug discontinued, it is desirable that the blood pressure should return to normal as soon as possible. The data in this example relate to how soon (in minutes) systolic blood pressure returned to 100 mm of mercury after discontinuing the medication. There are two predictor variables: the log of the dose of the particular drug used and the average systolic blood pressure of the patient during administration of the drug. A generalised additive model was fitted, using splines (in fact, cubic B-splines) to effect the smoothing. Figures XXX7 and XXX8 show, respectively, a plot of the transformed log(dose) against observed log(dose) values and a plot of the transformed blood pressure during administration against the observed values. (There is some nonlinearity evident in both these plots, although that in the log(dose) plot seems to be attributable to a single point.) Predictions for new data points are made by adding together the predictions from each of these components separately.
Figure XXX7: The transformation function of log(dose) in the model for predicting time for blood pressure to revert to normal.
Figure XXX8: The transformation function of blood pressure during administration in the model for predicting time for blood pressure to revert to normal.
9.5.2 Projection pursuit regression
Projection pursuit regression models can be proven to have the same ability to approximate arbitrary functions as neural networks, but they are not as widely used. This is perhaps unfortunate, since parameter estimation for them can have advantages over that for neural networks. The additive models of the last section essentially focus on individual variables (albeit transformed versions of these). Such models can be extended so that each additive component involves several variables, but it is not clear how best to select such subsets. If the total number of available variables is large, then one may also be faced with a combinatorial explosion of possibilities.
The basic projection pursuit regression model takes the form
$$y = \sum_k f_k\bigl(\alpha_k^T \mathbf{x}\bigr)$$
This has obvious close similarities to the neural network model: it is a linear combination of (potentially nonlinear) transformations of linear combinations of the raw variables. Here, however, the $f_k$ functions are not constrained (as in neural networks) to take a particular form, but are usually found by smoothing, as in generalised additive models. This makes them a generalisation of neural networks. Various forms of smoothing have been used, including spline methods, Friedman's supersmoother (which makes a local linear fit about the point where the smooth is required), and various polynomial functions.
The term projection pursuit arises from the viewpoint that one is projecting $\mathbf{x}$ onto the direction $\alpha_k$, and then seeking directions of projection which are optimal for some purpose (in this case, optimal as components in a predictive model).
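Evaluating a fitted projection pursuit regression model is then just a sum over components. In the sketch below the directions and the smooth functions are hypothetical stand-ins for quantities that would normally be estimated from data:

```python
import numpy as np

def ppr_predict(X, alphas, fs):
    # Evaluate a projection pursuit regression model: a sum of smooth
    # functions f_k applied to projections of each row of X onto the
    # directions alpha_k.
    return sum(f(X @ a) for a, f in zip(alphas, fs))

# two hypothetical components; known functions stand in for smoothers
alphas = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
fs = [np.sin, np.square]
X = np.array([[0.5, 2.0]])
y_hat = ppr_predict(X, alphas, fs)   # sin(0.5) + 2.0 ** 2
```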
Various algorithms have been developed to estimate the parameters. In one, components of the sum are added sequentially up to some maximum number, and then sequentially dropped, each time selecting on the basis of the least squares fit of the model to the data. For a given number of terms, the model is fitted using standard iterative procedures to estimate the parameters $\alpha_k$ of the components $f_k\bigl(\alpha_k^T \mathbf{x}\bigr)$.
9.6 Tree models
The basic principle of tree models is to recursively partition the space spanned by the explanatory variables until each cell contains cases which have similar values of the response variable. In the case of a nominal response variable, where classification is the aim, one would ideally end up with cells each of which contained members from only one class (though this is usually an unattainable ideal). Thus, for example, with three explanatory variables, $x$, $y$, and $z$, one might split $x$ at some threshold value, so that the explanatory space is divided into two domains. Each of these domains is then itself split into two, perhaps again at some threshold on $x$ or perhaps at some threshold on $y$ or $z$. This process is repeated as many times as necessary (see below), with each branch point defining a node of a tree. To predict the response value for a new case with known values of the explanatory variables, one works down the tree, at each node choosing the appropriate branch by comparing the new case's value with the threshold value of the variable for that node. It is possible to write such models in forms similar to the methods described in previous sections, in terms of a set of basis functions, but this is probably not the clearest way to describe this class of models.
Tree models have been around for a very long time, although formal methods of building them are a relatively recent innovation. Before the development of such methods they were constructed on the basis of human understanding of the processes and phenomena concerned. Tree models have many attractive properties. They are easy to understand and explain. They can handle mixed variables (continuous and discrete, for example) with ease. They can predict the response variable for a new case very quickly (indeed, low level coding permits them to operate very rapidly indeed). They are also very flexible, so that they can provide a powerful predictive tool. Having said that, their essentially sequential nature, which is reflected in the way they are constructed, can sometimes lead to suboptimal partitions of the space of explanatory variables.
The basic strategy for building tree models is simplicity itself. One simply recursively splits the cells of the space of explanatory variables. To split a given cell (equivalently, to choose the variable and threshold on which to split the node) one simply searches over all variables and all possible thresholds to find that which leads to the greatest improvement in predictive power. Predictive power is assessed on the basis of the design set elements. Thus, for example, if the aim is to predict the value of a continuous response variable, one might use the sum of squared errors as the criterion: this is necessarily reduced (or, at least, cannot increase) when a region is divided so that different predicted values may be used in the two halves. Or, if the aim is to predict to which of two classes an object belongs, one might decide to assign objects in each leaf node to the class which has most design set elements in that node. Then, once again, splitting a node cannot lead to a deterioration in predictive power.
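The search for the best split on a single variable can be sketched directly, here using the sum of squared errors criterion described above:

```python
import numpy as np

def best_split(x, y):
    # Exhaustive search over thresholds on one explanatory variable,
    # choosing the split that minimises the total sum of squared errors
    # about the two within-cell means.
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue                      # no threshold separates equal values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_threshold = (xs[i - 1] + xs[i]) / 2.0
            best_sse = sse
    return best_threshold, best_sse
```

A full tree-growing routine would apply this search to every variable in every current cell and recurse on the winning split.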
In principle, this splitting procedure could be continued until each leaf node contained a single design set element, or, in the case when some design set elements have identical vectors of explanatory variables (which can happen if the explanatory variables are categorical), until each leaf node contains only design set elements with identical explanatory variables. However, this can lead to severe overfitting. Better trees (in the sense that they lead to better predictions on new data drawn from the same distributions) can be obtained by not going to such an extreme.
Early work sought to achieve this by stopping the growing process before the extreme had been reached (this is analogous to avoiding overfitting in neural networks by terminating the convergence procedure early). However, this suffers from a consequence of the sequential nature of the procedure: it is possible that the best improvement which can be made at the next step is only very small, so that growth stops, while the step after that could lead to a substantial improvement in performance. The poor step was necessary to set things up so that the next step could take advantage of it. There is nothing specific to trees about this, of course. It is a general disadvantage of sequential methods: precisely the same applies to the stepwise regression search algorithms mentioned in Section 9.2, which is why more sophisticated methods involving stepping forwards and backwards were developed. Similar algorithms have evolved for tree methods. A common strategy is to build a large tree (perhaps even the largest possible tree, mentioned above) and then to prune it back. At each step the pair of leaf nodes whose merging leads to the least reduction in predictive performance on the design set is merged. More sophisticated methods use cross-validation or other tools to decide how far back to prune.
One disadvantage of the basic form of tree is that it is monothetic. Each node is split on just one variable. Sometimes, in real problems, the response variable changes most rapidly with a combination of explanatory variables. For example, in a classification problem involving two explanatory variables, it might be that one class is characterised by having low values on both variables while the other has high values on both variables. The decision surface for such a problem would lie diagonally in the explanatory variable space. Standard methods would try to achieve this by multiple splits, ending up with a staircase-like approximation to this diagonal decision surface. The optimum, of course, would be achieved by using a threshold defined on a linear combination of the explanatory variables, and some extensions to tree methods do just this, permitting linear combinations of the raw explanatory variables to be included in the set of candidate variables on which to split. Of course, this complicates the search process required when building the tree.
9.7 Nearest neighbour methods
Nearest neighbour methods are applicable in the special case in which the response variable is nominal, so that the problem is one of classification. At their basic level, such methods are very straightforward: to classify a new object, with explanatory vector $\mathbf{x}$, one simply examines the $k$ closest design set points to $\mathbf{x}$ and assigns the object to the class which has the majority of points amongst these $k$. 'Close' here is defined in terms of the explanatory variables. Thus one is seeking those design set objects which are most similar to the new object, in terms of the explanatory variables, and classifying the new object into the most heavily represented class amongst these most similar objects.
Of course, this simple outline leaves a lot unsaid. In particular, we must choose a value for $k$ and a metric through which to define 'close'. The most basic form takes $k = 1$. This makes a rather unstable classifier, and the predictions are made more consistent by increasing $k$. However, increasing $k$ means that the design set points now being included are not necessarily very close to the object to be classified, so that the predicted probability may be biased away from the true probability at the point in question. We are back at the ubiquitous issue of the bias/variance trade-off. There is theoretical work on the best choice of $k$, but since this will depend on the particular structure of the data set, as well as on more general issues, the best strategy for choosing $k$ seems to be a data analytic one: try various values, plotting the performance criterion (misclassification rate, for example) against $k$, to find the best. In following this approach, the evaluation must be made on a data set independent of the design data (or else the usual problem of over-optimistic results ensues). However, it would be unwise to reduce the design set too much by splitting off too large a test set, since the best value of $k$ clearly depends on the number of points in the design set. A leaving-one-out approach might be a suitable strategy to follow.
Many applications of nearest neighbour methods adopt a Euclidean metric: if $\mathbf{x}$ is the explanatory vector for the point to be classified, and $\mathbf{y}$ is the explanatory vector for a design set point, then the distance between them is $\sqrt{\sum_j (x_j - y_j)^2}$. The problem with this, of course, is that it does not provide an explicit measure of the relative importance of the different explanatory variables. One could, alternatively, use $\sqrt{\sum_j w_j (x_j - y_j)^2}$, where the $w_j$ are weights. The appearance the Euclidean metric gives, of not requiring a choice of weights, is illusory: consider what happens if we change the units of measurement of one of the variables. (An exception is when all variables are measured in the same units, as, for example, with repeated measures data.)
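The basic method, with an optionally weighted Euclidean metric, is indeed very easy to program. A minimal sketch:

```python
import numpy as np

def knn_classify(x, X, labels, k=3, weights=None):
    # Classify x by majority vote among its k nearest design set points,
    # using a (optionally weighted) Euclidean metric on the explanatory
    # variables. The weights acknowledge that the plain Euclidean metric
    # silently depends on the units of measurement.
    w = np.ones(X.shape[1]) if weights is None else np.asarray(weights)
    dists = np.sqrt((w * (X - x) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    values, counts = np.unique(labels[nearest], return_counts=True)
    return values[np.argmax(counts)]
```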
In the two class case, an optimal metric would be one defined in terms of the contours of the probability of belonging to class 1 (say): P(1|x). Design set points on the same contour as x have the same probability of belonging to class 1 as does a point at x, so no bias is introduced by including them in the k nearest neighbours. This is true no matter how far from x they are, provided they are on the contour. In contrast, points close to x but not on the contour of P(1|x) through x will have different probabilities of belonging to class 1, so including them amongst the k will tend to introduce bias. Of course, this is all very well, but we do not know the positions of the contours. If we did, we would not need to undertake the exercise at all. What this means is that, in practice, one estimates approximate contours and bases the metric on these. Both global approaches (e.g. estimating the classes by multivariate normal distributions) and local approaches (e.g. iterative application of nearest neighbour methods) have been used for finding approximate contours.
Nearest neighbour methods are closely related to kernel methods. The basic kernel method defines a cell by a fixed bandwidth and calculates the proportion of points within this cell which belong to each class. This means that the denominator in the proportion is a random variable. The basic nearest neighbour method fixes the proportion (at k/n) and lets the bandwidth be a random variable. More sophisticated extensions of both methods (for example, smoothly decaying kernel functions, differential weights on the nearest neighbour points according to their distance from x, choice of bandwidth which varies according to x) lead to methods which are barely distinguishable.
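The duality between the two methods can be made concrete in one dimension (the toy data are our own): the kernel view fixes the bandwidth and lets the count of points in the cell be random, while the nearest neighbour view fixes k and lets the bandwidth be random:

```python
# One-dimensional design set of (value, class) pairs.
design = [(0.1, 0), (0.3, 0), (0.4, 1), (0.9, 1), (1.1, 1)]
x = 0.35  # point at which to estimate P(class 1 | x)

# Kernel view: fix the bandwidth h; the number in the cell is random.
h = 0.2
in_cell = [c for z, c in design if abs(z - x) <= h]
p_kernel = sum(in_cell) / len(in_cell)   # random denominator

# Nearest neighbour view: fix k; the cell width adapts to the data.
k = 3
nearest = sorted(design, key=lambda p: abs(p[0] - x))[:k]
bandwidth = max(abs(z - x) for z, _ in nearest)   # random bandwidth
p_knn = sum(c for _, c in nearest) / k
```

Smoothly decaying kernels or distance-weighted neighbours blur even this distinction, as the text notes.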
The nearest neighbour method has several attractive properties. It is very easy to program. Its classification accuracy can be very good, comparing favourably with more modern alternatives such as neural networks. It permits very easy application of the reject option, in which a decision is deferred if one is not sufficiently confident about the predicted class. Extension to multiple classes is straightforward (though the best choice of metric is not so clear here). Handling missing values (in the vector for the object to be classified) is simplicity itself: one simply works in the subspace of those variables which are present.
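The reject option and the subspace treatment of missing values might be sketched as follows (an illustration under our own naming conventions; None marks a missing coordinate):

```python
from collections import Counter

def knn_classify(design, x, k, reject_threshold=0.0):
    """k-nearest-neighbour classification with a reject option and a
    simple subspace treatment of missing values (None entries in x):
    distances are computed only over the coordinates present in x."""
    present = [j for j, v in enumerate(x) if v is not None]
    def dist(point):
        return sum((point[j] - x[j]) ** 2 for j in present)
    nearest = sorted(design, key=lambda d: dist(d[0]))[:k]
    label, count = Counter(lab for _, lab in nearest).most_common(1)[0]
    if count / k < reject_threshold:
        return None          # defer the decision: not confident enough
    return label

design = [((0.0, 0.0), 'a'), ((0.2, 0.1), 'a'), ((1.0, 1.0), 'b'),
          ((1.1, 0.9), 'b'), ((0.9, 1.2), 'b')]

confident = knn_classify(design, (1.0, 1.1), k=3)     # deep in 'b' territory
partial = knn_classify(design, (0.1, None), k=3)      # only first variable observed
borderline = knn_classify(design, (0.55, 0.5), k=3, reject_threshold=0.9)
```

Here the borderline point gets only a 2/3 majority, below the 0.9 threshold, so the decision is deferred.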
From a theoretical perspective, the nearest neighbour method is a valuable tool: as the design sample size increases, so the bias of the estimated probability will decrease. If one can contrive to increase k at a suitable rate (so that the variance of the estimates also decreases), the misclassification rate of a nearest neighbour rule will converge to a value related to the Bayes error rate (the rate which could be achieved with perfect knowledge of the probability distributions). For example, an early result is that the asymptotic nearest neighbour misclassification rate is bounded above by twice the Bayes error rate.
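The early result referred to is due to Cover and Hart (1967). Writing $E^*$ for the Bayes error rate and $E_{NN}$ for the asymptotic error rate of the 1-NN rule with $C$ classes, the bound is

```latex
E^* \;\le\; E_{NN} \;\le\; E^*\left(2 - \frac{C}{C-1}\,E^*\right) \;\le\; 2E^*
```

which in the two-class case reduces to $E_{NN} \le 2E^*(1 - E^*)$: the simple 1-NN rule asymptotically loses at most a factor of two relative to perfect knowledge of the probability distributions.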
A potential drawback of nearest neighbour methods is that they do not build a model, relying instead on retaining all of the design set points. If the design set is large, searching through it to find the k nearest could be a time-consuming process. Methods have been developed for accelerating this search. For example, branch and bound methods can be applied: if it is already known that at least k points lie within a distance d of the point to be classified, then a design set point is not worth considering if it lies within a distance d of a point already known to be further than 2d from the point to be classified, since by the triangle inequality it must itself lie at least a distance d away. This involves preprocessing the design set. Other preprocessing methods discard certain design set elements. For example, condensed nearest neighbour and reduced nearest neighbour methods selectively discard design set points so that those remaining still correctly classify all other design set points. The edited nearest neighbour method discards isolated points from one class which are in dense regions of another class, so smoothing out the empirical decision surface.
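A sketch of the condensation idea (our own minimal variant of Hart's condensed nearest neighbour algorithm, with our own toy data): a point is added to the retained set only when the set built so far would misclassify it under the 1-NN rule, and passes are repeated until nothing changes:

```python
def nn_label(store, x):
    """1-NN label of x amongst the retained points (squared Euclidean)."""
    return min(store, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

def condense(design):
    """Retain a subset of the design set that still classifies every
    design point correctly with the 1-NN rule."""
    store = [design[0]]
    changed = True
    while changed:
        changed = False
        for x, y in design:
            if nn_label(store, x) != y:
                store.append((x, y))
                changed = True
    return store

design = [((0.0,), 'a'), ((0.1,), 'a'), ((0.2,), 'a'),
          ((1.0,), 'b'), ((1.1,), 'b'), ((1.2,), 'b')]
store = condense(design)
```

For these two well-separated clusters, only one retained point per class suffices, yet every design point is still classified correctly.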
9.8 Other methods
A huge number of predictive methods have been developed in recent years. Many of these have been powerful and flexible methods, developed in response to the exciting possibilities offered by modern computing power. We have outlined some of these above, showing how they are related. But other methods also exist: in just one chapter of one book it is not feasible to do justice to all of them. Furthermore, development and invention have not finished. Exciting work continues even as we write. Examples of methods which we have not had space to cover are:
 Classical linear discriminant analysis. This can be approached from various perspectives. In the two class case, one seeks that linear combination of the explanatory variables which is maximally correlated with a binary class indicator variable. New points are classified by projecting them onto this linear combination and seeing on which side of a threshold they lie. Alternatively, one can (equivalently) describe the method in terms of an assumption that each of the two classes follows a multivariate normal distribution with the same covariance matrix, classifying a new point according to its distance from the means of the distributions. This perspective permits ready generalisation to multiple classes.
 Quadratic discriminant analysis is similar, but relaxes the assumption of equal covariance matrices.
 Regularised discriminant analysis is based on a linear combination of linear and quadratic discriminant analysis, providing a smoothing which decreases the chance of overfitting resulting from the large number of parameters estimated in quadratic discriminant analysis.
 Multivariate adaptive regression splines divide the space of explanatory variables into disjoint regions, with a polynomial approximation to the function being defined in each region, constrained so that the overall model is continuous.
 Support vector machines fit a predictive model to the response variable which is as flat as possible, subject to the constraint that the maximum error in prediction is less than some value.
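Of the methods listed above, classical linear discriminant analysis is simple enough to sketch. Under the further simplifying assumption that the shared covariance matrix is the identity and the class priors are equal (our assumption, made purely for illustration), classifying by distance from the class means reduces to a nearest-mean rule:

```python
def class_means(design):
    """Mean vector for each class in a list of (vector, label) pairs."""
    sums, counts = {}, {}
    for x, y in design:
        counts[y] = counts.get(y, 0) + 1
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def nearest_mean(design, x):
    """Classify x by its distance from the class means: linear
    discriminant analysis in the degenerate case of an identity
    covariance matrix and equal priors."""
    means = class_means(design)
    return min(means, key=lambda y: sum((m - v) ** 2 for m, v in zip(means[y], x)))

design = [((0.0, 0.0), 'a'), ((0.0, 1.0), 'a'), ((1.0, 0.0), 'a'),
          ((4.0, 4.0), 'b'), ((5.0, 4.0), 'b'), ((4.0, 5.0), 'b')]
```

With a general pooled covariance matrix the same idea gives the full method (Mahalanobis rather than Euclidean distance), and the multi-class generalisation is immediate.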
It will be seen from the above that often a flexible model is fitted, which is then smoothed in some way to avoid overfitting (or the two processes occur simultaneously), so as to strike a suitable compromise between bias and variance. This is manifest in weight decay in fitting neural networks, in regularisation in discriminant analysis, in the flatness of support vector machines, and so on. A rather different strategy, which has proven highly effective in predictive modelling, is to estimate several (or many) models and average their predictions. This clearly has conceptual similarities to Bayesian approaches, which explicitly regard the parameters of a model as being randomly drawn from some distribution, so that a prediction is based on averaging over the values in this distribution.
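The averaging strategy can be illustrated with a deliberately simple ensemble (the toy data and function names are ours): each component model is a 1-NN classifier fitted to a bootstrap resample of the design set, and their predictions are combined:

```python
import random
from collections import Counter

def nn1(train, x):
    """1-NN prediction using squared Euclidean distance."""
    return min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

def bagged_predict(design, x, n_models=25, seed=0):
    """Combine the predictions of several models, each a 1-NN classifier
    fitted to a bootstrap resample of the design set."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        sample = [rng.choice(design) for _ in design]
        votes[nn1(sample, x)] += 1
    return votes.most_common(1)[0][0]

design = [((0.0, 0.0), 'a'), ((0.5, 0.2), 'a'), ((0.1, 0.4), 'a'),
          ((4.0, 4.0), 'b'), ((5.0, 4.0), 'b'), ((4.0, 5.0), 'b')]
label = bagged_predict(design, (5.0, 5.0))
```

Averaging over resamples reduces the variance of the unstable 1-NN base model, which is exactly the motivation given in the text.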
Averaging is not the only way of combining separate models. Another general approach arises if the predictions of the different models are regarded as the input variables for a higher level predictive model (this structure clearly has similarities to neural network structures). Sometimes, majority voting amongst the inputs is used, but more general ways of combining them are also sometimes adopted.
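A minimal sketch of majority voting as a combiner (the model outputs shown are hypothetical):

```python
from collections import Counter

def combine_by_vote(predictions):
    """Majority vote amongst the outputs of several separate models; the
    model predictions themselves are the inputs of the combiner."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical predictions from three different classifiers for one object.
combined = combine_by_vote(['spam', 'spam', 'ham'])
```

A more general combiner would replace the vote with a higher-level model trained on these inputs, as the text describes.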
9.9 Further Reading
The seminal text on generalised linear models is that of McCullagh and Nelder (1989). A good introduction to neural networks is given by Bishop (1995) and more general books which include discussion of neural networks are Ripley (1996) and Hand (1997). These also include discussion of models such as projection pursuit, generalised additive models, and trees. An early and influential work on trees was the book by Breiman et al (1984). A comprehensive outline of generalised additive models is the book of Hastie and Tibshirani (1990). General descriptions of nearest neighbour methods, including outlines of methods for reducing the size of the retained set, may be found in Hand (1981) and McLachlan (1992). Choice of metric for nearest neighbour methods is discussed in Short and Fukunaga (1981), Fukunaga and Flick (1984), and Myles and Hand (1990). The use of the reject option with nearest neighbour methods is discussed in Devijver and Kittler (1982), which also discusses other supervised classification methods which predate the renaissance of neural networks. Asymptotic properties of nearest neighbour rules are described in Devroye and Wagner (1982).
The computer CPU data set, the oxygen uptake data set, the ear infections in swimmers data set, and the blood pressure after surgery data are given in Hand et al (1994). The temperature and latitude data are from Peixoto (1990).
References
Bishop C.M. (1995) Neural Networks for Pattern Recognition. Oxford: Clarendon Press.
Breiman L., Friedman J.H., Olshen R.A., and Stone C.J. (1984) Classification and Regression Trees. Belmont, California: Wadsworth.
Devijver P.A. and Kittler J. (1982) Pattern Recognition: a Statistical Approach. Englewood Cliffs, New Jersey: Prentice-Hall.
Devroye L.P. and Wagner T.J. (1982) Nearest neighbour methods in discrimination. In Handbook of Statistics (Vol. 2), P.R. Krishnaiah and L.N. Kanal (eds.) Amsterdam: North-Holland, 193-197.
Hand D.J. (1981) Discrimination and Classification. Chichester: Wiley.
Hand D.J. (1997) Construction and Assessment of Classification Rules. Chichester: Wiley.
Hand D.J., Daly F., Lunn A.D., McConway K.J., and Ostrowski E. (eds.) (1994) A Handbook of Small Data Sets. London: Chapman and Hall.
Hastie T.J. and Tibshirani R.J. (1990) Generalized Additive Models. London: Chapman and Hall.
McCullagh P. and Nelder J.A. (1989) Generalized Linear Models. (2nd ed.), London: Chapman and Hall.
McLachlan G.J. (1992) Discriminant Analysis and Statistical Pattern Recognition. New York: Wiley.
Myles J.P. and Hand D.J. (1990) The multi-class metric problem in nearest neighbour discrimination rules. Pattern Recognition, 23, 1291-1297.
Peixoto J.L. (1990) A property of well-formulated polynomial regression models. American Statistician, 44, 26-30.
Ripley B.D. (1996) Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.
.Ahe S:V)V)pp Ʃ C (SRoot Entry P#mXP#SWordDocumentO ObjectPoolaP#aP#SummaryInformation(
!"#$%&'()*+,./0123456789:;<=>?QDMNgUTVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{C:;M`!@Shij{tu
/
0
r Kr Kr Kr Kr Kr r r r r r r r r r r r r r r r r r r r r r r r r r r r 488.) 4814.) !0
`
a
45tu9:RSfg@ A !!%%r r r r r r r =
r r r r r r vr r r r
r r .r r 5r r
r r r r r r r r r r {
r 488.) 488.) "C
!"#$%&'()*+,./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{}~
!"#$%&'()*+,./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{}~
!"#$%&'()*+,./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{}~%A+B++++?,@,e,f,2266;
;;; ;>>BBBBB7C\C]C=J>JNNPPQr r r ;r r r ]r r 3r r jr r Zr r (r r r r r r r r r r r r ;r r r r r r r r r 748n488.) #QQ1R2RTRvRwRRRR)S*SWWWW5[6[m]n]cdLdMdpddddddr r r r r r r r r r r r r r [r r Vr r r r r r r r r r r r r 488.) 7488.) 488.) Root Entry P#`BP#SWordDocumentObjectPoolaP#aP#SummaryInformation(
!"#$%&'()*+,./0123456789:;<=>?QUTVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{DocumentSummaryInformation8_974553966oFaP#P#_971939398iFP#P#_971943645cFP#P#
!"#$%&'()*+1356789:;<=>?AFHIJKLMOTVWXYZ\acdefginpqrstvz{}՜.+,0HPltThe Open UniversityxdPrinciples of Data MiningOh+'0 (
P\h
tPrinciples of Data MiningdoDavid HanddzENormal.dotTBT3GMicrosoft Word for Windows 95@G@@<U"@P#/kaqJlJ$rJ
wj
FMicrosoft Equation 2.0DS EquationEquation.29q_971939485]FP#) P#_974553980\WF) P#) P#_974553996QF) P#) P#_971940585KF) P#) P#_971940603EF) P#Q)P#_974554080
?FQ)P#Q)P#_972908310b9FQ)P#Q)P#_9729085323FQ)P#Q)P#_9721996738 FQ)P#0P#_972908617'F0P#0P#_974554265!F0P#0P#_971942261F0P#:P#_972908684F:P#:P#_972908712F:P#:P#_971943046 F:P#AP#_972024428 FAP#AP#_972024453FAP#AP#_972908815FAP#AP#_972908851F`JP#`JP#_971944060F`JP#`JP#_971962572F`JP#`JP#_971962630F`JP#RP#_971962656FRP#RP#_9719441682FRP#RP#_9720247634FRP#@[P#_972198464F@[P#@[P#_972198463#!F@[P#@[P#_972198462F@[P#`McP#_972198461G"1YE7DEST`McP#`McP#_972198460F`McP#`McP#_972198459I%1YE7DEST`McP# ulP#_972199972*F ulP# ulP#_972200298'1F ulP#@tP#_972200323F@tP#@tP#_972200235F@tP#@tP#_971945813F@tP#>}P#_971963117F>}P#>}P#_971963181,6F>}P#>}P#_971963215{F>}P# ߄P#_972200977(OuF ߄P# ߄P#_972200976oF ߄P# ߄P#_972200975)0iF ߄P#P#_971945979+5cFP#P#_971946089]FP#P#_972024905WFP#P#_97194615037QFP#P#_971963638.:KFP#P#_971946412EFP#ϞP#_972024979B?FϞP#ϞP#_9720250009FϞP#pP#_97196472041YE7DESTpP#aP#_972030895.FaP#aP#_9720308449;(FaP#P#_972031225<@"FP#P#_972043421AFP#P#_972043530FP#`*P#_972043497>?F`*P#`*P#_972043367
F`*P#P#_972043627=$FP#P#_972216438/WFP#P#_972043937HFP#@P#_972044019F@P#@P#_972043956DEF@P#`P#_972044043F&F`P#`P#_972043946F`P#`P#_972046617MKF`P# P#_972046576F P# P#_972197006F P#@]P#_972044661F@]P#@]P#_972045591LJF@]P#&P#_972204935RF&P#MP#_972204968NQFP#P#_972205369FP#P#_972205491PSFP#P#_9722040311YE7DESTP#
P#_972212521F
P#
P#_9722181221YE7DEST
P#P#_972285003TVFP#P#_972285091FP#P#_972285164UYFP#P#_972285244ZFP#P#_972285284X[F`'P#`'P#_972285280vF`'P#`'P#_972285423pF`'P#I/P#_972285474CajFI/P#I/P#_972285485_dFI/P#@q8P#_972285508]`^F@q8P#@q8P#_972285507XF@q8P#`@P#_972286179RF`@P#`@P#_972904708gLF`@P# :IP#_972904762FF :IP# :IP#_972289097ef@F :IP#@PP#_972288834^c:F@PP#@PP#_9722888904F@PP#ZP#_972289476.FZP#ZP#_972289507dm(FZP# aP#_972289853"F aP# aP#_972289928hjF aP#jP#_972289944FjP#jP#_972461470pFjP#jP#_972290815nk
FmrP#mrP#_972290466ilFmrP#mrP#_972290540oF{P#{P#_972290762F{P#{P#_972290917F{P#5P#_974556618
F5P#`P#_974698510F`P#@P#_974698509trF@P#`YP#_974698508F P#@"P#_974698507wsF@"P#QP#_974698506FQP#P#_974698505xvFP#P#_974698504FP#P#_974698503}uFP#P#_974698502FP#`uP#_974698501zF`uP#@>P#_974698500F`P# P#_974698499{F P#P#_974698498FP#"P#_974698497~F"P#a3P#_974698496Fa3P#*DP#_974698494yF*DP#TP#_974698493FTP#\P#_974698492F`eP#]mP#_974698491yF]mP#`&~P#_974698490sF`&~P#@P#_974698489mF@P#P#_974698488gFP#ߨP#_974698487aFߨP#P#_974698486[FP#qP#_974698485UFqP#P#_974698484OFP#@P#_974698483IF@P#^P#_974698482CF %P#GP#_974698481=FGP#_P#_9746984807F_P#pP#_9746984791FpP#`P#_974698478+F`P#`P#_974698477%F`P#P#_974698476FP#
P#_974698475F
P#`P#_974698474F`P#@!P#_974698473
F@!P# 2P#_974698472F 2P#VCP#_974698471F JP#TP#_974698470FTP#dP#_974698469FdP#uP#_974698468FQ}P#P#_974698467FP#@P#_974698466F@P# ԸP#_9746984651YE7DEST ԸP# >P#_9746984641YE7DEST >P#P#_974698463FP#P#_974698462qFP#P#_974698461FP#`P#_974698460F`P#@mP#_974698459F@mP# 6P#_974698458F 6P#P#_974698457FP#P#_974698456FP# P#~X
AB^hSK= A@PD:jzLUY`~0
1
D
E
F
G
/
uD:eKuD:acvK
uD9eKuD9acvK
uDݮ9eKuDݮ9acvK
uDF9eKuDF9acvK
uDn:eKuDn:acvKuDVUUc:/012Vn!&'()RS67:;NOPQ~
U
uDg9eKuDg9acvK
uD:eKuD:acvK
uD9eKuD9acvK
uD9eKuD9acvKVuD
uD:eKuD:acvK@#$%&OPcdef56@A89>?Rž
uDh9eKuDh9acvK
uDu9eKuDu9acvK
uD:eKuD:acvKV
uDIh9eKuDIh9acvKuDuD9UeKuD9acvKuDg9UeKuDg9acvK
uDUU3RSTUgh{}~gh{}~!"{yU
uD9eKuD9acvK
uD3i9eKuD3i9acvK
uDi9eKuDi9acvK
uD9eKuD9acvKV
uDl9eKuDl9acvK
uD9eKuD9acvKuD
uDUuDh9UeKuDh9acvK/"9P ' ( ; < = > i j } ~ ""=#>###%%%%%%%%T&U&h&i&j&k&
((!(
uD@9eKuD@9acvK
uD9eKuD9acvKU
uD9eKuD9acvK
uD 9eKuD 9acvK
uD9eKuD9acvK
uD9eKuD9acvKuDV5!("(#($(((;)<)))))))++>+?+B+O+++++++++&,',:,;,<,=,@,A,a,b,c,d,7.8.//////00#0$0%0&0
uDj9eKuDj9acvK
uD$9eKuD$9acvK
uD;9KuD;9vK
uD<9eKuD<9acvK
uD=9KuD=9vKU
uD>9eKuD>9acvKVuD
uD?9eKuD?9acvK5&0:0;0N0O0P0Q0f0g0z0{00}000000000000:1;1M1S11122a2b2222222223.3A3B3C3D3p3q33333333ü
uD9eKuD9acvK
uD9eKuD9acvK
uDU9eKuDU9acvKVJeUJaUU
uD+9eKuD+9acvK
uD9eKuD9acvKuD8333304144444444455*5+5,55>5?5R5S5T5U5m5n5o5s566666666667777,77@7A7B7C7G7
uDI9eKuDI9acvK
uDi9eKuDi9acvK
uD9eKuD9acvK
uD9eKuD9acvK
uD9eKuD9acvKV
uD9eKuD9acvKUuD
uDO9eKuDO9acvK2G7H7[7\7]7^7777777777777899
9V9j99999s:t:::::::::; ;
;;
;;;;R@Z@@@BBBBB7C8CJcVccUc
JUcU
uD9eKuD9acvK
uD9eKuD9acvKV
uD9eKuD9acvK
uD9eKuD9acvK
uD9eKuD9acvKuD88CXCYCZC[CCCCCCCCCVDWDiDjD}D~DDDDDEEIIIIIIJJJ(K*K+K>K?K@KAKKKKKKKKKKKKKKK5L6LIL
uD
59eKuD
59acvK
uD49eKuD49acvK
uD9eKuD9acvK
uD9eKuD9acvK
uD9eKuD9acvKVuD
uD09KuD09vK9ILJLKLLLM McMdMwMxMyMzMMMMMMMMMMMMMNNOOfOgOPPP PPP8P9PPPQQRRRRRRRRRRRRRRRR
uD69eKuD69acvK
uD69eKuD69acvKU
uDv9eKuDv9acvK
uDk59eKuDk59acvK
uDg49eKuDg49acvKVuD
uD49eKuD49acvK9RRRRRRRRRRRRSSSSSS#S%SSSFUJUVVVVVV&W'W:W;W?MNuD:acvK
uD:eKuD:acvKU
uD:eKuD:acvK
uD:eKuD:acvK
uD:eKuD:acvK
uD:eKuD:acvK
uD:eKuD:acvKuDV2UV"#6789TUhijk̯ͯίϯԯկ
45HIüuD:acvK
uD:eKuD:acvK
uD:eKuD:acvK
uD:eKuD:acvK
uD:eKuD:acvKV
uD:eKuD:acvK
uD:eKuD:acvKUuD
uD:eK0IJKװu}#$789:{ɲʲ./0<eyǸ 3456FGZ[\]fgz{}Ͻ&DNjn»U
uD:eKuD:acvK
uD:eKuD:acvK
uD:eKuD:acvKJc]cVcUccJc
uD:eKuD:avKVuD
uD:eK;67LMSYkl)*=>?@4Dqr6[4abuvwx
uD:eKuD:acvK
uD:eKuD:acvKU
uD:eKuD:acvK
uD:eKuD:acvK
uD:eKuD:acvK
uD:eKuD:acvKuDV9jk_`45HIJK]^qrstOPQ]Z[{}~<
uD:KuD:vKUccJc
uD:eKuD:acvK
uD:eKuD:acvK
uD:eKuD:acvK
uD:eKuD:acvKuDV<<=>?ABDgLM`abc./BCDE67EFYZ[\efyz{VWYZ`ast.OPTU
uD۷:eKuD۷:acvKV
uDܷ:eKuDܷ:acvK
uDݷ:eKuDݷ:acvK
uD:eKuD:acvK
uD߷:eKuD߷:acvKUJccuD
uD:KuD:vK8> ?
m
n
()67Z[nopq}~[\@ASTUV
uDط:eKuDط:acvK
uDٷ:eKuDٷ:acvK
uDڷ:eKuDڷ:acvKuDUVMV>?opyz= > \ ] !!!!U"n""
#/01145955555 6C66697O7777.8888
9N9g9k9m999\:o:q:s:::::;.;Z;hVUAOle
PIC
LMETApCompObjfObjInfoEquation Native
4Ole
.PIC
,LMETACompObjfObjInfoEquation Native Ole
BPIC
@LMETA4CompObj2fObjInfo/Equation Native 0tOle
PPIC
NLMETAGCompObjEfObjInfoCEquation Native D8Ole
]PIC
[LMETAUhCompObjSfObjInfoQEquation Native R@Ole
jPIC
hLMETAbhCompObj`fObjInfo^Equation Native _@Ole
wPIC
uLMETAopCompObjmfObjInfokEquation Native l4Ole
PIC
LMETA~CompObjfObjInfoxEquation Native yOle
PIC
LMETAObjInfoContents 1Ole
PIC
LMETAHObjInfoContents9Ole
PIC
LMETACompObjfObjInfoEquation Native Ole
PIC
LMETAhCompObjfObjInfoEquation Native Ole
PIC
LMETA0CompObjfObjInfoEquation Native Ole
PIC
LMETAHCompObjfObjInfoEquation Native Ole
PIC
LMETACompObjfObjInfoEquation Native Ole
)PIC
'LMETACompObjfObjInfoEquation Native xOle
PPIC
NLMETA2CompObj0fObjInfo*Equation Native
+Ole
dPIC
bLMETAVCompObjTfObjInfoQEquation Native RXOle
yPIC
wLMETAjCompObjhfObjInfoeEquation Native flOle
PIC
LMETACompObjfObjInfozEquation Native {Ole
PIC
!LMETApCompObj #fObjInfoEquation Native "hOle
PIC
$'LMETACompObj&)fObjInfoEquation Native (hOle
PIC
*LMETAhCompObj,/fObjInfoEquation Native .4Ole
PIC
03LMETACompObj25fObjInfoEquation Native 4Ole
PIC
69LMETACompObj8;fObjInfoEquation Native :HOle
PIC
<?LMETACompObj>AfObjInfoEquation Native @Ole
(PIC
BE&LMETACompObjDGfObjInfoEquation Native FHOle
:PIC
HK8LMETA.CompObjJM,fObjInfo)Equation Native L*HOle
SPIC
NQQLMETAACompObjPS?fObjInfo;Equation Native R<Ole
ePIC
TWcLMETAYCompObjVYWfObjInfoTEquation Native XUHOle
PIC
Z]LMETAmCompObj\_kfObjInfofEquation Native ^gOle
PIC
`cLMETACompObjbefObjInfoEquation Native dHOle
PIC
fiLMETACompObjhkfObjInfoEquation Native jHOle
PIC
loLMETACompObjnqfObjInfoEquation Native p\Ole
PIC
ruLMETACompObjtwfObjInfoEquation Native vDOle
PIC
x{LMETAdCompObjz}fObjInfoEquation Native Ole
PIC
~LMETAhCompObjfObjInfoEquation Native 4Ole
PIC
LMETA4CompObjfObjInfoEquation Native HOle
.PIC
,LMETACompObjfObjInfo
Equation Native Ole
BPIC
@LMETA4CompObj2fObjInfo/Equation Native 0xOle
QPIC
OLMETAGCompObjEfObjInfoCEquation Native D8Ole
bPIC
`LMETAVDCompObjTfObjInfoREquation Native S@Ole
vPIC
tLMETAhCompObjffObjInfocEquation Native dTOle
PIC
LMETA{4CompObjyfObjInfowEquation Native x<Ole
PIC
LMETACompObjfObjInfoEquation Native (Ole
PIC
LMETALCompObjfObjInfoEquation Native DOle
PIC
LMETACompObjfObjInfoEquation Native (Ole
PIC
LMETACompObjfObjInfoEquation Native (Ole
PIC
LMETAHCompObjfObjInfoEquation Native Ole
PIC
LMETA8CompObjfObjInfoEquation Native Ole
PIC
LMETACompObjfObjInfoEquation Native dOle
PIC
LMETAhCompObj
fObjInfoEquation Native 4Ole
)PIC
'LMETApCompObjfObjInfoEquation Native hOle
6PIC
4LMETA.pCompObj,fObjInfo*Equation Native +4PIC
ELMETA7
TCompObjChObjInfoBWordDocument~_SummaryInformation(;DocumentSummaryInformation87Ole
WPIC
ULMETAL8CompObjJfObjInfoIEquation Native GLOle
dPIC
bLMETA\hCompObjZfObjInfoYEquation Native X4Ole
PIC
zLMETAjCompObjhfObjInfogEquation Native eOle
PIC
LMETACompObjfObjInfoEquation Native }tOle
PIC
LMETAhCompObj
fObjInfoEquation Native dOle
PIC
LMETACompObjfObjInfo
!"#$%&'()*+,./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{}~Equation Native TOle
PIC
LMETACompObjfObjInfoEquation Native (Ole
PIC
LMETA,CompObjfObjInfo Equation Native ,Ole
PIC
!$LMETAxCompObj#%fObjInfo&Equation Native Ole
PIC
'*LMETApCompObj)+fObjInfo,Equation Native 4Ole
PIC
0LMETApCompObj/1fObjInfo2Equation Native 4Ole
!PIC
36LMETA$CompObj57fObjInfo8Equation Native Ole
.PIC
9<,LMETA&pCompObj;=$fObjInfo>#Equation Native "4Ole
;PIC
?B9LMETA3pCompObjAC1fObjInfoD0Equation Native /4Ole
MPIC
EHKLMETAApCompObjGI?fObjInfoJ>Equation Native <hOle
ZPIC
KNXLMETARhCompObjMOPfObjInfoPOEquation Native N4Ole
mPIC
QTkLMETA`CompObjSU^fObjInfoV]Equation Native [XOle
zPIC
WZxLMETArhCompObjY[pfObjInfo\oEquation Native n4Ole
PIC
]`LMETAhCompObj_a}fObjInfobEquation Native {4Ole
PIC
cfLMETAhCompObjegfObjInfohEquation Native 4Ole
PIC
ilLMETAhCompObjkmfObjInfonEquation Native 4Ole
PIC
orLMETApCompObjqsfObjInfotEquation Native hOle
PIC
uxLMETAhCompObjwyfObjInfozEquation Native 4Ole
PIC
{~LMETAhCompObj}fObjInfoEquation Native 4Ole
PIC
LMETApCompObjfObjInfoEquation Native 4Ole
PIC
LMETAhCompObjfObjInfoEquation Native 4Ole
PIC
LMETAhCompObjfObjInfoEquation Native 4Ole
PIC
LMETACompObjfObjInfoEquation Native Ole
PIC
LMETA,$ObjInfoContentsTOle
PIC
LMETAhCompObjfObjInfo
Equation Native ,Ole
PIC
LMETAJObjInfoContents>:Ole
1PIC
/LMETA"CompObj fObjInfoEquation Native xOle
HPIC
FLMETA8dCompObj6fObjInfo5Equation Native 2Ole
UPIC
SLMETAMhCompObjKfObjInfoJEquation Native I4Ole
bPIC
`LMETAZhCompObjXfObjInfoWEquation Native V4Ole
oPIC
mLMETAgpCompObjefObjInfodEquation Native c4Ole
PIC
LMETAwCompObjufObjInfotEquation Native pOle
PIC
LMETAhCompObjfObjInfoEquation Native 4Ole
PIC
LMETAhCompObjfObjInfoEquation Native 4Ole
PIC
LMETAhCompObjfObjInfoEquation Native 4Ole
PIC
LMETACompObjfObjInfoEquation Native dOle
PIC
LMETAlCompObjfObjInfoEquation Native Ole
PIC
LMETA(CompObjfObjInfoEquation Native tOle
PIC
LMETAlCompObjfObjInfoEquation Native tOle
(PIC
&LMETADCompObjfObjInfoEquation Native hOle
5PIC
3LMETApCompObj+fObjInfo*Equation Native )4Ole
KPIC
ILMETA;DCompObj9fObjInfo8Equation Native 6hOle
YPIC
WLMETAPCompObj
NfObjInfoMEquation Native L8Ole
mPIC
kLMETA_CompObj]fObjInfo\Equation Native ZdOle
zPIC
xLMETArTCompObjpfObjInfooEquation Native n,Ole
PIC
LMETACompObjfObjInfo ~Equation Native {Ole
PIC
!$LMETAtCompObj#%fObjInfo&Equation Native 4Ole
PIC
'*LMETAhCompObj)+fObjInfo,Equation Native 4Ole
PIC
0LMETApCompObj/1fObjInfo2Equation Native 4Ole
PIC
36LMETAX*ObjInfo57ContentsOle
PIC
8;LMETACompObj:<fObjInfo=Equation Native Ole
PIC
>ALMETAhCompObj@BfObjInfoCEquation Native 4Ole
PIC
DGLMETApCompObjFHfObjInfoIEquation Native 4Ole
PIC
JMLMETA\CompObjLNfObjInfoOEquation Native Ole
.PIC
PS,LMETAxCompObjRTfObjInfoUEquation Native Ole
<PIC
VY:LMETA3CompObjXZ1fObjInfo[0Equation Native /8Ole
JPIC
\_HLMETAACompObj^`?fObjInfoa>Equation Native =8Ole
WPIC
beULMETAOpCompObjdfMfObjInfogLEquation Native K4Ole
hPIC
hkfLMETA]4CompObjjl[fObjInfomZEquation Native XXOle
PIC
nqzLMETAnCompObjprlfObjInfoskEquation Native idOle
PIC
twLMETApCompObjvxfObjInfoy~Equation Native }4Ole
PIC
z}LMETATCompObj~fObjInfoEquation Native ,Ole
PIC
LMETAhCompObjfObjInfoEquation Native Ole
PIC
LMETACompObjfObjInfoEquation Native Ole
PIC
LMETACompObjfObjInfoEquation Native dOle
PIC
LMETAhCompObjfObjInfoEquation Native 4Ole
PIC
LMETAhCompObjfObjInfoEquation Native 4Ole
PIC
LMETACompObjfObjInfoEquation Native DOle
PIC
LMETA pCompObj fObjInfo Equation Native 4Ole
PIC
LMETAObjInfo Contents7Ole
$ PIC
" LMETA pCompObj fObjInfo Equation Native 4Ole
( PIC
& LMETAObjInfo% Contents7Ole
: PIC
8 LMETA. hCompObj, fObjInfo+ Equation Native ) tOle
K PIC
I LMETA@ CompObj> fObjInfo= Equation Native ; TOle
X PIC
V LMETAP pCompObjN fObjInfoM Equation Native L 4Ole
e PIC
c LMETA] pCompObj[ fObjInfoZ Equation Native Y 4Ole
v PIC
t LMETAk CompObji fObjInfoh Equation Native f HOle
PIC
LMETA CompObjz fObjInfoy Equation Native w lOle
PIC
LMETA pCompObj fObjInfo Equation Native 4Ole
PIC
LMETA CompObj fObjInfo Equation Native 8Ole
PIC
LMETA CompObj fObjInfo Equation Native dOle
PIC
LMETA CompObj fObjInfo Equation Native dOle
PIC
LMETA CompObj fObjInfo Equation Native `Ole
PIC
LMETA CompObj fObjInfo Equation Native Ole
PIC
LMETA
pCompObj
fObjInfo
Equation Native
4Ole
*
PIC
(
LMETA
CompObj
fObjInfo
Equation Native
Ole
?
PIC
=
LMETA0
CompObj.
fObjInfo
Equation Native +
`Ole
L
PIC
J
LMETAD
pCompObjB
fObjInfoA
Equation Native @
4Ole
V
PIC
T
LMETAQ
CompObjO
fObjInfoN
Equation Native M
$Ole
c
PIC
#a
LMETA[
pCompObj"$Y
fObjInfo%X
Equation Native W
4Ole
p
PIC
&)n
LMETAh
pCompObj(*f
fObjInfo+e
Equation Native d
4Ole
PIC
,/
LMETAv
4CompObj.0t
fObjInfo1s
Equation Native q
`Ole
PIC
25
LMETA
CompObj46
fObjInfo7
Equation Native
dOle
PIC
8;
LMETA
CompObj:<
fObjInfo=
Equation Native
Ole
PIC
>A
LMETA
CompObj@B
fObjInfoC
Equation Native
\Ole
PIC
DG
LMETA
pCompObjFH
fObjInfoI
Equation Native
4Ole
PIC
JM
LMETA
CompObjLN
fObjInfoO
Equation Native
8Ole
PIC
PS
LMETA
CompObjRT
fObjInfoU
Equation Native
\Ole
PIC
VYLMETA
hCompObjXZ
fObjInfo[
Equation Native
,Ole
PIC
\_LMETA
pCompObj^`fObjInfoaEquation Native 4Ole
PIC
beLMETApCompObjdffObjInfogEquation Native 4Ole
5PIC
hk3LMETA%XCompObjjl#fObjInfom"Equation Native Ole
BPIC
nq@LMETA:hCompObjpr8fObjInfos7Equation Native 6,4{ .1@&&MathTypepTimes New RomanH 2
`@w Times New Roman 2
kj>
&
"SystemnL4{@hahlJ(qJmJ
wj
xj
yj
()2j
FMicrosoft Equation 2.0DS EquationEquation.29q) .1
& &MathType4Symbol2
i(4Symbol2
i)Times New Roman 2
@w 2
@Hx 2
@;y Times New Roman 2
2j> 2
+j> 2
j> 2
j>Symbol 2
@Symbol 2
v8 Times New Roman 2
3 2p
&
"SystemnL)TaXqJlJ$rJ
xj
yj
()2j
FMicrosoft Equation 2.0DS EquationEquation.29q; w .1 `& &MathType4Symbol2
i(4Symbol2
i)Times New Roman< 2
@x 2
@y Times New Roman 2
j> 2
oj> 2
j>Symbol 2
@GSymbol 2
v8 Times New Roman< 2
2p
&
"SystemnL; <qJlJ$rJ
t1x
FMicrosoft Equation 2.0DS EquationEquation.29q4 .1&&MathTypePTimes New Roman< 2
`4tk Times New Roman$ 2
"xb Times New Roman< 2
1p
&
"SystemnL4,@$qJlJ$rJ
Symbolak
FMicrosoft Equation 2.0DS EquationEquation.29qW4 .1 &&MathTypePSymbol 2
`1a Times New Roman 2
\kb
&
"SystemnILW4T@$qJlJ$rJ
Symbolak
FMicrosoft Equation 2.0DS EquationEquation.29qW4 .1 &&MathTypePSymbol 2
`1a Times New Roman 2
\kb
&
"SystemnILW4T@qJlJ$rJ
fj
FMicrosoft Equation 2.0DS EquationEquation.29q{ .1@&&MathTypepTimes New Roman@ 2
`fk Times New Roman 2
:j>
&
"SystemnL{hxlJtJmJ
Y=a0
+fj
SymbolakT
X()
+e
FMicrosoft Equation 2.0DS EquationEquation.29q .1&@S&MathTypepSymbol2
(Symbol2
)Times New Roman4 2
(Y 2
fk Times New Roman 2
j> 2
kb 2
T}Symbol 2
= 2
+ 2
K
+Symbol 2
Symbol 2
a 2
he Times New Roman4 2
0pSymbol 2
aTimes New Roman4 2
iX
&
"SystemnL L&XL&XAXG@CTxGraphSheetObjectCDEKIFGHZ[????l?m?G?F333333?.SymbolTimes New RomanCentury Schoolbook Wingdings 1яnnd> ? 48,0,48@ Page2CTxGcGraphSheetCTxCompositeObjectvxeb'@*s( @H CObArray2CTxDisplayParam:c>@????@@?rB8BB5EBlfE ; ;? CGEDLEfFŴ
#EDjB5=u?2>b???@?>i?>i?>?yAc@@@@BȎ?r6@k@CA
ף<]? ; ;r=]?>:c>w@F%>?????Tq҉?A????B5Az@?BB? @@BB A@@@@?B?B5Az@o:@
!"#$%&'()*+,./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{}~ :c>GEMőeBA!EmB@{FBBhӮ~@h?X?c?X?B>$L>HBB@@@@!$%&'1=>?@CDHILMhijmnoyCTxDisplayLine
YXbeZu6X@;
~
C tW
Iz:J :c>B5EBlfE?? !"#AyCTxDisplayText
?eBP during administration :c>ZBBc?*?0CD!?ZeSmoothed BP during administration :c> CGEDLEh?X? !"#0ACDCTxDisplayAxis?HBpBBB :c>B5EBlfE? !"#ACTxDisplaySegment O OOsOs OsO :c>HBVW ?e50 :c>pBV?e60 :c>BV?e70 :c>BVs?e80 :c> CGEDLE !"#A :c>ZBBc?*?0CD?p @ :c>ZB5EBlfE? !"#AKK
:c>ZBBVWR?Ze20 :c>ZpWR?Ze15 :c>Z WR?Ze10 :c>ZWRK?Ze5 :c>ZWR
?Ze0 :c>Z@WR?Ze5 :c>Z CGEDLE !"#A :c>B5EBlfEeB??5Az@~7BB2/3CAh?X?c?A>? !"#%()*+078=>?@CDHLxOO :c>= CGEDLEeB5=u?2>b?yAc@{FBBhӮ~@ !"#%()*+78=>?@BHLx?5#NBqRBJ\B G_BfaBJiBiB jBbjBpnBoBSoB{BrB[rBrB1sBKKvBNNyBBB;BB3B?BфBB:oBL׆BBUB B.BwbBˈBBYBBBB(B>BzZBBBB%RBBJBq,BPFBsBDBBB :c>B5EBlfE? !"#A5OUOUsOsOOOOOOlOlOOOOOODODOOO O O O ` O` j Oj O O O O O
O
)
O)
5
O5
L
OL
Y
OY
O
O
O
O
O
O
O
XOXOOOO(O(
O
OOnOnOOO :c> CGEDLE !"#ACTxGraphSheetPage
CompositeObject$$GSD2$2vxeb'@*s( @H  :c>@????@@?rB8BB5EBlfE ; ;? CGEDLEBȨBA!EmB5=u?2>b???@@?>i?>i?>?yAc@@@@BDBBhӮ~@
ף ?X? ; ;gie?X?>:c>}>$L>?????TrOAB@????B5Az@?BBHBB@@BB@@@@?B?B5Az@o:@
!"#$%&'()*+,./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{}~ :c>GEfFŴ
#EDjB@Ȏ?r6@k@CA<]?r=]?w@F%>? @@ A@@!$%&'1=>?@CDHILMhijmnoy(<=(T\EGI
<
Ut7E
_
U
w
0 ~~
%
>Fe :c>B5EBlfE?? !"#Ay
?eBP during administration :c>ZBBn=aR?0CD!?ZeSmoothed BP during administration :c> CGEDLE<]? !"#0ACD??@ @ :c>B5EBlfE? !"#AO
O
OOO :c>?VW?e1.5 :c>@V
?e2.0 :c> @V?e2.5 :c> CGEDLE !"#A :c>ZBBn=aR?0CD? AA :c>ZB5EBlfE? !"#Am m NNN :c>Zq҉? VWR?Ze10 :c>ZWR?Ze0 :c>Z AWRm ?Ze10 :c>ZAWRN?Ze20 :c>Z CGEDLE !"#A :c>B5EBlfEgFŵ
#EjB??5Az@jt?<@LAA<]?? !"#$%'()*+078=>?@CDxOO :c>= CGEDLEfFŴ
#EjB5=u?2>b?yAc@Ȏ?r6@k@CA< !"#$%'()*+78=>?@Bx?5vޙ?Q????U?w?G?t?Jr????B?RB5EBlfE? !"#A5OOOOOO~O~OOOO5O58O8JOJQOQfOfOOOOOO O ? O? { O{ O 5
O5
?
O?
O
O
?O?wOwOO:O:OO^
O^
O
O
O
O
JOJcOcOOO4O4OOOKOKOOO :c> CGEDLE !"#A :c>?
ף B@
CompositeObject$$GSD2$1 CTxYamYamGraphSheetPageCTxYamqAqPage1APage2CompositeObjecttBCTxCoordSystem2DCTxTransform3DCTxMatrix4Dp@p@??{ b? b???&X p4O. "
" " " "    " " " " " " ArialN4%
YXbeZu6X@;
~
C tW
Iz:J. 2
BP during administration .Arial... Arial2
>
!Smoothed BP during administrationOArial .Arial% O %O%O%sOss% OsOOs. 2
50 .. 2
60 .. 2
70 .. 2
s80 .Arial%%%%KKK%
%%... Arial` 2
20OArial .... Arial 2
;
15OArial .... Arial` 2
u10OArial .... Arial2
5OArial .... Arial`2
0OArial .... Arial2
5OArial .Arial%OO "%O%UOUU%sOss%O%O%O%O%O%O%lOll%O%O%O%O%O%O%DODD%O%O%O% O % O % O %` O` ` %j Oj j % O % O % O % O % O %
O
%)
O)
)
%5
O5
5
%L
OL
L
%Y
OY
Y
%
O
%
O
%
O
%
O
%
O
%
O
%
O
%XOXX%O%O%O%O%(O((%
O
%O%O%nOnn%O%OOO
AXG@CTxGraphSheetObjectCDEKIFGHZ[????l?m?G?F333333?.SymbolTimes New RomanCentury Schoolbook Wingdings 1яnnd> ? 48,0,48@ Page1CTxGcGraphSheetCTxCompositeObjectvxeb'@*s( @H CObArray2CTxDisplayParam:c>@????@@?rB8BB5EBlfE ; ;? CGEDLEs4 EDjB5=u?2>b???@?>i?>i?>?yAc@@@@B?Y7@k@CA
[Figure: two embedded graph pages plotting the data against Log(dose); one panel shows the raw values and the other a fitted curve labelled "Smoothed Log(dose)".]
$$\sum_j x_{ij} a_j + (y_i - \mu_i)\,\frac{\partial \eta_i}{\partial \mu_i}$$
"<
XmhlJ(qJmJ
yaj
fj
xij
()jk
FMicrosoft Equation 2.0DS EquationEquation.29q)" .1
& &MathTypeOSymbol2
M(OSymbol2
\ )Times New Roman 2
Xy 2
fk 2
x Times New Roman  2
@kj> 2
@j>
2
@ij>> 2
@j> 2
@kbSymbol 2
\ Symbol 2
@ {Symbol 2
"Symbol 2
:a
&
"SystemnL)"TX
* !"#$%&'()+9,./012345678:L;<=>?@ABCDEFGHIJKM]NOPQRSTUVWXYZ[\^n_`abcdefghijklmo
pqrstuvw
$$f_k(X_k) = E\!\left[\left(Y - \sum_{j \neq k} a_j f_j(X_{ij})\right) \,\middle|\, X_k\right]$$
$$y_i = \sum_j a_j f_j(x_{ij}) + \varepsilon_i$$
$$g(\mu_i) = \sum_j a_j f_j(x_{ij})$$
$$g(\mu_i) = \sum_j a_j x_{ij}$$
$$\sum_i \left[\, y_i \log\!\left(\frac{y_i}{\tilde{y}_i}\right) + (1 - y_i)\log\!\left(\frac{1 - y_i}{1 - \tilde{y}_i}\right) \right]$$
$$f(x) = \tanh(x)$$
$$f(x) = \frac{e^x}{1 + e^x}$$
$$y = \sum_k w_k^{(2)}\, f_k^{(2)}\!\left(\sum_j w_j^{(1)} x_j\right)$$
$$\eta_i = \sum_j a_j x_{ij}$$
$$a_i(\phi)\, V(\mu_i)$$
$$D(M) = -2\left[\ln L(M; Y) - \ln L(M^*; Y)\right]$$
$$\left(\frac{\partial \mu_i}{\partial \eta_i}\right)^{2} \Big/ \operatorname{var}(Y_i)$$
$$\left(X' W^{(s-1)} X\right) a^{(s)} = X' W^{(s-1)} z^{(s-1)}$$
"SystemnL4@m@lJ(qJmJ
a1
,K,ap
()
FMicrosoft Equation 2.0DS EquationEquation.29qq+ t .1&&MathTypeRSymbol2
,(RSymbol2
)Times New Roman, 2
a 2
~a Times New Romanl 2
@epp Times New Roman, 2
@n1pTimes New Romanl 2
,` 2
,`MT Extra 2
jK
&
"SystemnLq+8m(lJ(qJmJ
as()
FMicrosoft Equation 2.0DS EquationEquation.29q4 .1`& &MathType Symbol2
(Symbol2
)Times New Roman` 2
4a Times New Roman 2
ssW
&
"SystemnL4@mɄlJ(qJmJ
as()
=as1()
Ms11
us1
FMicrosoft Equation 2.0DS EquationEquation.29qK . .1`
&
&MathTypePeSymbol2
(eSymbol2
)eSymbol2
(eSymbol2
)Times New Roman 2
4a 2
a Times New Romanl 2
ssW 2
3sW 2
sW 2
PsWSymbol 2
= 2
F Symbol 2
{ 2
b
{ 2
{ 2
{ Times New Roman 2
1p 2
1p 2
x
1p 2

1pTimes New Romanl 2
gMh 2
Tu
&
"SystemnLKvqJlJ$rJ
mi
FMicrosoft Equation 2.0DS EquationEquation.29q4 .1&`&MathTypePSymbol 2
`@m Times New Roman8 2
i>
&
"SystemnKL4@v,lJ(qJmJ
ai
f()
FMicrosoft Equation 2.0DS EquationEquation.29q .1&@%
!"#$%&'()*+1356789:;<=>?AFHIJKLMNPUWXYZ[\]^_aegijklmnopqrsuz}~&MathType`Symbol2
w(Symbol2
)Times New Romanl 2
4a Times New RomanP 2
i>Symbol 2
f
&
"Systemn@L0vlJ(qJmJ
xij
yi
m,i
()ai
f()Vmi
()gmi
()i=1n
=0
FMicrosoft Equation 2.0DS EquationEquation.29q I .1&&MathTypeSymbol2
t(Symbol2
)Symbol2
?g(Symbol2
?)Symbol2
E8(Symbol2
E$)Symbol2
E (Symbol2
E) "dTimes New Roman 2
x 2
y 2
2$a 2
2V 2
2g Times New Roman%
2
ij>> 2
i> 2
i> 2
i> 2
i> 2
Xi> 2
aki> 2
ZnpSymbol 2
 2
= Symbol 2
a={Symbol 2
C8Symbol 2
m 2
2f 2
2m 2
2
mTimes New Roman 2
2 'A Times New Roman% 2
a;1pTimes New Roman 2
0
&
"SystemnL` m\lJ(qJmJ
gmi
()=aj
xij
FMicrosoft Equation 2.0DS EquationEquation.29q  .1
& g&MathTypepSymbol2
(Symbol2
)Times New Roman 2
Lg 2
a 2
Ux Times New Roman! 2
@{i> 2
@j>
2
@ ij>>Symbol 2
mSymbol 2
=Symbol 2
&
"SystemnL@vqJlJ$rJ
$$a(\phi) = a\phi$$
$$f(y; \theta, \phi) = \exp\!\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}$$
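As a concrete instance of this exponential-family form (an illustrative example, not from the surrounding text), the Bernoulli distribution with success probability $\pi$ can be rewritten in exactly this shape:

```latex
p(y;\pi) = \pi^{y}(1-\pi)^{1-y}
         = \exp\!\left\{ y \log\frac{\pi}{1-\pi} + \log(1-\pi) \right\},
\qquad y \in \{0,1\},
```

which matches $\exp\{[y\theta - b(\theta)]/a(\phi) + c(y,\phi)\}$ with $\theta = \log[\pi/(1-\pi)]$ (the log-odds), $b(\theta) = \log(1 + e^{\theta})$ (since $-\log(1-\pi) = \log(1 + e^{\theta})$), $a(\phi) = 1$, and $c(y,\phi) = 0$.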
"SystemnLmplJ(qJmJ
gmi
()=hi
=aj
xij
FMicrosoft Equation 2.0DS EquationEquation.29q1 .1&g&MathTypepSymbol2
(Symbol2
)Times New Roman 2
Lg 2
a 2
1x Times New Roman5 2
@{i> 2
@i> 2
@
j>
2
@ij>>Symbol 2
m 2
hSymbol 2
= 2
=Symbol 2
&
"SystemnL1vHlJ(qJmJ
$$(x_{i1}, \ldots, x_{ip})$$
"SystemnLNvqJlJ$rJ
mi
FMicrosoft Equation 2.0DS EquationEquation.29q4 .1&`&MathTypePSymbol 2
`@m Times New Roman0 2
i>
&
"SystemnHL4@mLlJ(qJmJ
hi
=ai
xij
FMicrosoft Equation 2.0DS EquationEquation.29q 3 .1&[&MathTypepSymbol 2
#h Times New RomanX 2
i> 2
i>
2
ij>>Times New Roman 2
a 2
xSymbol 2
=Symbol 2
&
"SystemnL3qJlJ$rJ
Yi
FMicrosoft Equation 2.0DS EquationEquation.29q4 .1`& &MathTypePTimes New Roman4 2
`(Y Times New Roman% 2
i>
&
"SystemnL4@՜.+,0HPltThe Open University,Oh+'0
<HT
`lt16:15:David HandrtTdNormalDavid Hand22AMicrosoft Word for Windows 95H@@@8`"@8`"
FMicrosoft Word Picture