Art. LIII.—*On the Phenomena of Variation and their Symbolic Expression*.

[*Read before the Wellington Philosophical Society, 11th March 1902*.]

#### Plates XXXVI., XXXVII.

“A PERSON who uses an imperfect theory with the confidence due only to a perfect one will naturally fall into abundance of mistakes; his prediction will be crossed by disturbing circumstances of which his theory is not able to take account, and his credit will be lowered by the failure. And inasmuch as more theories are imperfect than are perfect, and of those who attend to anything the number who acquire very sound habits of judging is small compared with that of those who do not get so far, it must have happened, as it has happened, that a great quantity of mistake has been made by those who do not understand the true use of an imperfect theory. Hence much discredit has been brought upon theory in general, and the schism of theoretical and practical men has arisen.”—(De Morgan, “Penny Cyclopædia,” Art. “Theory.”)

##### Introduction.

The present writer proceeds upon the assumption that the means of comparing those theories which are used to predict the quantities of physical phenomena with experiment upon those phenomena are in some cases not quite so effective as the theory of probability enables them to be made, and that the latter theory has even had a detrimental effect upon the comparison by reason of its having been frequently assumed to have provided a universally satisfactory method—that of least squares—by which we can determine those constants which arise from the unestablished properties of matter, and at the same time more or less tacitly institute a comparison between the theory and the results of experiment in the case of a phenomenon of variation where quantity is both measurable and supposed determinable by theory, given the properties of matter. The results of the theory of probability will be accepted with regard to the probable value of a single quantity directly measured and its probable error.

In the present paper the writer proposes to examine the representation of physical phenomena of variation by means of formulæ, whether empirical or founded more or less completely upon reason.

1. The phenomena which will be examined are those where a quantity (Y) varies with a variable (X)—that is to say, takes up magnitudes which, *ceteris paribus*, depend in some fixed way upon the magnitude of X. If we observe, by experiment, how the variation occurs we obtain knowledge which can be expressed by a graph. We may make the axis of Y the ordinate, that of X the abscissa. We shall consider only such cases where Y has, in fact, although it may not have been observed, one value, and only one, for each value of X, which in general extends from *plus* to *minus* infinity.^{*}

2. The first fact we notice is that in such case we observe values within a limited range. This we may call the “experimental range of X.” Beyond that range we know nothing, whereas most mathematical expressions will yield values from *minus* to *plus* infinity. The definite integral is in form a striking exception to this, and from one's experience of textbook formulæ it is to be wished that some simple means of indicating experimental range could be brought into general use. This idea of dealing with the experimental range only will be found of fundamental importance in later parts of this paper.

[Footnote] * But see Appendix, III.

3. Now, the graph may be of two distinct kinds—(A) that of a curve or curves, or (B) that of a series of datum points. The first kind, that of a curve, contains the same complete statement of values of Y as does the analogous kind of mathematical formula, which is defined as holding between limits of range of X. The second kind, that of datum points, contains information which may be given also by a table.^{*} This refers to the information which is directly derived from experiment; but, this usually being insufficient for the practical applications, we have to perform interpolation in order to get what we want. This may be done graphically or by application of the calculus, but in either case the result is a guess. It will be here asserted that, *à priori*, we have nothing to show that the judgment of an engineer or physicist will lead to error more readily than will the corresponding assumptions of a computer. We shall refer to the judgment as the arbiter in this indeterminate question of interpolation.

4. So far we have accepted the results of experiment, but it is evident that such knowledge must (in continuous variation) be inaccurate to some extent; data we get by measuring must be subject to fortuitous error, and may be subject to systematic error due to the system of measurement—instruments may be wrongly calibrated, and so on. Fortuitous error may be made definite by the application of the theory of probability, provided, of course, that the necessary work is done in the experiments; and we may take from this application the information that the true values (but still affected by systematic causes) of the quantities lie within limits of probable error—more probably so than not—the probability of a value being the true one decreasing very rapidly outside these limits, as indicated by the well-known frequency curve. It is much to be regretted that in many researches, even of the classical kind, no attempt is made to assign limits of probable error. In an example which has come under the writer's notice this was not done, although repeated measurements at each datum point were made, with the result that a very laborious research is rendered very much less valuable than it would have otherwise been. The effects of this lack of system are usually not very apparent at the time the research is made; it is only when the matter comes to be looked at from a new standpoint, or examined for residual phenomena, that the absurdity of giving such figures as accurate without a statement of probable error becomes apparent. This, of course, applies to those measurements which form the connecting-link between theory and the things that happen; many practical experiments are made under a well-understood convention as to negligible error.

[Footnote] * See sections 10 to 15.

5. In a graph such information as to probable error could be conveyed by giving a band (twice the probable error in vertical width) instead of a line for a curve, or a row of vertical lines instead of a series of dots for datum values (that is, supposing all the error attributable to the values of Y—*i.e*., where values of X may be taken as accurate for the purposes of reasoning, as we always suppose).

6. It is obvious, however, that systematic, or what we may call instrumental, error must be eliminated or it will infallibly render any reasoning wrong which is based on the results, provided, of course, that the error be sensible in amount.

7. While the graph forms a very complete representation of the observed facts, and indicates interpolation in the case of datum observations, and in the hands of a person of clear insight may often be the means of reasoning which may not be practicable or even possible by the more formal means of algebraic symbols, yet it is clearly necessary to find, if possible, some formula or function of X which will stand for the graph as well as may be. There are many reasons for this, the chief theoretic one being the enormous developing-power of the algebraic calculus.

8. In the preceding we have considered the graph as the most natural mode of recording phenomena of variation, but we may have occasionally inferential reasons for believing that the phenomenon should follow some particular function of X more or less completely, and it is necessary to examine the rationale of the functions in various cases.

(*a*.) A function may be logically applicable to a phenomenon. For instance, formulæ which state the results of definition, or those which state such inferences as that the angles in a plane triangle are 180°, may be regarded as truly applicable. Even this class may be subject to systematic instrumental error.

(*b*.) Functions in which there are strong inferential grounds for the belief that they express the substantial truth. For instance, formulæ deduced from Newton's laws of motion may be expected to apply closely to the motion of the major objects of the solar system; but experiments of an accuracy greater than those upon which such laws were founded may always be apt to demonstrate that the functions are not strictly applicable to any given phenomenon, and that there are systematic residual causes which should be taken into account.

(*c*.) Functions which have some inferential foundation, but the substantial applicability of which it is worth while to question and examine.

(*d*.) Functions whose foundation is largely hypothetical. This class we may term “empirical.”

(*e*.) Functions which have no foundation except, perhaps, certain notions of continuity in rates of change, and so on. This class we may term “arbitrary.”

9. It is intended to confine our attention chiefly to the example of the last or last but one class which is called the “power-expansion” or “Taylor's series formula.”^{*} It is, however, intended that the objections to the use of an arbitrarily systematic mode of computation should apply to all classes with respect to systematic instrumental error, and to all but the first with respect to the effect of systematic residual causes which are not allowed for in the function, or of any mistake or incompleteness in the inferring of the function.

10. Besides the curve and the datum-point graphs, we need to mention an intermediate class—namely, that of experiments which are arranged to give data for many points of X without any attempt to obtain repeated measures at any one point.

11. We might venture to define the characteristic virtues of the two main types of graph by saying that the curve yields a clear idea of the continuity of a phenomenon without allowing any great accuracy to be obtained in the measures of Y, while the datum point allows great accuracy to be attained in the measures of Y, and also permits definiteness to be attained in probable error, but leaves the interpolation to be judged. It may be put also thus: the curve gives a notion of *d*Y/*d*X, the datum of Y. It is sometimes possible to form a graph of both kinds of measures—to measure accurately datum points and also to get the slope of the curve near these points. This procedure is analogous to that of constructing mathematical tables where datum points are often computed exactly and intermediate points found by Taylor's theorem. By such means very full information would be given of the actual phenomena.

12. A graph of the above-mentioned intermediate class, while it combines the virtues of both main forms, combines also their defects. In contemplating such a graph one would feel more content if a likely value for probable error at a few points of X were provided by the experimenter. The difficulty with this form of measurement is the very large number of measures necessary—theoretically a double infinity.

[Footnote] * It is to be observed that, in the case of functions the Taylor's series expansions of which are sufficiently convergent when applied to the experimental range, the result of the application of such a formula is practically identical with the result of the application of an unexpanded function of any class.

13. It is perhaps desirable to point out that in datum measurements we usually cannot get either X or Y exactly the same for each measure; accordingly we have to interpolate the values of Y to one common or mean value of X (which we are going to take as absolutely accurate, theoretically). It is common to take the mean of both quantities, a process that often leads to the use of a few more decimal places. A more satisfactory process is to either measure *d*Y/*d*X or else estimate it from antecedent knowledge of the likely curve, and then make a graph of the measures of each datum point and analyse it by means of the curve of *d*Y/*d*X, which will be usually a straight line. From this we can get the probable (fortuitous) error, and also make a note of discordances, which is not always possible when the mean merely is taken.
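The reduction described in section 13 may be illustrated numerically. The following is only a sketch under stated assumptions: the measures, the common value of X, and the slope are all hypothetical, the slope is taken as constant near the datum point (a straight line, as the text supposes), and the probable error of the mean is taken by the usual rule of 0.6745 times the standard error. The function names are the editor's, not the author's.

```python
import math

def reduce_to_common_x(measures, x0, slope):
    """Slide each (X, Y) measure along a line of the given slope to X = x0."""
    return [y - slope * (x - x0) for x, y in measures]

def mean_and_probable_error(values):
    """Mean of the values and probable error of that mean (0.6745 x standard error)."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((v - mean) ** 2 for v in values)
    probable_error = 0.6745 * math.sqrt(sum_sq / (n * (n - 1)))
    return mean, probable_error

# Hypothetical measures taken near X = 20, with an assumed slope dY/dX = 0.5.
measures = [(19.8, 9.93), (20.1, 10.04), (20.3, 10.16), (19.9, 9.94)]
reduced = reduce_to_common_x(measures, x0=20.0, slope=0.5)
mean, pe = mean_and_probable_error(reduced)

# Residuals of the reduced measures; a discordant measure would stand out here.
residuals = [r - mean for r in reduced]
```

Choosing `x0` as a round number near the observed values of X, rather than their mean, corresponds to the convenience noted in section 14.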

14. A further advantage lies in the fact that we can avoid taking the mean value for X, and take instead a convenient adjacent value which has few integers, the last few significant figures being made noughts. This affords a vast saving in tabulation and in computation.

15. It may be thought by some that such matters as are being advanced are refinements for which time is too short; but the writer would appeal to those who may have honestly tried to get a reliable value for any physical constant which is not absolutely simple or else fundamental—even a there or thereabouts value—whether an enormous amount of labour has not been absolutely wasted by the neglect of such principles.

### Least Squares.

16. An assertion will now be given which it is believed can be substantiated by reference to some recent text-books—that if a formula be applied to the results of observation so that the sum of the squares of the residual quantities or deviations of observed quantities from those calculated is a minimum with regard to the constants of the formula, then this formula may be referred to as the best, or even the most probable, and, in fine, that such application is a strictly scientific process. It must not be supposed for a moment that it is intended to convey that this view is held by accurate thinkers, but simply that it is observable that others have been led by the beauty of the method, and the very evident desirability of possessing a method of computation which should be free from personal bias, into an unwarrantable and indiscriminate promulgation of the formal procedure of the method.

17. There are two distinct objections to the mode of computation which has been described, and which it is hoped may be described as “least squares” without misunderstanding—namely: (1) That least squares observably tends to eliminate the application of the judgment to the indications of a graph, and, further, that it tends to make systematic deviations look as much like true errors as possible; and (2) that the computations of least squares are often prohibitively laborious, thus practically preventing the analytical application of all sorts of formulæ which it may be easily possible to apply by other means.

18. The first objection will be illustrated by a couple of examples. Suppose we had a graph which consisted of the curve of a phenomenon following exactly (although the computer is not aware of this) a power-expansion formula of four terms, or cubic; and for certain reasons—say, the labour of least squares—are unable to use a formula of more than three terms, or parabolic. Then it can easily be seen, or proved, that least squares (which becomes a problem in integration in the case of a continuous curve) leads to a symmetrical arrangement of the deviations the proportions of which are shown in Graph A. It is pretty clear that for the observed range this arrangement of deviations strikes a good average; but conceive extrapolation to be necessary, or even a terminal value to be an important physical constant, would it not be preferable to accept the notions which one gathers from the shape of the curve and to extrapolate by means of some such freehand curve as is drawn dotted? The answer seems obvious enough when put in this way, and yet an almost precisely analogous condition of things has been the cause of considerable error in a certain oft-quoted classical research which the writer is recomputing by the graphic process.

19. A still more conclusive example is contained in the very common case of a few datum points representing the only observed facts. Here a physicist will often feel justified in drawing a curve for interpolation, and will have a very strong conviction of the unlikelihood of certain other curves which are much different from one he might draw. If least squares is followed up it is obvious that it leads to an exact representation of *n* datum points in a formula of *n* constants. In the case of the power-expansion formula the solution is identical with that of simultaneous equations. Graph B shows the least-square curve passing through six points—at X = 0, 0.2, 0.4, 0.6, 0.8, and 1, Y being zero at all points except 0.6. The indeterminate question to be here answered is whether there are any particular virtues about the least-square curve as compared, for instance, with the dotted curve (which was made by a flexible spring passing over rollers at the points). Is not the interpolation here very questionable, and the extrapolation doubtful in the utmost? It may be here remarked that the extrapolation of such formulæ of high degree is always very doubtful, except when there is a strong convergency.
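The six-point case can be tried numerically. The sketch below assumes a unit value of Y at X = 0.6 (the source does not state the height used in Graph B) and evaluates, by Lagrange's form, the one quintic that passes exactly through all six points, which is what a six-constant power formula fitted by least squares reduces to. The function name and the trial values of X are the editor's.

```python
def lagrange(xs, ys, x):
    """Value at x of the unique polynomial of degree len(xs) - 1 through the points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

xs = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
ys = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]   # unit height at 0.6 is an assumption

# Interpolated values midway between the datum points.
inside = [lagrange(xs, ys, x) for x in (0.1, 0.3, 0.5, 0.7, 0.9)]

# Extrapolated values one interval beyond each end of the range.
outside = [lagrange(xs, ys, x) for x in (-0.2, 1.2)]
```

Under these assumptions the curve dips to about -0.82 at X = 0.9, and one interval beyond the range it reaches -15 and +20 times the single non-zero datum, which bears out the doubt expressed about both interpolation and extrapolation.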

20. We have here got two clear examples of what least squares leads to. In the first case, that of the curve, as we shall afterwards see, the shape of the curve of deviations is most strongly indicative of the need for the application of a formula of four terms, if not more. We have drawn the deviations according to least squares, which may be proved to arrange the deviations (given simply the direction of the axis of X) from a cubic phenomenon to which a parabola is applied, and where the observations are at nearly equal intervals of X, with a symmetry similar to that of the graph. The deviations, it will be observed, run ±(−, 0, +, 0, −, 0, +). By the graphic process we should arrange the parabola so that they run ±(0, +, 0, −, 0)—so that, in fact, they bear a close resemblance to the standard cubic of Graph II. It is asserted that there is less likelihood of systematic deviations so arranged being mistaken for fortuitous errors than is the case with the least-squares arrangement. It may be again mentioned that this example is not, in its general features, a mere hypothetical case.
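The two arrangements of deviations can be checked on a simple instance. The sketch below assumes the phenomenon is exactly Y = x³ on the range 0 to 1 (an illustrative choice, not the author's example), fits a parabola by continuous least squares through the normal equations in exact fractions, and compares the signs of the deviations with those obtained when the parabola is instead made to agree with the cubic at the two ends and the middle of the range.

```python
from fractions import Fraction as F

def solve(a, b):
    """Gauss-Jordan elimination in exact fractions (pivots are non-zero here)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(n):
        piv = m[i][i]
        m[i] = [v / piv for v in m[i]]
        for k in range(n):
            if k != i:
                f = m[k][i]
                m[k] = [vk - f * vi for vk, vi in zip(m[k], m[i])]
    return [row[-1] for row in m]

# Normal equations for fitting a + b*x + c*x**2 to x**3 over 0 <= x <= 1:
# the integral of x**(i+j) is 1/(i+j+1); that of x**(i+3) is 1/(i+4).
A = [[F(1, i + j + 1) for j in range(3)] for i in range(3)]
rhs = [F(1, i + 4) for i in range(3)]
a, b, c = solve(A, rhs)

def ls_dev(x):
    """Deviation of the cubic from the least-squares parabola."""
    return x**3 - (a + b * x + c * x**2)

def graphic_dev(x):
    """Deviation when the parabola agrees with the cubic at x = 0, 1/2 and 1."""
    return F(1, 2) * (x - 3 * x**2 + 2 * x**3)

def sign(v):
    return (v > 0) - (v < 0)

pts = [F(k, 4) for k in range(5)]
ls_signs = [sign(ls_dev(x)) for x in pts]
graphic_signs = [sign(graphic_dev(x)) for x in pts]
```

At x = 0, 1/4, 1/2, 3/4, 1 the least-squares deviations have signs (−, +, 0, −, +), consistent with the run (−, 0, +, 0, −, 0, +), while the other arrangement gives (0, +, 0, −, 0) and is simply one-half of the standard cubic.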

21. In the second case, that of six data, we have got a curve from our least squares which we have asserted to be quite unjustifiable, and not to be compared with the results that one would get from a common-sense judgment of the graph—not to be compared, that is, in avoiding rash assumptions as to the truth of the matter.

22. Following our definition of least squares, we have neglected to take any account of fortuitous probable error in these examples, but its vital necessity in such cases will be sufficiently obvious from what has been said in previous sections. The effect of probable error in the graph is to obscure the true points or line of the true curve of the phenomenon. When this occurs to such an extent as to hide any system there may be in the deviations, then, provided we are quite sure that our formula is substantially accurate compared with the scale of the probable errors, we might reasonably employ least squares to systematize our computations. This is a matter which is dependent upon circumstances, and more on judgment, and we believe that the employment of the latter will be found to be very largely dependent upon whether the treatment of empirical formulæ is taken as a mere extension of the beautiful applications of the theory of probability to astronomy and surveying or as a most important branch of the graphical calculus. This part of the subject is too complicated to treat of here except by suggestion, but we may refer to the example in Dr. F. Kohlrausch's work (see section 38), where a case of this complicated kind is given as if it were a simple and logical application of least squares; and where, moreover, the data are deliberately subjected to extrapolation to the extent of half the observed range. It will be observed that we do not say that in this example anything better than is done by least squares could be done with such data, but we do say that it is absolutely misleading as an example of experimentation and of computation.

23. It is now necessary to draw attention to some theorems in the graphical calculus in which the combination of such curves as correspond to functions of the algebraic calculus is treated. X is the variable, A, B, &c., the (variable) constants. Suppose we have a curve whose function is—

F(X) = *f*_{1}(X) + *f*_{2}(X) + (other similar terms),

then we may build up the curve of F(X) by drawing the curves *f*(X) all to the same scale, and then adding their ordinates at corresponding points of X. This is the theorem of sliding, for we conceive the ordinates of each of the component curves (of *f*(X)) to be capable of being slid over one another parallel to themselves, or to the axis of Y, and we so slide them that they are placed end to end, when we have the ordinates of the curve of the additive function F(X); then always, if we have found enough ordinates, we can complete the curve by freehand drawing, or even by eye without drawing.

24. Next we have the theorem of one-way stretch of ordinates, by which we can introduce variation in the constants of additive functions which are linear in the said constants. Thus, considering one term of the additive function F(X), and writing it with its constants displayed, its expression is A.*f*(X). The theorem is that if we draw the curve of this function, making A take a convenient standard value—say, unity—we can find the ordinates corresponding to any given value of A by the use of some such device as proportional compasses applied to the curve we have drawn. So also with other similar terms. There is a curious point with these constants which had better be pointed out to prevent confusion—namely, that it is immaterial whether any algebraic relationships (independent of X variation) exist between them or not, provided that each is not fixed by any combination of the others, but is capable of taking up independent values. Such relationships should be studied, however, with a view to facilitating the graphical work. Thus, if two terms are *f*_{1}(X, A) and *f*_{2}(X, A, B), then we may have reason to prefer to take them as *f*_{1}(X, A) and *f*_{2}(X, C), or as *f*_{3}(X, A) and *f*_{4}(X, B), in the latter case breaking up the second function. Considerations such as these may be traced in the process for Taylor's series formulæ.
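The theorems of sliding and of one-way stretch amount, in arithmetic, to multiplying each standard curve by its constant and adding the ordinates. A minimal sketch follows; the three standard curves and the values of the constants are illustrative assumptions only, and `combine` is a hypothetical helper name.

```python
def combine(terms, x):
    """Ordinate of F at x, where F is the sum of A * f over the (A, f) pairs."""
    return sum(a * f(x) for a, f in terms)

# Standard curves drawn with unit constants: a constant, a line, a parabola.
standard_curves = [lambda x: 1.0, lambda x: x, lambda x: x - x**2]
constants = [2.0, 3.0, -4.0]          # illustrative values only

xs = [k / 4 for k in range(5)]
ordinates = [combine(zip(constants, standard_curves), x) for x in xs]
```

Altering one constant restretches only its own curve; the other ordinates are slid over unchanged, which is what makes trial adjustment by eye practicable.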

25. This is all we shall need for the Taylor's series analysis, but we may refer to text-books on least squares for the application of Taylor's theorem to the approximative treatment of non-linear functions, and mention two other theorems of the graphical calculus which are of occasional use. In cases where X (or Y) is invariably associated with a constant by addition or by multiplication we get possible graphical operations, for, if the expression is *f*(X + A) we may draw a curve to *f*(X) and then introduce the effect of any value of A by shifting the curve bodily along its X axis; and so also with regard to Y. In the case of multiplication we get a stretch of a drawn standard curve in either one way or in two ways. For, to take the latter case, when *f*_{1}(A × Y) = *f*_{2}(B × X), having drawn a standard curve to convenient values of A and B, we get the effect of any values of either constant by uniformly stretching the curve in directions parallel to both axes. This can be effected by means of throwing shadows, and appears of value in our subject, since the frequency curve is of this form (with an immaterial relationship between the constants).

26. Reverting to the question of appealing to the judgment to detect systematic deviation from a formula, we see that we expect the deviation to become evident as a recognisable additive curve—*i.e*., as if it were representable by a term *f*(X). Clearly, this is frequently the case even where the deviations may be logically functions of Y, as, indeed, we supposed all errors to be in section 5; for, in a graph, if a function of X be represented, the corresponding function of Y is also automatically represented by the curve. By such means we can sometimes form an estimate of causes of error or deviation, and sometimes also—as we shall see in the case of the Mississippi Problem—be able to form an idea whether it is any use or not to go on complicating the particular formula which we are employing. When our resources are practically exhausted we shall give our formula, together with a statement of its range and the relation between probable (fortuitous) error and observed deviation, exhibiting the latter quantities in a graph of deviations, and leave it for others to judge what degree of likelihood attaches to our formula. Circumstances may lead us to employ least squares, but the value of our experiments cannot be adequately indicated unless we provide at least the equivalent of the details mentioned.

### A Graphic Process for applying Power-expansion or Taylor's Series Formulæ.

27. A process will now be described by which it is very easy to graphically apply to data formulæ of four or even five terms in ascending powers of the variable.

28. It follows from Taylor's theorem that, if we use a formula of *n* terms to approximate to a given curve, we obtain exactly the same choice of approximations whatever the scale of the variable may be or wherever its origin may be. We may therefore elect to make the experimental range unity in a new variable, and make the beginning of the range the origin. Thus, if the experimental variable X ranges from *p* to *p + q*, we take as a temporary variable *x* = (X − *p*)/*q*. It may be noted that in the case of a continuous function the corresponding Taylor's series becomes—

$$Y = Y_{p} + \left[\frac{q}{1}\left(\frac{dY}{dX}\right)_{p}\right]x + \left[\frac{q^{2}}{1\cdot 2}\left(\frac{d^{2}Y}{dX^{2}}\right)_{p}\right]x^{2} + \text{\&c.}$$
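The form of the bracketed quantities follows directly from the substitution, as the short derivation below indicates. It assumes only that the ordinary Taylor expansion about X = *p* holds within the range, and it writes the temporary variable as *x* = (X − *p*)/*q*, so that X − *p* = *q x*.

```latex
\begin{aligned}
Y(X) &= \sum_{n \ge 0} \frac{(X-p)^{n}}{n!}\left(\frac{d^{n}Y}{dX^{n}}\right)_{p},
\qquad X - p = q\,x, \\
Y &= \sum_{n \ge 0} \left[\frac{q^{n}}{n!}\left(\frac{d^{n}Y}{dX^{n}}\right)_{p}\right] x^{n},
\qquad 0 \le x \le 1 .
\end{aligned}
```

Since the substitution is linear, any formula of *n* terms in X becomes a formula of *n* terms in *x* and conversely, which is why the choice of approximations is unaffected by the change of scale and origin.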

29. If this expression were very convergent our analysis would lead us to the values of the bracketed quantities. Since, however, curves in general cannot be said to be representable by continuous functions, and particularly convergent ones, we cannot expect to make this conception of curves being built up of the effects of initial rates of change our basis of operations. We may with great convenience utilise the average rates of change for the whole range. Something of the sort is done in using an interpolation-table method, such as that given in “Thomson and Tait” (1890, i., p. 454).

30. The graphic process consists in taking for the first term the initial value of Y as given by the graph; for the second the average rate of change for the whole experimental range; for the third the average curvature for the whole scale expressed in terms of a parabola; for the fourth the difference in curvature of the first and second halves of the range expressed as a cubic standard formula; and so on. Up to the fourth term at least there is no difficulty whatever in keeping the effect of each of these three operations in the mind, and in forming one's conclusions whether a certain formula is as good as can be possibly got. The standard formulæ which the writer has used for this purpose are for the parabola (*x* − *x*^{2}), and for the cubic (*x* − 3*x*^{2} + 2*x*^{3}). This is as far as we shall go for the present, but a table is given of some standard functions which might be used up to *x*^{8} (or formulæ of nine terms) if one were clever enough to perform the work with all of them at once, or under special circumstances. These formulæ will be reverted to again.
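A numerical analogue of the four operations may make them concrete. It is a sketch only: the graphic process itself is carried out by eye and judgment, whereas here the constants are fixed by reading the curve at x = 0, 1/4, 1/2, 3/4 and 1, which is the editor's assumption, as are the function names and the trial curve. The standard parabola is 1/4 at mid-range where the standard cubic vanishes, and the cubic is +3/32 and −3/32 at the quarter points where the parabola takes equal values.

```python
def analyse(y):
    """Constants (a, b, c, d) of a + b*x + c*(x - x**2) + d*(x - 3*x**2 + 2*x**3),
    from ordinates of the curve y read at x = 0, 1/4, 1/2, 3/4 and 1."""
    y0, y1, y2, y3, y4 = (y(k / 4) for k in range(5))
    a = y0                               # initial value of Y
    b = y4 - y0                          # average rate of change over the range
    c = 4 * (y2 - a - b / 2)             # parabola is 1/4 at mid-range, cubic is 0
    d = (16 / 3) * (y1 - y3 + b / 2)     # cubic is +3/32, -3/32 at the quarter points
    return a, b, c, d

def to_powers(a, b, c, d):
    """Coefficients of 1, x, x**2, x**3 in the equivalent power formula."""
    return a, b + c + d, -c - 3 * d, 2 * d

# Trial curve (hypothetical): an exact cubic, so the analysis should recover it.
curve = lambda x: 1 + 2 * x + 3 * x**2 + 4 * x**3
constants = analyse(curve)
powers = to_powers(*constants)
```

For this trial curve the constants come out as 1, 9, −9 and 2, and the power formula returns the coefficients 1, 2, 3, 4 with which the curve was made.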

31. The practical work is now very simple. We draw the graph of the experiments in terms of the temporary variable (which we should have mentioned is better not arranged to have its scale exactly equal to that of the experimental variable, but as nearly as is practicable, keeping *p* and *q* simple numbers for convenience in conversion),^{*} and then, provided with a scale to measure the constant term, a straight-edge to produce linear terms and drawn curves of the standard parabolic and cubic, and with a protractor or proportional compasses, we proceed to build up a curve the ordinates of which are added proportions of each of these four constituents, till we get a curve that is as nearly like the given one as possible. It will be quite obvious when this is done that we have a most clear idea of the prospective advantages of any other cubic formula whatever, and that we can arrange the deviations in any desired way—for instance, to arrange them for the application of a quartic formula, if it appears that such a course is advisable. We shall also develop a decided opinion on the subject of the application of common-sense to the resultant curve of deviations, both for interpolation and for extrapolation, and for residual causes or for error in the theory of the formula. In cases where the accuracy of the figures is great it may be necessary, after a rough analysis, to replot the deviations to a larger scale, so as to get over the limited accuracy practicable in a graph.

32. Many details will become obvious if a trial is made, and we need not pause over them; but it may be mentioned that the process is obviously applicable to all such formulæ as are made up of sums of terms each of which is linear in and contains only one constant. So the process might be arranged for harmonic analysis, or for the solution of simultaneous equations, and so on. The process is approximative, so that it is of indefinite accuracy, and is limited solely by the power of the judgment to indicate what alterations are desirable. The vast difference between this procedure and that of least squares will be apparent from the fact that we may be led to apply formulæ which have more terms or constants than there are datum points. This is due to the part we are allowing the judgment to play in controlling the interpolation.

33. The process is evidently susceptible of mechanical treatment, and the writer hopes to be enabled to construct a machine for this purpose.

[Footnote] * It is to be noticed that in these formulæ the adjustment of the scale by introducing *q* is a perfectly simple matter of arithmetic; but to alter the zero point (*p*) is more troublesome, and should be avoided when possible. The Z functions of the Appendix afford an alternative range of −1 to +1, using the same curves.

34. With respect to the higher-power formula, there is a point which seems of theoretic interest in simplifying mathematical formulæ which are to be applied for a definite range (the converse of our experimental range), for, as is noted in the “Notes on the Graphs,” the numerical values of these functions (even better formulæ may be obtained for this particular purpose) become very small, within the range, in comparison with the numerical value of the coefficient of *x*^{n} (that is, the highest power of *x* in the standard formula). This means that in the expansion as given in section 32, and when it is converging, we may for values of *x* from 0 to 1, by throwing the series into standard form, eliminate one or more of the higher-power terms, and so obtain an expression which is practically as accurate as the simple Taylor series, and is less in degree. The extent to which this may be expected to go is to be seen in the decreasing numerical or percentage values of the ordinates as the degree becomes large—with the octic it is already 1.3 parts in 10,000. Of course, in doing this we sacrifice all pretence to accuracy outside our defined range.
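The lowering of degree can be tried on the one higher standard function quoted in the text, the cubic x − 3x² + 2x³. The sketch below is an illustration under assumptions: it does not reproduce the author's table or his octic figure, and the coefficient 0.1 of the cubic term is hypothetical. It finds the largest ordinate of the standard cubic within the range, compares it with the coefficient of x³, and measures the error committed when a term in x³ is replaced by terms of lower degree.

```python
def standard_cubic(x):
    return x - 3 * x**2 + 2 * x**3

grid = [k / 1000 for k in range(1001)]

# Largest ordinate within the range, and its ratio to the coefficient of x**3.
largest = max(abs(standard_cubic(x)) for x in grid)
ratio = largest / 2

def cubic_term(x, d):
    return d * x**3

def lowered(x, d):
    """d*x**3 with the standard cubic dropped, using x**3 = (S + 3*x**2 - x)/2."""
    return d * (3 * x**2 - x) / 2

# Worst error on the range when a term 0.1*x**3 is so replaced (0.1 is hypothetical).
worst = max(abs(cubic_term(x, 0.1) - lowered(x, 0.1)) for x in grid)
```

For the cubic the ratio is a little under 5 per cent., so the saving is modest at this degree; the text's point is that the corresponding ratio falls rapidly as the degree rises.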

35. To sum up, we may emphasize the importance of the idea of the experimental range, as we have seen this leads to a great accession of power in the case of what we have ventured (not without precedent, of course, but, still, with some misgiving) to call “Taylor's series formulæ.” An analogous idea is familiar enough in the “period” of the “Fourier series formulæ.” Secondly, we venture to think that too much stress cannot be laid on the necessity for the statement of probable error in the individual data. This matter is strongly stated in the extract from Sir G. B. Airy's works given in section 38. Even the warnings of so great an authority as the late Astronomer Royal seem to have been greatly disregarded.

36. Thirdly, however plausible or apparently authoritative the theory of a physical phenomenon of variation may be, the experimental data upon it should be so prepared that the precise support given to the theory by the observations should be made evident, as can often be done by a graph either of the observations themselves or of the deviations from the aforesaid plausible theory, the graph exhibiting probable error in the way mentioned in section 5.

37. Finally, the writer wishes to disclaim any novelty in the foregoing, with one exception, and to apologize for lack of references, which are, indeed, very incomplete in Wellington. His aim has been to collect a number of what he believes to be true although, no doubt, trite remarks, so as to assemble an argument which he has been unable to find in any of the works to which he has access, and which is necessary for the development of another paper, to which reference has been made. The portion for which it is thought some novelty may be claimed is that of the treatment of the Taylor's series formulæ and similar linear additive functions.^{*} It is thought that the statement in Thomson and Tait's “Natural Philosophy” (ed. 1890, p. 454) of an interpolation method, and the reference to “a patient application of what is known as the method of least squares” in Professor Perry's “Calculus for Engineers” of 1897 (p. 18), form a sufficient ground for this conclusion.

#### Conclusion.

38. Those who may be inclined to question the necessity of such remarks as have been made upon an admittedly insufficient definition of least squares are recommended to examine, in the light of the considerations that have been advanced, the example of least squares put forth in Dr. F. Kohlrausch's work, English translation (called “Physical Measurements”), of 1894, from the German of 1892 (7th ed., chap. 3), and also Professor Merriman's “Theory of Least Squares” (1900 edition), with reference to Clairault's formula (about page 126), and from page 130 to the end of the Mississippi Problem. If, also, it is desired to observe how even legitimate least squares may lead to error, an examination may be made of the warnings of Sir G. B. Airy in the conclusion of his work on the “Theory of Errors of Observation, &c.” (pp. 112, 113).^{†} The 1874 edition of this work is available in the Public Library. A paper by F. Galton, F.R.S., in the “Proceedings of the Royal Society” of 1879, page 365, also contains a significant warning that the fundamental principle of the arithmetic mean is not always reliable. This should be considered in relation to the use of a curve of *d*Y/*d*X in treating measures where X cannot conveniently be adjusted to the desired datum point for every observation.

39. We may also venture on the suggestion that, while many writers have been quite wrong in calling the constants of an empirical formula the “most probable ones,” those who have called them “the best” merely may have been quite justified in making use of such an expression where it has not been shown that analytical resources of greater power are available, as has been the case with the Fourier series, and it is hoped will be now seen to be the case with the Taylor series and other linear additive formulæ. Further, the habit of referring to empirical formulæ as “laws” may have helped to give such formulæ an importance which, compared with the graph, they assuredly do not possess.

[Footnote] * It should be mentioned, however, that Professor Callendar (Phil. Trans., 1887, p. 161) uses the formula of the standard parabola in connection with the reduction of platinum thermometry.

[Footnote] † In this connection see section 4.

### Appendix.

### I.

The “Mississippi Problem” is of some celebrity, and may with advantage be discussed. It refers to the velocity of the water at different depths in the Mississippi at Carrollton and Baton Rouge. The experiments were made in 1851, and were reduced by the experimenters by means of a parabolic approximation, which they applied according to common-sense principles similar to those of the present writer, except that they apparently did not perceive the bearing of the facts that are fundamental theorems in the graphical calculus (section 23). Consequently they failed to get such a good approximation to the experiments as is possible, although many engineers may think their approximation quite sufficient. Then in 1877^{*} Professor Merriman, after referring somewhat caustically to “tedious approximative methods,” proceeds to give a reduction by what he calls the “strictly scientific” method of least squares. This application is one to which our definition of least squares is strictly applicable. The calculations are given also in Professor Merriman's “Theory of Least Squares,” 1900.


Again, in 1884, Mr. T. W. Wright, “Adjustment of Observations,” page 413, reverts to the phenomena, applying both a parabolic and a cubic formula by least squares; and he remarks that, since the latter formula yields a smaller “sum of the square of the residual *errors*”—the italics are the present writer's—“the observations are better represented by the formula last obtained.” From the graph of deviations obtained by the present writer he has no hesitation in saying that the indications are for the application of a discontinuous formula, the first section holding from depth 0 to 0.5 or 0.6, and the other from that to 0.9, the formulæ differing chiefly in the constant term. This reduces the deviations to 1/2000, about, at most (judging from the graph), against about 1/700 with the least-square parabolic. The value of the probable (fortuitous) errors is not given or discussed in either reference, so that it is quite a matter of speculation whether this indication of discontinuity is genuine or whether it is a mere matter of luck. At any rate, we should not attempt to improve such a graph by means of a cubic formula; it evidently would require a formula of a large number of terms to reduce the deviations to as small limits as those of the discontinuous parabolic. It is to be noted that all these considerations are obvious upon a mere inspection of a graph of the deviations which are given by Professor Merriman, and also that it is not suggested that

[Footnote] * Journ. Frank. Inst., civ., p. 233.

the motion of the water was discontinuous; more likely there are systematic instrumental errors. (See Graph C.)

### II.

A few remarks may be made with respect to the arrangement of deviations in least-square form, the graphic process in the case of the power-expansion formula especially giving very convenient first approximations to the least-square values of the constants. If we take the expression “the mean” to signify that the algebraic sum of the deviations concerned is zero, and “the weighted mean” the same with respect to the deviations multiplied by datum values of certain weighting functions, then we may define least squares as the process which makes the weighted mean zero for all the weighting functions which can be obtained by differentiating the formula with regard to each of the constants separately and introducing the datum values of X. By writing down the equations which are needed to bring this about we obtain the normal equations of least squares, and we notice a valuable check on the correctness of a least-square reduction,^{*} for in the power-expansion formula we see that the mean must hold, and also the weighted mean of the deviations, each multiplied by the datum values of *x*, of *x*^{2}, and so on to the last degree; or for *x*^{2} we may substitute the standard parabolic, and so on. If, considering the formula to be in the standard terms, we examine a graph of deviations we can easily see that to approximate to least-square form we must take out all the amounts of standard components that will diminish the general magnitude of the deviations, but without allowing our judgment to come into play with regard to the run of any systematic deviation.

A little practice will often enable us to get such a close approximation to least-square form that the solution of the normal equations becomes much simplified. The normal equations, again, may be found more easily solved if made up in standard terms, for in examples similar to that of section 18 some of the coefficients in the normal equations tend to become zero, with formulæ of larger degree than the second—that is, using the formulæ of the “Notes on the Graphs.”
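The check described above, that after a least-square reduction the mean and the weighted means of the deviations must all vanish, can be sketched as follows. This is a minimal illustration only: the nine datum values are invented for the purpose, exact fractions are used so that the zeros come out exactly, and the helper `solve` is a hypothetical name.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a small linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    rows = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


# Invented datum points (an assumption, not data from the paper).
xs = [Fraction(i, 8) for i in range(9)]
ys = [Fraction(v, 100) for v in (2, 31, 48, 70, 77, 86, 83, 74, 61)]

# Weighting functions: derivatives of a + b x + c x^2 with regard to a, b, c.
weights = [lambda x: Fraction(1), lambda x: x, lambda x: x * x]

# Normal equations: each weighted sum of the deviations is made zero.
matrix = [[sum(wj(x) * wk(x) for x in xs) for wk in weights] for wj in weights]
rhs = [sum(wj(x) * y for x, y in zip(xs, ys)) for wj in weights]
a, b, c = solve(matrix, rhs)

deviations = [y - (a + b * x + c * x * x) for x, y in zip(xs, ys)]
weighted_means = [sum(d * w(x) for d, x in zip(deviations, xs)) for w in weights]
# The standard parabolic x(1 - x) may be substituted for x^2 as a weight.
standard_check = sum(d * x * (1 - x) for d, x in zip(deviations, xs))
print(weighted_means, standard_check)
```

All four sums come out exactly zero, which is the condition a correct reduction must satisfy; a non-zero sum would indicate an arithmetical slip in the normal equations or their solution.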

### III.

It is perhaps profitable to remark that, for the proper appreciation of a graph, we must get rid of the confusion that sometimes arises from the algebraical usage of making the symbols + and − stand for the operations of addition and subtraction and also as signs to designate whether a magnitude

[Footnote] * Given in Mr. T. W. Wright's book, page 144.

is positive or negative. The usage being as it is, we often in physical problems need to go back to the old arithmetical notion of negative quantities being impossible or imaginary, and consider the graph accordingly. For instance, take Taylor's theorem. It is usually expressed in one formula for the introduction of positive or negative increments; but Taylor himself (De Morgan, P. Cyc., p. 126) gave two formulæ, one for increments (or additions to *x*) and another for decrements (or subtractions from *x*). If now we take Maclaurin's form, we readily see that the second formula is impossible if we cannot reduce the magnitude or quantity to less than nothing.

Thus, to take a typical case, the magnetisation, or B—H, curve of iron, we cannot properly regard B and H as positive and negative quantities, but as direct and reverse positive magnitudes. A Taylor's series increment curve may then, perhaps, hold for magnitudes in either direction. If, however, we adhere to the algebraic usage, we shall be unable to express both of the symmetrical halves of the curve unless we employ only odd-power terms in our formula. This is obviously a very great disadvantage from a graphical point of view. As an indication of the contrary advantage it may be mentioned that a complete half of the sine curve can be built up of added proportions of the standard parabolic and quartic curves, with an extreme error of about 1 in 1,000 units, π radians forming the unit range of *x*.
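The remark about the half sine curve can be tried numerically. The sketch below is only one way of choosing the proportions, and rests on two assumptions not stated in the text: that the quartic meant is the symmetrical quartic (B) of the “Notes on the Graphs,” and that the two amounts are fixed by making the approximation exact at *x* = ¼ and *x* = ½.

```python
import math


def parabolic(x):
    """Standard parabolic, zero at x = 0 and 1."""
    return x * (1 - x)


def quartic(x):
    """Symmetrical quartic (B), zero at x = 0, 1/4, 3/4 and 1."""
    return x * (1 - x) * (1 - 4 * x) * (3 - 4 * x)


# Assumed fitting rule: exact agreement at x = 1/4 and x = 1/2.
# The quartic vanishes at x = 1/4, so the parabolic amount follows at once.
a = math.sin(math.pi / 4) / parabolic(0.25)
b = (math.sin(math.pi / 2) - a * parabolic(0.5)) / quartic(0.5)


def half_sine(x):
    """Approximation to sin(pi x) on the unit range of x."""
    return a * parabolic(x) + b * quartic(x)


grid = [i / 1000 for i in range(1001)]
extreme_error = max(abs(half_sine(x) - math.sin(math.pi * x)) for x in grid)
print(a, b, extreme_error)
```

With this simple matching rule the extreme error comes out a little under 2 in 1,000, the maximum ordinate of the curve being the unit; a better balance of the two amounts would presumably bring it nearer the figure quoted in the text.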

Further, unless we adhere to the arithmetical notions, we are led to alternative values and imaginary quantities when, as in the example of the B—H curve may be desirable, we employ formulæ of fractional powers. Here the only alternative is to drop the fractions which have even denominators, which we can easily foresee may make formulæ of this class impracticable for arbitrary approximation to a curve.

To make clear what is meant, consider the expression √−1. Arithmetically, −1 directs 1 to be subtracted from something which appears in the context. To take the square root of that which directs 1 to be subtracted from something else is evidently meaningless arithmetically. So also with (−1)^{2}, and so on. Algebraically we here take the symbol − to indicate that the number to which it is attached is negative in quality or impossible arithmetically. This quality is also indicated by using a different symbol, *i*^{2n}, instead of + and −, where *i* is an imaginary unit, powers of which when combined with arithmetical symbols make quantities impossible in arithmetic. It is conventions as to the effect of powers of *i* upon the directions to add or subtract which enable us to perform calculations upon arithmetical quantities by algebraical methods with only occasional ambiguities.

Enough has now been said to guard against misleading use of the symbols + and − in graphical work.

### Notes on the Graphs. (Plates XXXVI., XXXVII.)

Graph A gives the characteristic curve of deviations of a cubic-formula curve to which a parabolic approximation is applied (section 20); Graph B is that of section 21; Graph C that of the “Mississippi Problem,” Appendix, I. Graph I. shows the curve of the standard parabolic and Graph II. that of the cubic. Graph III. shows two symmetrical quartics—(A), zero at *x* = 0, ⅓, ⅔, and 1, and having the formula—

(A) = 2*x* − 11*x*^{2} + 18*x*^{3} − 9*x*^{4},

and (B), zero at *x* = 0, ¼, ¾, and 1, with formula—

(B) = 3*x* − 19*x*^{2} + 32*x*^{3} − 16*x*^{4}.

The maxima reach 0.11 and 0.25 respectively, or 1.2 and 1.6 per cent. of the numerical value of the coefficient of *x*^{4}. Graph IV. is a symmetrical quintic, zero at *x* = 0, ¼, ½, ¾, and 1, with formula—

(C) = 3*x* − 25*x*^{2} + 70*x*^{3} − 80*x*^{4} + 32*x*^{5}.

Its maxima reach 0.11, which, the coefficient of *x*^{5} being 32, represents 0.34 per cent. of the value of the latter. Graphs V., VI., and VII. are specimen standards of the hexic, heptic, and octic degree respectively. Their formulæ are given in the accompanying table. They are all made zero at the same points as the previous quintic. The curves all being symmetrical, about *x* = 0.5, are, as has been before noted, both simpler in form and more easy to compute values from if the origin of the abscissa is taken at *x* = ½ and the scale of the variable halved. The formulæ are accordingly given in terms of *z* = 2*x* − 1, as well as in *x*, which is the most straightforward variable for common cases of a few terms.

It will be remarked that, so far as these standard formulæ have been developed, it has been arranged to keep the formula simple. In constructing formulæ for actual practice, however, it may be better to sacrifice the mathematical simplicity altogether in order to obtain curves that are convenient for the visual processes of analysis (see Postscript).

The percentage values of the maximum ordinates in those curves compared with the value of the coefficient of the highest power of *x* are as follows: Hexic, 0.39 per cent.; heptic, 0.035 per cent.; octic, 0.013 per cent. Thus, if we throw a formula into the standard form we can see what the effect will be if we throw away the highest term. It is evident that we shall often be able thus to reduce a formula of a high degree to a simpler expression with very little error, but only for the range between the limits of the variable (original or transformed) 0 and 1.
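The percentages quoted in these Notes can be checked by direct computation. In the sketch below the coefficients (constant term first) are those obtained by expanding the factored forms of the standard functions; the grid search and the names `STANDARDS`, `poly_eval` and `max_ordinate_percent` are assumptions of this illustration.

```python
# Coefficients in x, constant term first, from expanding the factored forms.
STANDARDS = {
    "quartic (A)": [0, 2, -11, 18, -9],
    "quartic (B)": [0, 3, -19, 32, -16],
    "quintic": [0, 3, -25, 70, -80, 32],
    "hexic": [0, 0, 3, -22, 51, -48, 16],
    "heptic": [0, 0, 3, -28, 95, -150, 112, -32],
    "octic": [0, 0, 3, -34, 151, -340, 412, -256, 64],
}


def poly_eval(coeffs, x):
    """Evaluate a polynomial (constant term first) by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def max_ordinate_percent(coeffs, steps=4000):
    """Largest ordinate on 0..1 as a percentage of the top coefficient."""
    peak = max(abs(poly_eval(coeffs, i / steps)) for i in range(steps + 1))
    return 100 * peak / abs(coeffs[-1])


percentages = {name: max_ordinate_percent(c) for name, c in STANDARDS.items()}
for name, pct in percentages.items():
    print(f"{name:12s} {pct:.3f} per cent.")
```

The computed values fall close to those quoted in the text, and show how rapidly the cost of discarding the highest term diminishes as the degree rises.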

### Some Symmetrical Standard Functions.

Term. | Formula in z or x. |
---|---|
Constant (z or x) | Unaltered. |
Linear (z) | +½(1+z) |
Linear (x) | +x |
Parabolic (z) | +¼(1−z^{2}) |
Parabolic (x) | +x−x^{2} |
Cubic (z) | −¼z(1−z^{2}) = −¼(z−z^{3}) |
Cubic (x) | +x−3x^{2}+2x^{3} |
Quartic (z) | −¼(1−z^{2})(1−4z^{2}) = −¼(1−5z^{2}+4z^{4}) |
Quartic (x) | +3x−19x^{2}+32x^{3}−16x^{4} |
Quintic (z) | +¼z(1−z^{2})(1−4z^{2}) = +¼z(1−5z^{2}+4z^{4}) |
Quintic (x) | +3x−25x^{2}+70x^{3}−80x^{4}+32x^{5} |
Hexic (z) | −(1/16)(1−z^{2})^{2}(1−4z^{2}) = −(1/16)(1−6z^{2}+9z^{4}−4z^{6}) |
Hexic (x) | +3x^{2}−22x^{3}+51x^{4}−48x^{5}+16x^{6} |
Heptic (z) | +(z/16)(1−z^{2})^{2}(1−4z^{2}) = +(z/16)(1−6z^{2}+9z^{4}−4z^{6}) |
Heptic (x) | +3x^{2}−28x^{3}+95x^{4}−150x^{5}+112x^{6}−32x^{7} |
Octic (z) | −(z^{2}/16)(1−z^{2})^{2}(1−4z^{2}) = −(z^{2}/16)(1−6z^{2}+9z^{4}−4z^{6}) |
Octic (x) | +3x^{2}−34x^{3}+151x^{4}−340x^{5}+412x^{6}−256x^{7}+64x^{8} |


Postscript.—Since writing the above I have had occasion to employ formulæ of more than four terms, and certain practical points have come to light. Suppose the graph consists of a “smooth curve,” or, if it is of the datum-point variety, a smooth curve can be satisfactorily drawn through the data, then the analysis may proceed as follows: The constant, linear, and parabolic (standard) terms are obtained as before, and we draw in the base-line corresponding to this formula of three terms. Thus we have reduced the deviations to zero at *x* = 0, ½, and 1. We then scale off the deviations from this formula at *x* = ¼ and ¾, and then, using the standard cubic and a quartic the formula of which may be *x*(1 − *x*)(1 − 2*x*)^{2}, and which is obviously zero at the same three points as the cubic, we compute the amounts of these functions required to reduce the deviations to zero at the quarter points of *x*.
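The steps just described can be sketched in Python as follows. The smooth curve analysed (exp *x*) is an assumption chosen purely for illustration, and the names `analyse` and `evaluate` are hypothetical; the three-term base-line is fixed, as the text states, by making the deviations zero at *x* = 0, ½, and 1.

```python
import math


def cubic(x):
    """Standard cubic, zero at x = 0, 1/2 and 1."""
    return x * (1 - x) * (1 - 2 * x)


def quartic(x):
    """Quartic suggested in the Postscript, zero at the same three points."""
    return x * (1 - x) * (1 - 2 * x) ** 2


def analyse(curve):
    """Amounts of the constant, linear, parabolic, cubic and quartic terms."""
    y0, yq, yh, y3q, y1 = (curve(x) for x in (0, 0.25, 0.5, 0.75, 1))
    constant = y0
    linear = y1 - y0
    # The standard parabolic x(1 - x) equals 1/4 at x = 1/2.
    parabolic = 4 * (yh - (y0 + y1) / 2)

    def base(x):
        return constant + linear * x + parabolic * x * (1 - x)

    # Deviations scaled off at the quarter points.
    d_quarter = yq - base(0.25)
    d_three_quarter = y3q - base(0.75)
    # cubic(1/4) = 3/32 = -cubic(3/4); quartic(1/4) = quartic(3/4) = 3/64.
    cubic_amount = (d_quarter - d_three_quarter) / (2 * cubic(0.25))
    quartic_amount = (d_quarter + d_three_quarter) / (2 * quartic(0.25))
    return constant, linear, parabolic, cubic_amount, quartic_amount


def evaluate(amounts, x):
    """Value of the five-term standard formula at x."""
    c0, c1, c2, c3, c4 = amounts
    return c0 + c1 * x + c2 * x * (1 - x) + c3 * cubic(x) + c4 * quartic(x)


amounts = analyse(math.exp)  # illustrative curve only
grid = [i / 400 for i in range(401)]
worst = max(abs(evaluate(amounts, x) - math.exp(x)) for x in grid)
print(amounts, worst)
```

Because the cubic is antisymmetrical and the quartic symmetrical about *x* = ½, the difference of the two quarter-point deviations fixes the cubic amount and their sum fixes the quartic amount, so that no simultaneous equations need be solved.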

If further approximation be desired, the next systematic step would be to reduce the deviations at the odd-eighth points of *x* by means of quintic, hexic (somewhat different from the tabulated sample, obviously), heptic, and octic functions; and so on.

The same operations might possibly be considered easier if performed by simple algebra, but we should then lose the analytical power of judgment which is the vital advantage of the “standard” method.

With reference to the computation of values of Y from formulæ, the values of the standard functions up to the quartic degree may be computed by writing down the datum values of *x*; (1 − *x*); and (1 − *x*) − *x*, or (1 − 2*x*), and forming the proper products. Up to the octic degree we should need also (1 − 4*x*) and (3 − 4*x*), or 2(1 − 2*x*) ± 1. It is clear that if computation is to proceed by means of an arithmometer the labour of computation will not materially differ from that in which simple powers are used. With logarithms it will be necessary to enter the table once extra for each factor required.
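A minimal sketch of this mode of computation is given below, using the tabulated symmetrical functions; the function name `standard_values` is hypothetical, and the comparison with the power form of the octic (coefficients obtained by expanding the factors) is added only as a check.

```python
def standard_values(x):
    """Build standard-function values at x from the five simple factors."""
    f1 = x
    f2 = 1 - x
    f3 = f2 - x      # (1 - x) - x = 1 - 2x
    f4 = 2 * f3 - 1  # 1 - 4x
    f5 = 2 * f3 + 1  # 3 - 4x
    parabolic = f1 * f2
    cubic = parabolic * f3
    quartic = parabolic * f4 * f5
    quintic = quartic * f3
    hexic = quartic * parabolic
    heptic = hexic * f3
    octic = heptic * f3
    return {
        "parabolic": parabolic, "cubic": cubic, "quartic": quartic,
        "quintic": quintic, "hexic": hexic, "heptic": heptic, "octic": octic,
    }


# Power form of the octic, constant term first, for comparison.
OCTIC_POWERS = [0, 0, 3, -34, 151, -340, 412, -256, 64]

datum_x = [0.1, 0.3, 0.45, 0.8]
discrepancies = []
for x in datum_x:
    by_factors = standard_values(x)["octic"]
    by_powers = sum(c * x ** k for k, c in enumerate(OCTIC_POWERS))
    discrepancies.append(abs(by_factors - by_powers))
print(discrepancies)
```

Each higher function is obtained from a lower one by a single further multiplication, which is the sense in which the labour does not materially exceed that of forming simple powers.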