# Re: linear regression with error in x

S. F. Thomas (sthomas@decan.com)
Mon, 5 Feb 1996 19:40:28 +0100

Herman Rubin (hrubin@b.stat.purdue.edu) wrote:
: Rolf R. Engel <re@psy.med.uni-muenchen.de> wrote:
: >In article <3108911D.2278@thalassa.seos.uvic.ca>, tang@thalassa.seos.uvic.ca
: >says...

: >>For estimating a and b in
: >> y=a*x+b,
: >>the traditional linear regression method is biased if there is
: >>observation error in the independent variable x.

: >>Is there a better regression method for this situation?

: >In a two-dimensional situation the principal component analysis is a method
: >that minimizes the errors perpendicular to the principal component axis. This
: >method therefore accounts for errors in y as well as x.

: This does not address the problem stated. The problem is not easy,
: and without further assumptions, there is no identified solution if
: there is joint normality.

: The usual careful formulation of the problem is

: Y = a*X +b;
: x = X + u;
: y = Y + v.

: The problem of whether a and b can be determined from the joint
: distribution of x and y depends on the assumptions about the
: joint distribution of X, u, and v. In general, normality of
: X makes it impossible to clearly identify a solution without
: assumptions about the sizes of the errors.
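The attenuation this formulation implies is easy to see in a short
simulation (a sketch only, with illustrative values of my choosing:
true slope a = 2, var(X) = var(u) = 1; none of these numbers come
from the thread). Ordinary least squares then recovers roughly half
the true slope:

```python
import random

random.seed(0)

# True model Y = a*X + b, with illustrative parameter values.
a_true, b_true = 2.0, 1.0
n = 20000

# Latent X, error u in the observed x, error v in the observed y.
X = [random.gauss(0.0, 1.0) for _ in range(n)]
u = [random.gauss(0.0, 1.0) for _ in range(n)]  # var(u) = var(X), for emphasis
v = [random.gauss(0.0, 0.5) for _ in range(n)]

x = [Xi + ui for Xi, ui in zip(X, u)]                      # x = X + u
y = [a_true * Xi + b_true + vi for Xi, vi in zip(X, v)]    # y = Y + v

# Ordinary least squares slope of y on the error-laden x: cov(x,y)/var(x).
mx, my = sum(x) / n, sum(y) / n
cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
var_x = sum((xi - mx) ** 2 for xi in x) / n
slope = cov_xy / var_x

# Attenuation: the expected OLS slope is a * var(X) / (var(X) + var(u)),
# here 2.0 * 1/(1 + 1) = 1.0, i.e. half the true slope.
print(slope)
```

The bias does not shrink with sample size; only the scatter around
the attenuated value does.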

There is an alternative approach to this problem if it is the
case that the observation error in x is actually of the possibilistic
(vague and/or imprecise) rather than of the probabilistic sort.
As an example, suppose we consider the heights (x) and weights (y) of
men, and seek to regress y upon x. Suppose further we have neither
meter stick nor weighing scale to help with our observations.
Nevertheless, we have a skilled observer who can make estimates
of height to within plus or minus about an inch, and estimates of
weight to within plus or minus about 10 lbs. The observations
are recorded, along with error brackets (2 inches on each height
observation, 20 lbs. on each weight observation), and now we want to
carry out the regression. Notwithstanding the range of imprecision
in any given x-value, there is no probabilistic uncertainty
involved, once the observation has been made. Suppose one such
x-value is 73 +/- 1 inch. Ten individuals may have been
so characterized in the sample, yielding ten different weight
characterizations, all also imprecise. But in addition to the
observation error (20 lb. range) in each y-value, there is a
distributional variation. There is no similar distributional
variation, however, regarding any *given* x-value, although there
is certainly observational error. The essential structure of
simple regression is thus upheld, notwithstanding the observation
error (imprecision) in the x-values: *given* the independent
x-variable, what value (with possible distributional variation)
could we expect for the dependent y-variable? In other words,
the distribution over the x-variables is not per se at issue,
and neither is the joint distribution, rather only a series of
conditional distributions, that of y *given* various x-values,
which merely happen to be vague/imprecise/fuzzy in nature.
(I imagine truly probabilistic variation in the x-values to correspond
to the variation inherent in the population, as opposed to
the possible error in a measurement of a particular instance.
The joint distribution could certainly capture the covariation
between x and y variables in that case, but then the conditional
aspect of the variation sought to be illuminated in a regression
exercise would not be addressed. Nor should it be assumed that
the regression sample is random with respect to x-values, which
it would have to be if it were to illuminate the joint variation.)
The essential notions behind likelihood methods will therefore
stand, but some modifications become necessary.

Realistically, the x-values will not be crisp intervals,
e.g. [72,74] for a height observation; rather,
the edges will be *fuzzy*, and we would have in general fuzzy events,
for which probabilities may readily be calculated given their
membership functions, and a distributional assumption over
the sample space in question. Given a regression hypothesis

y ~ N(bx+a, sig)

a fuzzy y-event may readily be calculated for each fuzzy x-event
using fuzzy arithmetic. To simplify matters, take its
centroid, which gives the mean of the postulated normal distribution
at that x-value. That in turn allows a corresponding
probability to be readily calculated for the fuzzy events
corresponding to the actual y-values observed, for any specified
a, b, and sig. In other words, we may still map the
multi-parameter likelihood function induced by all the fuzzy
(x,y) observations.
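The likelihood contribution of one fuzzy (x,y) observation can be
sketched as follows, assuming symmetric triangular membership
functions and purely illustrative parameter values (the function
names and all numbers below are mine, not from the thread). Note
that for the linear hypothesis, taking the centroid of the fuzzy
x-event and mapping it through b*x + a gives the same mean as
mapping first and taking the centroid afterward:

```python
import math

def triangular(center, half_width):
    """Membership function of a symmetric triangular fuzzy number."""
    def mu(t):
        return max(0.0, 1.0 - abs(t - center) / half_width)
    return mu

def centroid(mu, lo, hi, steps=400):
    """Centroid (center of gravity) of a membership function on [lo, hi]."""
    num = den = 0.0
    for i in range(steps):
        t = lo + (hi - lo) * (i + 0.5) / steps
        num += t * mu(t)
        den += mu(t)
    return num / den

def normal_pdf(t, mean, sig):
    return math.exp(-0.5 * ((t - mean) / sig) ** 2) / (sig * math.sqrt(2 * math.pi))

def prob_fuzzy_event(mu, mean, sig, lo, hi, steps=400):
    """Probability of a fuzzy event: integral of mu(t) * p(t) dt."""
    h = (hi - lo) / steps
    return sum(mu(lo + (i + 0.5) * h) * normal_pdf(lo + (i + 0.5) * h, mean, sig)
               for i in range(steps)) * h

# One fuzzy observation: height about 73 +/- 1 in, weight about 180 +/- 10 lb.
mu_x = triangular(73.0, 1.0)
mu_y = triangular(180.0, 10.0)

# Hypothesis y ~ N(b*x + a, sig); the values here are illustrative only.
a, b, sig = 0.0, 2.5, 8.0

# Centroid of the fuzzy x-event gives the mean of the postulated
# normal distribution at that x ...
x_c = centroid(mu_x, 70.0, 76.0)
mean_y = b * x_c + a
# ... and hence a likelihood contribution from the observed fuzzy y-event.
L = prob_fuzzy_event(mu_y, mean_y, sig, 140.0, 220.0)
```

Repeating this for every observation and multiplying the
contributions maps out the likelihood over (a, b, sig).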

Now, what will not in general give unbiased results is simply
taking the maximum likelihood estimators of a, b,
and sig ... probably what the original poster was complaining
about. Elimination of nuisance parameters by a maximization
rule of marginalization over the likelihood function is known
in certain cases to yield results which are misleading in
precision, location, or both. You use instead a product-sum
rule of marginalization, which appears (Thomas, 1995) to avoid
the problems associated with the maximization rule. This will
in general lead to a possibilistic/fuzzy/likelihood characterization
of the parameters a and b of interest. (You can hardly expect the
estimates to be precise; even if the x and y values were
precise points correct to an infinite number of decimal places,
there still would be imprecision in the a and b estimates,
as long as the sample were finite, which of course it must
be.) If you needed to defuzzify the results, you would then
compute the centroids of the marginal possibility distributions,
which I would expect to be unbiased.
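The product-sum step can be sketched on a coarse grid as follows.
This is only an illustration under assumptions of my own: synthetic
crisp data standing in for the centroids of fuzzy observations (to
keep it short), and grids whose ranges and resolutions I chose by
hand. The nuisance parameters a and sig are marginalized by summing,
the marginal for the slope b is normalized to sup = 1 as a
possibility profile, and the centroid defuzzifies it:

```python
import math

# Synthetic stand-in data (hypothetical): centroids of fuzzy (x, y)
# observations, generated near y = 2x + 1.5 with small noise.
data = [(70, 141.5), (72, 144.0), (73, 148.0), (75, 150.5), (76, 153.5)]

def log_lik(a, b, sig):
    """Log-likelihood of the sample under the hypothesis y ~ N(b*x + a, sig)."""
    ll = 0.0
    for x, y in data:
        r = y - (b * x + a)
        ll += -0.5 * (r / sig) ** 2 - math.log(sig * math.sqrt(2.0 * math.pi))
    return ll

# Coarse parameter grids (illustrative resolution and ranges only).
a_grid = [i * 0.5 for i in range(-80, 81)]      # intercept in [-40, 40]
b_grid = [1.5 + i * 0.02 for i in range(51)]    # slope in [1.5, 2.5]
sig_grid = [0.5 + i * 0.25 for i in range(11)]  # sig in [0.5, 3.0]

# Product-sum rule: marginalize the nuisance parameters a and sig by
# SUMMING the likelihood over them (the maximization rule would take
# the max over a and sig instead).
marg = [sum(math.exp(log_lik(a, b, sig)) for a in a_grid for sig in sig_grid)
        for b in b_grid]

# Normalize to a possibility profile (sup = 1), then defuzzify by centroid.
peak = max(marg)
poss = [m / peak for m in marg]
b_hat = sum(b * p for b, p in zip(b_grid, poss)) / sum(poss)
```

The profile `poss` is the fuzzy characterization of the slope; its
spread reflects the finite, imprecise sample, and `b_hat` is the
crisp point estimate obtained only if one insists on defuzzifying.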