Re: Please help in Neural Network Implementation !!

Subject: Re: Please help in Neural Network Implementation !!
From: Chin Wei Chuen
Date: Thu Feb 03 2000 - 02:36:54 MET


From your previous post, I see that you have done a factor analysis using the
Jacobi rotation method to obtain factors with different correlation coefficients
and, in short, you are mainly trying to predict the outcome (Y) from the new
factors (X). Note that factor analysis isn't exactly the same as Principal
Components Analysis, as there are different ways of achieving them. What you
obtained, I believe, are the 'factors' or 'latent constructs' that reside in
your data. (You may check out some literature on these.)

Regarding your problem, if you want a less labour-intensive way of
calculating the correlation / covariance matrix of the factors, SPSS or SAS
should provide a good means of doing that.
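
For a small data set the correlation matrix can also be computed directly. The
following is only a minimal sketch in plain Python, assuming rows are cases and
columns are predictors (or factor scores); the function name
`correlation_matrix` and the sample numbers are illustrative, not taken from
the original post.

```python
import math

def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows` (cases x predictors)."""
    n = len(rows)
    k = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    # Sample covariance matrix, using the usual n - 1 denominator.
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(k)] for i in range(k)]
    # Scale each covariance by the two standard deviations involved.
    sd = [math.sqrt(cov[j][j]) for j in range(k)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]

# Made-up data: 4 cases, 3 predictors.
data = [[1.0, 2.0, 5.0],
        [2.0, 4.1, 3.0],
        [3.0, 6.2, 4.0],
        [4.0, 7.9, 1.0]]
corr = correlation_matrix(data)
```

The intermediate `cov` is the covariance matrix, so the same loop serves either
purpose.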

Regarding the prediction of Y from [X], which is in fact a pattern recognition
problem, there are two classes of Neural Network algorithms, i.e. supervised and
unsupervised training methods.

In the case of unsupervised training (that is, you don't have the actual Y
values), you would need to use NN algorithms like Kohonen or Hopfield
networks. Feel free to read up on those.

For supervised training, which I am more familiar with, you would require some
training data that includes both X and Y values to fit the weights; you then
feed your new cases into the trained network to predict the outcome.
Traditional supervised Neural Network algorithms include the Multi-Layer
Perceptron (MLP) network trained with backprop and the Radial Basis Function
(RBF) network.
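
To make the supervised case concrete, here is a rough sketch of a
one-hidden-layer MLP trained by backprop in plain Python. Everything in it is
an assumption chosen for illustration: the function name `train_mlp`, the tanh
hidden units, the linear output, the learning rate, and the toy data where
Y = x1 - 0.5 * x2. It is not the setup from the original post.

```python
import math
import random

def train_mlp(samples, hidden=4, epochs=500, lr=0.05, seed=0):
    """Train a one-hidden-layer MLP (tanh hidden units, linear output) by backprop.

    `samples` is a list of (x, y) pairs: x is a list of floats, y a float.
    Returns (predict, history); history holds the mean squared error per epoch.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # w1[h]: input weights of hidden unit h, with the bias as the last entry.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    # w2: output weight per hidden unit, with the bias as the last entry.
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in w1]
        out = sum(w * hi for w, hi in zip(w2, h)) + w2[-1]
        return h, out

    history = []
    for _ in range(epochs):
        total = 0.0
        for x, y in samples:
            h, out = forward(x)
            err = out - y
            total += err * err
            for j in range(hidden):
                # Error signal for hidden unit j, computed before w2[j] changes.
                delta_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * delta_h * x[i]
                w1[j][-1] -= lr * delta_h
            w2[-1] -= lr * err
        history.append(total / len(samples))
    return (lambda x: forward(x)[1]), history

# Toy training set with known X and Y values.
grid = [-1.0, 0.0, 1.0]
samples = [([a, b], a - 0.5 * b) for a in grid for b in grid]
predict, history = train_mlp(samples)
```

After training, `predict([x1, x2])` gives the network's estimate of Y for a new
case, and `history` shows how the training error moved over the epochs.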

I hope that can be of some help to you.



> > How did you do the Jacobi transformation.
> Well I got the recipe from the book of "Numerical Recipes in C - The
> Art of Scientific Computing" second edition written by William H. Press,
> William T. Vetterling, Saul A. Teukolsky, Brian P. Flannery and
> published by the Press Syndicate of the University of Cambridge.
> "The Jacobi method consists of a sequence of orthogonal similarity
> transformations of the form of equation:
> A -> P(1)^-1 . A . P(1) -> P(2)^-1 . P(1)^-1 . A . P(1) . P(2)
> -> P(3)^-1 . P(2)^-1 . P(1)^-1 . A . P(1) . P(2) . P(3) -> etc.
> Each transformation (a Jacobi rotation) is just a plane rotation designed
> to annihilate one of the off-diagonal matrix elements. Successive
> transformations undo previously set zeros, but the off-diagonal elements
> nevertheless get smaller and smaller, until the matrix is diagonal to
> machine precision. Accumulating the products of the transformations as
> you go gives the matrix of eigenvectors ( Xr = P(1) . P(2) . P(3) ....),
> while the elements of the final diagonal matrix are the eigen values."
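
The quoted description can be sketched in a few lines of plain Python. This is
not the Numerical Recipes routine, only a minimal cyclic Jacobi sweep under the
assumption that the input is a small real symmetric matrix; the name
`jacobi_eigen`, the tolerance and the sweep limit are illustrative choices.

```python
import math

def jacobi_eigen(a, tol=1e-20, max_sweeps=50):
    """Eigenvalues/eigenvectors of a real symmetric matrix by Jacobi rotations.

    Returns (eigenvalues, v); column j of v is the eigenvector for eigenvalues[j].
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        # Stop once the off-diagonal elements are negligible.
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation that annihilates a[p][q] (smaller-angle root).
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                # A <- P^-1 . A . P: update columns p, q, then rows p, q.
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                # Accumulate the product of rotations: V <- V . P.
                for k in range(n):
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p] = c * vkp - s * vkq
                    v[k][q] = s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

# Made-up symmetric matrix, e.g. a small covariance matrix.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
vals, vecs = jacobi_eigen(A)
```

The eigenvalues come back unsorted, so they still need to be ordered before
picking the largest one.
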
> To get the 1st principal component I select the largest eigen value
> and the normalized corresponding eigen vector and do the following:
> n = number of elements in eigen vector
> k = number of columns (predictors) in the original matrix
> n = k because we calculate an eigen value and vector for each column in X
> X* is the matrix of principal components
> x*(1) = u(1)x(1) + u(2)x(2) + u(3)x(3) + ... + u(n)x(k)
> t is a threshold I predefine as the criterion for deciding whether to
> include the predictor in the new matrix. After all, the purpose of this
> procedure is to create the smallest possible matrix that explains
> the most variance in the original matrix and eliminates predictor
> overlap and redundancy. This is always a trade-off between accuracy
> and performance.
> t = 99% or whatever threshold is necessary based on empirical analysis.
> v is the variance explained
> m is lambda ( the eigen value )
> v = (m(1) / (m(1) + .. + m(k))) * 100
> if v <= t then include principal component x*(1) in new matrix
> repeat for each predictor
> in the case of predictor # 4 you would calculate the principal
> component by picking the 4th largest eigen value and its corresponding
> eigen vector and the percentage of variance explained would be:
> v = ((m(1) + m(2) + m(3) + m(4)) / (m(1) + .. + m(k))) * 100
> if v <= t then include principal component x*(4) in new matrix ... etc.
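
The selection rule described above can be written out as a short sketch. It
follows the quoted rule literally (keep a component while the cumulative
percentage v is <= t), so the component that pushes v past the threshold is
left out. The names `select_components` and `project` and the sample
eigenvalues are assumptions made for illustration.

```python
def select_components(eigenvalues, threshold_pct=99.0):
    """Return [(index, cumulative %)] for components kept while v <= threshold."""
    # Visit the eigenvalues from largest to smallest.
    order = sorted(range(len(eigenvalues)), key=lambda i: eigenvalues[i], reverse=True)
    total = sum(eigenvalues)
    kept, running = [], 0.0
    for i in order:
        running += eigenvalues[i]
        v = 100.0 * running / total  # cumulative variance explained, in percent
        if v > threshold_pct:
            break
        kept.append((i, v))
    return kept

def project(row, eigenvector):
    """Score of one case on one component: x* = u(1)x(1) + ... + u(k)x(k)."""
    return sum(u * x for u, x in zip(eigenvector, row))

# Made-up eigenvalues for four predictors and a 90% threshold.
kept = select_components([0.5, 6.0, 2.5, 1.0], threshold_pct=90.0)
score = project([1.0, 2.0], [0.6, 0.8])
```

With these sample numbers the two largest eigenvalues are kept, since adding
the third would take the cumulative percentage above 90.
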
> > What are these matrices?
> These matrices are a series of cases(rows) containing predictors(columns)
> for the results we want to forecast. The average size is 30 x 400
> > You said that you did a factorial analysis; how are the 'clusters of
> > common variance' related to it?
> I guess that I'm using the wrong terminology here. I've seen some
> books describe factor analysis (principal components analysis)
> as a way to identify clusters of highly correlated predictors.
> > > 2) I calculate a multi-variate regression analysis on these
> > > matrices using matrix algebra to create a series of weights that
> > > allow me to predict y given X (y(j)=b(0)+b(1)x(1)+ .. +b(k)x(k)).
> > > This procedure gives me good results with an average adjusted multiple
> > > correlation coefficient of 0.9987
> >
> > How is the 'average multiple correlation coeff' calculated?
> 1. calculate SSE (sum of squares due to error) sum((y - y')^2)
> 2. calculate SST (total sum of squares) sum((y - mean(y))^2)
> 3. calculate SSR (sum of squares due to regression) SST - SSE
> 4. calculate R^2 (multiple correlation coefficient squared) SSR / SST
> 5. calculate adj R^2 (adjusted multiple correlation coefficient squared)
> adj R^2 = 1 - ((n - 1) / (n - (k + 1))) . (1 - R^2)
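
The five steps can be checked with a short sketch. It assumes the standard
definitions: y' is the predicted value, SST is taken about the mean of y, n is
the number of cases and k the number of predictors. The function name
`r_squared` and the sample numbers are illustrative only.

```python
def r_squared(y, y_pred, k):
    """Return (R^2, adjusted R^2) for n observations and k predictors."""
    n = len(y)
    y_mean = sum(y) / n
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))  # step 1: error SS
    sst = sum((yi - y_mean) ** 2 for yi in y)               # step 2: total SS
    ssr = sst - sse                                         # step 3: regression SS
    r2 = ssr / sst                                          # step 4
    adj = 1.0 - (n - 1) / (n - (k + 1)) * (1.0 - r2)        # step 5
    return r2, adj

# Made-up observed and predicted values, one predictor.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, adj = r_squared(y, y_pred, k=1)
```

Here SSE is about 0.10 and SST is 10, so R^2 is about 0.99 and the adjusted
value is slightly lower.
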
> > > My problem lies in the fact that I need to crunch every matrix
> > > individually in order to predict them. Even though all the matrices
> > > are a product of the same process, and I should be able to predict
> > > all of them by processing just one matrix, the beta weights that
> > > result from the analysis are different for every matrix. As you
> > > can imagine, these calculations take forever and are not easy to
> > > parallelize since they are always dependent on the previous step.
> >
> > What are you trying to predict? The correlation coeffs between your
> > predictors or something else?
> Well, several things. The main goal is to predict Y given X. The problem
> is that the sample data has noise and that the model that works for one
> set does not work for another set (I guess I'm lacking generalization)
> I'm also hoping that with a large number of training sets the noise
> would cancel out since it has a mean of zero. My other problem is that
> I don't have the real value of Y, what I have is the rank order of Y
> as in y(1) >= y(2) >= y(3) ... etc. I derive the real distribution of Y
> by testing many Y vectors and modifying them until I reach the highest
> multiple correlation coefficient without violating the rank order of Y.
> > > I would like to know from you folks in this group, what kind of NN
> > > would best suit this problem, that would learn how to solve all the
> > > matrices without having to crunch all the matrices and that would do
> > > this in parallel rather than a serial manner.
> >
> > It's hard to answer this question unless you're more specific about your
> > goal.
> I don't know if the above notes clarify my goals enough; please let me
> know.
> Thank you,
> Joe Smith

