Introduction to Independent Component Analysis

April 10, 1019

Introduction to Independent Component Analysis with fMRI examples

Denition of ICA

\[X = AS\]

\(X\) is a random \(M \times T\) matrix representing the observed matrix at T time points
\(S\) is a random \(M \times T\) matrix where each row is mutually independent
A is an M \(\times\) M full-rank non-random mixing matrix
Estimate the unmixing matrix \(W = A^{-1}\) such that \(\hat{S}=\hat{W}X\) is the estimates of \(S\).

ICA Assumption 1: Independence

Definition and fundamental properties: we define that two variables and are independent if and only if the joint pdf is factorizable in the following way: \[p(s_1,s_2 )=p(s_1) p(s_2)\]
Expected Value Definition (came form moment generating function):
\[E(h_1(s_1)h_2(s_2))=E(h_1(s_1)) E(h_2(s_2))\] for any functions \(h_1\) and \(h_2\) such that the above expected values exist and are well-defined.

2: Non-Gaussian

Why Gaussian variables are forbidden? the distribution of any orthogonal transformation of the Gaussian \((s_1,s_2)\) has exactly the same distribution as \((s_1,s_2)\).
If \(R\) were some arbitrary orthogonal matrix so that \(RR^{\top}=R^{\top}R=I\). Let \(A^{\prime} = AR\). If data had been mixed according to \(A^{\prime}\) (instead of \(A\)), we would have \(X\) as a Gaussian with a mean still 0 and covariance: \[ Cov(X)=E XX^{\top}=E A^{\prime}SS^{\top}{A^{\prime}}^{\top}\]

\[ E ARSS^{\top}R^{\top}A^{\top}=ARR^{\top}A^{\top}=AA^{\top}\]

Measure of Non-gaussianity

We need to have a quantitative measure of non-gaussianity for ICA Estimation.
Kurtosis : gauss=0 (sensitive to outliers) \[kurt(Y)=E Y^4 - 3 (EY^2)^2\]
Entropy : gauss=largest \[H_Y(y)=-\int f_Y(y)\log f_Y(y)dy\]
Neg-entropy : gauss = 0 \[J(Y)=H(Y_{gaussian})-H(Y)\]
Approximations \[J(Y) \approx \frac{1}{12}(EY^2)^2 +\frac{1}{48} kurt(Y)^2\]

Caution & Limitation

Scaling

For any constant \(a\), \[ X=AS=aA \cdot \frac{1}{a} S\]
It is often assumed that each source has zero mean and unit variance.

2. Signal Permutations

The order of mixing matrix and independent components are unknown.

\[ X=AS=AP^{-1} PS\]

3: Number of Sensors

The number of separated signals cannot be larger than the number of inputs.
Current research is being done to reduce this constraint.

Example: Data generation

Two soures were generated from standard normal and Uniform(0,2\(\sqrt{3}\)).

Example: Source separation

PCA vs. ICA

Example: Source separation (Cont.)

Why PCA could not recover sources while ICA could?
PCA finds the directions that explains most variance (2nd order), while ICA finds the directions that maximize “independence”.
If the soures are from normal distribution, finding direction that makes the estimated \(S\) uncorrelated will achieve finding independent sources.
However, if the sources are not from gaussian distribution, uncorrelation does not mean independence.
e.g. S1 ~ unif(-1,1)*sqrt(6), S2 ~ exp(1)
X1=2*S1 + S2; X2=S1-S2: X1 and X2 are uncorrelated, but not independent.
(X1-X2, X1+2X2) = (S1+2S2, 4S1-S2), Uncorrelated, but not independent.

functional Magnetic Resonance Imaging (fMRI)

Creates a series of images that capture blood oxygination level dependent (BOLD) signal in parts of the brain
Non-invasive method
Popularly used in clinical practice and human brain research

-Find the part of the brain handling certain functions such as movement, sensation, thought, etc.

Characterize the brain functions

functional Magnetic Resonance Imaging (fMRI) Data

fMRI volume image. Red and yellow area highlight activated

fMRI Data Matrix

fMRI Data Decomposition

\[X = AS \]

ICA in fMRI

Relying on the intrinsic structure of the data
No assumptions about the form of the HRF or the possible causes of responses are made.
The only assumption is mutual independence in “Space” (spatial ICA) or in “Time” (temporal ICA).

Temporal vs. Spatial ICA}

Temporal vs. Spatial ICA

Temporal ICA

Components have independent temporal dynamics: “Strength of one component at a particular moment in time does not provide information on the strength of other components at that moment”
Components may be correlated/dependent in space
Popular for cocktail party problem or EEG

Spatial ICA

Components have independent spatial distributions: “Strength of one component in particular voxel does not provide information on the strength of other components in that voxel”
Components may be correlated/dependent in time
Popular for fMRI

Real example data

Dataset was downloaded from FSL (https://fsl.fmrib.ox.ac.uk/fslcourse/)
Demostrate run ICA in fsl and R, and how to look at the results.
Dataset was “preprocessed” using FSL FEAT (default setting)

library(fslr)
img = readnii('ica_intro_fsldata/filtered_func_data.nii.gz'); 
print(img)

## NIfTI-1 format
##   Type            : nifti
##   Data Type       : 4 (INT16)
##   Bits per Pixel  : 16
##   Slice Code      : 0 (Unknown)
##   Intent Code     : 0 (None)
##   Qform Code      : 1 (Scanner_Anat)
##   Sform Code      : 1 (Scanner_Anat)
##   Dimension       : 64 x 64 x 21 x 100
##   Pixel Dimension : 4 x 4 x 6 x 1
##   Voxel Units     : mm
##   Time Units      : sec

One volume

Five random voxels

ts.plot(t(img[c(25,30,35,40,45),32,10,]),col=1:5)

Five random voxels

ts.plot(scale(t(img[c(25,30,35,40,45),32,10,])),col=1:5)

How to apply ICA

The easiest way is to use a neuroimage package such as FSL-Melodic, GIFT, etc.
In R, f.ica.fmri function is available in AnalyzeFMRI R package.
Typically spatial ICA is applied for FMRI, and temporal ICA is applied to EEG.
The basic steps include: array flattening, prewhitening, optimization (estiomation), spatial/temporal component recovery

Practice (Spatial ICA):

Flattening 4D volumes to 2D (V \(\times\) T) matrix

dims=dim(img)
X=t(array(img, dim=c(dims[1]*dims[2]*dims[3],dims[4])))
X.c = X - apply(X,1,mean)

Prewhitening
Select number of components (in fMRI application it is often set to be 20, but there’s also some systematic way to determine).

Sigma.eigen=svd(X.c)
n.comp=10
Y = t(Sigma.eigen$v[,1:n.comp])

spatial ICA example (Cont.)

ICA is applied to Y

set.seed(1234)
system.time(re<-fastICA(t(Y),10))

##    user  system elapsed 
##   2.592   0.871   3.472

temporal component recoveray

A.est = Sigma.eigen$u[,1:n.comp] %*% re$A
ts.plot(scale(A.est),col=1:10)

IC1

par(mfrow=c(1,2))
ts.plot(scale(A.est[,1]),col=1:10,ylab='')
map1=array(scale(re$S[,1]),dim=dims[1:3])
image(map1[,,10],breaks=c(-100:100)/100*21,col=bluered(200),xaxt='n',yaxt='n')

IC5,10

Some note.

fastICA has built in prewhitening!
Unfortunately, it won’t give you exactly same answer because the algorithm is stochastic! (due to optimization)
A lot of ICA algorithms are designed to have random initialization (like K-means clustering). They often gives similar answers, but not always. Have to be cautious.

library(fastICA)

set.seed(1235)
system.time(re2<-fastICA(t(X.c),n.comp=10))
image(cor(cbind(re$S, re2$S))[1:10,1:10+10],
      col=bluered(200),breaks=c(-100:100)/100)

Conclusion

We will discuss group ICA and advanced ICAs (longitudinal ICA) next week!