Blog –

# Welcome to the Data Whisper blog, where data whisper.

## From Network Strength to Effective Network Strength

Last updated: July 24, 2016

You may say I'm a significant voxel, but I'm not the only one. I hope someday you'll join us. And the brain network will live as one.

- Adapted from John Lennon's Imagine

1. Network strength

Rosenberg et al (2016) introduced a neuromarker based on intrinsic whole-brain functional connectivity, the degree to which brain activity in distinct neural regions is correlated over time, and showed that functional brain networks strength during a sustained attention task predicted individual differences in performance.

To characterize participant’s degree of connectivity, Rosenberg et al (2016) defined network strength, a single summary statistic, which is equivalent to a weighted degree measure for each network. Formally, let us define $S_k$, as the network strength for subject $k$. Then

$S_k = \sum_{i,j} \rho_{ij}^{(k)} I_{ijk},$

where $\rho_{ij}^{(k)}$ is the $\{i,j\}^{\text{th}}$ entry of the connectivity matrix (here the connectivity matrix is computed by taking the node-wise correlation between node-specific time series); and $I_{ijk} = 1$ if $\rho_{ij}^{(k)}$ follows a specific thresholding rule (e.g. $I_{ijk} = 1$ if $\rho_{ij}^{(k)} \geq 0.8$), and 0 otherwise.

2. Effective network strength

However, a hyperactive subject’s brain could be extremely active during both the resting state and the task state; hence the large correlaton coefficients between nodes that are extremely active during both resting and task states could overshadow the functional changes of other edges. For example, if for a subject there are 100 nodes that are highly correlated during both resting and task states with $\rho = 0.99$ each (or without significant changes, say from $\rho = 0.98$ to $\rho = 0.99$), the change of other edges from 0.001 to 0.1 (despite statistically significant and plausiblly neurobiologically meaningful) would likely to be ignored if we are interested in the node-wise sum (as in Rosenberg et al (2016)). Therefore, we propose the definition of effective network strength as follows:

$S_k^{eff} = E(S_k \vert X_k = 1) - E (S_k \vert X_k=0),$

where $X_k = 1$ refer to the task state, and $S_k = 0$ refer to the resting state.

Effective network strength describes the average change in network strength due to a change in experimental status. It is experimental dependent; and for numerical tasks, it could be written as $X_k = x$, for some task level $x$.

As you may have realised, the idea of effective network strength was inspired by effective connectivity (Friston (2011) and Aertsen and Preißl (1991)). Nonetheless, effective connectivity regards the effect one neuron has on another and is time dependet, whereas effective network strength is time-independent as when taking the connectivity matrix, the time is averaged out.

3. Network strength with multiple inputs

What if we have more than one tasks? And what if are interested in effective network strength between male and female, or between different age groups? To address these interesting questions, we define the conditional network strength on different inputs.

$S (\boldsymbol{x}) = S (X_1 = x_1, X_2 = x_2, \cdots, X_n = x_n)$

where $X_i$’s define tasks, and $x_i$ is its realization.

For example, $S(\text{treatment} = 1, \text{gender} = 1)$ defines network strength under a task for male (gender = 1); and $\{S_{gender}^{eff} \vert \text{treatment} = 1 \} = E(S \vert \text{gender} = 1, \text{treatment} = 1) - E (S \vert \text{gender} = 0, \text{treatment} = 1),$ defines the gender effect on effective network strength conditioning on treatment.

4. Effective network strength rate

Once we have defined the effective network strength, it is natural to define the effective network strength rate: the increase of network strength per unit treatment increase.

$\nabla = \frac{\Delta S}{\Delta \boldsymbol{x}} = \frac{S(\boldsymbol{x} + \Delta \boldsymbol{x}) - S (\boldsymbol{x})}{\Delta \boldsymbol{x}}$

Certainly, when the task is multidimensional $\boldsymbol{X} = \boldsymbol{x} = (x_1, \cdots, x_n)$, the effective network strength rate could be specified at every task direction, namely $\nabla = \bigg( \frac{\partial}{\partial x_1}, \cdots, \frac{\partial}{\partial x_n} \bigg)$.

## On Intuition: A Bayesian View

Last updated: July 22, 2016

When to the sessions of large data thought,
I summon up remembrance of things past,
I sigh the lack of many priors I sought,
And with old prior new experiment my dear posterior's based.

- Adapted from a part of Shakespeare's Sonnet XXX

To
THOMAS BAYES

WE propose that Professors Mikhail Filippov, Varun Prasad and Semir Zeki’s definition of intuition could be mathematically formulated in a Bayesian framework.

An intuition is an unconscious logical brain process with an outcome or conclusion in the form of a statement or proposition. But whether the outcome of the intuitive process is “right” or “wrong”, or “correct” or “incorrect”, can only be determined by a conscious logical process.

First, for simplicity, define an event as $\theta$. Here, $\theta$ is a random variable, a random quantity whose value is subject to variations due to chance. For example, looking into the mirror, you think your height ($\theta$) is 72 inches. However, your visual judegement may not always be correct, or that the mirror may not be perfect. Hence, we could put a probability measure $Pr$ on $\theta$. If you strongly believe in your judegement, then the probability density $Pr(\theta)$ is rather spiky; for example, it covers the height from 71-73. On the other hand, we may have less confidence when a visually-impaired person claims that he finds himself 72 inches tall. In that case, the probability density is rather flat; for example, it covers the height from 61-83. We call $Pr(\theta)$ the prior. Another wonderful example of $\theta$ is the location of where a tennis ball lands in a match ( Körding and Wolpert, 2003).

Next, we define the data observed from experiments as $D$. For example, the measured height. Measured heights have errors, too. Furthermore, the measurement error for a child ($\theta = 50$) is in general smaller than it for a basketball player (e.g. Dwight Howard at $\theta = 83$). Therefore, at (conditioning on) every $\theta$, we define the likelihood measure as $Pr (D \vert \theta)$.

Finally, we define the intuition of an event given experimental data ($D$ ) as $\mathcal{I}(\theta \vert D)$.

By Bayes rule, we have:

$\mathcal{I}(\theta \vert D) = \frac{ Pr(D \vert \theta) Pr (\theta) } {\int_{\Theta} Pr(D \vert \theta) Pr(\theta) }$

It follows, $\mathcal{I}(\theta \vert D) \propto Pr(D \vert \theta) Pr (\theta)$, where $\propto$ stands for proportional to . Let us use the height example to explain this: $\mathcal{I}(\theta = 72 \vert D = d) \propto Pr(D = d \vert \theta = 72) Pr (\theta = 72)$, which indicates that, our intuition about one’s height being 72 inches, given observed data ($D = d$) depends on how well the measurement is, and how strongly our prior information of the height being 72 is. If our previous experience leads us to strongly believe the height is 72 (i.e. $Pr (\theta = 72)$ is large), our intuition that the height is 72 would be strong.

The above equation translates as follow: intuition is based upon the data available and the prior experience information. This conclusion echoes with the statement that Professors Mikhail Filippov, Varun Prasad and Semir Zeki made:

We believe, however, that to have an intuition in any area, one must have experience of that area or knowledge of it, to provide a conclusion or statement, whether correct or incorrect.

Certainly, intuition could change overtime, because our view (prior information) of the world changes. Hence, we could update our Bayesian intuition to one that is “dynamic”, namely:

$\mathcal{I}(\theta (t) \vert D) = \frac{ Pr(D \vert \theta (t) ) Pr (\theta (t)) } {\int_{\Theta } Pr(D \vert \theta (t)) Pr(\theta (t)) }$

This may be studied with Dynamic Bayesian Network (DBN) by Paul Dagum and Recursive Bayesian estimation (Bayes filter). I have little knowledge of this, interested readers could refer to corresponding papers.

Other Interpretations

Biologically, intuition could be explained by early brain signals preceding decisions (see Benjamin Libet, Haggard and Eimer, Soon et all, and Schultze-Kraft).

The thought-provoking post on intuition, and other posts regarding the implication of mathematical beauty on asynchronous brain operations, could be found on Professor Semir Zeki’s blog.

Professor Zeki and I would like to encourage discussion and comments. One question I would like to hear our readers’ opinion is: in the above post, I interpret intuition as the posterior that is based upon data and prior information (the latter of which is based upon previous training or experience). However, one could argue that intuition is indeed the prior; and with data, we update and form our belief (the posterior).

Further interested readers on these topics could refer to Professor Semir Zeki’s books: A Vision of the Brain, Inner Vision: an exploration of art and the brain, Splendors and Miseries of the Brain, and Balthus ou la quête de l’essentiel, and La bella e la bestia: arte e neuroscienze.

## On Brain Fingerprint: Extensions

Last updated: July 20, 2016

￼The unique fingerprint of every individual defines our unique purpose and mission on earth.

In this post, we shall discuss the wonderful brain fingerprinting paper published on Nature Neuroscience by Finn et al, and present a few potential extensions.

1. Recap

First, let us briefly describe the dataset and the fingerprinting procedures in Finn et al (2015). The study has 126 subjects, each subject has 268 nodes, each subject is measured at $T$ time points (rigorously we should write $T_i, \forall 1 \leq i \leq 6$ because the time points vary between sessions), and there are six image sessions: two resting sessions (rest 1 [on day 1], rest 2 [on day 2]) and four task sessions: working memory, emotion, motor and language (two on each day). Therefore, the dataset is

$126 \times \underbrace{268 \times T}_{\text{compute} \hspace{2mm} 268 \times 268 \hspace{2mm} \text{node-wise correlation matrix}} \times 6.$

For every subject at each image session, we could convert the $268 \times T$ matrix into a $268 \times 268$ node-wise correlation matrix. Therefore, at each session, we have 126 subject-specific node-wise correlation matrix. Next, we vectorize each node-wise correlation matrix. Finally, consider subject $i$ from one image session (say Rest 1). To find a matching subject from another image session (say Rest 2), we take $\bold{v}_i ^{(1)}$, the vectorized node-wise correlation matrix for subject $i$, where the superscript indicates it is from the first image session, to each of vector in the Rest 2. The subject identified is the one who achieves

$\rho_{i j*}^{(1,2)} = \arg\max_{1^* \leq k^* \leq 126^*} \rho_{i k^*}^{(1,2)}.$

where $\rho_{i j*}^{(1,2)} = corr(\bold{v}_i^{(1)}, \bold{v}_{j^*}^{(2)}).$

Notice, the $*$ indicates that subject $15^*$ in Rest 2 may differ from subject 15 in Rest 1.

2. Extensions

2. 1. Brain topology v.s. brain function on identication accuracy

The paper claims that identification power came from true differences in functional connectivity rather than anatomic idiosyncrasies. To support the claim, the authors used kernel smoothing on connectivity matrices, and showed that, with larger smoothing kernels (under which the registration advantages for the same brain compared to different brains should be vastly reduced or eliminated), only a slight drop in identification power. One may argue that smoothing does not eliminate differences caused by topology fully. Let us take an extreme example: consider two subjects (S1 and S2) with significant anatomical difference in 100 adjacent nodes, where the brain measurements of S1 for these nodes during two sessions are nonzeros, and those of S2 for these nodes during two sessions are all zeros. In this case, for those 100 nodes, a large smoothing kernel (unless an infinite bandwidth is chosen), will give S1 small but nonzero smoothed brain measurements whereas S2 has all zeros. Therefore, S1 in one image session would still match S1 in another session, and so is S2 - this could potentially argue against the functional connectivity claim.

To better eliminate anatomic idiosyncrasies, we need brains that are particularly similar anatomically. One could start by looking at the HCP twin data, concerning that the brain topology between identical twins is almost the same. Using the HCP twin and non-twin data, we could then ask three questions: is brain fingerprinting (1) due to brain functions alone, (2) due to brain topology, or (3) due to a combination of (1) and (2). Let us call the identication accuracy rate for twins as $T$ and it for non-twins as $NT$, both of which are in percentage. If $T \sim NT$, then we conclude that brain fingerprinting is not due to topoloty; if $0 << T < NT$ (i.e. we still achieve some accuracy using twin data, but not as good as if we use non-twin data) then we conclude that brain fingerprinting is due to a combination of brain function and topology; and if $T \rightarrow 0, NT >> 0$ (i.e. we almost cannot fingerprint twins), then we conclude that brain fingerprinting is due merely to brain topology. It would be interesting to see how much identification accuracy is due to functional connectivity and how much due to anatomic idiosyncrasies, via proposing a Functional/Anatomic Ratio Score, by studying the twin and non-twin data.

2. 2. “Dynamic” brain fingerprinting

The study showed that “longer time courses better preserved individual characteristics in connectivity profiles”. However, it would be interesting in examining if the identification accuracy rate changes across the time of the day. This question is driven by studies we did on physical activity data analysis, where our daily physical activity on average follows a “M” shape. It will be tremendously interesting if the identification accuracy rate across time somewhat matches the physical activity pattern (especially during task sessions) - which would indicate that stronger physical activity could possibly facilitate brain fingerprinting. See below.

Here, we present the brief procedure. First, divide the total time points in a time $T$ into $N$ periods, each with $t$ time points, i.e. $T = N \times t$. For every $t$ time points, we repeat the fingerprinting procedure discussed in the paper, and obtain $N$ identification accuracy rates, and call them $R_1, \cdots, R_N$. Finally, we plot $R_1, \cdots, R_N$ against $N$ time points of the day.

2. 3. Age, gender, etc. effects on identification accuracy rate
It would be interesting to investigate the age effect on identification accuracy rate. If there is a obvious difference identification accuracy rate between age groups, one may argue that this is due to functional difference related to age. Potentially, there could be six types of age effect trends (see below). Based on the trend, the age effect on identification accuracy rate would gain us neurobiological understanding of the brain functional similarity due to age: for example, the top left image below would indicate that the brain functions are the most stable (similar between two sessions) during one’s adulthood while not so much during childhood (possibly in that kids’ brain signals are super active one day and less so the second day or their state of mind has more variations - possibly not yet functionally matured), nor during ederly population (possibly due to lesions); the image to its right would indicate that age seems to not affect measured brain functions significantly.

Similarly, one may look at the diffence of identification accuracy rate between male and female.

2. 4. Neurodegenerative Disease Similarities due to Brain Functions
One may be interested in seeing association between differentn neurodegenerative diseases due to functional similarities. For example, one may ask, amongst patients who develope Parkinson’s (PD) , Alzheimer’s (AD), and Huntington’s (HD) diseases, are PD patients brain functions more similar to AD’s, than to HD’s?

Let us take an example to describe the procedure. Consider two task-based imaging sessions: working memory and motor (we choose these two because AD and PD patients’ brain functions may be correlated with respect to these two tasks). Now for subjects under the working memory task group, we could group them according their disease status. For those in the motor group, we could randomize the order of subjects (because our goal is to see how well we match patients in different disease groups). Next, we vectorize each correlation matrix; and calculate correlations between these vectors in two task groups. The results are stored in a disease-specific (dis)similarity matrix due to brain functions.

Certaily, the same subjects would likely be accurately matched; nonetheless, it would be interesting to see if the subjects in the same disease group are matched. The grouping is present in a disease-specific (dis)similarity matrix.

2. 5. Brain-Genome Fingerprinting Whilst genomic information may assist identify subjects alone (I have little knowledge about whether there are exisiting works on this), to match subjects using brain functions along with genomic information during two sessions gains us little because genes for the same subject do not vary much during these sessions; adding brain measurements into gene information would even likely to reduce identification accuracy rate. Nevertheless, it would be interesting to see if we could match one’s brain functions with gene sequence, and vice versa. However, one may argue, if the matching is accurate, we are actually matching gene sequence with brain structure, rather than with brain function. I am not able to adress this, even with the twin data in (2.1), because identical twins are likely to have similar brain structures, and have the same DNA (assuming SNPs are ignorable).

2. 6. Fingerprinting Defining Matrix

Consider $Y_1$ and $Y_2$, each of which is $n \times v$, where $n$ is the number of subject and $v$ is the number of vectorized nodes, as data from two sessions. In particular, for example, the first row of $Y_1$ is the node-wise correlation matrix for subject 1 in session 1 vectorized.

We then define a $n \times n$ (where $n$ is the number of subjects) Fingerprinting Defining Matrix, where if (consider row-wise) the $(i,j)^{\text{th}}$ entry is the largest among all entries in row $j$, we say subject $i$ in session 1 matches subject $j$ in session 2.

The Fingerprinting Defining Matrix $M$ of dimension $n \times n$ satisfies:

$M = \arg\min_{ X } \{ \parallel X \parallel _{\mathcal{F}}: Y_1^T X Y_2 = V_1^2 V_2^2 \},$

where $V_1$ and $V_2$ are scaling matrices of $Y_1$ and $Y_2$, respectively.

In other words, there are a lot of matching matrices that allows $Y_1^T X Y_2 = V_1^2 V_2^2$ (i.e. obtain the “product variance” of data from both sessions); but $M$ is the matching matrix with the smallest matrix norm to achieve so.

Indeed, the correlation matrix $P = Y_1^T Y_2$ achieves that $Y_1^T P Y_2 = V_1^2 V_2^2$ because $Y_1^T P Y_2 = D_1^T V_1 S_1^T (S_1 V_1 D_1 D_2^T V_2 S_2^T) S_2 V_2 D_2 = V_1^2 V_2^2$. However, I do not have a proof that the correlation matrix is the matching matrix that has the smallest matrix norm to achieve $Y_1^T X Y_2 = V_1^2 V_2^2$. It would be interesting if someone could prove this (or disprove this). If the former, then we will have a mathematical interpretation for the correlation matrix to be a fingerprinting matching matrix that matches the data from two sessions to abtain the “product variance” from the two data sets.

2. 7. Connecting Brain Fingerprinting with Encoding models, RSA, and Decoding Models

The connectivity matrix approach used in Finn et al (2015) is in essence identical to it used in RSA (Kriegeskorte et al, 2008). A few extensions could be made linking these approaches (I will write another post on this on a later date). Here, I would like to discuss the behavioral prediction discussed in the paper and potential extensions.

The paper uses feature network strength to predict gFscore (fluid intelligence) score. For subject $k$

$gF_k (X_k) = \beta_0 + \beta_1 X_k + \epsilon_k,$

where $X_k = \sum_{i,j} \stackrel{(k)}{\rho^{(+)}_{ij}}$ is the positive feature network strength for subject $k$, which sums up all significant (by thresholding) postive correlation $\rho_{ij}$’s. The model for negative feature network strength could be Similarly defined.

The prediction is well done. However, since the correlation between predicted and observed gF scores was $r = 0.50 (p< 10^{-9})$ for the positive-feature model and $r = 0.26 (P = 0.005)$, it would be interesting to see if there are things we could improve upon.

First, we could check if a subregion of connectivity matrix (e.g. only take the part of the connectivity matrix whose nodes correspond to Broca’s and Wernicke’s areas) under specific tasks (e.g. language) would predict gF better. Second, since the authors used simple linear regression to train the data, we could investigate if there is a model that better fits the training data (and hence have a better leave-one-out prediction). For example, a non-linear model

$gF_k(X_k) = f(X_k) \epsilon_k,$

where $f$ is some non-linear mapping. Or a semiparametric model

$gF_k(X_k) = f(X_k, \theta, \gamma) + \epsilon_k,$

where $\theta$ is some finite-dimensional parameter of neurobiological interest, and $\gamma$ is a nuisance parameter (such as the BOLD signal variance in each node).

3. Conclusion

In summary, Finn et al (2015) is a tremendously interesting paper that demonstrated existing subject-specifi functions in the brain. I eagerly look forward to more exciting follow-up works from the group.

## On Task-based Age and Gender Effects on Spatial Brain Representations: with Applications to the Human Connectome Project

Last updated: July 16, 2016

￼Biology gives you a brain. Life turns it into a mind.

– Jeffrey Eugenides

I had the idea of this post after an inspirational conversation with Professor Ruben Gur, who, together with his wife, Professor Raquel Gur, are world-renowned experts in neuropsychology, and have done a tremendous amount of work in gender and age differences in the brain.

Here, I would like to propose a functional mixed effect model to investigate how gender and age effects on encoded (representational) spatial brain signals under (i.e. conditioning on) different tasks. This framework can be implemented to study the 500-subject datasets from the Human Connectome Project, with a goal to understand age and gender effects on the following main categories: alertness, cognition, emotion, motor function, personality, sensory, substance use, etc. Each main category could be further divided into subcategories: e.g. one may be interested in studying age and gender effects on emotion, in particular, psychological well-being.

We consider a marginal model for the brain activity data. Specifically, for subject $i$ at visit $j$, consider:

$y_{ij}(\nu_k) \vert x_{ij}, z_i = \alpha(\nu_k, x_{ij}) + z_i \beta (\nu_k, x_{ij}) + \epsilon_{ij} (\nu_k),$

where $\nu_k \in \mathcal{V}$ indicates the $k^{\text{th}}$ voxel (for simplicty, we could vecterize voxels in a 3-D brain to a long vector), $y_{ij}(\nu_k)$ is the voxle-specific intensity, $x_{ij}$ is the age, $z_i$ is the gender, and the error follows a working model

$\epsilon_{ij} (\nu_k) = b_i (\nu_k) + w_{ij} (\nu_k) + \epsilon_{ijk},$

where $b_i (\nu_k)$ models the subject-specific mean function of brain activity, $w_{ij} (\nu_k)$ models the deviation of the $j^{\text{th}}$ visit from the subject-specific mean function, and $\epsilon_{ijk} \sim N(0, \sigma_\epsilon^2).$ Furthermore, we model $b_i (\nu)$ and $w_{ij} (\nu)$ by two zero mean Gaussian processes with spatial covariance functions $\Sigma^b(\nu,v)$ and $\Sigma^w(\nu,v)$, respectively. Finally, we assume that $b_i (\nu)$, $w_{ij} (\nu)$, and $\epsilon_{ijk}$, are mutually independent.

To interpret the model, here $\alpha(\nu, x)$ is the brain activity mean surface, specifically, $\alpha(\nu_k, x_{ij})$ gives the mean brain activity for voxel $\nu_k$ and age $x_{ij}$; $\beta(\nu, x)$ is the brain activity slope surface, specifically, $\beta(\nu_k, x_{ij})$ indicates the brain activity difference due to gender for voxel $\nu_k$ and age $x_{ij}$.

It would be relatively straightforward to implement computational wise if one’s research focus is on ROIs (in other words, the number of voxels is reasonably small). When the number of voxels is large, we may run into computational problems, because we are essentially dealing with a huge surface of $N \times p$, where $N$ is the number of ages and $p$ is the number of voxels. We may then consider dimension reduction or assume that the brain network is sparse. We shall not expatiate on this here; interested readers could consider approaches such as SVD, fPCA, ridge regression, etc.

Readers who are interested in the estimating procedures of the above model could discuss with me via email. If one is interested in learning more about functional data analysis, please refer to the following wonderful resources: Ramsay (2006) proivdes a broad overview of functional data analysis methods with applications to curve and image analysis; and Ruppert et al. (2003) overviews functional data analysis in the semiparametric framwork in detail.

This framework can be implemented to study age, gender, and age-gender effects on sensory data (audition, olfaction, pain, taste, vision), on substance use (alcohol, tobacco, and drug), psychiatric and life function, motor (endurance, locomotion, dexterity, and strength), emotion (emotion recognition, negative affect, psychological well-being, social relationships, stress and self efficacy), cognition (episodic memory, executive function/cognitive flexibility, executive function/inhibition, fluid intelligence, language/reading decoding, language/vocabulary comprehension, processing speed, self-regulation/impulsivity, spatial orientation, sustained attention, verbal episodic memory, working Memory, and alertness (cognitive status, and sleep). Each area could be developed into a journal article using the above framework.

## On Generating Artificial Functional Magnetic Resonance Imaging (fMRI) Data (II): Dynamic 3D Visualization

Last updated: June 7, 2016

￼The human brain starts working the moment you are born and never stops until you stand up to speak in public.

– George Jessel

Once we understand the data generation mechanism of fMRI data, and have become familiar with the parameters needed, we can proceed to simulate desired fMRI data. There are many wonderful articles providing examples and codes for simulating fMRI data (see, for example, Welvaert et al). For a list of software for medical image analysis, see here.

Here I shall focus on visualization of simulated fMRI data, so as to examine the quality of the realizatio of the simulation, such as, if we are to simulate fMRI data for a regional brain lesion, do the simulated fMRI datasets accord with real world scenarios? Additionaly, I am interested in seeing the intensity of signal changes across time. This has real world implications. For example, visualizing the brain dynamic functional connectivity; visualizing the growth of tumors; and etc. Let us consider a concrete example. Consider a patient with two brain regions associated with abnormal activities (e.g. the patient has two tumors); and we wish to simulate fMRI data in this scenario. Below, on the left we present dynamic 2-D intensity of simulated data for a slice (a cross-section) of the patient’s brain during 100 scans (with TR = 2); and on the right we present the dynamic 3-D image of threshold activities for the first 20 scans. Notice the two regions with intensive signals.

Please send me an email if you want the code for generating above dynamic 2D/3D plots.

For generating static 3D plot, there is a terrific article written by Muschelli et al.

For generating static 2D plot, there is a terrific article written by Tabelow et al. The data are here. Here is some further reading.

## On Generating Artificial Functional Magnetic Resonance Imaging (fMRI) Data (I): Generating Mechanism

Last updated: June 5, 2016

Understanding how the brain works is arguably one of the greatest scientific challenges of our time.

– Alivisatos et al

A useful approach to develop one’s skills in investigating human brain data is to understand incisively the data generation mechanism. At the low level, one may look at the biophysical basis of action potential, spike-timing dependent plasticity, neuronal dynamics, etc. I have little knowledge in these areas; but interested readers can refer to Bogacz, Lengyl, Pfister, amongst other great scientists working in these fields. Here, I aim to discuss the data generating mechanism at the voxel level, mainly, fMRI data. I have learned a tremendous amount from two wonderful papers: (1) Eloya et al; and (2) Welvaert et al. The post serves as a self-learning guide, as I selectively walk through a few key steps. Here is a useful reference of MRI glossary. I shall write another post in the near future discussing the importance of knowing the exact data generating mechanism on making “causal” claims, etc.

#### Part 1: Model Basics

First, fMRI data can be decomposed into two parts: (a) signals (experimental design caused activation, or resting state activation); and (b) noises. Namely,

$s(t) = h(t) + \epsilon(t),$

where $s(t), h(t),$ and $\epsilon(t)$ are signal response, activation function, and noise, respectively, at time $t$.

#### 1.1 Signals

The activation function $h(t)$ is convolved from $h(t) = f(t)*g(t) = \int_{-\infty}^{\infty} f(\tau) g (t-\tau) d\tau$, where we can treat $f(t)$ and $g(t)$ as a stimulas function (dependent upon experiemental design) and a “weight” function (a haemodynamic response function (HRF)), respectively. Then $h(t)$ is, intuitively, a weighted (by $g(t)$) average of $f(t)$ over the real line. For example:

There are a few choices of HRFs ($g(t)$), e.g., Gamma, double-Gamma, and the balloon model. Here, we take the Gamma function as an example. Consider the gamma-variate HRF function:

$g(t) = \frac{1}{k \tau_h (k-1)} \left( \frac{t}{\tau_h} \right)^k \exp{\{- \frac{t}{\tau_h}\}}$

with $k=3$ and $\tau_h = 0.242$;

Consider a block function $f(t)$ (corresponding to a block experimental design; and in the following figure, the black lined blocks), then the convolved activation function $h(t)$ using gamma-variate HRF function is shown below in dashed blue:

#### 1.2 Noises

There are four types of noises: (1) white noise, (2) colored noise, (3) temporal noise, and (4) spatial noise.

##### 1.2.1 White Noises
1. a. It is a system noise that is part of the fMRI data.
2. b. It could be Rician distributed (for fMRI data with low signal-to-noise ratio, SNR); or Gaussian distributed (higher SNR, > 10).
##### 1.2.2 Colored Noises
1. a. It depends upon the signal, the timing or the location
2. b. For signal-dependent colored noises, it could be (a) low-frequency drift (due to slow fuctuations in the scanner hardware; the drift is modeled as a basis of discrete cosine functions); (b) physiological noise (models the variability in the signal that is caused by cardiac and respiratory artefacts; the noise is modeled as sine and cosine functions); and (c) task-related noise (accounts for spontaneous neural activity due to the experimental task; modeled by adding random noise where and when activation is present; and it can be interpreted as residual noise from head motion that is not removed in the pre-processing stage).
##### 1.2.3 Temporal Noises
1. a. Due to the fact that fMRI data are repeated measurements;
2. b. It could be modeled as an $AR(p)$ model: $\epsilon(t) = \sum_{i=1}^p \rho_i \epsilon_{t-i} + \eta_i,$ where $\eta_i \sim N (0, \sigma^2)$.
##### 1.2.3 Spatial Noises
1. a. It models the spatial dependencies in fMRI data: adjacent voxels are more correlated than those that are far apart;
2. b . It could be modeled as (1) an AR correlation structure, (2) a Gaussian random field, and (3) a Gamma random field.

As an illustration, below on the left, we present a figure with task-based noise; and on the right, a 3-D view of a slide (20 by 20) of Gaussian random field spatial noises.

#### Part 2: Parameter Specification

Now that we understand the general mechanism of fMRI data generation, we need additional information such as when does the activation start for each experimental design, where does the activation occur in the brain, etc., in order to simulate real-life-like fMRI data. The additional information consists a few parameters:

• onset time;
• duration;
• number of scans;
• TR;
• hrf;
• effect size;
• SNR;
• noise type: low-frequency, physiological, task-related, mixture;
• spatial noise: AR correlation, Gaussian random field, Gamma random field;
• weights of each type of noise;
• noise type: gaussian, rician;
• the number of activated regions;
• the coordinates of the regions;
• the radius of the region;
• the shape of the region (cube and sphere); and
• other parameters

• Other useful packages includes:

• Package: oro.nifti
• write and read AFNI files;

## Get Started with SPM in Ten Steps

Last updated: July 18, 2016

￼My dull brain was wrought. With things forgotten.

– William Shakespeare

Many a friend of mine ask me about how to get started with Statistical Parametric Mapping (SPM) quickly. Being an R user primarily, I find myself incompetent to answer their question rigorously. However, as one of the most fundemental software in analyzing functional neuroimaging data, knowing how to use it - even moderately - has gained myself huge advantage in my research. Therefore, I present this post as a brief introductory guidance to readers who are new to SPM. Advanced readers may refer to SPM manual. Readers interested in knowning more about fMRI data analysis may also find the book Principles of fMRI and corresponding vedio lectures by Professors Lindquist and Wager, and Handbook of Functional MRI Data Analysis by Professor Poldrack et al, helpful.

Certainly, my experience with SPM is very limited; and likely I may present errors or omit important features of SPM in ten steps. I will update the following procedures regularly; if you have any suggestions, or criticisms, please send me an email to help me improve them.

Step 3: Open SPM in Matlab by entering spm in the command window. Based upon the data set you have (PET, EEG, or fMRI), click the corresponding icon (see Figure 1). Here, we choose fMRI, for example;

Step 4 (Display an image): First, download desired datasets, which are usually stored in  NIfTi (Neuroimaging Informatics Technology Initiative) files, saved in one *.nii file or a *.hdr and an *.img file. Functional images are saved as fM*.image; and structual images are saved as sM*.image. To display, click Display > file_location > *.nii > Done . We see the display of a structual image (see Figure 2).
Step 5 (Preprocessing): Preprocessing includes: motion correction; slice-time correction; coregistration; segmentation; normalization; and smoothing. These icons are in the top of the SPM menu (See the red frame in Figure 3). Let us take motion correction for example:  Realign > Realign (Est & Reg) > Double click Data <- X > Double click Session <- X   > in the pop-up window choose desired file to be motion corrected > Done > click the green bottom  (See Figure 4). If inputing files are sM*.image (similar for functional images), resulting files are located in the original filepath as meansM*.image and rsM*.image. Similarily, for slice-timing, the input image is rsM*.img and the output image is arsM*.img; for coregistration, the input images are reference image meansM*.img and source image sM*.img, while the output image is updated “source image”; for segmentation, input image is sM*.img and the output images are c1sMM*.img, c2sMM*.img, and sM*_seg_sn.img; for normalization, the input images are sM*.img and arsM*.img, and the output image is warsM*.img; and finally, for smoothing, the input images are warsM*.img and swarsM*.img.

## Sir R. A. Fisher on Brain Science: a Neo-fictional Interpretation

Last updated: June 1, 2016

Statistics may be regarded (i.) as the study of populations, (ii.) as the study of variation, (iii.) as the study of methods of the reduction of data.

– Sir R. A. Fisher, Statistical Methods for Research Workers

June 1, 1919.

My name is Ronald Aylmer Fisher. I am a researcher at Rothamsted Experimental Station (the director here, Dr. E. John Russell, calls it Rothamsted Research for short) in Harpenden, a small town in the St Albans City district in the county of Hertfordshire, England. Here at Rothamsted Research, I analyze data obtained from crop experiments during the day. At night, I am working on a few manuscripts on experimental design and gathering several statistical methods to analyse variance in field studies. Two weeks ago, I paid a trip back to Cambridge and had a long discussion with my old boys John Keynes, Reginald Punnett, and Horace Darwin, and have become very interested in studying data collected from the human brain. I would like to present a few thoughts sprung from our conversation.

I claim that the usefulness of statistics in neuroscience can be divided into three areas: the study of the brain in populations, (ii.) the study of variation of brain measurements in populations, and (iii.) the study of methods of the reduction of large brain data. I shall further expatiate these three areas by including statistical approaches with regards to data science development in neuroscience.

First, the study of brain data is to gain insights to understanding how the brain perceives, processes, stores, and output information, in populations, or aggregates of individuals, rather than of individuals. The term population in brain science refers not only to an aggregate of brain activity measurements from multiple subjects, but also to an aggregate of a single brain measurement repeated multiple times for one subject. The former indicates our recognition of variations of brain activities amongst different individuals, whereas the latter represents our appreciation that the object of studying single subject brain activities is not to attempt to achieve an individual result, but rather, we make our best effort to ensure our findings representative. There are significant merits in studying data containing measurements of multiple subjects and those containing multiple measurements of single subjects. One of the end goals of brain science is to make scientific progress on diagnostics, treatments, cures and management of brain disorders. In order to raise the findings we have about the brain to the rank of science, we shall make statistical arguments about properties of the brain in a large aggregates of individuals. In order to produce treatments that target at a particular individual, we shall make statistical arguments about properties of the brain for that individual, based upon a large aggregates of measurements of his/her brain. Understanding how the brain works across subjects allows us to apply these principles at the individual level, and to advance applications that achieve artificial intelligence by mimicing the way an average brain performs, such as neural networks computers. Understanding how the brain works at the individual level would assist us in understanding how a specific brain and its activities deviate from the average. It hence leads to scientific progress such as an introduction of personalized medical plans, and a usage of brain signals to identify a subject (See Constable et al and Wachinger et al, among others). With an advancement of data acquisition technology and the popularization of high-performance computers, we are obtaining brain data in an unprecedentedly high-resolution, rapid, and accurate manner. Yet, there are strides to make. We shall advance our understanding of how the brain works in different types of populations: infants V.S. adults, females V.S. males, etc., how the brain signals change across time, and how brain signals change according to different (visual, auditory, sensory, etc.) inputs. Furthermore, we shall reduce the errors caused by measurement and data processing, via improving and developing proper statistical and computing techniques. Additionally, we shall aim to increase the sensitivity of our study. It allows pharmaceutical companies to develop affordable medicine that would treat specific brain disorders for the marjority of patients.

Second, the brain is an extremely complicated organ stored in a blackbox. Despite the advance in brain science, little do we know about how information is processed in the box. For example, does the brain process information linearly, or more plausibly, non-linearly (but in which form)?; (b) there is a tremendous amount of variations amongst different brains in terms of sizes, volumes, shapes, etc; and (c) there is much variation in measuring brain signals. Whilist the first challenge is extensively tackled by physicists and computer scientists via the studying of dynamic systems, spiking neural networks, and other neural networks (e.g. recurrent neural networks; Boltzmann neural network; deep neural networks; adaptive neural networks; radial basis networks), statisticians working on neuroscience are actively seeking to solve the latter two. Once we have identified our goal in studying the brain in populations, it is a natural follow-up to study variations because the brains in populations display vairation in one of more aspects. We, nevertheless, are not interested in variation of the brain per se; rather we recognize that variation is an inevitably troublesome by-product delineating circumstances where repeated measurements of the brain deviate from the average. Therefore, while describing the absolute properties of the brain (via parameters, e.g. mean activation intensity of a region of the brain), we encompass them with variances to address their uncertainty (and confidence). The introduction of variances leads to two further areas of statistical studies in brain science: the study of frequency distributions, and the study of correlations. The frequency distribution may be expressed as a mathematical function of the variate (e.g. voxel-specific t-statistic), either (i.) the proportion of the population (regions, voxels, neurons, etc.) for which the variate is less than a given value, or (ii.) by differentiating this function, the infinitesimal proportion of the population for which the variate falls within any infinitesimal element of its range. In brain science, many frequency distributions are heavy-tailed - hence the study of them has some implication in studying certain financial models), and vice versa. On the other hand, we are not only interested in studying the variations of the parameters of interests at present, but also interested in estimating the quality and types of these variations. Especially, we are interested in examining the simultaneous variation among multiple variates. It, therefore, gives rise to the correlation analysis. For ultra-highdimensional brain data, however, a voxel-wise correlation analysis could be unmanageably troublesome. This leads to the following section.

The third usefulness of statistics in brain science is due to the practical need of reducing large bulks of data to a convenient amount that retains relevant information in the original data that our human minds (and our computer memories) are able to grasp, by means of a manageable anount of numerical values. How much data reduction, however, should we conduct? In all cases, it is possible to reduce data to a simple numerical form, or to an amount that our computers are able to efficiently handle, where, the reduced data are sufficient to shed light upon scientific questions the investigator has original in mind. In brain science, two widely-used approaches in conducting data reduction are (I) principal component analysis (PCA) and its variants and (II) assuming sparcity. The PCA method captures the marjority of the variation of the data. However, in brain science, oftentimes we have data with p ~ n or p » n, under which its estimates are inconsistent. There are a few papers on sparse PCA that have demonstrated subspace consistency. For example, Ma et al and Jung et al. The sparcity assumption is, in essense, to make p < n. It makes neurobiological sense in the following manner: the working brain consumes energy. At any given time, be it resting state or task state, only a portion of the neurons are activated so as to reserve energy. I had an amiable conversation with Professor Pien-Chien Huang, during which he mentioned that we human-beings do not dream in color (or at least have dreams less vivid and coloful). Schwitzgebel et al has an article discussing this. I conjecture (with absence of scientific evidence) that a part of the reason is the brain attempts to reserve energy while sleeping (so only the minimal amount of information is processed: e.g. the brain only recalls the outlines, orientations, movements, etc. of objects. But they are sufficient to distiguish one from another and form visual events) - statistically a natural way of conducting data reduction! Much of the exisiting and on-going projects with regards to brain decoding, such as facial recognition and dream decoding, (see decoding simple pictures: Haxby et al, decoding objects with edges and orientations: Haynes and Rees and Kamitani and Tong , decoding complex picutres: Gallant et al , decoding movies: Gallant et al , decoding intentions: Haynes et al , and decoding dreams: Kamitani et al ) are mainly conducted using one or more of the following approaches: PCA, LASSO (sparcity assumption), and Bayesian analysis.

## On Scientific Writing and Presentation

Last updated: May 1, 2016

If you would not be forgotten as soon as you are dead, either write something worth reading or do things worth writing.

- Benjamin Franklin

I have been long thinking about writing a blog that puts in order a few resources from which I have benefited immensely in my yet-immature-but-steadily-improving academic writing and presenting skills. On one hand, this project serves as a library to which I have a convenient access; on the other hand, I hope that some students like myself can benefit from reading these articles, essays, and blogs below. This is an on-going project; and the choice of resources purely subjective. Nonetheless, if you have any comment, or if you would like to recommend additional resources, please send me an email. Please check back every now and then as I update.

This aritical is recommended by Professor Andreas Buja as “an article everybody should read” on scientific reading. Here is a summary version of it.

The Elements of Style teaches concise and vigorous English writing.

Professor Jeff Leek gives good advice on effective presentation here.

As a student studying in the interdisciplinary field between statistics, computer science, and neurosicence, I have learned a great deal from reading the works of Sir R. A. Fisher’s, inter alia, the “Statistical Trilogy” : (1) Statisticcal Methods for Research Workers; (2) The Design of Experiments; and (3) Statistical Methods and Scientific Inference

In the Autobiography of Benjamin Franklin, Mr. Franklin taught us a few procedures to improve writing:

• Write short hints about each sentence (you could also outline the piece) and set it aside for awhile
• Rewrite the article in his own words
• Compare with the original, and
• Revise and improve your essay.

• Andrew Lee, MD, my former roommate, an ENT resident at the Johns Hopkins Hospital, and an excellent writer, suggests that the most effective thing to do to improve one’s writing skills is to read essays; and read them critically. “By critically”, Andrew elaborates, “I mean while reading, think why does the author choose that particular word, and what does that word do?” To highlight the importance of effective writing, I once asked Andrew to read an email of mine in response to a job application, which seemed to be on the verge of a declination at that time. He rewrote the email and got me the job. He did not change the context of the communication; but he carefully changed a few words, which improved the presentation drastically. Here is a list of resources he suggests to read:

• The Atlantic
• The Economist
• The New York Times Editorials
• Great Expectation, by Charles Dickens
• Crime and Punishment, by Fyodor Dostoevsky
• Catch 22, by Joseph Heller
• Wuthering Heights, by Emily Bronte
• Heart of Darkness, by Joseph Conrad
• Beloved, by Toni Morrison
• The Sound and the Fury, by William Faulkner