Resolving intravoxel white matter structures in the human brain using regularized regression and clustering
Journal of Big Data volume 6, Article number: 61 (2019)
Abstract
The human brain is a complex system of neural tissue that varies significantly between individuals. Although technology that directly delineates its neural pathways does not currently exist, medical imaging modalities such as diffusion magnetic resonance imaging (dMRI) can be leveraged to identify these pathways mathematically. The purpose of this work is to develop a novel method that uses machine learning techniques to determine the number and direction of nerves within a voxel from dMRI data. The method was tested on multiple synthetic datasets. It showed promising estimation accuracy and robustness for multi-nerve systems under a variety of conditions, including highly noisy data and imprecise parameter assumptions.
Introduction
The human brain is primarily composed of neural tissue, which receives and relays electrical impulses for a variety of purposes. Neuronal cells (neurons) have three primary components:
- a cell body, which contains the cell's organelles;
- the signal input structures (dendrites);
- the signal output structures (axons).

Typically, dendrites are much shorter projections than axons. Dendrites and cell bodies, collectively called grey matter, are located in the outer layers of the brain. Axons, which make up the white matter, tend to lie interior to the grey matter [1].
Because the brain exhibits functional specialization, axons traverse the brain from one functional region to another, relaying information. This information frequently requires more than one axon to relay the complete signal, so many axons are grouped together in fiber tracts known as nerves [2]. Nerves elsewhere in the body are easily distinguishable because of their length and their separation from other neural tissue. Within the brain, however, the sheer amount of neuropil that crosses and intertwines in the white matter makes it extremely difficult to isolate individual nerves and trace their paths. Adding to this complexity, the routing of neural pathways is not highly correlated from one brain to the next. It would therefore be difficult to get a clear picture of any one person's brain by analyzing another's [3]. Technology that directly delineates these neural pathways does not exist at this time, but current medical imaging modalities such as dMRI can be leveraged for this purpose.
Diffusion magnetic resonance imaging (dMRI) relies on the temporary application of magnetic field gradients in several directions. These gradients make the signal from water molecules sensitive to the molecules' diffusive motion and ultimately produce detectable signals [4]. The motion of water molecules is restricted by the tissue composition. A baseline signal is obtained according to how much water is present within a subregion of the brain, and diffusion along the applied gradient directions dampens this baseline signal. This allows dMRI to provide insight into the microscopic details of tissue architecture and enables the mapping of white matter tracts throughout the brain. An important advantage of dMRI for tracing neural pathways is that, combined with mathematical modeling, it can be done noninvasively and in vivo. Applications of white matter tracking include:
- the treatment and management of traumatic brain injuries;
- the treatment and management of neurodegenerative diseases;
- presurgical visualization of the brain [5].
Data from dMRI are collected for small, artificially divided subregions of the brain called volumetric pixels, or voxels. Voxels are cubes with a side length of 1–3 mm that form a 3-dimensional grid for picturing the brain, much as pixels do for 2-dimensional images. Mathematically, the overall problem of mapping neural pathways can be thought of as resolving the white matter structures of all voxels within the brain. Using dMRI data to understand intravoxel white matter structure is a mathematically challenging problem, and several strategies have been proposed in the literature [6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]. The difficulty is compounded by the absence of a "gold standard" for evaluating the process.
Several novel methods exist to detect neural fiber orientation using dMRI data, and there have been multiple attempts to obtain high angular resolution of white matter fibers. Although the objectives of these methods seem related, no method yet combines these efforts to resolve intravoxel white matter structures with regard to both orientation and concentration.
The proposed method attempts to resolve white matter with robust accuracy using elastic net regression and clustering techniques. Because it employs elastic net regression, the proposed method is less computationally expensive than existing methods. The elastic net regularization also performs variable selection and shrinkage within the method. These advantages offer more robust results than existing methods that strictly employ classical least-squares regression.
In this work, a novel method employing regularized regression and clustering is proposed. The method aims to determine the number of nerves and their directions within a single voxel of the brain. It modifies an existing intravoxel diffusion model and provides a basis for accurate estimation. A review of related research is provided in the "Related work" section. The formalism is presented in the "Methods" section, and the performance evaluation in the "Experiments and results" section. Finally, a discussion and concluding remarks are provided in the "Discussion" and "Conclusion and future work" sections, respectively.
Related work
Several papers propose methods to determine white matter geometry from dMRI data [6, 11, 12, 14, 18,19,20]. These methods use classical least-squares regression, which performs neither variable selection nor shrinkage. A stepwise selection process to determine the selected variables would also be computationally expensive.
Other papers employ sparse Bayesian learning to estimate white matter fibers or use collaborative super-resolution [10, 15, 16]. However, these methods become computationally expensive on large data sets. The proposed method offers a relatively robust process at a small computational cost.
Using large data sets, some methods analyze the effect of white matter architecture on the diffusivity [8, 9]. These papers highlight the importance of robustness with respect to the diffusivity parameter. The mathematical models presented in [13, 21] provide a foundation for ensuring this robustness.
By adopting the elastic net framework developed in [22], the proposed method allows variable selection for the diffusivity and the number of nerves. This selection enables the method to operate efficiently while offering promising results in resolving intravoxel white matter.
Other recent works from our research group include [23,24,25,26,27,28,29,30,31,32,33].
Methods
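The intravoxel diffusion signals have previously been modeled by the ball-and-stick model [9]. Mathematically, this model may be written as
\[ \mu_i = S_0\left[f_0\, e^{-b_i d} + \sum_{j=1}^{K} f_j \exp\!\left(-b_i d\, \mathbf r_i^{\top} g(\theta_j,\phi_j)\, A\, g(\theta_j,\phi_j)^{\top} \mathbf r_i\right)\right], \quad i = 1,\dots,n, \quad A = \operatorname{diag}(1,0,0) \tag{1} \]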
where:
- \(S_0\) is the baseline signal with no diffusion gradient;
- \(\mathbf r _i\) is the direction of the \(i\)th diffusion gradient;
- \(b_i\) is the experimentally set b-value for the \(i\)th signal;
- d is the apparent diffusivity;
- \(\mathbf f = (f_0,f_1,\dots ,f_K)\) is the vector of volume fractions, with \(f_0\) for the isotropic ("ball") compartment and \(f_j\) for the \(j\)th nerve ("stick");
- \((\theta _j,\phi _j)\) are the elevation and azimuthal angles, respectively, of the principal diffusion direction of the \(j\)th nerve;
- \(g(\cdot ,\cdot )\) is the rotation matrix determined by these two angles;
- \(\mu _i\) is the expected \(i\)th dampened diffusion signal;
- n is the number of diffusion signals obtained.
Note that the above system can be written in linear form by dividing by \(S_0\):
\[ \frac{\boldsymbol{\mu}}{S_0} = M\mathbf f \tag{2} \]
where \(\boldsymbol{\mu}\) is the vector of all expected diffusion signals and M is referred to as the dictionary matrix. M has one column for each compartment within a voxel (\(K+1\) columns) and one row for each signal (n rows). Each entry of the matrix represents the dampening effect of a particular compartment with respect to a given gradient direction. Mathematically, the entries of M are:
\[ M_{i,1} = e^{-b_i d}, \qquad M_{i,j+1} = \exp\!\left(-b_i d\, \mathbf r_i^{\top} g(\theta_j,\phi_j)\, A\, g(\theta_j,\phi_j)^{\top} \mathbf r_i\right), \quad i = 1,\dots,n,\ j = 1,\dots,K,\ A = \operatorname{diag}(1,0,0) \tag{3} \]
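The following Python/NumPy sketch shows one way to assemble M for a given d and a set of candidate directions. It makes two assumptions:
- θ is the polar angle measured from the z-axis.
- \(g(\theta,\phi)\) rotates the x-axis onto the stick direction \(v = (\sin\theta\cos\phi, \sin\theta\sin\phi, \cos\theta)\).

Under these assumptions, and for a stick that diffuses only along its principal direction, each stick entry reduces to \(e^{-b_i d (\mathbf r_i \cdot v)^2}\). The helper names are hypothetical. The authors' own analysis was done in R.

```python
import numpy as np


def stick_direction(theta_deg, phi_deg):
    """Unit vector(s) for polar angle theta and azimuth phi, both in degrees (assumed convention)."""
    t = np.radians(np.asarray(theta_deg, dtype=float))
    p = np.radians(np.asarray(phi_deg, dtype=float))
    return np.stack([np.sin(t) * np.cos(p), np.sin(t) * np.sin(p), np.cos(t)], axis=-1)


def dictionary_matrix(bvals, grads, d, theta_deg, phi_deg):
    """Hypothetical helper: n x (K+1) dictionary; column 0 is the ball, columns 1..K are sticks."""
    bvals = np.asarray(bvals, dtype=float)[:, None]           # (n, 1) b-values
    grads = np.asarray(grads, dtype=float)                    # (n, 3) gradient directions r_i
    v = stick_direction(theta_deg, phi_deg).reshape(-1, 3)    # (K, 3) candidate stick directions
    ball = np.exp(-bvals * d)                                 # isotropic dampening, (n, 1)
    sticks = np.exp(-bvals * d * (grads @ v.T) ** 2)          # exp(-b d (r . v)^2), (n, K)
    return np.hstack([ball, sticks])


# Two gradients (along z and along x) and two candidate sticks (along z and along x).
M = dictionary_matrix([3000, 3000], [[0, 0, 1], [1, 0, 0]], 0.001, [0.0, 90.0], [0.0, 0.0])
f = np.array([0.2, 0.5, 0.3])   # volume fractions (ball, z-stick, x-stick)
scaled_mu = M @ f               # the scaled expected signals mu / S0 of Eq. 2
```

In the example, the first gradient is parallel to the z-stick and perpendicular to the x-stick. The corresponding stick entries are therefore \(e^{-3}\) and 1.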
In some previous applications of the ball-and-stick model, \(S_0\) is treated as an unknown parameter. Because this parameter is directly measurable, the scaled signals on the left-hand side of Eq. 2 can be computed. It is further assumed that what is actually observed is a noisy version of these scaled signals, \(\mathbf y\):
\[ \mathbf y = M\mathbf f + \boldsymbol{\epsilon}, \qquad \epsilon_i \overset{\text{iid}}{\sim} N(0,\sigma^2),\quad i = 1,\dots,n \tag{4} \]
At this point, Eq. 4 can be reinterpreted as a linear regression problem, provided that M is known. Although M cannot be observed directly, it can be computed once d and K are determined.
Addressing d
The apparent diffusivity, d, is an unknown parameter, but it can be confined to a range of values that are plausible for the human brain. In this work, experiments are performed at multiple values of d to assess how the method performs when the assumed value is imprecise. This is referred to as a sensitivity analysis of d.
Choosing K
Without loss of generality, the direction of a nerve within a voxel can be represented by a point on the surface of a unit sphere. The parameters \(\theta\) and \(\phi\) of any nerve then lie within the intervals \((0^{\circ }, 180^{\circ })\) and \((0^{\circ }, 360^{\circ })\), respectively. Moreover, diametrically opposite points describe the same nerve, so it suffices to work on a hemisphere, which restricts \(\phi\) to the interval \((0^{\circ }, 180^{\circ })\). Because the number of nerve fibers, K, within a voxel is unknown, K is deliberately (and severely) overestimated. The candidate nerve directions are obtained by subdividing each of these intervals into roughly one hundred equal parts. The result is a grid in which each point represents a unique candidate nerve within the voxel, as shown in Fig. 1.
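The overparameterized grid of candidate directions can be generated as below. This minimal sketch assumes 100 equal subdivisions of each interval, with each candidate taken at the midpoint of its cell so that the open-interval endpoints are avoided. The exact grid used for Fig. 1 may differ.

```python
import numpy as np


def hemisphere_grid(n_theta=100, n_phi=100):
    """Candidate (theta, phi) pairs in degrees covering one hemisphere.

    theta spans (0, 180) and phi spans (0, 180), because diametrically
    opposite directions describe the same nerve.
    """
    theta = (np.arange(n_theta) + 0.5) * 180.0 / n_theta   # cell midpoints
    phi = (np.arange(n_phi) + 0.5) * 180.0 / n_phi
    T, P = np.meshgrid(theta, phi, indexing="ij")
    return T.ravel(), P.ravel()


theta_grid, phi_grid = hemisphere_grid()
K = theta_grid.size   # number of candidate nerves: a severe overestimate
```

With the 64 gradients used in the experiments, the resulting dictionary has far more columns than rows. This is why the regression step has to discard most candidates.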
Performing the regression
Now that M can be computed, regression analysis can be performed. Because the model has been overparameterized, all candidate nerves that do not contribute to the signal must be removed. This is achieved with a form of regularized linear regression called the elastic net [22]. In classical regression, the estimate of \(\mathbf f\) is obtained by minimizing the residual sum of squares:
\[ \hat{\mathbf f}_{\rm LS} = \arg\min_{\mathbf f}\ \lVert \mathbf y - M\mathbf f \rVert_2^2 \tag{5} \]
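In contrast, elastic net regularization adds a penalty term, and the following function is minimized instead:
\[ \hat{\mathbf f}_{\rm EN} = \arg\min_{\mathbf f}\ \frac{1}{2n}\lVert \mathbf y - M\mathbf f \rVert_2^2 + \lambda\left[\alpha \lVert \mathbf f\rVert_1 + \frac{1-\alpha}{2}\lVert \mathbf f\rVert_2^2\right] \tag{6} \]
where \(\lambda \ge 0\) sets the overall strength of the penalty and \(\alpha \in [0,1]\) balances the lasso (\(\ell_1\)) and ridge (\(\ell_2\)) components.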
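Eq. 6 can be minimized by cyclic coordinate descent, the strategy used by the glmnet package. The NumPy sketch below is an illustrative stand-in, not the authors' code. It treats M as the design matrix `X` and \(\mathbf f\) as the coefficient vector `b`, with the penalty scaled exactly as in Eq. 6. Two simplifications should be noted:
- The `nonnegative` option, which clips each coordinate at zero because volume fractions cannot be negative, is an assumption not stated in the text.
- A fixed number of sweeps replaces a proper convergence check.

```python
import numpy as np


def elastic_net_cd(X, y, lam, alpha, nonnegative=True, n_sweeps=500):
    """Minimize (1/2n)||y - X b||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n       # (1/n) x_j . x_j for every column
    resid = y.copy()                        # y - X b, with b = 0 initially
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (b_j's contribution added back).
            rho = X[:, j] @ resid / n + col_sq[j] * b[j]
            if nonnegative:
                z = max(rho - lam * alpha, 0.0)                          # one-sided threshold
            else:
                z = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0)      # soft threshold
            new_bj = z / (col_sq[j] + lam * (1.0 - alpha))               # ridge shrinkage
            resid -= X[:, j] * (new_bj - b[j])                           # keep residual in sync
            b[j] = new_bj
    return b


# Orthogonal toy design: each coefficient is soft-thresholded independently.
X = np.sqrt(3.0) * np.eye(3)
y = np.sqrt(3.0) * np.array([2.0, 0.5, -3.0])
b_lasso = elastic_net_cd(X, y, lam=1.0, alpha=1.0, nonnegative=False)
```

After fitting, the candidate nerves with nonzero coefficients in columns 1 to K are passed on to the clustering step. For the full dictionary, a compiled solver is preferable. scikit-learn's `ElasticNet(alpha=lam, l1_ratio=alpha, fit_intercept=False, positive=True)` minimizes the same objective, with its `alpha` playing the role of \(\lambda\) and `l1_ratio` that of \(\alpha\).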
Classical least-squares regression is not employed because it performs no variable selection or shrinkage, and a stepwise selection process would be computationally expensive. Adopting elastic net regularization allows d to be fixed from a plausible range and K to be set as a severe overestimate. This is necessary for running the proposed efficient algorithm. With the classical least-squares regression used by existing methods, such choices of d and K would not be workable.
Processing the output
In Eq. 6, setting \(\alpha\) to 1 (the lasso) imposes a stringent variable selection. This can cause the model to be under-identified, meaning that some or all nerve contributions are not accounted for in the signals. Setting \(\alpha\) to 0 (ridge regression), on the other hand, causes too many nerves to be represented. Given this sensitivity, \(\alpha\) is chosen closer to 0 to prevent underdetection of nerves. As a result, each true nerve is represented by a group of closely related candidate nerves. To reduce the number of nerves to a plausible value, clustering is therefore performed. Partitioning around medoids (PAM) is applied to a dissimilarity matrix built from an adjusted angular distance. The adjustment is needed because of the axial nature of the data: a direction and its opposite describe the same nerve.
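\[ \delta(v,w) = \min\{\omega(v,w),\ 180^{\circ} - \omega(v,w)\} \tag{7} \]
where \(\omega (v,w) = \cos ^{-1}\left( \frac{v \cdot w}{\lVert v\rVert _2 \lVert w\rVert _2}\right)\) is the angle between v and w in degrees, and \(v,w \in \mathbb {R}^k\).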
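The clustering step can be sketched as follows. First, the dissimilarity matrix is built from Eq. 7 over the directions that survive the regression; then partitioning around medoids is run on it. The `pam` routine below is a compact BUILD-plus-SWAP implementation written for illustration, and the number of clusters k is assumed to be known. In R, the analysis language named in this article, `cluster::pam` also accepts dissimilarities.

```python
import numpy as np


def adjusted_angle(v, w):
    """Eq. 7: angle between the axes of v and w in degrees, folded into [0, 90]."""
    cos = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
    omega = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return min(omega, 180.0 - omega)


def dissimilarity_matrix(dirs):
    m = len(dirs)
    D = np.zeros((m, m))
    for a in range(m):
        for b in range(a + 1, m):
            D[a, b] = D[b, a] = adjusted_angle(dirs[a], dirs[b])
    return D


def pam(D, k):
    """Partitioning around medoids on a precomputed dissimilarity matrix."""
    m = D.shape[0]
    # BUILD: start from the most central point, then greedily add medoids.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        costs = [np.inf if c in medoids else D[:, medoids + [c]].min(axis=1).sum()
                 for c in range(m)]
        medoids.append(int(np.argmin(costs)))
    # SWAP: replace a medoid by a non-medoid while the total dissimilarity drops.
    cost = D[:, medoids].min(axis=1).sum()
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in range(m):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = h
                trial_cost = D[:, trial].min(axis=1).sum()
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels


# Two groups of nearly parallel (or antiparallel) surviving directions.
dirs = np.array([[0, 0, 1], [0.05, 0, 1], [0, 0, -1],
                 [1, 0, 0], [-1, 0.05, 0], [1, 0.02, 0]], dtype=float)
D = dissimilarity_matrix(dirs)
medoids, labels = pam(D, 2)
```

The medoid directions serve as the nerve estimates. Because of the folding in Eq. 7, a direction and its antipode (rows 0 and 2 above) land in the same cluster.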
Algorithm
An overview of the steps used for nerve estimation is shown in Alg. 1, and a pictorial overview is shown in Fig. 2. If existing methods for resolving intravoxel white matter were employed, values of d and K could not be determined. Steps 1 and 2 of the algorithm require the flexibility of choosing values for these parameters, so the elastic net is vital in ensuring that the algorithm can run properly.
Experimental results
Different experiments were performed to verify the accuracy of the proposed method. All experiments were executed on a machine with 64 GB of main memory and an Intel Core i5 @ 3.70 GHz processor running Windows 10.
Experiments and results
The method’s performance was tested using multiple synthetic datasets. The datasets included 64 diffusion gradients with \(b_i = 3000\,{\rm s}/{\rm mm^2}\) for 1-, 2-, and 3-nerve systems. The model assumes additive Gaussian noise in the observations (Eq. 4), but the data were simulated using Rician noise to ensure positivity of the signals. A noisy signal is obtained as shown in Eq. 8:
\[ y_i = \sqrt{\left(\frac{\mu_i}{S_0} + \epsilon_{i,1}\right)^2 + \epsilon_{i,2}^2}, \qquad \epsilon_{i,1},\ \epsilon_{i,2} \overset{\text{iid}}{\sim} N(0,\sigma^2) \tag{8} \]
To quantify the total amount of noise in the system, the signal-to-noise ratio (SNR) is defined as \(\frac{1}{\sigma }\). Nine datasets were simulated using Eq. 1 with \(S_0 = 1\) in conjunction with Eq. 8, one for each combination of:
- the 1-, 2-, and 3-nerve systems;
- SNR = 30, 20, and 10 (low, medium, and high noise, respectively).

The true diffusivity was assumed to be \(d=0.001\,{\rm mm^2}/{\rm s}\). To test the sensitivity to the diffusivity parameter, three dictionaries were created with the diffusivity values \(d=0.0005\), \(d=0.001\), and \(d=0.002\). These correspond to 50%, 100%, and 200% of the true value. The lowest and highest of these values bound a reasonable range for apparent diffusivity in the human brain.
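Below is a minimal NumPy sketch of the noise simulation, with σ = 1/SNR as defined above. It assumes the standard Rician construction: independent Gaussian noise is added to a real and an imaginary channel, and the magnitude is taken. The function name, the use of NumPy's default random generator, and the constant placeholder signal are illustrative choices.

```python
import numpy as np


def add_rician_noise(scaled_signal, snr, rng=None):
    """Magnitude of the signal after Gaussian noise (sigma = 1/SNR) is added
    to a real and an imaginary channel; the result is always non-negative."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.asarray(scaled_signal, dtype=float)
    sigma = 1.0 / snr
    real = s + rng.normal(0.0, sigma, size=s.shape)   # noisy real channel
    imag = rng.normal(0.0, sigma, size=s.shape)       # pure-noise imaginary channel
    return np.sqrt(real ** 2 + imag ** 2)


rng = np.random.default_rng(0)
clean = np.full(64, 0.5)   # placeholder scaled signals for 64 gradients, S0 = 1
noisy = {snr: add_rician_noise(clean, snr, rng) for snr in (30, 20, 10)}
```

Taking the magnitude is what guarantees positivity. In exchange, the noise is no longer zero-mean at low signal levels, which is one way the simulated data depart from the Gaussian assumption of Eq. 4.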
Tables 1, 2, and 3 present the nerve direction estimates for the 1-, 2-, and 3-nerve systems, respectively. To summarize the precision of the estimated directions relative to the true values, the adjusted angular deviation was calculated using Eq. 7. A mean angular deviation is also presented for multi-nerve systems. The \(\alpha\)-value for the elastic net (Eq. 6) was set to 0.2, with two exceptions:
- the 3-nerve system at SNR = 10, where it was set to 0;
- the 1- and 2-nerve systems with \(d=0.0005\), where it was set to 1.

Given the large number of experiments, Figs. 3, 4, 5, 6 and 7 present a graphical view of the estimates corresponding to the bolded lines in Tables 1, 2, and 3.
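The text does not state how estimated nerves are paired with true nerves before the deviations are averaged. The sketch below makes one reasonable assumption: it chooses the pairing that minimizes the mean adjusted angular deviation (Eq. 7) over all permutations. This search is cheap for at most three nerves. The function names are hypothetical.

```python
from itertools import permutations

import numpy as np


def adjusted_angle(v, w):
    """Eq. 7: axial angle between v and w in degrees, in [0, 90]."""
    cos = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
    omega = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return min(omega, 180.0 - omega)


def mean_angular_deviation(true_dirs, est_dirs):
    """Mean adjusted deviation under the best one-to-one pairing (assumed rule)."""
    if len(true_dirs) != len(est_dirs):
        raise ValueError("pairing assumes equal numbers of true and estimated nerves")
    best = np.inf
    for perm in permutations(range(len(est_dirs))):
        devs = [adjusted_angle(true_dirs[i], est_dirs[j]) for i, j in enumerate(perm)]
        best = min(best, float(np.mean(devs)))
    return best


true_dirs = [np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])]
est_dirs = [np.array([-1.0, 0.0, 0.0]), np.array([0.0, 0.0, -1.0])]   # antipodes, listed in swapped order
tilt = np.radians(10.0)
tilted = [np.array([0.0, np.sin(tilt), np.cos(tilt)]), np.array([1.0, 0.0, 0.0])]
```

With this rule, the order in which estimates are reported does not matter. A 10° error on one of two nerves gives a mean deviation of 5°.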
Discussion
The proposed method uses regularized regression and clustering techniques to estimate the principal directions of nerves within a voxel. The method depends heavily on its robustness, because the apparent diffusivity, d, for the voxel is only crudely picked from a plausible range. The sensitivity test shows that the method’s performance is mostly unaffected when this parameter lies within a reasonable range of the true value.

If d is severely underestimated, the number of nerves can be overestimated, because some of the artificial nerves are not eliminated by the regression step. This is explained by the inability of the system to produce stronger dampening effects with low diffusivity and a small number of nerves. Conversely, a severe overestimation of d can cause the number of nerves to be underestimated, because the corresponding dampening effects are exceedingly pronounced. For the 1- and 2-nerve systems at \(\hbox {SNR} = 30\) with \(d=0.0005\), \(\alpha\) (Eq. 6) had to be set equal to 1 to reduce the overestimation of nerves.

In the future, it may be possible to devise algorithms that learn d simultaneously with the regression. Alternatively, an estimate of d could be obtained by borrowing information from other relevant algorithms or by performing additional MRI-related experiments. This would shift the burden of estimation away from the assumptions of this model and make it easier to choose more lenient values for regularization.
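A quick calculation with the experimental b-value illustrates why the assumed d matters so much. For a gradient parallel to a stick, the stick entry of the dictionary attains its smallest possible value, \(e^{-b_i d}\). This holds under the ball-and-stick form of Eq. 1, where the quadratic form in the exponent is at most 1. With \(b_i = 3000\,{\rm s}/{\rm mm^2}\), the three dictionary diffusivities give:
\[
\begin{aligned}
d = 0.0005: &\quad e^{-3000 \times 0.0005} = e^{-1.5} \approx 0.223,\\
d = 0.001: &\quad e^{-3000 \times 0.001} = e^{-3} \approx 0.0498,\\
d = 0.002: &\quad e^{-3000 \times 0.002} = e^{-6} \approx 0.00248.
\end{aligned}
\]
Halving d raises the strongest attenuation a single compartment can produce from about 0.05 to about 0.22 of the baseline. Doubling d lowers it to about 0.0025, roughly 20 times (\(e^{3}\)) stronger than at the true value. Shifts of this size in the dictionary columns are in line with the over- and underestimation of nerves described above.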
It should also be noted that, for the 3-nerve systems at \(\hbox {SNR}= 10\), the \(\alpha\)-value had to be dropped to exactly 0. This is explained by the complexity of the confounded dampening effects of three nerves combined with high noise. This issue is harder to overcome, but a finer discretization of the parameter space may provide a solution and even improve estimation accuracy.
Figures 3, 4, 5, 6 and 7 show that the number of clusters formed was obvious in these experiments. This may not be the case in future applications. It would therefore be advisable to run the clustering algorithm multiple times with different numbers of clusters. An external criterion, such as the elbow method using the sum of squared errors, could then be used to select the best number of nerves.
The adoption of elastic net regularization enables variable selection in the proposed method. Given a sensible interval for the diffusivity parameter and a severe overestimate of the number of nerves, the choices of d and K are reasonable and can be used in the algorithm. This is computationally more efficient than running Bayesian or collaborative approaches to estimate these variables.
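One way to follow this suggestion is sketched below:
1. Run PAM for k = 1, ..., k_max on the Eq. 7 dissimilarities.
2. For each k, record the sum of squared dissimilarities of every direction to its nearest medoid.
3. Stop at the first k after which one more cluster reduces that error by less than a chosen fraction of the one-cluster error.

The 10% threshold and the helper names are assumptions for illustration, not values from the experiments.

```python
import numpy as np


def adjusted_angle(v, w):
    cos = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
    omega = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return min(omega, 180.0 - omega)          # Eq. 7, axial directions


def pam(D, k):
    """BUILD + SWAP partitioning around medoids; returns medoid indices."""
    m = D.shape[0]
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        costs = [np.inf if c in medoids else D[:, medoids + [c]].min(axis=1).sum()
                 for c in range(m)]
        medoids.append(int(np.argmin(costs)))
    cost, improved = D[:, medoids].min(axis=1).sum(), True
    while improved:
        improved = False
        for i in range(k):
            for h in range(m):
                if h not in medoids:
                    trial = medoids[:i] + [h] + medoids[i + 1:]
                    trial_cost = D[:, trial].min(axis=1).sum()
                    if trial_cost < cost - 1e-12:
                        medoids, cost, improved = trial, trial_cost, True
    return medoids


def sse_curve(dirs, k_max):
    """Sum of squared Eq. 7 dissimilarities to the nearest medoid, k = 1..k_max."""
    D = np.array([[adjusted_angle(a, b) for b in dirs] for a in dirs])
    return [float((D[:, pam(D, k)].min(axis=1) ** 2).sum()) for k in range(1, k_max + 1)]


def choose_k(sse, tol=0.1):
    """Smallest k whose successor lowers the SSE by less than tol * SSE(k = 1)."""
    for k in range(1, len(sse)):
        if sse[0] == 0.0 or sse[k - 1] - sse[k] < tol * sse[0]:
            return k
    return len(sse)


dirs = np.array([[0, 0, 1], [0.05, 0, 1], [0, 0, -1],
                 [1, 0, 0], [-1, 0.05, 0], [1, 0.02, 0]], dtype=float)
sse = sse_curve(dirs, k_max=4)
k_best = choose_k(sse)
```

For two well-separated axes, the error collapses when going from one to two clusters and barely changes afterwards, so the elbow is at k = 2.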
Conclusion and future work
The proposed method has shown promising preliminary results in a host of unfavorable conditions, including noisy data and imprecision in parameter assumptions. In the future, the method’s efficacy will be tested on real patient data along with a presentation of comparative analyses with other relevant methods in the field.
Availability of data and materials
This research analyzed experimental data that was created by the authors to reflect realistic parameters and variation between individual brains. The analysis was conducted in R.
Abbreviations
MRI: magnetic resonance imaging
dMRI: diffusion magnetic resonance imaging
SNR: signal-to-noise ratio
References
1. Bloomington IU. Grey matter and white matter. Bloomington: Indiana University; 2003.
2. Swan J. Spinal notes. The University of New Mexico, Class Notes; 2005.
3. Graham R, McCabe H, Sheridan S. Neural networks for real-time pathfinding in computer games. ITB J. 2004;5(1):21.
4. Bansal R, Hao X, Liu F, Xu D, Liu J, Peterson BS. The effects of changing water content, relaxation times, and tissue contrast on tissue segmentation and measures of cortical anatomy in MR images. Magn Reson Imaging. 2013;31(10):1709–30.
5. Johansen-Berg H, Behrens TE. Diffusion MRI: from quantitative measurement to in vivo neuroanatomy, vol. 2. Cambridge: Academic Press; 2009.
6. Lenglet C, Deriche R, Faugeras O. Inferring white matter geometry from diffusion tensor MRI: application to connectivity mapping. In: European conference on computer vision. Springer; 2004. p. 127–40.
7. Jones DK, Knosche TR, Turner R. White matter integrity, fiber count, and other fallacies: the do’s and don’ts of diffusion MRI. Neuroimage. 2013;73:239–54.
8. Vos SB, Jones DK, Jeurissen B, Viergever MA, Leemans A. The influence of complex white matter architecture on the mean diffusivity in diffusion tensor MRI of the human brain. Neuroimage. 2012;59(3):2208–16.
9. Behrens TE, Berg HJ, Jbabdi S, Rushworth MF, Woolrich MW. Probabilistic diffusion tractography with multiple fibre orientations: what can we gain? Neuroimage. 2007;34(1):144–55.
10. Pisharady PK, Sotiropoulos SN, Duarte-Carvajalino JM, Sapiro G, Lenglet C. Estimation of white matter fiber parameters from compressed multiresolution diffusion MRI using sparse Bayesian learning. Neuroimage. 2018;167:488–503.
11. Tuch DS, Reese TG, Wiegell MR, Makris N, Belliveau JW, Wedeen VJ. High angular resolution diffusion imaging reveals intravoxel white matter fiber heterogeneity. Magn Reson Med. 2002;48(4):577–82.
12. Anderson AW. Measurement of fiber orientation distributions using high angular resolution diffusion imaging. Magn Reson Med. 2005;54(5):1194–206.
13. Kaden E, Knosche TR, Anwander A. Parametric spherical deconvolution: inferring anatomical connectivity using diffusion MR imaging. Neuroimage. 2007;37(2):474–88.
14. Sotiropoulos SN, Bai L, Morgan PS, Auer DP, Constantinescu CS, Tench CR. A regularized two-tensor model fit to low angular resolution diffusion images using basis directions. J Magn Reson Imaging. 2008;28(1):199–209.
15. Sotiropoulos SN, Jbabdi S, Andersson JL, Woolrich MW, Ugurbil K, Behrens TE. RubiX: combining spatial resolutions for Bayesian inference of crossing fibers in diffusion MRI. IEEE Trans Med Imaging. 2013;32(6):969–82.
16. Coupe P, Manjon JV, Chamberland M, Descoteaux M, Hiba B. Collaborative patch-based super-resolution for diffusion-weighted images. Neuroimage. 2013;83:245–61.
17. Scherrer B, Schwartzman A, Taquet M, Sahin M, Prabhu SP, Warfield SK. Characterizing brain tissue by assessment of the distribution of anisotropic microstructural environments in diffusion-compartment imaging (DIAMOND). Magn Reson Med. 2016;76(3):963–77.
18. Tournier JD, Calamante F, Gadian DG, Connelly A. Direct estimation of the fiber orientation density function from diffusion-weighted MRI data using spherical deconvolution. Neuroimage. 2004;23(3):1176–85.
19. Ozarslan E, Shepherd TM, Vemuri BC, Blackband SJ, Mareci TH. Resolution of complex tissue microarchitecture using the diffusion orientation transform (DOT). Neuroimage. 2006;31(3):1086–103.
20. Dell’Acqua F, Rizzo G, Scifo P, Clarke RA, Scotti G, Fazio F. A model-based deconvolution approach to solve fiber crossing in diffusion-weighted MR imaging. IEEE Trans Biomed Eng. 2007;54(3):462–72.
21. Aganj I, Lenglet C, Sapiro G, Yacoub E, Ugurbil K, Harel N. Reconstruction of the orientation distribution function in single- and multiple-shell q-ball imaging within constant solid angle. Magn Reson Med. 2010;64(2):554–66.
22. Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Series B Stat Methodol. 2005;67(2):301–20.
23. Chiu C, Zhan J. Deep learning for link prediction in dynamic networks using weak estimators. IEEE Access. 2018;6(1):35937–45.
24. Bhaduri M, Zhan J. Using empirical recurrences rates ratio for time series data similarity. IEEE Access. 2018;6(1):30855–64.
25. Wu J, Zhan J, Chobe S. Mining association rules for low frequency itemsets. PLoS ONE. 2018;13(7):e0198066.
26. Ezatpoor P, Zhan J, Wu J, Chiu C. Finding top-k dominance on incomplete big data using MapReduce framework. IEEE Access. 2018;6(1):7872–87.
27. Chopade P, Zhan J. Towards a framework for community detection in large networks using game-theoretic modeling. IEEE Trans Big Data. 2017;3(3):276–88.
28. Bhaduri M, Zhan J, Chiu C. A weak estimator for dynamic systems. IEEE Access. 2017;5(1):27354–65.
29. Bhaduri M, Zhan J, Chiu C, Zhan F. A novel online and nonparametric approach for drift detection in big data. IEEE Access. 2017;5(1):15883–92.
30. Chiu C, Zhan J, Zhan F. Uncovering suspicious activity from partially paired and incomplete multimodal data. IEEE Access. 2017;5(1):13689–98.
31. Ahn R, Zhan J. Using proxies for node immunization identification on large graphs. IEEE Access. 2017;5(1):13046–53.
32. Zhan F, Waters B, Mijangos M, Chung L, Bhagat R, Bhagat T, Pirouz M, Chiu C, Tayeb S, Ploutz E, Zhan J, Gewali L. An efficient alternative to personalized page rank for friend recommendations. In: The 15th IEEE annual consumer communications and networking conference, January 12–15. USA: Las Vegas; 2018.
33. Zhan F, Laines G, Deniz S, Paliskara S, Ochoa I, Guerra I, Tayeb S, Chiu C, Pirouz M, Ploutz E, Zhan J, Gewali L, Oh P. Prediction of online social networks users’ behaviors with a game theoretic approach. In: The 15th IEEE annual consumer communications and networking conference, January 12–15. USA: Las Vegas; 2018.
Acknowledgements
We sincerely and gratefully acknowledge the following organizations for their help and support: Department of Defense for funding the AEOP UNITE participants as well as computing facilities via Grants #W911NF1610416 and #W911NF1710088. National Science Foundation for funding the RET participants via #1710716 as well as computing facilities via Grant #1625677.
Funding
This research was funded by the Army Education Outreach Program (AEOP) UNITE and National Science Foundation Research Experience for Teachers (RET) program.
Author information
Contributions
All authors contributed to the research and manuscript in the order in which they appear. All authors discussed the final results and improved the final manuscript. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Justin Zhan.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Hart, A., Smith, B., Smith, S. et al. Resolving intravoxel white matter structures in the human brain using regularized regression and clustering. J Big Data 6, 61 (2019) doi:10.1186/s40537-019-0223-2
Keywords
 Ball-and-stick model
 Diffusion MRI
 Tractography
 Nerve
 Neural tracts
 Brain imaging