Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000;403:503–11.

Article
Google Scholar

Alter O, Brown PO, Botstein D. Singular value decomposition for genome-wide expression data processing and modeling. PNAS. 2000;97(18):10101–6.

Article
Google Scholar

Antoniadis A, Lambert-Lacroix S, Leblanc F. Effective dimension reduction methods for tumor classification using gene expression data. Bioinformatics. 2003;19(5):563–70.

Article
Google Scholar

Boulesteix AL. PLS dimension reduction for classification with microarray data. Stat Appl Genet Mol Biol. 2004;3(1):1–30.

Article
MathSciNet
MATH
Google Scholar

Boulesteix AL, Strimmer K. Partial least squares: a versatile tool for the analysis of high dimensional genomic data. Brief Bioinf. 2008;8:24–32.

Google Scholar

Bughin J. Reaping the benefits of big data in telecom. J Big Data. 2016;3:14.

Article
Google Scholar

Casaca JA, da Gama AP. Marketing in the Era of Big data, human and social sciences at the common conference. 2013.

Cai T, Liu WD. A direct estimation approach to Sparse linear discriminant analysis. J Am Stat Assoc. 2011;106:1566–77.

Article
MathSciNet
MATH
Google Scholar

Candes E, Tao T. The Dantzig selector: statistical estimation when p is much larger than n. Ann Stat. 2005;35(6):23132351.

MathSciNet
Google Scholar

Chen S, Donoho D, Saunders M. Atomic decomposition by basis pursuit. SIAM J Sci Comput. 1998;20(1):3361.

Article
MathSciNet
MATH
Google Scholar

Chiaromonte F, Martinelli J. Dimension reduction strategies for analyzing global gene expression data with a response. Math Biosci. 2002;176:123144.

Article
MathSciNet
MATH
Google Scholar

Christopher G, Jiashun J, Wasserman L, Yao Z. A comparison of the lasso and marginal regression. J Mach Learn Res. 2011;13:21072143.

MathSciNet
Google Scholar

Crawford M, Khoshgoftaar M, Prusa D, Richter N, Al Najada H. Survey of review spam detection using machine learning techniques. J Big Data. 2015;2:23.

Article
Google Scholar

Depeige A, Doyencourt D. Actionable knowledge as a service (AKAAS): leveraging big data analytics in cloud computing environments. J Big Data. 2015;2:12.

Article
Google Scholar

Demchenko Y, Grosso P, de Laat C, & Membrey P. Addressing Big Data issues in scientific data infrastructure. Proceedings of the international conference on collaboration technologies and systems, May 20–24. San Diego: IEEE Xplore Press; 2013. p 48-5. DOI: 10.1109/CTS.2013.6567203.

Dettling M. BagBoosting for tumor classification with gene expression data. Bioinformatics. 2004;20(18):3583–93.

Article
Google Scholar

Donoho DL, Elad M. Optimally sparse representation in general (nonorthogonal) dictionaries via 1 minimization. Proc Natl Acad Sci. 2013;100(5):2197–202.

Article
MathSciNet
MATH
Google Scholar

Kondziolka Benjamin T C, Lunsford LD, Silverman J. Development, implementation, and use of a local and global clinical registry for neurosurgery. Big Data. 2015;3(2):80–9.

Article
Google Scholar

DongGuo H, Zhang L, WeiZhu L. Earth observation big data for climate change research. Adv Clim Change Res. 2015;6(2):108–17.

Article
Google Scholar

Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. Ann Stat. 2003;32:407451.

MathSciNet
MATH
Google Scholar

Einav L, Levin J. Economics in the age of big data. Science. 2014;346(6210):1243089.

Article
Google Scholar

Fan J, Fan Y. High dimensional classification using features annealed independence rules. Ann Stat. 2008;36:260537.

Article
MathSciNet
MATH
Google Scholar

Fan J, Guo S, Hao N. Variance estimation using refitted cross-validation in ultrahigh dimensional regression. J R Stat Soc Ser B. 2012;74(1):3765.

Article
MathSciNet
Google Scholar

Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc. 2001;96(456):13481360.

Article
MathSciNet
MATH
Google Scholar

Fan J, Lv J. Sure independence screening for ultrahigh dimensional feature space (with disussion). J R Stat Soc Ser B. 2007;70(5):849911.

Google Scholar

Fan J, Liao Y. Endogeneity in ultrahigh dimension, technical report. New Jersey: Princeton University; 2014.

Google Scholar

Fan J, Samworth R, Wu Y. Ultrahigh dimensional feature selection: beyond the linear model. J Mach Learn Res. 2009;10:20132038.

MathSciNet
MATH
Google Scholar

Fisher R. Statistical methods for research workers. ISBN 0-05-002170-2; 1926.

Friedman J, & Popescu B. Gradient directed regularization for linear regression and classification. Technical report. 2004.

Gesing S, Connor T, & Taylor I. Genomics and biological Big Data: facing current and future challenges around data and software sharing and reproducibility. Position paper at BDAC-15 (Big Data Analytics: Challenges and Opportunities), workshop in cooperation with ACM/IEEE SC15, Austin; 2015.

Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286:531–7.

Article
Google Scholar

Hall P, Miller H. Using generalized correlation to effect variable selection in very high dimensional problems. J Comp Graph Stat. 2009;18(3):533550.

Article
MathSciNet
Google Scholar

Hall P, Miller H. Modeling the variability of rankings. Ann Stat. 2010;38(20):2652–77.

Article
MathSciNet
MATH
Google Scholar

Holter NS, Mitra M, Maritan A, Cieplak M, Banavar JR, Fedoro NV. Fundamental patterns underlying gene expression profiles: simplicity from complexity. Proc Natl Acad Sci USA. 2000;97:8409–14.

Article
Google Scholar

Husain S, Kalinin A, Truong A, Dinov D. SOCR data dashboard: an integrated big data archive mashing medicare, labor census and econometric information. J Big Data. 2015;2:13.

Article
Google Scholar

Kastrin A, Peterlin B. Rasch-based high-dimensionality data reduction and class prediction with applications to microarray gene expression data. Exp Syst Appl. 2010;37(7):5178–85.

Article
Google Scholar

Kramer A, Guillory J, Hancock J. Experimental evidence of massive scale emotional contagion through social networks. Proc Natl Acad Sci USA. 2014;111(24):8788–90.

Article
Google Scholar

Laney D. 3D Data management: controlling data volume, velocity and variety. 2001.

Liao Y, Jiang W. Posterior consistency of nonparametric conditional moment restricted models. Ann Stat. 2011;39(6):30033031.

Article
MathSciNet
MATH
Google Scholar

Loureno JR, Cabral B, Carreiro P, Vieira M, Bernardino J. Choosing the right NoSQL database for the job : a quality attribute. J Big Data. 2015;2(1):1–26.

Google Scholar

Mardia KV, Kent JT, Bibby JM. Multivariate analysis. San Diego: Academic Press Inc; 1979.

MATH
Google Scholar

McLachlan GJ. Discriminant analysis and statistical pattern recognition. New York: Wiley; 1992.

Book
MATH
Google Scholar

Meulman JJ, Heiser JW. IBM SPSS Categories 20. 2011. pp. 233–248

Narock TW, & Hitzler P. Crowdsourcing semantics for Big Data in geosciences applications. In: AAAI 2013 Fall symposium series, semantics for Big Data, November 15–17. Arlington; 2013.

Nguyen DV, Rocke DM. On partial least squares dimension reduction for microarray-based classification: a simulation study. Comput Stat Data Anal. 2004;46(3):407–25.

Article
MathSciNet
MATH
Google Scholar

Pääkkönen P. Feasibility analysis of AsterixDB and spark streaming with Cassandra for stream-based processing. J Big Data. 2016;3:6. doi:10.1186/s40537-016-0041-8.

Article
Google Scholar

Pearson ES. Review of statistical methods for research workers (R. A. Fisher). Sci Prog. 1926;20:733–4.

Google Scholar

Pittelkow PH, Ghosh M. Theoretical measures of relative performance of classifiers for high dimensional data with small sample sizes. J R Stat Soc B. 2008;70:15973.

MathSciNet
MATH
Google Scholar

Pursell L, Trimble SY. Gram-Schmidt orthogonalization by Gauss elimination. Am Math Month. 1991;98(6):544549. doi:10.2307/2324877.

Article
MathSciNet
MATH
Google Scholar

Ripley BD. Pattern recognition and neural networks. Cambridge: Cambridge University Press; 1996.

Book
MATH
Google Scholar

Santos F. Le rapport de corrlation : mesurer la liaison entre une variable qualitative et une variable quantitative. CNRS, UMR 5199 PACEA. 2015.

Shaldehi AH. Using Eta (η) correlation ratio in analyzing strongly nonlinear relationship between two variables in practical researches. J Math Comput Sci. 2013;7(3):213–20.

Google Scholar

Toga W, Dinov D. Sharing big biomedical data. J Big Data. 2015;2:7.

Article
Google Scholar

Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B. 1996;58(1):267288.

MathSciNet
MATH
Google Scholar

Zhang C. Nearly unbiased variable selection under minimax concave penalty. Ann Stat. 2010;38(2):894942.

Article
MathSciNet
Google Scholar

Zuech R, Koshgoftaar M, Wald R. Intrusion detection and big heterogeneous data: a survey. J Big Data. 2015;2:3.

Article
Google Scholar