Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000;403:503–11.
Alter O, Brown PO, Botstein D. Singular value decomposition for genome-wide expression data processing and modeling. PNAS. 2000;97(18):10101–6.
Antoniadis A, Lambert-Lacroix S, Leblanc F. Effective dimension reduction methods for tumor classification using gene expression data. Bioinformatics. 2003;19(5):563–70.
Boulesteix AL. PLS dimension reduction for classification with microarray data. Stat Appl Genet Mol Biol. 2004;3(1):1–30.
Boulesteix AL, Strimmer K. Partial least squares: a versatile tool for the analysis of high dimensional genomic data. Brief Bioinf. 2008;8:24–32.
Bughin J. Reaping the benefits of big data in telecom. J Big Data. 2016;3:14.
Casaca JA, da Gama AP. Marketing in the Era of Big data, human and social sciences at the common conference. 2013.
Cai T, Liu WD. A direct estimation approach to Sparse linear discriminant analysis. J Am Stat Assoc. 2011;106:1566–77.
Candes E, Tao T. The Dantzig selector: statistical estimation when p is much larger than n. Ann Stat. 2005;35(6):2313–2351.
Chen S, Donoho D, Saunders M. Atomic decomposition by basis pursuit. SIAM J Sci Comput. 1998;20(1):33–61.
Chiaromonte F, Martinelli J. Dimension reduction strategies for analyzing global gene expression data with a response. Math Biosci. 2002;176:123–144.
Christopher G, Jiashun J, Wasserman L, Yao Z. A comparison of the lasso and marginal regression. J Mach Learn Res. 2011;13:2107–2143.
Crawford M, Khoshgoftaar M, Prusa D, Richter N, Al Najada H. Survey of review spam detection using machine learning techniques. J Big Data. 2015;2:23.
Depeige A, Doyencourt D. Actionable knowledge as a service (AKAAS): leveraging big data analytics in cloud computing environments. J Big Data. 2015;2:12.
Demchenko Y, Grosso P, de Laat C, Membrey P. Addressing Big Data issues in scientific data infrastructure. Proceedings of the international conference on collaboration technologies and systems, May 20–24. San Diego: IEEE Xplore Press; 2013. p 48-5. DOI: 10.1109/CTS.2013.6567203.
Dettling M. BagBoosting for tumor classification with gene expression data. Bioinformatics. 2004;20(18):3583–93.
Donoho DL, Elad M. Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proc Natl Acad Sci. 2013;100(5):2197–202.
Kondziolka Benjamin T C, Lunsford LD, Silverman J. Development, implementation, and use of a local and global clinical registry for neurosurgery. Big Data. 2015;3(2):80–9.
DongGuo H, Zhang L, WeiZhu L. Earth observation big data for climate change research. Adv Clim Change Res. 2015;6(2):108–17.
Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. Ann Stat. 2003;32:407–451.
Einav L, Levin J. Economics in the age of big data. Science. 2014;346(6210):1243089.
Fan J, Fan Y. High dimensional classification using features annealed independence rules. Ann Stat. 2008;36:2605–37.
Fan J, Guo S, Hao N. Variance estimation using refitted cross-validation in ultrahigh dimensional regression. J R Stat Soc Ser B. 2012;74(1):37–65.
Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc. 2001;96(456):1348–1360.
Fan J, Lv J. Sure independence screening for ultrahigh dimensional feature space (with discussion). J R Stat Soc Ser B. 2007;70(5):849–911.
Fan J, Liao Y. Endogeneity in ultrahigh dimension, technical report. New Jersey: Princeton University; 2014.
Fan J, Samworth R, Wu Y. Ultrahigh dimensional feature selection: beyond the linear model. J Mach Learn Res. 2009;10:2013–2038.
Fisher R. Statistical methods for research workers. ISBN 0-05-002170-2; 1926.
Friedman J, Popescu B. Gradient directed regularization for linear regression and classification. Technical report. 2004.
Gesing S, Connor T, Taylor I. Genomics and biological Big Data: facing current and future challenges around data and software sharing and reproducibility. Position paper at BDAC-15 (Big Data Analytics: Challenges and Opportunities), workshop in cooperation with ACM/IEEE SC15, Austin; 2015.
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286:531–7.
Hall P, Miller H. Using generalized correlation to effect variable selection in very high dimensional problems. J Comp Graph Stat. 2009;18(3):533–550.
Hall P, Miller H. Modeling the variability of rankings. Ann Stat. 2010;38(20):2652–77.
Holter NS, Mitra M, Maritan A, Cieplak M, Banavar JR, Fedoroff NV. Fundamental patterns underlying gene expression profiles: simplicity from complexity. Proc Natl Acad Sci USA. 2000;97:8409–14.
Husain S, Kalinin A, Truong A, Dinov D. SOCR data dashboard: an integrated big data archive mashing medicare, labor census and econometric information. J Big Data. 2015;2:13.
Kastrin A, Peterlin B. Rasch-based high-dimensionality data reduction and class prediction with applications to microarray gene expression data. Exp Syst Appl. 2010;37(7):5178–85.
Kramer A, Guillory J, Hancock J. Experimental evidence of massive scale emotional contagion through social networks. Proc Natl Acad Sci USA. 2014;111(24):8788–90.
Laney D. 3D Data management: controlling data volume, velocity and variety. 2001.
Liao Y, Jiang W. Posterior consistency of nonparametric conditional moment restricted models. Ann Stat. 2011;39(6):3003–3031.
Lourenço JR, Cabral B, Carreiro P, Vieira M, Bernardino J. Choosing the right NoSQL database for the job: a quality attribute. J Big Data. 2015;2(1):1–26.
Mardia KV, Kent JT, Bibby JM. Multivariate analysis. San Diego: Academic Press Inc; 1979.
McLachlan GJ. Discriminant analysis and statistical pattern recognition. New York: Wiley; 1992.
Meulman JJ, Heiser JW. IBM SPSS Categories 20. 2011. pp. 233–248
Narock TW, Hitzler P. Crowdsourcing semantics for Big Data in geosciences applications. In: AAAI 2013 Fall symposium series, semantics for Big Data, November 15–17. Arlington; 2013.
Nguyen DV, Rocke DM. On partial least squares dimension reduction for microarray-based classification: a simulation study. Comput Stat Data Anal. 2004;46(3):407–25.
Pääkkönen P. Feasibility analysis of AsterixDB and spark streaming with Cassandra for stream-based processing. J Big Data. 2016;3:6. doi:10.1186/s40537-016-0041-8.
Pearson ES. Review of statistical methods for research workers (R. A. Fisher). Sci Prog. 1926;20:733–4.
Pittelkow PH, Ghosh M. Theoretical measures of relative performance of classifiers for high dimensional data with small sample sizes. J R Stat Soc B. 2008;70:159–73.
Pursell L, Trimble SY. Gram-Schmidt orthogonalization by Gauss elimination. Am Math Month. 1991;98(6):544–549. doi:10.2307/2324877.
Ripley BD. Pattern recognition and neural networks. Cambridge: Cambridge University Press; 1996.
Santos F. Le rapport de corrélation : mesurer la liaison entre une variable qualitative et une variable quantitative. CNRS, UMR 5199 PACEA. 2015.
Shaldehi AH. Using Eta (η) correlation ratio in analyzing strongly nonlinear relationship between two variables in practical researches. J Math Comput Sci. 2013;7(3):213–20.
Toga W, Dinov D. Sharing big biomedical data. J Big Data. 2015;2:7.
Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B. 1996;58(1):267–288.
Zhang C. Nearly unbiased variable selection under minimax concave penalty. Ann Stat. 2010;38(2):894–942.
Zuech R, Khoshgoftaar M, Wald R. Intrusion detection and big heterogeneous data: a survey. J Big Data. 2015;2:3.