TY  - STD
TI  - Bakshi K. Considerations for big data: architecture and approach. In: 2012 IEEE aerospace conference, 2012. Piscataway: IEEE; 2012. p. 1–7.
ID  - ref1
ER  - 

TY  - CHAP
AU  - Aziz, K.
AU  - Zaidouni, D.
AU  - Bellafkih, M.
PY  - 2018
DA  - 2018//
TI  - Big Data Optimisation Among RDDs Persistence in Apache Spark
BT  - Communications in Computer and Information Science
PB  - Springer International Publishing
CY  - Cham
ID  - Aziz2018
ER  - 

TY  - STD
TI  - Aziz K, Zaidouni D, Bellafkih M. Big data processing using machine learning algorithms: MLlib and Mahout use case. In: Proceedings of the 12th international conference on intelligent systems: theories and applications, 2018. 2018; New York: ACM; p. 25.
ID  - ref3
ER  - 

TY  - JOUR
AU  - Zaharia, M.
AU  - Xin, R. S.
AU  - Wendell, P.
AU  - Das, T.
AU  - Armbrust, M.
AU  - Dave, A.
AU  - Meng, X.
AU  - Rosen, J.
AU  - Venkataraman, S.
AU  - Franklin, M. J.
PY  - 2016
DA  - 2016//
TI  - Apache Spark: a unified engine for big data processing
JO  - Commun ACM
VL  - 59
UR  - https://doi.org/10.1145/2934664
DO  - 10.1145/2934664
ID  - Zaharia2016
ER  - 

TY  - STD
TI  - Apache Spark. http://spark.apache.org. Accessed 20 Feb 2019.
UR  - http://spark.apache.org
ID  - ref5
ER  - 

TY  - STD
TI  - Databricks. https://databricks.com/spark/about. Accessed 15 Mar 2019.
UR  - https://databricks.com/spark/about
ID  - ref6
ER  - 

TY  - JOUR
AU  - Dean, J.
AU  - Ghemawat, S.
PY  - 2008
DA  - 2008//
TI  - MapReduce: simplified data processing on large clusters
JO  - Commun ACM
VL  - 51
UR  - https://doi.org/10.1145/1327452.1327492
DO  - 10.1145/1327452.1327492
ID  - Dean2008
ER  - 

TY  - STD
TI  - Shanahan JG, Dai L. Large scale distributed data science using Apache Spark. In: Proceedings of the 21st ACM SIGKDD international conference on knowledge discovery and data mining, 2015. 2015; New York: ACM; p. 2323–2324.
ID  - ref8
ER  - 

TY  - JOUR
AU  - Salloum, S.
AU  - Dautov, R.
AU  - Chen, X.
AU  - Peng, P. X.
AU  - Huang, J. Z.
PY  - 2016
DA  - 2016//
TI  - Big data analytics on Apache Spark
JO  - Int J Data Sci Anal
VL  - 1
UR  - https://doi.org/10.1007/s41060-016-0027-9
DO  - 10.1007/s41060-016-0027-9
ID  - Salloum2016
ER  - 

TY  - JOUR
AU  - Shahrivari, S.
PY  - 2014
DA  - 2014//
TI  - Beyond batch processing: towards real-time and streaming big data
JO  - Computers
VL  - 3
UR  - https://doi.org/10.3390/computers3040117
DO  - 10.3390/computers3040117
ID  - Shahrivari2014
ER  - 

TY  - JOUR
AU  - Shi, J.
AU  - Qiu, Y.
AU  - Minhas, U. F.
AU  - Jiao, L.
AU  - Wang, C.
AU  - Reinwald, B.
AU  - Özcan, F.
PY  - 2015
DA  - 2015//
TI  - Clash of the titans: MapReduce vs. Spark for large scale data analytics
JO  - Proc VLDB Endow
VL  - 8
UR  - https://doi.org/10.14778/2831360.2831365
DO  - 10.14778/2831360.2831365
ID  - Shi2015
ER  - 

TY  - JOUR
AU  - Liu, X.
AU  - Wang, X.
AU  - Matwin, S.
AU  - Japkowicz, N.
PY  - 2015
DA  - 2015//
TI  - Meta-MapReduce for scalable data mining
JO  - J Big Data
VL  - 2
UR  - https://doi.org/10.1186/s40537-015-0021-4
DO  - 10.1186/s40537-015-0021-4
ID  - Liu2015
ER  - 

TY  - JOUR
AU  - Singh, D.
AU  - Reddy, C. K.
PY  - 2015
DA  - 2015//
TI  - A survey on platforms for big data analytics
JO  - J Big Data
VL  - 2
UR  - https://doi.org/10.1186/s40537-014-0008-6
DO  - 10.1186/s40537-014-0008-6
ID  - Singh2015
ER  - 

TY  - JOUR
AU  - Herodotou, H.
AU  - Lim, H.
AU  - Luo, G.
AU  - Borisov, N.
AU  - Dong, L.
AU  - Cetin, F. B.
AU  - Babu, S.
PY  - 2011
DA  - 2011//
TI  - Starfish: a self-tuning system for big data analytics
JO  - CIDR
VL  - 11
ID  - Herodotou2011
ER  - 

TY  - STD
TI  - Wang G, Xu J, He B. A novel method for tuning configuration parameters of Spark based on machine learning. In: 2016 IEEE 18th international conference on high performance computing and communications; IEEE 14th international conference on smart city; IEEE 2nd international conference on data science and systems (HPCC/SmartCity/DSS), 2016. 2016; Piscataway: IEEE; p. 586–593.
ID  - ref15
ER  - 

TY  - STD
TI  - Renner T, Thamsen L, Kao O. Adaptive resource management for distributed data analytics based on container-level cluster monitoring. In: DATA. 2017. p. 38–47.
ID  - ref16
ER  - 

TY  - STD
TI  - Apache Hadoop. http://hadoop.apache.org. Accessed 15 Feb 2019.
UR  - http://hadoop.apache.org
ID  - ref17
ER  - 

TY  - JOUR
AU  - Doulkeridis, C.
AU  - Nørvåg, K.
PY  - 2014
DA  - 2014//
TI  - A survey of large-scale analytical query processing in MapReduce
JO  - VLDB J
VL  - 23
UR  - https://doi.org/10.1007/s00778-013-0319-9
DO  - 10.1007/s00778-013-0319-9
ID  - Doulkeridis2014
ER  - 

TY  - STD
TI  - Gu L, Li H. Memory or time: performance evaluation for iterative operation on Hadoop and Spark. In: IEEE 10th international conference on high performance computing and communications & 2013 IEEE international conference on embedded and ubiquitous computing (HPCC EUC), 2013. 2013; Piscataway: IEEE; p. 721–727.
ID  - ref19
ER  - 

TY  - JOUR
AU  - Zaharia, M.
AU  - Chowdhury, M.
AU  - Das, T.
AU  - Dave, A.
AU  - Ma, J.
AU  - McCauley, M.
AU  - Franklin, M.
AU  - Shenker, S.
AU  - Stoica, I.
PY  - 2012
DA  - 2012//
TI  - Fast and interactive analytics over Hadoop data with Spark
JO  - Usenix Login
VL  - 37
ID  - Zaharia2012
ER  - 

TY  - STD
TI  - Lin C-Y, Tsai C-H, Lee C-P, Lin C-J. Large-scale logistic regression and linear support vector machines using Spark. In: 2014 IEEE international conference on Big Data (Big Data), 2014. 2014; Piscataway: IEEE; p. 519–528.
ID  - ref21
ER  - 

TY  - STD
TI  - Li P, Luo Y, Zhang N, Cao Y. HeteroSpark: a heterogeneous CPU/GPU Spark platform for machine learning algorithms. In: IEEE international conference on networking, architecture and storage (NAS), 2015. 2015; Piscataway: IEEE; p. 347–348.
ID  - ref22
ER  - 

TY  - JOUR
AU  - Maillo, J.
AU  - Ramírez, S.
AU  - Triguero, I.
AU  - Herrera, F.
PY  - 2017
DA  - 2017//
TI  - kNN-IS: an iterative Spark-based design of the k-nearest neighbors classifier for Big Data
JO  - Knowl-Based Syst
VL  - 117
UR  - https://doi.org/10.1016/j.knosys.2016.06.012
DO  - 10.1016/j.knosys.2016.06.012
ID  - Maillo2017
ER  - 

TY  - STD
TI  - Siegal D, Guo J, Agrawal G. Smart-MLlib: a high-performance machine-learning library. In: IEEE international conference on cluster computing (CLUSTER), 2016. 2016; Piscataway: IEEE; p. 336–345.
ID  - ref24
ER  - 

TY  - STD
TI  - Assefi M, Behravesh E, Liu G, Tafti AP. Big data machine learning using Apache Spark MLlib. In: IEEE international conference on Big Data (Big Data), 2017. 2017; Piscataway: IEEE; p. 3492–3498.
ID  - ref25
ER  - 

TY  - STD
TI  - Dhar S, Yi C, Ramakrishnan N, Shah M. ADMM based scalable machine learning on Spark. In: IEEE international conference on Big Data (Big Data), 2015. 2015; Piscataway: IEEE; p. 1174–1182.
ID  - ref26
ER  - 

TY  - STD
TI  - Vavilapalli VK, Murthy AC, Douglas C, Agarwal S, Konar M, Evans R, Graves T, Lowe J, Shah H, Seth S, et al. Apache Hadoop YARN: yet another resource negotiator. In: Proceedings of the 4th annual symposium on cloud computing, 2013. 2013; New York: ACM; p. 5.
ID  - ref27
ER  - 

TY  - JOUR
AU  - Hindman, B.
AU  - Konwinski, A.
AU  - Zaharia, M.
AU  - Ghodsi, A.
AU  - Joseph, A. D.
AU  - Katz, R. H.
AU  - Shenker, S.
AU  - Stoica, I.
PY  - 2011
DA  - 2011//
TI  - Mesos: a platform for fine-grained resource sharing in the data center
JO  - NSDI
VL  - 11
ID  - Hindman2011
ER  - 

TY  - BOOK
AU  - Karau, H.
AU  - Warren, R.
PY  - 2017
DA  - 2017//
TI  - High performance Spark: best practices for scaling and optimizing Apache Spark
PB  - O’Reilly Media, Inc.
CY  - Sebastopol
ID  - Karau2017
ER  - 

TY  - STD
TI  - Penchikala S. Big Data processing with Apache Spark. Lulu.com; 2018.
ID  - ref30
ER  - 

TY  - BOOK
AU  - Karau, H.
AU  - Konwinski, A.
AU  - Wendell, P.
AU  - Zaharia, M.
PY  - 2015
DA  - 2015//
TI  - Learning Spark: Lightning-fast Big Data analysis
PB  - O’Reilly Media, Inc.
CY  - Sebastopol
ID  - Karau2015
ER  - 

TY  - JOUR
AU  - Zaharia, M.
AU  - Chowdhury, M.
AU  - Franklin, M. J.
AU  - Shenker, S.
AU  - Stoica, I.
PY  - 2010
DA  - 2010//
TI  - Spark: cluster computing with working sets
JO  - HotCloud
VL  - 10
ID  - Zaharia2010
ER  - 

TY  - STD
TI  - Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin MJ, Shenker S, Stoica I. Resilient Distributed Datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX conference on networked systems design and implementation, 2012. 2012; Berkeley: USENIX Association; p. 2.
ID  - ref33
ER  - 

TY  - STD
TI  - Spark Architecture. https://spark.apache.org/docs/latest/cluster-overview.html. Accessed 13 Feb 2019.
UR  - https://spark.apache.org/docs/latest/cluster-overview.html
ID  - ref34
ER  - 

TY  - JOUR
AU  - Meng, X.
AU  - Bradley, J.
AU  - Yavuz, B.
AU  - Sparks, E.
AU  - Venkataraman, S.
AU  - Liu, D.
AU  - Freeman, J.
AU  - Tsai, D.
AU  - Amde, M.
AU  - Owen, S.
PY  - 2016
DA  - 2016//
TI  - MLlib: machine learning in Apache Spark
JO  - J Mach Learn Res
VL  - 17
ID  - Meng2016
ER  - 

TY  - JOUR
AU  - Armbrust, M.
AU  - Das, T.
AU  - Davidson, A.
AU  - Ghodsi, A.
AU  - Or, A.
AU  - Rosen, J.
AU  - Stoica, I.
AU  - Wendell, P.
AU  - Xin, R.
AU  - Zaharia, M.
PY  - 2015
DA  - 2015//
TI  - Scaling Spark in the real world: performance and usability
JO  - Proc VLDB Endow
VL  - 8
UR  - https://doi.org/10.14778/2824032.2824080
DO  - 10.14778/2824032.2824080
ID  - Armbrust2015
ER  - 

TY  - STD
TI  - Venkataraman S, Yang Z, Franklin MJ, Recht B, Stoica I. Ernest: efficient performance prediction for large-scale advanced analytics. In: NSDI, 2016. 2016; p. 363–378.
ID  - ref37
ER  - 

TY  - STD
TI  - ML Guide for Apache Spark. https://spark.apache.org/docs/latest/ml-guide.html. Accessed 16 Apr 2019.
UR  - https://spark.apache.org/docs/latest/ml-guide.html
ID  - ref38
ER  - 

TY  - BOOK
AU  - Schölkopf, B.
AU  - Burges, C. J.
AU  - Smola, A. J.
PY  - 1999
DA  - 1999//
TI  - Advances in Kernel methods: support vector learning
PB  - MIT Press
CY  - Cambridge
ID  - Schölkopf1999
ER  - 

TY  - STD
TI  - Dua D, Graff C. UCI Machine Learning Repository 2017. http://archive.ics.uci.edu/ml. Accessed 11 Jan 2019.
UR  - http://archive.ics.uci.edu/ml
ID  - ref40
ER  - 

TY  - JOUR
AU  - Baldi, P.
AU  - Sadowski, P.
AU  - Whiteson, D.
PY  - 2014
DA  - 2014//
TI  - Searching for exotic particles in high-energy physics with deep learning
JO  - Nat Commun
VL  - 5
UR  - https://doi.org/10.1038/ncomms5308
DO  - 10.1038/ncomms5308
ID  - Baldi2014
ER  - 

TY  - STD
TI  - Cloud Dataproc Service. https://cloud.google.com/dataproc/. Accessed 1 Jan 2019.
UR  - https://cloud.google.com/dataproc/
ID  - ref42
ER  - 

TY  - STD
TI  - Spark Tuning. https://www.cloudera.com/documentation/. Accessed 20 Apr 2019.
UR  - https://www.cloudera.com/documentation/
ID  - ref43
ER  - 