Leveraging resource management for efficient performance of Apache Spark

Journal of Big Data

Table 4 Storage levels for RDD persistence

Storage level	Characteristics
memory_only	Stores the RDD as deserialized Java objects in memory only. When the RDD is larger than the available memory, the partitions that do not fit are not cached; consequently, they are recomputed on the fly each time they are needed. This level consumes a lot of storage space but keeps the CPU computation time low
memory_and_disk	Stores the RDD as deserialized Java objects in memory and writes the partitions that do not fit in memory to disk, from which they are read back whenever they are needed. This level consumes a lot of storage space and avoids recomputation, although reading spilled partitions back from disk adds some CPU and I/O overhead
disk_only	Stores all partitions on disk only, which makes it space-efficient. This level uses little storage space but incurs high computation time, since every access must read the partitions from disk
memory_only_ser	Stores the RDD as serialized Java objects (one byte array per partition) in memory only. It is more space-efficient than the deserialized levels, but deserializing the data on each access increases CPU overhead. This level uses little storage space but incurs high computation time
memory_and_disk_ser	Similar to memory_only_ser, but spills the partitions that do not fit in memory to disk instead of recomputing them each time they are needed. It is more space-efficient than the deserialized levels but increases CPU overhead. This level uses little storage space but incurs high computation time
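The trade-offs in Table 4 can be illustrated with a small toy model. The sketch below is not Spark code. It mirrors the flag layout of Spark's StorageLevel (use disk, use memory, deserialized) in a hypothetical `Level` dataclass. It also assumes a hypothetical `SER_FACTOR` of 0.5 for how much serialization shrinks a partition, and it places partitions first-fit into a fixed memory budget. The hypothetical `simulate` function then counts recomputations, disk reads, memory reads, and deserializations over later accesses to the persisted RDD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """Toy mirror of the flags in Spark's StorageLevel (not the real class)."""
    name: str
    use_disk: bool
    use_memory: bool
    deserialized: bool


# Flag combinations follow Spark's Scala StorageLevel constants.
LEVELS = {
    "memory_only": Level("memory_only", False, True, True),
    "memory_and_disk": Level("memory_and_disk", True, True, True),
    "disk_only": Level("disk_only", True, False, False),
    "memory_only_ser": Level("memory_only_ser", False, True, False),
    "memory_and_disk_ser": Level("memory_and_disk_ser", True, True, False),
}

# Hypothetical assumption: serializing a partition halves its size.
SER_FACTOR = 0.5


def simulate(level, partition_sizes, memory_capacity, passes=2):
    """Place partitions first-fit, then count the work done by `passes`
    later accesses to the persisted RDD (after the first action cached it)."""
    location = {}
    used = 0.0
    for i, size in enumerate(partition_sizes):
        stored = size if level.deserialized else size * SER_FACTOR
        if level.use_memory and used + stored <= memory_capacity:
            location[i] = "memory"
            used += stored
        elif level.use_disk:
            location[i] = "disk"  # spilled (or disk_only): kept as serialized bytes
        else:
            location[i] = None  # memory-only levels drop partitions that do not fit

    stats = {"recompute": 0, "disk_read": 0, "memory_read": 0, "deserialize": 0}
    for _ in range(passes):
        for i in range(len(partition_sizes)):
            if location[i] is None:
                stats["recompute"] += 1  # rebuilt from lineage on every access
            elif location[i] == "disk":
                stats["disk_read"] += 1
                stats["deserialize"] += 1  # bytes read from disk must be decoded
            else:
                stats["memory_read"] += 1
                if not level.deserialized:
                    stats["deserialize"] += 1  # *_ser levels decode on each read
    return stats


if __name__ == "__main__":
    # Three 40-unit partitions and a 100-unit memory budget.
    for name, level in LEVELS.items():
        print(name, simulate(level, [40, 40, 40], memory_capacity=100))
```

Take three 40-unit partitions and a 100-unit memory budget. Under these assumptions, memory_only keeps two partitions in memory and recomputes the third on every access, while memory_and_disk reads that third partition back from disk. The serialized levels fit all three partitions in memory but pay a deserialization cost on every read. In an actual Spark application, the level is chosen by passing a `StorageLevel` constant to `persist`, for example `rdd.persist(StorageLevel.MEMORY_AND_DISK)`. For an RDD, `rdd.cache()` uses `MEMORY_ONLY`.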