Table 4 Storage levels of RDD persistence

From: Leveraging resource management for efficient performance of Apache Spark

Storage level

Characteristics

memory_only

Stores the data in memory where possible. When the RDD is larger than the available memory, the partitions that do not fit are not cached; instead, they are recomputed whenever they are required. Because the data is kept deserialized, this level has a large storage footprint and low CPU computation time

memory_and_disk

Partitions that do not fit in memory are stored on disk and read back from there whenever they are required. This level has a large storage footprint and low CPU computation time

disk_only

Stores all partitions on disk only, which is more space-efficient. For this level, the storage footprint is small and the computation time is high

memory_only_ser

Stores the RDD as serialized Java objects, in memory only. It is more space-efficient than the deserialized levels but adds CPU overhead. For this level, the storage footprint is small and the computation time is high

memory_and_disk_ser

Stores the RDD as serialized Java objects in memory, and on disk for the partitions that do not fit in memory. It is more space-efficient than the deserialized levels but adds CPU overhead. For this level, the storage footprint is small and the computation time is high
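
The placement behaviour described in the table can be illustrated with a small toy model. This is a sketch under stated assumptions, not Spark's implementation: the function `place_partitions`, the greedy in-order filling of memory, and the `ser_ratio` compression factor are all hypothetical and chosen only for illustration (Spark itself manages cached blocks with its own memory manager and eviction policy). In Spark, a level is selected by passing a `StorageLevel` constant to `persist()`.

```python
# Toy model of the storage levels in the table above.
# Assumptions (not from the source): partitions are placed greedily in order,
# and serialization shrinks a partition by a fixed hypothetical factor.

def place_partitions(level, partition_sizes, memory_budget, ser_ratio=0.5):
    """Return 'memory', 'disk', or 'recompute' for each partition."""
    level = level.lower()
    serialized = level.endswith("_ser")      # *_ser levels store serialized data
    use_memory = level.startswith("memory")  # disk_only never uses memory
    use_disk = "disk" in level               # *_and_disk and disk_only may use disk

    free = memory_budget
    placement = []
    for size in partition_sizes:
        stored = size * ser_ratio if serialized else size
        if use_memory and stored <= free:
            free -= stored
            placement.append("memory")
        elif use_disk:
            placement.append("disk")         # spilled, read back when required
        else:
            placement.append("recompute")    # not cached, rebuilt when required
    return placement


if __name__ == "__main__":
    sizes = [40, 40, 40]  # hypothetical partition sizes, arbitrary units
    for lvl in ("memory_only", "memory_and_disk", "disk_only",
                "memory_only_ser", "memory_and_disk_ser"):
        print(lvl, place_partitions(lvl, sizes, memory_budget=100))
```

With three partitions of 40 units and a budget of 100, the third partition is recomputed under memory_only and spilled to disk under memory_and_disk, while the serialized levels fit all three in memory at the assumed ratio. The model deliberately ignores the CPU cost of serialization, which is the trade-off the table attributes to the `_ser` levels.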