Table 4 Storage levels for RDD persistence

From: Leveraging resource management for efficient performance of Apache Spark

Storage level Characteristics
memory_only Stores the RDD in memory when possible. If the RDD is larger than the available memory, the partitions that do not fit are not cached and are recomputed whenever they are required. This level uses a large amount of storage space and reduces CPU computation time
memory_and_disk Stores partitions on disk when there is not enough space in memory, and reads them back from disk whenever they are required. This level uses a large amount of storage space and reduces CPU computation time
disk_only Stores all partitions on disk only, which is more space-efficient. At this level, the storage space used is small and the computation time is high
memory_only_ser Stores the RDD as serialized Java objects, in memory only. It is more space-efficient than the deserialized levels but raises CPU overhead. At this level, the storage space used is small and the computation time is high
memory_and_disk_ser Stores the RDD as serialized Java objects in memory and on disk. It is more space-efficient than the deserialized levels but raises CPU overhead. At this level, the storage space used is small and the computation time is high
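
The behaviour described in the table can be sketched as a small lookup. This is a toy model in plain Python, not Spark code: the `Level` class, the `LEVELS` mapping, and the `place_partition` function are hypothetical names introduced here for illustration, and the model only captures what happens to a single partition depending on whether it fits in memory. In Spark itself, a level is typically selected with a call such as `rdd.persist(StorageLevel.MEMORY_AND_DISK)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """Hypothetical stand-in for a storage level's flags (not the Spark class)."""
    use_memory: bool
    use_disk: bool
    serialized: bool


# Flags follow the table: the *_ser levels and disk_only hold serialized data.
LEVELS = {
    "memory_only": Level(use_memory=True, use_disk=False, serialized=False),
    "memory_and_disk": Level(use_memory=True, use_disk=True, serialized=False),
    "disk_only": Level(use_memory=False, use_disk=True, serialized=True),
    "memory_only_ser": Level(use_memory=True, use_disk=False, serialized=True),
    "memory_and_disk_ser": Level(use_memory=True, use_disk=True, serialized=True),
}


def place_partition(level_name: str, fits_in_memory: bool) -> str:
    """Return where a partition ends up: 'memory', 'disk', or 'recompute'."""
    level = LEVELS[level_name]
    if level.use_memory and fits_in_memory:
        return "memory"
    if level.use_disk:
        return "disk"
    # Nowhere to keep the partition, so it is rebuilt when next needed.
    return "recompute"


if __name__ == "__main__":
    # Show the outcome for a partition that does not fit in memory.
    for name in LEVELS:
        print(name, "->", place_partition(name, fits_in_memory=False))
```

The sketch makes the main trade-off visible: levels without a disk fallback pay with recomputation when memory runs out, while levels with one pay with disk reads instead.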