From: Leveraging resource management for efficient performance of Apache Spark
Storage level | Characteristics |
---|---|
memory_only | Stores the RDD in memory where possible. When the RDD is larger than the available memory, the partitions that do not fit are not cached; consequently, they are recomputed whenever they are required. This level uses a large amount of storage space and reduces the CPU computation time |
memory_and_disk | Stores partitions on disk when there is not enough space in memory; consequently, these partitions are read back from disk whenever they are required. This level uses a large amount of storage space and reduces the CPU computation time |
disk_only | Stores all partitions on disk only, which is more space-efficient. At this level, the storage space used is small and the computation time is high |
memory_only_ser | Stores the RDD as serialized Java objects, in memory only. It is more space-efficient than the deserialized levels, but it raises the CPU overhead. At this level, the storage space used is small and the computation time is high |
memory_and_disk_ser | Stores the RDD as serialized Java objects in memory and on disk. It is more space-efficient than the deserialized levels, but it raises the CPU overhead. At this level, the storage space used is small and the computation time is high |
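
The differences between the five levels reduce to three yes/no properties: whether memory is used, whether disk is used, and whether the data is kept serialized. The following is a minimal Python sketch of that reading of the table. It does not use Spark itself; the `Level` class, the `LEVELS` mapping, and the `on_memory_shortage` function are hypothetical names introduced only for illustration. In a real Spark job the level is chosen by passing a `StorageLevel` constant to `persist`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """Illustrative stand-in for a storage level (not Spark's own class)."""
    use_memory: bool
    use_disk: bool
    serialized: bool


# Flag combinations for the five levels listed in the table.
# Data written to disk is always serialized, so disk_only is marked serialized.
LEVELS = {
    "memory_only":         Level(use_memory=True,  use_disk=False, serialized=False),
    "memory_and_disk":     Level(use_memory=True,  use_disk=True,  serialized=False),
    "disk_only":           Level(use_memory=False, use_disk=True,  serialized=True),
    "memory_only_ser":     Level(use_memory=True,  use_disk=False, serialized=True),
    "memory_and_disk_ser": Level(use_memory=True,  use_disk=True,  serialized=True),
}


def on_memory_shortage(name: str) -> str:
    """Describe what happens to a partition that does not fit in memory."""
    level = LEVELS[name]
    if not level.use_memory:
        return "always on disk"
    if level.use_disk:
        return "spill to disk"
    return "recompute"


for name in LEVELS:
    print(f"{name:22s} -> {on_memory_shortage(name)}")
```

The two memory-only levels are the ones that fall back to recomputation, which is why they trade storage pressure for extra CPU work when the RDD does not fit.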