
Spark Important Configurations

Memory Configurations

yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.resource.memory-mb

spark.driver.memory
spark.driver.memoryOverhead

spark.executor.memoryOverhead
spark.executor.memory
spark.memory.fraction
spark.memory.storageFraction
spark.executor.cores

spark.memory.offHeap.enabled
spark.memory.offHeap.size
spark.executor.pyspark.memory
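How these settings add up is easier to see with numbers. The sketch below is a rough estimate, not Spark's own code. It assumes the documented defaults: an overhead of 10% of executor memory with a 384 MiB floor, 300 MiB of reserved heap, and the unified-memory fractions (spark.memory.fraction=0.6, spark.memory.storageFraction=0.5). The function names are made up for illustration. The container total is what has to fit within yarn.scheduler.maximum-allocation-mb.

```python
MIN_OVERHEAD_MIB = 384    # documented floor for spark.executor.memoryOverhead
RESERVED_HEAP_MIB = 300   # heap Spark sets aside before sizing unified memory


def executor_container_mib(executor_memory_mib, overhead_factor=0.10,
                           offheap_mib=0, pyspark_mib=0):
    """Approximate memory requested per executor container, in MiB."""
    overhead = max(MIN_OVERHEAD_MIB, int(executor_memory_mib * overhead_factor))
    # Heap + overhead + off-heap (if enabled) + PySpark worker memory (if set).
    return executor_memory_mib + overhead + offheap_mib + pyspark_mib


def unified_memory_mib(executor_memory_mib, memory_fraction=0.6,
                       storage_fraction=0.5):
    """Return (unified execution+storage pool, storage share immune to eviction)."""
    unified = (executor_memory_mib - RESERVED_HEAP_MIB) * memory_fraction
    return unified, unified * storage_fraction


container = executor_container_mib(8192)      # spark.executor.memory=8g
unified, storage = unified_memory_mib(8192)
print(container, round(unified, 1), round(storage, 1))
```

If the container total exceeds the YARN maximum allocation, the executor request is rejected, so it is worth checking this sum before raising spark.executor.memory.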

Adaptive Query Execution

spark.sql.adaptive.enabled=true
spark.sql.shuffle.partitions=10
spark.sql.autoBroadcastJoinThreshold=10MB

spark.sql.adaptive.coalescePartitions.enabled
spark.sql.adaptive.coalescePartitions.initialPartitionNum
spark.sql.adaptive.coalescePartitions.minPartitionNum

spark.sql.adaptive.localShuffleReader.enabled=true
spark.sql.adaptive.advisoryPartitionSizeInBytes

spark.sql.adaptive.skewJoin.enabled=true
spark.sql.adaptive.skewJoin.skewedPartitionFactor=5
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=256MB
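The two skew-join settings work together: per the Spark documentation, a partition is treated as skewed only if it is larger than the factor times the median partition size and also larger than the byte threshold. The sketch below illustrates that rule in plain Python. The function name and sample sizes are made up, and Spark's internal median calculation may differ slightly for even partition counts.

```python
from statistics import median

MB = 1024 * 1024


def skewed_partitions(sizes_bytes, factor=5, threshold_bytes=256 * MB):
    """Return indexes of partitions that satisfy both skew conditions."""
    mid = median(sizes_bytes)
    return [i for i, size in enumerate(sizes_bytes)
            if size > factor * mid and size > threshold_bytes]


# One 600 MB partition among 64 MB peers: over 5x the median and over 256 MB.
hot_partition = [64 * MB, 64 * MB, 64 * MB, 64 * MB, 600 * MB]
# 200 MB is 20x the median but below the 256 MB threshold, so it is not split.
small_job = [10 * MB, 10 * MB, 10 * MB, 10 * MB, 200 * MB]

print(skewed_partitions(hot_partition))
print(skewed_partitions(small_job))
```

The threshold keeps small jobs from being split just because one partition is relatively larger than the rest.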

Partitioning & Caching

//Pruning
spark.sql.optimizer.dynamicPartitionPruning.enabled

Cache Vs Persist

  • Storage level config in Persist (cache() is persist() with the default storage level)
    • Use Disk
    • Use Memory
    • Use Off-Heap
    • Deserialized
    • Replication
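The flags above map onto the arguments of Spark's StorageLevel, which also carries an off-heap flag. The sketch below is a plain-Python stand-in, not the real class. It mirrors a few of the named levels as defined in Spark's Scala API, to show how each name is just a combination of flags. StorageLevelSketch is a hypothetical name, and PySpark's own constants differ in the deserialized flag.

```python
from typing import NamedTuple


class StorageLevelSketch(NamedTuple):
    """Illustrative stand-in for Spark's StorageLevel flags."""
    use_disk: bool
    use_memory: bool
    use_off_heap: bool
    deserialized: bool
    replication: int = 1


# Flag combinations as defined for the Scala StorageLevel constants.
LEVELS = {
    "MEMORY_ONLY": StorageLevelSketch(False, True, False, True),
    "MEMORY_ONLY_SER": StorageLevelSketch(False, True, False, False),
    "MEMORY_AND_DISK": StorageLevelSketch(True, True, False, True),
    "MEMORY_AND_DISK_2": StorageLevelSketch(True, True, False, True, 2),
    "DISK_ONLY": StorageLevelSketch(True, False, False, False),
}

for name, level in LEVELS.items():
    print(f"{name:18} {level}")
```

Reading the table this way makes the trade-offs explicit: deserialized storage is faster to read but larger, disk spill avoids recomputation, and replication trades space for fault tolerance.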

Hints & Accumulators

Partitioning Hints:

  • COALESCE
  • REPARTITION
  • REPARTITION_BY_RANGE
  • REBALANCE

Join Hints:

  • BROADCAST (aliases: BROADCASTJOIN, MAPJOIN)
  • MERGE (aliases: SHUFFLE_MERGE, MERGEJOIN)
  • SHUFFLE_HASH
  • SHUFFLE_REPLICATE_NL
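In Spark SQL, both kinds of hint are written as a /*+ ... */ comment immediately after SELECT. The sketch below only builds the SQL text, so it can be read without a running cluster. The helper with_hint and the table and column names (orders, facts, dims, customer_id) are hypothetical.

```python
def with_hint(hint, *args, select="*", source="t"):
    """Build a SELECT statement carrying one optimizer hint (illustrative helper)."""
    rendered = f"{hint}({', '.join(str(a) for a in args)})" if args else hint
    return f"SELECT /*+ {rendered} */ {select} FROM {source}"


# Partitioning hint: 8 partitions, hashed by a column.
repartition_sql = with_hint("REPARTITION", 8, "customer_id", source="orders")

# Join hint: ask the planner to broadcast the table aliased as d.
broadcast_sql = with_hint(
    "BROADCAST", "d",
    source="facts f JOIN dims d ON f.dim_id = d.id",
)

print(repartition_sql)
print(broadcast_sql)
```

The resulting strings could be passed to spark.sql(...). Hints are requests to the planner, not guarantees, and Spark may ignore one it cannot honor.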

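Accumulators: updates made inside an action are guaranteed to be applied exactly once per task; updates made inside a transformation may be applied more than once if a task or stage is re-executed.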

Speculative Execution

spark.speculation=true
spark.speculation.interval=100ms
spark.speculation.multiplier=1.5
spark.speculation.quantile=0.75
spark.speculation.minTaskRuntime=100ms
spark.speculation.task.duration.threshold (unset by default)
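These settings combine into a simple rule. Once the quantile fraction of a stage's tasks has finished, any running task whose elapsed time exceeds multiplier times the median finished duration (and at least minTaskRuntime) becomes a candidate for a speculative copy. The sketch below is a simplified model of that rule with the values listed above. The function name is made up, times are in seconds, and the separate spark.speculation.task.duration.threshold path is ignored.

```python
import math
from statistics import median


def speculatable(finished_durations, running_elapsed, total_tasks,
                 quantile=0.75, multiplier=1.5, min_task_runtime=0.1):
    """Return indexes of running tasks that qualify for a speculative copy."""
    # Speculation is only considered once enough tasks have completed.
    min_finished = max(math.floor(quantile * total_tasks), 1)
    if len(finished_durations) < min_finished:
        return []
    threshold = max(multiplier * median(finished_durations), min_task_runtime)
    return [i for i, elapsed in enumerate(running_elapsed) if elapsed > threshold]


# 8 of 10 tasks finished in 10 s each; the threshold is 1.5 * 10 = 15 s.
print(speculatable([10.0] * 8, [12.0, 20.0], total_tasks=10))
# Only 5 of 10 finished: below the 0.75 quantile, so nothing is speculated yet.
print(speculatable([10.0] * 5, [12.0, 20.0], total_tasks=10))
```

Lowering the multiplier or quantile makes speculation more aggressive, at the cost of extra duplicate work on the cluster.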

Dynamic Resource Allocation

spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.shuffleTracking.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60s
spark.dynamicAllocation.schedulerBacklogTimeout=1s
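With dynamic allocation, Spark does not request every needed executor at once. Per the documentation, it adds 1 executor in the first round after tasks have been backlogged for schedulerBacklogTimeout, then 2, 4, 8 and so on in later rounds. The sketch below models that ramp in plain Python. The function name is hypothetical. The later rounds are paced by spark.dynamicAllocation.sustainedSchedulerBacklogTimeout and capped by spark.dynamicAllocation.maxExecutors, neither of which is listed above.

```python
def executor_ramp(pending_target, max_executors):
    """Cumulative executor count after each request round (1, 2, 4, ... added)."""
    target = min(pending_target, max_executors)
    total, step, rounds = 0, 1, []
    while total < target:
        total = min(total + step, target)   # never overshoot what is needed
        rounds.append(total)
        step *= 2                           # each round doubles the request
    return rounds


# A backlog that needs 10 executors is satisfied in four rounds.
print(executor_ramp(pending_target=10, max_executors=50))
# The cap limits the ramp even when the backlog is much larger.
print(executor_ramp(pending_target=100, max_executors=6))
```

The slow start avoids over-requesting for short backlogs, while the doubling lets a genuinely heavy workload reach its target within a few rounds.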

Spark Schedulers

spark.scheduler.mode=FAIR
