The cluster is configured as follows:
Nodes = 10
Cores per node = 16 (1 reserved for the operating system, leaving 15 usable)
RAM per node = 61 GB (1 GB reserved for the Hadoop daemons, leaving 60 GB usable)
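A minimal Python sketch of the usable resources per node. It assumes the reservations above are exactly 1 core and 1 GB per node. The names `NodeSpec`, `usable_cores` and `usable_ram_gb` are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical description of one worker node."""
    cores: int
    ram_gb: int
    reserved_cores: int = 1   # assumed: 1 core kept for the operating system
    reserved_ram_gb: int = 1  # assumed: 1 GB kept for the Hadoop daemons

    @property
    def usable_cores(self) -> int:
        # Cores left for Spark executors after the OS reservation.
        return self.cores - self.reserved_cores

    @property
    def usable_ram_gb(self) -> int:
        # Memory left for Spark executors after the daemon reservation.
        return self.ram_gb - self.reserved_ram_gb


node = NodeSpec(cores=16, ram_gb=61)
print(node.usable_cores, node.usable_ram_gb)  # 15 60
```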
Choosing the number of cores per executor:
The number of cores per executor is the number of tasks that executor can run in parallel. The general rule of thumb is 5 cores per executor (--executor-cores 5).
Choosing the number of executors:
Executors per node = usable cores per node / cores per executor (5 in general)
15 / 5 = 3 executors per node
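The per-node division can be sketched in Python as below. The function name `executors_per_node` is hypothetical. Rounding down when the cores do not divide evenly is an assumption (a partial executor cannot be launched); in this cluster 15 / 5 divides exactly.

```python
def executors_per_node(usable_cores: int, cores_per_executor: int = 5) -> int:
    """Hypothetical helper: how many executors fit on one node."""
    if cores_per_executor <= 0:
        raise ValueError("cores_per_executor must be positive")
    # Floor division: any leftover cores stay idle rather than forming a partial executor.
    return usable_cores // cores_per_executor


print(executors_per_node(15))  # 3
```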
Total executors for the Spark job = number of nodes * executors per node
10 * 3 = 30 (--num-executors 30)
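A short sketch that multiplies out the total and builds the matching spark-submit options. The helper names `total_executors` and `spark_submit_args` are hypothetical. `--num-executors` and `--executor-cores` are real spark-submit options, though `--num-executors` only applies on some cluster managers (such as YARN).

```python
def total_executors(nodes: int, executors_per_node: int) -> int:
    # Every node contributes the same number of executors.
    return nodes * executors_per_node


def spark_submit_args(num_executors: int, executor_cores: int) -> list[str]:
    # Option list that could be appended to a spark-submit command line.
    return [
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
    ]


n = total_executors(nodes=10, executors_per_node=3)
args = spark_submit_args(n, executor_cores=5)
print(n)               # 30
print(" ".join(args))  # --num-executors 30 --executor-cores 5
```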