What rules of thumb can you give me for configuring Storm+Trident?

1 Answer

Make the number of workers a multiple of the number of machines, the parallelism a multiple of the number of workers, and the number of Kafka partitions a multiple of the spout parallelism.
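The divisibility rule above is easy to check mechanically before deploying. The following is only a sketch: `check_sizing` and its parameter names are made up for illustration and are not part of Storm.

```python
def check_sizing(machines, workers, parallelism, spout_parallelism, kafka_partitions):
    """Return a list of violated rules; an empty list means the sizing is consistent."""
    values = (machines, workers, parallelism, spout_parallelism, kafka_partitions)
    if any(v <= 0 for v in values):
        raise ValueError("all counts must be positive integers")

    problems = []
    # Workers should spread evenly over the machines.
    if workers % machines != 0:
        problems.append("workers is not a multiple of machines")
    # Executors should spread evenly over the workers.
    if parallelism % workers != 0:
        problems.append("parallelism is not a multiple of workers")
    # Each spout executor should read the same number of Kafka partitions.
    if kafka_partitions % spout_parallelism != 0:
        problems.append("kafka_partitions is not a multiple of spout_parallelism")
    return problems


# Hypothetical cluster: 4 machines, one worker each.
balanced = check_sizing(machines=4, workers=4, parallelism=16,
                        spout_parallelism=8, kafka_partitions=24)
unbalanced = check_sizing(machines=4, workers=6, parallelism=16,
                          spout_parallelism=8, kafka_partitions=20)
print(balanced)         # []
print(len(unbalanced))  # 3
```

An uneven split is not an error in Storm itself. It just means some machines, workers, or spout executors carry more load than others.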

  1. Use one worker per topology per machine
  2. Start with fewer, larger aggregators, one per machine with workers on it
  3. Use the isolation scheduler
  4. Use one acker per worker. Storm 0.9 makes that the default, but earlier versions do not.
  5. Enable GC logging; you should see very few major GCs if things are in reasonable shape.
  6. Set the Trident batch interval (`topology.trident.batch.emit.interval.millis`) to about 50% of your typical end-to-end latency.
  7. Start with a max spout pending that is certainly too small (one for Trident, or the number of executors for Storm) and increase it until you stop seeing changes in the flow. You'll probably end up with something near 2*(throughput in recs/sec)*(end-to-end latency in seconds), which is twice the Little's law capacity.
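The arithmetic behind the batch interval and max spout pending rules can be sketched as follows. This is a rough illustration, not a Storm API: the function names, the doubling strategy, the 5% tolerance, and the `measure_throughput` callback are all assumptions, and the sample numbers are invented.

```python
import math


def littles_law_pending(throughput_recs_per_sec, latency_sec, headroom=2.0):
    """Estimate max spout pending as headroom * throughput * latency."""
    return math.ceil(headroom * throughput_recs_per_sec * latency_sec)


def batch_interval_millis(latency_sec, fraction=0.5):
    """Trident batch interval as a fraction of end-to-end latency, in ms."""
    return round(latency_sec * 1000 * fraction)


def ramp_up(measure_throughput, start, ceiling, tolerance=0.05):
    """Double max spout pending from `start` until throughput stops improving.

    measure_throughput is a hypothetical callback: it takes a pending value,
    runs the topology with that setting, and returns observed records/sec.
    """
    pending = start
    best = measure_throughput(pending)
    while pending * 2 <= ceiling:
        observed = measure_throughput(pending * 2)
        if observed <= best * (1 + tolerance):
            break  # the flow no longer changes noticeably
        pending, best = pending * 2, observed
    return pending


# Invented numbers: 4000 recs/sec at 0.25 s end-to-end latency.
estimate = littles_law_pending(4000, 0.25)
interval = batch_interval_millis(0.25)
print(estimate)  # 2000
print(interval)  # 125


# Stand-in for a real measurement: throughput saturates at 4000 recs/sec.
def fake_measure(pending):
    return min(pending * 100, 4000)


tuned = ramp_up(fake_measure, start=1, ceiling=estimate)
print(tuned)  # 64
```

In practice the measurement step means redeploying the topology with a new `topology.max.spout.pending` and watching the flow for a while, so the loop is more a description of the manual procedure than something to automate as written. The Little's law estimate serves as a sanity check on where the ramp should level off.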
...