How does Splunk avoid the duplicate indexing of logs?

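On the Splunk instance that monitors the log files (the indexer, or a forwarder that reads them), Splunk keeps track of which files it has already read, and how far into each one, in a directory called fishbucket with the default location: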


It contains seek pointers and CRCs for the files being indexed, so splunkd can tell whether it has already read a file and resume from the recorded position instead of indexing the same data again.
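To make the idea concrete, here is a minimal Python sketch of the mechanism, not Splunk's actual implementation. It assumes a file is identified by a CRC of its first bytes (256 here, an illustrative value) and that a seek pointer per CRC records how much of the file has been consumed. The `FishBucket` class, `HEAD_BYTES`, and `read_new_data` are hypothetical names; the real fishbucket is an on-disk database managed by splunkd.

```python
import os
import tempfile
import zlib

HEAD_BYTES = 256  # assumed size of the file "head" used to identify a file


class FishBucket:
    """Hypothetical in-memory stand-in for the fishbucket: head CRC -> seek pointer."""

    def __init__(self):
        self._seek_pointers = {}

    @staticmethod
    def head_crc(path):
        # Identify the file by the CRC of its first bytes, not by its name,
        # so a renamed (e.g. rotated) file is still recognized.
        with open(path, "rb") as f:
            return zlib.crc32(f.read(HEAD_BYTES))

    def read_new_data(self, path):
        """Return only bytes not read before, then advance the seek pointer."""
        crc = self.head_crc(path)
        offset = self._seek_pointers.get(crc, 0)  # unknown file: start at 0
        with open(path, "rb") as f:
            f.seek(offset)
            data = f.read()
            self._seek_pointers[crc] = f.tell()  # remember how far we got
        return data


if __name__ == "__main__":
    bucket = FishBucket()
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "app.log")
        with open(log, "wb") as f:
            f.write(b"event line\n" * 30)  # 330 bytes, longer than HEAD_BYTES
        print(len(bucket.read_new_data(log)))  # 330: first read takes everything
        print(len(bucket.read_new_data(log)))  # 0: nothing new, no duplicates
        with open(log, "ab") as f:
            f.write(b"new event\n")
        print(bucket.read_new_data(log))  # b'new event\n': only the appended data
        rotated = os.path.join(tmp, "app.log.1")
        os.rename(log, rotated)
        print(len(bucket.read_new_data(rotated)))  # 0: same head CRC, not re-read
```

Keying on the CRC of the file's beginning rather than its path is what lets a rotated or renamed log be recognized as already read. One consequence of this simplified scheme: a file shorter than `HEAD_BYTES` that later grows will get a new CRC and be treated as a new file.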