How Does Data Age in Splunk?


Data entering an indexer is stored in directories known as buckets. Over time, these buckets roll through different stages: hot, warm, cold, frozen, and finally thawed. Incoming data passes through the indexer's processing pipelines, where event processing takes place in two stages: parsing breaks the raw data into individual events, while indexing writes those events to disk in the index.


This is what happens to the data at each stage of the indexing pipeline:

As soon as data enters the pipeline, it goes into a hot bucket. There can be multiple hot buckets at any point in time; these are the only buckets Splunk writes to, and they are also searchable.

When Splunk restarts, or when a hot bucket reaches a certain size or age threshold, a new hot bucket is created in its place and the existing one rolls to become a warm bucket. Warm buckets remain searchable, but Splunk no longer writes to them.
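The hot-bucket limits above are controlled per index in indexes.conf. A minimal sketch, assuming a hypothetical index named `my_index` (the values are illustrative examples, not Splunk defaults):

```ini
# indexes.conf -- illustrative settings for a hypothetical index
[my_index]
# Maximum number of hot buckets that can exist at once
maxHotBuckets = 3
# Roll a hot bucket to warm once it reaches this size
# ("auto" lets Splunk choose; a fixed number is interpreted in MB)
maxDataSize = auto
```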

Further, when the number of warm buckets exceeds the configured limit, the oldest warm bucket rolls to become a cold one. Splunk selects it automatically and moves it without renaming it. Hot and warm buckets live in the index's home path, which for the default index is ‘$SPLUNK_HOME/var/lib/splunk/defaultdb/db/’, while cold buckets move to the cold path, by default ‘$SPLUNK_HOME/var/lib/splunk/defaultdb/colddb/’.
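The warm-to-cold rollover is likewise driven by indexes.conf. A sketch for the same hypothetical index, with example values:

```ini
# indexes.conf -- illustrative warm-to-cold settings (example values)
[my_index]
homePath = $SPLUNK_DB/my_index/db        # hot and warm buckets
coldPath = $SPLUNK_DB/my_index/colddb    # cold buckets
# When warm buckets exceed this count, the oldest rolls to cold
maxWarmDBCount = 300
```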

After a certain period of time (or once the index exceeds its maximum total size), a cold bucket rolls to become a frozen bucket. Frozen buckets are not searchable and do not share a location with the previous buckets. Depending on your priorities, frozen data is either archived or deleted; deletion is Splunk's default behavior.
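Retention, i.e. when cold rolls to frozen and whether frozen data is archived rather than deleted, is also set per index in indexes.conf. A sketch with example values for the hypothetical `my_index`:

```ini
# indexes.conf -- illustrative retention settings (example values)
[my_index]
# Roll cold buckets to frozen once events are older than ~1 year
frozenTimePeriodInSecs = 31536000
# ...or once the whole index exceeds this total size (MB)
maxTotalDataSizeMB = 500000
# Archive instead of delete: copy frozen buckets to this directory
coldToFrozenDir = /archive/splunk/my_index
```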

If frozen data is deleted it is gone for good, but an archived frozen bucket can be brought back. The process of restoring an archived bucket is known as thawing: the bucket is copied into the index's thawed path (by default ‘$SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/’) and, once rebuilt, becomes searchable again.
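A minimal sketch of the thawing steps, assuming the bucket was archived via `coldToFrozenDir` as above; the bucket directory name `db_1549226400_1549226000_1001` is hypothetical:

```shell
# Copy the archived bucket into the index's thawed path
cp -r /archive/splunk/my_index/db_1549226400_1549226000_1001 \
      "$SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/"

# Rebuild the bucket's index files so Splunk can search it again
splunk rebuild "$SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/db_1549226400_1549226000_1001"

# Restart Splunk so the thawed bucket is picked up
splunk restart
```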