An application is publishing a custom CloudWatch metric any time an HTTP 504 error appears in the application error logs. These errors are being received intermittently.

There is a CloudWatch Alarm for this metric, and the Developer would like the alarm to trigger ONLY if it breaches two evaluation periods or more. What should be done to meet these requirements?

1 Answer

Answer - D.

Our scenario states that we are receiving HTTP Error 504 intermittently.

The scenario requires that the ALARM should trigger ONLY if it breaches 2 evaluation periods.

None of options A, B, or C is a good choice.

When you create an alarm, you specify three settings to enable CloudWatch to evaluate when to change the alarm state:

Period is the length of time over which the metric is evaluated to create each individual data point.

It is expressed in seconds.

If you choose one minute as the period, there is one data point every minute.

Evaluation Period is the number of the most recent data points to evaluate when determining alarm state.

Datapoints to Alarm is the number of data points within the evaluation period that must be breaching to cause the alarm to go to the ALARM state.

The breaching data points do not have to be consecutive.

They must all be within the last number of data points equal to the Evaluation Period.
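The rule described above can be sketched in a few lines of Python. This is only an illustration of the "M out of N" idea, not how CloudWatch is implemented: the function name `alarm_state` is made up for this example, "breaching" is assumed to mean "greater than the threshold", and missing data and the INSUFFICIENT_DATA state are ignored.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int, datapoints_to_alarm: int) -> str:
    """Return "ALARM" or "OK" for the most recent data points (hypothetical helper)."""
    if evaluation_periods < 1 or not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("need 1 <= datapoints_to_alarm <= evaluation_periods")
    # Only the last `evaluation_periods` data points are considered.
    recent = datapoints[-evaluation_periods:]
    # Count breaching points; they do not have to be consecutive.
    breaching = sum(1 for value in recent if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# Two of the last three points breach, with a gap between them.
print(alarm_state([5, 1, 5], threshold=3, evaluation_periods=3, datapoints_to_alarm=2))
# Only one of the last three points breaches.
print(alarm_state([5, 5, 1, 1, 5], threshold=3, evaluation_periods=3, datapoints_to_alarm=2))
```

The first call prints ALARM and the second prints OK, because the older breaching points have already dropped out of the three-point window.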

Let us look at an example.

In the following figure, the alarm threshold is set to three units.

The alarm is configured with both Evaluation Period and Datapoints to Alarm set to 3.

That is, when all three data points in the most recent three consecutive periods are above the threshold, the alarm goes to the ALARM state.

In the figure, this happens in the third through fifth time periods.

At period six, the value dips below the threshold, so one of the periods being evaluated is not breaching, and the alarm state changes to OK.

During the ninth time period, the threshold is breached again, but for only one period.

Consequently, the alarm state remains OK.
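The walkthrough above can be replayed with a small simulation. The metric values below are invented to match the description (breaches in periods three through five and again in period nine); they are not read from the figure. The helper name `state_timeline` is hypothetical, and periods with fewer than three data points are simply reported as OK here.

```python
from collections import deque


def state_timeline(values, threshold, evaluation_periods, datapoints_to_alarm):
    """Return the alarm state after each period (simplified sketch)."""
    # Sliding window over the most recent `evaluation_periods` data points,
    # storing True for a breaching point and False otherwise.
    window = deque(maxlen=evaluation_periods)
    states = []
    for value in values:
        window.append(value > threshold)
        states.append("ALARM" if sum(window) >= datapoints_to_alarm else "OK")
    return states


# Hypothetical values for periods 1..10, threshold of three units.
values = [1, 2, 4, 5, 4, 2, 1, 2, 5, 2]
for period, state in enumerate(state_timeline(values, 3, 3, 3), start=1):
    print(period, state)
```

With these values only period 5 reports ALARM: periods 3, 4, and 5 all breach, period 6 breaks the run, and the single breach in period 9 is not enough.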

Option A is incorrect since here there is no mention of any special kind of notification.

Option B is incorrect since you don't need to publish a 0 value.

Publishing a value of 1 whenever the error is received is enough.

Option C is incorrect since there is no mention of the frequency, so we don't know if we need high resolution for metrics.

For more information on the aggregation of data in CloudWatch, please refer to the link below:

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-common-scenarios.html#CloudWatch-Agent-aggregating-metrics

[Figure: metric value plotted against the threshold, with the y-axis in units. After 3 periods over threshold, an action is invoked. With only one period over threshold, no action is invoked.]

The correct answer is D.

Explanation:

In this scenario, the Developer wants the CloudWatch alarm to trigger only if the metric breaches two evaluation periods or more, so we need to adjust the evaluation period and the number of breaching data points required to trigger the alarm.

A CloudWatch alarm has two important parameters to consider for this scenario: Evaluation Period and Datapoints to Alarm.

Evaluation Period specifies how many of the most recent data points the alarm evaluates when determining its state. The length of each data point is set separately by Period, which is expressed in seconds; for a standard-resolution metric it is a multiple of 60. For example, with a Period of 300 seconds (5 minutes) and an Evaluation Period of 2, CloudWatch evaluates the two most recent 5-minute data points.

Datapoints to Alarm specifies the number of data points within the evaluation period that must breach the threshold before the alarm triggers. This parameter helps prevent false alarms triggered by occasional spikes in the metric data.

To meet the requirements of this scenario, we need to set the Evaluation Period and Datapoints to Alarm parameters appropriately. Custom notifications and high-resolution metrics do not address this requirement. Similarly, publishing a value of zero for the metric when there are no errors is not needed, since the requirement is about how many breaching periods it takes to trigger the alarm, not about what is published when there are no errors.

Therefore, the correct answer is D. We should set the Evaluation Period to 2 and Datapoints to Alarm to 2. With these settings, the alarm triggers only when the metric breaches the threshold in both of the two most recent periods. This ensures that the alarm triggers when the HTTP 504 error is recurring rather than a one-time event.
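As a rough sketch, these settings could be applied with boto3's `put_metric_alarm` call. The alarm name, namespace, metric name, statistic, period, and threshold below are assumptions made for illustration; the question does not give them. Only `EvaluationPeriods` and `DatapointsToAlarm` reflect the answer itself.

```python
def build_alarm_params(period_seconds: int = 60) -> dict:
    """Build keyword arguments for put_metric_alarm (names are hypothetical)."""
    return {
        "AlarmName": "http-504-errors",    # hypothetical alarm name
        "Namespace": "MyApp",              # hypothetical custom namespace
        "MetricName": "Http504Count",      # hypothetical custom metric
        "Statistic": "Sum",                # assumption: count errors per period
        "Period": period_seconds,          # length of one data point, in seconds
        "EvaluationPeriods": 2,            # look at the 2 most recent data points
        "DatapointsToAlarm": 2,            # both must breach to enter ALARM
        "Threshold": 1.0,                  # assumption: one or more errors breaches
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }


def create_alarm(params: dict) -> None:
    """Create or update the alarm; requires boto3 and AWS credentials."""
    import boto3  # imported here so the parameters can be inspected without boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**params)


params = build_alarm_params()
print(params["EvaluationPeriods"], params["DatapointsToAlarm"])
```

Calling `create_alarm(params)` would send the request to AWS. Keeping the parameter dictionary separate makes it easy to review the two values that matter here before anything is created.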

In summary, the correct answer is D.

...