Class TableMultiResultWorker<S,R extends TableMultiResult>
- All Implemented Interfaces:
Runnable
TODO: Instead of a fixed history size, aggregate data into larger time ranges and keep track of mean, min, max, and standard deviation (or perhaps 5th/95th percentile?). Keep the following time ranges:
1 minute for 2 days = 2880 samples
5 minutes for 5 days = 1440 samples
15 minutes for 7 days = 672 samples
30 minutes for 14 days = 672 samples
1 hour for 28 days = 672 samples
2 hours for 56 days = 672 samples
4 hours for 112 days = 672 samples
1 day forever beyond this
==================================
total: 7680 samples + one per day beyond 224 days
Update in a single background thread across all workers. Handle recovery from an unexpected shutdown gracefully by inserting the aggregate before removing its samples, and detect any incomplete removal on the next aggregation. Also, the linked list should always be sorted by time descending; confirm this on each aggregation pass.
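The arithmetic behind the proposed time ranges can be checked with a short sketch. The class and method names below (RetentionTiers, samplesIn, totalSamples, totalDays) are hypothetical and are not part of this API; the code only reproduces the sample counts listed in the TODO.

```java
/** Hypothetical illustration of the retention tiers proposed in the TODO; not part of the API. */
final class RetentionTiers {

  /** Each row is {resolution in minutes, retention in days}, copied from the TODO list. */
  static final long[][] TIERS = {
    {1, 2}, {5, 5}, {15, 7}, {30, 14}, {60, 28}, {120, 56}, {240, 112}
  };

  /** Number of samples a tier holds when full. */
  static long samplesIn(long resolutionMinutes, long retentionDays) {
    return retentionDays * 24 * 60 / resolutionMinutes;
  }

  /** Total samples across all fixed-length tiers (excludes the open-ended daily tier). */
  static long totalSamples() {
    long total = 0;
    for (long[] tier : TIERS) {
      total += samplesIn(tier[0], tier[1]);
    }
    return total;
  }

  /** Total days covered before the one-sample-per-day tier begins. */
  static long totalDays() {
    long days = 0;
    for (long[] tier : TIERS) {
      days += tier[1];
    }
    return days;
  }

  public static void main(String[] args) {
    // Prints "7680 samples over 224 days"
    System.out.println(totalSamples() + " samples over " + totalDays() + " days");
  }
}
```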
- Author:
- AO Industries, Inc.
-
Constructor Summary
Modifier | Constructor | Description
protected | TableMultiResultWorker(File persistenceFile, Serializer<R> serializer) |
Method Summary
Modifier and Type | Method | Description
protected void | cancel | Cancels the current getSample call on a best-effort basis.
protected abstract AlertLevelAndMessage | getAlertLevelAndMessage(S sample, Iterable<? extends R> previousResults) | Determines the alert level and message for the provided result.
protected long | getFutureTimeout() | The default future timeout is 5 minutes.
protected TimeUnit | getFutureTimeoutUnit | The default future timeout unit is MINUTES.
protected abstract int | getHistorySize() | The number of history items to store.
protected int | getNextStartupDelay() | The default startup delay is within five minutes.
protected abstract S | getSample | This is the main monitor routine.
protected long | getSleepDelay(boolean lastSuccessful, AlertLevel alertLevel) | The default sleep delay is five minutes when successful or one minute when unsuccessful.
protected boolean | isIncrementalRampUp(boolean isError) | Enables incremental alert level ramp-up, where the node's alert level is only incremented one step at a time per monitoring pass.
protected abstract R | newErrorResult(long time, long latency, AlertLevel alertLevel, String error) | Creates a new result container object for an error condition.
protected abstract R | newSampleResult(long time, long latency, AlertLevel alertLevel, S sample) | Creates a new result container object for a success condition.
final void | run() |
protected boolean | useFutureTimeout() | By default, the call to getSample uses a Future and times out at 5 minutes.
-
Constructor Details
-
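TableMultiResultWorker
protected TableMultiResultWorker(File persistenceFile, Serializer<R> serializer) throws IOException
- Throws:
IOException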
-
-
Method Details
-
getNextStartupDelay
protected int getNextStartupDelay()
The default startup delay is within five minutes.
-
isIncrementalRampUp
protected boolean isIncrementalRampUp(boolean isError)
Enables incremental alert level ramp-up, where the node's alert level is only incremented one step at a time per monitoring pass. This makes the resource more tolerant of intermittent problems, at the cost of slower reaction time.
- Implementation Note:
Enabled by default
-
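The ramp-up rule can be illustrated with a minimal sketch. It assumes a hypothetical Level enum standing in for AlertLevel, with ordering from least to most severe, and it assumes that only increases are limited while decreases apply immediately; neither detail is confirmed by this page.

```java
/** Hypothetical sketch of incremental alert level ramp-up; not the library's implementation. */
final class RampUpSketch {

  /** Stand-in for AlertLevel; the constants and their ordering are assumptions. */
  enum Level { NONE, LOW, MEDIUM, HIGH, CRITICAL }

  /**
   * Computes the level to report after one monitoring pass.
   * With ramp-up enabled, the level rises by at most one step per pass.
   * Assumption: a lower measured level is applied immediately.
   */
  static Level next(Level current, Level measured, boolean incrementalRampUp) {
    if (incrementalRampUp && measured.ordinal() > current.ordinal() + 1) {
      return Level.values()[current.ordinal() + 1];
    }
    return measured;
  }

  public static void main(String[] args) {
    Level level = Level.NONE;
    // A resource that suddenly measures CRITICAL takes four passes to reach it.
    for (int pass = 1; pass <= 4; pass++) {
      level = next(level, Level.CRITICAL, true);
      System.out.println("pass " + pass + ": " + level);
    }
  }
}
```

Under these assumptions, a single bad pass raises the node by only one step, which is the tolerance for intermittent problems described above.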
run
public final void run() -
getSleepDelay
protected long getSleepDelay(boolean lastSuccessful, AlertLevel alertLevel)
The default sleep delay is five minutes when successful or one minute when unsuccessful.
- Parameters:
alertLevel - When null, treated as AlertLevel.UNKNOWN
-
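As a rough sketch of the documented default, the delay policy might look like the following. The Level enum is a hypothetical stand-in for AlertLevel, and returning the delay in milliseconds is an assumption; this page does not state the unit of the returned long.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the default sleep delay policy; not the library's implementation. */
final class SleepDelaySketch {

  /** Stand-in for AlertLevel; only UNKNOWN is named on this page, the rest are assumptions. */
  enum Level { NONE, LOW, MEDIUM, HIGH, CRITICAL, UNKNOWN }

  /** A null alert level is treated as UNKNOWN, as documented for the alertLevel parameter. */
  static Level effectiveLevel(Level alertLevel) {
    return alertLevel == null ? Level.UNKNOWN : alertLevel;
  }

  /**
   * Five minutes after a successful pass, one minute after a failed pass.
   * Assumption: the delay is expressed in milliseconds.
   * The documented default depends only on lastSuccessful; an override could also
   * consult effectiveLevel(alertLevel).
   */
  static long sleepDelayMillis(boolean lastSuccessful, Level alertLevel) {
    return lastSuccessful
        ? TimeUnit.MINUTES.toMillis(5)
        : TimeUnit.MINUTES.toMillis(1);
  }
}
```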
getHistorySize
protected abstract int getHistorySize()
The number of history items to store.
-
getSample
This is the main monitor routine. Gets the current sample for this worker; any error should result in an exception. The sample may be any object that encapsulates the state of the resource in order to determine its alert level, alert message, and overall result.
- Throws:
Exception
-
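newErrorResult
protected abstract R newErrorResult(long time, long latency, AlertLevel alertLevel, String error)
Creates a new result container object for an error condition.
-
newSampleResult
protected abstract R newSampleResult(long time, long latency, AlertLevel alertLevel, S sample)
Creates a new result container object for a success condition.
-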
cancel
Cancels the current getSample call on a best-effort basis. Implementations of this method must not block. This default implementation calls future.cancel(true).
-
getAlertLevelAndMessage
protected abstract AlertLevelAndMessage getAlertLevelAndMessage(S sample, Iterable<? extends R> previousResults) throws Exception
Determines the alert level and message for the provided result. If unable to parse, may throw an exception to report the error. This should not block or delay for any reason.
- Throws:
Exception
-
useFutureTimeout
protected boolean useFutureTimeout()
By default, the call to getSample uses a Future and times out at 5 minutes. If the monitoring check cannot block indefinitely, it is more efficient to not use this decoupling.
-
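The decoupling described here follows the standard java.util.concurrent pattern of submitting the work to an executor and waiting on the Future with a timeout. The sketch below is illustrative only: sampleWithTimeout and the class name are hypothetical, and how this worker actually wires up its executor is not shown on this page.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of running a sampling call under a Future timeout. */
final class FutureTimeoutSketch {

  /**
   * Runs the sampler on the executor and waits up to the given timeout.
   * On timeout, the task is cancelled with interruption (best effort, non-blocking).
   */
  static <S> S sampleWithTimeout(
      ExecutorService executor, Callable<S> sampler, long timeout, TimeUnit unit)
      throws Exception {
    Future<S> future = executor.submit(sampler);
    try {
      return future.get(timeout, unit);
    } catch (TimeoutException e) {
      future.cancel(true); // mirrors the default cancel behavior described on this page
      throw e;
    } catch (ExecutionException e) {
      // Surface the sampler's own exception where possible
      Throwable cause = e.getCause();
      if (cause instanceof Exception) {
        throw (Exception) cause;
      }
      throw e;
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      System.out.println(sampleWithTimeout(executor, () -> "ok", 5, TimeUnit.MINUTES));
      try {
        sampleWithTimeout(
            executor,
            () -> {
              Thread.sleep(10_000);
              return "late";
            },
            100,
            TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        System.out.println("timed out");
      }
    } finally {
      executor.shutdownNow();
    }
  }
}
```

A check that is guaranteed to return promptly gains nothing from the extra thread hand-off, which is why the decoupling can be turned off for such workers.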
getFutureTimeout
protected long getFutureTimeout()
The default future timeout is 5 minutes.
-
getFutureTimeoutUnit
The default future timeout unit is MINUTES.
-