Class TableMultiResultWorker<S,R extends TableMultiResult>

java.lang.Object
com.aoindustries.noc.monitor.TableMultiResultWorker<S,R>
All Implemented Interfaces:
Runnable

public abstract class TableMultiResultWorker<S,R extends TableMultiResult> extends Object implements Runnable
The worker for a table multi-result node.

TODO: Instead of a fixed history size, aggregate data into larger time ranges and keep track of mean, min, max, and standard deviation (or perhaps 5th/95th percentile?). Keep the following time ranges:

 1 minute for 2 days = 2880 samples
 5 minutes for 5 days = 1440 samples
 15 minutes for 7 days = 672 samples
 30 minutes for 14 days = 672 samples
 1 hour for 28 days = 672 samples
 2 hours for 56 days = 672 samples
 4 hours for 112 days = 672 samples
 1 day forever beyond this
 ==================================
 total: 7680 samples + one per day beyond 224 days
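The sample counts in this table follow from dividing each retention period by its aggregation interval. The small sketch below recomputes them. It assumes each tier covers the days immediately after the previous one, as the 224-day total implies. RetentionTiers and Tier are illustrative names only, not part of this API:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the retention tiers proposed in the TODO; not part of the API.
class RetentionTiers {

    /** One aggregation tier: one sample every intervalMinutes, kept for days. */
    static final class Tier {
        final int intervalMinutes;
        final int days;

        Tier(int intervalMinutes, int days) {
            this.intervalMinutes = intervalMinutes;
            this.days = days;
        }

        /** Number of samples needed to cover this tier's retention period. */
        int samples() {
            int minutesPerDay = 24 * 60;
            return days * minutesPerDay / intervalMinutes;
        }
    }

    static final List<Tier> TIERS = Arrays.asList(
        new Tier(1, 2),     // 1 minute for 2 days
        new Tier(5, 5),     // 5 minutes for 5 days
        new Tier(15, 7),    // 15 minutes for 7 days
        new Tier(30, 14),   // 30 minutes for 14 days
        new Tier(60, 28),   // 1 hour for 28 days
        new Tier(120, 56),  // 2 hours for 56 days
        new Tier(240, 112)  // 4 hours for 112 days
    );

    static int totalSamples() {
        int total = 0;
        for (Tier t : TIERS) total += t.samples();
        return total;
    }

    static int totalDays() {
        int total = 0;
        for (Tier t : TIERS) total += t.days;
        return total;
    }

    public static void main(String[] args) {
        for (Tier t : TIERS) {
            System.out.println(t.intervalMinutes + " min x " + t.days + " days = " + t.samples());
        }
        // Anything older than totalDays() would fall into the "1 day forever" tier.
        System.out.println("total: " + totalSamples() + " samples over " + totalDays() + " days");
    }
}
```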
 

Update in a single background thread across all workers, and handle recovery from an unexpected shutdown gracefully: insert each aggregate before removing the samples it replaces, so that an interrupted aggregation can be detected on the next aggregation pass. Also, the linked list should always be sorted by time, descending; confirm this on each aggregation pass.
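Below is a minimal sketch of one aggregation pass as proposed in this TODO. It assumes the history is a LinkedList ordered newest first and that only raw samples are rolled up. Point, aggregateBucket and checkDescending are hypothetical names. The aggregate is inserted before its source samples are removed, so an interrupted pass leaves duplicates rather than gaps. The descending order is verified before and after the pass. Detecting leftover duplicates on the next pass is not shown.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical sketch of the aggregation pass proposed in the TODO; not part of the API.
class AggregationSketch {

    /** A raw sample (count == 1) or an aggregate covering several samples. */
    static final class Point {
        final long time; // start of the covered time range
        final int count;
        final double mean, min, max, stdDev;

        Point(long time, int count, double mean, double min, double max, double stdDev) {
            this.time = time;
            this.count = count;
            this.mean = mean;
            this.min = min;
            this.max = max;
            this.stdDev = stdDev;
        }

        static Point sample(long time, double value) {
            return new Point(time, 1, value, value, value, 0.0);
        }
    }

    /** Throws if the history is not sorted by time, newest first. */
    static void checkDescending(List<Point> history) {
        long previous = Long.MAX_VALUE;
        for (Point p : history) {
            if (p.time > previous) throw new IllegalStateException("History not sorted by time descending");
            previous = p.time;
        }
    }

    /** Rolls the raw samples in [bucketStart, bucketStart + bucketLength) into one aggregate. */
    static void aggregateBucket(LinkedList<Point> history, long bucketStart, long bucketLength) {
        checkDescending(history);
        List<Point> members = new ArrayList<>();
        for (Point p : history) {
            if (p.count == 1 && p.time >= bucketStart && p.time < bucketStart + bucketLength) members.add(p);
        }
        if (members.size() < 2) return; // nothing to gain from aggregating
        double sum = 0, min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (Point p : members) {
            sum += p.mean;
            min = Math.min(min, p.mean);
            max = Math.max(max, p.mean);
        }
        double mean = sum / members.size();
        double squares = 0;
        for (Point p : members) squares += (p.mean - mean) * (p.mean - mean);
        Point aggregate = new Point(bucketStart, members.size(), mean, min, max,
                Math.sqrt(squares / members.size())); // population standard deviation

        // 1. Insert the aggregate before the first point older than the bucket.
        ListIterator<Point> it = history.listIterator();
        while (it.hasNext()) {
            if (it.next().time < bucketStart) {
                it.previous();
                break;
            }
        }
        it.add(aggregate);
        // 2. Only then remove the samples it replaces (identity comparison: Point has no equals).
        history.removeIf(members::contains);
        checkDescending(history);
    }

    public static void main(String[] args) {
        LinkedList<Point> history = new LinkedList<>();
        for (int i = 5; i >= 1; i--) history.add(Point.sample(i * 50L, i)); // times 250..50, newest first
        aggregateBucket(history, 100, 100); // rolls up the samples at times 100 and 150
        for (Point p : history) {
            System.out.println(p.time + ": count=" + p.count + " mean=" + p.mean + " stdDev=" + p.stdDev);
        }
    }
}
```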

Author:
AO Industries, Inc.
  • Constructor Details

  • Method Details

    • getNextStartupDelay

      protected int getNextStartupDelay()
      The default startup delay is within five minutes.
    • isIncrementalRampUp

      protected boolean isIncrementalRampUp(boolean isError)
      Enables incremental alert level ramp-up, where the node's alert level is only incremented one step at a time per monitoring pass. This makes the resource more tolerant of intermittent problems, at the cost of slower reaction time.

      Implementation Note:
      Enabled by default
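Below is a minimal sketch of incremental ramp-up as described above. Level is a hypothetical stand-in for AlertLevel, and the isError parameter is not modeled. Letting decreases take effect immediately is an assumption this page does not confirm:

```java
// Hypothetical sketch; Level is a stand-in, not the real AlertLevel enum.
class RampUpSketch {
    enum Level { NONE, LOW, MEDIUM, HIGH, CRITICAL }

    /**
     * Returns the level to report this pass. With incremental ramp-up, an increase is
     * limited to one step per pass; decreases (an assumption here) apply immediately.
     */
    static Level nextLevel(Level current, Level target, boolean incrementalRampUp) {
        if (!incrementalRampUp || target.compareTo(current) <= 0) return target;
        return Level.values()[current.ordinal() + 1];
    }

    public static void main(String[] args) {
        Level level = Level.NONE;
        // A sudden CRITICAL condition is reached over four passes instead of one.
        for (int pass = 1; pass <= 5; pass++) {
            level = nextLevel(level, Level.CRITICAL, true);
            System.out.println("pass " + pass + ": " + level);
        }
    }
}
```

With ramp-up enabled, a sudden CRITICAL condition is reported as LOW, MEDIUM, HIGH, then CRITICAL over four passes. This is the slower reaction time the description mentions.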

    • run

      public final void run()
      Specified by:
      run in interface Runnable
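To make the documented hooks concrete, here is a heavily simplified, hypothetical loop in the spirit of run(). Each pass calls getSample and turns the outcome into a success or error result. It keeps a bounded, newest-first history and sleeps for a delay chosen by whether the pass succeeded. MiniWorker, describe and the pass limit are illustrative only. The startup delay, Future timeout and alert-level computation are left out, and nothing here reflects the real final implementation:

```java
import java.util.LinkedList;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified monitoring loop; not the actual TableMultiResultWorker.run().
abstract class MiniWorker<S> implements Runnable {
    final LinkedList<String> history = new LinkedList<>(); // newest result first
    private final int passes; // bounded so the demo terminates; a real worker loops until stopped

    MiniWorker(int passes) {
        this.passes = passes;
    }

    abstract S getSample() throws Exception;

    abstract String describe(S sample); // stands in for alert level and result creation

    int getHistorySize() {
        return 3;
    }

    long getSleepDelayMillis(boolean lastSuccessful) {
        return lastSuccessful ? 50 : 10; // shortened from minutes for the demo
    }

    @Override
    public final void run() {
        for (int i = 0; i < passes; i++) {
            boolean successful;
            String result;
            try {
                result = "OK: " + describe(getSample());
                successful = true;
            } catch (Exception e) {
                result = "ERROR: " + e.getMessage();
                successful = false;
            }
            history.addFirst(result);
            while (history.size() > getHistorySize()) history.removeLast();
            try {
                TimeUnit.MILLISECONDS.sleep(getSleepDelayMillis(successful));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop
                return;
            }
        }
    }

    public static void main(String[] args) {
        int[] counter = {0};
        MiniWorker<Integer> worker = new MiniWorker<Integer>(5) {
            @Override
            Integer getSample() throws Exception {
                int n = ++counter[0];
                if (n % 2 == 0) throw new Exception("sample " + n + " failed");
                return n;
            }

            @Override
            String describe(Integer sample) {
                return "sample " + sample;
            }
        };
        worker.run();
        System.out.println(worker.history); // [OK: sample 5, ERROR: sample 4 failed, OK: sample 3]
    }
}
```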
    • getSleepDelay

      protected long getSleepDelay(boolean lastSuccessful, AlertLevel alertLevel)
      The default sleep delay is five minutes when successful or one minute when unsuccessful.
      Parameters:
      lastSuccessful - whether the last monitoring pass was successful
      alertLevel - When null, treated as AlertLevel.UNKNOWN
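Below is a sketch of the documented default, plus a hypothetical override that also considers the alert level. The milliseconds unit and the Level enum are assumptions, since this page does not state the unit of the returned delay:

```java
import java.util.concurrent.TimeUnit;

// Sketch of the documented default sleep delay and a hypothetical override.
class SleepDelaySketch {
    enum Level { NONE, LOW, MEDIUM, HIGH, CRITICAL, UNKNOWN } // stand-in for AlertLevel

    /** Documented default: five minutes after success, one minute after failure (milliseconds assumed). */
    static long defaultSleepDelay(boolean lastSuccessful, Level alertLevel) {
        return TimeUnit.MINUTES.toMillis(lastSuccessful ? 5 : 1);
    }

    /** Hypothetical override: poll every minute while the level is HIGH, CRITICAL, or UNKNOWN. */
    static long urgentSleepDelay(boolean lastSuccessful, Level alertLevel) {
        Level level = (alertLevel == null) ? Level.UNKNOWN : alertLevel; // null is treated as UNKNOWN
        boolean urgent = level == Level.HIGH || level == Level.CRITICAL || level == Level.UNKNOWN;
        return TimeUnit.MINUTES.toMillis(lastSuccessful && !urgent ? 5 : 1);
    }

    public static void main(String[] args) {
        System.out.println(defaultSleepDelay(true, Level.NONE));  // 300000
        System.out.println(defaultSleepDelay(false, Level.NONE)); // 60000
        System.out.println(urgentSleepDelay(true, Level.HIGH));   // 60000
        System.out.println(urgentSleepDelay(true, null));         // 60000
    }
}
```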
    • getHistorySize

      protected abstract int getHistorySize()
      The number of history items to store.
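Below is one way a fixed history size can be enforced, sketched with a newest-first LinkedList to match the descending order the TODO above calls for. BoundedHistory is a hypothetical helper. Treating its contents as the previousResults passed to getAlertLevelAndMessage is an assumption:

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch of a fixed-size, newest-first history bounded by getHistorySize().
class BoundedHistory<R> {
    private final LinkedList<R> items = new LinkedList<>();
    private final int historySize;

    BoundedHistory(int historySize) {
        if (historySize < 1) throw new IllegalArgumentException("historySize must be positive");
        this.historySize = historySize;
    }

    /** Adds the newest result at the head, then drops the oldest results beyond the limit. */
    void add(R result) {
        items.addFirst(result);
        while (items.size() > historySize) items.removeLast();
    }

    /** Read-only view, newest first. */
    List<R> previousResults() {
        return Collections.unmodifiableList(items);
    }

    public static void main(String[] args) {
        BoundedHistory<String> history = new BoundedHistory<>(3);
        for (String result : new String[] {"a", "b", "c", "d", "e"}) history.add(result);
        System.out.println(history.previousResults()); // [e, d, c]
    }
}
```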
    • getSample

      protected abstract S getSample() throws Exception
      This is the main monitor routine. It gets the current sample for this worker; any error should result in an exception. The sample may be any object that encapsulates the state of the resource, from which its alert level, alert message, and overall result are determined.
      Throws:
      Exception
    • newErrorResult

      protected abstract R newErrorResult(long time, long latency, AlertLevel alertLevel, String error)
      Creates a new result container object for an error condition.
    • newSampleResult

      protected abstract R newSampleResult(long time, long latency, AlertLevel alertLevel, S sample)
      Creates a new result container object for a success condition.
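Below is a sketch of how a single pass might choose between newSampleResult and newErrorResult. A sample that returns normally becomes a sample result, and any exception from getSample becomes an error result. Result, Level and samplePass are hypothetical. The nanosecond latency, the CRITICAL level for errors and the placeholder NONE level are assumptions, not documented behavior:

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: how one pass might choose between the two factory methods.
class ResultFactorySketch {
    enum Level { NONE, LOW, MEDIUM, HIGH, CRITICAL, UNKNOWN } // stand-in for AlertLevel

    static final class Result {
        final long time;
        final long latency;
        final Level level;
        final String error;  // set only for error results
        final Object sample; // set only for sample results

        Result(long time, long latency, Level level, String error, Object sample) {
            this.time = time;
            this.latency = latency;
            this.level = level;
            this.error = error;
            this.sample = sample;
        }
    }

    static Result newErrorResult(long time, long latency, Level level, String error) {
        return new Result(time, latency, level, error, null);
    }

    static Result newSampleResult(long time, long latency, Level level, Object sample) {
        return new Result(time, latency, level, null, sample);
    }

    /** Runs one sample; any exception becomes an error result. */
    static <S> Result samplePass(Callable<S> getSample) {
        long time = System.currentTimeMillis();
        long start = System.nanoTime();
        try {
            S sample = getSample.call();
            // A real worker would derive the level from getAlertLevelAndMessage; NONE is a placeholder.
            return newSampleResult(time, System.nanoTime() - start, Level.NONE, sample);
        } catch (Exception e) {
            // CRITICAL for every error is an assumption made for this sketch.
            return newErrorResult(time, System.nanoTime() - start, Level.CRITICAL, e.toString());
        }
    }

    public static void main(String[] args) {
        Result ok = samplePass(() -> 42);
        Result failed = samplePass(() -> {
            throw new java.io.IOException("connection refused");
        });
        System.out.println(ok.level + " " + ok.sample);       // NONE 42
        System.out.println(failed.level + " " + failed.error); // CRITICAL java.io.IOException: connection refused
    }
}
```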
    • cancel

      protected void cancel(Future<S> future)
      Cancels the current getSample call on a best-effort basis. Implementations of this method must not block. This default implementation calls future.cancel(true).
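The default cancel relies on Future.cancel(true), which requests interruption of the running task and returns without waiting for it. This sketch shows that a getSample stand-in blocked in Thread.sleep is interrupted. CancelSketch is an illustrative name:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Illustrates what the default cancel(Future) relies on: Future.cancel(true) interrupts the task.
class CancelSketch {

    /** Returns true if a hanging "sample" was interrupted by cancel(true). */
    static boolean cancelHangingSample() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch interrupted = new CountDownLatch(1);
        try {
            Future<String> future = executor.submit(() -> {
                started.countDown();
                try {
                    Thread.sleep(60_000); // simulates a getSample() that hangs
                    return "never reached";
                } catch (InterruptedException e) {
                    interrupted.countDown();
                    throw e;
                }
            });
            started.await();
            future.cancel(true); // returns immediately; it only requests interruption
            return future.isCancelled() && interrupted.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("interrupted: " + cancelHangingSample());
    }
}
```

A getSample that blocks in code that ignores interrupts (for example, a read on a plain java.net.Socket) is not stopped this way. That is one reason an implementation might override cancel, for example to close the underlying resource.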
    • getAlertLevelAndMessage

      protected abstract AlertLevelAndMessage getAlertLevelAndMessage(S sample, Iterable<? extends R> previousResults) throws Exception
      Determines the alert level and message for the provided sample. If the sample cannot be parsed, this method may throw an exception to report the error. It should not block or delay for any reason.
      Throws:
      Exception
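Below is a hypothetical getAlertLevelAndMessage-style rule for a disk-usage percentage, kept pure and non-blocking as required above. The thresholds and the LevelAndMessage class are assumptions for illustration. Using plain previous samples instead of result objects R is also an assumption:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Locale;

// Hypothetical sketch of a getAlertLevelAndMessage-style rule for a disk-usage sample.
class AlertSketch {
    enum Level { NONE, LOW, MEDIUM, HIGH, CRITICAL } // stand-in for AlertLevel

    static final class LevelAndMessage {
        final Level level;
        final String message;

        LevelAndMessage(Level level, String message) {
            this.level = level;
            this.message = message;
        }
    }

    /**
     * Pure and non-blocking: thresholds on the current sample plus a trend
     * against the newest previous sample. Throws if the sample is unusable.
     */
    static LevelAndMessage getAlertLevelAndMessage(Double percentUsed, Iterable<Double> previousSamples) {
        if (percentUsed == null || percentUsed.isNaN() || percentUsed < 0 || percentUsed > 100) {
            throw new IllegalArgumentException("Invalid sample: " + percentUsed);
        }
        Level level;
        if (percentUsed >= 95) level = Level.CRITICAL;
        else if (percentUsed >= 90) level = Level.HIGH;
        else if (percentUsed >= 80) level = Level.MEDIUM;
        else level = Level.NONE;

        String trend = "";
        Iterator<Double> it = previousSamples.iterator(); // assumed newest first
        if (it.hasNext()) {
            double last = it.next();
            if (percentUsed > last) trend = " (rising)";
            else if (percentUsed < last) trend = " (falling)";
        }
        return new LevelAndMessage(level, String.format(Locale.ROOT, "%.1f%% used%s", percentUsed, trend));
    }

    public static void main(String[] args) {
        LevelAndMessage result = getAlertLevelAndMessage(91.0, Arrays.asList(88.0, 85.0));
        System.out.println(result.level + ": " + result.message); // HIGH: 91.0% used (rising)
    }
}
```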
    • useFutureTimeout

      protected boolean useFutureTimeout()
      By default, the call to getSample uses a Future and times out after 5 minutes. If the monitoring check cannot block indefinitely, it is more efficient not to use this decoupling.
    • getFutureTimeout

      protected long getFutureTimeout()
      The default future timeout is 5 minutes.
    • getFutureTimeoutUnit

      protected TimeUnit getFutureTimeoutUnit()
      The default future timeout unit is MINUTES.
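Below is a sketch of the Future-based decoupling controlled by useFutureTimeout, getFutureTimeout and getFutureTimeoutUnit, using the standard Future.get(long, TimeUnit). On timeout the task is cancelled with cancel(true), mirroring the default cancel. sampleWithTimeout is a hypothetical helper, not this class's code:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of sampling with an optional Future timeout.
class FutureTimeoutSketch {

    /** Returns the sample, or cancels the task and rethrows TimeoutException. */
    static <S> S sampleWithTimeout(ExecutorService executor, Callable<S> getSample,
            boolean useFutureTimeout, long timeout, TimeUnit unit) throws Exception {
        if (!useFutureTimeout) {
            return getSample.call(); // direct call: no extra task, but may block indefinitely
        }
        Future<S> future = executor.submit(getSample);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // mirrors the default cancel(Future)
            throw e;
        } catch (ExecutionException e) {
            Throwable cause = e.getCause(); // unwrap the exception thrown by getSample
            if (cause instanceof Exception) throw (Exception) cause;
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            System.out.println(TimeUnit.MINUTES.toMillis(5) + " ms"); // the documented default timeout
            System.out.println(sampleWithTimeout(executor, () -> "fast sample", true, 5, TimeUnit.MINUTES));
            try {
                sampleWithTimeout(executor, () -> {
                    Thread.sleep(10_000);
                    return "slow sample";
                }, true, 100, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                System.out.println("timed out; task cancelled");
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With the documented defaults (5, MINUTES) the timeout is 300,000 ms. Passing false for useFutureTimeout calls the sample directly on the worker thread. This avoids the extra task, at the cost of having no timeout.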