OTel Receiver Configuration

OpenTelemetry (OTel) native receivers are integral components of the OpenTelemetry Collector, designed to collect telemetry data such as metrics, traces, and logs directly from supported applications or services. These native receivers understand and ingest data in the format emitted by specific technologies without requiring translation or external exporters.

Similarly, the Prometheus receiver in OpenTelemetry facilitates the collection of metrics from systems exposing data in the Prometheus exposition format, enabling seamless integration with existing Prometheus instrumentation.

Native OTel Receivers

What Are Native OTel Receivers?

Native OTel receivers connect directly to applications or services (e.g., Redis, Jaeger, MySQL), collecting telemetry using the native protocols or APIs of those technologies. This direct integration simplifies observability by reducing the need for custom instrumentation.

Key Features

  • Direct Integration: Each receiver speaks the monitored technology's own protocol or API, so telemetry flows into the Collector without an intermediate agent or exporter.
  • Automatic Data Collection: They simplify observability by automatically gathering relevant metrics or traces, reducing the need for custom instrumentation or additional exporters.
  • Configuration via YAML: Receivers are configured using YAML files, specifying endpoints, authentication, and other parameters.

Common OTel Native Receivers

  • Redis Receiver: Collects metrics from Redis instances by connecting to the Redis server and querying for statistics such as memory usage, command counts, and latency.
  • Jaeger Receiver: Ingests trace data from Jaeger clients or agents, allowing the OpenTelemetry Collector to process and export traces to various backends.
  • Other Examples: Receivers exist for many technologies, including Kafka, MongoDB, MySQL, and more.

Example Configurations

Redis Receiver:

receivers:
  redis:
    endpoint: "localhost:6379"    # address of the Redis server to monitor
    password: ""                  # set if the Redis server requires authentication
    collection_interval: 10s      # how often metrics are collected

Jaeger Receiver:

receivers:
  jaeger:
    protocols:
      grpc:             # defaults to 0.0.0.0:14250
      thrift_http:      # defaults to 0.0.0.0:14268
      thrift_compact:   # defaults to 0.0.0.0:6831 (UDP)
      thrift_binary:    # defaults to 0.0.0.0:6832 (UDP)
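
A receiver only collects data once it is referenced in a service pipeline. Below is a minimal sketch of a complete Collector configuration that wires both receivers into pipelines; the otlphttp exporter and its endpoint stand in for whichever backend you actually use:

receivers:
  redis:
    endpoint: "localhost:6379"
  jaeger:
    protocols:
      grpc:
exporters:
  otlphttp:
    endpoint: "https://backend.example.com"   # placeholder backend endpoint
service:
  pipelines:
    metrics:
      receivers: [redis]
      exporters: [otlphttp]
    traces:
      receivers: [jaeger]
      exporters: [otlphttp]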

Prometheus Receiver in OpenTelemetry

What is the Prometheus Receiver?

The Prometheus receiver is designed to scrape metrics endpoints that expose data in the Prometheus format (typically via HTTP). It collects these metrics and makes them available for processing, transformation, and export to various backends supported by OpenTelemetry.

Why Use the Prometheus Receiver?

  • Leverage Existing Instrumentation: Many applications and infrastructure components already expose metrics in Prometheus format. The Prometheus receiver allows you to reuse this instrumentation without modification.
  • Unified Observability: By collecting Prometheus metrics alongside other telemetry data (traces, logs), you achieve a unified observability pipeline.
  • Flexibility: Integrate with a wide range of exporters and backends supported by OpenTelemetry, beyond what Prometheus natively supports.

Example Configurations

Scraping Node Exporter Metrics:

Node Exporter is a Prometheus exporter that exposes hardware and OS metrics from *nix systems.

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'node'
          static_configs:
            - targets: ['localhost:9100']

Scraping Windows Exporter Metrics:

Windows Exporter exposes Windows system metrics in Prometheus format.

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'windows'
          static_configs:
            - targets: ['localhost:9182']
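
Because the receiver embeds Prometheus's own scrape configuration, standard scrape_config options such as scrape_interval and per-target labels carry over unchanged. Below is a sketch combining both jobs under one receiver with a custom interval; the env label value is an arbitrary example:

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'node'
          scrape_interval: 30s
          static_configs:
            - targets: ['localhost:9100']
              labels:
                env: 'production'   # example static label
        - job_name: 'windows'
          scrape_interval: 30s
          static_configs:
            - targets: ['localhost:9182']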

Exporter Management Options

Beyond scraping existing exporters, you can automate deployment of exporters such as Node Exporter or Windows Exporter:

  • Automatic Download and Run: Enable the exporter from the OpsRamp portal's configuration UI to download and run it automatically on the default port.
  • Custom Exporter Port: Specify a custom port via command-line arguments (for example, the exporters' --web.listen-address flag).
  • Custom Configuration: Provide a custom configuration file for the exporter.

This automation ensures consistent metrics collection, even if exporters are not pre-installed on target systems.

Benefits

  • Unified Observability: Collect telemetry from diverse sources using a standardized approach.
  • Flexibility: Easily extend monitoring by enabling or configuring new receivers.
  • Vendor Neutrality: OpenTelemetry is open-source and vendor-agnostic, suitable for varied environments.

Summary

OTel native receivers are essential for collecting telemetry data from various applications and services without additional dependencies. They provide a flexible, scalable, and unified approach to observability, making it easier to monitor and troubleshoot distributed systems.

Exporter Config File

You can maintain configurations for both Node Exporter and Windows Exporter in YAML files. These configurations specify settings such as the web endpoint, telemetry path, enabled and disabled collectors, and logging level. Customize these files as needed and reference them in your deployment or startup scripts.

Example Configurations

Node Exporter Configuration (node_exporter_config.yaml)

web:
  listen-address: ":9100"
  telemetry-path: "/metrics"

collectors:
  enabled:
    - cpu
    - meminfo
    - filesystem
    - netdev
    - loadavg
    - diskstats
    - time
    - uname
    - vmstat
    - stat
    - systemd
    - textfile
  disabled:
    - hwmon
    - mdadm
    - nfs
    - zfs
log:
  level: "info"

Windows Exporter Configuration (windows_exporter_config.yaml)

web:
  listen-address: ":9182"
  telemetry-path: "/metrics"
collectors:
  enabled:
    - cpu
    - cs
    - logical_disk
    - memory
    - net
    - os
    - service
    - system
    - textfile
    - time
  disabled:
    - process
    - tcp
    - thermalzone
log:
  level: "info"

How the Agent Uses These Config Files

  • The agent references these config files when starting the exporter, passing them via the appropriate command-line flag. For example:
    • Node Exporter:
         ./node_exporter --config.file={agent_installed_path}/plugins/node_exporter_config.yaml
         
    • Windows Exporter:
         windows_exporter.exe --config.file="C:\path\to\windows_exporter_config.yaml"
         
  • Customize the enabled/disabled collectors and other settings as per your monitoring requirements.

Agent Alert Definitions

In addition to collecting metrics, our system supports user-defined alert definitions. These alerts allow you to monitor specific conditions on your devices and receive notifications when thresholds are breached. Alert definitions are specified in YAML format, as shown below.

Template for Alert Definition

Below is a sample template for a single alert definition:

alertDefinitions:
  - name: alert_definition_name
    interval: alert_polling_time
    expr: promql_expression
    isAvailability: true
    warnOperator: operator_macro
    warnThreshold: str_threshold_value
    criticalOperator: operator_macro
    criticalThreshold: str_threshold_value
    alertSub: alert_subject
    alertBody: alert_description

Field Descriptions:

  • name: A unique name for the alert definition.
  • interval: The polling interval at which the alert definition runs, given in a time duration format such as 1m, 5m, 15m, or 1h.
  • expr: A valid PromQL query expression that computes the value used for alert generation.
  • isAvailability: A boolean flag indicating whether this alert definition should be considered when computing resource availability.
  • warnOperator / criticalOperator: Operators used to compare metric values against thresholds. Supported operators:
    • GREATER_THAN_EQUAL
    • GREATER_THAN
    • EQUAL
    • NOT_EQUAL
    • LESS_THAN_EQUAL
    • LESS_THAN
    • EXISTS
  • warnThreshold: The warning-level threshold value for the metric.
  • criticalThreshold: The critical-level threshold value for the metric.
  • alertSub / alertBody: The subject and body displayed for a warning or critical alert in the alert browser. Macros can be used to insert dynamic values; the actual values replace the macros when the alert is displayed. The following macros can be used when defining the alert subject/body (see the example after this list):
    • ${severity}
    • ${metric.name}
    • ${component.name}
    • ${metric.value}
    • ${threshold}
    • ${resource.name}
    • ${resource.uniqueid}
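
For example, a subject and body built from these macros might look like the following sketch (the remaining alert fields are omitted for brevity):

alertSub: "${severity} alert: ${metric.name} on ${resource.name}"
alertBody: "${metric.name} on ${resource.name} (${resource.uniqueid}) reported ${metric.value}, crossing the ${severity} threshold of ${threshold}."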

User Configuration

By default, OpsRamp provides basic alert definitions for pods, nodes, and more. Users can customize alert definitions by editing the alert definitions section within the template.

Example configuration:

alertDefinitions:
  - name: "HighCPUUsage"
    interval: "1m"
    expr: "1 - avg(rate(node_cpu_seconds_total{mode=\"idle\"}[5m]))"
    isAvailability: false
    warnOperator: "GREATER_THAN"
    warnThreshold: "0.7"
    criticalOperator: "GREATER_THAN"
    criticalThreshold: "0.8"
    alertSub: "High CPU Usage Alert"
    alertBody: "CPU usage is critically high on the system."

  - name: "MemoryUsage"
    interval: "2m"
    expr: "(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))"
    isAvailability: false
    warnOperator: "GREATER_THAN"
    warnThreshold: "0.85"
    criticalOperator: "GREATER_THAN"
    criticalThreshold: "0.9"
    alertSub: "High Memory Usage Alert"
    alertBody: "Memory usage is critically high on the system."

  - name: "DiskSpace"
    interval: "5m"
    expr: "(node_filesystem_free_bytes{fstype!~\"nfs|tmpfs|rootfs\"} / node_filesystem_size_bytes{fstype!~\"nfs|tmpfs|rootfs\"})"
    isAvailability: false
    warnOperator: "LESS_THAN"
    warnThreshold: "0.15"
    criticalOperator: "LESS_THAN"
    criticalThreshold: "0.1"
    alertSub: "Low Disk Space Alert"
    alertBody: "Disk space is critically low on the system."

You can remove alerts or add new ones using standard PromQL expressions.

Configure Availability

To configure resource availability, define an alert and set isAvailability to true. That alert definition is then used to compute the availability status of the resource. For example, to tie availability to CPU usage:

alertDefinitions:
  - name: "HighCPUUsage"
    interval: "1m"
    expr: "1 - avg(rate(node_cpu_seconds_total{mode=\"idle\"}[5m]))"
    isAvailability: true
    warnOperator: "GREATER_THAN"
    warnThreshold: "0.7"
    criticalOperator: "GREATER_THAN"
    criticalThreshold: "0.8"
    alertSub: "High CPU Usage Alert"
    alertBody: "CPU usage is critically high on the system."

If the alert HighCPUUsage is triggered at either warning or critical level, the availability of the resource will be considered down; otherwise, it will be up.
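
Similarly, availability could be tied to memory pressure instead. Below is a sketch reusing the MemoryUsage expression from the earlier example with isAvailability set to true; the alert name and thresholds are illustrative:

alertDefinitions:
  - name: "MemoryBasedAvailability"
    interval: "2m"
    expr: "(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))"
    isAvailability: true
    warnOperator: "GREATER_THAN"
    warnThreshold: "0.85"
    criticalOperator: "GREATER_THAN"
    criticalThreshold: "0.9"
    alertSub: "High Memory Usage Alert"
    alertBody: "Memory usage is critically high on the system."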
