Monitoring · Gentics Mesh

Configuration

The network settings for the monitoring server can be configured within the mesh.yml or via environment settings.

The montoring server can also be turned off via the enabled flag.

mesh.yml

…
monitoring:
  enabled: true
  port: 8081
  host: "127.0.0.1"
  memoryLimit: 95
  gcTimeLimit: 50

Container deployments which need to access the monitoring API need to change the monitoring server binding in order to expose the port.
This can be done via the environment http host setting: MESH_MONITORING_HTTP_HOST=0.0.0.0

Liveness Probe in Docker

The liveness of the Gentics Mesh instance running in a docker container can be checked by executing the script /mesh/live.sh inside the container. This has some advantages over the liveness probe using the HTTP endpoint: Liveness checked with with this script does not depend on the http verticles and vert.x to be fully started, but only on the Java process to be running.

Monitoring of Java Heap Memory and Garbage Collections

When both memoryLimit and gcTimeLimit are set to positive integers (which is the default), Gentics Mesh will regularly check the amount of used Heap Memory and the amount of time spent for garbage collections (in the last 10 seconds). If both amounts exceed the configured limits (in percent), the instance will be considered unhealthy and both the liveness probe and the readiness probe will fail. The instance will be unhealthy until used memory and the time spent for garbage collections drop below the configured limits.

Setting at least one of the values of memoryLimit and gcTimeLimit to 0 will disable the monitoring of Java Heap Memory and Garbage Collections.

Endpoints

Liveness Probe

The healthcheck liveness probe endpoint GET /api/v2/health/live indicates if server is working as expected when status code 200 is returned.

Readiness Probe

The readiness probe endpoint GET /api/v2/health/ready returns status code 200 if the server is accepting connections. This endpoint can be used to check when the server instance if ready to accept requests once it has been started. This is especially useful if you run rolling cluster upgrades.

Status

The current Gentics Mesh server status can be checked against the GET /api/v2/status.

Status	Description
STARTING	Status which indicates that the server is starting up.
WAITING_FOR_CLUSTER	Status which indicates that the server is waiting/looking for a cluster to join.
READY	Status which indicates that the server is operating normally.
SHUTTING_DOWN	Status which indicates that the instance is shutting down.

Status

Description

STARTING

Status which indicates that the server is starting up.

WAITING_FOR_CLUSTER

Status which indicates that the server is waiting/looking for a cluster to join.

READY

Status which indicates that the server is operating normally.

SHUTTING_DOWN

Status which indicates that the instance is shutting down.

Cluster Status

The GET /api/v2/cluster/status endpoint returns the status of all cluster nodes to which the queries instance has establishes a connection.

Version Info

The version information can be retrieved via the GET /api/v2/versions endpoint.

{
  "meshVersion" : "3.0.0",
  "meshNodeName" : "Singing Chandelure",
  "databaseVendor" : "mariadb",
  "databaseVersion" : "10.6",
  "searchVendor" : "elasticsearch",
  "searchVersion" : "6.1.2",
  "vertxVersion" : "4.5.2",
  "databaseRevision" : "0be9a986"
}

Prometheus Metrics

Gentics Mesh server exposes Prometheus compatible data on the /api/v2/metrics endpoint.

The Prometheus server can scrape metric data from this endpoint. Using Grafana in combination with Prometheus is a typical usecase to display metric data.

Prometheus

Example Prometheus configuration:

prometheus.yml

…
scrape_configs:
  - job_name: 'mesh'
    scrape_interval: 30s
    metrics_path: '/api/v2/metrics'
    static_configs:
        - targets: ['mesh:8081']
…

Metrics

Gentics Mesh exposes the following metrics in addition to the default Vert.x metrics. More metrics will be added over time.

<cache> is one of permission, projectbranchname, projectname, webroot.

Key Description

Key	Description
`mesh_tx_created`	Meter which measures the rate of created transactions over time.
`mesh_notx_created`	Meter which measures the rate of created noTx transactions over time.
`mesh_graph_element_reload`	Meter which tracks the reload operations on used vertices.
`mesh_tx_time`	Timer which tracks transaction durations.
`mesh_tx_retry`	Amount of transaction retries which happen if a conflict has been encountered.
`tx_interrupt`	Amount of commit interrupts.
`commit_time`	Timer which tracks commit durations.
`mesh_node_migration_pending`	Pending contents which need to be processed by the node migration.
`mesh_cache_<cache>_hit`	Amount of cache hits.
`mesh_cache_<cache>_miss`	Amount of cache misses.
`mesh_cache_<cache>_clear_all`	Amount of invalidations of the whole cache.
`mesh_cache_<cache>_clear_single`	Amount of invalidations for a single entry in the cache.
`mesh_write_lock_waiting_time`	Tracks the time which is spent waiting on the write lock.
`mesh_write_lock_timeout`	Amount of timeouts of acquiring the write lock.
`mesh_topology_lock_waiting_time`	Tracks the time which is spent waiting on the write lock.
`mesh_topology_lock_timeout`	Amount of timeouts of acquiring the write lock.
`graphql_time`	Timer which tracks duration of graphql requests.
`mesh_storage_disk_total`	Total disk size in bytes for the storage.
`mesh_storage_disk_usable`	Usable (free) disk size in bytes for the storage.

mesh_tx_created

Meter which measures the rate of created transactions over time.

mesh_notx_created

Meter which measures the rate of created noTx transactions over time.

mesh_graph_element_reload

Meter which tracks the reload operations on used vertices.

mesh_tx_time

Timer which tracks transaction durations.

mesh_tx_retry

Amount of transaction retries which happen if a conflict has been encountered.

tx_interrupt

Amount of commit interrupts.

commit_time

Timer which tracks commit durations.

mesh_node_migration_pending

Pending contents which need to be processed by the node migration.

mesh_cache_<cache>_hit

Amount of cache hits.

mesh_cache_<cache>_miss

Amount of cache misses.

mesh_cache_<cache>_clear_all

Amount of invalidations of the whole cache.

mesh_cache_<cache>_clear_single

Amount of invalidations for a single entry in the cache.

mesh_write_lock_waiting_time

Tracks the time which is spent waiting on the write lock.

mesh_write_lock_timeout

Amount of timeouts of acquiring the write lock.

mesh_topology_lock_waiting_time

Tracks the time which is spent waiting on the write lock.

mesh_topology_lock_timeout

Amount of timeouts of acquiring the write lock.

graphql_time

Timer which tracks duration of graphql requests.

mesh_storage_disk_total

Total disk size in bytes for the storage.

mesh_storage_disk_usable

Usable (free) disk size in bytes for the storage.

Clients

The Monitoring Java Client can be used to interact with the endpoints using Java.