Skip to content

feat: remove agents on AgentDisconnect message #554

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

rogercoll
Copy link
Contributor

@rogercoll rogercoll commented Apr 29, 2025

The goal of this PR was to remove entries from the central configuration fetcher map uidToService map[string]agentcfg.Service, but the agent's service catching logic was moved to the already existing sync.Map in the OpAMP callbacks → the fetcher does not need to deal with AgentDisconnect messages

  • Agent's identifying attributes are now stored in the OpAMP callbacks structure.
  • Unit and integration tests added for AgentDisconnect messages and new test file for the fetcher struct.
  • Adopted the new deployment.environment.name instead of the deprecated deployment.environment attribute.

Fixes #398

@rogercoll rogercoll marked this pull request as ready for review April 30, 2025 13:48
@rogercoll rogercoll requested a review from a team as a code owner April 30, 2025 13:48
@rogercoll
Copy link
Contributor Author

PTAL @elastic/ingest-otel-data

@rogercoll rogercoll changed the title feat: move cached agents to callbacks feat: remove agents on AgentDisconnect message Apr 30, 2025
switch attr.GetKey() {
case semconv.AttributeServiceName:
serviceParams.Name = attr.GetValue().GetStringValue()
case semconv.AttributeDeploymentEnvironment:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it (or should it be):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question! The best option is to use the most up-to-date semantic convention (semconv) attribute that represents the deployment environment name. So, if all the OpenTelemetry SDKs we plan to support with this extension already use deployment.environment.name, then that’s the attribute we should go with.

I am going to change the expected value to the new deployment.environment.name, as well the integration tests and docs for clarity. Note that Kibana, either if the Otel SDK is providing deployment.environment or deployment.environment.name, it will store it in the apm config index with the service.name keyword (due classic SDKs). The following is an example of a Otel Java SDK config:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": ".apm-agent-configuration",
        "_id": "abc",
        "_score": 1,
        "_source": {
          "agent_name": "opentelemetry/java/elastic",
          "service": {
            "name": "dice-java-5",
            "environment": "prod-otel"
          },
          "settings": {
            "send_logs": "true",
            "send_traces": "true",
            "deactivate_instrumentations": "that-module-please"
          },
          "@timestamp": 1746175386213,
          "applied_by_agent": false,
          "etag": "abc"
        }
      }
    ]
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switched to the new deployment.environment.name attribute 5c39db9

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that Kibana, either if the Otel SDK is providing deployment.environment or deployment.environment.name, it will store it in the apm config index with the service.name keyword (due classic SDKs).

You mean service.environment in that sentence, I believe.

switched to the new deployment.environment.name attribute

Thanks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use deployment.environment for iOS and Android agents. The iOS one will switch from using Central Config to using OpAMP afaik. Should we add a case for deployment.environment to keep backwards compatibility? cc @bryce-b

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks. I think for Android we can just start sending the new attribute when we enable OpAMP, since its current version doesn't support central config anyway. For iOS, I think we could send both attributes so that people can use the same build with both central config and OpAMP? Today we have the agent's weekly meeting, let me ask for some advice there in case I might me missing something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good @LikeTheSalad. Because exposing an OpAMP endpoint, this extension should only be used for OTel SDKs (legacy SDKs can use the elasticapmreceiver which exposes an intake v2 endpoint), does the iOS OTel SDK use central config already?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the iOS OTel SDK use central config already?

It does.

legacy SDKs can use the elasticapmreceiver which exposes an intake v2 endpoint

Does this mean that elasticapmreceiver can act as an APM Server replacement, in that it provides central config too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that elasticapmreceiver can act as an APM Server replacement, in that it provides central config too?

Yes, that's the idea. These are the http endpoints it exposes: "/config/v1/agents" and "/intake/v2/events".

As mentioned, it looks like the iOS SDK is using the deployment.environment resource attribute, thus it would make sense to have back-wards compatibility if we want OpAMP to be a drop-in replacement.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the weekly meeting, the new SDKs should use the new name when adding support for OpAMP config. And I think that if the elasticapmreceiver will always be available, then I don't see why we should make the OpAMP route backwards compatible. In terms of an agent that should support both, the APM Server and the OTel Collector routes, I'm only aware of the iOS one, so I'll loop @bryce-b in just in case, though it seems like relying on the elasticapmreceiver should do.

@rogercoll rogercoll requested a review from a team May 2, 2025 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[extension/apmconfig] Handle on AgentDisconnect message
3 participants