-
Notifications
You must be signed in to change notification settings - Fork 28
[Ratelimit Processor] Instrument the ratelimiter service with telemetry metrics #562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach looks good to me. Some thoughts on additional metrics that we would need to introduce
- Request duration
- Can we measure the no of requests that are over the limits at a given time? I believe we can by using the decision dimension, just confirm if thats is the case.
- Concurrent requests
@lahsivjar Could you think of additional metrics that were useful in MIS so we can add them here? Thanks.
} | ||
|
||
func newGubernatorRateLimiter(cfg *Config, set processor.Settings) (*gubernatorRateLimiter, error) { | ||
var behavior int32 | ||
|
||
telemetryBuilder, err := metadata.NewTelemetryBuilder(set.TelemetrySettings) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add this also to the local rate limiter, to make sure we track all the incoming requests to this component. I would move this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vigneshshanmugam Could you explain a bit more here please? Sorry I'm lost 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was definitely without much context, apologeis. What I meant was to add it to
func (r *localRateLimiter) RateLimit(ctx context.Context, hits int) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you think of additional metrics that were useful in MIS so we can add them here? Thanks.
Currently in MIS we collect only 2 metrics from the rate limiter (ref):
- A metric to track the status of the ingest requests. This metric has dimensions with bounded cardinality to record the decision and the reason for the decision by the rate limiter service.
- A metric to track the update requests sent between gubernator instances to update them. This metric has just one dimension of status.
While I like the idea of collecting request duration, however, I am not sure how useful concurrent requests would be. We should also be wary of adding too many metrics in hot path, as it does add some overhead.
Co-authored-by: Vignesh Shanmugam <vignesh.shanmugam22@gmail.com>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Co-authored-by: kaiyan-sheng <kaiyan.sheng@elastic.co>
Co-authored-by: kaiyan-sheng <kaiyan.sheng@elastic.co>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
monotonic: true | ||
attributes: ["project_id", "processor_id", "ratelimit_decision", "limit_threshold", "reason"] | ||
|
||
attributes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see this comment #562 (comment), We dont want to expose the project_id
etc details, instead keep this to attributes being derived from metadata_keys. Not sure if thats something possible with the metadata schema, but we can come back to later as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh sorry I missed it. Will remove project_id
. Yeah I dont think the current mdatagen
supports dynamic attribute keys. Can look into it more later.
How about processor_id
then? ProcessorID is from telemetry.WithProcessorID(r.set.ID.String())
.
Todo list in followup PR:
|
This is the initial implementation for https://github.com/elastic/hosted-otel-collector/issues/513
The code adds a newMetrics reader in the ratelimitprocessor.
At the moment we have implemented only
ratelimitRequests
metric.Int64CounterAlso the newMetricsReader makes use of the global otel processor that we assume that is initialized upon processor usage.
To be reviewed:
Sample document
After running
make smoketest
from thehosted-otel-collector
repo, I can see this document in Kibana: