Skip to content

Use new Dictionary AlternateLookup to avoid allocation in DictionaryJumpTable #58305

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

andrewjsaid
Copy link
Contributor

Use new Dictionary AlternateLookup to avoid allocation in DictionaryJumpTable

Description

Simple change which removes the substring allocation in DictionaryJumpTable using the GetAlternateLookup<ReadOnlySpan<char>> feature of .NET 9.

With this change all JumpTable implementations can accept ReadOnlySpan<char> as a parameter instead of string, PathSegment. I'd be happy to include that change if it's wanted.

@andrewjsaid andrewjsaid requested a review from javiercn as a code owner October 8, 2024 21:08
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions label Oct 8, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Oct 8, 2024
@BrennanConroy
Copy link
Member

Awesome, was looking forward to aspnetcore adopting the new alternate lookup feature.

Could you run the micro-benchmarks before and after this change?
https://github.com/dotnet/aspnetcore/blob/main/src/Http/Routing/perf/Microbenchmarks/Matching/JumpTableMultipleEntryBenchmark.cs

@andrewjsaid
Copy link
Contributor Author

andrewjsaid commented Oct 8, 2024

Interestingly enough the PR looks to be slightly slower (5ns -> 11ns) with the benchmark as-is whereas I would have expected no difference.

However there's a flaw with the benchmark in that it currently always compares the entire string which is unusual and which negates the benefit proposed in this PR. In most real-world cases the PathSegment is only part of the string and so substrings are allocated.

I've modified the benchmark to account for this by using only part of the path string. The modified benchmark results are below. Overall less significant than I would have liked but it's faster and allocates less memory but only if we accept PathSegment usually points to a substring of Path. [edit: I confirmed that this is always the case since all paths start with /]

Before:

Method Count Mean Error StdDev Op/s Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
Baseline 2 0.6304 ns 0.0052 ns 0.0049 ns 1,586,354,557.5 1.00 0.00 - - - -
Dictionary 2 13.9512 ns 0.1030 ns 0.0860 ns 71,678,284.5 22.13 0.29 0.0011 - - 82 B
Baseline 5 0.6195 ns 0.0067 ns 0.0060 ns 1,614,331,173.5 1.00 0.00 - - - -
Dictionary 5 14.5809 ns 0.1890 ns 0.1578 ns 68,582,745.6 23.55 0.32 0.0011 - - 82 B
Baseline 10 0.6169 ns 0.0111 ns 0.0104 ns 1,620,884,005.6 1.00 0.00 - - - -
Dictionary 10 15.2295 ns 0.2681 ns 0.3486 ns 65,661,888.5 24.69 0.86 0.0011 - - 82 B
Baseline 25 0.6165 ns 0.0060 ns 0.0056 ns 1,621,971,184.8 1.00 0.00 - - - -
Dictionary 25 16.9301 ns 0.2896 ns 0.2568 ns 59,066,511.0 27.47 0.46 0.0011 - - 82 B
Baseline 50 0.6172 ns 0.0060 ns 0.0056 ns 1,620,306,284.5 1.00 0.00 - - - -
Dictionary 50 18.1510 ns 0.0577 ns 0.0482 ns 55,093,297.9 29.42 0.34 0.0011 - - 82 B
Baseline 100 0.6176 ns 0.0070 ns 0.0066 ns 1,619,194,748.0 1.00 0.00 - - - -
Dictionary 100 20.4224 ns 0.1325 ns 0.1174 ns 48,965,762.2 33.04 0.38 0.0011 - - 82 B

After:

Method Count Mean Error StdDev Op/s Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
Baseline 2 0.6283 ns 0.0088 ns 0.0082 ns 1,591,566,799.9 1.00 0.00 - - - -
Dictionary 2 11.8230 ns 0.1413 ns 0.1321 ns 84,580,838.3 18.82 0.22 - - - -
Baseline 5 0.6149 ns 0.0052 ns 0.0046 ns 1,626,151,413.8 1.00 0.00 - - - -
Dictionary 5 11.8060 ns 0.1567 ns 0.1466 ns 84,702,576.7 19.19 0.33 - - - -
Baseline 10 0.6141 ns 0.0067 ns 0.0063 ns 1,628,406,229.3 1.00 0.00 - - - -
Dictionary 10 12.0957 ns 0.1984 ns 0.1856 ns 82,673,853.8 19.70 0.39 - - - -
Baseline 25 0.6163 ns 0.0044 ns 0.0039 ns 1,622,602,556.1 1.00 0.00 - - - -
Dictionary 25 13.0903 ns 0.2482 ns 0.2200 ns 76,392,361.3 21.24 0.34 - - - -
Baseline 50 0.6159 ns 0.0095 ns 0.0089 ns 1,623,584,223.2 1.00 0.00 - - - -
Dictionary 50 14.6091 ns 0.0292 ns 0.0273 ns 68,450,281.6 23.72 0.35 - - - -
Baseline 100 0.6143 ns 0.0078 ns 0.0073 ns 1,627,848,658.0 1.00 0.00 - - - -
Dictionary 100 16.1624 ns 0.0569 ns 0.0475 ns 61,871,831.2 26.28 0.34 - - - -

@@ -27,7 +27,7 @@ public void Setup()

for (var i = 0; i < _strings.Length; i++)
{
_segments[i] = new PathSegment(0, _strings[i].Length);
_segments[i] = new PathSegment(1, _strings[i].Length - 2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be - 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I attempted to trim one from the start and one from the end but I don't suppose it matters too much as long as it's a substring.

@andrewjsaid
Copy link
Contributor Author

I've created dotnet/runtime#108732 which if accepted/merged should further improve the performance of the AlternateLookup used in this PR.

@dotnet-policy-service dotnet-policy-service bot added the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Oct 19, 2024
@BrennanConroy
Copy link
Member

/azp run

Was waiting for the runtime PR so we could see the full perf gains. But since it's taking time to get a review we can just merge this since it's still goodness.

Copy link

Command 'run

Was' is not supported by Azure Pipelines.



Supported commands

  • help:
    • Get descriptions, examples and documentation about supported commands
    • Example: help "command_name"
  • list:
    • List all pipelines for this repository using a comment.
    • Example: "list"
  • run:
    • Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run.
    • Example: "run" or "run pipeline_name, pipeline_name, pipeline_name"
  • where:
    • Report back the Azure DevOps orgs that are related to this repository and org
    • Example: "where"

See additional documentation.

@BrennanConroy
Copy link
Member

/azp run

@dotnet-policy-service dotnet-policy-service bot removed the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Nov 19, 2024
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@BrennanConroy BrennanConroy enabled auto-merge (squash) November 19, 2024 18:45
@BrennanConroy BrennanConroy merged commit 3cbb71c into dotnet:main Nov 19, 2024
27 checks passed
@dotnet-policy-service dotnet-policy-service bot added this to the 10.0-preview1 milestone Nov 19, 2024
@davidfowl
Copy link
Member

@BrennanConroy did we ever re-run the benchmarks?

@BrennanConroy
Copy link
Member

Not 100% apples-to-apples comparison since I'd need to run against an older .NET 10 runtime. But I ran the alternate lookup code on 9.0 vs. 10.0 p4 and 10.0 is about 2x faster than the 9.0 version.

net9.0

Method Count Mean Error StdDev Ratio RatioSD
Baseline 2 0.8147 ns 0.0112 ns 0.0105 ns 1.00 0.00
Dictionary 2 18.1174 ns 0.1606 ns 0.1503 ns 22.24 0.33
Baseline 5 0.7959 ns 0.0144 ns 0.0121 ns 1.00 0.00
Dictionary 5 20.7396 ns 0.2838 ns 0.2216 ns 26.08 0.52
Baseline 10 0.8101 ns 0.0164 ns 0.0176 ns 1.00 0.00
Dictionary 10 20.7011 ns 0.1297 ns 0.1083 ns 25.40 0.61
Baseline 25 0.8065 ns 0.0165 ns 0.0146 ns 1.00 0.00
Dictionary 25 20.7905 ns 0.3887 ns 0.3446 ns 25.79 0.72
Baseline 50 0.7988 ns 0.0063 ns 0.0059 ns 1.00 0.00
Dictionary 50 23.0723 ns 0.3425 ns 0.2674 ns 28.95 0.28
Baseline 100 0.8038 ns 0.0160 ns 0.0134 ns 1.00 0.00
Dictionary 100 26.6376 ns 0.1442 ns 0.1278 ns 33.17 0.55

net10

Method Count Mean Error StdDev Ratio RatioSD
Baseline 2 0.7341 ns 0.0063 ns 0.0059 ns 1.00 0.00
Dictionary 2 8.7388 ns 0.1193 ns 0.1116 ns 11.91 0.22
Baseline 5 0.9253 ns 0.0117 ns 0.0098 ns 1.00 0.00
Dictionary 5 9.2274 ns 0.0306 ns 0.0272 ns 9.97 0.09
Baseline 10 0.9326 ns 0.0095 ns 0.0074 ns 1.00 0.00
Dictionary 10 10.0774 ns 0.1035 ns 0.0968 ns 10.81 0.14
Baseline 25 0.9305 ns 0.0188 ns 0.0184 ns 1.00 0.00
Dictionary 25 11.5418 ns 0.0854 ns 0.0667 ns 12.35 0.25
Baseline 50 0.9207 ns 0.0041 ns 0.0040 ns 1.00 0.00
Dictionary 50 13.8788 ns 0.0925 ns 0.0866 ns 15.08 0.07
Baseline 100 0.9530 ns 0.0192 ns 0.0228 ns 1.00 0.00
Dictionary 100 18.2492 ns 0.1495 ns 0.1398 ns 19.13 0.55

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions community-contribution Indicates that the PR has been added by a community member Perf
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants