Count vs Length vs Any - Checking Collection Emptiness
[C#, Under The Hood]
When working with collections there are several ways that you can determine if a collection is empty.
For arrays you have the following:
- Length property
- Count() extension method
- Any() extension method
For lists you have the following:
- Count property
- Count() extension method
- Any() extension method
Most of the time you use these interchangeably.
But have you ever wondered what is running under the hood?
Let’s look at some code.
We will be using a console application.
The class has a private array that is set up with 101 elements.
Then there are 3 methods that check if the array is empty in 3 different ways.
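A minimal sketch of what such a class might look like is shown here. The class name, method names and element values are assumptions for illustration; only the 101-element private array and the three styles of check come from the description above.

```csharp
using System;
using System.Linq;

public class ArrayEmptinessChecks
{
    // 101 elements, as described in the text; the values themselves are an assumption.
    private readonly int[] dataArray = Enumerable.Range(0, 101).ToArray();

    // Check 1: the array's own Length property.
    public bool IsEmptyUsingLength() => dataArray.Length == 0;

    // Check 2: the LINQ Count() extension method.
    public bool IsEmptyUsingCount() => dataArray.Count() == 0;

    // Check 3: the LINQ Any() extension method.
    public bool IsEmptyUsingAny() => !dataArray.Any();
}

public static class Program
{
    public static void Main()
    {
        var checks = new ArrayEmptinessChecks();

        // The array has 101 elements, so each check prints False.
        Console.WriteLine(checks.IsEmptyUsingLength());
        Console.WriteLine(checks.IsEmptyUsingCount());
        Console.WriteLine(checks.IsEmptyUsingAny());
    }
}
```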
If we look at the code generated by the compiler using SharpLab we see the following:
A couple of things of interest:
- dataArray.Count() is replaced with a call to the static method Enumerable.Count()
- dataArray.Any() is replaced with a call to the static method Enumerable.Any()
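To illustrate the point, the sketch below writes both forms side by side. It is not the literal SharpLab output, just the equivalent source; the class and variable names are assumptions.

```csharp
using System;
using System.Linq;

public static class ExtensionLoweringDemo
{
    public static void Main()
    {
        int[] dataArray = Enumerable.Range(0, 101).ToArray();

        // Extension-method syntax, as you would normally write it.
        int countViaExtension = dataArray.Count();
        bool anyViaExtension = dataArray.Any();

        // The equivalent static calls on System.Linq.Enumerable.
        int countViaStatic = Enumerable.Count(dataArray);
        bool anyViaStatic = Enumerable.Any(dataArray);

        // Both forms call the same methods, so both lines print True.
        Console.WriteLine(countViaExtension == countViaStatic);
        Console.WriteLine(anyViaExtension == anyViaStatic);
    }
}
```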
We can use a Stopwatch to time the performance of each of these methods, but a much better (and more scientific) way of doing it is with the tooling provided by BenchmarkDotNet.
We start by adding a reference with NuGet:
dotnet add package benchmarkdotnet
Then we add a couple of attributes to the code:
- SimpleJob here specifies that we want to benchmark the code using the .NET Core 3.1 runtime
- RPlotExporter generates a number of graphs of data
- GlobalSetup runs any initialization we need before running the benchmarks
- Benchmark marks the code as code to be benchmarked
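A sketch of how these attributes might be applied is shown here. It assumes the BenchmarkDotNet package is referenced; the class and method names are hypothetical, and returning the result from each benchmark method is a choice made here so the work cannot be optimised away.

```csharp
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;

[SimpleJob(RuntimeMoniker.NetCoreApp31)] // benchmark on the .NET Core 3.1 runtime
[RPlotExporter]                          // export graphs (requires R to be installed)
public class ArrayBenchmarks
{
    private int[] dataArray;

    // Runs once, before the benchmarks, to build the 101-element array.
    [GlobalSetup]
    public void Setup() => dataArray = Enumerable.Range(0, 101).ToArray();

    [Benchmark]
    public bool UsingLength() => dataArray.Length == 0;

    [Benchmark]
    public bool UsingCount() => dataArray.Count() == 0;

    [Benchmark]
    public bool UsingAny() => !dataArray.Any();
}
```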
Once this is done we can now run the benchmarks.
To do so, change Main() so that the benchmark infrastructure can initiate the benchmarks.
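One way the entry point might look, using BenchmarkRunner from the BenchmarkDotNet.Running namespace. The EmptinessBenchmarks class is a hypothetical, cut-down stand-in included only so the sketch is self-contained.

```csharp
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Hypothetical stand-in for the class that carries the [Benchmark] methods.
public class EmptinessBenchmarks
{
    private int[] dataArray;

    [GlobalSetup]
    public void Setup() => dataArray = Enumerable.Range(0, 101).ToArray();

    [Benchmark]
    public bool UsingLength() => dataArray.Length == 0;
}

public static class Program
{
    public static void Main(string[] args)
    {
        // Hands control to BenchmarkDotNet, which discovers and runs
        // every [Benchmark] method on the given type.
        BenchmarkRunner.Run<EmptinessBenchmarks>();
    }
}
```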
To run benchmarks, run the application in release mode like so:
dotnet run -c Release
If all goes well you should see some text starting with the following:
As you can see it has found the 3 benchmarks and is proceeding to execute them.
It does this several times (by default it automatically selects between 15 and 100), factoring in cold starts so that the final numbers are representative of performance.
It then computes statistics such as mean (average), median and standard deviation.
Once it has completed you can scroll down past the intermediate statistics to the table of results.
You can see here that the Length property is significantly faster than Count() or Any().
This is likely because an array's size is fixed when it is instantiated, so reading Length is a cheap operation. Count() and Any(), by contrast, are LINQ extension methods that have to do a little more work.
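To give a feel for that extra work, here is a simplified sketch of the kind of logic such extension methods perform. It is an illustration under assumptions, not the actual library source: the real implementations differ between .NET versions, and CountSketch and AnySketch are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class LinqSketch
{
    // Roughly what a Count()-style method has to do: validate the argument,
    // test the runtime type, then either ask the collection for its count
    // through an interface call or walk the whole sequence.
    public static int CountSketch<T>(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        // Arrays and List<T> both implement ICollection<T>, so they take this path.
        if (source is ICollection<T> collection) return collection.Count;

        int count = 0;
        foreach (T item in source) count++;
        return count;
    }

    // Roughly what an Any()-style method has to do: obtain an enumerator
    // and try to advance it once.
    public static bool AnySketch<T>(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        using (IEnumerator<T> enumerator = source.GetEnumerator())
        {
            return enumerator.MoveNext();
        }
    }
}
```

Even on the fast path there is a null check, a type test and an interface call, whereas Length on an array is a direct property read.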
So if you have code in a performance-critical section of your application that is called repeatedly, you know what approach to take.
We can do the same benchmarks for lists.
The code is virtually identical, except for the fact that lists do not have a Length property. They have a Count property instead.
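A sketch of what the list version might look like, again assuming the BenchmarkDotNet package is referenced. The class and method names are hypothetical, and the 101-element list is an assumption carried over from the array example.

```csharp
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;

[SimpleJob(RuntimeMoniker.NetCoreApp31)]
[RPlotExporter]
public class ListBenchmarks
{
    private List<int> dataList;

    [GlobalSetup]
    public void Setup() => dataList = Enumerable.Range(0, 101).ToList();

    // The Count property replaces the array's Length property.
    [Benchmark]
    public bool UsingCountProperty() => dataList.Count == 0;

    // The Count() and Any() extension methods are called exactly as before.
    [Benchmark]
    public bool UsingCountMethod() => dataList.Count() == 0;

    [Benchmark]
    public bool UsingAny() => !dataList.Any();
}
```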
Running the code through SharpLab yields the following:
Almost identical.
Running the benchmarks yields the following:
A couple of interesting things:
- Determining the emptiness of an array is twice as slow as for a list
- The Count() extension method is much faster for a list than for an array
- The Any() extension method is marginally faster.
Moral: use Length for an array, and Count for a list to get the fastest performance.
In the project folder there is a folder named BenchmarkDotNet.Artifacts. There you can get all the statistical data in Excel, as well as a bunch of graphs generated by R.
You can get R from here or you can install it using the package manager of your choice: Chocolatey (Windows), apt / yum (Linux) or Homebrew (macOS).
If you are keen on mining the raw data, those files will be invaluable.
The code is in my GitHub.
Happy hacking!