June 10, 2018

LINQ: .NET Collection Classes

Good morning,

Various collections

LINQ is one of the most compelling features of C#. Whenever we are dealing with data in lists (which is to say, almost all the time), we require methods to retrieve and manipulate this data. LINQ provides a) an easy and consistent way of working with lists, and b) a functional approach to list manipulation.

This article is the first in a series of articles on LINQ. In this series we will

  • take a look at the collection classes of .NET
  • learn about lambdas, closures, Action<> and Func<>
  • use the power of extension methods
  • explore the IEnumerable<> interface
  • dive into the various LINQ methods
  • and try some PLINQ (parallel LINQ).

Without further ado, let's dive right into the various collection classes available in .NET Framework1 and explore their uses and limitations. Since most business applications are purely data driven (how many serious programs without a database have you worked on?), knowledge on the collection classes of a language, to me, is as fundamental as knowledge on the respective languages basic data types.

Most important the collections in .NET are contained in the namespace System.Collections.Generic. Their non-generic equivalents contained in the System.Collections namespace should not be used. They only remain for compatibility reasons. Concurrent collections can be found in the System.Collections.Concurrent namespace.

Array / List<T>

An array (or a List<T>, which is basically a wrapper around the array) in C# is pretty much what you would expect an array to be in any language: a collection of items contained in a single continuous block in memory. As such, access via index is very fast (similar to pointer arithmetic in C). Finding items by attributes, however, requires a full scan of the array.

Adding items to an array is typically cheap, because C# reserves array space in blocks. Only inserting and removing items in the middle of an array is somewhat costly, since all following items have to be moved. If this is your use case, consider using a LinkedList<T> (see bellow).

HashSet<T>

As the name implies, the HashSet<T> saves a hash per each item in the collection. As opposed to an array, a HashSet<T> does not guarantee the order of items to be preserved. In addition, a specific item may only be contained once, which makes it optimal in case every item in the collection must be unique. The equality of items in the HashSet<T> is computed using the Equals() method of the items.2

The HashSet<T> shines when it comes to finding an item via instance (or rather via "thing" that is considered equal). A good example is a random list of strings. Since identical (in regards to content) strings are considered equal, converting a List<string> to HashSet<string> makes for an easy way to make every string in a random list unique.

var list = new List<string> {
    "One", "Two", "Three", "Two", "Three"
};
var set = new HashSet<string>(list); // Output: "One", "Two", "Three"

Dictionary<T, U>

The Dictionary<T, U> is a key-value-collection. Each key (of type T) maps to an item (of type U). Access via key is very efficient, which makes the Dictionary<T, U> excellent for retrieving items by means of an attribute (such as its primary key).

The keys of a Dictionary<T, U> have to be unique. Whereas the HashSet<T> quietly "overlook" duplicate entries, the Dictionary<T, U> throws an exception when a key is added twice. As with the HashSet<T>, equality of keys is checked using the Equals() method.

Whenever you find yourself iterating over a collection multiple times, to find a single item via a specific field (e.g. a person by name in a List<Person>), consider using a Dictionary<T, U>. Even temporarily creating a Dictionary<T, U> for only a few iterations may provide considerable performance enhancements. LINQs ToDictionary() method is your friend.3

ConcurrentDictionary<T, U>

All concurrent collections allow for parallel read and write access from multiple threads. Since they typically need to hold a copy of the collection per thread and copies have to be synchronized, using a concurrent collection in a single-threaded use-case is not advisable. They are, however, powerful tools in highly parallel scenarios.

The most useful concurrent collection is the ConcurrentDictionary<T, U>. It provides a thread-safe implementation of the Dictionary<T, U> and is especially useful for data caching and multi-threaded service implementations.

Even though concurrent collections are in essence thread-safe, not all operations on these collections may be thread-safe. The documentation actually denies thread-safety of explicit interface implementations, extension methods (LINQ anyone?) and methods which take delegate parameters.4 Always check the specific methods documentation, especially when using LINQ against a concurrent collection.

Other useful collections

Of course, there are a number of other useful but more specific collections:

  • Stack<T>: A first in last out (FILO) collection
  • Queue<T>: A first in first out (FIFO) collection
  • LinkedList<T>: Your typical linked list. Useful in case you often want to insert or remove items from the middle of a large collection of items.
  • SortedList<T>, SortedSet<T>, SortedDictionary<T>: Variants of List<T>, HashSet<T> and Dictionary<T, U> which order the contained items.5
  • The other collections in the System.Collections.Concurrent namespace: ConcurrentStack<T> and ConcurrentQueue<T> are concurrent implementations of Stack<T> and Queue<T> respectively. ConcurrentBag<T> is somewhat special, as it really is a bag of items: Items are unsorted and not accessible by index or instance. You stuff things into it and iterate over everything whenever you need an item (am I the only one reminded of my drawers?).
  • The various collections from the System.Collections.ObjectModel namespace: They are typically used for UI development. Especially the ObservableCollection<T>, which implements INotifyCollectionChanged and is often used for DataBindings in WPF.
  • The new and shiny ImmutableCollections from the System.Collections.Immutable NuGet package:6 The idea is simple: Collections which cannot be changed, can freely be accessed from multiple threads. There are implementations of all the typical collections, ImmutableList<T>, ImmutableDictionary<T, U>, ImmutableQueue<T>, etc. Even though these collections are not directly a part of .NET framework, I wanted to mention them because they provide an additional way to use collections over multiple threads.7