June 17, 2018

On 3rd Party Libraries

Good Morning,

One of the things often preached in the coding community is code reuse. Ideally, a problem should be solved exactly once. Everybody who has the same problem should use this one implementation and contribute to it if changes are necessary.[1] There are three main points backing up this idea:

The Good

  • Less development time, because large parts of an application can be made of existing code.
  • Libraries are used and worked on by many developers, so they should have good code quality and should be well tested.
  • Less maintenance, because libraries get maintained by others.

In addition, it is often recommended to concentrate on open source libraries, the reason being that one can access, change or even fork the library if necessary. Open source is a whole different topic, which I do not want to go into right now. Just keep in mind that open source does not only have upsides either.

I have been a firm believer in the idea of code reuse for many years. And I do think that you should write and package your own code in a reusable manner and create a set of high quality in-house libraries to be used in any of your applications.

When it comes to 3rd party libraries, though, I have over time found several disadvantages, which I want to bring to your attention.

The Bad

  • Do you know all those 425 libraries? Any library of decent size has a learning curve. Typically, the library is initially used in the wrong way, maybe even against its design. This leads to bad application design and, in some bad cases, even to the addition of more libraries solving the same problem. No one will want to maintain this code.
  • Libraries often get "plugged together" in a more or less random way. Using 3rd party libraries requires special attention to the architecture and to the correct inclusion of those libraries into an application. As architecture is a huge issue in many applications anyway, gluing together a bunch of libraries can only make it worse.
  • Most libraries offer way more features than we actually require. You might say: "What's so bad about having more features than we need?" Well, those features are overhead: code we have to deliver to the customer without them needing it. They also lead to a steeper learning curve, because we have to understand all those features to be able to decide whether to use them or not.
  • And then there are those pesky developers: in my experience, any feature available will be used at some point. This is especially counterproductive when we believe a feature to be a bad idea. Let me give you an example:
    I am a big fan of dependency injection. As far as .NET goes, there are various good libraries out there: LightInject, Ninject, Autofac, etc. But I have rolled my own.[2] Why?
    Well, I believe that dependency injection requires a very clear statement about the dependencies a class requires. This statement is best given in the form of a constructor which enforces the provision of all necessary dependencies. I thus want to use constructor injection only. All the libraries I mentioned also allow for property injection. This is nice for the developer, because less code is needed. The downside is that I can create and use an object without filling in all properties. Do you want to account, in every single method, for the possibility that your required dependencies are not there? I don't! That's why my DI container only supports constructor injection, nothing more, nothing less.
  • We usually find (or stumble upon) some use cases the library does not account for. To solve this, we typically add to it, using wrappers, custom derivatives, extension methods and so on. While this does not sound too bad at first, I have found most of these cases to lead to extremely bad code and architecture.
    The problem is that we don't choose a library we expect to lack the features we need.[3] So when we do find a missing feature, we are usually stumped and in a hurry to get it implemented. What makes it even worse is that many libraries are not exactly built to be easily extended from the outside.[4] While they work fine for what they are designed to do, they often fail when asked to do more.
    This is where the open source argument comes in: just add the feature and contribute it to the library. But does that not defeat our goals? When I use a 3rd party library, I want to rid myself of the responsibility to extend and maintain it. I don't want to understand its code, I want it to just work. What I certainly do not want is to get my hands dirty implementing additional features in the library, or even forking it.
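
The difference between the two injection styles mentioned in the dependency injection example can be sketched roughly as follows. This is only an illustration and not code from any of the libraries named above or from my own container; ILogger, ConsoleLogger and both service classes are hypothetical names.

```csharp
using System;

// Hypothetical dependency, used for illustration only.
public interface ILogger
{
    void Log(string message);
}

public sealed class ConsoleLogger : ILogger
{
    public void Log(string message) => Console.WriteLine(message);
}

// Property injection: the object can be created without its dependency,
// so every method has to account for Logger being null.
public sealed class PropertyInjectedService
{
    public ILogger Logger { get; set; }

    public void Run()
    {
        if (Logger == null)
            throw new InvalidOperationException("Logger was never injected.");
        Logger.Log("Running (property injection).");
    }
}

// Constructor injection: the dependency is stated once and enforced at creation time.
public sealed class ConstructorInjectedService
{
    private readonly ILogger _logger;

    public ConstructorInjectedService(ILogger logger)
    {
        _logger = logger ?? throw new ArgumentNullException(nameof(logger));
    }

    public void Run() => _logger.Log("Running (constructor injection).");
}

public static class InjectionDemo
{
    public static void Main()
    {
        new ConstructorInjectedService(new ConsoleLogger()).Run();

        var incomplete = new PropertyInjectedService(); // compiles and constructs fine
        try { incomplete.Run(); }
        catch (InvalidOperationException ex) { Console.WriteLine(ex.Message); }
    }
}
```

With constructor injection, the missing dependency surfaces where the object is created. With property injection, it only surfaces when some method happens to need it.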

The Ugly

You ask me: "What now? Should I be using 3rd party libraries or not?" My answer is a firm and resounding "It depends". As with all things in life, there is no easy answer, no one right way. What I can offer you is this: let us consider some questions you should ask yourself before using a 3rd party library:

  1. Do you believe you can write a maintainable, well tested, reusable piece of software that solves the problem at hand? If the answer is "No", don't even bother. Invest your time in finding the best library out there that fits your needs.
  2. Do you know a library (from deep experience working with it) that fits the bill? If you or one of your more advanced developers knows such a library, go for it.[5]
  3. Do you think (as opposed to "know") there is a library that would solve your problem? This is where it gets complicated. Find out everything there is about the library. Try it out, test it to the core. Try to consider all use cases you may run into (especially those you do not want to run into). And pay extra attention to the correct use of the library in your application.
  4. If none of the above is true, meaning that you cannot find a library that solves your problem, the learning curve is too steep, or you get a headache thinking of all those library features that you do not want to have in your application, then you should consider rolling your own. But as always, make it reusable for yourself. Implement automated tests. Pay special attention to the interface and the architecture in general. Your libraries should have an even higher code standard than your applications!

Whenever you use 3rd party libraries, only consider those that are "a safe bet". Make sure that the license is acceptable, that the library is well maintained and actively developed by a decently sized group, and that the code is of high quality and well tested. And do not look at the price.[6] A library usually is a lifetime investment, as far as the application using it is concerned. Never choose the "viable alternative" over the "perfect match" because it is free. Such decisions will come back to haunt you tenfold in your maintenance budget.

I hope I could give you some pointers as to the selection of 3rd party libraries. Do not make hasty decisions. Think it through, find the best answer you can, and stick with it. And I am sure you will be successful with your decision.

  1. This is, of course, not possible as there are always multiple ways to solve a complex problem, none of which are perfect. The goal still stands, though.
  2. Which you can find at https://github.com/programmersdigest/Injector
  3. Although we should! Do not expect any library to meet all your needs. Think ahead. Implement ways to add functionality before you need it. It will be worth it quicker than you may think.
  4. I'm looking at you, Microsoft, with your sealed classes.
  5. Of course you should still check it, test it and so on. But having a developer who knows your needs well, whose advice you trust, who has lots of experience with the library in question and will most certainly stay in your team for a long time to come, is a pretty good start.
  6. This argument is to be seen in the context of a company. If you are a private developer, the price may absolutely affect your decision (it does for me). But then again, the investment into a library and the risks attached to a wrong decision are not as high as they are for a company with tens or even hundreds of employees working on the application in question.
June 10, 2018

LINQ: .NET Collection Classes

Good morning,

LINQ is one of the most compelling features of C#. Whenever we are dealing with data in lists (which is to say, almost all the time), we require methods to retrieve and manipulate this data. LINQ provides a) an easy and consistent way of working with lists, and b) a functional approach to list manipulation.

This article is the first in a series of articles on LINQ. In this series we will

  • take a look at the collection classes of .NET
  • learn about lambdas, closures, Action<> and Func<>
  • use the power of extension methods
  • explore the IEnumerable<> interface
  • dive into the various LINQ methods
  • and try some PLINQ (parallel LINQ).

Without further ado, let's dive right into the various collection classes available in the .NET Framework[1] and explore their uses and limitations. Since most business applications are purely data driven (how many serious programs without a database have you worked on?), knowledge of a language's collection classes is, to me, as fundamental as knowledge of its basic data types.

The most important collections in .NET are contained in the namespace System.Collections.Generic. Their non-generic equivalents in the System.Collections namespace should not be used; they only remain for compatibility reasons. Concurrent collections can be found in the System.Collections.Concurrent namespace.

Array / List<T>

An array (or a List<T>, which is basically a wrapper around an array) in C# is pretty much what you would expect an array to be in any language: a collection of items contained in a single contiguous block of memory. As such, access via index is very fast (similar to pointer arithmetic in C). Finding items by attributes, however, requires a full scan of the array.

Adding items to the end of a List<T> is typically cheap, because the list reserves space for its backing array in blocks (a plain array, in contrast, has a fixed length). Only inserting and removing items in the middle is somewhat costly, since all following items have to be moved. If this is your use case, consider using a LinkedList<T> (see below).
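
The difference between appending and inserting can be sketched with a few lines. This is a minimal illustration; the class and method names are made up for the example, and the comments describe the general behavior of List<T> rather than exact timings.

```csharp
using System;
using System.Collections.Generic;

public static class ListDemo
{
    public static List<int> Build()
    {
        var list = new List<int>();
        for (var i = 0; i < 5; i++)
            list.Add(i);     // appends at the end; the backing array only grows occasionally
        list.Insert(0, 42);  // shifts every existing item one slot to the right
        list.RemoveAt(0);    // shifts every following item back again
        return list;
    }

    public static void Main()
    {
        var list = Build();
        Console.WriteLine(string.Join(", ", list));     // 0, 1, 2, 3, 4
        Console.WriteLine(list.Capacity >= list.Count); // True
        Console.WriteLine(list[2]);                     // 2
    }
}
```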

HashSet<T>

As the name implies, the HashSet<T> stores a hash for each item in the collection. As opposed to an array, a HashSet<T> does not guarantee that the order of items is preserved. In addition, a specific item may only be contained once, which makes it optimal in case every item in the collection must be unique. The equality of items in the HashSet<T> is determined using the items' GetHashCode() and Equals() methods.[2]

The HashSet<T> shines when it comes to finding an item via instance (or rather via a "thing" that is considered equal). A good example is a random list of strings. Since strings with identical content are considered equal, converting a List<string> to a HashSet<string> is an easy way to remove all duplicates from the list.

using System.Collections.Generic;

var list = new List<string> {
    "One", "Two", "Three", "Two", "Three"
};
var set = new HashSet<string>(list); // set contains: "One", "Two", "Three"
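
For your own classes, the set can only recognize duplicates if the type defines what "equal" means. The following sketch assumes a hypothetical Person class in which two persons count as equal when their names match; it is one possible way to do this, not the only one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration: two persons are equal when their names match.
public sealed class Person
{
    public string Name { get; }

    public Person(string name) { Name = name; }

    public override bool Equals(object obj) =>
        obj is Person other && Name == other.Name;

    // Equal objects must return equal hash codes, otherwise the set cannot find them.
    public override int GetHashCode() => Name.GetHashCode();
}

public static class HashSetDemo
{
    public static HashSet<Person> Build()
    {
        var set = new HashSet<Person>();
        set.Add(new Person("Ada"));
        set.Add(new Person("Grace"));
        set.Add(new Person("Ada")); // returns false, the set is left unchanged
        return set;
    }

    public static void Main()
    {
        var set = Build();
        Console.WriteLine(set.Count);                         // 2
        Console.WriteLine(set.Contains(new Person("Grace"))); // True
    }
}
```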

Dictionary<T, U>

The Dictionary<T, U> is a key-value-collection. Each key (of type T) maps to an item (of type U). Access via key is very efficient, which makes the Dictionary<T, U> excellent for retrieving items by means of an attribute (such as its primary key).

The keys of a Dictionary<T, U> have to be unique. Whereas the HashSet<T> quietly "overlooks" duplicate entries, the Dictionary<T, U> throws an ArgumentException when a key is added twice via Add(). As with the HashSet<T>, equality of keys is checked using the GetHashCode() and Equals() methods.

Whenever you find yourself iterating over a collection multiple times to find a single item via a specific field (e.g. a person by name in a List<Person>), consider using a Dictionary<T, U>. Even temporarily creating a Dictionary<T, U> for only a few lookups may provide considerable performance gains. LINQ's ToDictionary() method is your friend.[3]
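
A minimal sketch of this pattern, assuming a hypothetical Person class with Name and Age properties (the names and values are invented for the example):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration.
public sealed class Person
{
    public string Name { get; }
    public int Age { get; }

    public Person(string name, int age) { Name = name; Age = age; }
}

public static class DictionaryDemo
{
    // Builds the index once instead of scanning the list for every lookup.
    public static Dictionary<string, Person> IndexByName(IEnumerable<Person> persons) =>
        persons.ToDictionary(p => p.Name);

    public static void Main()
    {
        var persons = new List<Person>
        {
            new Person("Ada", 36),
            new Person("Grace", 85),
            new Person("Linus", 48)
        };

        var byName = IndexByName(persons);
        Console.WriteLine(byName["Grace"].Age); // 85

        // TryGetValue avoids an exception for unknown keys.
        Console.WriteLine(byName.TryGetValue("Alan", out _)); // False

        // Add() refuses duplicate keys.
        try { byName.Add("Ada", new Person("Ada", 1)); }
        catch (ArgumentException) { Console.WriteLine("Duplicate key rejected."); }
    }
}
```

Note that ToDictionary() also throws an ArgumentException if two items produce the same key, so the chosen field has to be unique within the source list.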

ConcurrentDictionary<T, U>

All concurrent collections allow for parallel read and write access from multiple threads. Since this safety is paid for with synchronization overhead (fine-grained locking, lock-free algorithms or, in the case of the ConcurrentBag<T>, per-thread storage), using a concurrent collection in a single-threaded use case is not advisable. They are, however, powerful tools in highly parallel scenarios.

The most useful concurrent collection is the ConcurrentDictionary<T, U>. It provides a thread-safe implementation of the Dictionary<T, U> and is especially useful for data caching and multi-threaded service implementations.

Even though concurrent collections are in essence thread-safe, not all operations on these collections may be thread-safe. The documentation explicitly does not guarantee thread safety for explicit interface implementations, extension methods (LINQ, anyone?) and methods which take delegate parameters.[4] Always check the specific method's documentation, especially when using LINQ against a concurrent collection.
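
As a small sketch of a parallel use case, the following counts even and odd numbers from multiple threads. The class and method names are made up for the example. AddOrUpdate() is one of the methods taking a delegate: the delegate may be invoked more than once under contention, which is why it should be free of side effects.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ConcurrentDemo
{
    public static ConcurrentDictionary<string, int> CountInParallel(int iterations)
    {
        var counts = new ConcurrentDictionary<string, int>();
        Parallel.For(0, iterations, i =>
        {
            var key = i % 2 == 0 ? "even" : "odd";
            // The update delegate may run more than once when threads collide,
            // but the value that ends up in the dictionary is still consistent.
            counts.AddOrUpdate(key, 1, (k, current) => current + 1);
        });
        return counts;
    }

    public static void Main()
    {
        var counts = CountInParallel(1000);
        Console.WriteLine(counts["even"]); // 500
        Console.WriteLine(counts["odd"]);  // 500
    }
}
```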

Other useful collections

Of course, there are a number of other useful but more specific collections:

  • Stack<T>: A first in last out (FILO) collection
  • Queue<T>: A first in first out (FIFO) collection
  • LinkedList<T>: Your typical linked list. Useful in case you often want to insert or remove items from the middle of a large collection of items.
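  • SortedList<T, U>, SortedSet<T>, SortedDictionary<T, U>: Collections which keep the contained items in order. SortedSet<T> is the ordered counterpart of the HashSet<T>. SortedList<T, U> and SortedDictionary<T, U> are both key-value collections ordered by key (despite its name, SortedList<T, U> is not a sorted List<T>).[5]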
  • The other collections in the System.Collections.Concurrent namespace: ConcurrentStack<T> and ConcurrentQueue<T> are concurrent implementations of Stack<T> and Queue<T> respectively. ConcurrentBag<T> is somewhat special, as it really is a bag of items: Items are unsorted and not accessible by index or instance. You stuff things into it and iterate over everything whenever you need an item (am I the only one reminded of my drawers?).
  • The various collections from the System.Collections.ObjectModel namespace: They are typically used for UI development, especially the ObservableCollection<T>, which implements INotifyCollectionChanged and is often used for data binding in WPF.
  • The new and shiny immutable collections from the System.Collections.Immutable NuGet package:[6] The idea is simple: collections which cannot be changed can freely be accessed from multiple threads. There are implementations of all the typical collections: ImmutableList<T>, ImmutableDictionary<T, U>, ImmutableQueue<T>, etc. Even though these collections are not directly a part of the .NET Framework, I wanted to mention them because they provide an additional way to use collections across multiple threads.[7]
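
To make the ordering behavior of some of these collections a bit more tangible, here is a short sketch that feeds the same three numbers into a Stack<T>, a Queue<T> and a SortedSet<T>. The demo class name is made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class OrderDemo
{
    public static void Main()
    {
        var stack = new Stack<int>();
        var queue = new Queue<int>();
        var sorted = new SortedSet<int>();

        foreach (var n in new[] { 3, 1, 2 })
        {
            stack.Push(n);
            queue.Enqueue(n);
            sorted.Add(n);
        }

        Console.WriteLine(stack.Pop());               // 2 (the item added last)
        Console.WriteLine(queue.Dequeue());           // 3 (the item added first)
        Console.WriteLine(string.Join(", ", sorted)); // 1, 2, 3
    }
}
```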
June 6, 2018

Microsoft buys GitHub

Good morning,

There have been rumors on the net, and now it is finally confirmed: Microsoft has bought GitHub[1] - and the community is in panic. Many open source projects fear that Microsoft will use its influence to harm them and push its own agenda. Let's take a look at the pros and cons of the "GitHub sellout" and the goals of the parties involved.

Why would Microsoft buy GitHub?

I think this one is easy: Microsoft has used GitHub as a code hosting platform for many years. They moved some of their biggest projects to GitHub in 2014/2015[2] in preparation for closing their own platform CodePlex.[3] Today Microsoft has over 1800 repositories on GitHub in their main account. In addition, Microsoft has embraced Git as their main versioning tool and has moved all of the Windows source code into "the largest Git repo on the planet".[4] Strategically, it makes perfect sense to me that Microsoft would buy GitHub. This move allows them to tailor their main code hosting platform to their needs as well as the needs of their customers (= developers on the Microsoft platforms).

Why would GitHub be sold?

In 2012, Chris Wanstrath, one of the founders of GitHub, stepped down to let Tom Preston-Werner (another co-founder) take the lead. In 2014 Chris Wanstrath again became CEO, but by the end of 2017 he wanted to step down again (by mid-2018, GitHub had still not found a replacement for him). By then, the company had a recurring annual revenue of $200 million and was valued at $2 billion.[5] They never earned a penny, though. If you were in this position and Microsoft came to you, saying: "Well, this company is worth $2 billion, we'll give you $7.5 billion for it (and we have a good CEO for it, as well)", what would you do? I know I would sell!

And what about us?

So what do we, as users of GitHub, take away from that? I personally think that Microsoft will not change GitHub in a radical way. There will be changes, don't get me wrong, and some will be great, some will be not so great. Overall, however, I think GitHub has finally become a stable company (and with that, a stable platform). Microsoft does not need GitHub to make money; they need developers to use their products, they need a community. And that's what they are trying to get from this acquisition. Also, I am happy to see that there is again a good CEO leading GitHub. Not that Wanstrath was not good (he made GitHub what it is today), but he did not want to continue and there was no replacement in sight. Well, this problem is solved now.

All in all, I look forward to the development of GitHub as part of Microsoft. There are risks, of course, but there are also great opportunities for GitHub to become even better. I hope the latter will be for us to enjoy.

May 27, 2018

Facing the Challenge of Customizability

Good Morning,

in software development, we differentiate between a product and a project. The distinction goes as follows:[1]

  • A product is developed for the market. The software company invests money into the development of a product with the goal to sell it to a large number of customers. Investing in this sense means advancing money, which may or may not reap returns when the product is finished. A product therefore offers good rewards (write once, sell many times), but also carries a sizable risk (the market doesn't care).
  • A project is developed for a single client according to their needs. Typically the development firm will still need to advance the money required for the project. The risk however is much lower, because there is a customer who will pay the full development cost of the software. The returns on the other hand are lower, too, because the project software cannot be sold to other customers.

Many software companies have found their happy place doing both: They provide standard products for the market and projects to tailor their standard products to specific customer requirements. This tailoring is typically called customizing - and it is one of the biggest challenges to be faced in designing a product.

In my years as a software developer, I have encountered various products which have been heavily customized, and most of them had at least one of the following problems:

  • Everything is private
    This can be one of the most frustrating things to encounter as a developer tasked with customizing a product. The product defines a few "entry points", which however do not give you the freedom you need. At every turn you have to ask the product development team to give you an additional entry point or to make yet another method publicly available. This can easily turn a two-day project into an odyssey of multiple weeks.
  • Everything is public
    While this is the exact opposite of the previous bullet point, it is exactly as bad for the company. If a method is public, someone will use it (and forget that they did). A public interface can never be changed without worrying that some customizing will break. And if the customizing was important, what seemed to be a small change in the product can quickly become a political nightmare. This leads to a major deadlock: if everything is public, nothing can be changed. Product development grinds to a halt.
  • The product architecture is a mess
    A messy architecture is bad enough in product development alone. At some point, changes to a bad code base become ever harder, and the risk of breaking existing functionality becomes ever greater. How is one supposed to implement complex customer-specific features in such a code base without breaking everything? And how to avoid making the customizing even messier than the already bad code base? Even worse, customers sometimes request a customizing added on top of an existing customizing. That's like a bad meal with an awful sauce and even worse seasoning.
  • An extraordinary sales department
    Some salesmen can sell just about anything - and they do it, too.[2] Even though the developer's hair stands on end because they can barely make the requirements work. This will often lead to ugly workarounds with strange side effects.
  • Putting the "product"-badge on customizings
    Depending on the agreements with the client, a software company may be allowed to include customizings as part of their product. This is great if the customizing is taken as a feature that gets thoughtfully embedded into the product. If customizings just get "slapped" into the existing code one after the other, however, the product more than likely becomes a mess after a while. The reason is simple: A customizing does not usually have to take all the product's features into account. It is very clear which features the customer needs and uses. To keep costs for the customizing down, only the necessary changes are made. Why would you make the customizing work with feature x if the customer does not intend to ever use said feature? Putting this customizing into the product's code base, however, leads to a problem: A different customer may very well use feature x, which is now incompatible.

Many of these challenges can only be met with some strong self-control and good foresight. There are, however, various ways to tackle these issues:

  • During development of a new product, it must be clear whether the software should be customizable or not. If in doubt, go with yes.[3]
  • Any software should have a clear architecture. This is especially true if others should customize the software. I personally believe that the exact architecture is not as important as the consistency of said architecture. Once a new team member (in product or project development) has a grasp of one part of the application, they should be able to apply that knowledge to all other areas of the software.

  • Consider the implementation of a plugin system. Plugins are a powerful way to allow for customizability. In the extreme case that all product functionality is bundled in multiple layers of plugins, everything can be changed for the customer (the architecture does become harder to understand, though).
  • Use automated (regression) testing. Not only does automatic testing help in the development of the product or the specific customizing. The real power becomes apparent when automatic regression testing can be applied to the product and all customizings at once. Whenever product development has (again) broken the interface, automatic tests of customizings catch the error before it gets rolled out to the customer. This makes both the product developer as well as the project developer sleep easier at night, because they can be pretty sure their stuff still works the next day.
  • During customizing, think about product features which will not be compatible with the customizing - and document this fact.[4] If someone later moves the customizing into the product, incompatible features have a greater chance of becoming apparent.
  • Do not make everything public/private. Every part of the application has to be examined with regard to the necessity of it being customizable. Everything that needs to be customized must have a clear public interface (an API if you will). This interface must be created in such a way that it a) provides all required functionality, and b) can be extended whenever we forgot something in a).[5] If product development is in doubt which parts of the interface have to be public, wait for the first customizings to come around. In my experience it is better to make too many things private than to make too many things public. There has to be an easy way for project developers to get their public interfaces done when they need them, however. This leads me to the next and most important point:
  • Talk to your colleagues (coffee makers are a great place for that). In my experience there is oftentimes a barrier[6] between product and project development, especially if the two are separate departments. However, only the customizing department knows what they really need.
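
The combination of a small public interface, a plugin and constructor injection mentioned in the list above could look roughly like this. It is a deliberately tiny sketch: IInvoicePlugin, InvoiceService, LoyaltyDiscountPlugin, the 19 % tax rate and the discount of 10 are all hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;

// Public extension point: the only surface a customizing is allowed to depend on.
public interface IInvoicePlugin
{
    decimal AdjustTotal(decimal total);
}

// Product code. Everything not needed by customizings stays private.
public sealed class InvoiceService
{
    private readonly IReadOnlyList<IInvoicePlugin> _plugins;

    // Plugins are injected through the constructor.
    public InvoiceService(IEnumerable<IInvoicePlugin> plugins)
    {
        if (plugins == null) throw new ArgumentNullException(nameof(plugins));
        _plugins = new List<IInvoicePlugin>(plugins);
    }

    public decimal CalculateTotal(decimal net)
    {
        var total = ApplyTax(net);
        foreach (var plugin in _plugins)
            total = plugin.AdjustTotal(total);
        return total;
    }

    // Hypothetical tax rate; private, so it can change without breaking customizings.
    private static decimal ApplyTax(decimal net) => net * 1.19m;
}

// Customizing for a single customer, living outside the product code base.
public sealed class LoyaltyDiscountPlugin : IInvoicePlugin
{
    public decimal AdjustTotal(decimal total) => total - 10m;
}

public static class PluginDemo
{
    public static void Main()
    {
        var standard = new InvoiceService(new IInvoicePlugin[0]);
        var customized = new InvoiceService(new IInvoicePlugin[] { new LoyaltyDiscountPlugin() });

        Console.WriteLine(standard.CalculateTotal(100m) == 119m);   // True
        Console.WriteLine(customized.CalculateTotal(100m) == 109m); // True
    }
}
```

The product can change ApplyTax() at will, while an automated regression test that runs the product together with LoyaltyDiscountPlugin would catch any change to IInvoicePlugin before it reaches the customer.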

In my opinion, the challenge of customizability is a big one. Customizing is one of the biggest sources of income for many software companies, but there is no one right way to make it work. Still, I hope I have offered you some food for thought to enhance the work of your product development and project development alike.

  1. This is a simplified distinction, of course. As per usual, reality provides many gradients in between products and projects. The software company could, for example, sell a customizing at a discount but in turn get the client's permission to sell it to other customers as well.
  2. Don't take this one too seriously. As is usually the case, neither the salespeople nor the development team are at fault. There are customizings, however, which had better not been done. And some better communication between departments could work wonders to prevent such cases.
  3. Allowed answers are: yes and maybe.
  4. This documentation could even be a test in the form of: If feature x is active, break the test with an appropriate message.
  5. In my mind, this is one of the hardest nuts to crack. One good way to tackle this problem is the combination of a plugin system and dependency injection. Plugins get their dependencies injected, which in turn have a public interface that can easily be extended. This topic can fill books, however, and can not be fully discussed here.
  6. The size of this barrier ranges from garden fence to Great Wall of China. Only open communication can lead to good software, especially in the early stages of a product's market adoption.
May 21, 2018

Think About Your Future

Good morning,

we are a lucky bunch, you know that? In 2018, software developers in Germany earn an average of 52,000 €.[1] By comparison, the average salary across all employees in Germany is 41,000 €.[2] Looking at various professions, IT is considered one of the most lucrative job markets.[3] In addition, management positions in IT add an above-average amount to the already good salary.[4]

I just got my pension statement and I was a bit baffled. Leaving aside inflation as well as raises in salary and pensions, my net pension would only be half of my current net income. And I am by no means worse off than others.[5] That is a steep drop in quality of life!

And here is where I consider us lucky: as software developers (with above-average salaries), we can prepare for the future and set aside some money to support our quality of life in retirement. That is what I urge you to do!

I do not want to make any suggestions on the ways you want to invest money to add to your pension. There are experts for that and I am not one of them. I want to point out one more thing, however: the earlier you start, the better - think about compound interest.
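
Since we are developers, here is the compound interest argument as a few lines of C#. All figures are hypothetical (a fixed yearly deposit, a constant 4 % interest rate, annual compounding) and this is an illustration of the arithmetic only, not financial advice.

```csharp
using System;

public static class SavingsDemo
{
    // Future value of a fixed deposit made at the start of every year,
    // with interest compounded annually.
    public static decimal FutureValue(decimal yearlyDeposit, decimal annualRate, int years)
    {
        var balance = 0m;
        for (var year = 0; year < years; year++)
            balance = (balance + yearlyDeposit) * (1 + annualRate);
        return balance;
    }

    public static void Main()
    {
        // Both savers pay in 48,000 in total; only the starting point differs.
        var early = FutureValue(1200m, 0.04m, 40); // starts 40 years before retirement
        var late = FutureValue(2400m, 0.04m, 20);  // starts 20 years before retirement

        Console.WriteLine(early > late); // True
    }
}
```

Under these assumptions the early saver ends up with noticeably more, even though both have deposited the same total amount.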