I initially started using LINQ because it made it easy to order the objects in a list without having to write a Comparer. Just write your lambda expression and BOOM! List sorted.
I want to take this thought a step further and, as implied by the post title, do a group by.
To start, here is an orderby on n % 2 that gives us a list of even numbers followed by odd numbers:
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
var orderedNumbers = from n in numbers
orderby n % 2 == 0 descending
select n;
foreach (var g in orderedNumbers)
{
Console.Write("{0},", g);
}
This is all pretty straightforward: we order by whether each number modulo 2 is 0, descending so that true sorts first, and we get the numbers 4,8,6,2,0,5,1,3,9,7.
But what if I want to simply have two lists, one with evens and one with odds? That’s where group by comes in.
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
var numberGroups = from n in numbers
group n by n % 2 into g
select new { Remainder = g.Key, Numbers = g };
foreach (var g in numberGroups)
{
if (g.Remainder.Equals(0))
Console.WriteLine("Even Numbers:");
else
Console.WriteLine("Odd Numbers:");
foreach (var n in g.Numbers)
{
Console.WriteLine(n);
}
}
with the output:
Odd Numbers:
5
1
3
9
7
Even Numbers:
4
8
6
2
0
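What’s happening here is that LINQ puts each number into a group according to its key, and the select wraps every group in an anonymous type. The result behaves a lot like a dictionary of lists, but it is not an actual Dictionary: it is a lazily evaluated sequence of groupings (the actual type is something like System.Linq.Enumerable.WhereSelectEnumerableIterator<System.Linq.IGrouping<int, int>>).
It is important to note that the key everything is grouped on is the value of the expression right after the “by”.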
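The same grouping can also be written with method syntax. This is only a minimal sketch (the names ParityGroups and GroupByRemainder are made up for illustration), but it shows that each group exposes the value of the expression after the “by” as its Key:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParityGroups
{
    // Hypothetical helper: groups numbers by n % 2, so each group's Key is 0 or 1.
    public static List<IGrouping<int, int>> GroupByRemainder(IEnumerable<int> numbers)
    {
        return numbers.GroupBy(n => n % 2).ToList();
    }

    public static void Main()
    {
        int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
        foreach (var g in GroupByRemainder(numbers))
        {
            // Groups come back in the order their keys first appear:
            // key 1 (from 5) first, then key 0 (from 4).
            Console.WriteLine("Key {0}: {1}", g.Key, string.Join(",", g));
        }
    }
}
```

Because 5 is the first number in the array, the odd group (key 1) is listed before the even group (key 0), which matches the output above.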
Taking this one simple step forward, let’s group a bunch of words. The following doesn’t work quite right:
string[] words = { "blueberry", "Chimpanzee", "abacus", "Banana", "apple", "cheese" };
var wordGroups = from w in words
group w by w[0] into g
select new { FirstLetter = g.Key.ToString().ToLower(), Words = g };
foreach (var g in wordGroups)
{
Console.WriteLine("Words that start with the letter '{0}':", g.FirstLetter);
foreach (var w in g.Words)
{
Console.WriteLine(w);
}
}
giving us the output:
Words that start with the letter 'b':
blueberry
Words that start with the letter 'c':
Chimpanzee
Words that start with the letter 'a':
abacus
apple
Words that start with the letter 'b':
Banana
Words that start with the letter 'c':
cheese
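That’s because there is a bit of a red herring here. Remember that the value after the by is what the grouping is keyed on. In our case w[0] for Chimpanzee is 'C', not 'c', so it lands in its own group; the output only looks duplicated because FirstLetter is lower-cased after the grouping has already been done. If we change it to: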
string[] words = { "blueberry", "Chimpanzee", "abacus", "Banana", "apple", "cheese" };
var wordGroups = from w in words
group w by w[0].ToString().ToLower() into g
select new { FirstLetter = g.Key.ToString().ToLower(), Words = g };
foreach (var g in wordGroups)
{
Console.WriteLine("Words that start with the letter '{0}':", g.FirstLetter);
foreach (var w in g.Words)
{
Console.WriteLine(w);
}
}
then we get the results we expect with:
Words that start with the letter 'b':
blueberry
Banana
Words that start with the letter 'c':
Chimpanzee
cheese
Words that start with the letter 'a':
abacus
apple
Taking this even one step further, we can put an orderby above the group to order the groups alphabetically:
var wordGroups = from w in words
orderby w[0].ToString().ToLower()
group w by w[0].ToString().ToLower() into g
select new { FirstLetter = g.Key.ToString().ToLower(), Words = g };
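For reference, here is a self-contained sketch of the ordered grouping. It is not the exact query above: as an assumption for illustration it keys on char.ToLowerInvariant(w[0]) rather than w[0].ToString().ToLower(), and the names WordGroups and ByFirstLetter are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordGroups
{
    // Hypothetical helper: orders by lower-cased first letter, then groups on it.
    public static List<KeyValuePair<char, List<string>>> ByFirstLetter(IEnumerable<string> words)
    {
        var query = from w in words
                    orderby char.ToLowerInvariant(w[0])
                    group w by char.ToLowerInvariant(w[0]) into g
                    select new KeyValuePair<char, List<string>>(g.Key, g.ToList());
        return query.ToList();
    }

    public static void Main()
    {
        string[] words = { "blueberry", "Chimpanzee", "abacus", "Banana", "apple", "cheese" };
        foreach (var pair in ByFirstLetter(words))
        {
            // Prints the groups for a, b and c, in that order.
            Console.WriteLine("{0}: {1}", pair.Key, string.Join(", ", pair.Value));
        }
    }
}
```

The orderby is a stable sort, so within each letter the words keep the order they had in the original array.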
So let’s now make this a bit over-the-top complex. Given the classes:
public class Customer
{
public string CompanyName { get; set; }
public List<Order> Orders { get; set; }
}
public class Order
{
public DateTime OrderDate { get; set; }
public int Total { get; set; }
}
let’s group a customer list by customer, then by year, then by month:
List<Customer> customers = GetCustomerList();
var customerOrderGroups = from c in customers
select
new {c.CompanyName,
YearGroups = from o in c.Orders
group o by o.OrderDate.Year into yg
select
new {Year = yg.Key,
MonthGroups = from o in yg
group o by o.OrderDate.Month into mg
select new { Month = mg.Key, Orders = mg }
}
};
Whew! That took a lot to copy and paste from MSDN’s sample library!
As mentioned previously, the important part here is that the keys are the values after each “by”. This just creates a bunch of dictionary-like groups nested inside one another, each keyed on the value after its “by”.
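To see the nesting in action, here is a small runnable sketch. The sample data, the NestedGroupingDemo class and its methods are made up for illustration and stand in for GetCustomerList(); the classes assume CompanyName and OrderDate properties, which is what the query uses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public DateTime OrderDate { get; set; }
    public int Total { get; set; }
}

public class Customer
{
    public string CompanyName { get; set; }
    public List<Order> Orders { get; set; }
}

public static class NestedGroupingDemo
{
    // Made-up sample data standing in for GetCustomerList().
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer
            {
                CompanyName = "Acme",
                Orders = new List<Order>
                {
                    new Order { OrderDate = new DateTime(2008, 1, 5), Total = 10 },
                    new Order { OrderDate = new DateTime(2008, 1, 20), Total = 20 },
                    new Order { OrderDate = new DateTime(2009, 3, 1), Total = 30 }
                }
            }
        };
    }

    // Walks company -> year -> month and reports how many orders fall in each month.
    public static List<string> Describe(IEnumerable<Customer> customers)
    {
        var lines = new List<string>();
        foreach (var c in customers)
        {
            var yearGroups = from o in c.Orders
                             group o by o.OrderDate.Year into yg
                             select new
                             {
                                 Year = yg.Key,
                                 MonthGroups = from o in yg
                                               group o by o.OrderDate.Month into mg
                                               select new { Month = mg.Key, Orders = mg }
                             };
            foreach (var y in yearGroups)
                foreach (var m in y.MonthGroups)
                    lines.Add(string.Format("{0} {1}-{2:00}: {3} order(s)",
                        c.CompanyName, y.Year, m.Month, m.Orders.Count()));
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Describe(SampleCustomers()))
            Console.WriteLine(line);
    }
}
```

With this sample data the two January 2008 orders end up in one month group and the March 2009 order in another.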
The GroupBy method that is part of LINQ can also take an IEqualityComparer. Given the comparer:
public class AnagramEqualityComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
return getCanonicalString(x) == getCanonicalString(y);
}
public int GetHashCode(string obj)
{
return getCanonicalString(obj).GetHashCode();
}
private string getCanonicalString(string word)
{
char[] wordChars = word.ToCharArray();
Array.Sort<char>(wordChars);
return new string(wordChars);
}
}
we can find all the matching anagrams. This is possible because the IEqualityComparer compares words based on a sorted array of characters. If you take “meat” and “team” they both become “aemt” when sorted by their characters.
string[] anagrams = { "from", "salt", "earn", "last", "near", "form" };
var orderGroups = anagrams.GroupBy(
w => w.Trim(),
a => a.ToUpper(),
new AnagramEqualityComparer()
);
foreach (var group in orderGroups)
{
Console.WriteLine("For the word \"{0}\" we found matches to:", group.Key);
foreach (var word in group)
{
Console.WriteLine(word);
}
}
Like the inline LINQ, the first argument selects the key and the second selects what to put into the list. The last argument is the IEqualityComparer I mentioned earlier. We don’t get double entries since “last” will match “salt” and there is, therefore, no reason to add a new key.
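If you don’t need the original word as the key, one alternative is to skip the comparer and group directly on the sorted-letters string. This is only a sketch of that idea, not what the code above does; AnagramKeys, Canonical and Group are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnagramKeys
{
    // Sorts the letters of a word, so "salt" and "last" both become "alst".
    public static string Canonical(string word)
    {
        char[] chars = word.ToCharArray();
        Array.Sort(chars);
        return new string(chars);
    }

    // Groups words whose sorted letters match; the key is the sorted string itself.
    public static List<List<string>> Group(IEnumerable<string> words)
    {
        return words.GroupBy(w => Canonical(w))
                    .Select(g => g.ToList())
                    .ToList();
    }

    public static void Main()
    {
        string[] anagrams = { "from", "salt", "earn", "last", "near", "form" };
        foreach (var group in Group(anagrams))
            Console.WriteLine(string.Join(", ", group));
    }
}
```

The trade-off is that the group key is now “alst” rather than the first word found (“salt”), which is why the comparer version reads nicer when you print the key.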
From MSDN:
The checked keyword is used to explicitly enable overflow-checking for integral-type arithmetic operations and conversions.
By default, an expression that contains only constant values causes a compiler error if the expression produces a value that is outside the range of the destination type. If the expression contains one or more non-constant values, the compiler does not detect the overflow.
The unchecked keyword is used to suppress overflow-checking for integral-type arithmetic operations and conversions.
In an unchecked context, if an expression produces a value that is outside the range of the destination type, the overflow is not flagged.
Okay, so what the hell does this mean?
The C# compiler checks constant expressions for overflow at compile time.
If you do:
int tooBig = 2147483647 + 10;
you get the error “The operation overflows at compile time in checked mode.” The problem arises when you use variables. Since a variable could be anything the compiler doesn’t check for overflows. If you do:
int ten = 10;
int tooBig = 2147483647 + ten;
your code will compile but the value of tooBig will be -2147483639. This is because the default behavior of the runtime environment is to skip checking for overflows, so your values just wrap.
So, what do you care?
Most likely you don’t. I’ll be honest. Chances are you’re not writing algorithms so important that if you overflow the world is going to come crashing down around you. Every once in a while, however, you do come across some piece of critical software that may need to handle overflow exceptions. That is where the checked and unchecked operations come into play.
The problem is that checking every math operation for an overflow is slow. Okay, maybe slow is a bit much. It is, however, a little extra work on every operation that you could be spending on other things, and it’s really slow if an overflow exception is thrown and you have to handle it. In math-heavy operations and algorithms that extra work could add up to a lot of time, so be sure that your code is critical enough that an overflow must be handled.
So, how do we fix this? We use the checked keyword (in either the block or expression form).
//block
checked{
int twenty = 10 + 10;
}
//expression
int thirty = checked(10 + 20);
This tells the compiler to emit overflow checks, so if there is an overflow at run time an OverflowException is thrown.
The opposite of the checked operator is unchecked:
//block
unchecked{
int twenty = 10 + 10;
}
//expression
int thirty = unchecked(10 + 20);
This works just like the default behavior of not checking for overflows with variables. Since this is done automatically you probably won’t use this too often unless you start using the checked operator. The nice thing is that you can combine both checked and unchecked for the appropriate values.
unchecked
{
int ten = 10;
int tooBig = checked(2147483647 + ten);
Console.WriteLine(tooBig);
}
throws an OverflowException since 2147483647 + ten overflows. Conversely,
checked
{
int ten = 10;
int tooBig = unchecked(2147483647 + ten);
Console.WriteLine(tooBig);
}
results in a value of -2147483639, since unchecked tells the runtime environment not to check for overflow.
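Putting the two together, here is a minimal sketch of handling an overflow instead of letting the value wrap. The OverflowDemo class and its TryAdd and WrappingAdd helpers are made up for illustration.

```csharp
using System;

public static class OverflowDemo
{
    // Adds with overflow checking; returns null instead of letting the exception escape.
    public static int? TryAdd(int a, int b)
    {
        try
        {
            return checked(a + b);
        }
        catch (OverflowException)
        {
            return null;
        }
    }

    // Adds with overflow checking switched off, so the result wraps around.
    public static int WrappingAdd(int a, int b)
    {
        return unchecked(a + b);
    }

    public static void Main()
    {
        Console.WriteLine(WrappingAdd(int.MaxValue, 10));     // -2147483639
        Console.WriteLine(TryAdd(int.MaxValue, 10).HasValue); // False
        Console.WriteLine(TryAdd(10, 20));                    // 30
    }
}
```

Catching the exception is the expensive path mentioned earlier, so a helper like TryAdd only makes sense where an overflow is rare and must not go unnoticed.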
For some reason, though I seem to grok most of WPF, data binding often eludes me and I have to refer to sources like the cheat sheet to figure out what I’m doing.
I’ve debated before the usefulness of the | and & operators. The use of the | operator depends on the context in which it is used.
In the case of bools a boolean operation is performed. In the case of ints a bitwise OR is performed.
For boolean operations when the | operator is used both sides of the operator are evaluated. In the example below both VerifyStatus and SetDefaults are called regardless of the boolean returned by each of them.
public void TestTransaction(PaymentTransaction Transaction)
{
if (Transaction != null &&
(VerifyStatus(Transaction.State) | SetDefaults(Transaction)))
{
//do something with the code
}
}
private bool VerifyStatus(ItemState State)
{
if (State == null)
return false;
if (State.Status == Status.NotVerified)
ValidateStatus(State);
if (State.Status == Status.Valid)
return true;
//Status.Invalid
return false;
}
private void ValidateStatus(ItemState State)
{
//validate the status
}
private bool SetDefaults(PaymentTransaction Transaction)
{
if (Transaction == null)
return false;
//no need to set the defaults if the state is already valid
if (Transaction.State.Status == Status.Valid)
return true;
//set all the default values for the transaction
return true;
}
Ok, I know it’s not a pillar of phenomenal code. But you get the point: both sides of the | are run. Now, what if you’ve already evaluated the status earlier in code? No need to do it again. But you may still need to set the defaults. Well, that is where the |= comes into play.
bool transStatus = VerifyStatus(Transaction.State);
//do something with the code
transStatus |= SetDefaults(Transaction);
if(transStatus)
{
//do something with the code
}
In the case of an int a bitwise operation is performed.
int a = 0x0c;
a = a | 0x06;
//a results in 0x0000000e
//1100
//0110 results in
//1110 which is e
Like the logical operation example above the | and assignment can be collapsed down to:
int a = 0x0c;
a |= 0x06;
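As a quick check of the arithmetic, this small sketch switches bits on with |= and prints the result in hex and binary. BitwiseOrDemo and SetBits are made-up names.

```csharp
using System;

public static class BitwiseOrDemo
{
    // Returns value with every bit of mask switched on.
    public static int SetBits(int value, int mask)
    {
        value |= mask;
        return value;
    }

    public static void Main()
    {
        int a = SetBits(0x0c, 0x06);
        Console.WriteLine("0x{0:x}", a);           // 0xe
        Console.WriteLine(Convert.ToString(a, 2)); // 1110
    }
}
```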
I think it’s very important to understand the difference between | and || as well as & and &&. Recently I came across a bunch of code where the original developer used | and || interchangeably. They are not the same. Using | in place of || overrides the short-circuit (fail fast) nature of boolean operations that most modern languages support. If Transaction were null, a condition such as Transaction != null & VerifyStatus(Transaction.State) would not work, simply because both sides of the operation are evaluated, so Transaction.State would throw a NullReferenceException. Not only that, but your code will run slower.
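The difference is easy to see by counting how often each side runs. This is a minimal sketch; ShortCircuitDemo and its members are made up for illustration.

```csharp
using System;

public static class ShortCircuitDemo
{
    public static int Calls;

    // Records that it was called, then returns the given result.
    private static bool Check(bool result)
    {
        Calls++;
        return result;
    }

    // With | both operands always run.
    public static int CallsWithSingleBar()
    {
        Calls = 0;
        bool result = Check(true) | Check(true);
        Console.WriteLine("| result: {0}, calls: {1}", result, Calls);
        return Calls;
    }

    // With || the right side is skipped once the left side is true.
    public static int CallsWithDoubleBar()
    {
        Calls = 0;
        bool result = Check(true) || Check(true);
        Console.WriteLine("|| result: {0}, calls: {1}", result, Calls);
        return Calls;
    }

    // && never touches text.Length when text is null; & would evaluate it and throw.
    public static bool HasText(string text)
    {
        return text != null && text.Length > 0;
    }

    public static void Main()
    {
        CallsWithSingleBar();               // two calls
        CallsWithDoubleBar();               // one call
        Console.WriteLine(HasText(null));   // False
    }
}
```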
Next in this series will be checked and unchecked.
We all know O(n) = O(2n), right? I mean, we’re talking about how algorithms scale here. Linear is linear. I add 100 more elements to my dataset and the time to finish whatever I was doing scales linearly.
Now that’s all fine and good but think a bit when working with code. Just because O(n) = O(2n) doesn’t mean that you should just throw loops around willy-nilly when with a bit of smarter design you can do O(n) instead of O(2n).
In the real world, when iterating over datasets (especially potentially large datasets, in my case), there is a big difference between O(n) and O(2n).
For example:
foreach(MyObjectType myObject in myList)
{
//do something
}
foreach(MyObjectType myObject in myList)
{
//do something
}
I know this may be a bit of an oversimplification but lately I’ve been looking at a lot of code that basically works out to the above. It’s really more like:
MyList stuffToProcessNext = new MyList();
foreach(MyObjectType myObject in myList)
{
//do a lot of stuff here with a bunch more loops
//rather than write some recursion to another method to handle children
//and siblings just add to stuffToProcessNext
stuffToProcessNext.Add(myObject);
}
//now that MyObjects are processed and I don't have to worry about the
//siblings I can finish processing each of the children to MyObjects
foreach(MyObjectType myObject in myList)
{
//do a lot of stuff here with a bunch more loops
foreach(MyObjectType childObject in myObject.Children)
{
//do something relating to the siblings of myObject
}
}
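As a deliberately tiny illustration of the idea (not the code I was looking at), here is the same work done in two passes and then in one. SinglePassDemo and its methods are made-up names, and the sum/max task is just a stand-in for “do something”.

```csharp
using System;
using System.Collections.Generic;

public static class SinglePassDemo
{
    // Two loops over the same list: still linear, but the list is walked twice.
    public static int[] SumAndMaxTwoPasses(List<int> values)
    {
        int sum = 0;
        foreach (int v in values) sum += v;

        int max = int.MinValue;
        foreach (int v in values)
            if (v > max) max = v;

        return new[] { sum, max };
    }

    // One loop doing both jobs: same result, half the iteration.
    public static int[] SumAndMaxOnePass(List<int> values)
    {
        int sum = 0;
        int max = int.MinValue;
        foreach (int v in values)
        {
            sum += v;
            if (v > max) max = v;
        }
        return new[] { sum, max };
    }

    public static void Main()
    {
        var values = new List<int> { 3, 9, 4 };
        Console.WriteLine(string.Join(",", SumAndMaxTwoPasses(values))); // 16,9
        Console.WriteLine(string.Join(",", SumAndMaxOnePass(values)));   // 16,9
    }
}
```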
But don’t just look at your own code, understand how the underlying framework code works.
For example:
//assume myList is a List
//and myObject is MyObjectType
if(myList.Contains(myObject))
myList.Remove(myObject);
No need to call Contains and force the framework to iterate over the list twice. Just remove it. No worries if it isn’t there.
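List<T>.Remove returns a bool telling you whether anything was removed, so you still get the information Contains would have given you. A minimal sketch (RemoveDemo and RemoveIfPresent are made-up names):

```csharp
using System;
using System.Collections.Generic;

public static class RemoveDemo
{
    // One call: Remove returns true if the item was found and removed, false otherwise.
    public static bool RemoveIfPresent<T>(List<T> list, T item)
    {
        return list.Remove(item);
    }

    public static void Main()
    {
        var myList = new List<string> { "a", "b", "c" };
        Console.WriteLine(RemoveIfPresent(myList, "b")); // True
        Console.WriteLine(RemoveIfPresent(myList, "z")); // False, and no exception
        Console.WriteLine(myList.Count);                 // 2
    }
}
```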
On a side note, now that the holidays have calmed down I’ve been working on a series of “Often Unused Operators” like ~ and |=. Credit goes to Jeff Clark for turning me onto this. Hopefully I’ll have the first one up on Friday.
It seems odd to me that I’ve never spoken on extension methods considering the broad range of subjects I’ve covered. So here I am writing on them.
So what are extension methods? They are methods you can tack on to existing types to extend their functionality without having to inherit from them.
“Brian, if you want new functionality to an object why not just inherit the object and write the methods yourself?”
“Well Ivan, what if you don’t want to? I don’t mean to be flippant but if I have string and want to add a ‘ValidateEmail’ method should I create a new class that inherits from string and then provide the ‘ValidateEmail’ method?”
“No, you simply create a static method that takes a string and returns a boolean.”
“But Ivan, that is what extension methods do while providing an interface via the VS IDE. It’s cleaner and easier to use.”
“Oh, well, um, nothing to see here”
(wow, I need to work on some better inner dialogue )
The reason I titled this post ‘Extension Methods Practically Applied’ is that I wanted to put out there how I tend to use extension methods. First of all, when working on a solution that spans multiple projects we tend to have a project named “Common”. This contains custom controls used in different projects of the solution, images and other WPF resources used in different projects, and finally extension methods. Each of these lives in its own directory. If they don’t, things tend to get awfully messy.
So let’s throw in some code for two different sets of extension methods:
namespace ProjectName.Common
{
public static class DoubleExtensions
{
public static string NullToEmpty(this double? d)
{
return d == null ? "" : d.ToString();
}
public static double NullToZero(this double? d)
{
return d == null ? 0 : (double)d;
}
}
}
and
namespace ProjectName.Common
{
public static class StringExtensions
{
public static string NullToEmpty(this string s)
{
return s == null ? string.Empty : s;
}
public static string EmptyToNull(this string s)
{
return (string.IsNullOrEmpty(s)) ? null : s;
}
public static double? ToNullableDouble(this string s)
{
try { return double.Parse(s); }
catch { return null; }
}
}
}
First, extension methods have to be in a static class. Second, I would highly recommend naming your extension method class “ObjectExtensions” where “Object” is the type the extension methods are for. When other programmers come along it’s nice if they know where to put their own extension methods. Hopefully this will also help cut down on redundancy. Third is the “this” keyword on the first parameter. That is what tells the compiler which type the extension method is for.
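As a variation on ToNullableDouble, here is a sketch that avoids the try/catch by using double.TryParse. The names StringParsingExtensions and TryToNullableDouble are made up, and parsing with the invariant culture is my assumption so the result doesn’t depend on the machine’s regional settings (the version above uses the current culture).

```csharp
using System;
using System.Globalization;

public static class StringParsingExtensions
{
    // Hypothetical variant of ToNullableDouble: returns null when the text isn't a number.
    // "this string s" is what makes it callable as s.TryToNullableDouble().
    public static double? TryToNullableDouble(this string s)
    {
        double value;
        bool parsed = double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out value);
        return parsed ? value : (double?)null;
    }
}

public static class StringParsingDemo
{
    public static void Main()
    {
        Console.WriteLine("1.5".TryToNullableDouble().HasValue);   // True
        Console.WriteLine("tall".TryToNullableDouble().HasValue);  // False

        // Extension methods are static calls underneath, so a null receiver is fine here.
        string missing = null;
        Console.WriteLine(missing.TryToNullableDouble().HasValue); // False
    }
}
```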
By far the most common usage for me of extension methods is within forms. The DAO generator we use is .netTiers. For nullable fields, like Height for a person in the example I’m about to show, these are presented as nullable primitives. When working with forms to show data that presents a problem. The following code does not work:
public void LoadForm(Person TargetFriendly)
{
this.txtPersonHeight.Text = TargetFriendly.Height;
}
public void SaveForm(Person TargetFriendly)
{
TargetFriendly.Height = this.txtPersonHeight.Text;
}
and gives me the errors “Cannot implicitly convert type ‘double?’ to ‘string’” and “Cannot implicitly convert type ‘string’ to ‘double?’”.
This is where the extension methods come in handy. At the top of our form for loading and saving person info we add ‘using ProjectName.Common’. Then we can do:
public void LoadForm(Person TargetFriendly)
{
this.txtPersonHeight.Text = TargetFriendly.Height.NullToEmpty();
}
public void SaveForm(Person TargetFriendly)
{
TargetFriendly.Height = this.txtPersonHeight.Text.ToNullableDouble();
}
Another great thing about this is that IntelliSense fully supports extension methods. My ‘ToNullableDouble’ shows up in the completion list with the blue down arrow. There is also a tool tip that pops up and says, “(extension) double? string.ToNullableDouble()”
I would encourage you to use extension methods as they can make life a lot easier. Just make sure you put them where they can easily be found and name them appropriately.
Hopefully we all know string is immutable. When you type:
string x = "";
x = "asdf";
what you are really doing is creating two different strings. You are not changing the value of the first string; you are pointing x at a new one.
That is why for heavy string operations it is recommended that you use StringBuilder, like:
StringBuilder sbName = new StringBuilder();
sbName.Append("asdf");
since appending to sbName doesn’t create a new string but continues to add to the string value of sbName.
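A small sketch makes the difference visible: a second variable keeps seeing the old string after the “change”, while StringBuilder keeps adding to one buffer. StringImmutabilityDemo and JoinWithBuilder are made-up names.

```csharp
using System;
using System.Text;

public static class StringImmutabilityDemo
{
    // Appends every part to a single StringBuilder and creates one string at the end.
    public static string JoinWithBuilder(string[] parts)
    {
        var sb = new StringBuilder();
        foreach (string part in parts)
            sb.Append(part);
        return sb.ToString();
    }

    public static void Main()
    {
        string x = "asdf";
        string y = x;  // y refers to the same string object as x
        x = x + "!";   // builds a new string; the original is untouched

        Console.WriteLine(y); // asdf
        Console.WriteLine(x); // asdf!
        Console.WriteLine(JoinWithBuilder(new[] { "a", "s", "d", "f" })); // asdf
    }
}
```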
So why do you care? Well, maybe you don’t. I’ve started to move into a more multi-threaded world where the state of objects and their properties could potentially be invalid if an external thread changes the value of a property. I’ve been reading Effective C# and More Effective C# and enjoying them. They are books that target specific development issues and paradigms that help you become a better C# developer. The books are written by Bill Wagner, who is a regular blogger I follow.
Even though Effective C# came out in the .NET 2.0 days, a lot of the book is still relevant; quite a few of the code samples could be updated, however.
Item 7 in Effective C# says “Prefer Immutable Atomic Value Types”. Now to paraphrase the chapter to an extreme it basically says, “Immutable code is easier to maintain”. I would add an addendum that whether you like it or not there is a good chance that code you write will be used in a multi-threaded environment and immutable will matter.
Now obviously to blindly say all classes must be like this is absurd. As Bill’s Item 7 says, however, “Prefer Immutable Atomic Value Types”.
In C# 3.0 auto-implemented properties made things a bit easier in general. Prior to 3.0, for a property you would have to do:
private int _myInt;
public int MyInt
{
get { return _myInt; }
set { _myInt = value; }
}
In C# 3.0 you don’t have to define the private field. When the code is compiled, the compiler takes care of that for you. So the above code becomes:
public int MyInt { get; set; }
Now say you have the class:
public class Customer
{
public string FirstName { get; set; }
public string LastName { get; set; }
public Address Address { get; set; }
public Customer(){}
}
If you were working in a multi-threaded environment you might have a problem if this happens:
public void HandleCustomerData(Customer MyCustomer)
{
MyCustomer.FirstName = "George";
//use myCustomer to do a bunch of stuff
MyCustomer.FirstName = "Joe";
//use myCustomer to do a bunch of stuff
}
This could cause a big problem if MyCustomer is getting passed around a lot in different threads. Now I know this is rather contrived but it is still a real issue.
Properties have an additional feature here in 3.0 that makes creating an immutable class easier. Imagine you have the class:
public class Customer
{
public string FirstName { get; private set; }
public string LastName { get; private set; }
public Address Address { get; private set; }
public Customer(string FirstName, string LastName, Address Address)
{
this.FirstName = FirstName;
this.LastName = LastName;
this.Address = Address;
}
}
You can see here that I’m using the private keyword on the set accessor. As it implies, this makes the set accessors of the properties private so the values can only be changed from inside the class. I would recommend reading “Effective C#” for a better explanation of why to use immutable values, but using private in a property makes this easier.
So there it is, prefer immutable atomic values in your classes and use the private keyword to help you.
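One pattern that goes well with this, sketched here under my own assumptions rather than taken from the book, is to hand out a changed copy instead of mutating the object. ImmutableCustomer and WithFirstName are made-up names, and Address is left out to keep the example self-contained.

```csharp
using System;

public class ImmutableCustomer
{
    public string FirstName { get; private set; }
    public string LastName { get; private set; }

    public ImmutableCustomer(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }

    // Hypothetical helper: returns a changed copy instead of mutating this instance.
    public ImmutableCustomer WithFirstName(string firstName)
    {
        return new ImmutableCustomer(firstName, LastName);
    }
}

public static class ImmutableCustomerDemo
{
    public static void Main()
    {
        var george = new ImmutableCustomer("George", "Smith");
        var joe = george.WithFirstName("Joe");

        // Anything still holding a reference to george sees the same values as before.
        Console.WriteLine(george.FirstName); // George
        Console.WriteLine(joe.FirstName);    // Joe
    }
}
```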
Later,
Brian
Update:
The more I read this the more I think I over-simplified Bill’s reason for preferring immutable atomic value types. Just get the book and read it. It’s a fairly small book but every nugget has value.
you can iterate over the whole 3D array here in a single foreach:
foreach (int i in myInts)
{
Debug.Write(i.ToString());
}
The resulting output is: 012345012345
Later,
Brian
UPDATE:
If you want to use foreach to access arrays of ints in a multidimensional array you have to declare the array as if you were declaring a jagged array. The flaw with this is that you will be unable to use foreach to access all elements of the array directly; instead you must iterate over the inner arrays to reach the elements. Basically int[][] != int[,], as the first one is an array of arrays and the second one is a multidimensional array. I suppose it makes sense if you think of arrays in C# terms as objects instead of in C/C++ terms as memory locations.
int[][] myInts = new int[][]{ new int[]{ 0, 1, 2 }, new int[]{ 3, 4, 5, 6 } };
foreach(int[] childArray in myInts)
{
foreach(int child in childArray)
{
Debug.Write(child.ToString());
}
}
//no longer works
//foreach (int i in myInts)
//{
// Debug.Write(i.ToString());
//}
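To put the two side by side, here is a minimal sketch with one overload per array kind. ArrayIterationDemo and Flatten are made-up names, and the sample arrays are my own.

```csharp
using System;
using System.Text;

public static class ArrayIterationDemo
{
    // A multidimensional array can be walked with a single foreach, row by row.
    public static string Flatten(int[,] grid)
    {
        var sb = new StringBuilder();
        foreach (int value in grid)
            sb.Append(value);
        return sb.ToString();
    }

    // A jagged array is an array of arrays, so it needs one foreach per level.
    public static string Flatten(int[][] jagged)
    {
        var sb = new StringBuilder();
        foreach (int[] row in jagged)
            foreach (int value in row)
                sb.Append(value);
        return sb.ToString();
    }

    public static void Main()
    {
        int[,] grid = { { 0, 1, 2 }, { 3, 4, 5 } };
        int[][] jagged = { new[] { 0, 1, 2 }, new[] { 3, 4, 5, 6 } };

        Console.WriteLine(Flatten(grid));   // 012345
        Console.WriteLine(Flatten(jagged)); // 0123456
    }
}
```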