Tech Vice

Ramblings of a C# Engineer

Windows Azure Best Practices

Windows Azure best practices can be found all over. I want to collect the same great points that have been made in other blogs in one place that I can find quickly (a personal convenience, really). And one more page describing these practices never hurts the odds of an engineer stumbling upon them.

When interacting with blob and table storage, there are several considerations to take into account.
  1. The default connection limit is incredibly low. It has a default value of two (2), which means only 2 concurrent connections can be open to any given host.
  2. The Nagle algorithm helps for messages larger than about 1.4 kilobytes but is detrimental for the smaller messages typical of table storage operations.
  3. Another recommendation is to disable the Expect: 100-Continue header for requests you expect to succeed.
  4. Increase the ThreadPool's minimum number of threads if you are mixing synchronous code with async Tasks.
Include this set of code when initializing your connection to Azure (each line corresponds to the numbered item above):

ServicePointManager.DefaultConnectionLimit = 100;
ServicePointManager.UseNagleAlgorithm = false;
ServicePointManager.Expect100Continue = false;
ThreadPool.SetMinThreads(100, 100); // (Determine the right number for your application)
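
One caveat worth calling out: ServicePointManager settings only affect connections opened after they are set, so this code needs to run before you create any storage clients. As a minimal sketch, assuming a classic cloud service worker role, it might live in the role's OnStart:

using System.Net;
using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WorkerRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // Apply these before any storage client opens a connection.
        ServicePointManager.DefaultConnectionLimit = 100;
        ServicePointManager.UseNagleAlgorithm = false;
        ServicePointManager.Expect100Continue = false;
        ThreadPool.SetMinThreads(100, 100); // tune for your workload

        return base.OnStart();
    }
}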

These are a good bare minimum for configuring your application to communicate with the Windows Azure platform. Microsoft keeps a great document that details this information and much more, and if you are working with the Azure platform, it is hugely beneficial to check it out. It has become a key document for me to review any time I set up a new project on Azure. Because it is maintained by Microsoft, it should stay up to date with the latest information; no more wondering whether a blog post from years ago still applies.

Project Rider: A C# IDE

On January 13, JetBrains announced a new IDE for the C# language, code-named Project Rider. As an exclusive Visual Studio user, I wondered what on earth I would need a new IDE for. It took me all of 5 minutes for it to click.

What it comes down to is choice. There have been a few attempts over the years to create an IDE either directly for C# or that can handle C#, but nothing has ever really panned out. While this project is just in its infancy, we can expect it to become a great tool for us in the coming year. It won't be able to match the powerhouse that is Visual Studio. However, many developers don't need a powerhouse; they need to be swift and nimble. Have you tried loading up multiple copies of Visual Studio without a sizable chunk of RAM? It usually isn't a pretty sight.

When the makers of ReSharper, one of the most coveted C# tools available, create their own IDE, I would hope that all C# engineers take notice, even if you decide the tool set isn't ready for you to use yet. It could be a tool that you one day prefer over Visual Studio. I plan to keep a keen eye on this project for years to come.

Chasing down poor Azure Table Storage performance

A coworker and I have been puzzled for the last week or so. We have been seeing incredibly slow table storage performance. Here is one problem that we faced:

public class OurEntity : TableEntity
{
    public string PrimaryId
    {
        get { return PartitionKey; }
        set { PartitionKey = value; }
    }

    public string SecondaryId
    {
        get { return RowKey; }
        set { RowKey = value; }
    }

    ...
}

We created a class that inherited from TableEntity and gave our partition and row keys more descriptive names. We weren't aware of any implications of taking this approach.

We then went on to use these properties in our IQueryable searches:

_entityRepository.FindAsync(x => x.PrimaryId == "ThatOneId" && x.SecondaryId == "ThatOtherId");

And boy were we surprised at the results. We loaded data into a test environment and fired off our integration tests. You can imagine the look on my face when I saw that one query took 20 minutes to complete (¯\_(°_°)_/¯ seems about right). My coworker realized, while looking through table storage, that the properties which were only supposed to alias PartitionKey and RowKey were actually being stored in table storage as their own columns.

So if you find yourself creating new properties to give PartitionKey and RowKey more descriptive names, make sure you mark them with [IgnoreProperty] and query against the underlying keys:

public class OurEntity : TableEntity
{
    [IgnoreProperty]
    public string PrimaryId
    {
        get { return PartitionKey; }
        set { PartitionKey = value; }
    }

    [IgnoreProperty]
    public string SecondaryId
    {
        get { return RowKey; }
        set { RowKey = value; }
    }

    ...
}

_entityRepository.FindAsync(x => x.PartitionKey == "ThatOneId" && x.RowKey == "ThatOtherId");

Trust me, it will save you plenty of headaches.

Update:  

Okay, but why? 

I didn't get a chance to dig into what was happening when I originally put this post up. My coworker came across this problem while reviewing the data in table storage from Visual Studio. If you head to Server Explorer and connect to your Azure Table Storage, you can view the rows currently in a table. While reviewing this data, he noticed two columns called "PrimaryId" and "SecondaryId." At first glance, it didn't seem like much of an issue. However, as we began chasing down the slowness, it dawned on him that this might be the root of our problems.

Having the data duplicated in table storage isn't itself a problem. The problem emerges when you make queries believing you are querying the PartitionKey. Our queries were against these PrimaryId properties, which means the service wasn't using the PartitionKey index at all. That's right: full table scans all around!
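
To make the difference concrete, here is roughly what the two queries boil down to at the filter level, expressed with the storage library's filter helpers (a sketch; the values come from the example above):

// Requires Microsoft.WindowsAzure.Storage.Table.
// Filtering on the duplicated properties: ordinary columns, so the
// service must scan the whole table to find matches.
var slowFilter = TableQuery.CombineFilters(
    TableQuery.GenerateFilterCondition("PrimaryId", QueryComparisons.Equal, "ThatOneId"),
    TableOperators.And,
    TableQuery.GenerateFilterCondition("SecondaryId", QueryComparisons.Equal, "ThatOtherId"));

// Filtering on the real keys: answered straight from the table's index.
var fastFilter = TableQuery.CombineFilters(
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "ThatOneId"),
    TableOperators.And,
    TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, "ThatOtherId"));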

I don't anticipate many individuals running into this problem, but it caused us enough pain that I had to document it.

Coding Remotely

I've never worked as a remote employee. I don't know what it means to work thousands of miles from your peers. But I have been thinking more about this in my daily work. My wife and I are planning a move to Arizona in 2016, so I'm keeping all possibilities open. These are the potentials as I see them:

  • Stay on with a Utah based company, working remote from my home.
  • Find a company that is very open to remote employees and work from home.
  • Find a company in the Phoenix area and work at headquarters full-time.
  • Find a company in the Phoenix area and spend 3-4 days working at headquarters and 1-2 days working from home.
As a service to myself, I've recorded this post so I can update it and keep track of my thoughts. If anyone does read this post, I wouldn't mind feedback on your own experiences, or the experiences of others as you have seen or heard them.

The Good

Minimized Distractions
Or better distractions?

Even though I have a young child at home, the number of distractions I get at the office is greater than what I would get at home. Especially in an open environment, I have to put on headphones on an hourly basis just to drown out the noise of others.

Even the small breaks my daughter would give me mean more chances to interact with her during the day. Sometimes a few hours in the mornings and evenings just aren't enough.

Enjoying Life
Life still occurs from 9-5

The ability to stop working for just 30 minutes when my child gets home from school would be important to me. Taking time for my child doesn't detract from my overall productivity, and it allows me to have an important impact on them. It's hard to put a price on being involved in your own life at any point during the day.

No Commute
Does rolling out of bed count?

Although I enjoy a good Sunday drive, rush hour is frustrating. It's not such a burden that it makes me want to live within walking distance of work, but in the past, when I had to commute an hour to and from work, I didn't come home in as good a mood as I do today with a 10 minute commute.

Work for an Amazing Company
Startup or corporate, all opportunities are available.

We can't all live in Palo Alto, and I know my wife would rather not pay Silicon Valley rent prices. My family is looking to settle down, but that may not be in the same location as a company I'd like to work for. As long as a company is open to remote employees, it's a win-win: I get to work for a company I'm passionate about, and they get the talent they are searching for.

Saving Money
Your desktop is already going to be on. Might as well use it.

Surprise! Not driving 30-60 minutes each way is going to save you money: gas and vehicle maintenance. Plus, the company saves money on the space where you would otherwise be working.

The Not So Good

Personal Relationships
"Want to go grab a beer? …Anyone?"

It can be more challenging to shoot the breeze while waiting for a compile or pulling down the latest changes. Video and group chats can help relieve that. However, if you are the only remote engineer and your team has a celebratory lunch, you get to celebrate from the comfort of your office chair. Bringing the team together for regular or annual retreats can help build these relationships.

Aversion to Change
Going against the grain can be a challenge

This point is written specifically with my wife in mind. Whenever I have brought up the topic of working remotely, she gets a look of uncertainty on her face. She trusts me to make the right decisions and will support me; however, working remotely seems to cause her immediate concern for my job security. Perhaps you have a spouse or partner with similar concerns. If you decide to work remotely, you might find you have a battle in convincing those who are impacted.

Learning to Work Remote
It's not a skill you have on day one.

From the time we were young until today, we have been raised in communal settings. That may not have been the best situation for each of us individually, but it is a familiar one. I've worked occasionally from home, but I don't consider it the same thing. In those cases, I had a need to stay at home but felt I could still be a productive team member. If I were offered a full-time remote position today, I would be excited at the prospect, but I would also recognize the risk involved in accepting it.

Reading Material

Alex Turnbull's blog is one that I came across in my research on working remotely. It has become a new favorite of mine on the subject because he writes from his own experience. When he started GrooveHQ, he decided to hire remote employees because his local market was lacking. He has been documenting the company's progress, and many of those posts include challenges or thoughts about working remotely. One of the quotes I've liked best so far is:
"Teams succeed because of culture, principles and vision, and the habits you build around all three of those factors."
Alex makes some great points in his blog post and I recommend you check it out.

Thoughts

A new habit I've been working on over the last two months is tracking my time. Tracking productivity on my desktop computer has been great. Before, I would find myself at the end of the day trying to remember the work I had accomplished. Now I know what I have done and how long I spent in front of the computer doing it. I recommend checking out RescueTime for tracking. Tracking application and web use helps me stay focused enough to hit the goals I set for myself.

Overall, I think that working remotely will continue to increase in popularity among small businesses and teams. The market is shifting, and it will be interesting to watch the growth of work-from-home careers.

Using Azure Table Storage and IQueryable

A recent task had me working with Microsoft's Azure Table Storage. We realized we needed some filtering of the data being shown from table storage, and I was the back end engineer assigned to review what could be done on the server side to provide a better way to filter. What we had was code that was not flexible enough for searching. It looked something like this:

public class AzureTableStorageRepository<T> : ITableStorageRepository<T> 
    where T : TableEntity, ITableStorageEntity, new()
{
    …

    public async Task<IEnumerable<T>> FindAsync(string partitionKey, 
        CancellationToken ct = default(CancellationToken), IList<string> projection = null)
    {
        var query = new TableQuery<T>()
            .Where(TableQuery.GenerateFilterCondition(
                "PartitionKey", QueryComparisons.Equal, partitionKey));

        if (projection != null)
        {
            query = query.Select(projection);
        }

        var items = await ExecuteSegmentedQuery(ct, query);

        return items;
    }

    async Task<List<T>> ExecuteSegmentedQuery(CancellationToken ct, TableQuery<T> query)
    {
        TableContinuationToken token = null;
        var items = new List<T>();
        do
        {
            var segment = await _cloudTable.ExecuteQuerySegmentedAsync(query, token, ct);
            token = segment.ContinuationToken;

            items.AddRange(segment.Results);
        } while (token != null && !ct.IsCancellationRequested);
        return items;
    }
}

The code doesn't provide any real searching on the server side. The consumer is always forced to pass in a partition key, and any real filtering has to be done in memory once the results are returned.

It's not much of a find method. The only real benefit here is that it makes use of ExecuteQuerySegmentedAsync to page through results. That matters because table storage returns at most 1,000 entities at a time; beyond that you get a continuation token and must issue another request for the next page. Chances are, if you are using table storage, you are planning on having many more than 1,000 rows per partition.

My first inclination was just to continue expanding on the filter language that the Table Service API supports. That would mean either the consumers of my code passing up a string tailored to their requirements, or me supporting some sort of builder for those strings. And that sounded nasty. What other options were available?

I did some digging around and came across this announcement from September 2013: Announcing Storage Client Library 2.1 RTM & CTP for Windows Phone. It is for Windows Phone specifically, but it detailed what I was looking for. About halfway down you will find "IQueryable Mode (2.1+)" and a table comparing Fluent Mode and IQueryable Mode.

We were using the Fluent Mode in our solution for working with Table Storage, and this shows exactly how to transform our queries. Perfect. Some quick changes and I'd be on my way. Now my FindAsync method looks like this:

public async Task<IList<TResult>> FindAsync<TResult>(Expression<Func<T, bool>> predicate,
    Expression<Func<T, TResult>> selector, CancellationToken cancellationToken = default(CancellationToken))
{
    Ensure.ArgumentNotNull(predicate, "predicate");
    Ensure.ArgumentNotNull(selector, "selector");

    var query = _cloudTable.CreateQuery<T>().Where(predicate).Select(selector).AsTableQuery();

    var items = await ExecuteSegmentedQuery(cancellationToken, query);

    return items;
}
 
private async Task<List<TResult>> ExecuteSegmentedQuery<TResult>(CancellationToken ct, TableQuery<TResult> query)
{
    TableContinuationToken token = null;
    var items = new List<TResult>();
    do
    {
        var segment = await query.ExecuteSegmentedAsync(token, ct);
        token = segment.ContinuationToken;

        items.AddRange(segment.Results);
    } while (token != null && !ct.IsCancellationRequested);
    return items;
}
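
For completeness, here is a hypothetical call site, projecting matching rows down to just their row keys (the key values are placeholders):

// Hypothetical usage: return the RowKey of every entity in one partition.
var rowKeys = await _entityRepository.FindAsync(
    x => x.PartitionKey == "ThatOneId",
    x => x.RowKey);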

I put together an integration test that passed with flying colors. Things were looking great.

Then I tried to pass a predicate that performed a Contains on a list of strings, and that's where the trouble started. Query Operators Supported for the Table Service shows which LINQ methods are available against the Table Service. To my disappointment, only 6 LINQ query operators are supported at this time: From, Where, Take, First, FirstOrDefault, and Select. Many are not supported; just a few to give you a hint: Contains, Count, OrderBy, Single, and Distinct.

The Table Service supports only very basic comparisons, so this makes sense, but my LINQ-focused thought process was disappointed. It can be overcome, though: just spell out each item in the list as its own comparison in the predicate, as in the sketch below. Once I threw that together, I was finished.
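
As a minimal sketch of that workaround, here is one hypothetical way to expand a list of row keys into a chain of OR-ed equality comparisons that the table service can translate (the helper name and shape are my own, not from our code base):

using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using Microsoft.WindowsAzure.Storage.Table;

static class PredicateBuilder
{
    // Builds x => x.RowKey == key1 || x.RowKey == key2 || ... ,
    // using only the equality comparisons the table service supports.
    public static Expression<Func<T, bool>> RowKeyIn<T>(IEnumerable<string> rowKeys)
        where T : TableEntity
    {
        var param = Expression.Parameter(typeof(T), "x");
        var rowKey = Expression.Property(param, "RowKey");

        Expression body = null;
        foreach (var key in rowKeys)
        {
            var equals = Expression.Equal(rowKey, Expression.Constant(key));
            body = body == null ? equals : Expression.OrElse(body, equals);
        }

        // An empty list matches nothing.
        return Expression.Lambda<Func<T, bool>>(
            body ?? Expression.Constant(false), param);
    }
}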

Takeaway: This was a great exercise for me. It gave me a better understanding of Table Storage, and I walked away with more experience than I had before. However, you really should be aware of some gotchas.

Used incorrectly, this flexibility makes it easy to use Table Storage wrong.

Although this is totally within the bounds of what is available to you, it's best to query by partition key and row key; that will always give the best performance. The next best query uses just a partition key. If the predicate you are passing filters only on the partition key and row key, you have nothing to worry about. If you start filtering on the dynamic data you are storing (read: any property other than the partition key and row key), you may find performance hindered more than you anticipated.
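
To illustrate those tiers with the FindAsync from earlier (a sketch; the key values are placeholders):

// Fastest: a point query on both keys.
var one = await _entityRepository.FindAsync(
    x => x.PartitionKey == "pk" && x.RowKey == "rk", x => x.RowKey);

// Next best: a partition scan.
var partition = await _entityRepository.FindAsync(
    x => x.PartitionKey == "pk", x => x.RowKey);

// Worst: filtering only on regular properties forces a full table scan.
var cutoff = DateTimeOffset.UtcNow.AddDays(-1);
var scan = await _entityRepository.FindAsync(
    x => x.Timestamp > cutoff, x => x.RowKey);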

For our purposes, we will be querying against a single partition. That may slow down the response some, but we anticipate that. However, if we are querying a single partition today, we have opened the door to querying the whole table tomorrow. That could bring on a slew of headaches.
"In Windows Azure Tables, the string PartitionKey and RowKey properties work together as an index for your table. So when using Partition and Row Keys, the storage will use its index to find results really fast, while when using other entity properties the storage will result in table scan, significantly reducing performance."

Use your best judgement when implementing your own solutions. A better understanding today will help you make good decisions for tomorrow.