A month ago or so, I was at a client doing some work involving an application that required a fast way to JOIN two SharePoint lists. Let me explain.
Sometimes you have a need to store data in 2 or more lists because the data in list ‘A’ represents real world entities that are unique instances, while data in lists ‘B’, ‘C’, etc hold data that is non-unique and reusable by its nature and for our purposes also possesses a 1-to-Many relationship to the data in list ‘A’.
Relational Database Applications 101.
Of course in some cases a data-driven ASP.NET application might be the best course to take; but that might not be an option. So in a SharePoint world, we will use lists and have to find a way to make the relations between lists work in our code in an efficient way.
It is very easy to take a simple approach that leads to performance/scale issues in applications of this type because the WSS Object Model does such a nice job of putting SP objects in collections, and iterating over these collections seems like a natural extension. Except that iterating through list item collections, especially when nested, is quite costly.
In the interest of stating the obvious, of course you will reduce the number of list items you need to iterate over with the appropriate CAML query filtering so that when you form the SPListItemCollection from a list.GetItems(query), it is already minimal in size.
However, in the context of this post, imagine nesting foreach loops to iterate over our ‘A’ list, while also iterating over an inner collection of items from our 1-to-Many ‘B’ list. You will start to get close to an N squared situation in terms of Big O notation and computational complexity.
Hence the need for a relational solution with a JOIN.
The solution is rather simple with WSS 3.0, and there are at least 2 versions of an approach you can take that involve placing your SPListItems in DataTable objects as part of a DataSet object.
For reduced footprint and added speed, I like defining the DataTable explicitly in code using just the DataColumns (list columns) I need. This reduces the footprint used over this approach:
DataTable table = list.GetItems(query).GetDataTable();
which returns the items with all column values present, and uses more memory.
Quite a few folks favor the minimalist code approach, and I am no different, but when it matters, performance should be paramount over a few extra lines of code. I tend to make fun of ASP.NET programmers for this same reason, as they tend to fall in love with the “look what I can do with 1 line of code” mantra.
Now you need to define a DataRelation object to represent what you are trying to accomplish with a JOIN and add that to your DataSet. This can transform a (slow) nested iteration over multiple sets of SharePoint list items into a single result set formed in a few milliseconds. Then you do your really meaningful operations on this result set now that the data is aligned.
One comment on this approach with respect to server round-trips.
Since you are working in the realm of a disconnected data set, you are in effect getting all the pertinent list items once. In my specific case, the list queries simply needed to run one time, but there is a caching benefit here that can make a big difference in other application situations.
Lastly, the one way to improve on this further is for the SharePoint Development Team to leverage LINQ in WSS 4.0.
That would be way cool.