Tag Archives: data

How to work with large lists in Office 365 SharePoint Online

What’s a large list? Anything over 5,000 items.

Why is this such a low limit?

Great question, and Microsoft does a good job explaining it in their help documents:

To minimize database contention SQL Server, the back-end database for SharePoint, often uses row-level locking as a strategy to ensure accurate updates without adversely impacting other users who are accessing other rows. However, if a read or write database operation, such as a query, causes more than 5,000 rows to be locked at once, then it’s more efficient for SQL Server to temporarily lock the entire table until the database operation is completed.

When the whole table is locked, it prevents other users from accessing the table. If this happens too often, then users will experience a degradation of system performance. Therefore, thresholds and limits are essential to help minimize the impact of resource-intensive database operations and balance the needs of all users.

Do you think you’re going to exceed 5,000 items?

There’s a few things you can do before you exceed 5k, which will make life a lot easier.

Create indexes! Indexes are pockets of focused data that render results much faster (very laymen terms here). Searching for ‘bob’ against a name field is faster if that field is indexed.

Office 365 boasts that it will automatically create indexes, sadly I’ve come across some large lists without indexes, nothing created. Reached out to Office 365 support to create some, I’ll let you know if they actually do (requested a few weeks ago!).

Look for fields that you’ll be using to create views and index those. Think broadly like categories, dates, departments, etc. The ID field IS indexed automatically out of the box. Otherwise, create some more indexes.

I’m over 5,000 items, now what?

In the past, working with SharePoint installed on-premises, you could override the limit and go higher than 5k and keep on truckin. If you wanted to stay within Microsoft’s recommendations, there are opportunities to open up maintenance windows to allow bulk operations on lists larger than 5k.

In Office 365, we don’t have that flexibility. Microsoft actually has a list limit of 30,000,000 (30 million) items. 30M!?! That’s far over their definition of a large list. But the list view limit is 5k (.01%). Weird but I guess I understand why.

So what do we do? We have a few options, depending on your needs.

Users need to find records to update or reference

Create smaller collections of data.

If you’re fortunate enough to get your fields indexed, you’ll still need to keep your views under 5k items. You can have hundreds of thousands of items in your list, but you can’t serve up more than 5k items at a time to your users.

Using your indexed columns, create your filtered views. You’ll notice that your columns will be labeled as (Indexed):

indexed columns in a view

PLEASE NOTE that just because you filter on an indexed field does NOT mean you’ll be under 5k items. Also, you can filter on a non-indexed field IF you include a filter for an indexed field, which by itself would return less than 5k items. For example, consider a list with 10k items:

  • Filter State (Indexed) by ‘Connecticut’, and you get back 6k items. Filtering by City (not-indexed) ‘Hartford’ might return 400 items, but because the indexed filter returns over 5k, you can’t perform additional non-indexed filters against the data.
  • Filter State (Indexed) by ‘Massachusetts’, and you get 4,900 items, then you can filter by City ‘Boston’ and get your subset of 400 items.

The magic of the ID. The ID column is automatically indexed, out of the box, even though it doesn’t show it in filtering. Sometimes if you need to find specific data you can create a view with a range of IDs and get your items. You can also create a view that shows all data, sans filter, but sorted by ID, that will work because that column is indexed.

This is great, but if you don’t have indexes…

Search is awesome.

Search can handle your list, and a gazillion of other records of data. Once the item is indexed, you can search for it pretty easily.

For specific business cases, like a call center for instance, you can create a ‘search application’ in SharePoint to allow call agents to find exactly what they’re looking for, by searching for specific data points like casenumber:1234, or callername:Bob. Some of this functionality would usually be done at the list level using column filters and such, but due to the mass amount of data, search is the answer.

I’ll cover more on creating a search application in my next post, I promise. For now, just try searching for your content, should come back for you.

Users need to report on the data

This is the biggest pain in the butt by far. Managers and teams want to see some sort of reporting, like how many calls in the last week, how many from a certain state, etc. Normally we could fudge this with the out of the box list views, but alas, there’s too much data. Instead we can use additional tools to get this data and report on it.

Microsoft Access is an option, though I don’t like it. If you’re looking for a fast solution, connect your local Access database to SharePoint and suck down that list. You can create reports and even export easily to Excel. This option does require your users to have Access installed to view the data (<– the reason I don’t like it)

Power BI is by far the better option (I know, it all depends on requirements). You can point PowerBI to your huge SharePoint list, it will take time to import it all (I imported 300,000 items in about 45 minutes) but once it’s there, you’re golden. You can even setup a nightly refresh so your data is fresh every day when you come in. Awesome sauce right? See below

Reporting on large lists

Sharing PowerBI reports is easy through the standard Office 365 Share experience. According to Microsoft, PowerBI Pro is only available for E5 licenses, but I’m on an E3 and I can do a ton. E5 does offer more functionality but just pulling in data, creating reports and sharing, is doable on E3.

That’s a wrap! Let me know if I’m missing something, I’d love to hear your war stories. ‘Til next time, Happy SharePointing!

Advertisements

My Users Don’t Like SharePoint because it is too slow!

This is Part 7 of my series on ‘My Users Don’t Like SharePoint…

As your SharePoint matures and grows, it can get considerably larger and complex. Content databases can grow to hundreds of gigs, search indexes grow larger, users rely more on Excel services, additional external business data is pulled in, some custom functionality is added, etc. All of this can impact performance when not implemented correctly.

I look at it like it’s a good sign: your users are using SharePoint! However, as your farm grows, you should monitor the farm and possibly reconfigure and add additional servers to the mix. The first step is to figure out why it’s slow.

It’s so slow, what can I do?!

There’s a lot of reasons SharePoint’s performance can dwindle, here’s a few. This is everything I could dream up, dealt with or heard about. Did I miss something? Leave a comment!

  • Is it designed properly? Check out Microsoft’s recommendations for hardware, software and farm architecture to cover the basics. If you’re running your entire farm on a single server, I’d start by adding some more servers. Check out the SharePoint 2010 Technical Diagrams for a great starting point: http://technet.microsoft.com/en-us/library/cc263199(v=office.14).aspx 
  • A very common slow issue is when SharePoint first wakes up. Since SharePoint is a .Net application running on IIS, the application pools need to spin up, compile all of the assemblies and serve up pages. This can take a few minutes when SharePoint initially starts up. In most cases, once you’re past this slow start up, SharePoint will continue to run smoothly throughout the day. One trick to avoid this slow start up is to keep SharePoint awake. There is a simple PowerShell script which you can schedule in Windows Tasks to run, and it’ll keep hitting your SharePoint sites, thereby keeping them awake. Check it out at http://spwakeuppowershell.codeplex.com.
  • Check the Task Manager on your servers. Simple enough, but it tells us a lot. Looking at the stats in Task Manager you can determine which services are taking up loads of RAM or are pinning the processor. If you’re seeing a lot of w3wp.exe processes, take a look at my PowerShell script which marries the w3wp with the process in SharePoint. This might help clarify things a little.

SharePoint's w3wp list

  • Sometimes a specific page or two may be loading slowly. Look into all of the web parts on the page. The more you load on a page, the more it has to do. Consider taking some off, or creating another page that can house some of the web parts.
  • Community Feedback: Thanks Marc! A large number of closed Web Parts, often on the home pages, will slow down a page. Every closed Web Part causes a small amount of processing overhead, and it can add up. To check this, add ?contents=1 to the page’s URL and remove any Web Parts which aren’t displayed. It can make a huge difference.
  • If all pages are doggy, and you have a custom design, it may be a good idea to look into the assets of the design: images, CSS and JavaScript files. If these aren’t sized correctly you might experience slow performance. There are applications available that can assess a page and tell you what’s slow and how large the assets are. I like to use YSlow, and add on for Chrome, gives some pretty neat stats:

SharePoint page stats using YSlow

    • One note about these types of page stats in SharePoint. There are going to be some elements you’re stuck with, namely this first result page. SharePoint has a long list of JavaScript files which are necessary and can’t be removed. These will always skew these stats. Fortunately, the files are minified (art of shrinking a file by removing wasted space) to minimize download time.
  • You should configure your content databases in SQL to handle the anticipated usage and traffic. See my older post which also talked about performance, and has more details on databases: Improving SharePoint’s Performance.
  • Your content databases can grow up to SQL’s limit, which is in the terabytes  It’s not a recommended practice as it does impede the performance of SharePoint. As your site collections grow, consider splitting off site collections to their own content databases. This allows you to manage each database individually, which will help improve performance. Check out this MSDN article on Moving sites between content databases.
  • Caching. Storing data closer to where you need it to help improve performance. Instead of sending SharePoint all the way back to the database to get some data, it can read a local cache instead. Check out MSDN article Plan for caching and performance.
  • Community Feedback: Thanks paslatek! There is a big performance issue when your server does not have internet access and it tries to validate certificates with clr.micosoft.com. I found this TechNet article which may help.
  • Community Feedback: Thanks mansi! Performance issues can be caused by BLOB Storage: since SharePoint saves all the files (documents, images, videos, etc.) in SQL Server in the form of unstructured objects known as Binary Large Objects (BLOBs). Having too many BLOBs can slow down SharePoint’s performance as they take lot of space and require SQL to process more. A solution to work around this issue is to store your BLOBs in a location other than your content database, which SQL refers to as RBS: Remote BLOB Storage. You can read more on this: Storing your SharePoint files outside of the database (RBS).
  • Update: Lists and libraries containing several thousand items can slow things down a lot too. Check out TechNet article Designing large lists and maximizing list performance.

Gah, I think that’s it for now. I’m sure there’s more, but these have been the primary reasons I’ve come across, and have seen some great improvements after applying. If I’m missing any, please leave a comment below.

Til next week, Happy SharePointing!

Use PowerShell to manipulate the values of a SharePoint choice field.

Thanks to Knoots for suggesting the idea for this post, from a comment on Using PowerShell to play with SharePoint Items.

Using PowerShell, we’re going to walk through handling a Choice field in a list. Specifically, this is a calendar list using the Category field. This may come in handy if you want to automate changing the values from another data source that BCS can’t connect to, or is too much work to get it to connect. I always prefer using SharePoint’s features, but sometimes we need to stretch it to make it work.

As a heads up, if you remove a value that someone has used in their item, the value will remain in the item UNTIL they come back in to edit the item. Since the value no longer exists in the option list, their value will be lost. However, if you have the option enabled on the field to specify their own value, then the previous value will be saved there.

Let’s get to it, open SharePoint 2010 Management Shell

$web = get-spweb http://site/web
$list = $web.lists["Calendar"]
$list

I like to run the $list just to ensure we have the list properly. Errors aren’t always displayed in PowerShell. Running $list should return the list name.

$choice = $list.fields["Category"]
$choice.choices

Again, make sure we have the right field, calling $choice.choices will list all of the current values. The Choices property is a StringCollection, so use typical commands to add/remove items.

Important. If you’re going to clear items then add items, or after any .update(), you’ll need to get the field again (type in $choice = $list.fields[“Category”] again) to access the new values properly.

remove all items

$choice.choices.clear()
$choice.update()

remove one item

$choice.choices.remove("Name")
$choice.update()

(remember get the field again to add items after clearing it)

add an array of choices

$choiceArray = @("Meeting","Work Hours","Business","Holiday","Get-together")
$choice.choices.addrange($choiceArray)
$choice.update()

add one at a time

$choice.choices.add("Name")
$choice.choices.add("Name Of Another")
$choice.update()

Happy SharePointing!

Questions on SharePoint 2010 Databases

What’s the maximum size for a SharePoint database?

This is a very common question across the board. Microsoft’s documentation recommends a maximum of 200GB per content database. This isn’t a limitation, just a recommendation. They are recommending this level due to performance and maintenance considerations. When your database gets much larger than that, there are some additional steps you need to take to optimize SQL’s performance. Check out MS’s white paper on Managing Multi-Terabyte Content Databases with Microsoft SharePoint 2010. This paper explains what’s needed to optimize databases between 200GB and 4TB, and then databases even larger than that!

What about Remote Blob Storage (RBS)?

RBS externalizes the documents SharePoint stores in SQL. Read more about RBS. This does allow for smaller content databases, however you’re still using up a large amount of space, somewhere. Microsoft’s recommendations as mentioned above should be considered including the space used in RBS.

Can I use the free SQL Express?

SQL Express is a great solution for development environments, and even staging if budgets are tight and the full SharePoint Server experience isn’t required. I wouldn’t recommend it for a production environment at all. Maintenance is a lot more difficult and performance will be significantly limited with SQL Express. You cannot schedule maintenance tasks like backups. You’ll have to manage all of that outside of SQL using scripts or third-party tools. Also, the Express edition has a database size limit of 10GB, and will only use 1GB of RAM. SQL wants as much RAM as you can give it, 1GB for production will slow things down considerably.

Should I run multiple site collections in a single database?

This can go two ways, though you’ll quickly see which is preferred. As noted above, the size of the database doesn’t doesn’t matter, too much, since you can work around maintenance tasks and performance issues. I prefer and recommend to load a site collection into its own content database, instead of managing multiple sites in one database. Not only does this keep the database sizes small, it also helps with recovery.

For instance, lets say you have a site collection for each of your departments, and the HR department does something that warrants a restore. If all of your sites were in one databases, you’d be forced to restore all of the site collections within that same database (or restore it to another farm, then backup that one site collection and move it over). Instead, with each site collection in its own databases, you can restore that one database and be back to work.

It’s not too late if you already have several site collections in your database. Check out the following article on Moving site collections between databases.

Additional resources on the SharePoint databases and capacity planning