You may have wondered, like me, how many files it takes to degrade the performance of isolated storage for Windows Phone 7 apps. You may develop apps or libraries where this matters, or you may just be curious and want to accumulate knowledge that could be helpful later.
I began benchmarking this some time ago and, along with some interesting findings on the topic of this article, a number of other interesting patterns emerged from profiling the supporting code. I shared these results with the platform team at the time. They seemed quite interested and asked if I minded having them circulated internally to the perf guys.
The short answer is that it’s possible to structure your isolated storage file system in such a way that performance virtually does not degrade at all.
The long story is a bit more interesting. If the apps or libs you develop create a lot of files without regard for performance, those apps could be experiencing performance degradation which, depending on usage, could vary from minor to very substantial.
What I will share with you here are observations from a substantial volume of benchmarks I’ve run to improve visibility of the impact that our design choices have on the performance of specific isolated storage methods. This will help us to make better choices.
What I will also share, hopefully shortly, in another article is a helper class that lets you make these decisions declaratively using a lightweight abstraction. It will be demonstrated to allow a virtually limitless number of files to be created with virtually no impact on performance, and it requires no coding on your part. Sounds alright? Cool, let’s take a look at what’s going on…
What methods are affected?
The methods examined in this article are CreateFile(), OpenFile() and GetFileNames().
What is the execution time for these operations under optimal conditions?
CreateFile() and OpenFile() start out at around 10 milliseconds and under optimal conditions can be shown to have slow and steady, linear growth. Unfortunately, the optimal conditions shown in this chart cannot be relied on without intervention.
All charts in this article show file count on the X axis and execution time in milliseconds on the Y axis.
GetFileNames() execution time starts out at around 5 milliseconds and grows linearly, consistently.
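If you want to reproduce the shape of these curves yourself, the measurement loop can be sketched roughly as follows. This is not the harness used for the charts in this article. It is written in Python against an ordinary desktop file system as a stand-in for isolated storage, with os.listdir() playing the role of GetFileNames(), and the names time_call and benchmark_directory are hypothetical. Absolute timings will be nothing like the phone's; only the method is illustrated.

```python
import os
import tempfile
import time


def time_call(fn):
    """Run fn() and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn()
    return result, (time.perf_counter() - start) * 1000.0


def _create(path, payload):
    # Stand-in for CreateFile(): create the file and write its content.
    with open(path, "wb") as f:
        f.write(payload)


def _open(path):
    # Stand-in for OpenFile(): open the existing file and touch one byte.
    with open(path, "rb") as f:
        return f.read(1)


def benchmark_directory(directory, file_count, payload=b"x" * 1024):
    """Create file_count files in one directory, timing each operation.

    Returns a list of (files_so_far, create_ms, open_ms, list_ms) samples,
    one per file created.
    """
    samples = []
    for i in range(file_count):
        path = os.path.join(directory, f"{i:08d}.dat")
        _, create_ms = time_call(lambda: _create(path, payload))
        _, open_ms = time_call(lambda: _open(path))
        _, list_ms = time_call(lambda: os.listdir(directory))
        samples.append((i + 1, create_ms, open_ms, list_ms))
    return samples


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for count, create_ms, open_ms, list_ms in benchmark_directory(tmp, 200):
            if count % 50 == 0:
                print(count, round(create_ms, 3), round(open_ms, 3), round(list_ms, 3))
```

Plotting the samples with file count on the X axis gives charts comparable in layout to the ones described here.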
How long can it take to do these operations if performance is not considered?
Performance tends to degrade in steps, apparently as resources are consumed from more efficient pools, giving way to using progressively less efficient allocation mechanisms. How long it takes for execution time to step up in this manner has been the subject of interest for many benchmarking runs under different conditions.
So here we’ve gone from around 10 milliseconds to approaching 300 milliseconds.
After 3,000 files (in this run), a 25-fold increase can be observed in the execution time to create one file. Combined with the other operations your code performs, or if you are writing a few files at a time, this is already starting to add up at the business end: the user’s experience.
You will notice that at times execution cost stays on the slow and steady incline for a while, and at other times only a limited number of files can be created before the next step up in execution time. At still other times, often later in the cycle, very few files are created at each step, and execution cost appears to increase much more rapidly.
In another sampling run these operations are seen taking 1 second per file. In this case it took 4,500 files to see a 100-fold increase in execution time.
Why would I want my app to make lots of files?
Firstly, because isolated storage operations take a non-trivial amount of time to execute, particularly on large files.
If you are rewriting these files or reading in whole files when only a subsection of the file is required by the user, you can easily be motivated to design your storage system to make use of files on a more granular basis for substantial performance gains.
Home-grown databases, or even database libraries that allow generation of an arbitrary volume of data, may opt to store files on a highly granular basis to reduce the memory footprint of apps, which on 256MB Windows Phone 7 devices is limited to 90MB.
An app could also be generating files in response to user actions that can be repeated an arbitrary number of times. Taking a photo, for example.
What makes a difference to the perf curve?
You may not be too surprised to read that the number of files you store per directory affects performance.
There appears to be an element of chance involved with just how many files must be created in a directory before the first step up in execution cost takes place.
The chance of this happening appears to be influenced to a degree by the naming convention you use for your files.
Performance degradation of GetFileNames() is consistently linear and is affected by nothing other than the number of files physically in the directory. GetFileNames() does not currently observe the searchPattern provided. For every 2,500 files you’re looking at close to 1 second of execution time.
That is long enough that you would probably rethink why you are using this operation with that many files in a directory.
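As a back-of-envelope aid, the two figures quoted for GetFileNames() (around 5 milliseconds to start with, and close to 1 second per 2,500 files) can be turned into a simple linear estimate. The function below only illustrates that arithmetic. Its name and the straight-line model are assumptions built from those two numbers, not a fit to the measured data.

```python
BASE_MS = 5.0                  # approximate starting cost quoted in this article
MS_PER_FILE = 1000.0 / 2500.0  # "close to 1 second" for every 2,500 files


def estimate_get_file_names_ms(file_count):
    """Rough linear estimate of GetFileNames() time for one directory."""
    if file_count < 0:
        raise ValueError("file_count must not be negative")
    return BASE_MS + MS_PER_FILE * file_count


if __name__ == "__main__":
    for count in (100, 500, 2500):
        print(count, "files:", round(estimate_get_file_names_ms(count)), "ms")
```

On these assumptions, a directory kept to around 100 files costs in the region of tens of milliseconds to enumerate, rather than a second or more.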
How many files can I create before performance degrades?
After a substantial volume of benchmarking on files of a variety of sizes (1KB, 5KB, 50KB, 500KB, 1.5MB and 5MB), it would appear that once 128 files have been created in any directory there begins to be a chance that CreateFile() and OpenFile() performance will degrade beyond the minor linear degradation that takes place up to at least 128.
When using GUIDs for filenames the chance of this degradation happening on or soon after 128 files is fairly high as shown here.
Using GUIDs for generated files can be convenient because you get (statistically likely) unique filenames without having to maintain additional state, and without having to consider strategies for dealing with potential power disruptions during transactions that require atomicity.
If you use an 8-digit, zero-padded, numerically incrementing sequence for filenames, the chance of performance degradation stepping up appears to be significantly lower into the high hundreds. However, there is still a chance for it to kick in shortly after 128 files, as can be seen here.
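To make the two naming schemes concrete, here is a small sketch of each. It is written in Python for brevity rather than the C# a Windows Phone app would use, and the function names and the .dat extension are hypothetical. Note that the sequential scheme needs the caller to persist the next number somewhere, which is exactly the extra state that GUIDs let you avoid.

```python
import uuid


def guid_file_name(extension=".dat"):
    """Random GUID-style name, e.g. '3f2b...-....dat'; needs no stored state."""
    return f"{uuid.uuid4()}{extension}"


def sequential_file_name(sequence_number, extension=".dat"):
    """8-digit, zero-padded name for the given sequence number."""
    if not 0 <= sequence_number <= 99_999_999:
        raise ValueError("sequence_number must fit in 8 digits")
    return f"{sequence_number:08d}{extension}"


if __name__ == "__main__":
    print(guid_file_name())
    print(sequential_file_name(42))
```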
In this chart (above), the process is restarted at each 2,000-file boundary. In the second batch, starting at 2,000 on the X axis, the first step up occurs early. In another, starting at 8,000 on the X axis, you can see that even after 2,000 files are created there has still been no substantial jump in execution time.
In a long-running benchmark aimed at finding the lowest file count at which this step up in execution cost can occur, it appears from the trials done so far that the step up does not occur before 128 files. In fact, over 760 tests of 500 files each, most did not step up in execution cost at all, and the ones that did step up did so at either 128, 256 or 384 files.
Over 2,547 tests of 100 files each, the total number of occurrences of the execution cost stepping up was 0. Not a proof that it could never happen, but it seems reasonably dependable.
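One way to put a number on "reasonably dependable": if each 100-file test is treated as an independent trial (an assumption, since the runs shared one device), then seeing zero step-ups in 2,547 trials puts an upper bound on the per-test chance of a step-up. The sketch below computes the exact 95% bound and the common "rule of three" approximation. The function names are hypothetical and the independence assumption is mine, not something the benchmarks establish.

```python
def zero_event_upper_bound(trials, confidence=0.95):
    """Upper bound on per-trial probability p after 0 events in `trials` trials.

    Solves (1 - p) ** trials = 1 - confidence for p.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)


def rule_of_three(trials):
    """Quick approximation of the 95% bound: about 3 / trials."""
    return 3.0 / trials


if __name__ == "__main__":
    print(f"exact bound:   {zero_event_upper_bound(2547):.5f}")
    print(f"rule of three: {rule_of_three(2547):.5f}")
```

Both come out at roughly 0.12%, so on these assumptions the chance of a step-up within the first 100 files of a directory is plausibly no more than about one in 850.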
What if my app can create more files than this?
You’re likely interested in having a say in the performance of your file operations. Fortunately, this is easily done with a little help.
I’ll be posting a lightweight helper class which allows you to declaratively control how many files are created per directory and hides this detail from the calling code. Here are some comparative results to demonstrate what is possible.
This benchmark was designed to be long running to find if any undiscovered limitations existed. Besides hitting an interesting issue with Visual Studio output copy/pasting for the logged results, this ran unhindered until the device was out of storage space.
And 380,000 files later… still humming along happily with negligible change in performance characteristics. Interestingly whilst there is a slight change in the consistency of the CreateFile() results, the OpenFile() results were unaffected by this.
All this with adding just 3 lines of declarative code to the app.
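The helper class itself is for a later article. In the meantime, the general idea of capping the number of files per directory can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the helper: the names bucket_path and MAX_FILES_PER_DIRECTORY, the cap of 100 (picked because no step-up was seen in the 100-file tests), the .dat extension and the two-level directory layout are all hypothetical. Whether subdirectories count toward the same threshold as files is not established by these benchmarks, so the sketch caps both.

```python
import posixpath

MAX_FILES_PER_DIRECTORY = 100  # assumed cap, below the 128-file threshold


def bucket_path(sequence_number, root="data", per_dir=MAX_FILES_PER_DIRECTORY):
    """Map a file's sequence number to a path in a capped directory tree.

    No directory ends up with more than per_dir entries (files or
    subdirectories), which allows up to per_dir ** 3 files in total.
    """
    if not 0 <= sequence_number < per_dir ** 3:
        raise ValueError("sequence_number out of range for a two-level layout")
    leaf_index = sequence_number // per_dir  # which leaf directory overall
    top_index = leaf_index // per_dir        # which top-level directory
    return posixpath.join(
        root,
        f"{top_index:04d}",
        f"{leaf_index % per_dir:04d}",
        f"{sequence_number:08d}.dat",
    )


if __name__ == "__main__":
    for n in (0, 99, 100, 10_000, 379_999):
        print(n, "->", bucket_path(n))
```

Calling code asks for a path by sequence number and never needs to know how the directories are laid out. With a cap of 100 this layout covers up to 1,000,000 files.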
Are there any other limits to be aware of?
If you were inclined to ignore the performance degradation described here and allow directories to grow without limit, then it seems an exception will occur eventually: IsolatedStorageException – Operation not permitted on IsolatedStorageFileStream
In all benchmarks that allowed files to be created without limit on the number of files per directory, and without running out of storage space on the device, an exception was raised after a little more than 10,000 files. In the case of 1KB and 5KB files the exception occurred after 10,837 files were created, and in the case of 50KB files it occurred after 10,826 files were created.
There are still opportunities for further optimisation if you want ultra-performant code for handling file create/open. The greatest gains have been accomplished here, though, and with an abstraction this can be done with minimal effort.
All results shown are from benchmarks executed on a Samsung Taylor 650MHz Windows Phone 7 prototype device, OS Version 7.0.7004.0.