Ashley Holman from NetSpot pointed out (http://moodle.org/mod/cvsadmin/view.php?conversationid=5503#c219806) that the current distribution of files into three levels of subdirectories in the filepool may be overkill. If we assume that SHA1 hashes have randomly distributed bit values for reasonable file contents, the chance that two given files land in the same subdirectory is around 1 in 16 million (256^3 = 16,777,216 directories), so at the majority of Moodle sites it is pretty unlikely that even two files share the same directory. So we probably use four file descriptors (three directories plus one normal file) in the filesystem to store a single file. That may be considered a waste of OS resources.
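To make the numbers above easier to check, here is a minimal sketch of the layout as described. The helper names (pool_path, leaf_directories, sharing_probability) are hypothetical and are not Moodle code, and using the full hash as the file name is an assumption. The sketch also assumes that SHA1 output is uniformly distributed.

```python
import hashlib


def pool_path(content: bytes, levels: int = 3, width: int = 2) -> str:
    """Hypothetical helper: derive a filepool-style path from the SHA1 of the content.

    Assumption: the file name is the full hash, placed under `levels`
    directories named after consecutive `width`-character hash prefixes.
    """
    digest = hashlib.sha1(content).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join(parts + [digest])


def leaf_directories(levels: int = 3, width: int = 2) -> int:
    """Number of possible lowest-level directories (16 values per hex character)."""
    return 16 ** (levels * width)


def sharing_probability(n_files: int, levels: int = 3, width: int = 2) -> float:
    """Probability that one given file shares its leaf directory with at
    least one of the other n_files - 1 files, assuming uniform hashes."""
    d = leaf_directories(levels, width)
    return 1.0 - (1.0 - 1.0 / d) ** (n_files - 1)


if __name__ == "__main__":
    print(pool_path(b""))
    print(leaf_directories())
    print(sharing_probability(100_000))
```

With the defaults this is the /ab/cd/ef/ layout; passing levels=2, width=3 gives the /abc/def/ variant with the same number of leaf directories.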
We discussed this with Petr today and we agree. But neither of us was a classroom star in statistics or cryptography, so we cannot offer a trustworthy analysis and mathematical proof of how many levels would be enough. Such an analysis should take the following into account:
- Common file systems (like ext3) have a limit of ~32000 files/subdirectories per directory
- It looks reasonable to us to require that the distribution of files in the filepool guarantees no more than 1024 files per directory in almost 100% of cases.
- The two-character directory names could be replaced with three-character ones. This increases the number of subdirectories per directory, but only up to the limit of 4096. We could end up with fewer file descriptors per file (/abc/def/ instead of /ab/cd/ef/) while still keeping the same large reserve. But the question is whether, at the end of the day, the total number of directories would really be lower...
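The points above can be turned into a rough calculation. The following is only a sketch under the assumption that hashes are uniformly distributed, and not the trustworthy analysis we are asking for. The function names are hypothetical. The overflow estimate uses the standard Chernoff bound for a sum of independent 0/1 variables together with a union bound over all leaf directories, so it is an upper bound and not an exact probability.

```python
import math


def expected_directories(n_files: int, levels: int, width: int) -> float:
    """Expected number of non-empty directories over all levels for n_files
    files with uniformly distributed hashes."""
    total = 0.0
    for k in range(1, levels + 1):
        d = 16 ** (k * width)  # possible directories at depth k
        # d * (1 - (1 - 1/d) ** n_files), computed in a numerically stable way
        total += d * -math.expm1(n_files * math.log1p(-1.0 / d))
    return total


def log_overflow_bound(n_files: int, levels: int, width: int,
                       limit: int = 1024) -> float:
    """Natural log of an upper bound on the probability that at least one
    leaf directory holds `limit` or more files.

    Per directory: P(X >= limit) <= exp(-mu) * (e * mu / limit) ** limit,
    valid for limit > mu; the union bound multiplies this by the number of
    leaf directories. A result of 0.0 means the bound says nothing.
    """
    if n_files <= 0:
        return -math.inf
    d = 16 ** (levels * width)
    mu = n_files / d  # expected files per leaf directory
    if limit <= mu:
        return 0.0
    log_tail = -mu + limit * (1.0 + math.log(mu / limit))
    return min(0.0, math.log(d) + log_tail)


if __name__ == "__main__":
    n = 1_000_000  # assumed site size, for illustration only
    for levels, width in [(3, 2), (2, 3), (2, 2), (1, 3)]:
        print(levels, width,
              round(expected_directories(n, levels, width)),
              log_overflow_bound(n, levels, width))
```

Note that /ab/cd/ef/ and /abc/def/ both have 16^6 possible leaf directories, so under this model they can differ only in the number of intermediate directories.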
So we would like to ask somebody with experience in statistics to decide this. The implementation itself and the upgrade procedure are quite easy to do (citing Petr).