Password Protected File Storage and Preview Script
Overview
Let’s say you have an existing site that you want to add file storage functionality to. Everything is already password protected and you just want a place for each user to store some files online. Once files are online, users should be able to manage them using a browser. This includes download, delete and update. Additionally, you would like users to be able to “preview” a document in their browser without having to download it.
Our Experience
We’ve worked with this in a couple of situations, both with PHP and Java/WebWork. It’s pretty straightforward as long as you think through the details ahead of time.
Additional Requirements
- You’ll likely want to have an upload progress meter.
- Most PHP installations ship with a default upload limit of 2 MB (the upload_max_filesize setting). If you want to handle files larger than that, you’ll need to raise the limit yourself or make arrangements with your host.
- You’ll need to allow for files of the same name, even if they’re from the same user. In other words, the name the user gives the file and the name that you store it under on your host will have to be independent.
- Will you let your customers store in multiple directories or just a single directory? Any limit on the number of directories?
- File type limitations. You should decide ahead of time what files you will allow people to upload. If you want people to be able to upload anything, consider using ZIP files.
- The ability to preview a document without downloading will require some sort of server-based file processing. For example, you could extract raw text from a file for the preview, create an image based on a print preview, or turn it into a PDF as it downloads (nearly everyone has the PDF plug-in in their browser).
- Do you want to add additional viewing rights? For example, could User A upload a file that should then be visible to User B? Or is there always a strict wall between the two?
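The file type and size requirements above can be checked in one place before anything is stored. Here is a minimal Python sketch of that check; the same idea ports directly to PHP or Java. The allow-list, the 2 MB limit and the function name check_upload are illustrative assumptions, not fixed recommendations.

```python
import os

# Hypothetical policy values; adjust them to your own requirements.
ALLOWED_EXTENSIONS = {".doc", ".xls", ".ppt", ".pdf", ".zip"}
MAX_UPLOAD_BYTES = 2 * 1024 * 1024  # mirrors the 2 MB PHP default


def check_upload(original_name, size_bytes):
    """Return (ok, reason) for a proposed upload."""
    ext = os.path.splitext(original_name)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, "file type not allowed: " + (ext or "(none)")
    if size_bytes > MAX_UPLOAD_BYTES:
        return False, "file exceeds the upload limit"
    return True, "ok"
```

An extension check only filters honest mistakes. It says nothing about what the file really contains, so it does not replace the malware scanning discussed under Risks.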
Risks
- Viruses and other malware. It’s important to think about this up front. Do you want to scan each inbound file? Do you want to simply scan your system nightly? Some other arrangement? Not scanning any of the files is not a good idea.
- How much bandwidth do you want to give customers? The larger their file uploads, the more bandwidth each customer will consume. While it may not be important to meter it in the beginning, it probably makes sense to plan for metering down the road.
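If you plan for metering from the start, adding a cap later is a small change. The sketch below, in Python, records bytes per user per month and can optionally enforce a limit. The class name, the in-memory dictionary and the monthly period are all assumptions made for illustration; a real site would keep these counters in its database.

```python
class BandwidthMeter:
    """Tracks bytes transferred per user and month, with an optional cap."""

    def __init__(self, monthly_cap_bytes=None):
        # None means "record usage only, never refuse a transfer".
        self.monthly_cap_bytes = monthly_cap_bytes
        self.used = {}  # (user_id, "YYYY-MM") -> bytes transferred

    def record(self, user_id, month, num_bytes):
        key = (user_id, month)
        self.used[key] = self.used.get(key, 0) + num_bytes

    def allowed(self, user_id, month, num_bytes):
        """Return True if this transfer would stay within the cap."""
        if self.monthly_cap_bytes is None:
            return True
        return self.used.get((user_id, month), 0) + num_bytes <= self.monthly_cap_bytes
```

Starting with monthly_cap_bytes left as None gives you usage numbers without affecting customers, which matches the "meter later" approach described above.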
Approach
Create the user storage area first. Establish the rules that handle space limitations, viewing privileges, directory limitations, downloading the files to a browser, etc. Make sure they work to your satisfaction.
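As a rough illustration of what those rules might look like in code, here is a small Python sketch covering two of them: viewing privileges and space limits. The StoredFile record, its field names and the quota figure passed in are hypothetical; in practice this information would come from your existing user database.

```python
from dataclasses import dataclass, field


@dataclass
class StoredFile:
    owner: str
    display_name: str
    size_bytes: int
    shared_with: set = field(default_factory=set)  # extra viewers, if any


def can_view(user, stored_file):
    """The owner can always view; others only if the file is shared with them."""
    return user == stored_file.owner or user in stored_file.shared_with


def within_quota(existing_files, owner, new_size, quota_bytes):
    """Return True if the owner can store new_size more bytes."""
    used = sum(f.size_bytes for f in existing_files if f.owner == owner)
    return used + new_size <= quota_bytes
```

Leaving shared_with empty for every file gives you the strict wall between users; populating it is how User A would make a file visible to User B.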
If the preview without download feature is the most important, build it at this point. If upload is more important, skip to the next paragraph and come back. Decide on the approach you want to take. If you’re on a Linux host, and you want to convert Word documents, you’ll need something like Antiword (See it in action.). Another library worth looking at is from BadBlue. These both convert to text or HTML, so there’ll be some formatting limitations for complex documents. Windows-based server? You could choose to use MS Office Automation calls, but those will require you to have Office installed on the server and may not scale well.
Whew, on to upload. It’s probably best to start the upload with a limited set of file types. Go ahead with, for example, Word, Excel, PowerPoint, PDF and Zip. Get these basic uploads working without a progress bar. If you’re allowing files larger than the default 2 MB, test one that’s big enough to be on the outer limits of what you want to allow.
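For the text-extraction route, the work splits into two parts: getting raw text out of the document and turning that text into something safe to show in a browser. The Python sketch below assumes the antiword command-line tool is installed and on the PATH, and relies on its basic behavior of printing the document text to standard output. The function names and the 2000 character cut-off are made up for this example.

```python
import html
import shutil
import subprocess


def text_to_preview_html(text, max_chars=2000):
    """Escape raw text and wrap it for display in a browser."""
    snippet = text[:max_chars]
    if len(text) > max_chars:
        snippet += "..."
    return "<pre>" + html.escape(snippet) + "</pre>"


def preview_word_document(path):
    """Return an HTML preview of a .doc file, or None if Antiword is missing."""
    if shutil.which("antiword") is None:
        return None
    # "antiword <file>" writes the extracted text to standard output.
    result = subprocess.run(
        ["antiword", path], capture_output=True, text=True, check=True
    )
    return text_to_preview_html(result.stdout)
```

Escaping the extracted text matters because the document content comes from users; without it, a file containing markup could inject HTML into the preview page.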
Add the server-based file handling next. Make sure it handles duplicate file names gracefully and that no stored file is accessible via a URL, not even a hidden URL. Essentially, the upload should land in the TMP directory and then be moved into a hidden customer-specific directory under a guaranteed unique name. When a user downloads a file, your script should copy it from the hidden directory to a temporary public space. After some period of time, the script should automatically delete files in that public space.
Now add the upload progress bar. Your best bet is to find a component you can use and customize. For example, RadLinks has one for less than $50.
Now add the rest of the file handling functions: copy, rename, delete, etc. These are straightforward in almost all operating systems, so your heavy lifting is done.
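The flow just described (temporary upload, private storage under a unique name, short-lived public copy, automatic cleanup) can be sketched in Python as follows. The directory layout, the function names and the use of random UUIDs for stored names are assumptions for illustration; the private root is assumed to sit outside the web server’s document root.

```python
import os
import shutil
import time
import uuid


def store_upload(tmp_path, private_root, user_id):
    """Move an uploaded temp file into the user's private directory.

    Returns the generated stored name; keep it in your database next to
    the name the user gave the file.
    """
    user_dir = os.path.join(private_root, str(user_id))
    os.makedirs(user_dir, exist_ok=True)
    stored_name = uuid.uuid4().hex  # independent of the user's file name
    shutil.move(tmp_path, os.path.join(user_dir, stored_name))
    return stored_name


def publish_for_download(private_root, public_root, user_id, stored_name, display_name):
    """Copy a private file into a one-off public directory for download."""
    token_dir = os.path.join(public_root, uuid.uuid4().hex)
    os.makedirs(token_dir)
    # basename() guards against path separators in the user-supplied name.
    public_path = os.path.join(token_dir, os.path.basename(display_name))
    shutil.copy2(os.path.join(private_root, str(user_id), stored_name), public_path)
    return public_path


def purge_public(public_root, max_age_seconds, now=None):
    """Delete public download directories older than max_age_seconds."""
    now = time.time() if now is None else now
    for entry in os.listdir(public_root):
        path = os.path.join(public_root, entry)
        if os.path.isdir(path) and now - os.path.getmtime(path) > max_age_seconds:
            shutil.rmtree(path)
```

Because every upload gets a fresh stored name, two files called report.doc from the same user never collide. Running purge_public from a scheduled job is one way to get the automatic deletion.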
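If you would rather build the server side of a progress bar yourself, the core idea is to read the upload in chunks and report how much has arrived so far. The Python sketch below shows only that idea; the function name, the callback signature and the chunk size are hypothetical, and wiring the numbers through to the browser is left out.

```python
def copy_with_progress(source, dest, total_bytes, on_progress, chunk_size=64 * 1024):
    """Copy source to dest in chunks, calling on_progress(received, percent)."""
    received = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        dest.write(chunk)
        received += len(chunk)
        if total_bytes > 0:
            percent = min(100, received * 100 // total_bytes)
        else:
            percent = 100  # unknown or zero total: nothing meaningful to report
        on_progress(received, percent)
    return received
```

The callback could write the latest percentage somewhere the browser can poll, such as a session entry keyed by an upload ID.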
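Because the stored name is independent of the name the user sees, these operations divide neatly: rename touches only the record, while copy and delete also touch the file on disk. A minimal Python sketch, using a hypothetical in-memory FileCatalog in place of your real database table:

```python
import os
import shutil
import uuid


class FileCatalog:
    """Keeps display names separate from the names used on disk."""

    def __init__(self, storage_dir):
        self.storage_dir = storage_dir
        self.records = {}  # file_id -> {"owner": ..., "name": ...}

    def _path(self, file_id):
        return os.path.join(self.storage_dir, file_id)

    def add(self, owner, display_name, data):
        file_id = uuid.uuid4().hex
        with open(self._path(file_id), "wb") as handle:
            handle.write(data)
        self.records[file_id] = {"owner": owner, "name": display_name}
        return file_id

    def rename(self, file_id, new_name):
        # Only the record changes; the file on disk keeps its stored name.
        self.records[file_id]["name"] = new_name

    def copy(self, file_id):
        new_id = uuid.uuid4().hex
        shutil.copyfile(self._path(file_id), self._path(new_id))
        self.records[new_id] = dict(self.records[file_id])
        return new_id

    def delete(self, file_id):
        os.remove(self._path(file_id))
        del self.records[file_id]
```

Each method should of course be guarded by the same ownership and viewing checks you established for the storage area.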
Now link all of these into your user management system.
Conclusion
Although the upload itself isn’t challenging, there are a number of details to think about. Spend some time up front on these and you’ll be happy you did.
If you have questions, comments or ideas, just post them below.