A common file ingestion strategy is to watch a folder for a new file to appear, then process and delete it.  On most operating systems a file can be read while it is still being written, so there is a risk of loading a half-written file.  Strategies to get around this are:

  • Write the file being uploaded with a .tmp extension, then rename it to the correct name when the upload has finished, so that the import is only triggered by a complete file.

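  • Write a second, zero-length file once the first has been fully written, to signal that the first is complete.  The importer waits for this marker before reading the data file.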
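The first strategy can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the function names `publish_file` and `ready_files` are hypothetical, and it assumes the temporary file and the final file live on the same filesystem, which is what lets the rename happen in one step.

```python
import os


def publish_file(final_path, data):
    """Write data under a .tmp name, then rename it into place (hypothetical helper)."""
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before the rename
    # Assumption: both paths are on the same filesystem, so the rename
    # makes the complete file appear under its final name in one step.
    os.replace(tmp_path, final_path)


def ready_files(folder):
    """Return the names an importer may pick up, skipping files still being written."""
    return sorted(name for name in os.listdir(folder) if not name.endswith(".tmp"))
```

The importer only ever calls something like `ready_files`, so it never sees a name that is still being written to.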

If the file being uploaded is outside your control, you cannot rely on either convention.  Instead, delay processing and check the “last modified” property: once it has not changed for a suitable period, the writing is probably complete.
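A “last modified” check might look like the sketch below. The name `is_quiet` and the 30-second default are assumptions made for illustration; a suitable quiet period depends on how the uploader behaves, and a writer that stalls for longer than the period would still be misjudged as finished.

```python
import os
import time


def is_quiet(path, quiet_seconds=30.0, now=None):
    """Return True if the file has not been modified for at least quiet_seconds.

    quiet_seconds is a hypothetical default, not a recommended value.
    """
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path) >= quiet_seconds
```

A folder watcher would skip any file for which `is_quiet` returns False and try it again on the next pass.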

Trap: Especially with files outside your control, be careful about loading the entire file into memory, because you do not know how large it will be.  Instead, check the file size first or use a buffer to load only a portion of the file at a time. See [trap.byte] for more information.
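Both precautions can be combined, as in this sketch. The names `read_in_chunks`, `MAX_BYTES` and `CHUNK_SIZE`, and the limits chosen for them, are hypothetical and would need tuning for a real import.

```python
import os

MAX_BYTES = 100 * 1024 * 1024  # hypothetical upper limit: 100 MiB
CHUNK_SIZE = 64 * 1024         # hypothetical buffer size: 64 KiB


def read_in_chunks(path, chunk_size=CHUNK_SIZE, max_bytes=MAX_BYTES):
    """Yield the file's contents one buffer at a time, refusing oversized files."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes, over the {max_bytes} byte limit")
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            yield chunk
```

Because this is a generator, the size check runs when the first chunk is requested, and at most one buffer is held in memory at a time.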
