
 

CRON job on a folder full of uploaded files

JERKSTORE

I've just set up a CRON job to run a script to process images that a client will upload to a holding folder.

I'm wondering, is there the possibility of problems if the CRON job runs while a file is in the process of uploading? I'm assuming the server doesn't really list the file until the upload has completed, but maybe that's not the case.

Do I need to do anything special in my code to ignore uploading files, or is it not something I need to worry about?

 

JERKSTORE

These files are maybe 200k tops (they're images) so it's not like a file could be uploading for several minutes at a time. But there's obviously still the possibility that the cron and the upload could coincide...

 

DontBogartMe

not sure about that tbh. You could test it by uploading a really fat file though?

If it is a problem, or if you just want to be sure anyway, you could always break it up into two folders - one folder to upload into, then when the upload is complete your code moves the file to the cron processing folder, that way cron will never get the chance to see half-uploaded files.
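
A rough PHP sketch of that two-folder idea, for what it's worth. The function name `promoteUpload` and the folder arguments are made up for illustration, and it assumes your own code knows when an upload has finished (a web form handler, say) and that both folders sit on the same filesystem so rename() doesn't have to copy anything.

```php
<?php
// Hypothetical helper: move a finished upload into the folder the cron job scans.
// Assumes both folders are on the same filesystem, so rename() is a plain move.
function promoteUpload(string $file, string $incomingDir, string $readyDir): bool
{
    // basename() strips any directory part so the file cannot escape the folders.
    $source = $incomingDir . DIRECTORY_SEPARATOR . basename($file);
    $target = $readyDir . DIRECTORY_SEPARATOR . basename($file);

    if (!is_file($source)) {
        return false; // nothing to move
    }
    return rename($source, $target);
}
```

The cron script would then only ever read from the second folder.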

 

poliguin

you'll end up uploading the file a second time, and if you're deleting the file afterwards you'll raise an error in any process after that that is trying to upload the file.

what we had done in the past is create a dummy file with the same name but with a ~ in the file extension. for example:
XMLDoc.xml would have an empty companion file called XMLDoc.~xml while it was still being written. When the cron job came across XMLDoc.xml, it would check whether XMLDoc.~xml existed. If it didn't, the job knew it was fine to process the file; otherwise it would ignore the file.
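
A minimal sketch of that marker-file check in PHP. The function names `markerPathFor` and `isReadyToProcess` are invented for this example, and it assumes whatever writes the file also creates the marker first and deletes it when done.

```php
<?php
// Hypothetical helper: build the marker name, e.g. XMLDoc.xml -> XMLDoc.~xml.
function markerPathFor(string $path): string
{
    $info = pathinfo($path);
    $ext = isset($info['extension']) ? '.~' . $info['extension'] : '.~';
    return $info['dirname'] . DIRECTORY_SEPARATOR . $info['filename'] . $ext;
}

// A file is safe to process only when it exists and its marker does not.
function isReadyToProcess(string $path): bool
{
    return is_file($path) && !file_exists(markerPathFor($path));
}
```

The cron script would call `isReadyToProcess()` on each candidate and skip anything that returns false.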

var _oRLY = {HAI:function(){return this.KTHXBYE(); },KTHXBYE:function(){ return this.HAI();},init:function(){ this.HAI()};_oRLY.init()?'YARLY':'NOWAI';
 

JERKSTORE

Crap, that's what I was afraid of. Thanks Poliguin.

 

DontBogartMe

Originally posted by: JERKSTORE
Thanks Poliguin.


*peeks out from beneath cloak of invisibility*

 

JERKSTORE

After running a test, it looks like the server doesn't actually report the filename (with extension) until the upload is done. During the upload, it just lists it as a random id string like so:

pureftpd-upload.4829d0f7.15.69a7.73e1e2f7

So by checking my files for a .jpg file extension I think I should be able to keep this from being a problem.

Hopefully.

Fingers crossed...
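
Roughly what that extension check could look like in PHP. The function name `listJpegFiles` is hypothetical, and this assumes in-progress uploads keep the pureftpd-upload.* style name seen in the test above, which may depend on the FTP server and its configuration.

```php
<?php
// Hypothetical helper: list only files whose names end in .jpg (any case).
// In-progress uploads named like pureftpd-upload.<id> do not match and are skipped.
function listJpegFiles(string $dir): array
{
    $matches = [];
    foreach (scandir($dir) ?: [] as $name) {
        $path = $dir . DIRECTORY_SEPARATOR . $name;
        $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
        if (is_file($path) && $ext === 'jpg') {
            $matches[] = $path;
        }
    }
    return $matches;
}
```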

 

JERKSTORE

DB - I didn't mean to ignore you :)
Just that Poliguin seemed to have an "I know what will happen and it will make you sad" kind of answer.

I appreciate your comment as well. Also, you are handsome and I assume you smell nice.

 

DontBogartMe

awww shucks, I don't know what to say now.

*twirls toe in the dirt*


I reckon my "solution" was quicker to implement, but yeah, Poliguin certainly sounds like he actually knows what he's talking about whereas I was just making it up :)

 

JERKSTORE

So, it turns out that while a file in the process of uploading doesn't pose a problem, a file that failed at some point during its upload does.

The client is uploading these files over a somewhat spotty wireless connection, and they've been running into situations where a file will upload partially, but then the transfer will fail and I'll end up with a partial file on the server. It seems that as soon as a file is no longer in the process of uploading, it gets a proper filename (*.jpg) and is suddenly visible to my cron job. But incomplete jpgs seem to wreak havoc on my code.

So my new question is, can I check to see that a jpg file is properly formed? Is there an end-of-file marker I can check for using PHP, to be sure that the file is actually fully transferred and not a partial?

I need to be able to weed out these bad files, and while checking for a file extension works great as a way to ignore files in the process of uploading, it's no longer sufficient when I can't necessarily trust the validity of files that have "finished" uploading.

Thoughts?

 

DontBogartMe

do it how I said before, then your knackered JPGs will be sitting in the upload folder where the cron job will never look for them - the only files that get into the folder for cron processing will be complete JPGs and you'll have no problem.

 

JERKSTORE

How would the code know if a file was okay to move to the second folder though?

The client is uploading these files via FTP, not through a web form that initiates a file upload.
So it seems like I'd be in the same boat, just with two folders now instead of one...

Maybe I'm not understanding you correctly...

It's a directive from the client that the upload process must be via FTP and not via a web form (which would allow me to use code to verify the uploaded file in a traditional way). So unless I'm mistaken, what I need to be able to do then is to add a check into the script called by my cron job, rather than trying to check outside of that cron job - since that other check would itself have to be a cron job since there's no code involved in the upload.

 

DontBogartMe

oh sorry, I thought you were uploading thru a web form, I gots nothing!

 

Technomancer

jpegs end with a two-byte end-of-image marker, hex FF D9.

so pseudoish code:


// loop through the folder; for each file:

$f = fopen('path/to/file.jpg', 'r'); // open the file
$l = filesize($f); // get the length of the file
fseek($f, $l - 2); // move the file pointer to the last 2 bytes
$eof = fread($f, 2); // read the last 2 bytes of the file
$marker = unpack("H*", $eof); // convert the last two bytes to hex
if (intval($marker[1], 16) == 0xFFD9) // test for end of file value
{
    // do subsequent coding if good eof marker
}

// end loop


I'm not 100% sure with all of that as I'm fairly new to PHP but something like that may work.


Everyone should believe in something
I believe I'll have another drink
 

JERKSTORE

DBM - no worries :)

Tech, I'll give that a try - thanks!

 

JERKSTORE

So no matter what I seem to do in my code, when it gets to the line with filesize($f) I get the following error:

Warning: filesize() [function.filesize]: Stat failed for Resource id #5 (errno=2 - No such file or directory) in file.php on line 21


I've searched high and low on the interwebs, but I can't find any good explanation of what this error means. It seems to me like if there was actually no such file or directory, then the failure would happen on fopen(), not filesize().

The folder my files live in is 777 in terms of permissions, so it doesn't seem like the script (which lives outside of the folder) would be prohibited from checking the contents out

Any ideas?

 

JERKSTORE

Turns out that filesize() needs the path to the file, not the resource from fopen()


$l = filesize('path/to/file.jpg');


That solved it, and the results look good.
Thanks again Tech.
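
For reference, a cleaned-up sketch of the end-of-file check with that fix applied. The function name `endsWithJpegEoi` is made up for this example, and it is only one way to write it: it compares the raw bytes directly instead of going through unpack().

```php
<?php
// Returns true when the last two bytes of the file are the JPEG
// end-of-image marker FF D9.
function endsWithJpegEoi(string $path): bool
{
    $size = @filesize($path);      // filesize() takes a path, not a file handle
    if ($size === false || $size < 2) {
        return false;
    }
    $handle = @fopen($path, 'rb'); // 'b' keeps the read binary-safe
    if ($handle === false) {
        return false;
    }
    fseek($handle, -2, SEEK_END);  // jump to the last two bytes
    $tail = fread($handle, 2);
    fclose($handle);

    return $tail === "\xFF\xD9";
}
```

One caveat: some software appends extra bytes after the end-of-image marker, so a strict check like this could reject a file that would otherwise open fine.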

 

Technomancer

ThumbsUp


 

JERKSTORE

Okay, one more question.

Checking for an end of file value has definitely weeded out a lot of incorrect files, but I'm seeing many that are making it past that stage, but then causing errors later during the thumbnailing process - because they're poorly formed.

Is there a value I can check for at the START of the jpg file to make sure that it ends AND begins correctly?

If I open one of these "corrupted" files in a text editor, I see hundreds of blank lines of white space at the start of each file, whereas a good jpg seems to start with data right off the bat.
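
For what it's worth, a JPEG file begins with the two-byte start-of-image marker, hex FF D8, so a check at the start could look something like this sketch. The function name `startsWithJpegSoi` is invented for this example.

```php
<?php
// Returns true when the file starts with the JPEG start-of-image marker FF D8.
function startsWithJpegSoi(string $path): bool
{
    $handle = @fopen($path, 'rb'); // 'b' keeps the read binary-safe
    if ($handle === false) {
        return false;
    }
    $head = fread($handle, 2);     // first two bytes of the file
    fclose($handle);

    return $head === "\xFF\xD8";
}
```

A file that opens with whitespace or null bytes, like the corrupted ones described above, would fail this test.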

 

JERKSTORE

I added a check for an image dimension of 0 (either width or height) so that should help a bit more, but I'm not sure how reliable that will be...
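
A sketch of what that dimension check might look like, assuming getimagesize() is what reads the width and height. The function name `hasUsableJpegDimensions` is hypothetical.

```php
<?php
// Returns true when PHP can read the file as a JPEG with non-zero dimensions.
function hasUsableJpegDimensions(string $path): bool
{
    $info = @getimagesize($path); // false when the header cannot be parsed
    if ($info === false) {
        return false;
    }
    // Index 0 is width, 1 is height, 2 is an IMAGETYPE_* constant.
    [$width, $height, $type] = $info;

    return $type === IMAGETYPE_JPEG && $width > 0 && $height > 0;
}
```

Since the dimensions live in the header near the start of the file, this mostly catches files with a damaged beginning; a file that is only missing its tail can still report a valid size.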

 