[ale] 117000 files vs 240 missing - amazon

Lightner, Jeff JLightner at water.com
Mon Nov 25 11:16:46 EST 2013


Thanks.

We've been asking those questions, but we have to go through our Director, so we aren't getting direct responses. Maddening.

At first blush it doesn't appear that curl supports recursive downloads itself, but tools such as curlmirror and curlftpget have been written for that purpose. I've not tried either of them.

Given how long the download takes, we'll probably hold off on repeating it until we've gotten answers about how the files were stored and how the other side counted them. I doubt it is the "." and ".." thing, which I'd already considered, simply because of the sheer number of subdirectories. (That is, they would account for a lot more than 240.)
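
To put a number on that: if the other side's count included "." and ".." for every directory, the excess would be twice the directory count, so a gap of 240 would imply only about 120 directories. Below is a minimal Python sketch of that check, assuming the download sits in a local directory. The function names are hypothetical, and "files" here means every non-directory entry that os.walk reports.

```python
import os

def tree_counts(root):
    """Return (files, directories) under root, counting root itself as a directory."""
    files = 0
    dirs = 1  # the root directory
    for _dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)    # subdirectories seen at this level
        files += len(filenames)  # non-directory entries at this level
    return files, dirs

def dot_entry_excess(dir_count):
    """Extra entries a count picks up if it includes "." and ".." per directory."""
    return 2 * dir_count

def dirs_implied_by(excess):
    """Directory count that would explain an excess made only of dot entries."""
    return excess / 2
```

Running tree_counts on the download directory and comparing dot_entry_excess(dirs) with 240 shows quickly whether the dot-entry theory is even in the right range.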

-----Original Message-----
From: ale-bounces at ale.org [mailto:ale-bounces at ale.org] On Behalf Of Michael H. Warfield
Sent: Monday, November 25, 2013 10:27 AM
To: Atlanta Linux Enthusiasts
Subject: Re: [ale] 117000 files vs 240 missing - amazon

On Thu, 2013-11-21 at 21:59 +0000, Lightner, Jeff wrote: 
> A vendor put a site on Amazon with some files we need. We don’t have
> sftp access to this Amazon site but do have ftp access.
>
> Accordingly we did a wget to download all the files using our ftp
> credentials. When all done we got over 117,000 files and saw no
> errors in the wget.
>
> The problem is the vendor is telling our director there are 240 more
> files in their count than we downloaded. This is roughly a 0.2%
> difference so I suspect it has something to do with the way they count
> vs. the way we did. (We used find piped to wc -l.) Our count
> matches the summary wget output when it finished so we are sure we’re
> correctly counting what wget did, but of course it’s possible wget
> actually missed something, though it seems unlikely to me.
>
> The question is: does anyone know what might cause such a difference?
> Alternatively, does anyone know another way we could count the files on
> the Amazon site using our ftp credentials other than going in and
> counting them one by one?
> 
I can think of several reasons why their count might be off, and different reasons depending on whether they were running on Windows, Mac, or *NIX. It's important to find out their methodology for counting noses in a complex directory hierarchy to really know (did they accidentally count . and .. in the directories, for instance). They should have provided you with a directory tree listing in the root of that tree so you could compare. If they can, they should go back and create an "ls -R" listing in that directory. Sending something blind like that with no verification information seems rather incompetent to me.
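
If the vendor does supply such a listing, comparing it against the download can be scripted. Here is a rough Python sketch, assuming plain "ls -R" output (no -l or -F), run from the root of the tree, with no newlines in file names. The function names are made up for illustration.

```python
import os

def parse_ls_r(text):
    """Return the set of non-directory paths named in plain `ls -R` output."""
    dirs, entries = set(), set()
    current = "."
    prev_blank = True  # the first line may be a directory header such as ".:"
    for line in text.splitlines():
        if not line:
            prev_blank = True
            continue
        if prev_blank and line.endswith(":"):
            # A header like "./sub:" starts the listing of one directory.
            current = os.path.normpath(line[:-1])
            dirs.add(current)
        else:
            entries.add(os.path.normpath(os.path.join(current, line)))
        prev_blank = False
    # Every directory gets its own header, so drop those from the entries.
    return entries - dirs

def local_files(root):
    """Return the set of non-directory paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.normpath(os.path.relpath(full, root)))
    return found

def missing_locally(listing_text, root):
    """Paths the listing names that are absent from the local copy."""
    return parse_ls_r(listing_text) - local_files(root)
```

A file whose name ends in ":" and happens to be the first entry of a directory would be misread as a header, so treat the result as a starting point rather than proof.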

That being said, my next step would be to use curl instead of wget.
There are some circumstances, albeit rare and mostly to do with http redirects (though there are others), where wget does not do the right thing but curl does.

Curl also has ftp options for fine-grained control over whether it uses multiple CWD commands, a single CWD command, or no CWD commands when retrieving a tree. Depending on the ftp server, this can make a big difference (note: MultiCWD is the slowest but the most formally correct by RFC).
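
On the curl command line that choice is made with --ftp-method, which takes multicwd, singlecwd, or nocwd. Below is a small Python sketch that builds and runs a name-only listing with each method so the results can be compared. It assumes curl is installed and that the credentials live in ~/.netrc. The helper names and the example URL are hypothetical.

```python
import subprocess

FTP_METHODS = ("multicwd", "singlecwd", "nocwd")

def curl_list_command(url, method="multicwd"):
    """Build the curl argv for a name-only FTP listing with a given CWD method."""
    if method not in FTP_METHODS:
        raise ValueError("unknown ftp method: %s" % method)
    return ["curl", "--silent", "--netrc", "--list-only",
            "--ftp-method", method, url]

def list_names(url, method="multicwd"):
    """Run curl and return the names it lists, one per output line."""
    result = subprocess.run(curl_list_command(url, method),
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()

# Example (needs network access and a real server):
# for m in FTP_METHODS:
#     print(m, len(list_names("ftp://ftp.example.com/data/", m)))
```

If the three methods return different counts for the same directory, that points at the server's path handling rather than at the files themselves.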

I would also use the listing command, which uses NLST, in a shell script to simulate a recursive listing: parse out the directories, issue a listing command for each one to drill into the hierarchy, then count the files from the resulting hairball.
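
The same idea can be sketched in Python with the standard ftplib module instead of a shell script around curl; this is a stand-in for the approach, not a tested tool. It assumes the server answers NLST with bare names or full paths, and it tells directories from files by attempting a CWD into each name. That is slow for a tree of this size, and it will miscount a directory you lack permission to enter as a file. The name count_remote is hypothetical.

```python
from ftplib import FTP, error_perm

def count_remote(ftp, directory):
    """Return (files, directories) found below directory, excluding directory itself."""
    ftp.cwd(directory)
    try:
        names = ftp.nlst()  # sends NLST for the current directory
    except error_perm:
        # Some servers answer 550 for an empty directory. Treating that as
        # empty would also hide a genuine permission problem.
        names = []
    files = dirs = 0
    for name in names:
        base = name.rsplit("/", 1)[-1]  # some servers return full paths
        if base in (".", ".."):
            continue
        child = directory.rstrip("/") + "/" + base
        try:
            ftp.cwd(child)  # succeeds only if child is a directory
        except error_perm:
            files += 1
            continue
        sub_files, sub_dirs = count_remote(ftp, child)
        files += sub_files
        dirs += 1 + sub_dirs
    return files, dirs

# Example (host, credentials, and path are placeholders):
# ftp = FTP("ftp.example.com")
# ftp.login("user", "password")
# print(count_remote(ftp, "/"))
```

Because it skips "." and ".." explicitly, the result is directly comparable with a local count of files and directories.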

You mentioned in another message that you had also done a find for files and directories and added them up, which matched your total. What were the specific counts? Files, directories, your total, and their expected total.
> 
> We’re trying to find out how the vendor did their count but I was 
> hoping someone already knows of some vagary on Amazon sites that would 
> cause this kind of discrepancy.

-- 
Michael H. Warfield (AI4NB) | (770) 978-7061 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!



