[ale] 117000 files vs 240 missing - amazon

Lightner, Jeff JLightner at water.com
Mon Nov 25 08:31:30 EST 2013


You apparently missed the part where I don't have a login other than ftp.

Don't teach your grandmother to suck eggs, Sonny.

From: ale-bounces at ale.org [mailto:ale-bounces at ale.org] On Behalf Of David Ritchie
Sent: Monday, November 25, 2013 12:51 AM
To: Atlanta Linux Enthusiasts
Subject: Re: [ale] 117000 files vs 240 missing - amazon

I suggest gzipping the file structure (or using a mutually agreeable archive format) and sending one
file...

On Fri, Nov 22, 2013 at 7:50 AM, Lightner, Jeff <JLightner at water.com> wrote:
Long directory structures are involved.  In fact, on our initial attempt we found that it didn't download everything because the default behavior of wget is to go down only 5 levels, so we restarted with 99 levels, the max it would allow.  I don't think any of them actually hit 99 levels, but we probably ought to verify that.
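
One way to verify the depth on the downloaded copy is sketched below. This is a minimal sketch, not part of the original thread: the function name max_depth and the idea of counting levels relative to the download root are assumptions, and how this number lines up with wget's own level counting has not been checked here. A result far below 99 would suggest the limit was never reached. (As far as I know, the wget manual also documents "-l inf" for unlimited depth.)

```python
import os


def max_depth(root):
    """Return the deepest directory level below root (root itself is level 0)."""
    root = os.path.abspath(root)
    # Number of separators in the root path, used as the zero point.
    base = root.rstrip(os.sep).count(os.sep)
    deepest = 0
    for dirpath, _dirnames, _filenames in os.walk(root):
        depth = dirpath.rstrip(os.sep).count(os.sep) - base
        deepest = max(deepest, depth)
    return deepest
```

Calling max_depth on the top directory that wget created gives the number of directory levels below it.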

The find was a straightforward find with no flags initially.  Later a find with -type f was done, then another, more complicated one just to show directories.  Adding those together gave the same total as the initial find and the wget summary.
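
The same breakdown can be sketched in Python, which makes it explicit what each number includes. This is only an illustration under assumptions: the function name count_entries and the "other" bucket (symlinks, sockets and so on) are mine, not something from the thread. Note that a bare find prints the starting directory and every subdirectory as well as the files, so its total is files plus directories plus anything else.

```python
import os


def count_entries(root):
    """Count regular files, directories and other entries under root."""
    # The starting directory counts as one directory, as a bare find prints it too.
    counts = {"files": 0, "dirs": 1, "other": 0}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                counts["other"] += 1   # symlinks are neither -type f nor -type d
            elif os.path.isdir(path):
                counts["dirs"] += 1
            elif os.path.isfile(path):
                counts["files"] += 1
            else:
                counts["other"] += 1   # sockets, FIFOs, devices
    counts["total"] = counts["files"] + counts["dirs"] + counts["other"]
    return counts
```

If the vendor counted only files, or counted directories as well, comparing their figure against "files" and "total" separately may show which convention they used.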

We tried NLST (LIST is not available), but it doesn't recurse at the remote site.
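
Since the server does not recurse, the recursion can be done on the client side. The following is a rough sketch using Python's ftplib, assuming only NLST and CWD are available; the function name count_remote is hypothetical, and the trick of treating a name as a directory when CWD into it succeeds is an assumption about how this server behaves, not something confirmed in the thread. Servers differ in whether NLST returns bare names or full paths, so the code keeps only the last path component.

```python
import ftplib
import posixpath


def count_remote(ftp, top):
    """Count files and directories under top using only NLST and CWD.

    top should be an absolute path, because CWD changes the working directory.
    """
    files = dirs = 0
    pending = [top]
    while pending:
        current = pending.pop()
        try:
            names = ftp.nlst(current)
        except ftplib.error_perm:
            names = []  # some servers answer 550 for an empty directory
        for name in names:
            base = posixpath.basename(name.rstrip("/"))
            if base in ("", ".", ".."):
                continue
            path = posixpath.join(current, base)
            try:
                ftp.cwd(path)          # assumed to succeed only for directories
            except ftplib.error_perm:
                files += 1
            else:
                dirs += 1
                pending.append(path)
    return files, dirs
```

With a connection opened via ftplib.FTP(host) and ftp.login(user, password), count_remote(ftp, "/") returns the two counts. It issues one CWD per entry, so with over 117,000 entries it would be slow.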

From: ale-bounces at ale.org [mailto:ale-bounces at ale.org] On Behalf Of David Tomaschik
Sent: Thursday, November 21, 2013 8:43 PM
To: Atlanta Linux Enthusiasts
Subject: Re: [ale] 117000 files vs 240 missing - amazon

Is it all in one directory, or was there directory structure transferred?  What were the predicates to your find command?  (Thinking their count might've included directories or something.)

On Thu, Nov 21, 2013 at 1:59 PM, Lightner, Jeff <JLightner at water.com> wrote:
A vendor put a site on Amazon with some files we need.   We don't have sftp access to this Amazon site but do have ftp access.

Accordingly we did a wget to download all the files using our ftp credentials.  When it was all done we had over 117,000 files and saw no errors from wget.

The problem is that the vendor is telling our director there are 240 more files in their count than we downloaded.  That is a difference of only about 0.2%, so I suspect it has something to do with the way they count vs. the way we did.  (We used find piped to wc -l.)  Our count matches the summary wget printed when it finished, so we are sure we're correctly counting what wget downloaded, but of course it's possible wget actually missed something, though that seems unlikely to me.

The question is: does anyone know what might cause such a difference?  Alternatively, does anyone know another way we could count the files on the Amazon site using our ftp credentials, other than going in and counting them one by one?

We're trying to find out how the vendor did their count but I was hoping someone already knows of some vagary on Amazon sites that would cause this kind of discrepancy.
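
If the vendor can supply the list of paths behind their count, comparing names rather than totals would show exactly which entries differ. A minimal sketch, assuming both sides can be reduced to one relative path per line; the function name compare_listings and the idea of a vendor-supplied list are assumptions, not something the vendor has offered in this thread.

```python
def compare_listings(vendor_paths, local_paths):
    """Return (paths only the vendor has, paths only we have), both sorted."""
    # Strip whitespace and drop blank lines before comparing.
    vendor = {p.strip() for p in vendor_paths if p.strip()}
    local = {p.strip() for p in local_paths if p.strip()}
    return sorted(vendor - local), sorted(local - vendor)
```

Both arguments can be any iterable of strings, for example open file objects. Because sets are used, duplicate lines in either list collapse to one entry, which is itself one possible source of a count mismatch.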












_______________________________________________
Ale mailing list
Ale at ale.org
http://mail.ale.org/mailman/listinfo/ale
See JOBS, ANNOUNCE and SCHOOLS lists at
http://mail.ale.org/mailman/listinfo



--
David Tomaschik
OpenPGP: 0x5DEA789B
http://systemoverlord.com
david at systemoverlord.com



