[ale] 100 million Facebook pages leaked on torrent site
Jim Lynch
ale_nospam at fayettedigital.com
Sun Aug 1 12:47:09 EDT 2010
On 08/01/2010 11:15 AM, Michael B. Trausch wrote:
> On Fri, 2010-07-30 at 11:55 -0400, Jim Philips wrote:
>
>> I saw a report today that major corporations are already downloading
>> the file through BitTorrent. A free goldmine of information for them!
>>
> I have already downloaded it myself, just to take a look at what's
> actually in the whole thing.
>
> There is a *lot* of data, mostly names, but also URLs to profile pages
> for each of those names. It's about 17GB worth of data, enough to burn
> to a BD-R for storage. It's not indexed, just plain-text, along with
> counts for various names which could be used to determine popularity, as
> an example.
>
> I can see some of this data replacing 1930 Census data as a reference
> list of proper names, which would benefit businesses that rely on such
> data to help parse free-form documents.
>
> Here are the ten most listed first names (with frequency of occurrence):
>
> 977014 michael
> 963693 john
> 924816 david
> 819879 chris
> 640957 mike
> 602088 james
> 584438 mark
> 515686 jason
> 503658 robert
> 484403 jessica
>
> And the ten most listed last names (also with frequency of occurrence):
>
> 913465 smith
> 571819 johnson
> 512312 jones
> 503266 williams
> 471390 brown
> 386764 lee
> 360010 khan
> 355639 singh
> 343220 kumar
> 324972 miller
>
> I guess "Michael Smith" would be the most generic name possible if you
> look at those numbers. :-)
>
>
Hm, "Stranger in a Strange Land"
(http://en.wikipedia.org/wiki/Stranger_in_a_Strange_Land) comes to mind.