<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Skippy Records &#187; long tail</title>
	<atom:link href="http://skippyrecords.wordpress.com/tag/long-tail/feed/" rel="self" type="application/rss+xml" />
	<link>http://skippyrecords.wordpress.com</link>
	<description></description>
	<lastBuildDate>Tue, 17 Jan 2012 22:41:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='skippyrecords.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Skippy Records &#187; long tail</title>
		<link>http://skippyrecords.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://skippyrecords.wordpress.com/osd.xml" title="Skippy Records" />
	<atom:link rel='hub' href='http://skippyrecords.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Misunderstanding the long tail</title>
		<link>http://skippyrecords.wordpress.com/2009/12/01/misunderstanding-the-long-tail/</link>
		<comments>http://skippyrecords.wordpress.com/2009/12/01/misunderstanding-the-long-tail/#comments</comments>
		<pubDate>Tue, 01 Dec 2009 17:23:28 +0000</pubDate>
		<dc:creator>Dr. Skippy</dc:creator>
				<category><![CDATA[Everything Else]]></category>
		<category><![CDATA[Physics and Mathematics]]></category>
		<category><![CDATA[long tail]]></category>
		<category><![CDATA[mathematics]]></category>
		<category><![CDATA[the economist]]></category>

		<guid isPermaLink="false">http://blog.drskippy.com/?p=372</guid>
		<description><![CDATA[The Economist blows it!  (I don&#8217;t get to say that very often.) On page 79 of the Nov 28 &#8211; Dec 4 2009 issue, there is an article (&#8220;A world of hits&#8221;) that misrepresents and misunderstands the ideas behind Anderson&#8217;s Long Tail and proceeds for 2 more pages to build a story on the misunderstanding.  This is [...]]]></description>
			<content:encoded><![CDATA[<p>The Economist blows it!  (I don&#8217;t get to say that very often.) On page 79 of the Nov 28 &#8211; Dec 4 2009 issue, there is an article (&#8220;A world of hits&#8221;) that misrepresents and misunderstands the ideas behind Anderson&#8217;s <a title="The Long Tail" href="http://www.amazon.com/Long-Tail-Revised-Updated-Business/dp/B001PTG4BO/" target="_blank">Long Tail</a> and proceeds for 2 more pages to build a story on the misunderstanding.  This is not just useless reporting, writing and thinking; it is damaging, and a sad illustration of mathematical and analytic carelessness.</p>
<p>I am sure Anderson and others will jump on the many errors here.  I just posted this rant in the comments section of the <a title="A world of hits" href="http://www.economist.com/displayStory.cfm?story_id=14959982" target="_blank">online version of the article</a>&#8230;feel much better now <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>&lt;rant&gt;</p>
<p>The mistake seems to be not understanding that long tail statistics come from underlying dynamics. In particular, books and movies are small hits before they are big hits. The difference between the long tail and the big hits (possibly call it &#8220;tall head&#8221;) is how fast a book or movie becomes a hit and how big of a hit it becomes. It is difficult to propose a dynamic in which the bottom and middle of the distribution grow while the big hits don&#8217;t get bigger as well. Anderson never proposes such a fundamental shift in underlying dynamics.</p>
<div class="comment-body clearfix">
<p>There seems to be a complete disconnect between the statistics and the market dynamics producing the statistics in this article.</p>
<p>To say that the long tail ideas predict the demise of the hit is sloppy (or ignorant?). In fact, as technologies that enable access to the long tail become more efficient and capable, there may be an argument that the outcome will be that big hits get bigger. The tall head may grow taller since all tall head items start out with no following at all and grow from long tail-dom&#8211;the place where broad cheap access is changing our access most.</p>
<p>Hits will always be big (seems like I wouldn&#8217;t need to write a sentence that dumb on purpose!) and big hits will always make big money. The point of the long tail is that there are now opportunities to make significant money on less popular items (e.g. books and movies in the long tail.)</p></div>
<p>&lt;/rant&gt;</p>
<br /> Tagged: long tail, mathematics, the economist]]></content:encoded>
			<wfw:commentRss>http://skippyrecords.wordpress.com/2009/12/01/misunderstanding-the-long-tail/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/fd95bd67cd406fcb27a627a44570f2a2?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">drskippy27</media:title>
		</media:content>
	</item>
		<item>
		<title>Day-to-Day Tall Head of URL Exploration</title>
		<link>http://skippyrecords.wordpress.com/2008/09/03/day-to-day-tall-head-of-url-exploration/</link>
		<comments>http://skippyrecords.wordpress.com/2008/09/03/day-to-day-tall-head-of-url-exploration/#comments</comments>
		<pubDate>Thu, 04 Sep 2008 00:53:38 +0000</pubDate>
		<dc:creator>Dr. Skippy</dc:creator>
				<category><![CDATA[Networks and Webs]]></category>
		<category><![CDATA[long tail]]></category>

		<guid isPermaLink="false">http://h180745wp.setupmyblog.com/?p=95</guid>
		<description><![CDATA[This is the 4th post on the statistics of URL exploration. In the previous three (The Long Tail of URL Exploration, What does the Nth Explorer of the Web Find? and The Tall Head of URL Exploration) I looked at how adding users grows the long tail and tall head of URLs for a single [...]]]></description>
			<content:encoded><![CDATA[<p>This is the 4th post on the statistics of URL exploration. In the previous three (<a href="http://h180745wp.setupmyblog.com/2008/08/27/the-long-tail-of-url-exploration/" title="Long Tail URLs">The Long Tail of URL Exploration</a>, <a href="http://h180745wp.setupmyblog.com/2008/08/27/what-does-the-nth-explorer-of-the-web-find/" title="The Nth Explorer">What does the Nth Explorer of the Web Find?</a> and <a href="http://h180745wp.setupmyblog.com/2008/08/29/the-tall-head-of-url-exploration/" title="Tall head of URLs">The Tall Head of URL Exploration</a>) I looked at how adding users grows the long tail and tall head of URLs for a single day.&nbsp; Today, the data covers 20 days with a relatively constant population.</p>
<p>To get some idea how the tall head evolves, compare the tall head on day 0 with 19 successive days. The plot below shows the Top 10, Top 50, Top 100, Top 500 and Top 1000 URLs for day zero and the fraction of the top URLs on day zero appearing in the tall head on the nth day.</p>
<p>&nbsp;</p>
<div style="text-align:center;"><img height="331" width="331" border="0" src="http://drskippy.net/img/tallheaddays_2008-09-03.png" alt="Tall Head URLs by day" title="Tall Head URLs by day" /></div>
<div align="center">Figure 1.&nbsp; Red: Top 10 URLs; Blue: Top 50 URLs; <br />Green: Top 100 URLs; Cyan: Top 500 URLs; <br />Yellow: Top 1000 URLs. (URLs ranked by visits.)</div>
<div align="center">&nbsp;</div>
<p>While the Top 10 and Top 50 URLs show stability day after day, the Top 500 and Top 1000 roll over at a fairly constant rate after day one.&nbsp; The plot can be used to estimate the size of the persistent tall head of URLs for this population and the rate at which the tall head evolves.</p>
<p>First, look for a change in behavior from maintaining a constant fraction of the day-0 URLs to a steady decline from day to day. By this heuristic, estimate the persistent tall head to be between 50 and 100 URLs.</p>
<p>Second, to estimate the turnover of the tall head, choose the approximate desired tall head, e.g., the Top 500 URLs (cyan), and look at the slope of the line for days 1-19. (Alternatively, choose the fraction of the tall head that should remain, say 75%, and read off the corresponding timescale, approximately 15 days.)</p>
<div style="text-align:center;"><img height="331" width="331" border="0" src="http://drskippy.net/img/tallheaddaysfit_2008-09-03.png" alt="Tall Head Top 500 Fit" title="Tall Head Top 500 Fit" /></div>
<div style="text-align:center;">Figure 2.&nbsp; Red: Top 500 URLs; Blue: Top 1000 URLs, shown<br />for comparison; Green: fit to Top 500 URLs. (Days 1-20, <br />URLs ranked by visits.)</div>
<p>The plot above shows that the Top 500 URLs roll over at about 0.5% per day from days 1 to 20.</p>
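<p>A minimal sketch of the measurement behind Figures 1 and 2, assuming each day's traffic is available as a mapping from URL to visit count. The function names and the toy data are hypothetical, and the ordinary least-squares line is an assumption about how the fit was produced, not a description of the original analysis.</p>
<pre>
```python
from collections import Counter

def top_n(visits, n):
    """Return the set of the n most-visited URLs in a {url: count} mapping."""
    return {url for url, _ in Counter(visits).most_common(n)}

def fraction_remaining(day0, later_days, n):
    """Fraction of the day-0 Top n still in the Top n on each later day."""
    head0 = top_n(day0, n)
    return [len(head0.intersection(top_n(day, n))) / len(head0) for day in later_days]

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Toy data (hypothetical): four URLs, Top 2 tracked over two later days.
day0 = {"a.com": 50, "b.com": 40, "c.com": 5, "d.com": 1}
day1 = {"a.com": 45, "b.com": 30, "c.com": 9, "d.com": 2}
day2 = {"a.com": 48, "c.com": 33, "b.com": 7, "d.com": 3}
print(fraction_remaining(day0, [day1, day2], 2))  # [1.0, 0.5]

# Made-up fractions that fall by 0.005 (0.5 percentage points) per day.
days = list(range(1, 20))
fractions = [0.85 - 0.005 * d for d in days]
slope, intercept = fit_line(days, fractions)
print(round(slope, 4), round(intercept, 4))
```
</pre>
<p>The slope returned by <code>fit_line</code> is the daily rollover rate; the starting value of 0.85 is invented for the example and is not read from the plot.</p>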
]]></content:encoded>
			<wfw:commentRss>http://skippyrecords.wordpress.com/2008/09/03/day-to-day-tall-head-of-url-exploration/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/fd95bd67cd406fcb27a627a44570f2a2?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">drskippy27</media:title>
		</media:content>

		<media:content url="http://drskippy.net/img/tallheaddays_2008-09-03.png" medium="image">
			<media:title type="html">Tall Head URLs by day</media:title>
		</media:content>

		<media:content url="http://drskippy.net/img/tallheaddaysfit_2008-09-03.png" medium="image">
			<media:title type="html">Tall Head Top 500 Fit</media:title>
		</media:content>
	</item>
		<item>
		<title>The Tall Head of URL Exploration</title>
		<link>http://skippyrecords.wordpress.com/2008/08/29/the-tall-head-of-url-exploration/</link>
		<comments>http://skippyrecords.wordpress.com/2008/08/29/the-tall-head-of-url-exploration/#comments</comments>
		<pubDate>Fri, 29 Aug 2008 16:02:24 +0000</pubDate>
		<dc:creator>Dr. Skippy</dc:creator>
				<category><![CDATA[Networks and Webs]]></category>
		<category><![CDATA[long tail]]></category>

		<guid isPermaLink="false">http://h180745wp.setupmyblog.com/?p=93</guid>
		<description><![CDATA[In The Long Tail of URL Exploration, I looked at the distribution of URL visits by 102K people in a day.&#160; In What does the Nth Explorer of the Web Find?, I looked at how adding users grows the long tail and the number of unique URLs explored.&#160; In this 3rd (of 4) post, I [...]]]></description>
			<content:encoded><![CDATA[<p>In <a title="long tail of url exploration" href="http://h180745wp.setupmyblog.com/2008/08/27/the-long-tail-of-url-exploration/">The Long Tail of URL Exploration</a>, I looked at the distribution of URL visits by 102K people in a day.&nbsp; In <a title="what does the nth explorer find" href="http://h180745wp.setupmyblog.com/2008/08/27/what-does-the-nth-explorer-of-the-web-find/">What does the Nth Explorer of the Web Find?</a>, I looked at how adding users grows the long tail and the number of unique URLs explored.&nbsp; In this 3rd (of 4) post, I look at how the tall head changes as we add explorers.</p>
<p>The tall head is the short list of sites that get the most visits.&nbsp; We could call it the Top 10, Top 100 or whatever we think is relevant.&nbsp; Later I propose a couple of heuristics for determining tall head membership in real time.</p>
<p>The tall head is made up of URLs that much of the population visits.&nbsp; These are the &quot;winner take all&quot; URLs of user attention&#8211;URLs like cnn.com, google.com, or facebook.com.&nbsp; For this reason, we might expect that while the long tail is growing with more unique URLs and the number of URLs with 2-3 hits is growing rapidly, the tall head is relatively stable.</p>
<p>This is the case.</p>
<p>One way to think about how stable the tall head might be is to ask how well a subset of the population predicts membership in the tall head for the entire sample.&nbsp; The data from the last two posts is well-suited to look at this question.</p>
<p>Below is a plot of the accuracy of various subsets of the population (the same subsets we used previously) in predicting the Top 10, Top 20, Top 50 and Top 100 URLs of the entire population.&nbsp; Just over 40% of the population predicts the Top 100 URLs with 90% accuracy.&nbsp; The Top 10 are predicted to 90% accuracy by 10% of the population.</p>
<div style="text-align:center;"><img border="0" title="predicting the tall head" alt="predicting the tall head" src="http://drskippy.net/img/predtallhead_20080829" /></div>
<div align="center">Figure.&nbsp; Red: Top 10 URLs by visits; Blue: Top 20 </div>
<div align="center">URLs by visits; Green: Top 50 URLs by visits; Cyan: Top </div>
<div align="center">100 URLs by visits.</div>
<p>The composition of the tall head depends relatively weakly on the subset of the population doing the predicting.</p>
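<p>A rough sketch of how this kind of accuracy could be computed, assuming the click stream is a list of (user, URL) pairs. The function names, the seed, and the toy data are hypothetical and stand in for the real data set.</p>
<pre>
```python
import random
from collections import Counter

def top_urls(visits, n):
    """Set of the n most-visited URLs in a list of (user, url) pairs."""
    counts = Counter(url for _, url in visits)
    return {url for url, _ in counts.most_common(n)}

def subset_accuracy(visits, fraction, n, seed=0):
    """Share of the full-population Top n that a random user subset also puts in its Top n."""
    users = sorted({user for user, _ in visits})
    rng = random.Random(seed)
    chosen = set(rng.sample(users, max(1, round(fraction * len(users)))))
    sample = [(user, url) for user, url in visits if user in chosen]
    full_head = top_urls(visits, n)
    return len(full_head.intersection(top_urls(sample, n))) / len(full_head)

# Toy click stream (hypothetical): every user visits two popular sites plus one private page.
visits = []
for user in range(200):
    visits += [(user, "popular-1.example"), (user, "popular-2.example"),
               (user, f"private-{user}.example")]

print(subset_accuracy(visits, 0.10, 2))  # 1.0
```
</pre>
<p>In this toy case 10% of the users recover the Top 2 exactly because the popular sites dominate every subset; real data would give accuracies below 1 for deeper tall heads.</p>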
<p>How can I predict the tall head for the day by 9 am? This is the real-time problem of long tail distributions.&nbsp; The dynamics of the system are that real-time Web exploration data appears as a time-ordered list of URLs from whatever users happen to be surfing.&nbsp; This means that a real-time heuristic for determining top URLs for the day has to rely on the properties of the time series, including a small surfer sample size and recent counts of visits.</p>
<p>Fortunately, by the results illustrated above, a small sample size is a pretty good bet for determining tall head URLs. What we are still missing are metrics or intuition for how the long tail distribution evolves over time.</p>
<p>We do know that for a URL to end up in the tall head, it must be visited by many Web explorers.&nbsp; This means that we can rule out all URLs that are visited by only one or two users.&nbsp; This assumption also leads to a heuristic based on the time between visits&#8211;URLs visited by many people should have the same visit/time distribution as the users/time distribution of the entire sample.&nbsp; More specifically, we might guess that if the time between visits has an average near 1 day/number of visits and relatively low variance, it is likely to be in the tall head.&nbsp; A project for a little later&#8230;</p>
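<p>One possible shape for that heuristic, as a sketch only: it assumes a full day of visit timestamps (seconds since midnight) per URL, uses the visit count as a stand-in for the number of distinct users, and the two thresholds are illustrative values that do not come from any measurement.</p>
<pre>
```python
from statistics import mean, pstdev

DAY_SECONDS = 24 * 60 * 60

def gap_stats(timestamps):
    """Mean and standard deviation of the gaps between successive visits to one URL."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return mean(gaps), pstdev(gaps)

def looks_like_tall_head(timestamps, tolerance=0.5, max_spread=1.0):
    """Mean gap near (1 day) / (number of visits), with relatively low spread.

    tolerance and max_spread are illustrative thresholds (assumptions).
    """
    if len(timestamps) <= 2:
        return False  # one or two visits cannot come from many users
    avg_gap, spread = gap_stats(timestamps)
    expected_gap = DAY_SECONDS / len(timestamps)
    near_expected = abs(avg_gap - expected_gap) <= tolerance * expected_gap
    steady = spread <= max_spread * expected_gap
    return near_expected and steady

# Hypothetical timestamps in seconds since midnight.
steady_url = list(range(0, DAY_SECONDS, 600))   # one visit every 10 minutes, all day
burst_url = [3600 + i for i in range(144)]      # 144 visits packed into a few minutes
print(looks_like_tall_head(steady_url), looks_like_tall_head(burst_url))  # True False
```
</pre>
<p>Both toy URLs have the same number of visits; only the one whose visits are spread across the day passes the test.</p>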
<p>Next post: How does the composition of the tall head change from day to day?</p>
]]></content:encoded>
			<wfw:commentRss>http://skippyrecords.wordpress.com/2008/08/29/the-tall-head-of-url-exploration/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/fd95bd67cd406fcb27a627a44570f2a2?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">drskippy27</media:title>
		</media:content>

		<media:content url="http://drskippy.net/img/predtallhead_20080829" medium="image">
			<media:title type="html">predicting the tall head</media:title>
		</media:content>
	</item>
		<item>
		<title>What does the Nth explorer of the Web find?</title>
		<link>http://skippyrecords.wordpress.com/2008/08/27/what-does-the-nth-explorer-of-the-web-find/</link>
		<comments>http://skippyrecords.wordpress.com/2008/08/27/what-does-the-nth-explorer-of-the-web-find/#comments</comments>
		<pubDate>Wed, 27 Aug 2008 20:17:59 +0000</pubDate>
		<dc:creator>Dr. Skippy</dc:creator>
				<category><![CDATA[Networks and Webs]]></category>
		<category><![CDATA[long tail]]></category>

		<guid isPermaLink="false">http://h180745wp.setupmyblog.com/?p=91</guid>
		<description><![CDATA[In The Long Tail of URL Exploration, I looked at the distribution of URLs and visits. This was on the way to trying to answer questions like: How much overlap is there between the URLs 10 people visit and those in the 11th person&#8217;s click stream? How about the 100th or 100,000th person? Does the [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://h180745wp.setupmyblog.com/2008/08/27/the-long-tail-of-url-exploration/" title="The Long Tail of URL Exploration">The Long Tail of URL Exploration</a>, I looked at the distribution of URLs and visits. This was on the way to trying to answer questions like:</p>
<ul>
<li>How much overlap is there between the URLs 10 people visit and those in the 11th person&#8217;s click stream? </li>
<li>How about the 100th or 100,000th person? Does the millionth user explore any unique URLs at all? </li>
<li>Can we build a model to answer &quot;How many people are required to crawl 10% of the Web?&quot;</li>
</ul>
<p>The second part of the answer is to look at how the model of URLs and visits evolves as we add users.&nbsp; To get samples of different sizes using the same click stream data set, randomly select a subset of the users and run the analysis from the previous post.&nbsp; Throw everyone back into the pot and randomly select a slightly larger set.&nbsp; Repeat.</p>
<p>I reran the model for 3%, 4%, 5%, 7%, 9%, 11%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90% of the overall users in the 1-day data set. The sample sizes ranged from 3,000 to 91,200 users. For the entire data set, the average user made 184 URL visits during the day. In the randomly chosen subsets, users made an average of between 181 and 187 URL visits, with most of the variation in the smaller samples, as expected.</p>
<p>Do I expect the number of unique URLs to be linearly proportional to the number of users? Or, if users are visiting many of the same URLs and URLs tend to have the &quot;winner take all&quot; properties we looked at before, we might expect the number of unique URLs added by the 90,000th user to be fewer than the number of unique URLs added by the 1,000th user.</p>
<p>I first plotted the number of unique URLs against the number of users in each sample.&nbsp; The curve looks straight but may be slightly concave downward.&nbsp; The effect is very subtle.&nbsp; I needed to look at the data in a way that amplified the change over the various subsamples.</p>
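<p>The resampling loop can be sketched roughly as follows, assuming the click stream is a list of (user, URL) pairs. The names, the seed, and the toy data are hypothetical; each call draws a fresh subset from the full set of users, which is the "back into the pot" step.</p>
<pre>
```python
import random

def unique_urls_per_user(visits, fraction, seed=0):
    """Draw a random subset of users; return (sample size, unique URLs per user)."""
    by_user = {}
    for user, url in visits:
        by_user.setdefault(user, set()).add(url)
    users = sorted(by_user)
    rng = random.Random(seed)
    chosen = rng.sample(users, max(1, round(fraction * len(users))))
    unique = set().union(*(by_user[user] for user in chosen))
    return len(chosen), len(unique) / len(chosen)

# Toy click stream (hypothetical): everyone visits one shared site and one private page.
visits = []
for user in range(1000):
    visits += [(user, "shared.example"), (user, f"page-{user}.example")]

for fraction in (0.03, 0.30, 0.90):
    size, per_user = unique_urls_per_user(visits, fraction)
    print(size, round(per_user, 3))
```
</pre>
<p>Even in this tiny example the unique URLs per user fall slightly as the sample grows, because the shared site is only counted once.</p>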
<p>Below is a plot of the number of unique URLs/user vs. the number of users. This line is flat if the number of URLs is growing linearly with the number of users.</p>
<div style="text-align:center;"><img height="331" width="331" border="0" src="http://drskippy.net/img/uniqperuser_20080827.png" alt="URLs per user" title="URLs per user" /></div>
<p>The blue curve is the best fit to another power function ( f(x)=ax^k ). The first few thousand users contribute more original URLs (&gt;90 URLs per user) to the sample than the 100,000th (83 URLs).&nbsp; If you are the first explorer of a new world, all of your discoveries are original; when you are a latecomer, your contributions are around the margins. It may be surprising how much original content is being explored by the 100,000th explorer.</p>
<p>Does the long tail get relatively longer or shorter?&nbsp; For simplicity, I use the URLs with only one visit to represent the long tail.&nbsp; The ratio of 1-visit URLs to unique URLs decreases subtly. For the smallest sample size, 70.0% of the unique URLs are hit only once; for the overall data set, the ratio is 69.2%. To amplify this change as above, the plot of 1-visit URLs per user is shown below.</p>
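<p>For readers who want to try a fit of this form, here is a minimal sketch that estimates a and k by least squares on the logarithms of the data. Fitting in log-log space is an assumption (the post does not say how the curve was fitted), and the sample points are generated from a made-up power function, not taken from the data set.</p>
<pre>
```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**k by least squares on (log x, log y); returns (a, k)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    count = len(xs)
    mean_x = sum(log_x) / count
    mean_y = sum(log_y) / count
    k = (sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
         / sum((lx - mean_x) ** 2 for lx in log_x))
    a = math.exp(mean_y - k * mean_x)
    return a, k

# Hypothetical points generated from a known power function, not data from the post.
users = [3000, 10000, 30000, 100000]
urls_per_user = [120.0 * n ** -0.03 for n in users]
a, k = fit_power_law(users, urls_per_user)
print(round(a, 2), round(k, 3))  # 120.0 -0.03
```
</pre>
<p>A small negative k corresponds to a curve that declines slowly as users are added.</p>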
<div style="text-align:center;"><img height="331" width="331" border="0" src="http://drskippy.net/img/longtailperuser_20080827.png" alt="long tail per user" title="long tail per user" /></div>
<p>At 100,000 users, the long tail is growing at 57 URLs per additional user.&nbsp; The decrease with each additional user is slowing.&nbsp; The blue curve is the best fit to another power function.</p>
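<p>The two long tail measures used here (the share of unique URLs with exactly one visit, and 1-visit URLs per user) can be computed along these lines. This is a sketch with hypothetical names and toy data, assuming the click stream is a list of (user, URL) pairs.</p>
<pre>
```python
from collections import Counter

def long_tail_stats(visits, user_count):
    """Return (share of unique URLs with exactly one visit, 1-visit URLs per user)."""
    counts = Counter(url for _, url in visits)
    one_visit = sum(1 for hits in counts.values() if hits == 1)
    return one_visit / len(counts), one_visit / user_count

# Toy click stream (hypothetical): 10 users, 3 shared URLs, 7 URLs seen exactly once.
visits = [(user, f"shared-{user % 3}.example") for user in range(10)]
visits += [(user, f"once-{user}.example") for user in range(7)]
share, per_user = long_tail_stats(visits, user_count=10)
print(share, per_user)  # 0.7 0.7
```
</pre>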
<p>If the power function is the best explanation of the underlying dynamics, the number of unique URLs and the long tail both continue to grow no matter how many people are exploring.&nbsp; Since an increasing number of people need to explore to keep the exploration rate constant, the cost of exploration per URL goes up as explorers are added.</p>
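<p>One way to make this explicit, under the assumption that the per-user fit f(x)=ax^k continues to hold beyond the sampled range, with k between -1 and 0 and x the number of users. Here U(x) is a label introduced for the total number of unique URLs.</p>
<pre>
```latex
U(x) = x \, f(x) = a \, x^{k+1}, \qquad
\frac{dU}{dx} = a \, (k+1) \, x^{k}, \qquad
\frac{x}{U(x)} = \frac{1}{a} \, x^{-k}
```
</pre>
<p>With k in that range, U(x) keeps growing without bound, the number of new URLs contributed by each additional user shrinks toward zero, and the number of explorers per unique URL grows as x^(-k), which is the rising cost per URL described above.</p>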
<p>Does anything interesting happen in the tall head where the big winners are? That will have to wait for another post.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://skippyrecords.wordpress.com/2008/08/27/what-does-the-nth-explorer-of-the-web-find/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/fd95bd67cd406fcb27a627a44570f2a2?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">drskippy27</media:title>
		</media:content>

		<media:content url="http://drskippy.net/img/uniqperuser_20080827.png" medium="image">
			<media:title type="html">URLs per user</media:title>
		</media:content>

		<media:content url="http://drskippy.net/img/longtailperuser_20080827.png" medium="image">
			<media:title type="html">long tail per user</media:title>
		</media:content>
	</item>
		<item>
		<title>Is this long tail distribution a power law?</title>
		<link>http://skippyrecords.wordpress.com/2007/10/11/is-this-long-tail-distribution-a-power-law/</link>
		<comments>http://skippyrecords.wordpress.com/2007/10/11/is-this-long-tail-distribution-a-power-law/#comments</comments>
		<pubDate>Thu, 11 Oct 2007 15:45:07 +0000</pubDate>
		<dc:creator>Dr. Skippy</dc:creator>
				<category><![CDATA[Networks and Webs]]></category>
		<category><![CDATA[long tail]]></category>

		<guid isPermaLink="false">http://h180745wp.setupmyblog.com/?p=53</guid>
		<description><![CDATA[The discussion of power law vs long tail came up on Chris Anderson&#8217;s Long Tail blog a couple of days ago.&#160; &#34;Power law or not?&#34; vs &#34;Long-Tail or not?&#34; are separate questions. If I understand Chris&#8217;s thesis, Long Tail is the idea that there is a significant population in the &#34;not-hit&#34; part of the distribution, usually [...]]]></description>
			<content:encoded><![CDATA[<div class="comment-content">The discussion of power law vs long tail came up on <a href="http://www.thelongtail.com/the_long_tail/2007/10/answer-facebook.html?cid=86055766#comment-86055766" target="_blank" title="Facebook apps are *not* a Long Tail">Chris Anderson&#8217;s <em>Long Tail</em> blog</a> a couple of days ago.&nbsp;</div>
<div class="comment-content">
<p>&quot;Power law or not?&quot; vs &quot;Long-Tail or not?&quot; are separate questions. If I understand Chris&#8217;s thesis, Long Tail is the idea that there is a significant population in the &quot;not-hit&quot; part of the distribution, usually of low volume in any rank, but continuing out to very high ranks.</p>
<p>The idea that there is a region where the distribution is essentially &quot;scale free&quot; seems like the key concept. If we start there, interesting questions include: Can we characterize this region with a power law? And (my favorite), what are the dynamics of the system where scale matters? This last question is at the core of the economics of long tail businesses in general. For example, how inexpensive we make &quot;find&quot; and &quot;acquire&quot; activities corresponds to the &quot;knee&quot; in the distribution.</p>
<p>Scale-free is a misleading mathematical idea in that nothing in nature is actually scale free for all domains. For example, an absurdity of assuming scale-free in every domain with respect to music or movie hits is that anything created has at least 1 fan (i.e. we don&#8217;t have an arbitrarily small hit)&#8211;this introduces scale and, consequently, the region of arbitrarily small hits with fewer than 1 fan can&#8217;t be modeled by a power law. That&#8217;s a toy case, but it illustrates how much scale matters.</p>
<p>Instead of including all the points in the power law fit, maybe we can look at the points up to the knee in the power law model (it looks like x~3) and then try to understand what interesting dynamics shape the knee for x&gt;3, with the assumption that some scaling has been introduced by cost, potential audience size limits, or whatever&#8230; </p>
</div>
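<p>A sketch of that suggestion, with everything in it hypothetical: the data are invented, the knee position is passed in by hand, and the fit is a plain least-squares line in log-log space. It fits the power law only to the points at or below the knee, then reports how far the remaining points fall below the extrapolated curve.</p>
<pre>
```python
import math

def fit_below_knee(ranks, volumes, knee):
    """Fit volume = a * rank**k using only the points with rank at or below the knee."""
    points = [(math.log(r), math.log(v)) for r, v in zip(ranks, volumes) if r <= knee]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    k = (sum((x - mean_x) * (y - mean_y) for x, y in points)
         / sum((x - mean_x) ** 2 for x, _ in points))
    return math.exp(mean_y - k * mean_x), k

def shortfall_beyond_knee(ranks, volumes, knee, a, k):
    """Ratio of observed volume to the power law prediction for points past the knee."""
    return [(r, v / (a * r ** k)) for r, v in zip(ranks, volumes) if r > knee]

# Invented data: a clean power law up to rank 1000, then values that bend below it.
ranks = [1, 10, 100, 1000, 10000, 100000]
volumes = [1000.0 * r ** -0.8 for r in ranks[:4]] + [0.3, 0.01]
a, k = fit_below_knee(ranks, volumes, knee=1000)
print(round(a, 1), round(k, 2))  # 1000.0 -0.8
print(shortfall_beyond_knee(ranks, volumes, 1000, a, k))
```
</pre>
<p>Ratios well below 1 past the knee are where some scale (cost, audience size limits, or whatever) has entered the system.</p>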
]]></content:encoded>
			<wfw:commentRss>http://skippyrecords.wordpress.com/2007/10/11/is-this-long-tail-distribution-a-power-law/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/fd95bd67cd406fcb27a627a44570f2a2?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">drskippy27</media:title>
		</media:content>
	</item>
	</channel>
</rss>
