<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Microsoft Sucks Bandwidth</title>
	<atom:link href="http://warpedvisions.org/2005/08/26/microsoft-sucks-bandwidth/feed/" rel="self" type="application/rss+xml" />
	<link>http://warpedvisions.org/2005/08/26/microsoft-sucks-bandwidth/</link>
	<description>A technical tumblelog of links and articles on programming, design, and other geek interests</description>
	<lastBuildDate>Wed, 02 Jun 2010 20:28:53 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Matt Cutts: Gadgets, Google, and SEO &#187; Crawl caching proxy</title>
		<link>http://warpedvisions.org/2005/08/26/microsoft-sucks-bandwidth/comment-page-1/#comment-1329</link>
		<dc:creator>Matt Cutts: Gadgets, Google, and SEO &#187; Crawl caching proxy</dc:creator>
		<pubDate>Sun, 23 Apr 2006 21:07:38 +0000</pubDate>
		<guid isPermaLink="false">http://warpedvisions.org/2005/08/26/microsoft-sucks-bandwidth/#comment-1329</guid>
		<description>&lt;p&gt;[...] As part of the Bigdaddy infrastructure switchover, Google has been working on frameworks for smarter crawling, improved canonicalization, and better indexing. On the smarter crawling front, one of the things we&#8217;ve been working on is bandwidth reduction. For example, the pre-Bigdaddy webcrawl Googlebot with user-agent &#8220;Googlebot/2.1 (+http://www.google.com/bot.html)&#8221; would sometimes allow gzipped encoding. The newer Bigdaddy Googlebots with user-agent &#8220;Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)&#8221; are much more likely to support gzip encoding. That reduces Googlebot&#8217;s bandwidth usage for site owners and webmasters. From my conversations with the crawl/index team, it sounds like there&#8217;s a lot of head-room for webmasters to reduce their bandwidth by turning on gzip encoding. [...]&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>[&#8230;] As part of the Bigdaddy infrastructure switchover, Google has been working on frameworks for smarter crawling, improved canonicalization, and better indexing. On the smarter crawling front, one of the things we&#8217;ve been working on is bandwidth reduction. For example, the pre-Bigdaddy webcrawl Googlebot with user-agent &#8220;Googlebot/2.1 (+http://www.google.com/bot.html)&#8221; would sometimes allow gzipped encoding. The newer Bigdaddy Googlebots with user-agent &#8220;Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)&#8221; are much more likely to support gzip encoding. That reduces Googlebot&#8217;s bandwidth usage for site owners and webmasters. From my conversations with the crawl/index team, it sounds like there&#8217;s a lot of head-room for webmasters to reduce their bandwidth by turning on gzip encoding. [&#8230;]</p>]]></content:encoded>
	</item>
	<item>
		<title>By: ChipCuccio.US</title>
		<link>http://warpedvisions.org/2005/08/26/microsoft-sucks-bandwidth/comment-page-1/#comment-327</link>
		<dc:creator>ChipCuccio.US</dc:creator>
		<pubDate>Wed, 31 Aug 2005 23:27:34 +0000</pubDate>
		<guid isPermaLink="false">http://warpedvisions.org/2005/08/26/microsoft-sucks-bandwidth/#comment-327</guid>
		<description>&lt;p&gt;&lt;strong&gt;No more MSNbot&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Today I added the following exclusion rule to my robots.txt:&lt;/p&gt;

&lt;pre class=&#039;prettyprint&#039;&gt;&lt;code&gt;User-agent: msnbot
Disallow: /
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Why? Because MSN&#8217;s spiders request too many resources, too frequently. MSN spiders don&#8217;t play nice, and I&#8217;ve had my eye on them...&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p><strong>No more MSNbot</strong></p>

<p>Today I added the following exclusion rule to my robots.txt:</p>

<pre class='prettyprint'><code>User-agent: msnbot
Disallow: /
</code></pre>

<p>Why? Because MSN&#8217;s spiders request too many resources, too frequently. MSN spiders don&#8217;t play nice, and I&#8217;ve had my eye on them...</p>]]></content:encoded>
	</item>
	<item>
		<title>By: mx</title>
		<link>http://warpedvisions.org/2005/08/26/microsoft-sucks-bandwidth/comment-page-1/#comment-326</link>
		<dc:creator>mx</dc:creator>
		<pubDate>Fri, 26 Aug 2005 17:25:38 +0000</pubDate>
		<guid isPermaLink="false">http://warpedvisions.org/2005/08/26/microsoft-sucks-bandwidth/#comment-326</guid>
		<description>&lt;p&gt;It also looks like MSNBot isn&#039;t caching things very well. If it&#039;s re-grabbing all the pages on a daily basis, then it&#039;s not looking at the Last-Modified header. It also looks like the bot is re-reading resources (images) that are on each page. Looks very much like a beta version, or maybe they just don&#039;t care.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>It also looks like MSNBot isn&#8217;t caching things very well. If it&#8217;s re-grabbing all the pages on a daily basis, then it&#8217;s not looking at the Last-Modified header. It also looks like the bot is re-reading resources (images) that are on each page. Looks very much like a beta version, or maybe they just don&#8217;t care.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Steven Fisher</title>
		<link>http://warpedvisions.org/2005/08/26/microsoft-sucks-bandwidth/comment-page-1/#comment-325</link>
		<dc:creator>Steven Fisher</dc:creator>
		<pubDate>Fri, 26 Aug 2005 17:00:30 +0000</pubDate>
		<guid isPermaLink="false">http://warpedvisions.org/2005/08/26/microsoft-sucks-bandwidth/#comment-325</guid>
		<description>&lt;p&gt;Well, you know me well enough to know I&#039;m not a Microsoft apologist. Still, I&#039;d like to believe that they could get something like a spider correct.&lt;/p&gt;

&lt;p&gt;The evidence seems to be against it, though. Thanks for the link. Just to be safe, I&#039;ve added a line to robots.txt excluding msnbot... hopefully their spider works at least that well by now.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Well, you know me well enough to know I&#8217;m not a Microsoft apologist. Still, I&#8217;d like to believe that they could get something like a spider correct.</p>

<p>The evidence seems to be against it, though. Thanks for the link. Just to be safe, I&#8217;ve added a line to robots.txt excluding msnbot&#8230; hopefully their spider works at least that well by now.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: mx</title>
		<link>http://warpedvisions.org/2005/08/26/microsoft-sucks-bandwidth/comment-page-1/#comment-324</link>
		<dc:creator>mx</dc:creator>
		<pubDate>Fri, 26 Aug 2005 16:47:49 +0000</pubDate>
		<guid isPermaLink="false">http://warpedvisions.org/2005/08/26/microsoft-sucks-bandwidth/#comment-324</guid>
		<description>&lt;p&gt;I can find quite a few complaints through Google too: [google: MSNBot hits greedy].&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I can find quite a few complaints through Google too: <a rel="tag" target="_new" href="http://google.com/search?q=msnbot+hits+greedy&amp;">MSNBot hits greedy</a>.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: mx</title>
		<link>http://warpedvisions.org/2005/08/26/microsoft-sucks-bandwidth/comment-page-1/#comment-323</link>
		<dc:creator>mx</dc:creator>
		<pubDate>Fri, 26 Aug 2005 16:46:22 +0000</pubDate>
		<guid isPermaLink="false">http://warpedvisions.org/2005/08/26/microsoft-sucks-bandwidth/#comment-323</guid>
		<description>&lt;p&gt;From what I can see the hits are all from a single group of IPs that reverse-map to Microsoft (sampling a few daily logs).  I&#039;ve read elsewhere that the MSNBot is aggressive, but only this month has it eclipsed GoogleBot on my site.  It may be possible that the MSNBot is buggy, or that someone has hijacked some machines in their IP range (though that seems unlikely).&lt;/p&gt;

&lt;p&gt;I would exclude it from crawling my site, except that the exposure is probably not a bad thing.  It just seems that they&#039;re being a bit greedy (to the point of insanity) with their crawling.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>From what I can see the hits are all from a single group of IPs that reverse-map to Microsoft (sampling a few daily logs).  I&#8217;ve read elsewhere that the MSNBot is aggressive, but only this month has it eclipsed GoogleBot on my site.  It may be possible that the MSNBot is buggy, or that someone has hijacked some machines in their IP range (though that seems unlikely).</p>

<p>I would exclude it from crawling my site, except that the exposure is probably not a bad thing.  It just seems that they&#8217;re being a bit greedy (to the point of insanity) with their crawling.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Steven Fisher</title>
		<link>http://warpedvisions.org/2005/08/26/microsoft-sucks-bandwidth/comment-page-1/#comment-322</link>
		<dc:creator>Steven Fisher</dc:creator>
		<pubDate>Fri, 26 Aug 2005 15:58:40 +0000</pubDate>
		<guid isPermaLink="false">http://warpedvisions.org/2005/08/26/microsoft-sucks-bandwidth/#comment-322</guid>
		<description>&lt;p&gt;Is that based strictly on the user agent? If so, it is probably spammers crawling with the same user agent string.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Is that based strictly on the user agent? If so, it is probably spammers crawling with the same user agent string.</p>]]></content:encoded>
	</item>
</channel>
</rss>
