05 August 2011 by Web Bureau
Earlier this week, Google updated the ‘Fetch as Googlebot’ feature in its Webmaster Tools so that site owners can speed up the indexing of new or updated web pages.
The Fetch as Googlebot feature now provides a way to submit new and updated URLs to Google for indexing. After you fetch a URL as Googlebot, if the fetch is successful, you have the option to submit that URL to Google’s index. Googlebot will usually crawl the URL within a day and consider it for inclusion in Google’s index. However, there is no guarantee that every URL submitted in this way will be indexed: the URL is still assessed in the usual manner to evaluate its suitability.
When to use ‘Fetch as Googlebot’?
For a site with solid internal link architecture and external links pointing to pages throughout it, an XML Sitemap is still the best way to give Google a comprehensive list of URLs and encourage it to crawl and index those pages. However, the improved Fetch as Googlebot functionality is useful if you’ve just launched a new site, added some important new pages or updated pages that are already indexed. It can also help if you’ve accidentally published information you didn’t mean to and want Google’s cached copy updated after you’ve removed that information from your site.
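The Sitemaps protocol (sitemaps.org) that Google and Bing support is a simple XML format: a `<urlset>` containing one `<url>` entry per page, each with a required `<loc>` and an optional `<lastmod>` date. As a minimal sketch, assuming you already have a list of your page URLs and, optionally, their last-modified dates, Python’s standard library can generate one. The `build_sitemap` helper and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the Sitemaps protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build sitemap XML from (url, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        # <loc> is the only required child of <url>.
        ET.SubElement(entry, "loc").text = url
        if lastmod is not None:
            # W3C date format, e.g. 2011-08-05.
            ET.SubElement(entry, "lastmod").text = lastmod.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical pages: a home page with a known update date and a news page without one.
pages = [
    ("https://www.example.com/", date(2011, 8, 5)),
    ("https://www.example.com/news/", None),
]
print(build_sitemap(pages))
```

ElementTree escapes characters such as `&` in URLs automatically, which is one reason to build the file with an XML library rather than by string concatenation.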
How Google Crawls & Indexes the Web
Google prioritises crawling of the URLs on its list based on many factors, including:
1. PageRank – This is Google’s system of counting link votes and determining which pages are most important based on them. These scores help to determine if a page will rank well in a search.
2. Frequency of content changes
3. How important it is to index a new page quickly, e.g. a page in a site’s news section
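The “link votes” idea behind PageRank can be illustrated with the simplified textbook formulation: each page passes its score evenly to the pages it links to, and a damping factor (conventionally 0.85) models a visitor occasionally jumping to a random page. This is a sketch of the published algorithm run on a hypothetical three-page site, not Google’s production ranking or crawl scheduling, which combine many more signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page gets a baseline share from random jumps.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's "vote" evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Page with no outgoing links: spread its vote over every page.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical site: page -> pages it links to.
site = {"home": ["about", "news"], "about": ["home"], "news": ["home"]}
ranks = pagerank(site)
# "home" is linked from both other pages, so it ends up with the highest score.
print(max(ranks, key=ranks.get))  # prints home
```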
Once a page is crawled, Google runs a separate algorithmic process to decide whether to store it in its index. In other words, Google doesn’t crawl every page it knows about, and it doesn’t index every page it crawls.
How Google Finds Pages to Crawl
1. Links – Googlebot discovers new pages by following links, so it is vital to link to every page on your site internally; you may not have an external link to every page.
2. RSS Feeds – RSS feeds benefit publishers by letting them syndicate content automatically. A standardised XML file format allows the information to be published once and read by many different programs, and search engines can pick up newly published URLs from the feed.
3. XML Sitemaps – A sitemap enables you to submit a complete list of URLs to Google and Bing. Again, the search engines don’t guarantee that they’ll crawl every URL submitted, but they do feed this list into their crawl scheduling system.
4. Public Requests – Google’s “Add URL” form, which searchers could use to request that a URL be added to the index, has been renamed “CRAWL URL”. Once you log in with a Google Account, you can submit up to 50 URLs a week for any site, not just sites you’ve verified you own.
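To check point 1 on your own site, one minimal approach is to compare the pages a crawler can reach by following internal links from the home page with the full list of pages you expect to be indexed. The link map, page paths and `unreachable_pages` helper below are hypothetical; in practice you would build the link map by crawling your own site. Any page it reports could only be found through an external link, a feed, a sitemap or a manual submission.

```python
from collections import deque

def unreachable_pages(home, links, all_pages):
    """Return the pages in all_pages that can't be reached from home via internal links."""
    seen = {home}
    queue = deque([home])
    # Breadth-first search: follow internal links the way a crawler would.
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(set(all_pages) - seen)

# Hypothetical site: each page maps to the internal pages it links to.
site = {
    "/": ["/about", "/news"],
    "/news": ["/news/launch"],
}
# Full page list, e.g. taken from your XML Sitemap.
all_pages = ["/", "/about", "/news", "/news/launch", "/offers"]
print(unreachable_pages("/", site, all_pages))  # prints ['/offers']
```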
For further information on Search Engine Optimisation and Online Marketing email firstname.lastname@example.org