Why Crawl Budget and URL Scheduling Might Impact Rankings in Website Migrations


During a migration, many webmasters notice turbulence in rankings and assume that PageRank has been lost. In reality, the signals that impact rankings simply haven't passed to the new pages yet. Googlebot also needs to collect huge amounts of data, collate it in logs, map old URLs to new ones, and update everything internally, and rankings can fluctuate throughout this process. If you are an SEO engineer or web developer, the following sections explain why a website migration can affect PageRank.

Crawl Budget = Host Load + URL Scheduling

URL scheduling is important because it answers "which URLs does Googlebot want to visit, and how often?", while host load is based around "what can Googlebot fetch from an IP/host, given its capacity and server resources?" Both still matter in migrations; together, they make up the "crawl budget" for an IP or host.
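To make the two halves concrete, here is a minimal Python sketch. Everything in it is illustrative, not anything Google exposes: URL scheduling is modeled as a priority queue, host load as a cap on how many fetches fit into one visit, and crawl budget is whatever survives both constraints.

```python
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class ScheduledURL:
    priority: float                  # lower value = crawled sooner (min-heap)
    url: str = field(compare=False)

def plan_crawl(urls_with_priority, host_capacity):
    """Crawl budget as the intersection of two constraints:
    URL scheduling (the priority queue) decides *what* gets fetched,
    host load (`host_capacity`) decides *how much* fits in one visit."""
    queue = [ScheduledURL(p, u) for u, p in urls_with_priority.items()]
    heapq.heapify(queue)
    budget = min(host_capacity, len(queue))
    return [heapq.heappop(queue).url for _ in range(budget)]

# A tiny site where the server can only sustain two fetches this visit:
site = {"/home": 0.1, "/category/shoes": 0.4, "/old-blog-post": 0.9}
print(plan_crawl(site, host_capacity=2))  # ['/home', '/category/shoes']
```

The least important URL simply doesn't make it into this visit; it waits for a future one, which is the behavior the rest of this article builds on.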

This has little impact if your website only has a few pages, but it matters enormously when you run an e-commerce or news site with tens of thousands, hundreds of thousands, or more URLs. Sometimes crawling tools run before the migration goes live detect nothing wrong, yet rankings and overall visibility still drop afterwards.

This is often caused by "signals late, or very late, in transit" rather than "lost signals." In fact, some signals can take months to pass, because Googlebot does not crawl large websites the way crawling tools do.

Change Management/Freshness is Important

Everyone knows that change frequency impacts crawl frequency, and URLs change all the time on the web. The key is keeping the probability of "embarrassment" for search engines (the "embarrassment metric", the risk of returning stale content in search results) below acceptable thresholds, and this must be managed efficiently. To avoid such embarrassment, scheduling systems prioritize crawling important pages that change frequently over less important pages, such as those with insignificant changes or low authority.
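As a back-of-the-envelope illustration of that prioritization (purely a sketch; the scoring formula and the staleness model are assumptions, not a published algorithm), one could score each URL by its importance multiplied by the probability that the stored copy has gone stale:

```python
def crawl_priority(importance, change_rate, staleness_days):
    """Hypothetical scoring: pages that are important, change often,
    and haven't been fetched recently float to the top of the queue.
    If a page changes on a given day with probability `change_rate`,
    the chance its stored copy is stale after n days is 1-(1-rate)^n."""
    p_stale = 1 - (1 - change_rate) ** staleness_days
    return importance * p_stale  # expected "embarrassment" if served stale

pages = {
    "/breaking-news":  crawl_priority(importance=0.9, change_rate=0.50, staleness_days=2),
    "/product-page":   crawl_priority(importance=0.6, change_rate=0.05, staleness_days=10),
    "/archived-terms": crawl_priority(importance=0.2, change_rate=0.001, staleness_days=30),
}
for url, score in sorted(pages.items(), key=lambda kv: -kv[1]):
    print(f"{url}: {score:.3f}")
```

Run as-is, the frequently changing, high-importance news page lands at the top of the queue while the static, low-authority page sinks to the bottom, which is exactly the ordering the paragraph above describes.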

These key pages are the ones search engine users actually see, versus pages that rarely surface in search engine results pages. Search engines also learn over time how often a page materially changes by comparing the latest copy with previous copies of the page to detect patterns of critical change frequency.
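A hedged sketch of how that comparison could work in practice: hash successive crawled copies of a page and estimate a change rate from how often the hash differs. The function names and the simple changes-per-day estimate are assumptions for illustration only.

```python
import hashlib

def content_fingerprint(html: str) -> str:
    # Hash the page body; a different hash means the content changed.
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

def estimate_change_rate(snapshots):
    """Estimate how often a page changes from successive crawled copies.
    `snapshots` is a list of (timestamp_day, html) pairs, oldest first.
    Returns observed changes per day."""
    changes, days = 0, 0
    for (t0, a), (t1, b) in zip(snapshots, snapshots[1:]):
        days += t1 - t0
        if content_fingerprint(a) != content_fingerprint(b):
            changes += 1
    return changes / days if days else 0.0

history = [(0, "<p>v1</p>"), (7, "<p>v1</p>"), (14, "<p>v2</p>")]
print(estimate_change_rate(history))  # ~0.07 changes/day
```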

Why can’t Googlebot visit migrated pages all at once?

The explanation above leads to two conclusions. First, Googlebot usually arrives at a website with a purpose, a "work schedule," and a "bucket list" of URLs to crawl during the visit. Googlebot completes its bucket list, then checks around to see whether anything more important than the URLs on the original bucket list also needs collecting.

If there are such important URLs, Googlebot may go a little further and crawl those other important URLs as well. If nothing more important is discovered, Googlebot returns with another bucket list on its next visit to your site.
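The visit pattern described in these two paragraphs can be sketched as a small loop. Again, this is illustrative only, not Google's actual implementation; `crawl_visit`, the `fetch` callback, and the priority threshold are all hypothetical.

```python
def crawl_visit(bucket_list, fetch, min_priority=0.5):
    """One bot visit: work through the scheduled bucket list, then
    follow up on newly discovered URLs that look important enough
    to justify extra fetches right now."""
    crawled, discovered = [], []
    for url in bucket_list:
        page = fetch(url)                         # fetch the scheduled URL
        crawled.append(url)
        discovered.extend(page.get("links", []))  # (url, importance) pairs
    # Extra work only for important discoveries; the rest wait
    # for a future bucket list.
    extras = [u for u, prio in discovered
              if prio >= min_priority and u not in crawled]
    for url in extras:
        fetch(url)
        crawled.append(url)
    return crawled

# Toy fetcher: each page returns links annotated with an importance score.
site = {
    "/a": {"links": [("/new-campaign", 0.9), ("/footer-page", 0.1)]},
    "/b": {"links": []},
    "/new-campaign": {"links": []},
    "/footer-page": {"links": []},
}
print(crawl_visit(["/a", "/b"], fetch=lambda u: site[u]))
# ['/a', '/b', '/new-campaign'] — the low-priority URL waits for a later visit
```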

Second, Googlebot mostly focuses on a small number of (important) URLs, whether you've recently migrated a site or not, with only occasional visits to URLs deemed least important or not expected to change materially very often.

Moreover, when Googlebot comes across lots of redirect response codes, it will likely signal internally that a migration of some sort is underway. Once again, mostly the most important migrating URLs will be crawled as a priority, and perhaps more frequently than they normally would be. It is therefore important to know the other factors, aside from page importance and change frequency, that determine when URLs get visited: limited search engine resources, host load, URL queues, and the low importance of migrating pages.
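For instance, one could spot the redirect pattern described above by checking what share of a crawled batch returns 301/308 status codes. This is only a sketch; the 50% threshold and the function name are arbitrary illustrations, not known Google behavior.

```python
from collections import Counter

def looks_like_migration(status_codes, redirect_share=0.5):
    """Flag a crawl batch where most responses are permanent redirects
    (301/308), the pattern that suggests a site migration is underway."""
    counts = Counter(status_codes)
    redirects = counts[301] + counts[308]
    return redirects / len(status_codes) >= redirect_share

batch = [301, 301, 200, 301, 308, 301, 200, 301]
print(looks_like_migration(batch))  # True — re-crawl these URLs as a priority
```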