Google’s Tiered Index System


In a recent episode of Google’s Search Off The Record podcast, Google’s own Gary Illyes revealed that Google uses a tiered system for its index. The most popular content is indexed on faster, more expensive storage, while the bulk of the index sits on cheaper storage that is easy to replace when it fails.

In the SEO world, there is no way to optimise for your content to be indexed on the faster tier, or any way to know for sure which tier your content is indexed on. However, this does give a good insight into how Google’s index functions.

The Tiers

Gary Illyes confirmed that Google indexes content on the following 3 types of storage:

 

  • RAM (Random Access Memory) – Fastest and most expensive storage option
  • SSD (Solid State Drive) – Very fast and moderately expensive, the middle ground
  • HDD (Hard Disk Drive) – Slowest but cheapest and easiest to replace on failure

 

Google reserves RAM indexing for content/documents that are likely to be served in search results frequently. The part of Google’s index that is served from the faster storage options, RAM and SSD, is only a small share of the overall index. Gary said:

“So for example, for documents that we know that might be surfaced every second, for example, they will end up on something super fast. And the super fast would be the RAM. Like part of our serving index is on RAM. Then we’ll have another tier, for example, for solid state drives because they are fast and not as expensive as RAM. But still not– the bulk of the index wouldn’t be on that. The bulk of the index would be on something that’s cheap, accessible, easily replaceable, and doesn’t break the bank. And that would be hard drives.”

Running even a small percentage of the index from RAM and other faster options will cost significantly more than running it from HDD. However, the popularity of certain content/documents for certain searches can potentially justify that cost.

So How Are The Tiers Chosen?

Gary states:

“When we build our index, and we use all those signals that we have. Let’s pick one, say, page rank, then we try to estimate how much we would serve those documents that we indexed. So will it be like every second? Will we have a query that triggers those docs? Or will it be once a week or will it be once a year? And based on that, we might use different kinds of storages to build the index.”

We suspect that documents around popular subjects, most recently the US Presidential Election and Coronavirus, will have been placed on the faster index tiers. The search demand around these subjects will be high enough for Google to want to minimise any delay in serving the documents to searchers.
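
To make the idea concrete, here is a minimal Python sketch of the "estimate how often a document will be served, then pick storage" approach Gary describes. It is purely illustrative: Google has not published how the estimate is made or where the cut-offs sit, so the Document class, the choose_tier function, the thresholds and the example URLs are all assumptions of ours.

```python
from dataclasses import dataclass

# Hypothetical thresholds: Google has not published real cut-offs.
SECONDS_PER_DAY = 86_400
RAM_MIN_SERVES_PER_DAY = SECONDS_PER_DAY  # roughly "every second"
SSD_MIN_SERVES_PER_DAY = 1.0              # assumed: about once a day or more


@dataclass
class Document:
    url: str
    # Assumed to come from some signal-based estimate (e.g. PageRank plus
    # query demand); how Google actually calculates it is not public.
    estimated_serves_per_day: float


def choose_tier(doc: Document) -> str:
    """Pick a storage tier from the estimated serving frequency."""
    if doc.estimated_serves_per_day >= RAM_MIN_SERVES_PER_DAY:
        return "RAM"
    if doc.estimated_serves_per_day >= SSD_MIN_SERVES_PER_DAY:
        return "SSD"
    return "HDD"


# Made-up example documents and demand figures.
docs = [
    Document("https://example.com/election-results", 200_000),
    Document("https://example.com/product-guide", 40),
    Document("https://example.com/old-press-release", 1 / 365),
]

tiers = {doc.url: choose_tier(doc) for doc in docs}
for url, tier in tiers.items():
    print(f"{tier}: {url}")
```

With these made-up numbers, the election page lands on RAM, the product guide on SSD and the old press release on HDD. The real system will weigh far more than a single number, but the principle is the same: expected demand decides how much Google is willing to spend on serving a document quickly.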

Summary

Although you can’t build optimising by tier into an SEO campaign, this is another rare, interesting insight into how Google’s index is tiered based on search demand.