The Google BERT Algorithm Just Got A Big Brother. Welcome, SMITH!

Google search results after the BERT and SMITH updates

Google’s algorithms are always changing, but the goal remains the same: how can we serve the best possible results for a user query? Back in the earlier days of SEO (rip), it was simpler: mention the keyword in the right places throughout the content and get the on-page optimisation right. Nowadays we need to provide more than that. Content needs relevance, context and synonyms, and it needs to be genuinely good at answering the search query AND the queries a user may have after the initial one. Waffle content simply doesn’t perform anymore, let alone have the longevity needed for a solid SEO campaign. Part of that growth has been Google trying to better understand the context of content and of specific parts within it. The BERT algorithm allowed them to do that, up to a point. Before we get onto SMITH, let’s talk through BERT first.

What Is The Google BERT Algorithm?

BERT (Bidirectional Encoder Representations from Transformers) is a natural language processing model that Google uses to understand the context of words in search queries. The best way to understand how it works is with an example. Here is one that Google shares in a blog post about using BERT to better understand search queries.

example of search results using BERT

In Google’s example, the query is ‘2019 brazil traveler to usa need a visa’. Prior to BERT, Google couldn’t understand the context within the search and, in particular, how important the word ‘to’ is for serving the correct results. The pre-BERT result on Google.com (USA) was the opposite of what the user actually needed, because it talks about how U.S. citizens can travel to Brazil without a visa. The result after applying BERT is far better for the user’s needs. It directs them to the U.S. Embassy for information on visa requirements when travelling FROM Brazil TO the U.S.

In short, the BERT algorithm allows Google to contextualise words within sentences, something that it couldn’t reliably achieve before. The single use of the word ‘to’ makes all the difference to this query. We as humans understand that as we read it, but machines didn’t.

Google Announces It Intends To Index Specific Passages Within Content

Rewinding to October last year, Google announced that they would be working to index individual passages within pieces of content, especially long-form content (something that we at Embryo are big fans of because of the potential it has). I covered that in more detail on a blog you can find here. In short, and you may have seen it in search results already, Google now links from search results to a specific passage on a page when that passage provides the user with the best answer to their query. Say, for example, you have a 3,000-word piece of content, a guide on insurance or something similar, and a user searches a specific question that you’ve covered in that content, but that particular section is 2,000 words down the page. Google now knows where to send the user, because it has worked to understand the context of that passage within the bigger piece of content.
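To make the idea concrete, here is a toy sketch in Python of picking the most relevant passage out of a longer page. It is not how Google does it: the word-overlap score is a deliberately crude stand-in for a real language model, and the function names (split_passages, score, best_passage) and the sample text are made up for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_passages(document):
    """Treat each blank-line-separated paragraph as one passage."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def score(query, passage):
    """Fraction of distinct query words that appear in the passage.

    A crude stand-in for real relevance scoring (assumption, not Google's method).
    """
    query_terms = set(tokenize(query))
    passage_terms = set(tokenize(passage))
    if not query_terms:
        return 0.0
    return len(query_terms & passage_terms) / len(query_terms)


def best_passage(query, document):
    """Return (index, passage, score) for the best-matching passage, or None."""
    passages = split_passages(document)
    if not passages:
        return None
    scored = [(score(query, p), i) for i, p in enumerate(passages)]
    # Highest score wins; on a tie, prefer the earlier passage.
    best_score, best_index = max(scored, key=lambda pair: (pair[0], -pair[1]))
    return best_index, passages[best_index], best_score


if __name__ == "__main__":
    guide = (
        "Car insurance protects your vehicle against theft and accidents.\n\n"
        "Home insurance can cover flood damage, but only if your policy includes it.\n\n"
        "Life insurance pays out to your family."
    )
    index, passage, value = best_passage("does home insurance cover flood damage", guide)
    print(index, passage)
```

The point of the sketch is only that the unit being matched is the passage, not the whole page, so a strong answer buried deep in a long guide can still be found.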

Which Leads Us To SMITH! Google’s Latest Algorithm

It was inevitable to some extent that this was coming, because BERT simply wasn’t strong enough to understand content on such a large scale. So when the indexing of passages was announced in October, SEOs were fairly confident that another algorithm was working to achieve it. If you want to really dive into the technical side of the SMITH algorithm, you can check out Google’s research paper here, which goes into a great amount of detail.

Whether it is being used yet is another question, because Google doesn’t specifically state which algorithms it’s using at any given time. However, the increase in passage-linked search results suggests to me that it is.

BERT works to understand the context of words within a sentence. SMITH, in comparison, works to understand the context of a passage within an entire document, and also to predict what comes in the sentences that follow. What is the next question? SMITH is trained to try and work that out and push a user to the right content. Unlike BERT, whose limitations come into play as a piece of content gets longer, SMITH works at its best the longer the content is. It is important to note that SMITH is not a replacement for BERT. Both have their place in understanding content and they will work alongside each other.
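One way to picture why length matters is the toy Python sketch below. It compares a model that only reads a fixed window of text with an approach that cuts the whole document into blocks, which is broadly how the SMITH paper describes handling long documents. The 512 figure is the commonly cited input limit for the original BERT models and is used here as an assumption; real models count tokens rather than words, and the helper names are hypothetical.

```python
def truncate(tokens, window):
    """Keep only what a fixed-window model would get to read."""
    return tokens[:window]


def split_into_blocks(tokens, block_size):
    """Cut the whole document into consecutive blocks, dropping nothing."""
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]


def find_block(blocks, token):
    """Return the index of the first block containing the token, or None."""
    for index, block in enumerate(blocks):
        if token in block:
            return index
    return None


# A 3,000-word placeholder document with the "answer" 2,000 words in.
document = ["word"] * 3000
document[2000] = "ANSWER"

WINDOW = 512  # assumed input limit, counted in words for simplicity

seen = truncate(document, WINDOW)
blocks = split_into_blocks(document, WINDOW)

print("ANSWER" in seen)              # False: the answer sits beyond the window
print(len(blocks))                   # 6 blocks cover the whole document
print(find_block(blocks, "ANSWER"))  # 3: the answer is in the fourth block
```

Splitting into blocks is only the first half of the story. Each block still has to be understood and then combined into a view of the whole document, which is the part the research paper covers in detail.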

A stand-out quote from the research paper:

“The SMITH model which enjoys longer input text lengths compared with other standard self-attention models is a better choice for long document representation learning and matching.”

So what does that mean for us? Not a great deal changes. If anything, it just means that the content we write for ourselves and for our clients will be better understood and better represented in search results. Every piece of content we write is meticulously researched and designed to answer questions that searchers may have beyond the initial query. 3,000 words written about insurance, for example, could just be waffle and words for the sake of words, but ours is broken down to better help a user through a conversion journey. We were writing for SMITH before SMITH was even a thing. That’s what makes us the best.