Optimising Your XML Sitemap

XML sitemaps are an essential SEO best practice: they give search engine crawlers an easy way to understand the structure of the pages on your website. They can also tell crawlers when a page was last changed and how often it is updated. Just as you optimise your web pages and posts for SEO, it is important to take care of your XML sitemap. Below I have highlighted some key takeaways for keeping your sitemap healthy and optimised.

 

The Basics

Most major website platforms have XML sitemap functionality built in. For WordPress, Yoast and RankMath provide it, and a myriad of other plugins can generate a sitemap for you independently. Automated sitemap creation, as offered by the plugins above, means new pages and posts are added to the sitemap automatically, whereas a manually created sitemap has to be updated by hand each time new content is added. If you use the plugin method, which we recommend, the sitemap is usually found at the following URL: https://www.myawsomesite.com/sitemap_index.xml. From there the sitemap is split into a page sitemap, a posts sitemap and then individual sitemaps for any other sections of the website, such as careers or staff pages.
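To make the index structure a little more concrete, here is a minimal Python sketch that reads a sitemap index and lists the child sitemaps it points to. The sample XML, the www.example.com domain and the child sitemap filenames are placeholders I have made up for illustration; your plugin's output may use different names.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Illustrative sitemap index only; domain and filenames are assumptions.
SAMPLE_INDEX = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/page-sitemap.xml</loc>
    <lastmod>2024-01-15T09:30:00+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/post-sitemap.xml</loc>
  </sitemap>
</sitemapindex>
"""


def list_child_sitemaps(index_xml: str) -> List[Tuple[str, Optional[str]]]:
    """Return (loc, lastmod) for each child sitemap; lastmod may be None."""
    root = ET.fromstring(index_xml.encode("utf-8"))
    children = []
    for sitemap in root.findall("sm:sitemap", NS):
        loc = sitemap.findtext("sm:loc", namespaces=NS)
        lastmod = sitemap.findtext("sm:lastmod", namespaces=NS)
        children.append((loc, lastmod))
    return children


for loc, lastmod in list_child_sitemaps(SAMPLE_INDEX):
    print(loc, lastmod or "(no lastmod)")
```

The lastmod element is optional in the protocol, which is why the sketch allows it to be missing.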

 

Letting search engines know about your sitemap is easy. In Google Search Console and Bing Webmaster Tools you can enter the URL of your XML sitemap in the dashboard; both will then start to process your sitemap and notify you of any errors found. It’s also best practice to reference your sitemap in your website’s robots.txt. Crawlers typically visit the robots.txt file first to check which sections of the website they are allowed and disallowed to crawl, so it gives you a way to tell them early where they can find and process your sitemap. Typically the sitemap is referenced like so in the robots.txt file:

 

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://www.embryodigital.co.uk/sitemap_index.xml
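If you want to confirm that a robots.txt file really does advertise a sitemap, a rough check can be done with Python's standard library. This is only a sketch: the robots.txt content is pasted in as a string rather than fetched from a live site, the function name is my own, and `site_maps()` needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from the example above, held in a string
# so the sketch runs without any network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://www.embryodigital.co.uk/sitemap_index.xml
"""


def sitemaps_from_robots(robots_txt: str) -> list:
    """Return the sitemap URLs declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(sitemaps_from_robots(ROBOTS_TXT))
```

An empty list here would suggest the Sitemap line is missing or misspelt.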

 

What Should Be in My Sitemap?

Here comes the optimisation part. Your sitemap should contain only the pages that you want in search results: high-quality pages filled with unique content, images and videos that prompt user engagement. Pages I usually exclude from the sitemap are:

 

  • Cart & checkout pages
  • Login & register form pages
  • Blog tag pages
  • Blog category pages

 

Naturally, cart and checkout pages aren’t pages that users will find through search engines, and their content is relatively thin. I therefore recommend excluding them from the sitemap and setting them to noindex. Login and registration pages generally follow the same rule. Blog tag and category pages are pages I usually exclude from the sitemap on certain websites, as they are generally duplicates of the main blog/news page. The exception is sites where content is correctly categorised and the category pages are correctly linked to; otherwise the uncategorised category is very similar to the blog page.
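If you are building or auditing a sitemap by hand rather than through a plugin, the exclusion rules above could be sketched in Python roughly as follows. The path segments (cart, checkout, login, register, tag, category) are assumptions about a typical URL structure, and www.example.com is a placeholder; adjust them to match your own site.

```python
from urllib.parse import urlparse

# Assumed first path segments for the page types listed above.
EXCLUDED_SECTIONS = {"cart", "checkout", "login", "register", "tag", "category"}


def sitemap_candidates(urls):
    """Keep only URLs whose first path segment is not an excluded section."""
    keep = []
    for url in urls:
        first_segment = urlparse(url).path.strip("/").split("/")[0]
        if first_segment not in EXCLUDED_SECTIONS:
            keep.append(url)
    return keep


urls = [
    "https://www.example.com/",
    "https://www.example.com/cart/",
    "https://www.example.com/cartoons/",
    "https://www.example.com/tag/seo/",
    "https://www.example.com/blog/xml-sitemaps/",
]
print(sitemap_candidates(urls))
```

Matching on the whole first segment rather than a simple prefix avoids accidentally dropping a page such as /cartoons/ just because it starts with "cart".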

 

Both plugins make it easy to exclude pages from the sitemap. RankMath has a feature built into its sitemap settings page that allows you to include or exclude types of posts and pages. Yoast has a similar solution, and with both plugins you can set individual pages to noindex and also exclude them from the sitemap.

[Screenshot: RankMath sitemap settings]

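Remember to noindex pages using the plugin, which adds a noindex tag to the page. Disallowing pages in the robots.txt file instead does not keep them out of the index: crawlers that are blocked from a page never see its noindex tag, so the page can still appear in the Google index if other pages link to it.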
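To check that a page really carries the noindex tag, you can look for a robots meta tag in its HTML. Below is a minimal sketch using Python's built-in HTML parser; the class and function names are my own, the HTML is passed in as a string, and it only looks at the meta tag (not an X-Robots-Tag HTTP header, which can also carry noindex).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tag includes noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))
```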

 

Attachment Pages

Every time you upload an image or video to WordPress, an attachment page is created for the new media. This page consists of the title of the image or video followed by the image or video itself. Attachment pages aren’t of any use on their own, as media is generally uploaded to WordPress to be added to a page, post or other existing content. They are also regarded as thin content, since they contain little apart from the title and the media itself, so it is recommended that attachment pages are redirected to the original image or video file that’s uploaded to the server. Yoast and RankMath make it easy to change this setting so that attachment pages don’t appear in search results or within the sitemap.

[Screenshot: attachment redirect setting in RankMath]

[Screenshot: attachment redirect setting in Yoast]

 

Images in Sitemaps

It is recommended to include your images in your sitemap, and both plugins should offer a setting for this. Including your images gives you a better chance of ranking images from your website in search results. If your images also have relevant, descriptive filenames, crawlers will be better able to understand how a given image fits in the context of the page.
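For anyone curious what an image entry looks like inside a sitemap, here is a rough Python sketch that builds one using the image sitemap extension namespace. The page and image URLs are placeholders and the function name is my own; the plugins generate the equivalent markup for you, so treat this purely as an illustration of the format.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Use the conventional prefixes when the XML is serialised.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages: dict) -> str:
    """Build a sitemap from {page_url: [image_url, ...]} (placeholder data)."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_image_sitemap({
    "https://www.example.com/blog/xml-sitemaps/": [
        "https://www.example.com/uploads/xml-sitemap-diagram.png",
    ],
})
print(sitemap_xml)
```

Each image sits inside the url entry of the page it appears on, which is how the sitemap ties the image to its page.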

 

What are Orphaned Pages?

A number of SEO tools report on orphaned pages found in a website’s XML sitemap. Orphaned pages are pages that are included in the sitemap but don’t have at least one internal link pointing to them from other pages of the website. From analysis, this commonly happens when blog tag or blog category pages have been included in the sitemap but aren’t used on the website and therefore aren’t linked to from other pages. In this case we would exclude the blog category and blog tag pages from search results and the sitemap. If we audited the list of orphaned pages and found a page that’s high quality and useful to users, the next step would be to build some internal links to it.
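The orphaned page check itself is simple to express. The Python sketch below assumes you already have the list of sitemap URLs and a map of internal links from a crawl; both data structures, the function name and the www.example.com URLs are hypothetical, and real SEO tools will differ in how they collect and normalise this data.

```python
def find_orphaned_pages(sitemap_urls, internal_links):
    """Return sitemap URLs that no other page links to.

    internal_links maps each crawled page URL to the set of internal
    URLs it links to (an assumed structure for this sketch).
    """
    linked = set()
    for source, targets in internal_links.items():
        # A page linking to itself does not count as an internal link.
        linked.update(target for target in targets if target != source)
    return sorted(url for url in sitemap_urls if url not in linked)


sitemap_urls = [
    "https://www.example.com/",
    "https://www.example.com/blog/",
    "https://www.example.com/tag/unused/",
]
internal_links = {
    "https://www.example.com/": {"https://www.example.com/blog/"},
    "https://www.example.com/blog/": {"https://www.example.com/"},
    "https://www.example.com/tag/unused/": {"https://www.example.com/tag/unused/"},
}
print(find_orphaned_pages(sitemap_urls, internal_links))
```

In this made-up data the unused tag page is the only orphan, which mirrors the common case described above.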

 

Summary

Optimising your sitemap is something that shouldn’t be overlooked. Choosing which pages or posts to include in your sitemap based on quality of content and search intent is a step towards ensuring thin content and other unnecessary pages don’t hamper your results. It also lets search engine crawlers crawl your website more efficiently, by pointing them only at the pages and posts with high-quality content and search intent.