The robots.txt file tells search engines which pages on your website they may crawl. You can easily edit it with the Rank Math SEO plugin. If you aren’t already using Rank Math on your website, learn more & get started here.
In this tutorial, we’ll show you how you can edit your robots.txt file with the help of Rank Math.
1 Why Is Robots.txt Important?
Before getting started with editing the robots.txt file, let us try to understand its importance.
When search engine crawlers or other bots land on your website, they first look for the robots.txt file, as it contains instructions on how they should crawl your website.
Most bots honor these instructions, but some, such as malware bots and email-scraping bots, are unlikely to follow them. That said, these bad bots usually account for very little traffic, so it is generally safe to ignore them.
2 How to Edit Your Robots.txt With Rank Math
Rank Math makes it possible to edit your robots.txt file right inside your WordPress dashboard, by creating a virtual file. If you prefer to edit your robots.txt file using Rank Math, you’ll need to delete the actual robots.txt file (if any) from your website’s root folder using an FTP client.
With that in mind, follow the steps below to edit your robots.txt file with Rank Math.
2.1 Navigate to Edit Robots.txt
To begin with, log in to your WordPress website and make sure you’ve switched to the Advanced Mode from Rank Math’s dashboard.
Navigate to your robots.txt file in Rank Math, which is located under WordPress Dashboard → Rank Math SEO → General Settings → Edit robots.txt.
2.2 Add Code in Your Robots.txt
By default, Rank Math automatically adds a set of rules (including your Sitemap) to your robots.txt file. You can always add to or edit these rules in the available text area.
If you’re unsure which rules you can use in your robots.txt file, hold on, as we discuss them later in this article. And if you ever need a copy of the default robots.txt rules, you can find them in the Rank Math Default Robots.txt Rules section below.
2.3 Save Your Changes
Once you have made the necessary edits to the file, click Save Changes.
Caution: Please be careful while making any major or minor changes to your website via robots.txt. While these changes can improve your search traffic, they can also do more harm than good if you are not careful.
3 Robots.txt Rules
Now that we’re aware of how to edit the robots.txt with Rank Math, let us dive into the rules that you can add to your robots.txt file.
A rule (or a directive) in robots.txt is simply an instruction to the crawler about which pages it may crawl. It is a simple way of saying, ‘Hey crawler, you can crawl these pages, but not those.’
When it comes to adding rules to your robots.txt file, there are some general guidelines you should be following:
- A robots.txt file can have one or more groups, and each group consists of one or more rules.
- Each group begins with a User-agent and then specifies which directories or files the agent can access and cannot access.
- By default, it is assumed that a user-agent can crawl any page on your website unless you specifically block access using the disallow rule.
- Rules are case-sensitive: a rule for /private/ does not apply to /Private/.
- The # character marks the beginning of a comment.
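As a rough illustration of these guidelines, the Python sketch below feeds a small robots.txt with two groups and a comment to the standard library's urllib.robotparser module. The bot names, the paths, and example.com are made up for the example, and real crawlers may interpret rules somewhat differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: two groups, each starting with a User-agent line.
ROBOTS_TXT = """\
# Text after a # character is a comment and is skipped.
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may crawl `path` on the made-up example.com."""
    return parser.can_fetch(agent, "https://example.com" + path)


# Pages are crawlable by default unless a Disallow rule matches.
print(allowed("ExampleBot", "/blog/"))     # True
# The ExampleBot group blocks /private/ for that bot only.
print(allowed("ExampleBot", "/private/"))  # False
print(allowed("OtherBot", "/private/"))    # True
# Paths are case-sensitive: /Private/ is not /private/.
print(allowed("ExampleBot", "/Private/"))  # True
```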
And now, let’s look into the different rules (or directives) that can be used in robots.txt:
Rule | Description |
User-agent | Indicates the web crawler (or bot) that the group targets. Google and Bing each publish a complete list of their user agents. |
Disallow | Refers to the directory or page on your website that you don’t want the user agent to crawl. |
Allow | Refers to the directory or page on your website that you want the user agent to crawl. |
Sitemap | Indicates the sitemap of the website and should be entered as a fully qualified URL. Though it is optional, it is good practice to have one. |
Please note that any line in your robots.txt that does not match one of the above directives is ignored. All directives except Sitemap accept the wildcard * as a prefix, suffix, or the entire string. $ is another wildcard, honored by both Google & Bing, that indicates the end of the URL.
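To make the two wildcards more concrete, here is a minimal Python sketch that translates a rule path into a regular expression and tests it against a URL path. The function name and the sample paths are our own, and this is only an approximation of how crawlers match patterns, not any search engine's actual implementation.

```python
import re


def rule_matches(rule_path: str, url_path: str) -> bool:
    """Return True if a robots.txt path pattern matches `url_path`.

    `*` stands for any run of characters; a trailing `$` means the
    pattern must match through to the end of the URL path.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal pieces, then join them with ".*" for each `*`.
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    regex = re.compile(body)
    if anchored:
        return regex.fullmatch(url_path) is not None
    # Without `$`, the rule only has to match from the start of the path.
    return regex.match(url_path) is not None


print(rule_matches("/*.pdf$", "/docs/guide.pdf"))             # True
print(rule_matches("/*.pdf$", "/docs/guide.pdf?download=1"))  # False
print(rule_matches("*/feed/", "/blog/feed/"))                 # True
```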
4 Robots.txt Rule Examples
Although only a handful of rules are allowed in robots.txt, it is still easy to make mistakes. Hence, we have put together some examples of robots.txt rules that you can use right away.
4.1 Disallow Crawling of the Entire Website
The disallow rule below prevents all bots from crawling your entire website. The / here represents the root of the website directory and all the pages that branch out from it. Thus, it includes the homepage of your website and all the pages linked from it.
We don’t recommend using this rule on a live website, as search engine crawlers will not be able to crawl your website. That said, the rule is useful on development and staging sites, where you wouldn’t want crawlers to access and index the site’s contents.
User-agent: *
Disallow: /
4.2 Disallow Crawling of the Entire Website for a Specific Bot
If you wish to block only specific crawlers rather than all of them, replace the wildcard in the User-agent with the name of the crawler. For example, the following rule blocks Google’s AdsBot.
User-agent: AdsBot-Google
Disallow: /
And if you want to state explicitly that other bots can crawl your website, use the following groups of rules.
User-agent: AdsBot-Google
Disallow: /
User-agent: *
Allow: /
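If you would like to sanity-check groups like these before saving them, one option is Python's standard urllib.robotparser module. The sketch below parses the two groups above and asks whether two bots may fetch a made-up URL; it only shows how this particular parser reads the rules, and the URL is an assumption for the example.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: AdsBot-Google
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "https://example.com/any-page/"  # made-up URL for the check
adsbot_allowed = parser.can_fetch("AdsBot-Google", url)
others_allowed = parser.can_fetch("Googlebot", url)

# AdsBot-Google is blocked; any other bot falls through to the * group.
print(adsbot_allowed, others_allowed)  # False True
```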
4.3 Rank Math Default Robots.txt Rules
Rank Math, by default, includes the following rules in the robots.txt file. If you ever remove these rules from your robots.txt but later want to include them again, you can copy and paste the following rules. Make sure to replace yoursite.com with your domain name.
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://yoursite.com/sitemap_index.xml
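The Allow line acts as an exception to the Disallow line because, under the Robots Exclusion Protocol (RFC 9309) and Google's documentation, the most specific (longest) matching rule wins. Below is a minimal Python sketch of that precedence. It handles plain path prefixes only (no wildcards), and the function and variable names are our own.

```python
from typing import Sequence, Tuple


def is_allowed(rules: Sequence[Tuple[str, str]], url_path: str) -> bool:
    """Decide access for `url_path` using longest-match precedence.

    `rules` holds (directive, path) pairs from a single group, for
    example ("disallow", "/wp-admin/"). Only plain prefixes are handled.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, path in rules:
        if not path or not url_path.startswith(path):
            continue
        is_allow = directive.lower() == "allow"
        # Longer paths are more specific; on a tie, prefer the Allow rule.
        if len(path) > best_len or (len(path) == best_len and is_allow):
            best_len = len(path)
            allowed = is_allow
    return allowed


DEFAULT_RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]

print(is_allowed(DEFAULT_RULES, "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(DEFAULT_RULES, "/wp-admin/options.php"))     # False
```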
4.4 Disallow Access to Specific Directory
To block access to specific directories on your website, you can include the relative path of the directory with the disallow rule. For instance, if you want to block access to the feed pages on your website, you can include the rule as follows.
User-agent: *
Disallow: */feed/
4.5 Disallow Crawling of Files of a Specific File Type
If you want to stop search engine crawlers from accessing specific file types, you can use a rule like the one below, which blocks URLs ending in .pdf. Please note, however, that Google does not recommend blocking CSS and JS files, as this prevents Google from rendering the page and can affect your search traffic.
User-agent: *
Disallow: /*.pdf$
4.6 Disallow AI Models & Chatbots Using Your Content
If you do not want Google to use your website’s content to train its Bard (now Gemini) and Vertex AI models, you can disallow the Google-Extended user agent. Please note that this will not prevent Google from crawling and indexing your website’s content for search results.
User-agent: Google-Extended
Disallow: /
Similarly, if you want to prevent your content from being used to train another AI model, you can disallow the corresponding user agent. For instance, you can block OpenAI from using your content to train its AI models (including ChatGPT) with the rule below.
User-agent: GPTBot
Disallow: /
Note: OpenAI lists ChatGPT-User, the bot that ChatGPT users rely on to browse your site, as a separate user agent from GPTBot, so it may need its own group if you want to restrict it as well.
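If you end up blocking several AI crawlers, it can be easier to generate the groups than to type each one by hand. The short Python sketch below builds one Disallow group per user agent; the helper name is our own, and the output is meant to be reviewed and pasted into the robots.txt text area.

```python
def build_block_groups(user_agents):
    """Return robots.txt text with one 'Disallow: /' group per user agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in user_agents]
    # Separate groups with a blank line and end the text with a newline.
    return "\n\n".join(groups) + "\n"


print(build_block_groups(["Google-Extended", "GPTBot"]))
```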
Conclusion — Edit Your Robots.txt File & Validate
And that’s it! We hope this tutorial helped you edit your robots.txt file using Rank Math. Please note that you can also create and edit your robots.txt file manually: simply upload the file to the root folder of your website (on your server) and edit it there. If you’re unsure where to upload the file, you can check with your web host for further assistance.
Once you’ve edited the robots.txt file, you can simulate how Googlebot crawls pages on your website and check whether it can access them using the robots.txt tester.
If you still have any questions about editing your robots.txt file with Rank Math, or you’re facing issues while editing it, you’re always welcome to contact our dedicated support team. We’re available 24/7, 365 days a year.