Monday, August 20, 2018

Setting Robots.txt and Header Tags For Better Search Results

    How to Activate and Set Up a Custom Robots.txt for SEO [2018 Update]: Which URLs Should Google Not Crawl?

    The robots.txt file is one of the most important elements of a website. It tells search engine crawlers which pages on the website they are allowed or not allowed to crawl.

    By activating a custom robots.txt, we tell the crawlers of search engines such as Google, Bing, Yandex and others which pages of the blog they may crawl.

    Keep in mind, however, that the robots.txt file is very sensitive, so do not fill it in arbitrarily. A correct file lets search engines crawl your blog and show it in search results properly, and that is what makes it SEO friendly.

    On the Blogger platform, creating a custom robots.txt file is easy. It is not as complicated as on WordPress, where a site has many directories that their owners do not want Google to crawl.

    Simple Blogger URL Directory

    On the Blogger platform, as we know, there are only a few URL directories, including /search, /p and /year/month. There is also the m URL parameter, which has two values: m=1 for the mobile view and m=0 for the non-mobile view.
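    Because m=1 and m=0 only switch the view, the same article can be reached at several addresses. The sketch below is a minimal illustration, assuming the hypothetical helper name strip_mobile_param and the placeholder address www.yourblog.com; it removes the m parameter so that all variants map back to one URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_mobile_param(url: str) -> str:
    """Return the URL with Blogger's m=0 / m=1 view parameter removed (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every query parameter except "m", preserving their order.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "m"]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment))


if __name__ == "__main__":
    # Placeholder blog address, as in the Sitemap line of this post.
    article = "https://www.yourblog.com/2018/08/seo-seo-google.html"
    for variant in (article, article + "?m=1", article + "?m=0"):
        print(strip_mobile_param(variant))
```

    All three variants print the same address, which is why these views are one page as far as content is concerned.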

    These directories are further broken down into several types of URL, for example:

    
    www.amp-blogger.com/search/label/AMP
    /* The URL above is the URL for a label (category) page */


    
    www.amp.blogger.com/2018/08/seo-seo-google.html
    /* The URL above is the link for an article */


    
    www.amp.blogger.com/2018/08/
    /* The URL above is the link for a year-and-month archive. Usually this year-and-month archive link has been 'turned off' */
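    To make the URL types above concrete, here is a rough sketch that classifies a Blogger address by its path. The function name blogger_url_type and the regular expressions are my own assumptions based on the example URLs, not an official Blogger specification; query strings such as ?m=1 are ignored.

```python
import re
from urllib.parse import urlsplit

# Assumed path patterns for the Blogger URL types; more specific patterns come first.
PATTERNS = [
    ("label", re.compile(r"^/search/label/[^/]+/?$")),
    ("search", re.compile(r"^/search/?$")),
    ("static page", re.compile(r"^/p/[^/]+\.html$")),
    ("article", re.compile(r"^/\d{4}/\d{2}/[^/]+\.html$")),
    ("archive", re.compile(r"^/\d{4}/\d{2}/?$")),
]


def blogger_url_type(url: str) -> str:
    """Classify a Blogger URL by its path (hypothetical helper)."""
    # urlsplit only finds the path reliably when a scheme is present.
    if "://" not in url:
        url = "https://" + url
    path = urlsplit(url).path or "/"
    if path == "/":
        return "home"
    for name, pattern in PATTERNS:
        if pattern.match(path):
            return name
    return "other"


if __name__ == "__main__":
    for u in ("www.amp-blogger.com/search/label/AMP",
              "www.amp.blogger.com/2018/08/seo-seo-google.html",
              "www.amp.blogger.com/2018/08/"):
        print(blogger_url_type(u), u)
```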


    Based on these Blogger directories and the example URLs above, I recommend blocking some of them from search engines. This prevents Google from crawling pages that could cause problems in your search results later.

    Such pages can cause crawling problems such as duplicate titles and duplicate page descriptions. That is why you should change the robots.txt on Blogger to make it more SEO friendly.
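    One rough way to see the duplicate-title problem is to group crawled pages by their title. The sketch below assumes you already have (URL, title) pairs from some crawler; the helper find_duplicate_titles and the sample data are hypothetical.

```python
from collections import defaultdict


def find_duplicate_titles(pages):
    """Return {title: [urls]} for titles shared by more than one URL (hypothetical helper).

    `pages` is an iterable of (url, title) pairs, e.g. collected by a crawler.
    """
    by_title = defaultdict(list)
    for url, title in pages:
        by_title[title.strip()].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


if __name__ == "__main__":
    # Hypothetical crawl results: two label-page URLs share one title here.
    crawled = [
        ("https://www.yourblog.com/search/label/AMP", "My Blog: AMP"),
        ("https://www.yourblog.com/search/label/AMP?max-results=20", "My Blog: AMP"),
        ("https://www.yourblog.com/2018/08/seo-seo-google.html", "SEO Google"),
    ]
    for title, urls in find_duplicate_titles(crawled).items():
        print(title, "->", len(urls), "URLs")
```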

    Let's proceed to the next step, Setting Robots.txt and Header Tags For Better Google Crawling.

    Setting Robots.txt and Custom Robots Header Tag in Blogger [AMP and Non AMP]


    Setting robots.txt 


    There are two settings that I will explain. The first is the robots.txt setting on your Blogger blog. Go to the Blogger dashboard and click Settings. Several tabs will open; select Search Preferences.

    If you have never set up robots.txt before, Custom robots.txt will show as disabled. Click to enable Custom robots.txt, and a text area will appear. Enter the following robots.txt code:

    
    User-agent: *
    Disallow: /search
    Allow: /

    Sitemap: https://www.yourblog.com/sitemap.xml
    Replace https://www.yourblog.com with your own blog address.

    After entering the robots.txt code above, click Save.
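    If you manage several blogs, a small helper can fill in the Sitemap address for you. This is only a convenience sketch; the function name blogger_robots_txt is hypothetical, and the rules it writes are exactly the ones from this post.

```python
from urllib.parse import urlsplit


def blogger_robots_txt(blog_url: str) -> str:
    """Build this post's robots.txt for the given blog address (hypothetical helper)."""
    parts = urlsplit(blog_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected a full address like https://www.yourblog.com, got {blog_url!r}")
    base = f"{parts.scheme}://{parts.netloc}"
    lines = [
        "User-agent: *",
        "Disallow: /search",
        "Allow: /",
        "",  # blank line separates the rule group from the Sitemap line
        f"Sitemap: {base}/sitemap.xml",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(blogger_robots_txt("https://www.yourblog.com"))
```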

    Explanation:
    • User-agent: * applies the rules that follow to every crawler, not only Google's. This includes Googlebot, Googlebot-Image, Googlebot-Video and Googlebot-News, as well as the crawlers of other search engines such as Bing and Yandex.
    • Allow: / allows search engines to crawl every URL on the website, except the URLs matched by a Disallow line in robots.txt.
    • Disallow: /search means that no URL under the /search directory (search results and label pages) will be crawled by search engines. On Blogger, the Disallow: /search line is highly recommended so that no duplicate titles and descriptions appear in search results.
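    You can check what these rules allow before saving them, using Python's standard urllib.robotparser module. This is a local sketch with the placeholder address www.yourblog.com and a hypothetical helper build_parser. Note that Python's parser applies the first matching rule in file order, while Google documents that it uses the most specific (longest) matching rule; for this particular file both give the same result. site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from this post, with the placeholder blog address.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /

Sitemap: https://www.yourblog.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    # parse() expects a list of lines; splitlines() keeps blank lines as "".
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in (
        "https://www.yourblog.com/2018/08/seo-seo-google.html",
        "https://www.yourblog.com/search/label/AMP",
        "https://www.yourblog.com/search?q=seo",
    ):
        print(parser.can_fetch("Googlebot", url), url)
    print(parser.site_maps())
```

    The article URL is allowed, while the label and search URLs are blocked by Disallow: /search.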

    Setting Custom Robots Header Tags

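    Custom robots header tags serve a similar purpose to robots.txt, but they are easier to set in the Blogger interface because you don't need to enter any code: you simply tick directives such as noindex or noarchive for each type of page.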
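    Conceptually, a header tag setting is a list of robots directives attached to a page type, which is then emitted as a robots directive for that page. The sketch below assumes the page groups home, archive and search pages, and posts and pages; the directive values are an arbitrary example, not necessarily the settings in the picture of this post, and the meta-tag output is just one way such directives can be expressed.

```python
# Example settings only (an assumption): not necessarily the recommended values.
HEADER_TAGS = {
    "home": ["all"],
    "archive_and_search": ["noindex"],
    "posts_and_pages": ["all"],
}


def robots_meta_tag(page_type: str) -> str:
    """Render the directives for a page type as an HTML robots meta tag (hypothetical helper)."""
    directives = HEADER_TAGS.get(page_type)
    if directives is None:
        raise KeyError(f"unknown page type: {page_type!r}")
    return f'<meta name="robots" content="{", ".join(directives)}">'


if __name__ == "__main__":
    for page_type in HEADER_TAGS:
        print(page_type, robots_meta_tag(page_type))
```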


    Please see the picture above for the recommended custom robots header tag settings.

    Please note that even a slight error in robots.txt can affect the search results of your blog. You can monitor search performance in Google Search Console.