How to technically have a live test website and avoid an SEO fiasco

Introduction

Whenever I create a new system, I have at least one test website. In order for others to be able to run tests on that site too, I make the site public. What that means is that Google is going to see the website and index it, whether you have a sitemap.xml or not.

The way to avoid problems, mainly duplicate content penalties, is to protect your site.

You have several solutions that may be more or less easy to implement. If you have full control of the server, it will certainly be easy. If you don't, it will depend on how much control you are given.

Meta Tag (noindex)

Most of the time people enter a meta tag in the <head> block.

The meta tag will look like this:

<meta name="robots" content="noindex,noarchive"/>

Some CMSs give you a way to add meta tags; others let you edit their theme, which includes access to the <head> block.
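As a sketch, assuming a hypothetical hand-written test page, the tag simply goes inside the <head> block:

```html
<!DOCTYPE html>
<html>
  <head>
    <!-- tell search engines not to index or cache this test page -->
    <meta name="robots" content="noindex,noarchive"/>
    <title>Test Page</title>
  </head>
  <body>
    <p>Test content...</p>
  </body>
</html>
```

In a CMS, the equivalent is usually a per-site or per-page setting, or an edit to the theme template that renders the <head> block.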

Apache2 Server (noindex)

When you have access to the Apache2 server, you may instead add an HTTP header. This is not more authoritative than the meta tag, but it makes it very easy to ensure that all pages carry at least those robots values.

The syntax in Apache2 is as follows:

Header set X-Robots-Tag noindex,noarchive

The name of the header to set is "X-Robots-Tag". It accepts the same values as the meta tag.

Google actually supports a few other options. One of them lets you define when the page should be removed from the index; some people do that after an event ends, or when an offer expires. Google will certainly check the page again before removing it, but it will be able to schedule a new check around the right time.

Header set X-Robots-Tag "unavailable_after: 31 Dec 2017 23:59:59 PST"

You can add the Header directive at pretty much any level (server wide, in a specific <VirtualHost>, or in a <Directory> block).
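As a sketch, assuming a test site served from a hypothetical test.example.com virtual host, the directive could look like this:

```apache
<VirtualHost *:80>
    ServerName test.example.com
    DocumentRoot /var/www/test

    # Keep the whole test site out of search engine indexes
    Header set X-Robots-Tag "noindex,noarchive"
</VirtualHost>
```

Note that the Header directive is provided by mod_headers, so that module must be enabled (on Debian/Ubuntu: a2enmod headers, then reload Apache).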

.htaccess file (noindex)

Many people have access to the .htaccess file. The Header directive can be used in the .htaccess file too.

The syntax is the same as in the Apache2 Server section above.
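A minimal sketch of such an .htaccess file, placed at the root of the test site (this assumes mod_headers is enabled and that AllowOverride permits FileInfo directives):

```apache
# .htaccess at the document root of the test site:
# mark every page as not to be indexed or archived
Header set X-Robots-Tag "noindex,noarchive"
```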

Apache2 Server (filter by IP)

Another solution, assuming that all your testers have a static IP address, is to add a restriction to Apache2. Users coming from a non-matching IP address will be given a 403 Forbidden error, including GoogleBot and other search engine robots.

It may be difficult to set up in your environment, in part because many employees work remotely and each has their own IP address at home or in a café, and those addresses may not be static or registered. It can also become a management problem; for example, when an employee leaves, you want to remove her IP address from the list.

In the old days we used the Deny and Allow (in that order) to do that:

Order deny,allow
Deny from all
Allow from 1.2.3.4

These directives were deprecated in Apache 2.4. The mod_access_compat module makes them work again, but using it is not recommended.

The new way uses a directive named Require. I think that directive is much easier to use (no strange negatives or order to follow).

Require ip 1.2.3.4

Replace the 1.2.3.4 with your static IP address.
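As a sketch, assuming two testers with hypothetical static addresses 1.2.3.4 and 5.6.7.8, you can allow several addresses by wrapping them in a <RequireAny> block inside the relevant <Directory>:

```apache
<Directory /var/www/test>
    # Only these static addresses may reach the test site;
    # everyone else, including search engine robots, gets a 403.
    <RequireAny>
        Require ip 1.2.3.4
        Require ip 5.6.7.8
    </RequireAny>
</Directory>
```

Require ip also accepts CIDR ranges (for example, Require ip 10.0.0.0/8), which is convenient when all your testers come from the same office network.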