Submitted by Alexis Wilke on Fri, 10/06/2017 - 16:17
Whenever I create a new system, I have at least one test website. In order for others to be able to run tests on that site as well, I make the site public. That means Google is going to see the website and index it, whether you have a sitemap.xml or not.
The way to avoid problems, mainly duplicate content, is to protect your site.
You have several solutions that may be more or less easy for you to implement. If you have full control of the server, it will certainly be easy. If you don't, it will depend on how much control you are given.
Most of the time people enter a meta tag in the <head> block.
The meta tag will look like this:
<meta name="robots" content="noindex,noarchive"/>
Some CMSes give you a way to add meta tags; others let you edit their theme, which includes access to the <head> tag.
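For example, in a theme template the tag ends up in the page header, something like this (a minimal sketch; the charset and title are just placeholders):
<head>
  <meta charset="utf-8"/>
  <title>Test Website</title>
  <!-- Tell search engines not to index or archive this page. -->
  <meta name="robots" content="noindex,noarchive"/>
</head>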
When you have access to the Apache2 server, you may instead add an HTTP header. This is not more authoritative than the meta tag; however, it makes it very easy to ensure that all pages carry at least those robots values.
The syntax in Apache2 is as follows:
Header set X-Robots-Tag "noindex,noarchive"
The name of the header to set is "X-Robots-Tag" and it accepts the same values as the meta tag.
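Note that the Header directive is provided by mod_headers, so that module has to be enabled. A quick sketch of the commands on a Debian/Ubuntu style installation (the URL is a placeholder for your test site):
# Enable mod_headers and reload Apache (Debian/Ubuntu layout assumed).
sudo a2enmod headers
sudo systemctl reload apache2

# Verify that the header is actually sent for a page of the test site.
curl -sI https://test.example.com/ | grep -i x-robots-tag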
Google actually supports a few other options. One of them lets you define when the page will be removed; some people do that after an event or when an offer expires. Google will certainly check the page again before removing it, but it will be able to schedule that check around the right time.
Header set X-Robots-Tag "unavailable_after: 31 Dec 2017 23:59:59 PST"
You can add the Header directive at pretty much any level: server wide, in a specific VirtualHost, or in a Directory block.
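For instance, here is a sketch of the directive in a VirtualHost and in a Directory block (the server name and paths are placeholders for your own test site):
<VirtualHost *:80>
    ServerName test.example.com
    DocumentRoot /var/www/testsite

    # Every response from this host tells robots not to index or archive it.
    Header set X-Robots-Tag "noindex,noarchive"
</VirtualHost>

# Or, to cover only one directory of an existing site:
<Directory /var/www/testsite>
    Header set X-Robots-Tag "noindex,noarchive"
</Directory>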
Many people have access to the .htaccess file. The Header directive can be used in the .htaccess file too.
The syntax is the same as in the Apache2 Server section above.
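As a sketch, the whole .htaccess file can be as small as this (it only takes effect if the directory's AllowOverride includes FileInfo):
# .htaccess at the root of the test site
Header set X-Robots-Tag "noindex,noarchive"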
Another solution, assuming that all your testers have a static IP address, is to add a restriction to Apache2. Users whose IP address is not on the list will be given a 403 Forbidden error, including Googlebot and other search engine robots.
It may be difficult to set up in your environment, in part because many employees work remotely and each has their own IP address at home or in a cafe, and those addresses may not be static or not registered. It can also become a management problem: for example, when an employee leaves, you want to remove her IP address from the list.
In the old days we used the Order, Deny, and Allow directives to do that:
Order deny,allow
Deny from all
Allow from 1.2.3.4
These directives were removed in Apache 2.4. The mod_access_compat module can bring them back, but using it is not recommended.
The new way uses a directive named Require. I think that directive is much easier to use (no strange negatives or ordering rules to follow).
Require ip 1.2.3.4
Replace the 1.2.3.4 with your static IP address.
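If you have several testers, you can list each address inside a RequireAny block. A sketch, with a hypothetical second address and directory path:
<Directory /var/www/testsite>
    # Only these addresses can reach the test site; everyone else,
    # including search engine robots, gets a 403 Forbidden error.
    <RequireAny>
        Require ip 1.2.3.4
        Require ip 5.6.7.8
    </RequireAny>
</Directory>

Require ip also accepts network ranges (for example, Require ip 10.0.0.0/8) if all your testers come from the same network.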