Document status:
This is version 1.1 and includes corrections and suggestions from Michael Radwin, to whom I am most grateful. Any further feedback, whether positive, negative or indifferent, is welcome.
Phil Archer
Internet Content Rating Association
Brighton, UK
19th April 2002
In order to label a site effectively, each and every page on the site needs to carry a label... which is wholly impractical for the majority of professionally run, commercial sites. There are essentially three options:
This document sets out the technical detail of the most effective, efficient and flexible way we know to label a large website or network of properties. It details how to set ratings labels to be sent with content at all levels from "default" which would be applied to all content served from a given physical server through to individual page level via domain names, branches and file groups.
If Apache servers are used (the majority of websites use this software), it is possible to put in place rules such as: any URL that includes the word chat should have this label while any page that ends in -pg should have this label. These labels can even be maintained and stored in a separate file if required.
The other important server - Microsoft IIS - provides a great deal of flexibility too.
Whilst this document gives technical detail, the policy implications are explained in each case.
The key concept behind labelling websites is pretty straightforward - content is delivered to the client with a set of encoded descriptors which filtering software can block or allow, depending on parental settings. Sounds like a censor's dream? No - and here's why:
In order for content to be labelled, the rating must either be delivered with the content, or already be in the filter's cache. If a filter application is to store a label in cache, extra data needs to be sent telling the filter which URLs are covered by the label. Configuring your server to include PICS labels in the HTTP header means that no such extra data need be sent - just send the rating with each file and that's it.
When the PICS standard was developed by the W3C, it was assumed that this would be the method by which labels were delivered. Meta tags are a workaround for people who don't have access to their server configurations.
The information below is enough (just) for you to simply write an ICRA rating label - and ICRA doesn't mind a bit if you do that. However, there are easier ways and, from our point of view, preferable ways.
At the time of writing the ICRA website includes a labelling engine which interfaces directly with our database. Alternatively, there is what we call the Offline Form - a self-contained HTML/JavaScript tool which you can use to bash out ICRA labels and HTML meta tags at will. Further tools, including variations of the above will be available in the (very) near future.
At present, these tools both turn out meta tags from which you will need to extract the key elements for use in HTTP headers. The key thing is that they include the rating questionnaire - tick the boxes, hit go and out comes your label.
Use of ICRA labels, whether generated automatically or written by hand, is subject to the organization's terms and conditions, which are available on the ICRA website.
A basic PICS label takes the following form:
(pics-1.1 "RATING SERVICE URL" l r (RATING))
The elements here are:
pics-1.1: Defines which version of PICS we're using.
RATING SERVICE URL: A quoted URL that is always in double quotes (which plays merry havoc with the web authoring tools but never mind). As it is a URL, it serves as a unique identifier for the rating service as well as being a location from which information about the service can be obtained. In ICRA's case, the rating service URL is https://icra.org/ratingsv02.html.
l: This is a lower case "L" and is short for labels (optionally you can write the word labels in full). This declares the beginning of the label or list of labels that follow, all of which use the defined rating service.
r: Short for ratings (which optionally you can write in full). This is the actual rating according to the rating service.
Which leads us to our first example complete ICRA label:
'(pics-1.1 "https://icra.org/ratingsv02.html" l r (cz 1 lz 1 nz 1 oz 1 vz 1))'
The ratings shown in this example are ICRA-code for "none of the above" in all categories. So this label is making a positive statement that the site contains:
The codes used in the ratings are explained on the ICRA website at https://icra.org/decode.html.
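As a rough illustration of the label structure described above, the following Python sketch assembles a basic label string from a list of rating codes. The function name build_label is hypothetical and is not part of any ICRA tool; the service URL and the codes are the ones used in this document.

```python
ICRA_SERVICE = "https://icra.org/ratingsv02.html"


def build_label(service_url, ratings):
    """Assemble a basic PICS-1.1 label for one rating service.

    ratings is a list of (code, value) pairs, e.g. [("cz", 1), ("lz", 1)].
    """
    rating_text = " ".join(f"{code} {value}" for code, value in ratings)
    # The service URL goes in double quotes; l and r are the short
    # forms of "labels" and "ratings".
    return f'(pics-1.1 "{service_url}" l r ({rating_text}))'


none_of_the_above = [("cz", 1), ("lz", 1), ("nz", 1), ("oz", 1), ("vz", 1)]
label = build_label(ICRA_SERVICE, none_of_the_above)
print(label)
```

The printed string has the same shape as the example label given earlier, minus the outer single quotes that are only needed when the label is embedded in a meta tag or a server config file.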
As mentioned earlier, if labels are used within an HTML meta tag, additional information is added to control how filtering applications should cache and apply those labels to content which is not individually labelled.
ICRA grew out of the old RSACi rating system, which at the time of writing is still the default service embedded within Microsoft Internet Explorer's Content Advisor and is included in other older software (NetWatch in Netscape 4.x, NetNanny and CyberPatrol). It is therefore advisable, but not essential, to include RSACi elements too. Such a combined label looks like this:
'(pics-1.1 "https://icra.org/ratingsv02.html" l r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l r (n 0 s 0 v 0 l 0))'
Again, the RSACi ratings, which are scalar and subjective in nature, are detailed at https://icra.org/decode.html. It is perfectly possible to extend these labels to include elements from other PICS ratings services if you wish.
Just to make this explicit - if the label generator you are using doesn't have the option to generate just the string you need for use in HTTP headers, you'll get a meta tag that looks like the one below. For HTTP response headers you need only the value of the content attribute, minus each gen true for "hostname" clause.
<meta http-equiv="pics-label" content='(pics-1.1 "https://icra.org/ratingsv02.html" l gen true for "hostname" r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l gen true for "hostname" r (n 0 s 0 v 0 l 0))'>
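The conversion from meta tag to header value can be sketched in Python. This assumes, based on comparing the meta tag above with the header examples in this document, that the only parts to remove are the gen true for "hostname" clauses. The function name header_value_from_meta is hypothetical.

```python
import re


def header_value_from_meta(meta_tag):
    """Return the label string for an HTTP header from a pics-label meta tag."""
    # The label is the single-quoted value of the content attribute.
    match = re.search(r"content='([^']*)'", meta_tag)
    if match is None:
        raise ValueError("no single-quoted content attribute found")
    label = match.group(1)
    # Drop the caching clauses that are only added for meta tags.
    return re.sub(r'\s+gen\s+true\s+for\s+"[^"]*"', "", label)


meta = ('<meta http-equiv="pics-label" content=\'(pics-1.1 '
        '"https://icra.org/ratingsv02.html" l gen true for "hostname" '
        'r (cz 1 lz 1 nz 1 oz 1 vz 1) '
        '"http://www.rsac.org/ratingsv01.html" l gen true for "hostname" '
        'r (n 0 s 0 v 0 l 0))\'>')
print(header_value_from_meta(meta))
```

The result still needs to be wrapped in single quotes when it is pasted into a server configuration file.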
The following explanation assumes you have at least a grounding in Apache configuration.
Header set pics-label: '(pics-1.1 "https://icra.org/ratingsv02.html" l r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l r (n 0 s 0 v 0 l 0))'
Put this in your config file outside any block directive and the job's done. Every file served will include this label in its HTTP header.
The elements in this are as follows:
Header set pics-label: Fairly self-explanatory - this tells Apache to set the value of the pics-label header to the following value. By using set, as I recommend in all cases, rather than append or add, any previously set label is overwritten.
'(pics-1.1 "http://www...)' The label itself, or as far as Apache is concerned, the value of the pics-label header. Notice that it is enclosed in single quotes. You must use single and double quotes as shown here. Unusually for coding, PICS does not permit you to swap their usage.
HTTP Response Headers can be set within the following block directives:
<none> i.e. act as a default
<Directory> and <DirectoryMatch>
<Files> and <FilesMatch>
<Location> and <LocationMatch>
These block directives support wildcards - that is, "?" to match a single character and "*" to match any number of characters; as well as Regular Expressions for detailed pattern matching. Only <Files> and <FilesMatch> can be set within a .htaccess file. We'll return to these issues shortly.
HTTP Response Headers cannot be set in a <VirtualHost> block directive.
The order of the above list is important. <Directory> is overridden by <Files> is overridden by <Location>.
For full details of block directives, please consult the official Apache documentation, in particular http://httpd.apache.org/docs/sections.html.
The key thing about all this of course is that you can apply different labels to different sections of your content. As the <VirtualHost> directive does not support HTTP Response Headers, the way to label a given website on a server is to apply a <Directory> block directive thus:
<Directory dir>
Header set pics-label: '(pics-1.1 "https://icra.org/ratingsv02.html" l r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l r (n 0 s 0 v 0 l 0))'
</Directory>
To label a whole website, dir should be the absolute path to the website's root directory on the server.
The same block directive can be used to label a particular section of a website if all its files are stored in a given directory - just set up another <Directory> block directive with dir set as appropriate. As a facetious example, you might want to label www.animals.com/birds/ differently from www.animals.com/insects/.
Apache processes <Directory> block directives in increasing order of the number of path elements, so <Directory "D:/root/website1"> is processed before <Directory "D:/root/website1/section">. Therefore, the label you intend to apply to the section directory will correctly overwrite the previous one.
<Files> and <Location> block directives are processed in the order in which they appear in the config file.
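The <Directory> processing order described above can be mimicked in a few lines of Python. This is only a sketch of the "shortest path first, last set wins" idea, not Apache's actual implementation; the function name, the POSIX-style paths and the placeholder label names are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def effective_label(directory_rules, file_dir):
    """Return the label that wins for a file stored in file_dir.

    directory_rules maps a directory path to a label. Rules are applied
    from the fewest path components to the most, and each later match
    overwrites the earlier one, mimicking Header set.
    """
    target = PurePosixPath(file_dir)
    ordered = sorted(directory_rules,
                     key=lambda d: len(PurePosixPath(d).parts))
    label = None
    for rule_dir in ordered:
        rule_path = PurePosixPath(rule_dir)
        if target == rule_path or rule_path in target.parents:
            label = directory_rules[rule_dir]
    return label


rules = {
    "/root/website1/section": "section label",
    "/root/website1": "site label",
}
print(effective_label(rules, "/root/website1/section/news"))
print(effective_label(rules, "/root/website1/about"))
```

Even though the section rule is listed first in the dictionary, the site-wide rule is applied first and the section rule overwrites it for anything inside the section directory.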
For our purposes, the <Files> block directive is just a logical extension of the <Directory> block directive. As an example, imagine you had a site which should carry rating A, but that your index page, uniquely, should carry rating B. This would take care of it:
<Files index.html>
Header set pics-label: '(pics-1.1 "https://icra.org/ratingsv02.html" l r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l r (n 0 s 0 v 0 l 0))'
</Files>
Notice that the <Files> block directive takes a relative path (to DocumentRoot) not an absolute one.
Depending on your situation, <Location> is perhaps the easiest block directive to use since it takes a URL as its argument rather than filenames and paths on your server. Labelling a website called www.testsite.com becomes:
<Location www.testsite.com/>
Header set pics-label: '(pics-1.1 "https://icra.org/ratingsv02.html" l r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l r (n 0 s 0 v 0 l 0))'
</Location>
The examples so far have all been very specific. Apache block directives, however, are far more flexible than we have hitherto discussed. This works very much to our advantage in terms of labelling.
For example, the ICRA labelling matrix includes a section on chat. ca 1 codes for unmoderated chat (or message boards), cb 1 codes for moderated chat and cz 1 declares that there are no chat facilities or message boards. So you might have a default label for most of your site that declares cz 1, but you might also have a full-blown chat facility and the chances are that all the relevant URLs have the word chat in there somewhere. So use a wildcard like this:
<Location *chat*>
Header set pics-label: '(pics-1.1 "https://icra.org/ratingsv02.html" l r (ca 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l r (n 0 s 0 v 0 l 0))'
</Location>
With that in place, no matter how many times the pages are updated, improved and added to by the webmaster team, the chat areas will carry this label.
The danger here, of course, is that any URL that includes chat as four consecutive characters will carry this label. Bad news for a site about the Chattanooga Choo Choo.
This is where some interplay between different people in your organization becomes important! If the <Location *chat*> block directive above were amended simply to include a forward slash after the word chat, thus: <Location *chat/*>, then only content whose URLs included a path which at some point had chat immediately before a forward slash would carry this label.
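The difference between the two wildcard patterns can be tried out in Python before touching the server. This is a sketch using Python's own fnmatch module, in which "*" also matches "/"; Apache's wildcard matching may differ in detail between versions, so treat the result as indicative only. The sample URLs are invented for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical URL paths, chosen to show the over-matching problem.
urls = [
    "/chat/room1.html",
    "/community/chat/index.html",
    "/music/chattanooga-choo-choo.html",
]

for pattern in ("*chat*", "*chat/*"):
    matched = [url for url in urls if fnmatchcase(url, pattern)]
    print(pattern, matched)
```

With these sample paths the first pattern catches all three URLs, while the second leaves the Choo Choo page alone.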
You can label all the content in the cats, dogs and any other site beginning with "a" through "m" with a block directive like this:
<DirectoryMatch /[a-m].*>
Meanwhile the warthogs, zebras and other latter end of the alphabet wildlife would be taken care of by this block directive:
<DirectoryMatch /[n-z].*>
(You've seen enough PICS labels now, these are just the opening tags for the block directive!)
Using wildcards or regular expressions, it is possible to establish your own easy-rating system by simply naming files in a pre-defined way. For example, you might want to divide the content on your site into age-based categories. You may decide, for example, that some content on your site should carry a "PG" rating or a "12" rating. OK - set up these two <Files> directives:
<Files *-pg.*>
and
<Files *-12.*>
Now any file on your site which has -pg. immediately before the file extension will carry your PG rating, any file with -12. immediately before the file extension will have a 12 rating. Any file with neither string immediately before the file extension would carry the default label (if you set one).
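To check a naming scheme like this against a list of real filenames, a small Python sketch can help. It uses Python's fnmatch module as a stand-in for Apache's wildcard matching, and it assumes that rules are applied in order with a later match overwriting an earlier one, as with Header set. The label names are placeholders for full PICS label strings and the function name is hypothetical.

```python
from fnmatch import fnmatchcase

# Applied in order; the label names stand in for full PICS label strings.
FILE_RULES = [
    ("*-pg.*", "PG label"),
    ("*-12.*", "12 label"),
]


def label_for_file(filename, default="default label"):
    """Return the label a file would carry under the naming scheme."""
    label = default
    for pattern, rule_label in FILE_RULES:
        if fnmatchcase(filename, pattern):
            label = rule_label  # a later match overwrites an earlier one
    return label


for name in ("trailer-pg.html", "review-12.html", "index.html"):
    print(name, "->", label_for_file(name))
```

Running a directory listing through a function like this is a cheap way to spot files that fall through to the default label by accident.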
It is possible to add, delete or amend PICS labels on web content without stopping and restarting the server by setting HTTP Response Headers in a .htaccess file.
NB. Only the <Files> and <FilesMatch> block directives can be used in .htaccess files, not <Directory> or <Location>.
The pros and cons of using a .htaccess file are well understood (flexibility vs. server load). For our purposes here it is probably most applicable as a mechanism for labelling ephemeral content. However, the suggestion outlined below may be of interest to geographically diverse organizations and networks.
You might consider setting up a secondary .htaccess file specifically to handle the labels. Apache supports multiple .htaccess files so one option might be to include a configuration like this:
AccessFileName .htaccess .filename
The .htaccess file would contain whatever you put in your .htaccess file now with the separate .filename file just used for labelling.
In tests, I used the following <Files> directives:
<Files *-pg.*>
Header set pics-label: '(pics-1.1 "https://icra.org/ratingsv02.html" l r (cz 1 lb 1 nz 1 oz 1 vz 0) "http://www.rsac.org/ratingsv01.html" l r (n 0 s 0 v 0 l 0))'
</Files>
<Files *-12.*>
Header set pics-label: '(pics-1.1 "https://icra.org/ratingsv02.html" l r (cb 1 lb 1 lc 1 nz 0 oz 1 vz 0) "http://www.rsac.org/ratingsv01.html" l r (n 0 s 0 v 0 l 0))'
</Files>
Initially, these were tested with one block directive in each of 2 separate files: .htaccess and another file that I called .picslables (the name is not significant) - and it failed. Only the block directive in whichever file was declared second in the AccessFileName declaration in the config file worked. However, putting both block directives in a single file worked perfectly, whether this was declared first or second in the AccessFileName list.
The policy/organizational implication here is that this method makes it possible for a member of staff to maintain a labels file as a separate entity. Give that member of staff FTP access to the relevant directory on your server and s/he can take care of the whole job remotely.
Microsoft has made configuring IIS to include PICS labels very easy. The header information is set in the HTTP Headers property page using the Custom HTTP Headers function. IIS uses a hierarchical architecture with the HTTP Headers property page being configurable at the following levels:
To set the HTTP Header properties, select the required level, right click and select properties, then select the HTTP Headers property page. The screen shot below shows the HTTP Headers property page for the default website. As shown, an e-mail address and content expiry date can also be sent within the HTTP Header (these are unrelated to PICS labels).
Please do not use the [Edit Ratings] function. If you add the ICRA .rat file (the file that defines the ICRA rating system within the PICS standard) to the System32 folder, then you can see the ICRA ratings in the relevant dialogue. But IIS makes a mess of things by using the old RSACi identifier and writing in a whole jumble of a label which, not surprisingly, the filters can't make sense of. So please, just stick to the custom headers.
Click the Add button, enter pics-label in the Custom Header Name field and the label itself in the Custom Header Value field to give you something like this:
And that's it. If you have a dedicated server for your site and you can legitimately apply the same rating to every page and you use IIS - this one addition will label the whole site - without a meta tag in sight.
You can apply labels to directories and specific pages by going through the same process as required (just right click on the relevant directory or file). However, some of the "nice touches" that Apache offers, such as maintaining and storing the labels in a separate file, are not available with IIS.
The only practical way to test how effective your labelling has been is to browse the site with a filter activated. MSIE Content Advisor will do the job admirably but you will need to install the ICRA .rat file first. Alternatively, you can use our ICRAfilter freeware.
To actually see your HTTP headers, you can either use telnet, or a tool such as DJ Delorie's HTTP Header viewer.
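If neither of those is to hand, a few lines of Python using the standard http.client module will also show the header. This is a minimal sketch: the function names are hypothetical, and any host name you pass in is your own choice, not one known to send labels.

```python
import http.client


def pics_label_of(response_headers):
    """Pick the pics-label value out of a list of (name, value) pairs."""
    for name, value in response_headers:
        # Header names are case-insensitive.
        if name.lower() == "pics-label":
            return value
    return None


def fetch_headers(host, path="/", port=80):
    """Send a HEAD request and return the response headers as pairs."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().getheaders()
    finally:
        conn.close()
```

Calling pics_label_of(fetch_headers("www.example.com")), with www.example.com replaced by your own host, returns the label string if the server sends one and None otherwise.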