ICRA Labelling System Specification
Target readership
This document is intended to be the definitive source of advice on labelling content using the ICRA system. Whilst every attempt has been made to present the information clearly, a large amount of detail is included, along with references to a variety of web technologies.
ICRA publishes shorter and simplified explanations for users with little or no experience of website creation, or those who don't need the level of detail provided here, on its support pages [1].
Version 1.0.3
January 2006
Contents
1 Introduction
1.1 Namespaces and documentation
2 The key concepts
2.1 Linking to specific labels (server-side processing)
2.2 Linking to a set of rules (client-side processing)
2.3 Creating the RDF instance
3 The content of the RDF instance
4 Setting the MIME type
5 Methods for inserting the tags
5.1 Is it important to include the tag on every page?
5.2 Apache server configuration
5.3 Microsoft IIS server configuration
6 The Ruleset in Detail
7 Global defaults, local overrides
8 Processing ICRA labels
8.1 Before retrieving the resource
8.2 Identifying the correct label
8.3 Unlabelled resources
9 Showing users that a site is ICRA labelled
10 Working with other RDF schemas
10.1 Management information
10.2 Classification
11 Frequency modifiers
12 Some tips
12.1 It's just RDF
12.2 Managing labels for multiple websites
12.3 A specific label for a specific page
13 Testing the labels
14 Change log
1 Introduction
ICRA exists to help users to find what they want, to trust what they find and to avoid content that they regard as inappropriate for themselves or their children. A vocabulary is provided that can be used to describe any and all digital content in a manner that reflects a broad range of parental concerns around the world [2]. The underlying system can, however, carry any kind of metadata for any purpose.
The descriptions are machine-understandable and may be used by a variety of agents such as filters, search engines and helper applications that display extra information for users.
ICRA labels are encoded in RDF [3], one of the key technologies behind the Semantic Web [4]. This document does not set out the many advantages to content providers afforded by the Semantic Web except to note that features such as RSS, shared bookmarks, blogs and wikis are among its contributory elements.
Note: ICRA also offers a simplified PICS version of the label along with the Link tag in order to support legacy systems, notably Internet Explorer's Content Advisor. This is covered in a separate document.
1.1 Namespaces and documentation
The namespace of the RDF schema that provides the framework for ICRA labels is http://www.w3.org/2004/12/q/contentlabel# and the recommended QName is label. The relevant documentation is at http://www.w3.org/2004/12/q/doc/content-labels-schema.htm.
The namespace for the ICRA vocabulary is https://icra.org/rdfs/vocabularyv03# and the recommended QName is icra. The plain text version of the ICRA vocabulary and its supplementary definitions is at https://icra.org/vocabulary.
2 The key concepts
A Content Label is a description, i.e. a set of metadata, that can be applied to multiple resources. One or more labels are placed in a file and resources link to it using either an (X)HTML Link tag or an HTTP Response Header.
The file containing the labels is an RDF instance and is usually called labels.rdf. This is the name of the file created by the ICRA label generator (see section 2.3), although it is not significant and can be changed to anything.
Resources may link to a specific label or may link to a data set that allows clients to match the resource's URI against a series of rules that resolve to give the correct label.
Content providers can thus choose whether the association of a resource with its label is undertaken client side or server side.
2.1 Linking to specific labels (server-side processing)
Figure 1 Server-side association of content with labels
Figure 1 shows each resource linked to a specific label. If the RDF instance is called labels.rdf and is located in the root of the website, and if a resource should be linked to a label with the ID "label_1", then the Link tag is as shown in Example 1:
<link rel="meta" href="/labels.rdf#label_1" type="application/rdf+xml" title="ICRA labels" />
Example 1 A typical link tag that associates a specific label with a resource
The equivalent HTTP Response Header is:
Link: </labels.rdf#label_1>; /="/"; rel="meta" type="application/rdf+xml"; title="ICRA labels";
Example 2 The HTTP Response Header equivalent of Example 1
Note that the specific label within the RDF instance is identified by the URL fragment given in the href attribute. The title attribute is optional but is recommended for clarity. The location of the labels.rdf file is unimportant. It can be at any location on any server, but, of course, its location must be given in the href attribute.
2.2 Linking to a set of rules (client-side processing)
Figure 2 shows the alternative approach. All resources are linked to the RDF instance but the link does not identify the label. Instead, the RDF instance defines a default label and may then also define a sequence of rules, based on Perl 5 regular expressions, that can override that default. The first rule in the sequence to be satisfied identifies the correct label.
Figure 2 Client-side association of resources and labels
If the RDF instance is called labels.rdf and is located in the root of the website, then the Link tag is as shown in Example 3:
<link rel="meta" href="/labels.rdf" type="application/rdf+xml" title="ICRA labels" />
Example 3 Typical tag linking a resource with an RDF instance that contains rules that identify the correct label.
The equivalent HTTP Response Header is:
Link: </labels.rdf>; /="/"; rel="meta" type="application/rdf+xml"; title="ICRA labels";
Example 4 The HTTP Response Header equivalent of Example 3.
As with Example 1, the location and name of the RDF instance are not significant.
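The difference between the two kinds of link can be seen by resolving the href against the page URL. The following is a minimal sketch, not part of the ICRA tooling: the function name resolve_label_link and the page URL are illustrative assumptions. A fragment in the href means a specific label is named (section 2.1); no fragment means the client is expected to apply the rules in the RDF instance (section 2.2).

```python
from urllib.parse import urldefrag, urljoin


def resolve_label_link(page_url, href):
    """Split a Link href into (RDF instance URL, label ID or None).

    The href is resolved relative to the page that carries the link.
    A fragment identifies a specific label; no fragment means the
    RDF instance must be consulted for a default label and rules.
    """
    absolute = urljoin(page_url, href)
    instance_url, fragment = urldefrag(absolute)
    return instance_url, (fragment or None)


page = "http://www.example.org/photography/index.html"  # hypothetical page
print(resolve_label_link(page, "/labels.rdf#label_1"))
# ('http://www.example.org/labels.rdf', 'label_1')
print(resolve_label_link(page, "/labels.rdf"))
# ('http://www.example.org/labels.rdf', None)
```

Because the href is resolved like any other URL, the same function handles an RDF instance hosted on a different server.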
2.3 Creating the RDF instance
ICRA provides a tool on its website for creating the RDF instance and the necessary tags, known as the label generator [5]. It is designed to be used by those with little or no knowledge of web authoring techniques as well as more advanced users. The label generator builds the RDF instance based on the client-side processing model described above (section 2.2), although it is equally valid for the server-side model.
3 The content of the RDF instance
The RDF instance must define one or more labels. More specifically, it must define at least one instance of the RDF class Content Label as defined by http://www.w3.org/2004/12/q/contentlabel#ContentLabel.
NB. RDF Content Labels may contain statements from any RDF schema; however this document is concerned solely with ICRA's implementation.
The RDF instance can further define zero or more of the following:
- The host(s) for which the label(s) are applicable. Sub-domains are in scope.
- An additional string that must match the resource's URI for any labels in the RDF instance to be applicable.
- The default label.
- An ordered sequence of rules that should be matched against a resource's URI. If a rule is satisfied, it must provide a label that overrides any default.
- A description of the RDF instance itself that identifies where additional information about the label can be found, including how its veracity can be assessed.
These elements are explained in detail with reference to Example 5. Like all examples in this document and others produced by ICRA, the RDF is serialized in XML. However, this is not a requirement; other serializations, such as N3 [6], are equally valid.
<?xml version="1.0"?>
<!-- Section 1 -->
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:label="http://www.w3.org/2004/12/q/contentlabel#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:icra="https://icra.org/rdfs/vocabularyv03#">

  <!-- Section 2 -->
  <rdf:Description rdf:about="">
    <dc:creator rdf:resource="https://icra.org" />
    <label:authorityFor>https://icra.org/rdfs/vocabularyv03#</label:authorityFor>
  </rdf:Description>

  <!-- Section 3 -->
  <label:Ruleset>
    <label:hasHostRestrictions>
      <label:Hosts>
        <label:hostRestriction>example.org</label:hostRestriction>
        <label:hostRestriction>example.com</label:hostRestriction>
      </label:Hosts>
    </label:hasHostRestrictions>
    <label:hasDefaultLabel rdf:resource="#label_1" />

    <!-- Section 4 -->
    <label:rules rdf:parseType="Collection">
      <rdf:Description>
        <label:hasURI>photography</label:hasURI>
        <label:hasLabel rdf:resource="#label_2"/>
      </rdf:Description>
      <label:UnionOf>
        <label:hasURI>guestbook</label:hasURI>
        <label:hasURI>messages</label:hasURI>
        <label:hasLabel rdf:resource="#label_3" />
      </label:UnionOf>
    </label:rules>
  </label:Ruleset>

  <!-- Section 5 -->
  <label:ContentLabel rdf:ID="label_1">
    <rdfs:comment>Label for all/most of website</rdfs:comment>
    <rdfs:label>No nudity, no sexual content, no violence, no potentially offensive language, no potentially harmful activities, no user-generated content</rdfs:label>
    <icra:nz>1</icra:nz>
    <icra:sz>1</icra:sz>
    <icra:vz>1</icra:vz>
    <icra:lz>1</icra:lz>
    <icra:oz>1</icra:oz>
    <icra:cz>1</icra:cz>
  </label:ContentLabel>

  <label:ContentLabel rdf:ID="label_2">
    <rdfs:comment>Label for photography section</rdfs:comment>
    <rdfs:label>Exposed breasts, Bare buttocks, No sexual content, no violence, no potentially offensive language, no potentially harmful activities, no user-generated content, This material appears in an artistic context</rdfs:label>
    <icra:na>1</icra:na>
    <icra:nb>1</icra:nb>
    <icra:sz>1</icra:sz>
    <icra:vz>1</icra:vz>
    <icra:lz>1</icra:lz>
    <icra:oz>1</icra:oz>
    <icra:cz>1</icra:cz>
    <label:hasModifier><icra:xa /></label:hasModifier>
  </label:ContentLabel>

  <label:ContentLabel rdf:ID="label_3">
    <rdfs:comment>Label for guestbook and message board</rdfs:comment>
    <rdfs:label>No nudity, no sexual content, no violence, no potentially offensive language, no potentially harmful activities, user-generated content (moderated)</rdfs:label>
    <icra:nz>1</icra:nz>
    <icra:sz>1</icra:sz>
    <icra:vz>1</icra:vz>
    <icra:lz>1</icra:lz>
    <icra:oz>1</icra:oz>
    <icra:ca>1</icra:ca>
  </label:ContentLabel>
</rdf:RDF>
Example 5 An example RDF instance containing ICRA labels
3.1.1 Section 1
The namespaces are declared. The QNames label and icra are recommended for their respective namespaces.
3.1.2 Section 2
This short section declares that the labels were created by ICRA and that further information is available at icra.org. Since it is possible to include descriptions based on other schemas, this section specifies that icra.org only has information about the ICRA namespace.
3.1.3 Section 3
This section declares the hosts for which the data is valid. In this instance, we have declared that the labels can be applied to both example.org and example.com. Subdomains are covered, for example, www.example.org, sub.example.com etc.
This section also declares that the default Content Label for material on those hosts is "label_1" (see 3.1.5).
If labels are to be restricted to a particular area of the example.org and example.com hosts, this would be included thus:
<label:hasURI>foo</label:hasURI>
Labels in this RDF instance would then only be in scope for resources with URIs on the example.org or example.com hosts that also contain 'foo'. This feature is included primarily for ISPs who offer personal web space with URLs like www.example.org/username. If more than one hasURI property is included, a URI is in scope if any one of them matches.
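The scoping test described in this section can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the function names are hypothetical, an absent host restriction is treated as "no restriction" (see section 12.2), and the hasURI values are matched with Python's re module, which is close to but not identical to Perl 5 regular expressions.

```python
import re
from urllib.parse import urlsplit


def host_in_scope(host, restrictions):
    """True if host equals a restriction or is a sub-domain of one."""
    host = host.lower()
    return any(host == r.lower() or host.endswith("." + r.lower())
               for r in restrictions)


def labels_in_scope(url, restrictions, uri_patterns=()):
    """Apply host restrictions, then any Ruleset-level hasURI values."""
    host = urlsplit(url).hostname or ""
    if restrictions and not host_in_scope(host, restrictions):
        return False
    # With several hasURI values, a single match is enough.
    if uri_patterns and not any(re.search(p, url) for p in uri_patterns):
        return False
    return True


hosts = ["example.org", "example.com"]
print(labels_in_scope("http://www.example.org/foo/index.html", hosts, ["foo"]))  # True
print(labels_in_scope("http://www.example.org/bar/index.html", hosts, ["foo"]))  # False
print(labels_in_scope("http://badexample.org/foo/", hosts, ["foo"]))             # False
```

The sub-domain test compares against "." plus the restricted host so that badexample.org is not mistaken for a sub-domain of example.org.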
3.1.4 Section 4
The rules that determine where the default label should be overridden by another label are declared next. In this example, everything in the 'photography' section of both example.com and example.org will be associated with "label_2", and everything with either the word guestbook or messages in the URL will be associated with "label_3". Otherwise, the default applies.
Matching is done using Perl 5 regular expressions [7] so that if a rule should apply to "all URLs ending in .jpg" then this would appear as \.jpg$.
The use of rdf:parseType="Collection" ensures that rules are processed in order. The first rule to be satisfied is the one that is used, and processing stops at that point.
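The ordered, first-match-wins processing of the rules in Example 5 can be sketched as below. The data structure and the name identify_label are assumptions made for illustration; Python's re module is used as a stand-in for Perl 5 regular expressions, and the two differ in some advanced features.

```python
import re

DEFAULT_LABEL = "label_1"

# Ordered rules from Example 5: (alternative patterns, label ID).
# A rule with several patterns models label:UnionOf (any one may match).
RULES = [
    (["photography"], "label_2"),
    (["guestbook", "messages"], "label_3"),
]


def identify_label(url, rules=RULES, default=DEFAULT_LABEL):
    """Return the label of the first satisfied rule, else the default."""
    for patterns, label_id in rules:
        if any(re.search(p, url) for p in patterns):
            return label_id  # processing stops at the first match
    return default


print(identify_label("http://example.org/photography/1.html"))  # label_2
print(identify_label("http://example.com/messages/today"))      # label_3
print(identify_label("http://example.org/about.html"))          # label_1
```

A URL such as http://example.com/guestbook/photography satisfies both rules, and because order matters it resolves to "label_2".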
3.1.5 Section 5
Finally the labels themselves are defined. In the example, "label_2" declares that there are exposed breasts, bare buttocks, and that the material appears in an artistic context. "label_3" declares that there is moderated user-generated content, and "label_1" states "none of the above" in all categories of the ICRA vocabulary.
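As an illustration of how a client might read the descriptors, the sketch below extracts the ICRA codes from each Content Label using Python's standard XML parser. It is not a general RDF parser: it assumes the XML serialization and layout of Example 5, the function name read_labels is hypothetical, and it reads only direct child descriptors (modifiers nested inside label:hasModifier are ignored).

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LABEL = "http://www.w3.org/2004/12/q/contentlabel#"
ICRA = "https://icra.org/rdfs/vocabularyv03#"

# A cut-down copy of "label_3" from Example 5.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:label="http://www.w3.org/2004/12/q/contentlabel#"
         xmlns:icra="https://icra.org/rdfs/vocabularyv03#">
  <label:ContentLabel rdf:ID="label_3">
    <icra:nz>1</icra:nz>
    <icra:ca>1</icra:ca>
  </label:ContentLabel>
</rdf:RDF>"""


def read_labels(rdf_xml):
    """Map each label's rdf:ID to a sorted list of its ICRA codes."""
    root = ET.fromstring(rdf_xml)
    prefix = "{%s}" % ICRA
    labels = {}
    for node in root.iter("{%s}ContentLabel" % LABEL):
        label_id = node.get("{%s}ID" % RDF)
        codes = [child.tag[len(prefix):] for child in node
                 if child.tag.startswith(prefix)]
        labels[label_id] = sorted(codes)
    return labels


print(read_labels(SAMPLE))  # {'label_3': ['ca', 'nz']}
```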
4 Setting the MIME type
The correct MIME type for RDF instances is application/rdf+xml [8]. Your server may not support this by default [9]. If this is the case you'll need to do one of the following:
- Ideally, add the MIME type application/rdf+xml, usually associated with file extension .rdf.
- If you are unable to do this, try changing the name of the RDF instance to labels.xml. The XML MIME type (application/xml) is an acceptable alternative and is more widely included in default server configuration.
- Some servers may offer text/xml as a MIME type for files with the .xml extension. This is unlikely to cause problems for clients looking for ICRA labels but should not be used if you're including ICRA labels in a more sophisticated data set such as a database, or if the character set is not iso-8859-1 (Latin-1).
If none of these options is followed, your server may use a default MIME type such as text/plain. In this situation a client may or may not recognise the data as RDF and therefore may or may not process it correctly.
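The guidance above can be summarised as a small check on the Content-Type header returned for the RDF instance. This is only a sketch: the function name and the wording of the returned ratings are assumptions, not terms defined by ICRA.

```python
def classify_mime(content_type):
    """Rate a Content-Type header value for serving an RDF instance."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/rdf+xml":
        return "preferred"
    if media_type == "application/xml":
        return "acceptable"
    if media_type == "text/xml":
        return "usable with care"
    return "may not be recognised as RDF"


print(classify_mime("application/rdf+xml"))        # preferred
print(classify_mime("text/plain; charset=utf-8"))  # may not be recognised as RDF
```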
If you run IIS servers and are unsure how to add new MIME types, please see Section 5.3 below.
If your server is protected by a firewall, you may need to configure this accordingly too.
5 Methods for inserting the tags
Having created the RDF instance, the next step is to insert the links to it. For a website to be considered fully labelled, links must be included on every (X)HTML page and ideally should be included in all resources.
The ability to shift label processing to the client rather than the server offers one crucial advantage: an identical link can be inserted on all resources. This is true whether the labels cover one small website or a global network of internet properties.
The most efficient way to do it is to configure the server(s) to include the link in the HTTP Response Headers. This also avoids accidentally deleting the tag (or omitting it) when pages are redesigned. Control of the labels is then firmly in the hands of the person (or department) responsible for managing the RDF instance. This may or may not be the same as those responsible for content creation. Alternatively, an (X)HTML Link tag (similar to Example 1 or Example 3 as appropriate) can simply be included in a document template or any other method you may use to include the same data in every page's <head> section.
5.1 Is it important to include the tag on every page?
Yes. When a user visits your site for the first time, their client will only detect the labels if a link is in place. If the link is only included in, say, the homepage, then users who enter the site via other routes will not benefit.
5.2 Apache server configuration
There is more than one way to control Apache's HTTP Response Headers. If you already set headers for other reasons, continue to use the same method. If not, the method given below is robust and will work.
5.2.1 Install Mod_Headers
Mod_Headers is not generally included in the default configuration but will almost certainly be included in your Apache installation and just needs to be "switched on" by removing the comment symbol before two lines in the httpd.conf file.
There are many different "flavours" of Apache, but what follows is likely to be at least close to what is required.
In the DSO section of the httpd.conf file look for
LoadModule headers_module modules/mod_headers.so
In some builds, that's enough; others will also require the command below:
AddModule mod_headers.c
The comments in your config file and the presence (or absence) of similar commands for other modules will give you a good clue as to what to do.
5.2.2 Setting the same Response Header for all resources
Assuming that the RDF instance is called labels.rdf and is in the web server's document root, the following command, inserted in the httpd.conf file, will achieve the desired result.
Header set Link '</labels.rdf>; /="/"; rel="meta" type="application/rdf+xml"; title="ICRA labels";'
N.B. This command should appear all on one line.
5.2.3 Linking to specific labels with HTTP Response Headers
Like other Apache configuration options, HTTP Response Headers can be set within block directives. Example 6 sets the link to "label_2" for all resources in /var/www/images/.
<Directory /var/www/images/>
  Header add Link '</labels.rdf#label_2>; /="/"; rel="meta" type="application/rdf+xml"; title="ICRA labels";'
</Directory>
Example 6 A simple block directive setting a header for all resources in the images directory
As above, the Header add Link command should appear on a single line.
Block directives also offer very fine control over the HTTP Response Headers where required*. Example 7 sets a header pointing to "label_1" for all resources in the /var/www/ directory (and its subdirectories), but where the filename ends with .gif, .jpg, .jpeg or .png, the header linking to "label_2" is invoked.
<Directory /var/www/>
  Header add Link '</labels.rdf#label_1>; /="/"; rel="meta" type="application/rdf+xml";'
  <Files ~ "\.(gif|jpe?g|png)$">
    Header unset Link
    Header add Link '</labels.rdf#label_2>; /="/"; rel="meta" type="application/rdf+xml";'
  </Files>
</Directory>
Example 7 A nested block directive setting a different header for image files than for other files in the same block.
Notice in Example 7 that the link is unset within the Files block directive. This is because where a resource is linked to a specific label, that label is given the highest priority and can't be overridden (see section 7). It is therefore an error to include more than one link to specific labels, and the expected behaviour of clients is not defined in these circumstances [10].
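A simple way to check a configuration for this error is to inspect the Link headers actually returned for a resource. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the parsing handles only header values shaped like those in this document, not the full Link header grammar.

```python
import re
from urllib.parse import urldefrag


def specific_label_links(link_header_values):
    """Return the label IDs named by fragment in rel="meta" Link headers."""
    ids = []
    for value in link_header_values:
        match = re.match(r"\s*<([^>]*)>", value)
        if match and 'rel="meta"' in value:
            fragment = urldefrag(match.group(1)).fragment
            if fragment:
                ids.append(fragment)
    return ids


def has_conflicting_links(link_header_values):
    """True if more than one header links to a specific label."""
    return len(specific_label_links(link_header_values)) > 1


# What an image would receive if "Header unset Link" were omitted.
headers = [
    '</labels.rdf#label_1>; /="/"; rel="meta" type="application/rdf+xml";',
    '</labels.rdf#label_2>; /="/"; rel="meta" type="application/rdf+xml";',
]
print(has_conflicting_links(headers))       # True
print(has_conflicting_links(headers[1:]))   # False
```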
* Some versions of Apache may not allow headers to be set in a Virtual Host block directive.
5.3 Microsoft IIS server configuration
- Web server
- Home directory / Web site (IIS 4 and later support multiple websites)
- Virtual directory
- Folder
- Page
Figure 3 The properties dialogue box in IIS
</labels.rdf>; /="/"; rel="meta" type="application/rdf+xml"; title="ICRA labels";
Figure 4 The custom header name and value fields in IIS
Figure 5 Adding the RDF MIME type in IIS
N.B. Please ignore the Content Rating options (this uses an obsolete system).
6 The Ruleset in Detail
In Example 5, two rules are declared:
<label:UnionOf>
  <label:rules>
    <label:IntersectionOf>
      <label:hasURI>colour</label:hasURI>
      <label:hasURI>image</label:hasURI>
    </label:IntersectionOf>
    <label:IntersectionOf>
      <label:hasURI>monochrome</label:hasURI>
      <label:hasURI>image</label:hasURI>
    </label:IntersectionOf>
  </label:rules>
  <label:hasLabel rdf:resource="#label_2" />
</label:UnionOf>
7 Global defaults, local overrides
However, if page.html were to include a link to "label_2", using a tag like this:
<link rel="meta" href="/labels.rdf#label_2" type="application/rdf+xml" title="ICRA label" />
this would override the default and associate it with "label_2".
8 Processing ICRA labels
For a given resource (a page, an image etc.) there are three possible sources of a label:
- It may be possible to deduce a label by processing data already held in the filter's cache (memory). Such labels are referred to below as Type 1.
- A resource may include a link to data that contains rules that can be followed to identify a label. Labels identified in data sources linked from the resource itself are referred to below as labels of Type 2.
- A resource may include a direct link to a label. Labels identified by a direct link from a resource are referred to below as labels of Type 3.
As detailed below, filters SHOULD assign increasing priority to each of these sources.
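Read as a priority order, the three sources can be combined as in the sketch below. The function name choose_label and the use of None for "no label from this source" are assumptions made for illustration; the sketch covers only the priority rule stated above, not the pre-fetch and post-fetch processing.

```python
def choose_label(type1=None, type2=None, type3=None):
    """Pick the highest-priority label available.

    type1: label deduced from cached data
    type2: label identified by rules in a linked RDF instance
    type3: label named directly by the resource's own link
    """
    for candidate in (type3, type2, type1):
        if candidate is not None:
            return candidate
    return None  # the resource is unlabelled


print(choose_label(type1="label_1", type2="label_2"))                   # label_2
print(choose_label(type1="label_1", type2="label_2", type3="label_3"))  # label_3
print(choose_label())                                                   # None
```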
8.1 Before retrieving the resource
Figure 6 The processing to be carried out prior to any request to the internet.
- A Type 1 label MUST NOT be used to block access to a URL before it has been fetched.
- A Type 2 label MAY be used to block access to a URL before it has been fetched.
If there is no data in the filter's cache, the resource at the URL MUST be fetched.
Figure 7 The processing rules after the resource has been fetched.
9 Showing users that a site is ICRA labelled
Alternatively, you can add any of the graphics available from icra.org [11]. These are available in a number of languages and in both American and British English (i.e. labeled and labelled respectively).
10 Working with other RDF schemas
An ICRA label and the Ruleset are simply RDF Classes whose types are defined by the relevant schema. These elements can all be included in any RDF instance and, conversely, any other RDF metadata can be included in "an ICRA labels file."
Indeed, content providers are encouraged to make use of other schemas.
10.1 Management information
Perhaps the best-known metadata schema that is regularly encoded in RDF is Dublin Core [12]. Along with Creative Commons [13] licences and similar schemas, this can be included directly in content labels but can also be put in a special class for management information.
In exactly the same way as for Content Labels, the label schema defines the properties hasDefaultManagementInfo and hasManagementInfo. These link to RDF Content Labels that can include data such as title, author and publication date. Management Info labels exist independently of the Content Labels linked by the hasDefaultLabel and hasLabel properties, so it's possible to have a rule that overrides the default management information label without overriding the ICRA label.
10.2 Classification
Content Labels, whether they carry ICRA descriptors or any other metadata, are designed to include detailed descriptors. More formally, they are classes with properties that describe the resource. A third type of description is also defined in the label schema - a classification. Again there are analogous properties: hasDefaultClassification and hasClassification, but unlike a Content Label, a classification is designed to be a description in itself.
For example, the classification may be an age-based film rating or identify an article as being about "politics" as opposed to "fashion." Whatever the classification is, clients are not obliged to process any properties of a class linked by a hasClassification or hasDefaultClassification property.
Again, classifications exist independently so that overriding a default classification has no effect on either the management information or label.
11 Frequency modifiers
Movies, TV programmes, games and other content that "has a duration" may need more than one label. It may be appropriate to provide a label that describes a particular scene in a movie or a content type that occurs occasionally throughout a game. To support this, the Content Label schema supports a set of frequency modifiers:
- has frequent scenes
- has several scenes
- has occasional scenes
- has a single scene
A standard RDF description might appear as shown in Example 9. This can stand alone in the same way as any other RDF description, or form part of a sequence of rules in a label:Ruleset.
<rdf:Description rdf:about="http://example.org/movie.mov">
  <label:hasLabel rdf:resource="#label_1" />
  <label:hasOccasionalScenes rdf:resource="#label_2" />
  <label:hasSingleScene rdf:resource="#label_3" />
</rdf:Description>
Example 9 Description of a movie using frequency modifiers
Frequency modifiers have a range of label:ContentLabel. That is, they MUST link to a class of that type.
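A client could collect the overall label and the frequency-modified labels for a resource as sketched below. The function name labels_by_property is hypothetical, the description from Example 9 is wrapped in an rdf:RDF element so that it parses on its own, and only the XML serialization is handled.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LABEL = "http://www.w3.org/2004/12/q/contentlabel#"

DESCRIPTION = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:label="http://www.w3.org/2004/12/q/contentlabel#">
  <rdf:Description rdf:about="http://example.org/movie.mov">
    <label:hasLabel rdf:resource="#label_1" />
    <label:hasOccasionalScenes rdf:resource="#label_2" />
    <label:hasSingleScene rdf:resource="#label_3" />
  </rdf:Description>
</rdf:RDF>"""


def labels_by_property(rdf_xml, about):
    """Map each label-schema property of a resource to its label reference."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for desc in root.iter("{%s}Description" % RDF):
        if desc.get("{%s}about" % RDF) != about:
            continue
        for prop in desc:
            if prop.tag.startswith("{%s}" % LABEL):
                name = prop.tag.split("}", 1)[1]
                result[name] = prop.get("{%s}resource" % RDF)
    return result


print(labels_by_property(DESCRIPTION, "http://example.org/movie.mov"))
# {'hasLabel': '#label_1', 'hasOccasionalScenes': '#label_2', 'hasSingleScene': '#label_3'}
```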
12 Some tips
12.1 It's just RDF
Content Labels, host restrictions, rules - these are all just RDF fragments. They do not need to all be in a single file called labels.rdf. If you're familiar with RDF, think of ICRA labels simply as part of your metadata.
12.2 Managing labels for multiple websites
If you create lots of websites that should have the same ICRA label, create a file with the label in it and make the Link tag that points to it part of your regular template. Remember that the labels do not have to be on the same server, they can be anywhere.
You do not need to include a host restriction at all - if a resource points to a label and there's no host restriction included in the RDF instance, the label is valid. On the downside, it means anyone can point to your label, which may put extra load on your server.
If you do want to include a host restriction it can be in a separate file all on its own. Example 10 shows how this can be done. The two fragments of RDF can be in the same file (as shown here) or in separate files on different servers. In the latter case you'd need to include a full URI (including the fragment identifier) as the rdf:resource.
<label:Ruleset>
  <label:hasHostRestrictions rdf:resource="#hosts" />
  ...
</label:Ruleset>
<label:Hosts rdf:ID="hosts">
  <label:hostRestriction>example.com</label:hostRestriction>
  <label:hostRestriction>example.org</label:hostRestriction>
</label:Hosts>
Example 10 A Ruleset that links to an "external" list of host restrictions
This allows you to set up a stable file for the labels and then generate the host restriction list dynamically if desired.
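Generating the host restriction list dynamically might look like the sketch below. The function name hosts_fragment and the source of the host list are assumptions; the output is a fragment that relies on the enclosing rdf:RDF element to declare the rdf and label namespace prefixes.

```python
from xml.sax.saxutils import escape, quoteattr


def hosts_fragment(hosts, node_id="hosts"):
    """Build a label:Hosts fragment listing one hostRestriction per host."""
    lines = ["<label:Hosts rdf:ID=%s>" % quoteattr(node_id)]
    for host in hosts:
        lines.append("  <label:hostRestriction>%s</label:hostRestriction>"
                     % escape(host))
    lines.append("</label:Hosts>")
    return "\n".join(lines)


# The host list would typically come from a database or configuration file.
print(hosts_fragment(["example.com", "example.org"]))
```

Escaping the host names guards against malformed XML if the list is built from data that has not been validated.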
12.3 A specific label for a specific page
Labels that apply to a single resource can be put in a separate file. You might set up a default labels file (with a Ruleset) and link everything to that and then create a completely separate label file for a particular page with a specific link to the label.
In short, work out what's best for you. It'll probably work in practice.
13 Testing the labels
The ICRA website includes an online tool that will identify the correct label for a given URL [14].
14 Change log
Version 1.0.1: Added section on linking to icra.org/sitelabel (section 9). Subsequent sections renumbered.
Version 1.0.2: Amended documentation of hostRestriction to include hasHostRestrictions property and Hosts class.
Version 1.0.3: Added section referencing PICS labelling document.
Links and References
[1] https://icra.org/support
[2] https://icra.org/vocabulary
[3] http://www.w3.org/RDF/
[4] http://www.w3.org/2001/sw/
[5] https://icra.org/label
[6] http://infomesh.net/2002/notation3/
[7] See, for example, http://www.perl.com/doc/FMTEYEWTK/regexps.html
[8] http://www.faqs.org/rfcs/rfc3870.html
[9] Apache's default configuration has supported this since summer 2003. At the time of writing (April 2005) IIS does not.
[10] More information on the use of Apache's block directives to control HTTP Response Headers is available at https://icra.org/archive/labellingWG/mod_headers/
[11] https://icra.org/buttons
[12] http://www.dublincore.org/
[13] http://creativecommons.org/
[14] https://icra.org/label/tester/