Overview of the RDF Vocabularies for Labelling


Table of Contents

Introduction
How To Read This Document
The Content Labelling Vocabulary
Vocabulary Description
Creating a Labelling Scheme
Identifying, Naming and Describing Scheme Components
Defining A Labelling Scheme
Using the ICRA labelling vocabulary.
Content Label Basics
Specifying the application of a label to resources

Introduction

This document provides an overview of two RDF vocabularies that together enable the use of RDF for content labelling with the ICRA scheme: the generic Content Labelling Vocabulary, which provides a mechanism for describing a content labelling system, and the ICRA content labelling vocabulary, which describes the ICRA content labelling scheme.

How To Read This Document

The first two main sections of this document describe how the generic vocabulary for defining labelling schemes is constructed and how to apply it to define a new labelling scheme. If you are only interested in applying the ICRA labelling scheme in RDF, you can skip these and go straight to the section “Using the ICRA labelling vocabulary”.

The Content Labelling Vocabulary

The Content Labelling Vocabulary (namespace https://icra.org/labellingv01/rdfs/#) provides a simple vocabulary for describing a labelling scheme. A labelling scheme consists of one or more categories, each of which groups together related content descriptors, and zero or more modifiers, which provide further context for a label. Together, these are referred to in this document as the components of a labelling scheme.

In the ICRA vocabulary, for example, "Violence" is a category, "Deliberate injury to human beings" is a descriptor, and "Material appears in a sports context" is a modifier.

The Content Labelling Vocabulary defines a small set of classes and properties that are the basis for defining labelling schemes. A labelling scheme such as the ICRA scheme is created by defining instances of these classes and using the properties to define the relationships between those instances.
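As an informal aid, the relationships among these components can be sketched as Python data classes. This is only a sketch: the class and function names are hypothetical and belong to neither vocabulary, and the nudity category, its descriptors and the sports-context modifier are taken from the ICRA examples later in this document, with full URIs formed from the ICRA base URI.

```python
from dataclasses import dataclass, field

ICRA = "https://icra.org/ratingsv03/rdfs/#"


@dataclass(frozen=True)
class Descriptor:
    """A descriptor; in RDF it is a subproperty of label:descriptor."""
    uri: str
    name: str  # rdfs:label


@dataclass
class Category:
    """A category groups related descriptors (label:hasDescriptor)."""
    uri: str
    name: str
    descriptors: list[Descriptor] = field(default_factory=list)


@dataclass(frozen=True)
class Modifier:
    """A modifier gives context to a content label as a whole."""
    uri: str
    name: str


@dataclass
class ContentLabel:
    """Descriptor values plus the modifiers that apply (label:hasModifier)."""
    # Keyed by descriptor URI. ICRA uses booleans; the generic
    # vocabulary itself does not restrict the range of values.
    values: dict[str, object]
    modifiers: list[Modifier] = field(default_factory=list)


def categories_without_values(label: ContentLabel, categories: list[Category]) -> list[str]:
    """Names of categories for which the label gives no descriptor value."""
    return [
        category.name
        for category in categories
        if not any(d.uri in label.values for d in category.descriptors)
    ]


nudity = Category(ICRA + "nx", "Nudity", [
    Descriptor(ICRA + "na", "Exposed breasts"),
    Descriptor(ICRA + "nb", "Bare buttocks"),
    Descriptor(ICRA + "nz", "No nudity"),
])
sports = Modifier(ICRA + "s", "Sports context")
label = ContentLabel({ICRA + "nz": True}, [sports])

print(categories_without_values(label, [nudity]))  # []
```

Because a category lists its descriptors, an application can check coverage per category without knowing anything else about the scheme.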

Vocabulary Description

contentLabel

Full URI. https://icra.org/labellingv01/rdfs/#contentLabel

Description.  An instance of this class is a single descriptive label for content which may be applied to one or more web resources.

Properties. The following properties may be specified for a contentLabel instance:

  • hasModifier specifies the modifiers for the content label.

  • Any subproperty of the descriptor property.

category

Full URI. https://icra.org/labellingv01/rdfs/#category

Description.  A category is a grouping of related content descriptors. In ICRA, these groupings are thematic, but this is not a constraint on category instances in general.

Properties. 

  • hasDescriptor specifies the descriptors which make up this category.

descriptor

Full URI. https://icra.org/labellingv01/rdfs/#descriptor

Description.  A descriptor defines a single form of content which may or may not be present in a resource. When labelling web resources, a descriptor is used as a property of the content label that it applies to. This means that a descriptor has a range of allowed values. The Content Labelling Vocabulary does not restrict the allowed range of values.

hasDescriptor

Full URI. https://icra.org/labellingv01/rdfs/#hasDescriptor

Description.  This property connects a category to the descriptors that make up that category. Applications can use it to list all the possible descriptors for a category.

modifier

Full URI. https://icra.org/labellingv01/rdfs/#modifier

Description.  A modifier provides context for a content label as a whole. Each content labelling scheme may define its own set of modifiers.

hasModifier

Full URI. https://icra.org/labellingv01/rdfs/#hasModifier

Description.  This property connects a contentLabel to the instances of the modifier class that apply to it.

applicationRule

Full URI. https://icra.org/labellingv01/rdfs/#applicationRule

Description.  An applicationRule defines a processing rule that should be used to determine which URL(s) a content label applies to. Such rules are required in situations where the web resources do not carry their own content labels. This class is used as a base class for a number of more specific types of rule. New labelling systems may choose to introduce new types of rules by subclassing from applicationRule. An applicationRule may contain one or more oneOf, allOf and not properties. If more than one such property is present, the applicationRule must be processed as the disjunction of the results of processing each property.

Properties. 

  • oneOf adds a logical OR of the applicationRules that are the object of the property to the applicationRule that is the subject of the property.

  • allOf adds a logical AND of the applicationRules that are the object of the property to the applicationRule that is the subject of the property.

  • not adds a logical NOT of the applicationRule that is the object of the property to the applicationRule that is the subject of the property.

oneOf

Full URI. https://icra.org/labellingv01/rdfs/#oneOf

Description.  A property of an applicationRule indicating that the rule applies when any one of the applicationRules that are the object of this property applies. This produces a disjunction (logical OR) of the object applicationRules.

allOf

Full URI. https://icra.org/labellingv01/rdfs/#allOf

Description.  A property of an applicationRule indicating that the rule applies only if all of the applicationRules that are the object of this property apply. This produces a conjunction (logical AND) of the object applicationRules.

not

Full URI. https://icra.org/labellingv01/rdfs/#not

Description.  A property of an applicationRule indicating that the rule DOES NOT apply if the applicationRule that is the object of this property applies.
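The three combining properties behave like the Boolean operators OR, AND and NOT applied to rules. The Python sketch below illustrates this under the simplifying assumption that each rule can be modelled as a predicate over a resource URI; the helper names and the example.com URIs are hypothetical, and this is not a reference implementation for label-processing clients.

```python
from typing import Callable

# Simplifying assumption: a rule is a predicate over a resource URI.
Rule = Callable[[str], bool]


def one_of(*rules: Rule) -> Rule:
    """oneOf: applies when ANY object rule applies (disjunction)."""
    return lambda uri: any(r(uri) for r in rules)


def all_of(*rules: Rule) -> Rule:
    """allOf: applies only when ALL object rules apply (conjunction)."""
    return lambda uri: all(r(uri) for r in rules)


def not_(rule: Rule) -> Rule:
    """not: does NOT apply where the object rule applies (negation)."""
    return lambda uri: not rule(uri)


# Hypothetical leaf rules standing in for beginsWith / endsWith.
def on_host_a(uri: str) -> bool:
    return uri.startswith("http://a.example.com/images/")


def on_host_b(uri: str) -> bool:
    return uri.startswith("http://b.example.com/images/")


def is_gif(uri: str) -> bool:
    return uri.endswith(".gif")


# Images under either host, except GIF files.
image_rule = all_of(one_of(on_host_a, on_host_b), not_(is_gif))

print(image_rule("http://a.example.com/images/cat.png"))  # True
print(image_rule("http://b.example.com/images/dog.gif"))  # False
print(image_rule("http://c.example.com/images/cat.png"))  # False
```

Because each combinator returns another rule, combinations can be nested to any depth, mirroring the way an applicationRule can itself be the object of oneOf, allOf or not.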

beginsWith

Full URI. https://icra.org/labellingv01/rdfs/#beginsWith

Description.  A subclass of applicationRule which matches against the start of a web resource's URI.

Properties. 

  • value specifies the substring to match at the start of the resource URI.

endsWith

Full URI. https://icra.org/labellingv01/rdfs/#endsWith

Description.  A subclass of applicationRule which matches against the end of a web resource's URI.

Properties. 

  • value specifies the substring to match at the end of the resource URI.

contains

Full URI. https://icra.org/labellingv01/rdfs/#contains

Description.  A subclass of applicationRule which matches against a substring of a web resource's URI.

Properties. 

  • value specifies the substring to match anywhere within the resource URI.

matches

Full URI. https://icra.org/labellingv01/rdfs/#matches

Description.  A subclass of applicationRule which matches against a web resource's URI using a Perl-style regular expression.

Properties. 

  • value specifies the regular expression to match against the resource URI.

value

Full URI. https://icra.org/labellingv01/rdfs/#value

Description.  This is a property of the beginsWith, endsWith, contains, and matches application rules. It contains the substring or regular expression string that is used by the rule.
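Each of the four built-in rule classes compares its value with the resource URI in a different way. The Python sketch below shows one way a client might implement them; the function names and the LEAF_RULES table are hypothetical. The text does not say whether a matches expression must cover the whole URI, so this sketch assumes an unanchored search, and Python's re module stands in for Perl-style regular expressions (the two dialects are close but not identical).

```python
import re
from typing import Callable

Rule = Callable[[str], bool]


def begins_with(value: str) -> Rule:
    """beginsWith: the URI starts with `value`."""
    return lambda uri: uri.startswith(value)


def ends_with(value: str) -> Rule:
    """endsWith: the URI ends with `value`."""
    return lambda uri: uri.endswith(value)


def contains(value: str) -> Rule:
    """contains: `value` occurs anywhere within the URI."""
    return lambda uri: value in uri


def matches(value: str) -> Rule:
    """matches: the regular expression `value` is found in the URI (assumed unanchored)."""
    pattern = re.compile(value)
    return lambda uri: pattern.search(uri) is not None


# Rule class name (as in the vocabulary) -> rule constructor.
LEAF_RULES = {
    "beginsWith": begins_with,
    "endsWith": ends_with,
    "contains": contains,
    "matches": matches,
}

uri = "http://www.example.com/photos/beach.jpg"
print(LEAF_RULES["beginsWith"]("http://www.example.com/")(uri))  # True
print(LEAF_RULES["endsWith"](".png")(uri))                       # False
print(LEAF_RULES["contains"]("/photos/")(uri))                   # True
print(LEAF_RULES["matches"](r"\.(jpe?g|png)$")(uri))             # True
```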

hasContentLabel

Full URI. https://icra.org/labellingv01/rdfs/#hasContentLabel

Description.  This is a property that links an applicationRule to the contentLabel that labels resources that match the rule.

Creating a Labelling Scheme

This section describes how to apply the Content Labelling Vocabulary to create a specific labelling scheme.

Identifying, Naming and Describing Scheme Components

The Content Labelling Vocabulary makes use of basic RDF functionality for identifying, naming and describing the components that make up a labelling scheme.

Each component of the scheme is assigned an ID. This ID, when combined with the base URL of the RDF resource that describes the scheme, gives a unique URI identifier for the component. For example, the ICRA labelling scheme is defined by the resource with the base URI https://icra.org/ratingsv03/rdfs/# and the descriptor for "Unmoderated user-generated content" is currently defined in that resource with the ID "cb", so the full identifier for the descriptor is https://icra.org/ratingsv03/rdfs/#cb.
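In RDF/XML this combination is ordinary relative-URI resolution: an rdf:ID value is resolved as a fragment identifier against the base URI. A minimal Python sketch using the standard library's urljoin; component_uri is a hypothetical helper name.

```python
from urllib.parse import urljoin

ICRA_BASE = "https://icra.org/ratingsv03/rdfs/#"


def component_uri(base: str, component_id: str) -> str:
    """Resolve a component's rdf:ID against the scheme's base URI."""
    # rdf:ID="cb" is equivalent to the relative reference "#cb".
    return urljoin(base, "#" + component_id)


print(component_uri(ICRA_BASE, "cb"))  # https://icra.org/ratingsv03/rdfs/#cb
```

The same resolution applies to labels: rdf:ID="advert" in http://www.example.com/labels.rdf names http://www.example.com/labels.rdf#advert, which is the form of reference used when rules and labels live in different files.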

Each component should always be assigned a short name. This should be a name suitable for display in a user interface and should be consumer-oriented in nature. A good short name would be "Violence" or "Injury to animals"; a bad short name would be "vb" or "vz". RDF provides the rdfs:label property for these short names. A component can have any number of rdfs:label property values, although it is STRONGLY recommended that they be distinguished from each other using an xml:lang attribute and that there be only one label per language.

Example 1. An example of a short name

<label:category rdf:ID="nx">
  <rdfs:label xml:lang="en">Nudity</rdfs:label>
  ...
</label:category>

A component may also be assigned a longer description that might be displayed to a user as pop-up help text. For this description, use the RDF-defined rdfs:comment property. Again, multiple rdfs:comment values may be provided, but they should be distinguished by language using the xml:lang attribute.

Example 2. An example of a short description

<label:category rdf:ID="nx">
  <rdfs:label xml:lang="en">Nudity</rdfs:label>
  <rdfs:comment xml:lang="en">
    Erections or female genitals in detail, Male genitals, Female genitals, 
    Female breasts, Bare buttocks
  </rdfs:comment>
</label:category>

Finally, a component may also contain a link to another web resource that provides a much more detailed description. For this link, use the RDF-defined rdfs:seeAlso property. The value of this property MUST be an RDF resource URI.

Example 3. An example of a reference to a longer description

<label:category rdf:ID="nx">
  <rdfs:label xml:lang="en">Nudity</rdfs:label>
  <rdfs:comment xml:lang="en">
    Erections or female genitals in detail, Male genitals, Female genitals, 
    Female breasts, Bare buttocks
  </rdfs:comment>
  <rdfs:seeAlso rdf:resource="https://icra.org/vocabulary/#hn"/>
</label:category>
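A client that displays a component needs to pick out its short name, description and link. The following Python sketch reads them from Example 3 with the standard library's ElementTree. It is a simplification, not an RDF parser: it handles only this XML layout, checks xml:lang only on the property element itself (ignoring values inherited from ancestors), and falls back to the first value when none is in the requested language. The helper name text_for_lang is hypothetical, and the namespace declarations are added so the snippet parses on its own; an RDF toolkit such as rdflib would handle the general case.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
LABEL = "https://icra.org/labellingv01/rdfs/#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Example 3, plus the namespace declarations needed to parse it on its own.
CATEGORY_XML = f"""\
<label:category rdf:ID="nx"
    xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:label="{LABEL}">
  <rdfs:label xml:lang="en">Nudity</rdfs:label>
  <rdfs:comment xml:lang="en">
    Erections or female genitals in detail, Male genitals, Female genitals,
    Female breasts, Bare buttocks
  </rdfs:comment>
  <rdfs:seeAlso rdf:resource="https://icra.org/vocabulary/#hn"/>
</label:category>
"""


def text_for_lang(element: ET.Element, prop: str, lang: str = "en") -> Optional[str]:
    """Value of rdfs:<prop> in language `lang`, else the first value, else None."""
    values = element.findall(f"{{{RDFS}}}{prop}")
    if not values:
        return None
    # Only the element's own xml:lang is checked; inherited values are ignored.
    preferred = [v for v in values if v.get(XML_LANG) == lang]
    chosen = (preferred or values)[0]
    # Collapse the line breaks and indentation used to lay out the XML.
    return " ".join((chosen.text or "").split())


category = ET.fromstring(CATEGORY_XML)
component_id = category.get(f"{{{RDF}}}ID")
name = text_for_lang(category, "label")
description = text_for_lang(category, "comment")
see_also = category.find(f"{{{RDFS}}}seeAlso").get(f"{{{RDF}}}resource")

print(component_id, "|", name)  # nx | Nudity
print(see_also)                 # https://icra.org/vocabulary/#hn
```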

Defining A Labelling Scheme

  1. Define Categories

    Each category in a labelling scheme has the identifier, name and descriptions described above, and a list of the descriptors that are part of that category. The descriptors are linked to the category using the label:hasDescriptor property. Because we want this list to be closed (i.e. no more descriptors can be added to it without modifying our vocabulary file), we specify the label:hasDescriptor property value as a collection (rdf:parseType="Collection").

    Each descriptor must be defined as being a subPropertyOf the descriptor property from the Content Labelling Vocabulary.

    Example 4. Example of a Category Definition

    <!-- Nudity category -->
    <label:category rdf:ID="nx">
      <rdfs:label xml:lang="en">Nudity</rdfs:label>
      <rdfs:comment xml:lang="en">
        Erections or female genitals in detail, Male genitals, Female genitals, 
        Female breasts, Bare buttocks
      </rdfs:comment>
      <rdfs:seeAlso xml:lang="en" 
                    rdf:resource="https://icra.org/vocabulary/#hn"/>
      <label:hasDescriptor rdf:parseType="Collection">
        <rdf:Property rdf:ID="na">
          <rdfs:label>Exposed breasts</rdfs:label>
          <rdfs:comment>...</rdfs:comment>
          <rdfs:seeAlso 
               rdf:resource="https://icra.org/ratingsv03/descriptions/#na"/>
          <rdfs:subPropertyOf rdf:resource="&label;descriptor"/>
        </rdf:Property>
        <rdf:Property rdf:ID="nb">
          <rdfs:label>Bare buttocks</rdfs:label>
          <rdfs:comment>...</rdfs:comment>
          <rdfs:seeAlso 
             rdf:resource="https://icra.org/ratingsv03/descriptions/#nb"/>
          <rdfs:subPropertyOf rdf:resource="&label;descriptor"/>
        </rdf:Property>
        ...
        <rdf:Property rdf:ID="nz">
          <rdfs:label>No nudity</rdfs:label>
          <rdfs:comment>...</rdfs:comment>
          <rdfs:seeAlso 
             rdf:resource="https://icra.org/ratingsv03/descriptions/#nz"/>
          <rdfs:subPropertyOf rdf:resource="&label;descriptor"/>
        </rdf:Property>
      </label:hasDescriptor>
    </label:category>

  2. Define Modifiers

    Each modifier is simply defined as an instance of the label:modifier class. Modifiers should be defined with names and descriptions as described above, but there is no need to define any other properties for a modifier.

    Example 5. Example of a Modifier definition

    <label:modifier rdf:ID="r">
      <rdfs:label>Artistic</rdfs:label>
      <rdfs:comment>
        Material appears in an artistic context and is suitable 
        for young children.
      </rdfs:comment>
      <rdfs:seeAlso rdf:resource="https://icra.org/vocabulary/#hn"/>
    </label:modifier>
    
  3. Define any new Application Rule types

    To define a new type of application rule, create a new subclass of the applicationRule class. Each new rule type SHOULD be well documented so that label-processing clients can implement it.

    Warning

    Labelling scheme designers should consider carefully whether new application rules are needed. A new rule can only be used successfully if all label-processing client applications implement it.
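For step 1, a client can recover a category's closed list of descriptors from the hasDescriptor collection. The Python sketch below does this with ElementTree on a trimmed copy of Example 4: the rdfs:comment and rdfs:seeAlso elements are omitted for brevity, and the &label; entity reference is written out as the full namespace URI because the snippet has no DTD to define it. The function name descriptors_of is hypothetical, and plain XML traversal like this only works for documents laid out the same way.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
LABEL = "https://icra.org/labellingv01/rdfs/#"
RDF_ID = f"{{{RDF}}}ID"
RDF_RESOURCE = f"{{{RDF}}}resource"

# Trimmed Example 4: comments and seeAlso links omitted, &label; written out.
CATEGORY_XML = f"""\
<label:category rdf:ID="nx"
    xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:label="{LABEL}">
  <rdfs:label xml:lang="en">Nudity</rdfs:label>
  <label:hasDescriptor rdf:parseType="Collection">
    <rdf:Property rdf:ID="na">
      <rdfs:label>Exposed breasts</rdfs:label>
      <rdfs:subPropertyOf rdf:resource="{LABEL}descriptor"/>
    </rdf:Property>
    <rdf:Property rdf:ID="nb">
      <rdfs:label>Bare buttocks</rdfs:label>
      <rdfs:subPropertyOf rdf:resource="{LABEL}descriptor"/>
    </rdf:Property>
    <rdf:Property rdf:ID="nz">
      <rdfs:label>No nudity</rdfs:label>
      <rdfs:subPropertyOf rdf:resource="{LABEL}descriptor"/>
    </rdf:Property>
  </label:hasDescriptor>
</label:category>
"""


def descriptors_of(category: ET.Element) -> list[tuple[str, str]]:
    """(rdf:ID, rdfs:label) of each descriptor in the hasDescriptor collection, in order."""
    collection = category.find(f"{{{LABEL}}}hasDescriptor")
    if collection is None:
        return []
    result = []
    for prop in collection.findall(f"{{{RDF}}}Property"):
        parent = prop.find(f"{{{RDFS}}}subPropertyOf")
        # Each descriptor must be a subproperty of label:descriptor.
        if parent is None or parent.get(RDF_RESOURCE) != LABEL + "descriptor":
            raise ValueError(f"{prop.get(RDF_ID)} is not a label:descriptor subproperty")
        result.append((prop.get(RDF_ID), prop.findtext(f"{{{RDFS}}}label")))
    return result


print(descriptors_of(ET.fromstring(CATEGORY_XML)))
# [('na', 'Exposed breasts'), ('nb', 'Bare buttocks'), ('nz', 'No nudity')]
```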

Using the ICRA labelling vocabulary.

This section covers the creation of content labels using the ICRA-defined labelling scheme.

Content Label Basics

A content label consists of two principal components:

  • a list of descriptor properties, and

  • a list of modifiers.

Under the ICRA labelling scheme, each descriptor property MUST have a value that is a valid boolean as defined by W3C XML Schema Part 2: Datatypes (this allows the values '0', '1', 'false' and 'true') and there SHOULD be at least one descriptor from each ICRA-defined category. Descriptors are listed as RDF properties of the contentLabel resource.
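Because descriptor values are xsd:boolean literals, a client has to accept all four lexical forms. A minimal Python sketch (the function name parse_xsd_boolean is hypothetical):

```python
# Whitespace characters as defined by XML.
XML_WHITESPACE = " \t\r\n"

_LEXICAL_FORMS = {"true": True, "1": True, "false": False, "0": False}


def parse_xsd_boolean(lexical: str) -> bool:
    """Convert an xsd:boolean lexical form ('true', 'false', '1', '0') to bool."""
    # xsd:boolean collapses surrounding whitespace; the forms are case-sensitive.
    value = lexical.strip(XML_WHITESPACE)
    try:
        return _LEXICAL_FORMS[value]
    except KeyError:
        raise ValueError(f"not a valid xsd:boolean: {lexical!r}") from None


print([parse_xsd_boolean(v) for v in ("1", "0", "true", " false\n")])
# [True, False, True, False]
```

Values such as "TRUE" or "yes" are rejected rather than guessed at, since they fall outside the datatype's lexical space.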

Modifiers are simply present or not present in a content label and no value is associated with them. If a modifier is present in a label, then the modifier applies. Modifiers are added to a content label using the hasModifier property.

Example 6. A simple label with descriptors and modifiers

  <label:contentLabel rdf:ID="siteLabel">
    <i:cz>1</i:cz>
    <i:lz>1</i:lz>
    <i:nz>1</i:nz>
    <i:oz>1</i:oz>
    <i:sz>1</i:sz>
    <i:vz>0</i:vz>
    <label:hasModifier><i:s/></label:hasModifier>
  </label:contentLabel>

The label:contentLabel tag tells the processor that this is an RDF resource of type contentLabel (from the namespace https://icra.org/labellingv01/rdfs/#). The rdf:ID attribute on the label:contentLabel element enables this label to be selected from an XML file containing multiple labels; its value must be unique within the XML file.

The elements i:cz to i:vz specify the values of descriptors defined by the ICRA scheme. The i prefix should be bound to the namespace URI https://icra.org/ratingsv03/rdfs/#, so these XML tags actually represent the descriptors https://icra.org/ratingsv03/rdfs/#cz (chat), https://icra.org/ratingsv03/rdfs/#lz (language), etc.

The label:hasModifier element contains a single modifier represented by the i:s tag. This indicates that the modifier identified by the URI https://icra.org/ratingsv03/rdfs/#s (sports context) applies to this label.
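The following Python sketch shows how a client might turn Example 6 into a table of descriptor values and a list of modifiers, expanding each element name to the full URI described above. It is a plain-XML simplification, not a general RDF parser: it assumes every child of contentLabel other than hasModifier is a descriptor property and, following the description above, reads each child of hasModifier (such as i:s) as naming a modifier. The namespace declarations are added so the snippet parses on its own; read_label and full_uri are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LABEL = "https://icra.org/labellingv01/rdfs/#"
ICRA = "https://icra.org/ratingsv03/rdfs/#"

# Example 6, with the namespace declarations needed to parse it on its own.
LABEL_XML = f"""\
<label:contentLabel rdf:ID="siteLabel"
    xmlns:rdf="{RDF}" xmlns:label="{LABEL}" xmlns:i="{ICRA}">
  <i:cz>1</i:cz>
  <i:lz>1</i:lz>
  <i:nz>1</i:nz>
  <i:oz>1</i:oz>
  <i:sz>1</i:sz>
  <i:vz>0</i:vz>
  <label:hasModifier><i:s/></label:hasModifier>
</label:contentLabel>
"""

# The four xsd:boolean lexical forms.
BOOLEANS = {"true": True, "1": True, "false": False, "0": False}


def full_uri(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' tag into namespace + local."""
    namespace, _, local = tag[1:].partition("}")
    return namespace + local


def read_label(element: ET.Element) -> tuple[Optional[str], dict[str, bool], list[str]]:
    """Return (rdf:ID, descriptor URI -> value, modifier URIs) for a contentLabel."""
    descriptors: dict[str, bool] = {}
    modifiers: list[str] = []
    for child in element:
        if child.tag == f"{{{LABEL}}}hasModifier":
            modifiers.extend(full_uri(m.tag) for m in child)
        else:
            # Assumption: every other child is a descriptor property.
            descriptors[full_uri(child.tag)] = BOOLEANS[(child.text or "").strip()]
    return element.get(f"{{{RDF}}}ID"), descriptors, modifiers


label_id, descriptors, modifiers = read_label(ET.fromstring(LABEL_XML))
print(label_id)                  # siteLabel
print(descriptors[ICRA + "vz"])  # False
print(modifiers)                 # ['https://icra.org/ratingsv03/rdfs/#s']
```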

Specifying the application of a label to resources

There are a number of ways in which a label may be applied to a resource. In many cases it is simplest to include a reference to the label either within the resource itself or within the HTTP response headers that the server sends with the resource. However, in some cases it is necessary, or more efficient, to define a catalog of labels together with the rules that a client should use to apply those labels to resources. This can be achieved using the applicationRule class and its subclasses in conjunction with the hasContentLabel property.

The following example shows how an application rule can contain the label to be applied to any resource that matches that rule.

Example 7. Example of a content label with application rules

<label:beginsWith>
  <label:value>https://icra.org</label:value>
  <label:hasContentLabel>
    <label:contentLabel rdf:ID="siteLabel">
      <i:cz>1</i:cz>
      <i:lz>1</i:lz>
      <i:nz>1</i:nz>
      <i:oz>1</i:oz>
      <i:sz>1</i:sz>
      <i:vz>1</i:vz>
      <label:hasModifier><i:s/></label:hasModifier>
    </label:contentLabel>
  </label:hasContentLabel>
</label:beginsWith>

It is also possible to separate the labels from the lists of resources to which they apply. This is useful for two reasons. Firstly, it allows different people to be responsible for defining the labels and for applying them to resources, so the job of labelling a large site can be split among many people while still using a single, consistent set of labels. Secondly, it is envisaged that label-processing clients will process a set of application rules from top to bottom, looking for the first rule that matches the resource in question. In that case, separating the labels from the rules means that a label should never need to be repeated just to ensure that the rules are applied in the right order.
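If clients do process rules from top to bottom as envisaged, choosing a label reduces to a first-match search. A minimal Python sketch, assuming the rules have already been turned into predicates over URIs; the function and rule names are hypothetical, and the two rules mirror those of Example 8, with the catch-all written as the regular expression ".*".

```python
import re
from typing import Callable, Optional

Rule = Callable[[str], bool]


def first_matching_label(rules: list[tuple[Rule, str]], uri: str) -> Optional[str]:
    """Return the label of the first rule, in document order, that matches `uri`."""
    for rule, label in rules:
        if rule(uri):
            return label
    return None  # no rule applies, so no label is assigned


def is_ad_server(uri: str) -> bool:
    # oneOf two beginsWith rules, as in Example 8.
    return uri.startswith(("http://a.tribalfusion.com", "http://m.tribalfusion.com"))


def anything(uri: str) -> bool:
    # A matches rule with the regular expression ".*" accepts every URI.
    return re.search(r".*", uri) is not None


# The specific rule comes first; the catch-all default comes last.
RULES = [(is_ad_server, "#advert"), (anything, "#defaultContentPage")]

print(first_matching_label(RULES, "http://a.tribalfusion.com/ad.js"))    # #advert
print(first_matching_label(RULES, "http://www.example.com/index.html"))  # #defaultContentPage
```

Reversing the list would give every resource, adverts included, the default label, which is why rule order matters; because each rule only refers to a label, reordering never requires copying the label itself.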

The rule-combining properties oneOf, allOf and not can also be used to specify logical combinations of resource-matching rules, which the processor uses to determine whether a label applies to a particular resource. In the following example, the oneOf property is used to give the "advert" label to any resource whose URI begins with either of two prefixes.

Example 8. Application rules separated from labels

<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:label="https://icra.org/labellingv01/rdfs/#"
 xmlns:i="https://icra.org/ratingsv03/rdfs/#">

  <label:contentLabel rdf:ID="defaultContentPage">
    <i:cz>1</i:cz>
    <i:lz>1</i:lz>
    <i:nz>1</i:nz>
    <i:oz>1</i:oz>
    <i:sz>1</i:sz>
    <i:vz>1</i:vz>
  </label:contentLabel>

  <label:contentLabel rdf:ID="advert">
    <i:cz>1</i:cz>
    <i:lz>1</i:lz>
    <i:nz>1</i:nz>
    <i:oz>0</i:oz>
    <i:sz>1</i:sz>
    <i:vz>1</i:vz>
  </label:contentLabel>

  <label:applicationRule>
    <label:oneOf rdf:parseType="Collection">
      <label:beginsWith>
        <label:value>http://a.tribalfusion.com</label:value>
      </label:beginsWith>
      <label:beginsWith>
        <label:value>http://m.tribalfusion.com</label:value>
      </label:beginsWith>
    </label:oneOf>
    <label:hasContentLabel rdf:resource="#advert"/>
  </label:applicationRule>

  <label:matches>
    <label:value>.*</label:value>
    <label:hasContentLabel rdf:resource="#defaultContentPage"/>
  </label:matches>

</rdf:RDF>
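To apply such a file, a client has to evaluate the rule elements themselves. The Python sketch below walks the rule part of Example 8 with ElementTree, evaluates leaf rules and the oneOf, allOf and not properties recursively, and returns the hasContentLabel reference of the first top-level rule that matches. It rests on several assumptions: it treats the file as plain XML rather than RDF; it uses Python's re with an unanchored search in place of Perl-style matching; it combines several properties on one rule as a disjunction, as the applicationRule description states; and it writes the rules with the vocabulary's beginsWith class, an rdf:parseType="Collection" list for oneOf, and the pattern ".*". The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LABEL = "https://icra.org/labellingv01/rdfs/#"

RULES_XML = f"""\
<rdf:RDF xmlns:rdf="{RDF}" xmlns:label="{LABEL}">
  <label:applicationRule>
    <label:oneOf rdf:parseType="Collection">
      <label:beginsWith><label:value>http://a.tribalfusion.com</label:value></label:beginsWith>
      <label:beginsWith><label:value>http://m.tribalfusion.com</label:value></label:beginsWith>
    </label:oneOf>
    <label:hasContentLabel rdf:resource="#advert"/>
  </label:applicationRule>
  <label:matches>
    <label:value>.*</label:value>
    <label:hasContentLabel rdf:resource="#defaultContentPage"/>
  </label:matches>
</rdf:RDF>
"""


def q(local: str) -> str:
    """ElementTree name for a term in the labelling namespace."""
    return f"{{{LABEL}}}{local}"


# How each leaf rule class compares its label:value with the URI.
LEAF_TESTS = {
    q("beginsWith"): lambda uri, value: uri.startswith(value),
    q("endsWith"): lambda uri, value: uri.endswith(value),
    q("contains"): lambda uri, value: value in uri,
    q("matches"): lambda uri, value: re.search(value, uri) is not None,
}


def rule_matches(rule: ET.Element, uri: str) -> bool:
    """Evaluate a rule element, recursing into oneOf / allOf / not."""
    results = []
    value = rule.findtext(q("value"))
    if rule.tag in LEAF_TESTS and value is not None:
        results.append(LEAF_TESTS[rule.tag](uri, value))
    for prop in rule:
        if prop.tag == q("oneOf"):
            results.append(any(rule_matches(r, uri) for r in prop))
        elif prop.tag == q("allOf"):
            results.append(all(rule_matches(r, uri) for r in prop))
        elif prop.tag == q("not"):
            results.append(not any(rule_matches(r, uri) for r in prop))
    # Several properties on one rule are combined as a disjunction.
    return any(results)


def label_for(root: ET.Element, uri: str) -> Optional[str]:
    """hasContentLabel reference of the first top-level rule matching `uri`."""
    for rule in root:
        target = rule.find(q("hasContentLabel"))
        if target is not None and rule_matches(rule, uri):
            return target.get(f"{{{RDF}}}resource")
    return None


root = ET.fromstring(RULES_XML)
print(label_for(root, "http://m.tribalfusion.com/banner.gif"))  # #advert
print(label_for(root, "http://www.example.com/about.html"))     # #defaultContentPage
```

Elements without a hasContentLabel property, such as the contentLabel definitions in the full file, are skipped by label_for, so the same loop works on the complete example.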

If the labels were defined in another file (e.g. at http://www.example.com/labels.rdf), then the application rules file would appear as follows:

Example 9. Application rules in a separate file

<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:label="https://icra.org/labellingv01/rdfs/#"
 xmlns:i="https://icra.org/ratingsv03/rdfs/#">

  <!-- These rules are processed first -->
  <label:applicationRule>
    <label:oneOf rdf:parseType="Collection">
      <label:beginsWith>
        <label:value>http://a.tribalfusion.com</label:value>
      </label:beginsWith>
      <label:beginsWith>
        <label:value>http://m.tribalfusion.com</label:value>
      </label:beginsWith>
    </label:oneOf>
    <label:hasContentLabel rdf:resource="http://www.example.com/labels.rdf#advert"/>
  </label:applicationRule>

  <!-- Then everything else gets a default label -->
  <label:matches>
    <label:value>.*</label:value>
    <label:hasContentLabel rdf:resource="http://www.example.com/labels.rdf#defaultContentPage"/>
  </label:matches>
</rdf:RDF>
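When labels live in a separate file, each hasContentLabel reference combines the address of the label file with the rdf:ID of a label inside it. The Python sketch below uses the standard urldefrag function to split the references of Example 9 and groups them so that each label file only needs to be fetched once. The helper names are hypothetical, and relative references such as "#advert" would first need resolving against the rules file's own URI (for example with urljoin).

```python
from urllib.parse import urldefrag


def split_label_reference(reference: str) -> tuple[str, str]:
    """Split a hasContentLabel reference into (label file URI, label rdf:ID)."""
    document, label_id = urldefrag(reference)
    return document, label_id


def group_by_document(references: list[str]) -> dict[str, list[str]]:
    """Group label IDs by the file that must be fetched to resolve them."""
    grouped: dict[str, list[str]] = {}
    for reference in references:
        document, label_id = split_label_reference(reference)
        grouped.setdefault(document, []).append(label_id)
    return grouped


references = [
    "http://www.example.com/labels.rdf#advert",
    "http://www.example.com/labels.rdf#defaultContentPage",
]
print(group_by_document(references))
# {'http://www.example.com/labels.rdf': ['advert', 'defaultContentPage']}
```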