Submitting RDF
From ARPWiki
RDF is the metadata format that contributors use to make their resources available for use within NINES. With RDF, contributors describe each of their resources in general terms that allow those resources to be categorized and searched through COLLEX.
Below, contributors can find basic information about the composition of RDF files as well as links to sample RDF submissions and to sample XSL transformations which can be used to turn XML resources into RDF.
RDF Basics
Resource Description Framework, or RDF, is the descriptive data which NINES uses in COLLEX; through the metadata contained in the RDF, COLLEX makes peer-reviewed resources findable, interconnected, and ready for repurposing.
RDF is a metadata model, serialized here as XML, used for describing resources as part of the Semantic Web. NINES contributors identify the basic features of their digital objects, such as the title, creator, publisher, date of composition, genre, even a list of the component objects that make up a greater whole. The NINES metadata scheme leverages some preexisting schemes, such as Dublin Core and the Library of Congress Relator Terms.
Those interested in the nuts and bolts of RDF can find general information on Wikipedia. Details on the generalized RDF specification can be found through the World Wide Web Consortium.
In thinking about the RDF creation process, contributors should first decide how to define the objects in their resources. The RDF metadata scheme is predicated on the description of objects, but what constitutes an object is left to the discretion of the contributor. Contributors would do best to define their objects as the units that they wish to make browseable, collectible, and available for repurposing.
For example, a transcription of a novel would have an object for the unit of "the novel." But a contributor could also decide that the chapters which constitute that novel might also be interesting to collect on their own; the contributor would then make RDF objects for each chapter unit as well. One could easily imagine a poetry anthology receiving a similar distillation of its many layers: one RDF object for the anthology as a whole; one object for each author; one object for each poem; one object for each figure; even objects for the scholarly commentary or introductions. Another contributor could treat a similar anthology in a totally dissimilar way, viewing the bibliographic page as the elementary unit instead of the logical divisions. While contributors are free to create whatever granularity or type of objectification they would like, their reasoning should be guided by a sensible judgment of what other scholars will find useful for collection and annotation. A large book rendered as a monolithic object won't help to reveal the rich resources of its individual chapters, essays, poems, or pictures. Likewise, a poetry anthology atomized into single lines of verse would have little use for collection and prove a nightmare for browsing.
Samples
Below is a generic, mock RDF file. We have also made several RDF samples available, along with samples of XSLT for transforming XML source files into RDF metadata.
RDF Mock-up
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:nines="http://www.nines.org/schema#"
xmlns:ra="http://www.rossettiarchive.org/schema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:role="http://www.loc.gov/loc.terms/relators/">
<YOUR:NAMESPACE rdf:about="UNIQUE_OBJECT_ID">
<nines:archive>CONTRIBUTING PROJECT</nines:archive>
<dc:title>OBJECT TITLE</dc:title>
<dcterms:alternative>ALTERNATE TITLE</dcterms:alternative>
<dc:source>TITLE OF JOURNAL/ANTHOLOGY/LARGER WORK</dc:source>
<role:ART>VISUAL ARTIST</role:ART>
<role:AUT>AUTHOR</role:AUT>
<role:EDT>EDITOR</role:EDT>
<role:PBL>PUBLISHER</role:PBL>
<role:TRL>TRANSLATOR</role:TRL>
<nines:genre>GENRE</nines:genre>
<nines:genre>ANOTHER GENRE</nines:genre>
<nines:freeculture>TRUE OR FALSE</nines:freeculture>
<dc:date>4-DIGIT-DATE</dc:date>
<nines:thumbnail rdf:resource="http://YOUR_PUBLICATION.ORG/THUMBNAIL.JPG"/>
<nines:image rdf:resource="http://YOUR_PUBLICATION.ORG/FULL_SIZE_IMAGE.JPG"/>
<nines:source rdf:resource="http://YOUR_ENCODED_RESOURCE.XML"/>
<nines:text rdf:resource="http://PLAIN_TEXT_OBJECT_TRANSCRIPTION.TXT"/>
<rdfs:seeAlso rdf:resource="http://YOUR_PUBLICATION.ORG/YOUR_OBJECT.HTML"/>
<dcterms:hasPart rdf:resource="ANOTHER_OBJECT_CONTAINED_BY_THIS_OBJECT"/>
<dcterms:isPartOf rdf:resource="AN_OBJECT_THAT_CONTAINS_THIS_OBJECT"/>
<dc:relation rdf:resource="AN_ASSOCIATED_OBJECT"/>
</YOUR:NAMESPACE>
</rdf:RDF>
RDF Specification
Element Definitions
Element values should not include leading or trailing whitespace. In other words, <dc:date>1875</dc:date> is correct while <dc:date> 1875 </dc:date> is incorrect.
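The whitespace rule can be checked mechanically before submission. The following is a minimal Python sketch, not a NINES tool: the function name `find_padded_values`, the `ex:` namespace, and the sample record are hypothetical, and only the standard library's `xml.etree.ElementTree` is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; the ex: namespace and URLs are placeholders.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:ex="http://example.org/schema#">
  <ex:poem rdf:about="http://example.org/poem/1">
    <dc:title>A Sample Poem</dc:title>
    <dc:date> 1875 </dc:date>
  </ex:poem>
</rdf:RDF>"""


def find_padded_values(rdf_xml):
    """Return (tag, text) pairs for leaf elements whose text starts or
    ends with whitespace."""
    root = ET.fromstring(rdf_xml)
    padded = []
    for elem in root.iter():
        # Skip container elements; their text is only indentation.
        if len(elem) > 0 or not elem.text:
            continue
        if elem.text != elem.text.strip():
            padded.append((elem.tag, elem.text))
    return padded


for tag, text in find_padded_values(SAMPLE):
    print("padded value in", tag, repr(text))
```

ElementTree reports tags in `{namespace}name` form, so the padded `<dc:date>` above is reported under the Dublin Core namespace URI rather than the `dc:` prefix.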
<rdf:RDF>
- the root element of the RDF file, listing namespace declarations with multiple "xmlns:___" attributes
- it isn't necessary to reference an actual XSD schema to validate the RDF--use the "xmlns" value only to establish a unique namespace
- Required? YES
- Can appear? ONCE
<custom_namespace rdf:about="value">
- denotes the object
- a child element of <rdf:RDF> with a project-defined namespace
- the "rdf:about" attribute records the unique id for the object
- Required? YES
- Can appear? MULTIPLE
<nines:archive>
- a shorthand reference to the contributing project or journal, one word such as "rossetti" or "rc-praxis." This word should be unique to this particular set of content. You shouldn't, therefore, choose a reference like "PodunkUP" if Podunk University Press intends to contribute a different set of content in future. (Instead, choose "PodunkUP-journal1.")
- Required? YES
- Can appear? ONCE
<dc:title>
- the title of the object
- Required? YES
- Can appear? ONCE
<dcterms:alternative>
- an alternative title of the object
- Required? NO
- Can appear? MULTIPLE
<dc:source>
- title of the larger work, resource, or collection of which the present object is a part
- can be used for the title of a journal, anthology, book, online collection, etc.
- Required? NO
- Can appear? MULTIPLE
<dc:subject>
- a single keyword which will be converted into user tags
- Required? NO
- Can appear? MULTIPLE
<role:***>
- individual who created the object
- possible element names include:
  - <role:ART> for Visual Artist
  - <role:AUT> for Author
  - <role:EDT> for Editor
  - <role:PBL> for Publisher
  - <role:TRL> for Translator
- NINES is developing a system to allow contributors to find the conventional expression(s) of an agent's name. Meanwhile, contributors may choose to consult the Library of Congress authorities list. Please be internally consistent and keep good records of any names you use.
- Please note: each element's content values pertain only to the object at hand, not to the object's content or subject matter; when you list a particular name as "author," this should be the author of the object, not an author described in the object's text.
- NINES strongly encourages using <role:ART> or <role:AUT>, even when the agent is unknown or anonymous. In such cases, use the standard values "Unknown" or "Anonymous." For example, <role:AUT>Unknown</role:AUT>. Variants of those values ("Unk." or "Anon.") will degrade the usability of the faceted browser.
- Required? YES
- Can appear? MULTIPLE
<nines:genre>
- basic descriptive genres for NINES materials
- Each object is required to have at least one valid genre from the list below. (Please note: we formerly required that all objects be typed either "Primary" or "Secondary." Due to extreme ontological unease, this is no longer a NINES requirement!)
- Required? YES
- Can appear? MULTIPLE
Available Genre Values
Architecture, Ephemera, Music, Poetry, Artifacts, Fiction, Nonfiction, Religion, Bibliography, History, Paratext, Review, Collection, Leisure, Periodical, Visual Art, Criticism, Letters, Philosophy, Translation, Drama, Life Writing, Photograph, Travel, Education, Manuscript, Citation, Book History, Politics, Reference Works, Family Life, Law, Folklore, Humor
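Because each object needs at least one genre from this list, a contributor may want to check genre values before uploading. The sketch below is only an illustration: the function name `check_genres` is hypothetical, the set is transcribed from the list above, and exact, case-sensitive matching is an assumption rather than documented COLLEX behavior.

```python
# Genre values transcribed from the NINES list above.
VALID_GENRES = frozenset([
    "Architecture", "Ephemera", "Music", "Poetry",
    "Artifacts", "Fiction", "Nonfiction", "Religion",
    "Bibliography", "History", "Paratext", "Review",
    "Collection", "Leisure", "Periodical", "Visual Art",
    "Criticism", "Letters", "Philosophy", "Translation",
    "Drama", "Life Writing", "Photograph", "Travel",
    "Education", "Manuscript", "Citation", "Book History",
    "Politics", "Reference Works", "Family Life", "Law",
    "Folklore", "Humor",
])


def check_genres(genres):
    """Return a list of problems with one object's <nines:genre> values.

    Assumption: values must match the list exactly, including case.
    """
    problems = []
    if not genres:
        problems.append("at least one genre is required")
    for genre in genres:
        if genre not in VALID_GENRES:
            problems.append("unrecognized genre: %r" % genre)
    return problems


print(check_genres(["Poetry", "Visual Art"]))
print(check_genres(["poetry"]))
```

An empty result means every value was recognized; a non-empty result lists what to correct.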
<dc:date>
- date of the object
- may contain either a four-digit year or a <nines:date> element
- Please note: contributors should, when at all possible, attempt to include a date even when a date value is unknown or uncertain
- Unknown or uncertain dates can, in most cases, be narrowed to a possible date range, be it a decade or a century; contributors should use the <nines:date> formula to record a human-readable value (<rdfs:label>) and a computational value (<rdf:value>).
  - To narrow a range to a decade, replace the last year digit with "u." E.g. 1860's are written as "186u" in <rdf:value>.
  - To narrow a range to a century, replace the last two year digits with "u." E.g. 1800's are written as "18uu" in <rdf:value>.
  - The value of <rdfs:label> can be anything one would like: "1860's", "1800's", "Likely the 1860's", "1860 through 1869", etc.
<dc:date>
<nines:date>
<rdfs:label>1890-99 (circa)</rdfs:label>
<rdf:value>189u</rdf:value>
</nines:date>
</dc:date>
- Objects which were produced over a number of years can receive a date range. Again, use the <nines:date> scheme. The <rdfs:label> takes any human-readable formulation, e.g. "1861 through 1862". <rdf:value> would encode the start-date and end-date in the range in a comma-separated format, e.g. "1861,1862".
<dc:date>
<nines:date>
<rdfs:label>1891-93</rdfs:label>
<rdf:value>1891,1893</rdf:value>
</nines:date>
</dc:date>
- Objects that are uncertainly dated but still known to be composed within a specific date range should receive a hybrid formulation, involving two <dc:date> elements. One <dc:date> records the date range; the second <dc:date> marks the object's date as "Uncertain".
<dc:date>
<nines:date>
<rdfs:label>Sometime between 1891 and 1893</rdfs:label>
<rdf:value>1891,1893</rdf:value>
</nines:date>
</dc:date>
<dc:date>Uncertain</dc:date>
- Objects worked on in nonconsecutive years should receive distinct <dc:date> elements for each year. So, for an object begun in 1890, put on hiatus in 1891, then concluded in 1892, the encoding would be ...
<dc:date>1890</dc:date>
<dc:date>1892</dc:date>
- Required? YES
- Can appear? MULTIPLE
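The <rdf:value> encodings described above for <dc:date> (a plain four-digit year, "u" placeholders for a decade or century, and a comma-separated range) can each be read as an inclusive span of years. The following Python sketch shows one such reading; the function name `year_range` is hypothetical, and this is an interpretation of the rules on this page, not the code COLLEX itself runs.

```python
import re

# "1891,1893": start year and end year separated by a comma.
RANGE = re.compile(r"(\d{4}),(\d{4})")
# "1875", "186u" (a decade), or "18uu" (a century).
SINGLE = re.compile(r"\d{4}|\d{3}u|\d{2}uu")


def year_range(value):
    """Expand an rdf:value date string into an inclusive
    (first_year, last_year) pair."""
    match = RANGE.fullmatch(value)
    if match:
        return int(match.group(1)), int(match.group(2))
    if SINGLE.fullmatch(value):
        # Each "u" stands for any digit, so the span runs from 0 to 9.
        return int(value.replace("u", "0")), int(value.replace("u", "9"))
    raise ValueError("unrecognized date value: %r" % value)


for sample in ("1875", "189u", "18uu", "1891,1893"):
    print(sample, year_range(sample))
```

A value such as "Uncertain" raises `ValueError` here, since it is a plain <dc:date> marker rather than a computational <rdf:value>.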
<nines:date>
- element used when contributor wants to preserve more human readable date information while also including a formal date value
- has two child elements, <rdfs:label> and <rdf:value>
- Required? NO
- Can appear? ONCE
<rdfs:label>
- preserves a human readable date value, e.g. "1806 (circa)"
- will appear as the "Date" value in COLLEX query results
- Required? NO
- Can appear? ONCE
<rdf:value>
- formal, four-digit date value of the <rdfs:label> contents
- used for computational sorting and querying
- Required? NO
- Can appear? ONCE
<nines:freeculture>
- if present, a "true" value denotes that the content is free and available for use by all people in all places, whereas a "false" value denotes that the content is restricted in some way to subscribers.
- Required? NO (defaults to "true" if not present)
- Can appear? ONCE
<nines:source rdf:resource="">
- pointer to the web-accessible source code for the data (usually in XML).
- Required? NO
- Can appear? ONCE
<rdfs:seeAlso rdf:resource="">
- pointer to the web-accessible object as it is rendered in your own interface. Distinct URLs displaying the same content should each get an rdfs:seeAlso entry.
- usually an html page. During indexing, the NINES server issues a HEAD request to the specified URL (not a GET) and follows redirects.
- Required? NO
- Can appear? MULTIPLE
<nines:text>
- contains either:
- 1) URL to a web-accessible, plain text transcription of the object, like the following:
-
<nines:text rdf:resource="http://www.rossettiarchive.org/docs/1-1835.raw.txt"/>
- 2) plain text of the transcript within the nines:text element, such as:
-
<nines:text>full text goes here</nines:text>
- indexed by the COLLEX search engine and used for full-text queries. This should be a "pure" transcript of the text content of the object, without extraneous text from navigation elements, copyright statements, etc. Encode plain text in UTF-8 format.
- Required? NO
- Can appear? ONCE
<nines:image rdf:resource="">
- pointer to the web-accessible, full-size digital image of the object
- Required? NO
- Can appear? ONCE
<nines:thumbnail rdf:resource="">
- pointer to the web-accessible, thumbnail-sized digital image of the object. We suggest that you make your thumbnails no larger than 100 pixels in either height or width.
- Required? NO
- Can appear? ONCE
<dcterms:hasPart rdf:resource="">
- pointer to divisions of the present object which have their own RDF objects
- expresses a hierarchical relationship
- e.g. a book object could point to its subordinate chapter objects
- not currently exploited by COLLEX, but useful in the future for describing a graph of objects
- Required? NO
- Can appear? MULTIPLE
<dcterms:isPartOf rdf:resource="">
- pointer to the RDF object of which the present object is a division
- expresses a hierarchical relationship
- e.g. a chapter object points to a book object
- Required? NO
- Can appear? MULTIPLE
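Since <dcterms:hasPart> and <dcterms:isPartOf> both point at other RDF objects, it can be useful to confirm that every pointer resolves to an rdf:about id in the same file. The Python sketch below is a sanity check under stated assumptions, not a NINES requirement: the function name `dangling_part_links`, the `ex:` namespace, and the sample novel are hypothetical, and it assumes all related objects live in one RDF file.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS_NS = "http://purl.org/dc/terms/"

# Hypothetical sample: a novel that lists two chapters, only one of
# which has its own RDF object.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:ex="http://example.org/schema#">
  <ex:book rdf:about="http://example.org/novel">
    <dc:title>A Sample Novel</dc:title>
    <dcterms:hasPart rdf:resource="http://example.org/novel/ch1"/>
    <dcterms:hasPart rdf:resource="http://example.org/novel/ch2"/>
  </ex:book>
  <ex:chapter rdf:about="http://example.org/novel/ch1">
    <dc:title>Chapter 1</dc:title>
    <dcterms:isPartOf rdf:resource="http://example.org/novel"/>
  </ex:chapter>
</rdf:RDF>"""


def dangling_part_links(rdf_xml):
    """Return (object_id, relation, target) for each hasPart/isPartOf
    pointer whose target has no rdf:about entry in this file."""
    root = ET.fromstring(rdf_xml)
    about = "{%s}about" % RDF_NS
    resource = "{%s}resource" % RDF_NS
    known_ids = {obj.get(about) for obj in root}
    dangling = []
    for obj in root:
        for relation in ("hasPart", "isPartOf"):
            for link in obj.findall("{%s}%s" % (DCTERMS_NS, relation)):
                target = link.get(resource)
                if target not in known_ids:
                    dangling.append((obj.get(about), relation, target))
    return dangling


for entry in dangling_part_links(SAMPLE):
    print("unresolved pointer:", entry)
```

In the sample, the novel points to a second chapter that has no object of its own, so that pointer is reported.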
<dc:relation rdf:resource="">
- pointer to an associated resource
- provides a means to express relationships among resources that are formally related to others but exist as discrete resources themselves
- e.g. images in a document, other volumes in a series or items in a collection
- Required? NO
- Can appear? MULTIPLE
Testing, Troubleshooting, and Submitting RDF
The W3C makes available a great RDF Validator. Use this service to ensure your RDF parses correctly and to gain a deeper understanding of the graph nature of RDF (by enabling the graph display option).
NINES itself has developed a mechanism for contributors to upload their own RDF submissions, parse them against our schema, and test and tinker with them in a sandbox Collex interface. Once you've prepared a set of RDF you'd like to test, you can gain access to this data administration system (or "admin app") by contacting the appropriate editorial board (Romantic, Victorian, or American) for your project.
Further instructions are included on the "help" page of the admin app. You can also use the admin app to submit your RDF for formal review and inclusion into NINES.
The Importance of Being Stable
We recommend linking to your RDF in the meta tags of your HTML as follows:
<link rel="meta" type="application/rdf+xml" href="myobject.rdf"/>
These links are a semantic web "best practice." That said, Collex does not currently pick up changes to your HTML-linked RDF in any automated way. Instead, when you have revised RDF, you should upload it through the data administration system as a fresh batch. Please note that your fresh upload will completely replace all the RDF records NINES currently holds for your project. This means that the unique object id's expressed in each rdf:about field should remain stable.
These id's are the most brittle aspect of the NINES system. If you change an id, all the user-created content built on top of your object will be lost or ruined. This includes tags and annotations as well as NINES exhibits, such as course syllabi or critical essays.
The requirement that you keep stable NINES id's should not impact your ability to alter identifiers within your own archive at will.
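Because a fresh upload replaces every record for a project, comparing the rdf:about ids in the previous batch with those in the revised batch is a cheap way to catch an id that was changed or dropped by accident. The Python sketch below illustrates the idea; the function names `object_ids` and `dropped_ids` and both sample batches are hypothetical, and each batch is assumed to be a single RDF/XML string.

```python
import xml.etree.ElementTree as ET

RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"

# Hypothetical batches: the revised batch renames the second id.
OLD_BATCH = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:ex="http://example.org/schema#">
  <ex:poem rdf:about="http://example.org/poem/1"><dc:title>First</dc:title></ex:poem>
  <ex:poem rdf:about="http://example.org/poem/2"><dc:title>Second</dc:title></ex:poem>
</rdf:RDF>"""

NEW_BATCH = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:ex="http://example.org/schema#">
  <ex:poem rdf:about="http://example.org/poem/1"><dc:title>First</dc:title></ex:poem>
  <ex:poem rdf:about="http://example.org/poem/two"><dc:title>Second</dc:title></ex:poem>
</rdf:RDF>"""


def object_ids(rdf_xml):
    """Collect the rdf:about ids of the top-level objects in one file."""
    root = ET.fromstring(rdf_xml)
    return {obj.get(RDF_ABOUT) for obj in root if obj.get(RDF_ABOUT)}


def dropped_ids(old_rdf, new_rdf):
    """Return ids present in the old batch but missing from the new one."""
    return sorted(object_ids(old_rdf) - object_ids(new_rdf))


for lost in dropped_ids(OLD_BATCH, NEW_BATCH):
    print("id no longer present:", lost)
```

Any id reported here would lose the tags, annotations, and exhibits that users have attached to it, so it is worth reviewing before the batch is uploaded.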
Special Considerations for Dynamic Content
The Collex software matches a site's public URL to the one given in the rdfs:seeAlso link in order to make objects collectible from the bookmarklet. Web resources generated from a database or XSLT at run time present additional challenges, as parameters may be re-ordered or absent. Rather than listing every URL permutation as an rdfs:seeAlso entry, one should explicitly reference the RDF from the meta tag in the HTML (see note above). The Collex system then matches the rdf:about unique identifier for objects defined in your RDF and objects loaded in Collex.
The preferable solution for dynamically driven sites is to use URIs which hide the underlying technology, which is certain to change. See the following article for a technical explanation of future-proofing your URIs:
http://www.wrox.com/WileyCDA/Section/id-301495.html
