====
File
====

A :code:`pcdm:File` and its various subclasses are not binary datastreams. Instead, they are RDF resources. In RDF, a
resource is something that can be identified by a URI and described using RDF properties. A :code:`pcdm:File` is
therefore not the binary file itself, but a conceptual resource that describes the file and links to its location or
representation.

In PCDM, there are two other ontologies that extend :code:`pcdm:File`:

* PCDM Use Extension
* PCDM File Formats Extension

These extensions allow us to make something not just a file, but a specific type of file with a specific purpose.
This gives us more flexibility in our repository and makes our applications more interoperable.

In Fedora, the :code:`/fcr:metadata` suffix is appended to the end of URI paths for binary files so that an RDF
description of the binary exists alongside the binary itself. Combining this with the RDF approach described above,
we can both adhere to the model and make meaningful connections between our resources.

-------------
Minimum Model
-------------

A File may have many more properties, but at a minimum it should look something like this:

.. code-block:: turtle

    @prefix ex: .
    @prefix fedora: .
    @prefix rdf: .
    @prefix pcdm: .

    a pcdm:File ;
        pcdm:fileOf ex:Object ;
        fedora:hasBinary .

--------------------
Rules and Guidelines
--------------------

A File **MUST**:

* be an RDF resource and an instance of :code:`pcdm:File` with an :code:`rdf:type` property that asserts this. This can
  be via a subclass of :code:`pcdm:File`.
* have 1 :code:`fedora:hasBinary` property with :code:`1` binary datastream if versioning is not enabled.
* have 0 :code:`fedora:hasBinary` properties and 1 :code:`fedora:hasVersions` property with :code:`1-n`
  :class:`Version` resources if versioning is enabled.

A File **SHOULD**:

* be a :code:`pcdm:File` even if it is also another type from :code:`pcdmuse` or :code:`pcdmff`, as classes in neither
  ontology are direct subclasses of :code:`pcdm:File`.

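
The versioned case in the **MUST** rules above can be sketched as follows. This is only an illustration: the namespace
IRIs, the :code:`ex:` base, and the resource names :code:`ex:VersionedFile`, :code:`ex:Version1`, and
:code:`ex:Version2` are assumptions, and the sketch reads ":code:`1-n` Version resources" as one or more objects of
:code:`fedora:hasVersions`.

.. code-block:: turtle

    # Assumed namespace IRIs; adjust them to match your repository.
    @prefix ex: <http://example.org/> .
    @prefix fedora: <http://fedora.info/definitions/v4/repository#> .
    @prefix pcdm: <http://pcdm.org/models#> .

    # Hypothetical File with versioning enabled:
    # no fedora:hasBinary, and fedora:hasVersions listing its versions.
    ex:VersionedFile a pcdm:File ;
        pcdm:fileOf ex:Object ;
        fedora:hasVersions ex:Version1, ex:Version2 .
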

A File **MAY**:

* have :code:`0-n` :code:`rdfs:label` values to make the resource more semantically understandable.
* have a :code:`pcdm:fileOf` property that refers to its parent resource.
* have :code:`0-1` :code:`dcterms:extent` values with the file size represented in bytes.
* have :code:`0-1` :code:`dc:format` values with the MIME type of the file.
* have :code:`0-n` checksums. If using multiple algorithms, you can use different properties or URN syntax.
* have :code:`0-1` :code:`dcterms:created` values with the creation date of the file.
* have :code:`0-1` :code:`dcterms:modified` values with the last modification date.
* have other descriptive and technical metadata properties.

---------
Valid Use
---------

A :code:`pcdm:File` is valid as the subject of triples with these properties:

* :code:`pcdm:fileOf`

A :code:`pcdm:File` is valid as the object of triples with these properties:

* :code:`pcdm:hasFile`

--------
Examples
--------

An HOCR File
============

An HOCR (also written hOCR) file uses an HTML-based format to store the results of Optical Character Recognition (OCR)
processing. It contains both the recognized text from a scanned image and information about the layout of that text on
the image, such as font size, position, and structure (paragraphs, lines, and words). This precise position and size
information helps recreate the layout of the original document, so the format is commonly used for scanned documents
where maintaining the spatial arrangement is essential.

To help external services use the text and positioning, for example in a IIIF presentation, an HOCR file might look
like this:

.. code-block:: turtle

    @prefix ex: .
    @prefix fedora: .
    @prefix rdf: .
    @prefix pcdm: .
    @prefix pcdmff: .

    ex:HOCR a pcdm:File, pcdmff:HTML ;
        pcdm:fileOf ex:FileSetWithCorrespondingImage ;
        fedora:hasBinary .

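
As a rough sketch, the optional **MAY** properties and the :code:`pcdm:hasFile` relationship from the sections above
could be attached to the same HOCR file like this. The namespace IRIs are assumptions, and the label, size, and dates
are invented placeholder values used only to show the shape of the triples.

.. code-block:: turtle

    # Assumed namespace IRIs; adjust them to match your repository.
    @prefix ex: <http://example.org/> .
    @prefix pcdm: <http://pcdm.org/models#> .
    @prefix pcdmff: <http://pcdm.org/file-format-types#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    # All literal values below are placeholders, not real data.
    ex:HOCR a pcdm:File, pcdmff:HTML ;
        rdfs:label "HOCR for page 1" ;
        pcdm:fileOf ex:FileSetWithCorrespondingImage ;
        dcterms:extent "20480" ;
        dc:format "text/html" ;
        dcterms:created "2024-01-15T10:30:00Z"^^xsd:dateTime ;
        dcterms:modified "2024-02-01T08:00:00Z"^^xsd:dateTime .

    # The File appears as the object of pcdm:hasFile on its parent.
    ex:FileSetWithCorrespondingImage pcdm:hasFile ex:HOCR .
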

An Image File that Should be Used as a IIIF Canvas
==================================================

If we ever use Fedora as a true digital asset management system and include files for preservation, we may need to
differentiate between what is included in a IIIF manifest as a Canvas and what is stored in Fedora for preservation but
not for display. In this case, a file used as a IIIF Canvas should look something like this:

.. code-block:: turtle

    @prefix ex: .
    @prefix fedora: .
    @prefix rdf: .
    @prefix pcdm: .
    @prefix pcdmff: .
    @prefix pcdmuse: .

    ex:CanvasImage a pcdm:File, pcdmff:Image, pcdmuse:ServiceFile ;
        pcdm:fileOf ex:FileSetWithCorrespondingImage ;
        fedora:hasBinary .

An Image File that is just a Preservation File
==============================================

Conversely, if an image is stored only for preservation, we might want to record more technical metadata about the
binary datastream. It is also less likely to have corresponding files such as OCR and HOCR. In that case, it might look
like this:

.. code-block:: turtle

    @prefix ex: .
    @prefix fedora: .
    @prefix rdf: .
    @prefix pcdm: .
    @prefix pcdmff: .
    @prefix pcdmuse: .
    @prefix nepomuk: .
    @prefix exif: .
    @prefix ebucore: .
    @prefix premis: .

    ex:PreservationImage a pcdm:File, pcdmff:Image, pcdmuse:PreservationFile ;
        pcdm:fileOf ex:ImageWork ;
        fedora:hasBinary .

    ebucore:width "2106"^^ ;
        ebucore:height "2808"^^ ;
        ebucore:fileSize "17765536"^^ ;
        premis:hasSize "17765536"^^ ;
        ebucore:hasMimeType "image/tiff"^^ ;
        ebucore:filename "preservation-file.tif"^^ ;
        exif:orientation "normal*"^^ ;
        exif:colorSpace "RGB"^^ ;
        exif:software "Adobe Photoshop CS2 Windows"^^ ;
        nepomuk:hashValue "99d14ee8c28517e10c637e0e0a675b94"^^ .
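
The preservation example records a single hash value. The guideline about multiple checksum algorithms could be
sketched with URN syntax as shown below. This is an assumption-laden illustration rather than this repository's actual
practice: the namespace IRIs, the choice of :code:`premis:hasMessageDigest`, and the :code:`urn:md5:` and
:code:`urn:sha-256:` forms are assumed, the MD5 value is reused from the example above, and the SHA-256 value is an
all-zero placeholder.

.. code-block:: turtle

    # Assumed namespace IRIs; adjust them to match your repository.
    @prefix ex: <http://example.org/> .
    @prefix pcdm: <http://pcdm.org/models#> .
    @prefix pcdmuse: <http://pcdm.org/use#> .
    @prefix premis: <http://www.loc.gov/premis/rdf/v1#> .

    # One property, one URN per algorithm.
    # The second digest is a placeholder, not a real checksum.
    ex:PreservationImage a pcdm:File, pcdmuse:PreservationFile ;
        premis:hasMessageDigest
            <urn:md5:99d14ee8c28517e10c637e0e0a675b94> ,
            <urn:sha-256:0000000000000000000000000000000000000000000000000000000000000000> .

Putting the algorithm in the URN keeps every checksum under one property, whereas the alternative named in the
guideline is to use a different property for each algorithm.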