Create and Manage METS in retrodigitization Markus Enders

Create and Manage METS in retrodigitization Markus Enders Goettingen State and University Library www.sub.uni-goettingen.de/GDZ

Digitization Center Located at State and University Library Göttingen Founded in 1997 Funded by DFG Build infrastructure Set up production line for digitization

Digitization Center Production line 3 b/w and greyscale book scanners 2 color digitization workstations Quality control Image enhancement Production line for all in-house digitization projects Ca. 1,000,000 pages / year

Digitization Center Infrastructure Software to create contents Software to manage contents Software to present content on the web Hardware to store contents

Digitization Center Infrastructure Software to create content DMS: Software to manage content, Software to present content on the web, Hardware to store and manage content

Document model Logical structure Monograph, chapters, articles etc. Physical structure only pages; no metadata for pages

Document model Logical structure Monograph, chapters, articles etc.
<METS:structMap TYPE="LOGICAL">
  <METS:div TYPE="Monograph" ID="log0001" DMDID="dmdlog0001">
    <METS:div TYPE="TitlePage" ID="log0002"/>
    <METS:div TYPE="Dedication" ID="log0003"/>
    <METS:div TYPE="CurriculumVitae" ID="log0005"/>
  </METS:div>
</METS:structMap>

Document model Logical structure Monograph, chapters, articles etc. Physical structure only pages; no metadata for pages
<METS:structMap TYPE="PHYSICAL">
  <METS:div TYPE="BoundBook" ID="phys0001">
    <METS:div TYPE="page" ID="phys0002" DMDID="dmdphys0001">
      <METS:fptr FILEID="bitonal0001"/>
    </METS:div>
    ...
  </METS:div>
</METS:structMap>

Document model Logical structure Monograph, chapters, articles etc. Physical structure only pages; no metadata for pages
<METS:structLink>
  <!-- Monograph -->
  <METS:smLink from="log0001" to="phys0001"/>
  <!-- Title page -->
  <METS:smLink from="log0002" to="phys0002"/>
  ...
</METS:structLink>
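The document model on the preceding slides (a logical structMap, a physical structMap and a structLink section that joins them) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `build_mets` and its argument shapes are invented here, the plain `from`/`to` attributes follow the slide (the current METS schema defines them as `xlink:from`/`xlink:to`), and the result is a skeleton without fileSec or dmdSec, so it is not schema-valid METS.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("METS", METS_NS)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(logical, pages, links):
    """Build a METS skeleton with two structMaps and a structLink section.

    logical: (type, id, children) tuple describing the logical tree
    pages:   list of (page_id, file_id) in physical order
    links:   list of (logical_id, physical_id) pairs
    """
    mets = ET.Element(q("mets"))

    def add_div(parent, node):
        div_type, div_id, children = node
        div = ET.SubElement(parent, q("div"), TYPE=div_type, ID=div_id)
        for child in children:
            add_div(div, child)

    log_map = ET.SubElement(mets, q("structMap"), TYPE="LOGICAL")
    add_div(log_map, logical)

    phys_map = ET.SubElement(mets, q("structMap"), TYPE="PHYSICAL")
    book = ET.SubElement(phys_map, q("div"), TYPE="BoundBook", ID="phys0001")
    for page_id, file_id in pages:
        page = ET.SubElement(book, q("div"), TYPE="page", ID=page_id)
        ET.SubElement(page, q("fptr"), FILEID=file_id)

    struct_link = ET.SubElement(mets, q("structLink"))
    for log_id, phys_id in links:
        # Plain from/to attributes as written on the slide (assumption:
        # kept as-is; current METS uses xlink:from / xlink:to).
        ET.SubElement(struct_link, q("smLink"), {"from": log_id, "to": phys_id})
    return mets


logical = ("Monograph", "log0001", [
    ("TitlePage", "log0002", []),
    ("Dedication", "log0003", []),
])
mets = build_mets(
    logical,
    pages=[("phys0002", "bitonal0001")],
    links=[("log0001", "phys0001"), ("log0002", "phys0002")],
)
print(ET.tostring(mets, encoding="unicode"))
```

Keeping the two hierarchies separate and joining them only through ID pairs means a logical element can point at any page without the page list having to know about chapters or articles.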

Document model Logical structure Monograph, chapters, articles etc. Physical structure only pages; no metadata for pages Descriptive Metadata MODS extension – own namespace

Document model Logical structure Monograph, chapters, articles etc. Physical structure only pages; no metadata for pages Descriptive Metadata Fulltext with coordinates for words separate TEI/XML file, linked to METS

Document model Logical structure Monograph, chapters, articles etc. Physical structure only pages; no metadata for pages Descriptive Metadata Fulltext Problem: tagging physical structure in TEI (TEI only supports page and column breaks).

Document model Logical structure Monograph, chapters, articles etc. Physical structure only pages; no metadata for pages Descriptive Metadata Fulltext Solution: Tag smallest physical structure in fulltext: text blocks (q element)

Document model Logical structure Monograph, chapters, articles etc. Physical structure only pages; no metadata for pages Descriptive Metadata Fulltext with coordinates for words One image per page

Production (Metadata) Excel spreadsheet Bibliographic information Structure information with metadata Pagination information

Excel spreadsheet – bibliographic information on Monograph level

Excel spreadsheet – pagination information Columns A and C: counted pages start and end, logical page numbers Columns D and E: uncounted pages start and end Columns M and N: calculated physical page numbers
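How the spreadsheet derives the physical page numbers is not shown on the slide. A minimal sketch, assuming that the counted and uncounted ranges are listed in book order and that every page in a range corresponds to exactly one image, could look like this. The function name `physical_ranges` and the tuple layout are hypothetical.

```python
def physical_ranges(blocks):
    """Assign consecutive physical page numbers to pagination blocks.

    blocks: list of (kind, first, last) in book order, where kind is
            "counted" (printed page numbers) or "uncounted".
    Returns a list of (kind, first, last, phys_start, phys_end).
    """
    result = []
    next_phys = 1
    for kind, first, last in blocks:
        if last < first:
            raise ValueError(f"block ends before it starts: {first}-{last}")
        length = last - first + 1
        result.append((kind, first, last, next_phys, next_phys + length - 1))
        next_phys += length
    return result


rows = physical_ranges([
    ("uncounted", 1, 4),   # e.g. cover and title pages without printed numbers
    ("counted", 1, 120),   # printed page numbers 1-120
    ("uncounted", 1, 2),   # unnumbered pages at the end
])
for row in rows:
    print(row)
```

Under these assumptions the printed page 1 ends up as physical page 5, because four uncounted pages precede it.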

Excel spreadsheet – structural information Column B: type of structure element Columns C and D: start location of structure element (sequence and page) Columns H and I: Author and Title of structure element
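The spreadsheet records only where each structure element starts. One way to turn such rows into page ranges, for example as input for structLink entries, is to let every element run until just before the next one begins. The sketch below assumes a flat list of elements sorted by start position; `StructRow` and `page_spans` are hypothetical names, and the slides do not say how the actual converter determines the end of an element.

```python
from dataclasses import dataclass


@dataclass
class StructRow:
    kind: str        # type of structure element (column B)
    start_seq: int   # start position in the image sequence (column C)
    author: str = ""
    title: str = ""


def page_spans(rows, last_page):
    """Return (kind, first_page, last_page) for each row.

    An element ends one page before the next element starts; if both
    start on the same page, the element covers just that page.
    """
    spans = []
    for i, row in enumerate(rows):
        if i + 1 < len(rows):
            end = max(row.start_seq, rows[i + 1].start_seq - 1)
        else:
            end = last_page
        spans.append((row.kind, row.start_seq, end))
    return spans


rows = [
    StructRow("TitlePage", 1),
    StructRow("Dedication", 3),
    StructRow("Chapter", 5, author="N. N.", title="Introduction"),
]
spans = page_spans(rows, last_page=20)
print(spans)
```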

Excel spreadsheet: Conversion of content to XML file using a Visual Basic script RDF/XML-based file

Excel spreadsheet: Conversion of content to XML file using a Visual Basic script RDF/XML-based file Conversion of content to METS using Java (POI library) METS file still in beta test

AGORA Editor Commercial program Structural and bibliographic metadata Images are displayed during capturing Pagination information is captured „automatically“

AGORA Editor

AGORA Editor Writes RDF/XML based file Converted to METS using Java program

Production (Metadata & fulltext) docWorks Software by CCS Structure data, Metadata and fulltext Direct METS output (no conversion necessary) Testing started in June

Production METS: Only docWorks has direct METS output For other solutions: Java program will convert output to METS Excel → METS, RDF/XML → METS Can be used to migrate old data to METS

Management and Presentation Document Management System One platform for all digitization projects Development began in 1998 Defining own RDF/XML based format Cooperation with external company: „Satz-Rechen-Zentrum“, Berlin

Document Management System “AGORA” Java-based server Windows Administration client Java-based system; uses relational database Verity search engine for metadata and fulltext

Document Management System “AGORA” Data storage: Metadata, Structure data and fulltext in relational database Images stored in file system

Document Management System “AGORA” Import: RDF/XML files (metadata; structure) Image data from file system TEI/XML for fulltext (stored in database) METS support in August-release Batch-import possible (hotfolder)
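The slide does not describe how AGORA's hotfolder works internally. As a generic illustration of the idea, a batch import can scan a directory, hand each file to an import routine and then move it away so it is not picked up twice. Everything named here (`process_hotfolder`, the `done` directory, the `import_file` callback) is a hypothetical stand-in, not AGORA's API.

```python
import shutil
import tempfile
from pathlib import Path


def process_hotfolder(hotfolder, done_dir, import_file):
    """Import every XML file currently in the hotfolder, then move it away."""
    done_dir.mkdir(parents=True, exist_ok=True)
    imported = []
    for path in sorted(hotfolder.glob("*.xml")):
        import_file(path)  # hypothetical import routine supplied by the caller
        shutil.move(str(path), str(done_dir / path.name))
        imported.append(path.name)
    return imported


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    hot = Path(tmp) / "hotfolder"
    hot.mkdir()
    (hot / "book1.xml").write_text("<mets/>")
    (hot / "notes.txt").write_text("ignored")
    seen = []
    names = process_hotfolder(hot, Path(tmp) / "done", seen.append)
    print(names)
```

A production version would run this on a schedule and would need to make sure a file has been written completely before importing it.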

Document Management System “AGORA” Access: Web-Frontend HTML Templates (webmacro) XML-output possible (via webmacro) Caching of HTML pages - high performance

Document Management System “AGORA” Access: Web-Frontend www.webmacro.org HTML Templates (webmacro) XML-output possible (via webmacro) Caching of HTML pages - high performance

DMS “AGORA” Page view: zoom with on-the-fly conversion of images

DMS “AGORA” Hitlist:

DMS “AGORA” Hitlist: Image highlighting possible (fulltext search)
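Highlighting a fulltext hit on the page image depends on the word coordinates stored with the fulltext. A rough sketch, assuming each word carries a pixel bounding box relative to the full-resolution image and that the displayed image is a uniformly scaled copy, is shown below. The `Word` record and `highlight_boxes` function are hypothetical and do not reflect AGORA's internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    left: int
    top: int
    right: int
    bottom: int


def highlight_boxes(words, query, scale=1.0):
    """Return scaled (left, top, right, bottom) boxes for words matching the query."""
    terms = {term.lower() for term in query.split()}
    boxes = []
    for word in words:
        # Compare case-insensitively and ignore trailing punctuation.
        if word.text.lower().strip(".,;:!?") in terms:
            box = (word.left, word.top, word.right, word.bottom)
            boxes.append(tuple(round(value * scale) for value in box))
    return boxes


page_words = [
    Word("Göttingen", 100, 200, 300, 240),
    Word("library,", 320, 200, 460, 240),
]
boxes = highlight_boxes(page_words, "library", scale=0.5)
print(boxes)
```

The returned rectangles can then be drawn over the zoomed page image by whatever rendering layer is in use.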

Document Management System “AGORA” Access: Java API Full functionality available: Add, update, read and delete elements; retrieval OAI-PMH implementation based on API

Document Management System “AGORA” Export: XML export (with images)

Document Management System “AGORA” PDF-Export – logical structure as bookmarks:
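Deriving bookmarks from the logical structure amounts to walking the logical structMap and looking up, via structLink, the first page each element points to. The following sketch works on a small inline METS sample modelled on the earlier slides; the function name `bookmarks`, the use of the TYPE attribute as bookmark label and the plain `from`/`to` attributes are assumptions, and the actual PDF writing step is left out.

```python
import xml.etree.ElementTree as ET

NS = {"METS": "http://www.loc.gov/METS/"}
DIV = "{http://www.loc.gov/METS/}div"

SAMPLE = """<METS:mets xmlns:METS="http://www.loc.gov/METS/">
  <METS:structMap TYPE="LOGICAL">
    <METS:div TYPE="Monograph" ID="log0001">
      <METS:div TYPE="TitlePage" ID="log0002"/>
      <METS:div TYPE="Dedication" ID="log0003"/>
    </METS:div>
  </METS:structMap>
  <METS:structMap TYPE="PHYSICAL">
    <METS:div TYPE="BoundBook" ID="phys0001">
      <METS:div TYPE="page" ID="phys0002"/>
      <METS:div TYPE="page" ID="phys0003"/>
    </METS:div>
  </METS:structMap>
  <METS:structLink>
    <METS:smLink from="log0001" to="phys0001"/>
    <METS:smLink from="log0002" to="phys0002"/>
    <METS:smLink from="log0003" to="phys0003"/>
  </METS:structLink>
</METS:mets>"""


def bookmarks(mets_root):
    """Return (depth, label, page_number) tuples in document order."""
    phys = mets_root.find("METS:structMap[@TYPE='PHYSICAL']", NS)
    pages = [d for d in phys.iter(DIV) if d.get("TYPE") == "page"]
    number = {d.get("ID"): n for n, d in enumerate(pages, start=1)}

    # A container such as the bound book resolves to its first page.
    target = {}
    for div in phys.iter(DIV):
        first = next((d for d in div.iter(DIV) if d.get("TYPE") == "page"), None)
        if first is not None:
            target[div.get("ID")] = number[first.get("ID")]

    # Keep the lowest page number linked from each logical element.
    first_page = {}
    for link in mets_root.iterfind(".//METS:smLink", NS):
        page = target.get(link.get("to"))
        if page is not None:
            source = link.get("from")
            first_page[source] = min(page, first_page.get(source, page))

    outline = []

    def walk(div, depth):
        outline.append((depth, div.get("TYPE"), first_page.get(div.get("ID"))))
        for child in div.findall("METS:div", NS):
            walk(child, depth + 1)

    for top in mets_root.findall("METS:structMap[@TYPE='LOGICAL']/METS:div", NS):
        walk(top, 0)
    return outline


outline = bookmarks(ET.fromstring(SAMPLE))
for depth, label, page in outline:
    print("  " * depth + f"{label} -> page {page}")
```

The depth value maps directly onto the nesting level of a PDF bookmark, so the same list could feed any PDF library that accepts an outline.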

Future document model Logical structure Monograph, chapters, articles etc. Physical structure Pages, columns. Descriptive Metadata Technical Metadata for images: NISO / MIX Fulltext Derivatives of content files (images)

Future document model Metadata production line (using METS) docWorks AGORA Editor METS Converter AGORA DMS Archive

Further information GDZ http://gdz.sub.uni-goettingen.de DigiZeitschriften (example) http://www.digizeitschriften.de AGORA http://www.agora.de
