Create and Manage METS in retrodigitization Markus Enders
Create and Manage METS in retrodigitization Markus Enders Goettingen State and University Library www.sub.uni-goettingen.de/GDZ
Digitization Center: located at the State and University Library Göttingen; founded in 1997; funded by the DFG; tasks: build infrastructure and set up a production line for digitization.
Digitization Center, production line: 3 b/w and greyscale book scanners; 2 color digitization workstations; quality control; image enhancement; production line for all in-house digitization projects; approx. 1,000,000 pages per year.
Digitization Center, infrastructure: software to create content; software to manage content; software to present content on the web; hardware to store content.
Digitization Center, infrastructure: software to create content; DMS, comprising software to manage content, software to present content on the web, and hardware to store and manage content.
Document model: logical structure (monograph, chapters, articles, etc.); physical structure (only pages; no metadata for pages).
Document model, logical structure (monograph, chapters, articles, etc.):
<METS:structMap TYPE="LOGICAL">
  <METS:div TYPE="Monograph" ID="log0001" DMDID="dmdlog0001">
    <METS:div TYPE="TitlePage" ID="log0002"/>
    <METS:div TYPE="Dedication" ID="log0003"/>
    <METS:div TYPE="CurriculumVitae" ID="log0005"/>
  </METS:div>
</METS:structMap>
Document model, physical structure (only pages; no metadata for pages):
<METS:structMap TYPE="PHYSICAL">
  <METS:div TYPE="BoundBook" ID="phys0001">
    <METS:div TYPE="page" ID="phys0002" DMDID="dmdphys0001">
      <METS:fptr FILEID="bitonal0001"/>
    </METS:div>
    ...
  </METS:div>
</METS:structMap>
Document model, linking logical and physical structure:
<METS:structLink>
  <!-- Monograph -->
  <METS:smLink from="log0001" to="phys0001"/>
  <!-- Title page -->
  <METS:smLink from="log0002" to="phys0002"/>
  ...
</METS:structLink>
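The structLink section can be resolved into a lookup table from logical to physical elements. The following is a minimal sketch in Python, not the GDZ tooling (which the slides describe as Java based). It assumes the METS namespace URI http://www.loc.gov/METS/ and follows the slides in using unqualified from and to attributes, whereas the published METS schema qualifies them as xlink:from and xlink:to. The name resolve_links is hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"  # assumed namespace URI

# Well-formed version of the structLink fragment shown on the slide.
SAMPLE = """<METS:mets xmlns:METS="http://www.loc.gov/METS/">
  <METS:structLink>
    <METS:smLink from="log0001" to="phys0001"/>
    <METS:smLink from="log0002" to="phys0002"/>
  </METS:structLink>
</METS:mets>"""


def resolve_links(mets_xml):
    """Map each logical div ID to the list of physical div IDs it links to."""
    root = ET.fromstring(mets_xml)
    links = {}
    for sm_link in root.iter("{%s}smLink" % METS_NS):
        # Attribute names follow the slides (unqualified from/to).
        links.setdefault(sm_link.get("from"), []).append(sm_link.get("to"))
    return links


links = resolve_links(SAMPLE)
print(links)
```

A logical element that spans several pages would simply carry several smLink entries, which is why the values are lists.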
Document model, descriptive metadata: MODS extension (own namespace).
Document model, fulltext: fulltext with coordinates for words; separate TEI/XML file, linked to METS.
Document model, fulltext: problem with TEI: tagging the physical structure in TEI (TEI only supports page and column breaks).
Document model, fulltext: solution: tag the smallest physical structure in the fulltext: text blocks (q element).
Document model, fulltext: fulltext with coordinates for words; one image per page.
Production (metadata): Excel spreadsheet with bibliographic information, structure information with metadata, and pagination information.
Excel spreadsheet – bibliographic information on Monograph level
Excel spreadsheet – pagination information. Columns A and C: start and end of counted pages (logical page numbers). Columns D and E: start and end of uncounted pages. Columns M and N: calculated physical page numbers.
Excel spreadsheet – structural information. Column B: type of structure element. Columns C and D: start location of the structure element (sequence and page). Columns H and I: author and title of the structure element.
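The slides do not give the formula behind the calculated physical page numbers. One plausible reading, sketched here in Python under that assumption, is that counted and uncounted page ranges are laid end to end and numbered consecutively from 1. The function name, the row layout and the sample figures are hypothetical and are not taken from the spreadsheet.

```python
def physical_ranges(rows):
    """Assign consecutive physical page numbers to a sequence of page ranges.

    rows: list of (kind, start, end), where kind is "counted" or "uncounted"
    and start/end are the inclusive numbers entered for that range.
    Returns a list of (kind, first_physical, last_physical).
    """
    result = []
    next_physical = 1
    for kind, start, end in rows:
        length = end - start + 1  # inclusive range
        result.append((kind, next_physical, next_physical + length - 1))
        next_physical += length
    return result


# Made-up example: 4 uncounted front pages, 120 counted pages, 2 uncounted end pages.
ranges = physical_ranges([
    ("uncounted", 1, 4),
    ("counted", 1, 120),
    ("uncounted", 1, 2),
])
print(ranges)
```

With these figures the counted page 1 falls on physical page 5, which is the offset a viewer needs to jump from a printed page number to the right image.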
Excel spreadsheet: conversion of content to an XML file using a Visual Basic script; RDF/XML-based file.
Excel spreadsheet: conversion of content to an XML file using a Visual Basic script (RDF/XML-based file); conversion of content to METS using Java (POI library); METS file still in beta test.
AGORA Editor: commercial program; structural and bibliographic metadata; images are displayed during capture; pagination information is captured "automatically".
AGORA Editor
AGORA Editor: writes an RDF/XML-based file, which is converted to METS using a Java program.
Production (metadata & fulltext): docWorks, software by CCS; structure data, metadata and fulltext; direct METS output (no conversion necessary); testing started in June.
Production, METS: only docWorks has direct METS output. For the other solutions, a Java program will convert the output to METS (Excel to METS, RDF/XML to METS). The converter can also be used to migrate old data to METS.
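As an illustration of what such a converter has to produce, the sketch below turns a list of structure rows into a logical METS structMap. It is written in Python for brevity and is not the Java program mentioned above. The row keys (type, title), the ID scheme logNNNN, the use of the LABEL attribute for titles, the sample title, and the rule that the first row becomes the parent of all following rows are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"  # assumed namespace URI
ET.register_namespace("METS", METS_NS)


def build_logical_structmap(rows):
    """Build a METS:structMap TYPE="LOGICAL" from flat structure rows.

    Assumption: the first row is the top-level element (e.g. the monograph)
    and every later row is a direct child of it.
    """
    struct_map = ET.Element("{%s}structMap" % METS_NS, {"TYPE": "LOGICAL"})
    root_div = None
    for number, row in enumerate(rows, start=1):
        parent = struct_map if root_div is None else root_div
        div = ET.SubElement(
            parent,
            "{%s}div" % METS_NS,
            {"TYPE": row["type"], "ID": "log%04d" % number},
        )
        if row.get("title"):
            div.set("LABEL", row["title"])
        if root_div is None:
            root_div = div
    return struct_map


# Placeholder rows; real input would come from the spreadsheet or RDF/XML file.
rows = [
    {"type": "Monograph", "title": "Example title"},
    {"type": "TitlePage"},
    {"type": "Dedication"},
]
struct_map = build_logical_structmap(rows)
xml_text = ET.tostring(struct_map, encoding="unicode")
print(xml_text)
```

A real converter would also need to handle nested chapters and write the descriptive metadata sections that the DMDID attributes point to.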
Management and presentation: document management system; one platform for all digitization projects; development began in 1998; definition of an own RDF/XML-based format; cooperation with an external company, „Satz-Rechen-Zentrum“, Berlin.
Document Management System “AGORA”: Java-based server; Windows administration client; Java-based system that uses a relational database; Verity search engine for metadata and fulltext.
Document Management System “AGORA”, data storage: metadata, structure data and fulltext in a relational database; images stored in the file system.
Document Management System “AGORA”, import: RDF/XML files (metadata; structure); image data from the file system; TEI/XML for fulltext (stored in the database); METS support in the August release; batch import possible (hotfolder).
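The slides do not describe how the AGORA hotfolder works internally. As a generic illustration of the pattern, the Python sketch below processes every XML file found in a watched directory and then moves it to a second directory so that it is not imported twice. The function name, the directory layout and the handler callback are hypothetical.

```python
import shutil
from pathlib import Path


def import_pending(hotfolder, done_folder, handler):
    """Run handler on each *.xml file in hotfolder, then move it to done_folder.

    Returns the names of the imported files in processing order.
    """
    hot = Path(hotfolder)
    done = Path(done_folder)
    done.mkdir(parents=True, exist_ok=True)
    imported = []
    for path in sorted(hot.glob("*.xml")):
        handler(path)  # e.g. parse the file and load it into the DMS
        shutil.move(str(path), str(done / path.name))
        imported.append(path.name)
    return imported
```

A scheduler or a simple loop with a sleep interval could call import_pending repeatedly; files that are not XML stay where they are.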
Document Management System “AGORA”, access via web front end: HTML templates (webmacro); XML output possible (via webmacro); caching of HTML pages for high performance.
Document Management System “AGORA”, access via web front end: HTML templates (webmacro, www.webmacro.org); XML output possible (via webmacro); caching of HTML pages for high performance.
DMS “AGORA” page view: zoom with on-the-fly conversion of images
DMS “AGORA” Hitlist:
DMS “AGORA” Hitlist: Image highlighting possible (fulltext search)
Document Management System “AGORA”, access via Java API: full functionality available (add, update, read and delete elements; retrieval); OAI-PMH implementation based on the API.
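From the harvester's side, an OAI-PMH interface is addressed with plain HTTP requests whose query string carries a verb and its arguments. The Python sketch below only assembles such a request URL. The base URL and the record identifier are placeholders, not the GDZ endpoint, and oai_dc is used because it is the metadata format every OAI-PMH repository must support.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a verb and its arguments."""
    query = urlencode({"verb": verb, **arguments})
    return base_url + "?" + query


# Placeholder endpoint and identifier, for illustration only.
url = oai_request_url(
    "http://example.org/oai",
    "GetRecord",
    identifier="oai:example.org:log0001",
    metadataPrefix="oai_dc",
)
print(url)
```

Other verbs such as Identify or ListRecords are built the same way with their own arguments.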
Document Management System “AGORA” Export: XML export (with images)
Document Management System “AGORA” PDF-Export – logical structure as bookmarks:
Future document model: logical structure (monograph, chapters, articles, etc.); physical structure (pages, columns); descriptive metadata; technical metadata for images (NISO / MIX); fulltext; derivatives of content files (images).
Future document model, metadata production line (using METS). Diagram components: docWorks; AGORA Editor; METS; converter; AGORA DMS; archive.
Further information: GDZ, http://gdz.sub.uni-goettingen.de; DigiZeitschriften (example), http://www.digizeitschriften.de; AGORA, http://www.agora.de