ACERA 2011 April 6, 2011 ERA Technology and Development
ACERA 2011 April 6, 2011 ERA Technology and Development Strategy (System Architecture Development) Meg Phillips and Quyen Nguyen ACERA - April 2011
Agenda
- System Requirements
- Design Approach
- System Architecture
- Related Work
- Conclusion
System Requirements
- Extensibility: record types, data types, and services can be added without extensive redesign.
- Evolvability: new technologies can be inserted using standard APIs and interfaces.
- Availability: key functions must be highly available.
- Scalability: adapt to growth in record volume and in the user community.
- Security: protection of the system and its assets.
- User friendliness: browser interface, intuitive, Section 508 compliance.
Design Approach
- Develop ERA Reference Architecture
  - Correct deficiencies in I1
  - Architecture tool to guide current and future design and development, starting with I3
- Goal is to build a robust platform
  - To develop, add, and enhance services and applications
  - Adaptive to changes, especially in business rules
  - Foundation for the Preservation and Access frameworks, whose components evolve at different paces: fast for Access (driven by the Internet, Web 2.0, and social media), slow for Preservation
- Standard interfaces are key
  - Open standards from the Presentation layer to the Backend layers
  - Domain standards such as OAIS (Open Archival Information System) and PREMIS (PREservation Metadata: Implementation Strategies)
- Data-minded and security-minded
Design Approach: Reference Architecture
- Facilitate system evolution to new technologies such as cloud computing, Web 2.0, social media, and future technologies
  - Long-term survivability of the system
  - Take advantage of new technologies: potential reduction of lifecycle cost
  - Follow federal mandates and better serve the public (e.g. Open Government)
- Reference Architecture helps us leverage community support
  - Very important because some of the territory is uncharted
  - Take advantage of community expertise
  - Reduce development cost
- Well-defined system interfaces
- Well-defined data and metadata model
- Publish the Reference Architecture
System Architecture: Three Pillars [diagram: Evolvable System Architecture]
System Architecture: SOA Paradigm [diagram: Evolvable System Architecture]
OAIS Reference Model [diagram of the OAIS functional entities: Producer, Ingest (SIP), Data Management (description, queries, results), Archival Storage (AIP), Access (DIP), Consumer, Preservation Planning, Administration, Management]
Designing Services [mesoa 2009]
- Service-Oriented Architecture (SOA) paradigm:
  - Services
  - Enterprise Service Bus (ESB)
- Starting from the OAIS model, design Business Services:
  - Ingest
  - Preservation
  - Access
- Design lower-level services to support those Business Services
  - Tool Services: Virus Scan, File Format Identification, etc.
  - Common Services: Logging, Authorization, etc.
- Composition of low-level services into business services is made possible by the ESB with standards-based middleware
- Flexibility and extensibility: add and replace services
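The composition idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ServiceBus` is a toy in-process registry standing in for a real ESB, and the three service functions and their message fields are hypothetical, not ERA interfaces.

```python
from typing import Callable, Dict, List

# A service takes a message (a dict) and returns an enriched message.
Service = Callable[[dict], dict]


class ServiceBus:
    """Toy in-process stand-in for an ESB: a registry of named services."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, name: str, service: Service) -> None:
        # Registering under an existing name replaces the old service.
        self._services[name] = service

    def call(self, name: str, message: dict) -> dict:
        return self._services[name](message)

    def compose(self, name: str, steps: List[str]) -> None:
        """Register a business service that calls lower-level services in order."""
        def business_service(message: dict) -> dict:
            for step in steps:
                # Steps are looked up by name at call time, so a replaced
                # tool service is picked up without recomposing.
                message = self.call(step, message)
            return message
        self.register(name, business_service)


def virus_scan(msg: dict) -> dict:
    # Hypothetical tool service: performs no real scan, only marks the message.
    return {**msg, "virus_free": True}


def identify_format(msg: dict) -> dict:
    # Hypothetical tool service: guesses the format from the file extension.
    return {**msg, "format": msg["filename"].rsplit(".", 1)[-1].lower()}


def log_event(msg: dict) -> dict:
    # Hypothetical common service: appends to an audit trail in the message.
    return {**msg, "log": msg.get("log", []) + ["ingested " + msg["filename"]]}


bus = ServiceBus()
bus.register("virus_scan", virus_scan)
bus.register("identify_format", identify_format)
bus.register("logging", log_event)
bus.compose("ingest", ["virus_scan", "identify_format", "logging"])

result = bus.call("ingest", {"filename": "McClellan.tiff"})
```

Because the business service refers to its steps by name, registering a different function under `"identify_format"` changes the behavior of `"ingest"` without touching the composition, which is the "add and replace services" property the slide highlights.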
Ingest Process
- Evolvable Architecture allows integration with various file identification tools: DROID, JHOVE, JAI, pCOS, etc.
- Web services are built from tools (COTS, FOSS)
- Old tools can be replaced by new tools, and new tools can be added
- This capability allows the system to leverage open software developed by the digital library and archiving community
Preservation: Transformation Process
- Evolvable Architecture allows integration with various transformation tools, depending on the file types
- Tools are web services
- For the same file type, a new transformation tool with better conversion can be added
- For a new file type, a new transformation tool can also be added and used
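One way to picture tool selection by file type is a small registry keyed on the file extension. The sketch below is an assumption-laden illustration: `TransformationRegistry`, the tool functions, and the target formats are hypothetical, and a real deployment would call web services rather than local functions.

```python
from typing import Callable, Dict

# A transformation tool maps a source filename to the target filename.
Tool = Callable[[str], str]


class TransformationRegistry:
    """Hypothetical registry that picks a transformation tool by file type."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, file_type: str, tool: Tool) -> None:
        # Registering a tool for an existing type replaces the previous one.
        self._tools[file_type.lower()] = tool

    def transform(self, filename: str) -> str:
        file_type = filename.rsplit(".", 1)[-1].lower()
        if file_type not in self._tools:
            raise LookupError(f"no transformation tool registered for .{file_type}")
        return self._tools[file_type](filename)


def tiff_to_jpeg(filename: str) -> str:
    # Placeholder: only renames; a real tool would convert the image.
    return filename.rsplit(".", 1)[0] + ".jpg"


def tiff_to_png(filename: str) -> str:
    # Placeholder for a newer tool that supersedes tiff_to_jpeg.
    return filename.rsplit(".", 1)[0] + ".png"


registry = TransformationRegistry()
registry.register("tiff", tiff_to_jpeg)
first = registry.transform("McClellan.tiff")

# A better tool for the same file type is swapped in without other changes.
registry.register("tiff", tiff_to_png)
second = registry.transform("McClellan.tiff")
```

Supporting a new file type is then a single `register` call, and an unregistered type fails loudly with `LookupError` instead of being silently skipped.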
Preservation: Future Choice of Strategy
- Evolvable Architecture allows adding a new branch for a new preservation strategy
System Architecture: Metadata Model [diagram: Evolvable System Architecture]
ACE: Motivation
- ERA needs the capability to create and manage different versions of an electronic record, and to relate them to a single logical entity, for purposes such as preservation and redaction.
[figure: an image of Gen. George B. McClellan stored as McClellan.tiff (TIFF, Digital Master Version) and McClellan.jpg (JPEG, Online Access Version); a memorandum in MS Word .doc (Original Version) passed through an ERA Transformation Tool to produce a Preservation Version]
PREMIS-based ACE [diagram]
ACE Structure
- Multiplicity of Representations and Objects
- Usage
  - Preservation transformation
  - Redaction
- Relationships
  - With Business Objects
  - Between Representations
  - Between Objects: multiple pages of a digitized record
- Extensible implementation which could be used in future for:
  - Archival Description
  - Technical Metadata of Digitized Materials
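The structure described here, one logical entity with several representations that each hold one or more objects, could be modeled roughly as follows. All class and field names (`AceRecord`, `Representation`, `FileObject`, the usage labels, the identifiers) are hypothetical and chosen only for illustration; the actual ACE schema is not given in these slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileObject:
    """One stored file, e.g. a single page image of a digitized record."""
    object_id: str
    filename: str


@dataclass
class Representation:
    """One version of the record, made up of one or more file objects."""
    representation_id: str
    usage: str  # hypothetical labels such as "digital master" or "online access"
    objects: List[FileObject] = field(default_factory=list)
    derived_from: Optional[str] = None  # relationship between representations


@dataclass
class AceRecord:
    """The single logical entity that ties all representations together."""
    record_id: str
    business_object_id: str  # relationship with a business object
    representations: List[Representation] = field(default_factory=list)

    def by_usage(self, usage: str) -> List[Representation]:
        return [r for r in self.representations if r.usage == usage]


master = Representation("rep-1", "digital master",
                        [FileObject("obj-1", "McClellan.tiff")])
access = Representation("rep-2", "online access",
                        [FileObject("obj-2", "McClellan.jpg")],
                        derived_from="rep-1")
record = AceRecord("ace-1", "bo-42", [master, access])
```

A redacted or preservation version would be one more `Representation` with its own `usage` label and a `derived_from` link, so the logical record never has to change shape.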
Archival Asset Package [nist 2010]
- Adherence to the Archival Information Package (AIP) in OAIS
- Self-contained digital object
- Data model used to import and export between services and systems
System Architecture: Content Server [diagram: Evolvable System Architecture]
Content Server within OAIS Model [diagram: the OAIS functional entities (Producer, Ingest, Data Management, Archival Storage, Access, Consumer, Preservation Planning, Administration, Management) with the Content Server placed within the model]
Content Server [syscon 2010]
- A Content Server is a logical construct to store and manage both data and metadata encapsulated in an Archival Asset Package (AAP):
  - Insert, Retrieve, Update, Delete, and Search
- Exposes a simple interface
- Hides the specific implementation of the underlying storage management system
  - Allows the system to use various technologies
  - System can evolve to new technologies
- Allocation policy can be based on business needs and requirements
  - Different data collections: Federal, Presidential, Legislative, Census
  - Security and access control considerations
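The "simple interface that hides the storage implementation" can be illustrated with an abstract base class and one interchangeable backend. This is a minimal sketch, not the ERA interface: the `AAP` fields, the method signatures, and the in-memory backend are all assumptions made for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AAP:
    """Hypothetical, simplified Archival Asset Package: metadata plus data."""
    aap_id: str
    metadata: Dict[str, str]
    data: bytes


class ContentServer(ABC):
    """The five operations named on the slide, independent of any storage."""

    @abstractmethod
    def insert(self, aap: AAP) -> None: ...

    @abstractmethod
    def retrieve(self, aap_id: str) -> AAP: ...

    @abstractmethod
    def update(self, aap: AAP) -> None: ...

    @abstractmethod
    def delete(self, aap_id: str) -> None: ...

    @abstractmethod
    def search(self, **criteria: str) -> List[str]: ...


class InMemoryContentServer(ContentServer):
    """Dict-backed backend; a file system or database could replace it."""

    def __init__(self) -> None:
        self._store: Dict[str, AAP] = {}

    def insert(self, aap: AAP) -> None:
        if aap.aap_id in self._store:
            raise KeyError(f"{aap.aap_id} already exists")
        self._store[aap.aap_id] = aap

    def retrieve(self, aap_id: str) -> AAP:
        return self._store[aap_id]

    def update(self, aap: AAP) -> None:
        if aap.aap_id not in self._store:
            raise KeyError(f"{aap.aap_id} not found")
        self._store[aap.aap_id] = aap

    def delete(self, aap_id: str) -> None:
        del self._store[aap_id]

    def search(self, **criteria: str) -> List[str]:
        # Return the IDs of packages whose metadata matches every criterion.
        return sorted(
            aap_id for aap_id, aap in self._store.items()
            if all(aap.metadata.get(k) == v for k, v in criteria.items())
        )


server: ContentServer = InMemoryContentServer()
server.insert(AAP("aap-1", {"collection": "Census"}, b"record bytes"))
server.insert(AAP("aap-2", {"collection": "Federal"}, b"other bytes"))
census_ids = server.search(collection="Census")
```

Callers depend only on `ContentServer`, so moving a collection to a different storage technology means adding another subclass rather than changing client code.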
Related Work
- Survey of system architectures designed and developed for digital preservation and archives
- Validation of our approach
  - Evolvability, extensibility, and pluggability of services achieved by SOA:
    - Planets project funded by the European Union
    - National Library of Australia
    - Portuguese National Archives RODA (Repository of Authentic Digital Objects), etc.
  - Content Server is similar to the Content Manager used in the system of the Royal Dutch Library, which is based on the IBM Digital Information Archiving System (DIAS)
Conclusion: Summary
- ERA Reference Architecture is evolvable and extensible thanks to the synergy of the three pillars:
  - SOA Paradigm
  - Metadata Model
  - Content Server Concept
- Based on open standards: OAIS, PREMIS, XML, Web Services
- Implemented in the I3 release
- Benefits seen in Option Year 5
  - Ease of modifying and branching existing workflows
  - Reuse of underlying services
  - Easier development of the Preservation Transformation framework
  - Creation of Transformation Strategy and Job Definition based on XForms and workflow middleware
- Positioned to take advantage of software tools and components developed by the digital preservation community
Conclusion: Future Direction
- NARA Internal Community
  - Externalize architectural components such as ESB and Web Services to promote reuse
  - Publish well-defined system interfaces
  - Publish a well-defined data and metadata model
- Federal Agencies
  - Share experience with, and learn from, other agencies such as LOC, GPO, and NASA
- Larger Community
  - Collaboration with other archives and digital libraries
  - Collaboration with the research community on Ingest, Preservation, and Access functionalities
  - Identify areas where Free Open Source Software could be used
Publications
- [syscon 2010] Quyen L. Nguyen, Alla Lake and Mark Huber. “Evolvable and Scalable System of Content Servers for a Large Digital Preservation Archives”. Proceedings of 4th Annual IEEE Systems Conference, April 5-8, 2010, San Diego.
- [nist 2010] Quyen L. Nguyen and Dyung Le. “Archival Asset Package Design Concept for an OAIS System”. Proceedings of US Workshop Roadmap development for Digital Preservation Interoperability Framework (DPIF). NIST, Gaithersburg, Maryland, March 29-31, 2010.
- [mesoa 2009] Quyen L. Nguyen. “Towards a Design Approach for an Effective System Evolution of a Large Electronic Archive Information System”. Proceedings of 3rd International Workshop on a Research Agenda for Maintenance and Evolution of Service-Oriented Systems, September 20-26, 2009, Edmonton.
- [balisage 2009] Quyen L. Nguyen and Betty Harvey. “Agile Business Objects Management Application for Electronic Records Archive Transfer Process”. Proceedings of Balisage, the Markup Conference 2009, August 11-14, 2009, Montreal.
References
1. The Consultative Committee for Space Data Systems. “Reference Model for an Open Archival Information System (OAIS)”, 2002.
2. Preservation Metadata: Implementation Strategies (PREMIS). http://www.loc.gov/standards/premis/.
3. Robert Kahn and Robert Wilensky. “A Framework for Distributed Digital Objects”. International Journal on Digital Libraries (2006) 6(2): 115–123.
4. Adam Farquhar and Helen Hockx-Yu. “Planets: Integrated Services for Digital Preservation”. The International Journal of Digital Curation, Issue 2, Volume 2, 2007.
5. IBM DIAS for The Royal Dutch Library. http://www-935.ibm.com/services/nl/dias/ref/references.html.
6. National Library of Australia. http://www.nla.gov.au/dsp/documents/itag.pdf
7. Jose Carlos Ramalho et al. “RODA and Crib – a Service-Oriented Digital Repository”. http://repositorium.sdum.uminho.pt/bitstream/1822/8226/1/RodaAndCrib.pdf.
Thank You!
Meg Phillips
Quyen Nguyen
Backup Slides
Evolution Relativity [balisage 2009]
- Upper timeline shows the evolution of the system itself.
- Lower timeline shows the evolution of the external systems that created the data to be archived.
- Note the lag of several years between the two timelines.
- Challenge: the system must keep evolving to use the current technologies of epoch Ta in order to provide long-term access to data born out of the technologies of time Tc.
High-level Architecture Roadmap [diagram; legend: In Place, In Progress, Future]
Planets Interoperability Framework
- Planets’ core components:
  - Service bus and workflow
  - Security, monitoring, transaction manager, etc.
- Evolvability and extensibility: allows plugging in third-party services
The Royal Dutch Library
- Based on IBM’s Digital Information Archiving System (DIAS)
- Core component is Content Manager, which stores and manages both data and metadata
  - Library Server
    - Cataloging and indexing of metadata
    - Facilitates search and retrieval
    - Security control for access
  - Object Server
    - Stores the actual digital objects
Physical Implementation of AAP
- ZIP and URL options for encapsulating files.
N-Part Identifier
- Uniqueness
  - Within ERA
  - Can be integrated with current and future standard protocols such as Handle, DOI, PURL, etc.
- Allows access to different levels of the ACE structure
- Identifiers can be assigned in a decentralized system
- Example: ID of an Electronic Asset and its Metadata: 1.1-6-200902.1
- The N-part ID can be made globally unique by prefixing it with the ERA namespace. For instance, if “era.nara.gov” is used, then the above ID becomes: http://www.era.nara.gov/1.1-6-200902.1
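The namespace prefixing described on this slide amounts to simple string construction, sketched below. The function names are hypothetical, and the host is passed in explicitly because the slide names the namespace as "era.nara.gov" while its example URL uses the host www.era.nara.gov.

```python
from urllib.parse import urlparse


def to_global_uri(n_part_id: str, host: str) -> str:
    """Prefix an ERA N-part ID with a namespace host to make it globally unique."""
    # Slides often typeset the separators as en dashes; normalize to hyphens.
    normalized = n_part_id.replace("\u2013", "-")
    return f"http://{host}/{normalized}"


def from_global_uri(uri: str) -> str:
    """Recover the N-part ID from a global URI built by to_global_uri."""
    return urlparse(uri).path.lstrip("/")


uri = to_global_uri("1.1-6-200902.1", "www.era.nara.gov")
local_id = from_global_uri(uri)
```

Keeping the local ID recoverable from the URI is what lets the same identifier be handed to other resolution schemes such as Handle, DOI, or PURL.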
Possible Support of HTTP Protocol
- ERA-HTTP example: http://era.gov/1.6-1-200903.123
- HTTP Web Server: receives the HTTP request and passes the ERA-ID to the ERA ID Resolver
- ERA ID Resolver: resolves the 1st and 2nd segments of the ERA-ID to a Content Server ID (example: CS #6)
- Resolver in Content Server: resolves the remainder of the ERA-ID, if necessary, to a Storage ID
- Storage: maps the Storage ID to the physical location of the object
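The two-stage resolution can be sketched as follows. Several things here are assumptions rather than facts from the slides: that the ERA-ID segments are delimited by "." and "-", that the second segment is the Content Server number (consistent with the example, where 1.6-1-200903.123 maps to CS #6), and the storage location string, which is invented for the example.

```python
import re
from typing import Dict
from urllib.parse import urlparse


def content_server_number(era_id: str) -> int:
    """First stage: derive the Content Server number from the leading segments.

    Assumes segments are split on '.' and '-', and that the second segment
    identifies the Content Server.
    """
    segments = re.split(r"[.-]", era_id)
    return int(segments[1])


class ContentServerResolver:
    """Second stage: a Content Server maps the full ERA-ID to a storage location."""

    def __init__(self, locations: Dict[str, str]) -> None:
        self._locations = locations

    def resolve(self, era_id: str) -> str:
        return self._locations[era_id]


def handle_request(url: str, servers: Dict[int, ContentServerResolver]) -> str:
    """What the web server would do: extract the ERA-ID, then resolve in two stages."""
    era_id = urlparse(url).path.lstrip("/")
    return servers[content_server_number(era_id)].resolve(era_id)


# Hypothetical deployment with a single Content Server, CS #6.
servers = {6: ContentServerResolver({"1.6-1-200903.123": "/storage/cs6/obj-123"})}
location = handle_request("http://era.gov/1.6-1-200903.123", servers)
```

Splitting the lookup this way means the top-level resolver only needs to know which Content Server owns an ID; each Content Server keeps its own storage mapping private.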
Federators
[diagram: Global and Local Data Management (Description Management, Business Object Management) and the Access Subsystem sit in front of several Content Servers; each Content Server has Metadata Management, ERA Storage, and a Config. Policy; AAPs and queries flow in, and AAPs, query results, and DIPs flow out through Access Working Storage (AWS)]
- Federator: routes each AIP to the correct Content Server (CS)
- Results Federator: federates queries across Content Servers
- Standard Operations and Interfaces:
  - Put AAP
  - Get AAP
  - Update AAP
  - Delete AAP
  - Search: 1. Metadata 2. Asset
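A rough sketch of the two federator roles, routing a package to the right Content Server and fanning a query out to all of them, is shown below. It assumes an allocation policy keyed on the data collection; the class names, the policy table, and the metadata fields are hypothetical and the Content Server is reduced to a dict-backed stub.

```python
from typing import Dict, List


class ContentServer:
    """Minimal dict-backed stand-in exposing Put, Get, and metadata Search."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._aaps: Dict[str, Dict[str, str]] = {}

    def put(self, aap_id: str, metadata: Dict[str, str]) -> None:
        self._aaps[aap_id] = metadata

    def get(self, aap_id: str) -> Dict[str, str]:
        return self._aaps[aap_id]

    def search(self, key: str, value: str) -> List[str]:
        return [i for i, m in self._aaps.items() if m.get(key) == value]


class Federator:
    """Routes packages by collection and federates queries over all servers."""

    def __init__(self, policy: Dict[str, ContentServer]) -> None:
        self._policy = policy  # hypothetical allocation policy: collection -> server
        # Several collections may share a server; query each server only once.
        self._servers = list({id(s): s for s in policy.values()}.values())

    def put(self, aap_id: str, metadata: Dict[str, str]) -> str:
        server = self._policy[metadata["collection"]]
        server.put(aap_id, metadata)
        return server.name

    def search(self, key: str, value: str) -> List[str]:
        hits: List[str] = []
        for server in self._servers:
            hits.extend(server.search(key, value))
        return sorted(hits)


cs1, cs2 = ContentServer("CS-1"), ContentServer("CS-2")
federator = Federator({"Federal": cs1, "Presidential": cs1, "Census": cs2})

placed = federator.put("aap-1", {"collection": "Census", "year": "1940"})
federator.put("aap-2", {"collection": "Federal", "year": "1940"})
hits = federator.search("year", "1940")
```

Clients talk only to the federator, so adding a Content Server or changing the allocation policy does not change how packages are stored or queried from the outside.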
Potential Access of Records in ERA
- If the requested asset is in OPA’s local storage, the asset is sent directly to the requestor.
- If the requested asset with the given URI is in ERA, the request is pooled and forwarded to the ERA system, which pushes the asset.
Potential Reuse of Services
- Cross-use of services facilitated by ESB