Union catalogues are a specific type of library catalogue that aggregate and present information from multiple libraries in a single, unified interface. Union catalogues were originally listings of books or serials collected by a set of libraries in a given geographic area or subject area. As the information landscape continues to evolve and more exclusively digital content is produced, a next generation of union catalogues is being developed to tie together content from disparate digital repositories. This paper provides an overview of Open Access, digital repositories, and interoperability, and suggests a framework for considering the future of union catalogues and libraries in general.
union catalogues, Open Access, interoperability, future of libraries, digital repositories
From the construction of the Library of Alexandria in the 3rd century BC until recent years, libraries enjoyed a fairly static environment. Supporting the information lifecycle – the range of activities surrounding how information is created, disseminated, collected, organized, catalogued, described, and preserved – has always been at the core of librarians’ work. Even so, traditional librarianship focused mainly on printed materials such as books and journals and on selecting, organizing, cataloguing, and preserving those objects. But the Internet and developments in technology are completely changing the way we think about information: how we access it; how we use, reuse, manipulate, and work with information and data; who has access to information; where and when it is accessible; and how we can ensure access to digital information in the future.
As we begin grappling with these questions, it is clear that traditional library roles tied to books and buildings are not sufficient in the digital world – a messy, disorganized, uncontrolled space where anyone with Internet access can publish information without intervention by a publisher or the peer-review process. Librarians’ skills – specifically, our ability to work with large masses of information, organize it, and present it in meaningful ways – are more important than ever. But we need to shift quickly how we think about our work and start tackling these new challenges en masse.
Union catalogues were an early and highly successful way for libraries to use new technologies to provide value-added services for both users and librarians. Traditional union catalogues are library catalogues that contain information about holdings from different places, all presented through a single interface. For users, union catalogues facilitate access to information by allowing them to search the holdings of multiple libraries at once; browse keywords or subject headings across larger, aggregated masses of holdings; and, at many libraries, see which library holds a particular item as a first step in submitting an interlibrary loan request. Within a single institution, union catalogues connect holdings from multiple libraries or campuses. Specialty libraries such as law libraries, as well as local consortia, often establish union catalogues as a service to local patrons. National union catalogues present an entire country’s holdings on one website.
Union catalogues are the result of a shared set of values common among libraries: interoperability among systems, interoperability of data through MARC records, and cooperation among participating libraries. They also require a shared goal of facilitating access to information for our users and doing what we can to create a seamless environment in which users can access information regardless of its physical location.
With the transition to a digital environment, the need to aggregate and present information from a variety of places within a single interface still exists, although users’ expectations of such services are more complex. Traditional union catalogues were the first step in getting access to a physical item; with “next generation” union catalogues, users can access the items themselves. The information ecosystem has changed – and it is going to continue changing at an exponential rate. Libraries’ first step in working in this new environment has been to create a network of distributed repositories and associated services designed to collect, archive, disseminate, curate, describe, and preserve digital materials. These Open Access repositories, along with the tools to aggregate their content, are the next step in the evolution of union catalogues, and an important step in the development of libraries as we move into the digital landscape.
“Open Access” refers to the practice of granting free access to scholarly research via the Internet. One of the most common definitions notes that “Open-access (OA) literature is digital, online, free of charge, and free of most copyright and licensing restrictions. What makes it possible is the internet and the consent of the author or copyright-holder.”1
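Open Access can occur via two methods: publication in Open Access journals (commonly called “gold” Open Access) and self-archiving of articles in institutional or subject repositories (commonly called “green” Open Access).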
The peer-review process is critical for either method.
Furthermore, Open Access includes two levels of “openness”: gratis and libre. Gratis Open Access provides access to content at no cost other than the costs associated with accessing the Internet. The copyright holder retains all permissions except for the right to self-archive (i.e., deposit into a personal or institutional repository). Libre Open Access, on the other hand, includes a much wider range of rights and permissions for how others can use the content. In addition to granting permission to access an object, libre Open Access allows others to remix or edit content, depending on the exact details of the license associated with a particular item.2 Libre Open Access is often indicated with specific types of Creative Commons licenses such as the “Attribution” or “CC-BY” license, in which the copyright owner allows others to “distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation. This is the most accommodating of licenses offered.”3
The motivations for supporting Open Access are diverse but all have increased access at their core. Some common themes:
The Open Access movement took shape in the early 2000s with the Budapest Open Access Initiative (2001), a meeting that led to the adoption of the phrase ‘Open Access,’ its formal definition, and one of the original declarations in support of Open Access, now an open document that the public can sign. In 2003, two further formative meetings, held in Berlin, Germany, and in Bethesda, Maryland, in the United States, led to further public statements in support of Open Access. Currently, institutions around the world maintain nearly 2,000 repositories4, and publishers and other organizations produce over 6,500 Open Access journals (including over 570,000 articles)5. Open Access has been most widely adopted within Europe and North America, although activity is increasing in Latin America, Asia, and Africa. For developing countries, Open Access is appealing in two ways: it increases access to current research that was previously inaccessible, and it disseminates locally produced research on a global scale.
Even in the ten years since the launch of Open Access, the information ecosystem has changed. While the initial focus was on peer-reviewed scholarly articles (postprints), digital repositories can handle a wide variety of content types, and the boundaries between Open Access and other forms of scholarly material held in digital repositories are blurring. Many repository managers are currently examining the feasibility of incorporating enhanced publications (published research plus the data sets, documents, models, images, etc. associated with it) into repositories. Many repositories already hold a wide variety of content: grey literature, including conference proceedings and preprints; student scholarship such as electronic theses and dissertations (ETDs); and multimedia files such as audio, video, and digital images.
Each institution needs to be able to collect, curate, disseminate, and preserve the intellectual capital created at that institution, and so each institutional repository has value on its own. But much in the way that traditional union catalogues provided significant value to users by presenting holdings from multiple libraries in a single interface, the real value of Open Access and digital repositories lies in the potential to aggregate research outputs, present information in different ways, and allow for new types of data extraction, data mining, visualizations, and other forms of analysis.
In this regard, some of the earliest and largest repository harvesting projects clearly identify with the legacy of union catalogues: they function in much the same way as traditional union catalogues, but they tie together mainly full-text content from digital repositories, thus providing direct access to digital objects. These Open Access repository harvesting projects are truly a next generation of union catalogues, building on the same principles but going a step further.
One example, OAIster, is described as “a union catalog of millions of records representing open access resources that was built from open access collections worldwide using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Today, OAIster includes more than 25 million records representing digital resources from more than 1,100 contributors.”6 The OAIster project was developed in the early 2000s at the University of Michigan; in 2009, it was transitioned to OCLC, which still maintains the service. Another significant repository harvesting project, the Networked Digital Library of Theses and Dissertations (NDLTD), also describes its harvesting service as a union catalog: “The NDLTD Union Catalog contains more than one million records of electronic theses and dissertations. For students and researchers, the Union Catalog makes individual collections of NDLTD member institutions and consortia appear as one seamless digital library of ETDs.”7
OAIster, the NDLTD Union Catalog, and all other tools and services that tie together content from multiple repositories are possible because of interoperability, or the ability of systems to pass information back and forth between them in a usable format. Interoperability is the mechanism by which repositories and other systems, including traditional library catalogues, are able to work together.
To facilitate repository interoperability, most Open Access repositories use Dublin Core, an international standard, as the basis for structuring their metadata. Dublin Core is the most generic, most flexible, and least granular of the commonly used metadata standards; because of this generality, it is used for a variety of purposes, including not only repositories but also metadata tags coded into the HTML of many web pages.
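The HTML usage mentioned above can be illustrated with a short Python sketch. It renders a few Dublin Core elements as `<meta>` tags using the `DC.element` naming convention, with a `schema.DC` link pointing at the element namespace, as described by DCMI for embedding Dublin Core in HTML. The sample values come from this article's own citation, and the helper name `dc_meta_tags` is hypothetical.

```python
from html import escape

# Sample values taken from this article's citation; any record would do.
record = {
    "title": "Libraries in transition",
    "creator": "Clobridge, Abby",
    "date": "2011-10-14",
}


def dc_meta_tags(rec: dict) -> str:
    """Render Dublin Core elements as HTML <meta> tags (hypothetical helper)."""
    # The link element declares which namespace the "DC." prefix refers to.
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in rec.items():
        # escape() also escapes double quotes, so values are safe inside content="...".
        lines.append(f'<meta name="DC.{element}" content="{escape(value)}">')
    return "\n".join(lines)


print(dc_meta_tags(record))
```

Because these tags use the same fifteen generic elements as repository metadata, a web page and a repository record can describe an object in compatible terms.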
Dublin Core comes in two varieties. Unqualified Dublin Core includes fifteen elements or fields, covering basic descriptive and administrative information: “Title,” “Creator,” “Subject,” “Description,” “Publisher,” “Contributor,” “Date,” “Type,” “Format,” “Identifier,” “Source,” “Language,” “Relation,” “Coverage,” and “Rights.” The fields are vague and, in many cases, lack consistency in how they are used from one implementation to another. In order to help alleviate some of the confusion often inherent in Dublin Core, additional “qualifier” elements were added to refine or give more specificity to fields. For example, qualifiers for the “Date” field include: “Created,” “Valid,” “Available,” “Issued,” and “Modified.” While the qualifiers are often useful to differentiate important bits of information, interoperability applications are tied to unqualified Dublin Core – so enough information needs to be conveyed in the standard 15 elements that an object is findable and usable.
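To make the fifteen-element structure and the role of qualifiers concrete, the following Python sketch lists the unqualified elements and "dumbs down" a qualified record to them. The dotted key convention (`date.created`, `date.issued`) and the sample values are hypothetical and chosen only for illustration. Real repository platforms store qualified metadata in their own formats.

```python
# The fifteen elements of unqualified (simple) Dublin Core.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dumb_down(qualified: dict) -> dict:
    """Collapse 'element.qualifier' keys onto their base element.

    The dotted key convention is hypothetical, used for illustration only.
    Keys whose base element is not one of the fifteen are dropped.
    """
    simple = {}
    for key, values in qualified.items():
        base = key.split(".", 1)[0].lower()
        if base in DC_ELEMENTS:
            simple.setdefault(base, []).extend(values)
    return simple


# Hypothetical qualified record: two kinds of date plus a local, non-DC field.
qualified_record = {
    "title": ["Libraries in transition"],
    "date.created": ["2011-06-01"],
    "date.issued": ["2011-10-14"],
    "local.note": ["internal remark"],
}
print(dumb_down(qualified_record))
```

Both dates end up in a single `date` field, and the qualifier that distinguished them is lost. This is why enough information must survive in the standard 15 elements for an object to remain findable and usable.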
OAI-PMH works by having metadata from a cooperating group of repositories exposed to a harvesting system. Harvesting systems extract the data, either through ongoing, updated processes or through a one-time ingest. Users search the harvested collection of metadata but then access the digital objects themselves from their original home repositories. Only the metadata is extracted from participating repositories, not the digital objects.
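The harvesting cycle can be sketched in Python using only the standard library. The sketch assumes a hypothetical repository base URL. It builds a `ListRecords` request for the `oai_dc` (unqualified Dublin Core) format and parses a small, invented response, including the `resumptionToken` that a harvester sends back to fetch the next batch. Network access and error handling are left out, and records without metadata (for example deleted ones) simply come back with empty field lists.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Optional

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request asking for unqualified Dublin Core."""
    if resumption_token:
        # A resumptionToken is sent on its own, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text: str):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        records.append({
            "oai_id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            # Only metadata is harvested; "links" point back to the home repository.
            "titles": [e.text for e in dc.iterfind("dc:title", NS)] if dc is not None else [],
            "links": [e.text for e in dc.iterfind("dc:identifier", NS)] if dc is not None else [],
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    # An empty token element signals that the list is complete.
    return records, (token or None)


# Invented response from a hypothetical repository, trimmed to the essentials.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample thesis</dc:title>
          <dc:identifier>https://repository.example.org/handle/123</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>batch-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.org/oai"))
print(parse_list_records(SAMPLE_RESPONSE))
```

An ongoing harvest would loop, passing each returned token to `list_records_url` until the token comes back empty. A one-time ingest is the same loop run once to completion.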
OAI-PMH was developed to serve as a low-barrier mechanism for interoperability, and indeed nearly all Open Access repositories now include OAI-PMH functionality. Because of its simplicity, OAI-PMH has served as the basis for some of the most widely adopted interoperability initiatives, such as OAIster, the NDLTD Union Catalog, the Digital Repository Infrastructure Vision for European Research (DRIVER) project, and Open Access Infrastructure for Research in Europe (OpenAIRE). In all of these cases, a new interface is built as a layer on top of several repositories, bringing together content from different places – very much following the model of union catalogues.
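The "layer on top of several repositories" can be sketched as a toy in-memory index in Python. The class and field names are hypothetical, and real aggregators are far more elaborate. The sketch shows only the division of labour described above: the union layer holds harvested metadata and answers searches, while every hit links back to the object in its home repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarvestedRecord:
    repository: str  # name of the repository the metadata came from
    oai_id: str      # OAI identifier, used here as the record key
    title: str
    link: str        # where the object itself lives (its home repository)


class UnionIndex:
    """Toy aggregation layer: one search over metadata harvested from many repositories."""

    def __init__(self) -> None:
        self._records = {}

    def ingest(self, records) -> None:
        # Harvesting a record again replaces the earlier copy,
        # as an incremental (ongoing) harvest would.
        for rec in records:
            self._records[rec.oai_id] = rec

    def search(self, term: str) -> list:
        # Simple case-insensitive substring match on titles, sorted by title.
        term = term.lower()
        hits = [r for r in self._records.values() if term in r.title.lower()]
        return sorted(hits, key=lambda r: r.title)


index = UnionIndex()
index.ingest([HarvestedRecord("Repository A", "oai:a.example.org:1",
                              "Open Access in practice", "https://a.example.org/item/1")])
index.ingest([HarvestedRecord("Repository B", "oai:b.example.org:7",
                              "Digital repositories and open access", "https://b.example.org/item/7")])
for hit in index.search("open access"):
    print(hit.title, "->", hit.link)
```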
Interoperability can also help bind together objects that are related to each other. In the past, scientists conducted research, collected data, and then wrote and published articles. Articles might have included a limited number of graphs, charts, drawings, or other images. Once published, an article became a static entity, and there was no way to package other related items with it. The digital environment has changed this mindset; researchers are now able to deposit an article in a repository alongside other related items: data sets, algorithms, scripts, transcribed or translated documents, photographs, drawings, charts, graphs, etc. We can even go one step further and create new materials related to the research, such as video or audio interviews with authors discussing their publications, videos of conference presentations, and slides from presentations related to the published research.
Curating these groups of objects therefore requires new tools. One interoperability project currently under development is trying to tackle this scenario. The Open Archives Initiative Object Reuse and Exchange (OAI-ORE) project is designed to “define standards for the description of aggregations of Web resources.” OAI-ORE focuses on this type of information: “enhanced publications” or compound digital objects, i.e. the entire body of materials tied to a specific publication, such as an article plus its data set; related images, audio, or video files; charts, graphs, or visualizations; software code; etc. With OAI-ORE, the entire aggregate of objects and their associated metadata can be passed back and forth between systems as a unit.
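At the core of the OAI-ORE model is a Resource Map that describes an Aggregation, which in turn aggregates the individual resources. The Python sketch below emits that structure as simple RDF triples in N-Triples syntax, using the `ore:describes` and `ore:aggregates` terms. All URIs are hypothetical, including the choice to derive the Aggregation's URI from the map's URI with a fragment. A complete Resource Map would typically also carry metadata about the map itself (such as its creator and modification date), which this sketch omits.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resource_map_triples(rem_uri: str, aggregation_uri: str, parts: list) -> list:
    """Describe an aggregation of related resources as (subject, predicate, object) triples."""
    if rem_uri == aggregation_uri:
        # ORE keeps the map (the description) and the aggregation (the thing described) distinct.
        raise ValueError("Resource Map and Aggregation need different URIs")
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    triples += [(aggregation_uri, ORE + "aggregates", part) for part in parts]
    return triples


def to_ntriples(triples: list) -> str:
    """Serialize URI-only triples in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical enhanced publication: an article plus its data set and an author interview.
rem = "https://repository.example.org/rem/42"
parts = [
    "https://repository.example.org/42/article.pdf",
    "https://repository.example.org/42/dataset.csv",
    "https://repository.example.org/42/interview.mp3",
]
print(to_ntriples(resource_map_triples(rem, rem + "#aggregation", parts)))
```

Because the whole description is a single document, a receiving system can fetch the map and learn every member of the enhanced publication in one step.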
Interoperability among repositories can be used to develop systems and tools that offer more functionality than traditional union catalogues. Another current project, Simple Web-Service Offering Repository Deposit (SWORD), is designed to support authors who are trying to contribute content to repositories. With the introduction of repositories, the publication process has been extended – and become more complicated. Now, after going through the peer review and editing process for journal publication, authors are being asked (or required) to deposit copies of articles into their institutional repositories. If an institutional Open Access policy is in place, authors might need to either attach a waiver or amendment to their copyright agreements at the point when they sign over copyright. They might need to identify which version of an article meets the publisher’s criteria for deposit into an institutional repository. Then they need to deposit the article in one or many repositories, depending on grant funding criteria. Deposits usually require a few steps plus creating metadata and possibly applying taxonomy terms. SWORD is one mechanism to try to facilitate this process, making it as simple and seamless as possible, by creating a single interface and having articles and their associated metadata deposited into multiple, pre-selected repositories.8
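The idea of a single interface feeding multiple, pre-selected repositories can be sketched as preparing one package and one HTTP POST per target collection. The header names below (`Packaging`, `In-Progress`, `Content-Disposition`, `Content-MD5`) and the SimpleZip packaging URI are assumed to follow the SWORD version 2 profile and should be checked against that specification. The collection URLs are hypothetical, authentication is omitted, and the requests are only built, not sent.

```python
import hashlib
import urllib.request

# Packaging identifier assumed from the SWORD v2 profile (a plain ZIP of files).
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def build_deposit_requests(package: bytes, filename: str, collection_urls: list) -> list:
    """Prepare one SWORD-style POST per pre-selected repository collection.

    Requests are only constructed here; sending them (urllib.request.urlopen)
    and authentication are deliberately left out.
    """
    # Hex digest; the exact encoding a repository expects should be checked.
    checksum = hashlib.md5(package).hexdigest()
    prepared = []
    for url in collection_urls:
        prepared.append(urllib.request.Request(
            url,
            data=package,
            method="POST",
            headers={
                "Content-Type": "application/zip",
                "Content-Disposition": f"attachment; filename={filename}",
                "Content-MD5": checksum,
                "Packaging": SIMPLE_ZIP,
                "In-Progress": "false",  # the deposit is complete, not a draft
            },
        ))
    return prepared


# Hypothetical collection endpoints, chosen once (e.g. to satisfy a funder's rules).
targets = [
    "https://ir.example.edu/sword/collection/articles",
    "https://subject-repo.example.org/sword/collection/physics",
]
reqs = build_deposit_requests(b"placeholder zip bytes", "article.zip", targets)
for r in reqs:
    print(r.get_method(), r.full_url)
```

The author supplies the article and metadata once; the same package then goes to every repository on the list, which is the simplification the protocol aims for.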
At the system level, interoperability is necessary for different repositories or other types of systems to pass data or digital objects from one system to another. Each institution is responsible for its own repository, creating a distributed environment. Most institutions use one of a handful of systems for their repositories, most of which are open source: DSpace, EPrints, and Fedora are three commonly adopted systems; Greenstone, CDS Invenio, and bepress Digital Commons are also fairly common. Many Open Access journals use Open Journal Systems (OJS). Of all of these systems, Digital Commons is the only one that is not open source. Yet despite this largely shared open-source foundation, the systems handle typical processes and workflows in strikingly different ways, making it challenging to create service layers that work consistently across all of them.
In addition, many institutions have multiple, inter-related systems that serve different needs: Open Access or institutional repositories, usually maintained by libraries; courseware, possibly in conjunction with a repository of learning objects or Open Educational Resources (OERs), maintained by academic technology departments; and, increasingly, Current Research Information Systems (CRIS), promoted by organizations such as euroCRIS, that provide administrative frameworks for managing information about awarded grants, funding, and projects as well as project data. Within a single institution, several of these systems will likely need to share information with each other, or one day a union catalogue might be created to present different types of digital objects from multiple types of systems.
Challenging as it is, attaining our goals for Open Access requires these systems to share information and to pass objects as well as metadata back and forth. The digital environment should allow us to do things that we couldn’t do before, and we’re just starting to understand what this means. OAI-PMH, OAI-ORE, and SWORD are just three examples of ways that interoperability allows us to connect repositories to each other. The purposes of the three protocols are vastly different, but ultimately they are services or tools that make it possible for our users to interact with digital objects in unprecedented ways – either at the point of discovery for end users (OAI-PMH or OAI-ORE) or earlier in the information lifecycle, as content owners disseminate their objects (SWORD). If our role as librarians is to support the information lifecycle, interoperability allows us to develop tools that facilitate and further these processes in ways not even imaginable a few years ago.
Librarians have long understood the importance of our role in creating tools to facilitate access to and discovery of information resources, and union catalogues in various formats and guises have been one way to tie together resources from disparate collections. If information exists but users aren’t able to find it, what good is it? Amid the proliferation of digital content, this same principle holds true. If libraries are advocating for Open Access and investing significant resources into building, curating, and sustaining digital repositories, then we also need to develop tools that enhance how we discover objects, access them, and ultimately use, re-use, or adapt them for new purposes.
Supporting Open Access and other digital collections is the next stage of library work, a stage that is directly tied in with the traditions established by union catalogues. But while union catalogues were traditionally supported by a few librarians or small library systems departments, it is time to reconsider all of this work in light of much larger changes within libraries. Technology, user expectations, and the global information ecosystem continue to change at breakneck speeds. While we can’t predict the future, we can look at what we’ve learned from the traditions of union catalogues and the current landscape and apply these lessons to libraries and information science in general so that we can be better positioned for the future.
The key is to let go of the traditional images of libraries as books and buildings and of the traditional organizational structure of library staff (public services and technical services). Instead, we need to take a step back and consider bigger-picture questions, and then let them drive strategic planning, resource allocation, organizational development, and decision making. The core of our work is not about books and buildings, the two images most closely associated with libraries, but rather about facilitating all aspects of the information lifecycle: creating new knowledge; curating or collecting, organizing, cataloguing, and describing information; disseminating information; connecting people to information; and preserving information.
A few points to consider:
It is important that we keep at the forefront of our minds the fundamental purpose of our work and reflect on what we’ve learned as the world around us changes.
Libraries are in the midst of a dramatic period of change. We need to rethink our roles, embrace the changing information ecosystem, separate our professional identities from books and buildings, and focus on new ways to work with information in this changing environment. We need to remain consistent with the heritage and values of libraries and information science from the past 2000 years but embrace new technologies and think more broadly. The history of union catalogues is a useful blueprint and incorporates key values of cooperation, openness, and interoperability, but it is time to think more broadly and boldly and embrace the new information management questions being raised by the continually-evolving digital landscape.
CLOBRIDGE, Abby. Libraries in transition: From book collections & union catalogues to Open Access & digital repositories. ProInflow [online]. 14.10.2011 [cited 17.05.2012]. Available from: <http://pro.inflow.cz/libraries-transition-book-collections-union-catalogues-open-access-digital-repositories>. ISSN 1804-2406.