The story of Casalini Libri is the story of using knowledge graphs to revolutionize what is perceived as one of the most analog of institutions: libraries.
Libraries hold vast amounts of data, metadata, and resources that have historically been siloed away from other libraries, indexes, and search engines. Unless users searched a specific library’s data, they wouldn’t find information about the book they were looking for, nor the context of that book within the rest of the knowledge base.
Casalini Libri built the Shared Virtual Discovery Environment (Share VDE) to bring the enormous amount of data produced and curated by libraries into the broader world of information. Through Share VDE, libraries can link their data and resources to other libraries across the world, and librarians and patrons can get the full picture of an author, topic, or resource they need no matter where the physical material is located.
We sat down with Casalini Libri CEO Michele Casalini to discuss this. Below are some of the highlights.
How was Casalini Libri able to lead the way in this shift?
Casalini Libri has worked for many decades, since the 1960s, as a bibliographic agency with large institutions such as national libraries and the Library of Congress, so we have a great deal of expertise in this area. That credibility allowed us to build consensus among leading institutions, like Stanford, to move forward with a new approach. Lastly, as a contributor of bibliographic data for European applications, we considered it central to our corporate strategy to have this data available in the most flexible and interoperable format.
Why is graph the right data model for libraries?
For decades, the digital cataloging of library content had been governed by the same Machine-Readable Cataloging (MARC) standards. The problem with MARC is that it is a silo: it focuses only on single pieces of information within library science and isn’t interoperable with the standards used in other industries.
Libraries started to address this issue roughly a decade ago with the publication of the now-famous article “MARC Must Die,” which started the conversation about replacing MARC. The Library of Congress began analyzing the problem, and this led to the creation of BIBFRAME, a standard based on RDF (the Resource Description Framework). This ushered in an era in which libraries can share their data with different types of institutions, such as publishers or museums.
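To make the contrast with MARC concrete, here is a minimal sketch (not Share VDE’s or the Library of Congress’s converter) that turns a flat, MARC-style record into RDF triples using a handful of BIBFRAME 2.0 terms: bf:Work, bf:Instance, bf:instanceOf, bf:title/bf:mainTitle, and bf:contribution/bf:agent. The record layout, the example.org URIs, and the helper names are hypothetical. The point it illustrates is that the title and the author stop being fields inside one record and become nodes with their own identifiers, which other datasets can link to.

```python
# Minimal sketch: map a flat, MARC-style record onto a few BIBFRAME 2.0 terms
# and serialize the result as N-Triples. Assumptions (hypothetical): the record
# is a dict keyed by MARC tag, only tags 100 and 245 are handled, and all
# minted URIs live under https://example.org/.

BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "https://example.org/"  # hypothetical namespace for minted URIs

Triple = tuple[str, str, str]


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    # Escape backslashes and double quotes, as N-Triples requires.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def record_to_triples(record_id: str, record: dict[str, str]) -> list[Triple]:
    """Turn tag 100 (personal name) and tag 245 (title) into separate nodes."""
    work = iri(f"{BASE}work/{record_id}")
    instance = iri(f"{BASE}instance/{record_id}")
    title = iri(f"{BASE}title/{record_id}")
    contribution = iri(f"{BASE}contribution/{record_id}")
    # Hypothetical: the agent URI is minted per record here; resolving it to a
    # single shared identity across records would be a separate step.
    agent = iri(f"{BASE}agent/{record_id}")
    a = iri(RDF_TYPE)
    return [
        (work, a, iri(BF + "Work")),
        (instance, a, iri(BF + "Instance")),
        (instance, iri(BF + "instanceOf"), work),
        (work, iri(BF + "title"), title),
        (title, a, iri(BF + "Title")),
        (title, iri(BF + "mainTitle"), literal(record["245"])),
        (work, iri(BF + "contribution"), contribution),
        (contribution, a, iri(BF + "Contribution")),
        (contribution, iri(BF + "agent"), agent),
        (agent, a, iri(BF + "Agent")),
        (agent, iri(RDFS_LABEL), literal(record["100"])),
    ]


def to_ntriples(triples: list[Triple]) -> str:
    """One statement per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    marc_like = {"100": "Eco, Umberto", "245": "Il nome della rosa"}
    print(to_ntriples(record_to_triples("r1", marc_like)))
```

Real conversions handle many more fields, subfields, and edge cases; the sketch only shows the shape of the shift from one record to several linked nodes.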
How did you build Share VDE?
Share VDE is an initiative that involves many different libraries. It started in North America with several academic libraries and has expanded to include national libraries in Europe. Initially the project was scoped around implementing graph technology for libraries, but its scope gradually grew. Casalini Libri started by resolving the entities within library data and converting opaque XML to BIBFRAME. The team then built a Cluster Knowledge Base of entities and the relationships between them, assigning each entity a unique identifier.
They incorporated data from external sources and assigned unique identifiers to it as well. This made it possible to share data across institutions and created an interoperable language for shared information. The team also enriched existing MARC 21 data so that legacy systems could still use it. Once all the data was mapped, it was made available to libraries through a portal.
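Entity resolution of this kind is often framed as clustering: every heading that refers to the same real-world entity ends up in one cluster with one identifier. The following is a generic sketch using a union-find structure, not Share VDE’s actual matching logic, which the interview does not describe. The matching rule (an accent-, punctuation-, and word-order-insensitive name key, or a shared external identifier) and the `cluster-N` identifiers are hypothetical.

```python
# Minimal sketch of entity clustering with union-find. Assumptions
# (hypothetical): two headings refer to the same entity if their normalized
# name keys match or if they carry the same external identifier; cluster IDs
# are minted sequentially as "cluster-N" in order of first appearance.
import unicodedata
from typing import Optional


def name_key(name: str) -> str:
    """Accent-, case-, punctuation- and word-order-insensitive match key."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(sorted(cleaned.split()))


class UnionFind:
    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster(headings: list[tuple[str, str, Optional[str]]]) -> dict[str, str]:
    """Map each (heading_id, name, external_id) heading to a cluster ID."""
    uf = UnionFind()
    first_seen: dict[tuple[str, str], str] = {}
    for heading_id, name, external_id in headings:
        uf.find(heading_id)  # register the heading as its own singleton
        keys = [("name", name_key(name))]
        if external_id is not None:
            keys.append(("ext", external_id))
        for key in keys:
            if key in first_seen:
                uf.union(first_seen[key], heading_id)
            else:
                first_seen[key] = heading_id
    cluster_ids: dict[str, str] = {}
    result: dict[str, str] = {}
    for heading_id, _, _ in headings:
        root = uf.find(heading_id)
        if root not in cluster_ids:
            cluster_ids[root] = f"cluster-{len(cluster_ids) + 1}"
        result[heading_id] = cluster_ids[root]
    return result


if __name__ == "__main__":
    headings = [
        ("lib1:h1", "Eco, Umberto", None),
        ("lib2:h7", "Umberto Eco", "ext:123"),
        ("lib3:h2", "U. Eco", "ext:123"),
        ("lib1:h9", "Calvino, Italo", None),
    ]
    print(cluster(headings))
    # {'lib1:h1': 'cluster-1', 'lib2:h7': 'cluster-1', 'lib3:h2': 'cluster-1', 'lib1:h9': 'cluster-2'}
```

Union-find makes matches transitive: "U. Eco" joins the cluster through its shared external identifier even though its name key differs. Sorting words is deliberately crude; it would also merge different people who share a name, so a production system would weigh dates, roles, and other evidence before merging.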
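Enriching MARC 21 so that legacy systems still benefit typically means writing the new identifiers back into existing fields. As one possible sketch (an assumption: the interview does not say which subfields Share VDE uses), the code appends an entity URI to a name field in subfield $1, which MARC 21 defines for real-world object URIs, and leaves the original subfields untouched so that systems unaware of $1 can simply ignore it. The `DataField` class and the example.org URI are hypothetical.

```python
# Minimal sketch: write a resolved entity URI back into a MARC 21 data field.
# Assumptions (hypothetical): fields are modeled with a small dataclass rather
# than a real MARC library, and the URI goes into subfield $1.
from dataclasses import dataclass


@dataclass(frozen=True)
class DataField:
    tag: str
    ind1: str = " "
    ind2: str = " "
    subfields: tuple[tuple[str, str], ...] = ()


def enrich_with_uri(df: DataField, uri: str, code: str = "1") -> DataField:
    """Return a copy of df with (code, uri) appended, unless already present."""
    if (code, uri) in df.subfields:
        return df  # idempotent: re-running enrichment adds nothing
    return DataField(df.tag, df.ind1, df.ind2, df.subfields + ((code, uri),))


def render(df: DataField) -> str:
    """Human-readable display: tag, two indicators, then $code+value pairs."""
    subs = "".join(f"${code}{value}" for code, value in df.subfields)
    return f"{df.tag} {df.ind1}{df.ind2} {subs}"


if __name__ == "__main__":
    author = DataField("100", "1", " ", (("a", "Eco, Umberto,"), ("d", "1932-2016.")))
    enriched = enrich_with_uri(author, "https://example.org/cluster-1")
    print(render(enriched))
    # 100 1  $aEco, Umberto,$d1932-2016.$1https://example.org/cluster-1
```

Keeping the enrichment additive and idempotent means the same batch can be re-run safely, and the existing $a and $d values that legacy catalogs depend on are never rewritten.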
What can Casalini Libri do now that it couldn’t do before?
The big shift has been the move from describing resources to describing the information within resources. MARC required relationships to be expressed at the resource or record level. With the shift to entity-level description, the data can be reused and linked to related items, so resources are easier to search and information is easier to find.
Now entities are defined as real-world objects that can be connected to social media, archives, blogs, and more. This means that institutions beyond libraries can link together and identify relationships between their materials and other resources. This is where the real value lies: enriching the data so that users and machines can take advantage of the added context.
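A toy comparison shows why entity-level description makes resources easier to find. In the record-level version the author is a string repeated in each record, so an exact-match search misses a variant form of the name. In the entity-level version each resource points to one agent identifier, and a single index lookup returns every linked resource. All records, titles, and identifiers here are hypothetical illustration data.

```python
# Toy comparison of record-level vs entity-level description.
# All records, titles, and identifiers are hypothetical illustration data.
from collections import defaultdict
from typing import Optional

# Record level: each record carries its own copy of the author string.
records = [
    {"id": "rec1", "title": "Il nome della rosa", "author": "Eco, Umberto"},
    {"id": "rec2", "title": "Il pendolo di Foucault", "author": "Umberto Eco"},
    {"id": "rec3", "title": "Le città invisibili", "author": "Calvino, Italo"},
]


def search_records_by_author(name: str) -> list[str]:
    """Exact string match over records, as a record-level index might do."""
    return [r["id"] for r in records if r["author"] == name]


# Entity level: the author is one node; resources link to it by identifier.
agents = {
    "agent:eco": {"label": "Eco, Umberto", "variants": ["Umberto Eco"]},
    "agent:calvino": {"label": "Calvino, Italo", "variants": []},
}
links = [("rec1", "agent:eco"), ("rec2", "agent:eco"), ("rec3", "agent:calvino")]


def build_agent_index(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Invert resource -> agent links so each agent lists its resources."""
    index: dict[str, list[str]] = defaultdict(list)
    for resource_id, agent_id in pairs:
        index[agent_id].append(resource_id)
    return dict(index)


def resolve_agent(name: str) -> Optional[str]:
    """Return the agent whose preferred label or a variant equals name."""
    for agent_id, agent in agents.items():
        if name == agent["label"] or name in agent["variants"]:
            return agent_id
    return None


if __name__ == "__main__":
    print(search_records_by_author("Eco, Umberto"))  # ['rec1']: misses rec2
    index = build_agent_index(links)
    agent_id = resolve_agent("Umberto Eco")
    if agent_id is not None:
        print(index[agent_id])  # ['rec1', 'rec2']
```

Because the name variants live on the agent node rather than in every record, they are recorded once and reused by every resource that links to that agent.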
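Once different institutions point to the same entity identifier, connecting their materials can be as simple as merging their statements and following links. This minimal sketch assumes each institution publishes plain (subject, predicate, object) statements and that they all reuse one shared entity URI. The institution names, predicates, and URIs are hypothetical. It gathers everything that references the entity and keeps track of which institution holds each statement.

```python
# Minimal sketch: merge statements from several institutions and find every
# resource that links to a shared entity URI. Institution names, predicates,
# and URIs are hypothetical.
from collections import defaultdict

ECO = "https://example.org/cluster-1"  # hypothetical shared entity identifier

Triple = tuple[str, str, str]
Quad = tuple[str, str, str, str]

sources: dict[str, set[Triple]] = {
    "library": {("lib:book1", "creator", ECO), ("lib:book2", "subject", ECO)},
    "archive": {("arc:letter42", "mentions", ECO)},
    "museum": {
        ("mus:portrait7", "depicts", ECO),
        ("mus:vase3", "maker", "https://example.org/cluster-9"),
    },
}


def merge(by_source: dict[str, set[Triple]]) -> list[Quad]:
    """Tag each triple with the institution it came from: (s, p, o, source)."""
    return [(s, p, o, name) for name, triples in by_source.items() for s, p, o in triples]


def linked_to(quads: list[Quad], entity: str) -> dict[str, list[tuple[str, str]]]:
    """Group (resource, predicate) pairs pointing at entity by institution."""
    grouped: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for s, p, o, source in quads:
        if o == entity:
            grouped[source].append((s, p))
    # Sort for deterministic output, since set iteration order is arbitrary.
    return {source: sorted(items) for source, items in sorted(grouped.items())}


if __name__ == "__main__":
    print(linked_to(merge(sources), ECO))
    # {'archive': [('arc:letter42', 'mentions')],
    #  'library': [('lib:book1', 'creator'), ('lib:book2', 'subject')],
    #  'museum': [('mus:portrait7', 'depicts')]}
```

Keeping the source alongside each statement preserves provenance, so a portal can show users which institution holds each related item.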