Sad Metadata Kitty is sad because it took only two SPARQL queries to hose the server. (Original photograph by CogSciLibrarian used by permission)
The summit on linked data applied to cultural heritage will be held on 28 and 29 June 2017 at the Fondazione Giorgio Cini
Sad Metadata Kitty is not so sure about the Open World Assumption... prefers Closed and Heated Model
Linking Data in Sydney
By Geoff Browell, Head of Archives Services
I was fortunate to attend the biennial Linked Open Data in Libraries, Archives and Museums (LODLAM) summit at the end of June in Sydney, Australia. I played a very small role in setting it up, as a member of the organising committee. The conference is an opportunity for archivists, librarians, museum curators, information professionals and IT experts to meet and discuss the latest developments in Linked Data among higher education, heritage and ‘memory’ institutions worldwide. Delegates have the chance to hear about successful (and unsuccessful) projects, take part in targeted discussions on the future of the technology, and build new collaborations. The event features the ‘Challenge’ – an open competition for the best application of Linked Data in a cultural setting. The summit adopts the ‘un-conference’ format: there are no pre-prepared papers, relevant issues are aired and debated, and sub-groups are convened to address specific topics.
View this graph of attendees: https://graphcommons.com/graphs/0f874303-97c2-4e53-abc6-83a13a1a2030
What is Linked Data?
Linked Data is a way of structuring online and other data to improve its accuracy, visibility and connectedness. The technology has been available for more than a decade and has mainly been used by commercial entities such as publishing and media organisations including the BBC and Reuters. For archives, libraries and museums, Linked Data holds the prospect of providing a richer experience for users, better connectivity between pools of data, new ways of cataloguing collections, and improved access for researchers and the public.
It could, for example, provide the means to unlock research data or mix it with other types of data such as maps, or to search digitised content including books and image files and collection metadata. New, more robust, services are currently being developed by international initiatives such as Europeana which should make its adoption by libraries and archives much easier. There remain many challenges, however, and this conference provided the opportunity to explore these.
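To make the idea concrete, here is a minimal Python sketch. It is an illustration only, not taken from any project discussed at the summit. It models data as subject, predicate, object statements ("triples") and shows how two separately published datasets become connected once they refer to the same identifier. Every example.org URI, the predicate names and the sample records are hypothetical.

```python
# All example.org identifiers below are invented for illustration.
PERSON = "http://example.org/person/123"

# Dataset 1: an archive catalogue describing a collection of papers.
archive_triples = [
    ("http://example.org/archive/collection/42", "title", "Personal papers"),
    ("http://example.org/archive/collection/42", "creator", PERSON),
]

# Dataset 2: a museum catalogue describing a photograph.
museum_triples = [
    ("http://example.org/museum/object/7", "title", "Portrait photograph"),
    ("http://example.org/museum/object/7", "depicts", PERSON),
]


def linked_to(uri, *datasets):
    """Return (subject, predicate) pairs, from any dataset, that point at `uri`."""
    return [
        (subject, predicate)
        for triples in datasets
        for (subject, predicate, obj) in triples
        if obj == uri
    ]


if __name__ == "__main__":
    # Because both catalogues use the same URI for the person, one lookup
    # finds related material in both of them.
    for subject, predicate in linked_to(PERSON, archive_triples, museum_triples):
        print(subject, predicate, PERSON)
```

The point of the sketch is the shared identifier: neither catalogue needs to know about the other, yet a single query over both surfaces the collection and the photograph together.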
The conference comprised a mix of quick fire discussions, parallel breakout sessions, 2-minute introductions to interesting projects, and the Challenge entries.
[photo: Work in progress at the LODLAM summit]
Quick fire points from delegates
Need for improved visualisation of data (current visualisations are not scalable or require too much IT input for archivists and librarians to realistically use)
Need to build Linked Data creation and editing into vendor systems (the Step change model which we pursued at King’s Archives in a Jisc-funded project)
Exploring where text mining and Natural Language Processing overlap with LOD
World War One Linked Data: what next? (less of a theme this time around as the anniversary has already started)
LOD in archives: a particular challenge? (archives are lagging libraries and galleries in their implementation of Linked Data)
What will be the next Getty vocabularies: a popular vocabulary that can encourage use of LOD?
Fedora 4 and LOD in similar open source or proprietary content management systems (how can Linked Data be used with these popular platforms?)
Linked Data is an off-putting term implying a data-centric set of skills (perhaps Linked Open Knowledge as an alternative?)
Building a directory of cultural heritage organisation LOD: how do we find available data sets? (such as Linked Open Vocabularies)
Implementing the Europeana Data Model: next steps (stressing the importance of Europeana in the Linked Data landscape)
Can we connect different entities across different vocabularies to create new knowledge? (a lot of vocabularies have been created, but how do they communicate?)
Day One sessions
OASIS Deep Image Indexing (http://www.synaptica.com/oasis/).
This talk showcased a new product called OASIS from Synaptica, aimed at art galleries, which facilitates the identification, annotation and linking of parts of images. These elements can be linked semantically and described using externally-managed vocabularies such as the Getty suite of vocabularies or classifications like Iconclass. This helps curators do their job. End users enjoy an enriched appreciation of paintings and other art. It is the latest example of annotation services that overlay useful information and utilise agreed international standards like the Open Annotation Data Model and the IIIF standard for image zoom.
We were shown two examples, Botticelli’s The Birth of Venus and Holbein’s The Ambassadors, which demonstrated impressive zooming of well-known paintings and detailed descriptions of their features. Future development will allow crowdsourcing to identify key elements and will use image recognition software to find these elements on the Web (‘find all examples of images of dogs in 16th century public works of art embedded in the art but not indexed in available metadata’).
This product mirrors the implementation of IIIF by an international consortium that includes leading US universities, the Bodleian, BL, Wellcome and others. Two viewers have evolved which give archives the chance to offer their users deep zoom and interoperable images: Mirador (http://showcase.iiif.io/viewer/mirador/) and the Wellcome’s Universal Viewer. These get around the problem of having to create differently sized derivatives of images for different uses, and of having to publish very large images on the internet when download speeds might be slow.
Digital New Zealand
Chris McDowall of Digital New Zealand explored how best to make LOD work for non-LOD people. Linked Open Data uses a lot of acronyms and takes for granted a fairly high level of technical knowledge of systems. This is a particular bugbear of mine, which is why this talk resonated. Chris’ advocacy of cross developer/user meetups also chimed with my own thinking: LOD will never be properly adopted if it is assumed to be the province of ‘techies’. Developers often don’t know what they are developing because they don’t understand the content or its purpose: they are not curators.
He stressed the importance of vocabulary cross-walks and the need for good communication in organisations to make services stable and sustainable. Again, this chimed with my own thinking: much work needs to be done to ‘sell’ the benefits of Linked Data to sceptical senior management. These benefits might include context building around archive collections, gamification of data to encourage re-use, and serendipity searches and prompts which can aid researchers. Linked Data offers truly targeted searching, in contrast to the ‘faith based technology’ of existing search engines (a really memorable expression).
He warned that the infrastructure demands of LOD should not be underestimated, particularly from researchers making a lot of simultaneous queries: he mooted a pared down type of LOD for wider adoption.
Chris finished by highlighting a number of interesting use cases of LOD in libraries as part of the Linked Data for Libraries (LD4L) project, a collaboration between Harvard, Cornell and Stanford (https://wiki.duraspace.org/pages/viewpage.action?pageId=41354028). See also Richard Wallis’ presentation on the benefit of LOD for libraries: http://swib.org/swib13/slides/wallis_swib13_108.pdf
Schema.org
Richard Wallis of OCLC explored the potential of Schema.org, a growing vocabulary of high-level terms agreed by the main search engines to make content more searchable. Schema.org helps power the result boxes one sees at the top of Google search results pages. Richard suggested creating an extension relevant to archives, to add to the one for bibliographic material. The advantage of Schema.org is that it can easily be added to web pages, resulting in appreciable improvement in ranking and the possibility of generating user-centred suggestions in search results. For an archive, this might mean that a Google user searching for the papers of Winston Churchill is offered related suggestions, such as booking tickets to a talk about the papers or viewing Google Maps information showing the opening times and location of the archive.
The group discussion centred on the potential elements (would the extension refer to theses, research data, and university systems that contain archive data, such as finance and student information?), and on the need for use cases and a statement of potential benefits. I agreed to be part of an international team, working through the W3C, to help set one up.
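As a rough illustration of what adding Schema.org to a web page can look like, the Python sketch below builds a JSON-LD description of an archive, including the opening times and location mentioned above, and wraps it in the script tag that would be embedded in the page. The archive name, address and URL are hypothetical. The choice of the `ArchiveOrganization` type is an assumption made for this example; a more general type such as `Organization` would be built the same way.

```python
import json


def archive_jsonld(name, url, opening_hours, street, city, country):
    """Build a Schema.org description of an archive as a plain dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "ArchiveOrganization",  # assumed type, see the note above
        "name": name,
        "url": url,
        "openingHours": opening_hours,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressCountry": country,
        },
    }


def as_script_tag(data):
    """Wrap the JSON-LD in the script element that a web page would embed."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical archive details, invented for illustration.
    description = archive_jsonld(
        name="Example University Archives",
        url="http://example.org/archives",
        opening_hours="Mo-Fr 09:30-17:00",
        street="1 Example Street",
        city="London",
        country="GB",
    )
    print(as_script_tag(description))
```

The markup sits alongside the visible page content rather than replacing it, which is why it can be added to existing pages with little disruption.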
[photo: Shakespeare window at the State Library of New South Wales]
Dork shorts/Speedos – these are impromptu lightning talks lasting a few minutes, which highlight a project, idea or proposal. View here: http://summit2015.lodlam.net/about/speedos/
Highlights:
Cultuurlink (http://cultuurlink.beeldengeluid.nl/app/#/): Introduction by Johan Oomen
This Dutch service facilitates the linking of different controlled vocabularies and thesauri and helps address the questions faced by many cultural organisations: ‘which thesauri do I use?’ and ‘how do I avoid reinventing the thesaurus wheel?’. The service allows users to upload a SKOS vocabulary, link it with one of four supported vocabularies and visualise the results.
The service helps different types of organisation to connect their vocabularies, for example an audio-visual archive with a museum’s collections. The approach also allows content from one repository to be enhanced or deepened through contextual information from another. The example of Vermeer’s Milkmaid was cited: enhancing the discoverability of information on the painting held in the Rijksmuseum in Amsterdam through connecting the collection data held on the local museum management system with DBpedia and with the Getty Art and Architecture Thesaurus. This sort of approach builds on the prototypes developed in the last few years to align vocabularies (and to ‘Skosify’ data – turn it into Linked Data) around shared Europeana initiatives (see http://semanticweb.cs.vu.nl/amalgame/).
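The talk did not describe how Cultuurlink matches terms, so the following Python sketch only shows one simple way vocabulary alignment can work: two vocabularies are compared by normalised label, and each match is recorded as a candidate `skos:closeMatch` link for a person to review. The example.org concept URIs and the sample labels are hypothetical.

```python
import unicodedata

SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"


def normalise(label):
    """Lower-case a label, strip accents and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.casefold().split())


def align(source, target):
    """Propose links between two vocabularies that share a normalised label.

    `source` and `target` map a concept URI to a list of its labels.
    Returns a sorted list of (source URI, predicate, target URI) triples.
    """
    index = {}
    for uri, labels in target.items():
        for label in labels:
            index.setdefault(normalise(label), set()).add(uri)

    links = set()
    for uri, labels in source.items():
        for label in labels:
            for match in index.get(normalise(label), ()):
                links.add((uri, SKOS_CLOSE_MATCH, match))
    return sorted(links)


if __name__ == "__main__":
    # Hypothetical vocabularies: an audio-visual archive and a museum.
    av_archive = {"http://example.org/av/concept/1": ["Milkmaids", "Melkmeisjes"]}
    museum = {
        "http://example.org/museum/concept/9": ["  milkmaids "],
        "http://example.org/museum/concept/10": ["Painters"],
    }
    for triple in align(av_archive, museum):
        print(triple)
```

Exact label matching misses synonyms and produces false matches for homonyms, which is one reason the links are treated here as suggestions rather than settled facts.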
Research Data Services project: Introduction by Ingrid Mason
This is a pan-Australian research data management project focusing on the repackaging of cultural heritage data for academic re-use. Linked Data will be used to describe a ‘meta-collection’ of the country’s cultural data, one that brings together academic users of data and curators. It will utilise the Australia-wide research data nodes for high speed retrieval (https://www.rds.edu.au/project-overview and http://www.intersect.org.au/).
Tim Sherratt on historians using LOD
This fascinating short explained how historians have been creating LOD for years – and haven’t even known they were doing it – identifying links and narratives in text as part of the painstaking historical process. How can Linked Data be used to mimic and speed up this historical research process? Tim showed a working example. A step by step guide is available at http://discontents.com.au/stories-for-machines-data-for-humans/ and the talk can be heard at http://summit2015.lodlam.net/2015/07/10/lod-book/
Jon Voss on historypin
Jon explained how the popular historical mapping service, historypin, is dealing with the problem of ‘roundtripping’ where heritage data is enhanced or augmented through crowdsourcing and returned to its source. This is of particular interest to Europeana, whose data might pass through many hands. It highlights a potential difficulty of LOD: validating the authenticity and quality of data that has been distributed and enriched.
Chris McDowall of Digital New Zealand
Chris explained how to search across different types of data source in New Zealand, for example to match and search for people using phonetic algorithms to generate sound alike suggestions and fuzzy name matching: http://digitalnz.github.io/supplejack/.
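The talk summary does not say which phonetic algorithm is used, so the Python sketch below swaps in Soundex, a well-known example of the family, to show how sound-alike suggestions for names can be generated. The function names and the sample names are illustrative assumptions, not part of Supplejack.

```python
# Soundex digit for each consonant; vowels, H, W and Y have no digit.
CODES = {}
for letters, digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(letter, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]


def sound_alike(query, names):
    """Return the names that share a Soundex code with the query."""
    target = soundex(query)
    return [name for name in names if soundex(name) == target]


if __name__ == "__main__":
    people = ["Smith", "Smyth", "Schmidt", "Jones", "Robert", "Rupert"]
    print(sound_alike("Smithe", people))
```

Soundex was designed for English surnames, so a service covering names from other languages would probably need a different phonetic scheme or an additional edit-distance check.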
Axes Project (http://www.axes-project.eu/): Introduction from Martijn Kleppe
This 6 million Euro EU-funded project aims to make audio-visual material more accessible and has been trialled with thousands of hours of video footage, and expert users, from the BBC. Its purpose is to help users mine vast quantities of audio-visual material in the public domain as accurately and quickly as possible. The team have developed tools using open source frameworks that allow users to detect people, places, events and other entities in speech and images and to annotate and refine these results. This sophisticated tool set utilises face, speech and place recognition to zero-in on precise fragments without the need for accompanying (longhand) metadata. The results are undeniably impressive – with a speedy, clear, interface locating the parts of each video with filtering and similarity options. The main use for the toolset to date is with film studies and journalism students but it unquestionably has wider application.
The Axes website also highlights a number of interesting projects in this field (http://www.axes-project.eu/?page_id=25). Two stand out: Cubrik (http://www.cubrikproject.eu/), another FP7 multinational project, which mixes crowd and machine analysis to refine and improve searching of multimedia assets; and the PATHS prototype (http://www.paths-project.eu/), ‘an interactive personalised tour guide through existing digital library collections. The system will offer suggestions about items to look at and assist in their interpretation. Navigation will be based around the metaphor of a path through the collection.’ The PATHS project created an API and a user interface, and launched a tested exemplar with Europeana to demonstrate the potential of new discovery journeys to open access to already-digitised collections.
Loom project (http://dxlab.sl.nsw.gov.au/making-loom/): Introduction from Paula Bray of State Library of New South Wales
The NSW State Library sought to find new ways of visualising its collections by date and geography through its DX Lab, an experimental data laboratory similar to BL Labs, which I have worked with in the UK. One arresting visualisation shows the proportions of collections relevant to particular geographical locations in the city of Sydney. Accompanied by approving gasps from the audience, it superimposed an iceberg graphic onto a map to show what proportion of the collections about a place had been digitised and what proportion had yet to be digitised – a striking way of communicating the fragility of some collections and the work still to be done to make them accessible to the public.
LODLAM challenge
19 entries were received: http://summit2015.lodlam.net/challenge/challenge-entries/
Open Memory Project. This Italian entry won the main prize. It uses Linked Data to re-connect victims of the Holocaust in wartime Italy. The project was thought provoking and moving and has the potential to capture the public imagination.
Polimedia is a service designed to answer questions from the media and journalists by querying multi-media libraries, identifying fragments of speech. It won second prize for its innovative solution to the challenge of searching video archives.
LodView goes LAM is new Italian software designed to make it easier for novices to publish data as Linked Data. A visually beautiful and engaging interface makes this a joy to look at.
EEXCESS is a European project to augment books and other research and teaching materials with contextual information, and to develop sophisticated tools to measure usage. This is an exciting, ambitious, project to assemble different sources using Linked Data to enable a new kind of publication made up of a portfolio of assets.
Preservation Planning Ontology is a proposal for using Linked Data in the planning of digital preservation by archives. It has been developed by Artefactual Systems, the Canadian company behind the AtoM and Archivematica software. This made the shortlist as it is a good example of a ‘behind the scenes’ management use of Linked Data to make preservation workflows easier.
A selection of other entries:
Public Domain City extracts curious images from digitised content. This is similar to BL Labs’ Mechanical Curator, a way of mining digitised books for interesting images and making them available to social media to improve the profile and use of a collection.
Project Mosul uses Linked Data to digitally recreate damaged archaeological heritage from Iraq. A good example of using this technology to protect and recreate heritage damaged in conflict and disaster.
The Muninn Project combines 3D visualisations and printing using Linked Data taken from First World War source material.
LOD Stories is a way of creating story maps between different pots of data about art and visualising the results. The project is a good example of the need to make Linked Data more appealing and useful, in this case by building ‘family trees’ of information about subjects to create picture narratives.
Get your coins out of your pocket is a Linked Data engine about Roman coinage and the stories it has to tell – geographically and temporally. The project uses nodegoat as an engine for volunteers to map useful information: http://nodegoat.net/.
Graphity is a Danish project to improve access to historical digitised Danish newspapers and to enhance them with maps and other content using Linked Data.
Dutch Ships and Sailors brings together multiple historical data sources and uses Linked Data to make them searchable.
Corbicula is a way of automating the extraction of data from collection management systems and publishing it as Linked Data.
[photo: delegates at the summit]
Day two sessions
Day two sessions focused on the future. A key session led by Richard Wallis explained how Google is moving from a page ranking approach to a triple confidence assertion approach to generating search results. The way in which Google generates its results will therefore move closer to the LOD method of attributing significance to results.
Highlights
Need for a vendor manifesto to encourage systems vendors, such as Ex Libris, to build LOD into their systems (Corey Harper of New York University proposed this and is working closely with Ex Libris to bring this about)
Depositing APIs/documentation for maximum re-use (APIs are often a weak link – adoption of LOD won’t happen if services break or are unreliable)
Uses identified (mining digitised newspaper archives was cited)
Potential piggy-backing from Big Pharma investment in Big Data (massive investment by drugs companies to crunch huge quantities of data – how far can the heritage sector utilise even a fraction of that?)
Need to validate LOD: the quality issue – need for an assertion testing service (LOD won’t be used if its quality is questionable. Do curators (traditional guardians of quality) manage this?)
Training in Linked Data needs to be addressed
Need to encourage fundraising and make LOD sustainable: what are we going to do with LOD in the next ten years? (Will the test of the success of Linked Open Data be if the term drops out of use when we are all doing it without noticing? Will 5 Star Linked Data be realised? http://5stardata.info/)
Summary
There were several key learning points from this conference:
The divide between technical experts and policy and decision makers remains significant: more work is needed to provide use cases and examples of improved efficiencies or innovative public engagement opportunities that the technology provides
The re-use and publication of Linked Data is becoming important and this brings challenges in terms of IPR, reliability of APIs and quality of data
Easy-to-use tools and widgets will help spread its use; complicated and unsustainable technical solutions that depend on project funding should be avoided
Working with vendors to incorporate Linked Data tools in library and archive systems will speed its adoption
The Linked Data community ought to work towards the day when Linked Data is business as usual and the term goes out of use
LODLAM 2015 - the joy of the unconference
by Fiona Tweedie
The third international Linked Open Data in Libraries, Archives and Museums (LODLAM) summit took over the State Library of New South Wales for two days at the end of June. The fabulous Ingrid Mason encouraged me to attend and so I took a firm grip on my fear of Linked Data and presented myself bright and early at the State Library’s beautiful Mitchell Wing.
LODLAM is run as an unconference, which in this context means that the schedule is decided on the day by participants, who pitch sessions that they would like to see happen. By pitching a session, you’re agreeing to be there and facilitate, but everyone is responsible for making the session a success. The program for the day is then designed by arranging the sessions into a schedule (usually by moving post-it notes around on a board) and the day is set to begin. LODLAM 2015 consisted of a mix of discussion sessions, plenary presentations by the finalists in the LODLAM Challenge and Lightning Talks (nicknamed ‘speedoes’ for the occasion by Tim Sherratt).
The 'rules' or principles of the unconference are simple, set out here:
And one other thing - the law of mobility applies. If you’re not enjoying a session or are just curious about what’s happening elsewhere, you’re totally free to move.
‘Very intense final #lodlam breakout session on natural language processing. Great spot for good discussion.’ Chris McDowall (@fogonwater), June 30, 2015
Sessions can happen anywhere
An unconference isn’t always going to be the right format for a gathering. Participants need to bring a reasonable level of knowledge with them, as the free-flowing format means that they may not get much orientation into a topic. Participants also need to be confident engaging and discussing. But as a way of connecting, learning and sharing for a group with common interests, the unconference format, mixed with lightning talks and perhaps a poster session, seems to me to offer an overdue refresh of the academic conference.
Personally, I found LODLAM really exciting and refreshing. I loved the fact that everyone was invited to contribute to the discussion - it struck me as a much better way to share knowledge than listening to someone read a script. And I really appreciated being free to move if I wanted to see what was going on elsewhere or the session turned out to be too technical for me. Anyone who has been trapped in a boring session at a conference will appreciate that liberty, I'm sure! And I love the implicit respect in the format; by inviting the participants to make the conference they want to attend, organisers are saying they value the expertise and commitment of their community.
The last couple of 'straight' academic conferences I have attended have struck me as a little tired. It's great to see experts speaking well on their topics - the keynotes at DH2015 are all the proof you need of that. But for me, the real value of a conference is in exchanging knowledge and meeting great people, which there just isn't enough space for in a traditional conference. So my challenge to everyone out there running a conference is to think about how you can mix up the format to re-energise your community.
I gave a ‘speedoes’ talk about the project Fraser.digitalfabulists.org that has grown out of teaching NLTK with Daniel McDonald and Lachlan Musicman.
Less than 1 percent of websites have implemented Schema.org markup. However, pages with Schema.org integration rank better by an average of four positions compared to pages without Schema.org markup, according to recent findings by Searchmetrics.
It isn't clear to me what "schema.org markup" means--any semantic markup? Using Google's webmaster tools to implement that markup? Is the whole page marked up, or just a few key items, enough to form a snippet?
But what I am hearing in general is that more semantic markup is happening, and hearing that it is producing results is very good.
This is a site that was developed at Pratt Institute in the SILS program. The course was Programming for Cultural Heritage. The MetEagle team sought to identify and indicate origin information, using linked and open data methods, of a selected number of galleries at the Metropolitan Museum of Art. We hope that patrons and users will be able to better understand and contextualize the geographic origin of the unique objects offered by the Met.
One thing to say about digital humanities
I’ve been asked to give a talk about digital humanities to the Professional Historians Association Historically Speaking session in Melbourne. Now, I’d class myself as someone who keeps up with discussions about digital humanities in order to provide better, more useful and useable access to museum collections and associated information. But as I’m neither a historian, nor an actual practitioner of digital humanities, writing a talk presents an interesting challenge.
Over the weekend I realised that what I should be doing was asking for ideas and input from the digital humanities community. Hooray for Twitter, and for the networked community in helping me with my question. I posted a simple message, asking for contributions on what is one thing that I should tell the audience? I’m extremely grateful for the swiftness and generosity of the answers – and even the offer of a readymade slide deck introducing digital humanities. Thanks so much for that @wragge!
In summary, I will tell the audience these things:
What are the digital humanities and what do they value?
They are traditional humanities combined with IT; making them both history and future orientated.
The things historians care about, such as archives, interpretation, meaning and historiography don’t disappear just because you’re using technology. The technology is used as the mechanism for discovery.
Digital humanities emphasises collaboration as a virtue. The ‘lone wolf’ scholar is less the norm. Sharing ideas, resources, community allows practitioners to go further and learn more than they would working alone.
Is digital humanities just for geeks?
Digital humanities is something all researchers can be part of; it’s not just for large institutions and IT geeks.
You don’t have to learn coding but a bit of scripting can be useful. It’s also a way of thinking – working to get messy data into a structure, or trying some scripting, helps develop computational thinking.
Digital humanities means *open* data!
Digital humanities researchers value collaboration and partnerships – and to make the most of these they promote publishing data with unrestrictive licenses for reuse, and utilising linked open data principles.
Plan, from day one, to publish the data as well as the book. Make that data linked and open, even if the synthesis comes much later.
If research is funded by public money it must be open, no excuses.
Thanks very much to Twitter contributors @jamesinealing @erodley @mia_out @leoba @rahtz @jamescummings @annettestr @CriticalSteph @bestqualitycrab @ericdmj @jenguiliano @wragge whose thoughts have been compiled above.
If you have any more thoughts to add, please do so – you’d be very welcome!