Friday, 11 October 2013

Metadata Update #17 Metadata and MOOCs

MOOCs are a hot topic in higher education today.  A MOOC is a Massive Open Online Course.  I have been trying out various MOOCs for almost exactly two years, ever since I was first invited to attend one in the fall of 2011: a course on the Coursera platform offered by Professor Ng at Stanford University.  Since then I’ve sampled about a dozen different courses, completely finished five and become a Coursera Community Teaching Assistant for one course.

The media has a lot to say about MOOCs and here is a typical example:

So, what do MOOCs have to do with metadata and why are they the topic of one of my updates?  I’m taking an amazing class on Metadata via Coursera, offered by Jeffrey Pomerantz of the library school at UNC Chapel Hill.  The course starts week 7 next week so it’s nearly over.  But I recommend keeping an eye out for it on Coursera if you are interested in Metadata.

But why am I talking about this class in my blog post?  I think that it is worth talking about because this course has been designed to be specifically NOT about library metadata and specifically NOT for library workers.  It is for web designers, engineers, database managers and programmers.  Despite the purpose of the class, I think that it is great for library metadata people.  What it has done is expose me to the very broad context of metadata and show me how many of the very basic principles libraries have been applying for years are also used in other contexts such as web metadata, structured data, linked data, etc. (essentially what could be called the semantic web).

Perhaps when the course is finished, I will write another post about it.  At this point, I think that one of the most interesting things I have been exposed to is how various metadata schemas can be combined and reused in a diversity of ways to create data on the web that is highly discoverable.  For example, earlier this week I was looking at a music website which combined aspects of Dublin Core, the Web Ontology Language (OWL, yes the letters are in the wrong order) and LC genre terms to create very powerful and flexible searchability.
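For anyone who likes to see this sort of thing written down, here is a minimal sketch in Python of what mixing vocabularies can look like.  It is not how the music website itself was built: the album, its URLs, the artist and the genre identifier are all made up for illustration, and I am assuming the usual namespace addresses for Dublin Core terms, OWL and the LC genre/form terms.  Each statement is a simple (subject, property, value) triple, and each property is borrowed from whichever schema suits the job.

```python
# Namespace prefixes (assumed standard addresses for each vocabulary).
DCTERMS = "http://purl.org/dc/terms/"
OWL = "http://www.w3.org/2002/07/owl#"
LC_GENRE = "http://id.loc.gov/authorities/genreForms/"

# Everything below is hypothetical example data, not a real record.
ALBUM = "http://example.org/album/1"
GENRE = LC_GENRE + "gf0000000"  # placeholder identifier, not a real LC term

TRIPLES = [
    (ALBUM, DCTERMS + "title", "Example Album"),
    (ALBUM, DCTERMS + "creator", "Example Artist"),
    (ALBUM, DCTERMS + "type", GENRE),
    (ALBUM, OWL + "sameAs", "http://example.com/other-site/album/1"),
]


def find_subjects(triples, prop, value):
    """Return every subject that has the given property/value pair."""
    return [s for s, p, v in triples if p == prop and v == value]


# A search by genre term finds the album even though the title and
# creator were described with a different schema.
matches = find_subjects(TRIPLES, DCTERMS + "type", GENRE)
```

The point of the sketch is only that one description can draw on several schemas at once, and that a search on any one of them still leads back to the same resource.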

So, not only do I see decades-old library-related concepts in the semantic web topics that we discuss, I find that the new things I am learning about how metadata can be used in different contexts are also helping me to think about my work differently and showing me opportunities that I don’t think I would have recognized otherwise.  In addition, the nature of the MOOC brings people from all over the world and from various disciplines together.  I have found it interesting to read what others have to say about metadata and the various related technologies.

Tuesday, 3 September 2013

Metadata Update #16 - Metadata in the popular media

There hasn't been much new in the library world of metadata to report on or discuss over the summer.  I think that most of us have been busy learning, updating and applying much of what I have discussed already.  Of course, there was an update to RDA early in July but in the grand scheme of all of the change that has been happening, things have been relatively quiet.

However, it appears that since late spring the popular media, particularly in the U.S., has become quite interested in the topic of metadata.  I know that I have heard stories on CBC radio more than once where metadata has been the topic of discussion.  One of the big questions that comes up is what metadata is.  So, it is a term that has worked its way into the vocabulary of many North Americans even if most are still not entirely clear as to what it means.  I think that a summary of how the media is describing it is "it is not actually our phone conversations themselves but information about the phone calls such as what numbers they were placed to, where they originated from, the time of day", etc.  That's not a bad start.  It's a pretty narrow understanding but definitely a good start.

Here is a little story from NPR where Larry Abramson discusses metadata in relation to both cell phone calls and his own Gmail account:

If you were able to do my mini-MOOC, you will likely notice the similarity between the sorts of things that Alistair Croll was talking about in terms of big data or linked data and the sort of mapping of relationships that Cesar Hidalgo was able to extract from Abramson's account.  While Croll was talking on a bigger scale such as "red-lining" neighbourhoods, this story deals with metadata at the personal level.

While I think that there is no question that the topic of metadata has risen to the surface lately, even after all of the media coverage a general understanding of the full range of metadata that exists and how it is used is still relatively thin.  This is because the stories are really only about how certain types of metadata can be used to track the activities of individuals - whether that be for intelligence or marketing purposes.  For the most part, the types of metadata used either for military intelligence or for marketing (e.g. Gmail) are either generated automatically as messages are processed through equipment such as cell phones/cell towers or are literally created from words and phrases harvested from email messages.  While a lot can be learned from this type of metadata when a large quantity of it exists for analysis, I argue that it is still crude metadata relative to the type that we are accustomed to creating and using on a daily basis.
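To give a rough sense of how much can be read from this kind of automatically generated metadata, here is a minimal sketch in Python.  The call records are entirely made up, and the function name is my own; this is not the method used in the NPR story.  Nothing about the content of a call is stored, only who called whom and at what hour, and yet simply counting the pairs already starts to map out relationships.

```python
from collections import Counter

# Hypothetical call records: (caller, callee, hour_of_day).
# No conversation content is recorded, only metadata about each call.
CALLS = [
    ("alice", "bob", 9),
    ("alice", "bob", 21),
    ("bob", "alice", 13),
    ("alice", "carol", 10),
]


def contact_strength(calls):
    """Count calls between each pair of people, ignoring direction."""
    pairs = (tuple(sorted((caller, callee))) for caller, callee, _hour in calls)
    return Counter(pairs)


strength = contact_strength(CALLS)
# The most frequent pair is the strongest apparent relationship.
closest_pair, call_count = strength.most_common(1)[0]
```

With thousands of records rather than four, the same counting exercise becomes the kind of relationship map described in the story, which is also why I call it crude: it records that a connection exists but says nothing about what the connection is.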

Friday, 31 May 2013

Metadata Update #15 - Is MARC really dead or dying?

Today I watched a video of a memorial for MARC.
It brought to mind the question of whether or not MARC is really dead or dying.  Or, is AACR2 dead or dying?


Many people would say yes.  Many people would say that both have been dead for a long time and they just haven’t fallen down yet.  This point of view has merit because both MARC and AACR2 had become technologically irrelevant in many ways starting in the 1980s. 


There are others who say that MARC and AACR2 will never die because there are millions of library catalogue records all over the world that have been created using these two standards.  Many of these records are owned and used by very small libraries in small organizations.  Some of the records are likely still on catalogue cards.  This point of view has merit too.  It’s very unlikely that every library is going to have the resources for, or the interest in, moving away from the older standards.


From my own point of view, I think that it’s important to remember that MARC and AACR2 are not people.  They may have a beginning or birth but they don’t necessarily die.  They don’t have feelings and they can’t be heroes or villains.  We don’t have to hold them up and venerate them as saints and we don’t have to disparage them as monsters either.  Before I cancelled my AUTOCAT subscription, I was getting the impression that there are some people who are confused on that point.  MARC and AACR2 are just tools that have been created in the library world in order to help us store metadata which facilitates, primarily, the discovery and retrieval of resources.  I agree that the library has so many more tools and options available to it today than it did in 1971 and sticking strictly to standards from that era is limiting.  But, anyone who has worked with both MARC and other metadata standards has to admit that for all its limitations, MARC is an incredibly robust and mature standard.  The library world has learned a lot from MARC and this is knowledge that will be taken forward into the future.  But has MARC been thrown under the bus by newer technologies?  I think not.  At least not completely.


Let me draw a parallel between MARC, which is based in late 1960s technology, and COBOL, which was “born” about a decade earlier.  Through the 1960s and 70s, COBOL was a computer programming language at the heart of the incubating computer revolution.  Computer programmers during this time, for the most part, worked in or at least could read COBOL programming.  By the 80s and 90s COBOL was still taught to some computer science students but was decreasing in popularity.  By the late 1990s, COBOL had gained notoriety because of the number of business and military programs which used it and also the fact that it had been applied in ways that made programs vulnerable to the Y2K bug.  So, after all of that Y2K bug excitement, did COBOL get thrown in the ground and buried?  No, interestingly it did not.  In 2002 a new version of COBOL was developed, bringing the language into the object-oriented programming realm.  Thus, COBOL has never actually died although it is not one of the big players in the world of programming as it once was.  It’s been through four major revisions along with other minor changes along the way and it still quietly functions in the background in many large government and business applications.


I think that MARC is on a similar trajectory.  MARC has been revised 21 times in the last 40 or so years.  It has also been adapted to an XML environment.  MARC has been king of the hill for a long time and it is on the downward journey.  I think that it’s getting next to impossible to deny that.  However, MARC has been the backbone of library metadata and not only has it shaped our thinking about metadata, it has challenged us and taught us things.  I think that this is exactly why MARC has become so robust over the decades.  I think that just like COBOL, MARC will run quietly in the background for many years to come.  I also think that those who are now aspiring to work in the area of library metadata would do well to learn and understand MARC rather than bury it, and as the person in the video said, not speak ill of the dead.  Yes, we will move our metadata into new environments such as linked data but MARC will always be with us in one form or another – even if it is just part of the way that we think about metadata elements.  Realistically, we will speak well of MARC and we will speak ill of MARC.


As for AACR2, this is an interesting case.  If AACR2 and ISBD had been followed precisely over the years, programs could be written to convert our bibliographic records to RDA and/or any new descriptive standards that emerge in the future.  For the most part, these standards have been applied well but not perfectly or consistently, so I see any attempt to automatically crosswalk and convert AACR2/ISBD metadata to RDA as likely to be more trouble than it is worth.  That last point is key:  Is it worth it?  Anyone who has done any crosswalking of metadata, or even worked with record sets for that matter, knows that there is always going to be clean-up and that, with effort, conversion of standards-based metadata is generally possible.  However, for the descriptive elements in question, would all of that work be worth it?  I really doubt it.  With the new standards such as RDA we can make better descriptive metadata but AACR2 metadata is still basically functional.  To what degree will “ports” or “ill” help a user discover and access a resource?  While not irrelevant, much of the older AACR2 description isn’t absolutely critical.  And changes such as eliminating the rule of three can’t be “updated” through an automated process, since an RDA conversion would require having the item in hand and using it to add the RDA access points which are absent in the AACR2 record.  So, bottom line with AACR2?  We will have AACR2 formatted metadata for a long time even if our libraries have opted for creating new metadata using RDA or another descriptive standard.  Just as the SLIS student should learn MARC, so too should that student be learning AACR2.  Personally, I think that the depth and breadth of AACR2 coverage needs to be dramatically reduced, so as to make room in the program for learning other standards, but not eliminated entirely.
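To show what the easy part of such a conversion might look like, here is a minimal sketch in Python that spells out a few AACR2 abbreviations like “ports.” and “ill.” in a physical description.  The function name and the short abbreviation table are my own illustration, not a complete crosswalk or anyone's production tool, and the sketch assumes that the abbreviations were entered consistently, which is exactly the assumption that tends to fail in real record sets.

```python
import re

# A handful of AACR2 abbreviations and their spelled-out RDA forms.
# Deliberately incomplete: a real crosswalk would need a much longer table.
ABBREVIATIONS = {
    "p.": "pages",
    "ill.": "illustrations",
    "ports.": "portraits",
    "facsims.": "facsimiles",
    "col.": "color",
}

# Longest abbreviations first, and never match in the middle of a word.
_PATTERN = re.compile(
    r"(?<![A-Za-z])("
    + "|".join(re.escape(a) for a in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")"
)


def expand_abbreviations(description):
    """Spell out known AACR2 abbreviations in a physical description."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], description)


converted = expand_abbreviations("xii, 240 p. : ill., ports. ; 24 cm.")
```

This handles the tidy case only.  A record with “illus.” or “port” typed without the full stop passes through unchanged, and nothing in a script like this can supply the access points that the rule of three left out in the first place.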


Is MARC dead or dying?  No, not really.  Maybe MARC is going into an active retirement.

Wednesday, 8 May 2013

Metadata Update #14 - My Mini MOOC

So lately I've been busy basically taking one Coursera MOOC after the other.  I'm learning a lot and am amazed at the quality of the classes I am getting for free.

So, I put together my own little mini MOOC for folks interested in where the big picture of data and metadata might be going.  There are no assignments, tests or discussion forums.  Instead, I have put together a collection of videos that you can watch (maybe one a day or one every couple of days) that discuss some of the thinking and technologies that are themes in discussions about where metadata and discovery in libraries are likely going.  I've organized the list from the most wide ranging and general to the most specific and library-related.

I attended the first three sessions live, and I've selected these videos because they are the best of their type that I have seen yet.

For the last 2 BIBFRAME videos, I didn't attend these particular sessions but I attended similar sessions at ALA Midwinter.  I apologize if the last videos are a little drier than the previous ones.  This was the best that I could find.

If you are interested in more information on this topic or want to start a discussion on any of these topics, let me know and I'll see what I can find or figure out.

Videos on Big Data, Shared Data, Linked Data and BIBFRAME

1.  Alistair Croll’s talk on Big Data from ALA Midwinter (I attended this one in person – it’s long but easy to watch)

2.  Shared Data and the Future of Libraries: (Mostly big data discussion)

3.  Linked Data:  What it means for the future of libraries

4.  Bibliographic Framework Initiative (BIBFRAME)

Thursday, 14 February 2013

Metadata Update #13 Identifiers

So with all of the excitement of implementing RDA, holidays, lots of snow and going to ALA Midwinter, the Monday Morning blog is getting a little behind!  Better late than never.

Today's update is on the role of identifiers in our work.  Here's a nice little article from a Scientific American blogger that briefly discusses why identifiers in general are useful:

We already know how handy ISBNs and ISSNs are.  In an electronic environment where websites and publishers change from time to time, the introduction of the DOI has been a significant advance.  I wish that a field for the DOI would be introduced into MARC as a core element (where a DOI for an electronic document exists) and that it would be used consistently. 
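One small reason identifiers like these are so handy is that the last character of an ISBN or ISSN is a check digit, so a mistyped number can often be caught by arithmetic alone.  Here is a minimal sketch in Python, assuming the standard check digit rules (alternating weights of 1 and 3 for a 13-digit ISBN, weights 8 down to 2 and modulus 11 for an ISSN).  The function names are my own and the sample numbers are widely published examples, not records from our catalogue.

```python
def is_valid_isbn13(isbn):
    """Check a 13-digit ISBN: weights alternate 1, 3 and the sum must end in 0."""
    digits = [int(c) for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def is_valid_issn(issn):
    """Check an ISSN: weights 8 down to 2, modulus 11, 'X' stands for 10."""
    chars = [c for c in issn.upper() if c.isdigit() or c == "X"]
    if len(chars) != 8 or "X" in chars[:7]:
        return False
    total = sum(int(c) * w for c, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return chars[7] == expected


isbn_ok = is_valid_isbn13("978-0-306-40615-7")
issn_ok = is_valid_issn("0378-5955")
```

A check like this only tells you that a number is well formed.  It says nothing about whether the identifier actually points to the resource in hand, which is where consistent cataloguing still matters.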

Another type of identifier I wanted to talk about is identifiers for persons, places, organizations, meetings, etc.  In traditional terms, we think of these as authorized headings.  I was interested to hear at ALA about the various ways people are using the VIAF (Virtual International Authority File).  Even Wikipedia is using it!  I think that we still have a way to go in terms of developing an internationally recognized and used authority file but I thought that it was a fascinating development to hear that authority files are being used on the web, outside of MARC and beyond the traditional library context.  Finding information got just that much easier....  I’m all for that.

One last thing that I found out at ALA.... people in private industry are headhunting metadata librarians.  Yikes, if I were an American and could just pack up and move to a major U.S. city, I could be working for a large multi-national right now.  That was a bit of a shock.  It got me thinking....  There is value in standards-based metadata.  People being able to find things means money to big corporations and this is becoming increasingly true as more people shop online or do other business in a virtual context.  That is certainly something to think about.