Engaging with the user community

In the process of identifying “designated communities” for the EPIC project we ran through a format review of all textual material in DSpace@Cambridge. Here is what we found:

.doc (175)/.docx (1)
.bbl/.tex (1)
.odp (1)
.pdf (3338)
.ps (4)
.sxi (2)
.sxw (26)
.bbl/.latex/.sty/.tex (5)
.rtf (35)
.txt (28814)

The .txt files are almost exclusively licence files and so of no interest in the context of this project. We then ran a detailed analysis of the .doc/.docx and .pdf files and reviewed their content. Our aim was to select communities that a) deposited file types that are or might soon be at risk, b) are potentially available for interview, and c) cover a variety of disciplines. The list of communities we came up with was still very long, so we decided to concentrate on one format (.pdf) to keep the interview list manageable for this short project. We chose .pdf not only because it is the prevalent textual format in DSpace@Cambridge but also because we had recently run into problems with the format in another context, so we are interested to find out more.
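For readers who want to run a similar format review on their own collection, the counting step can be sketched roughly as follows. This is a minimal illustration, not the method we used: it assumes the files have been exported to a local directory with their original file names, and the function names `count_extensions` and `format_report` are made up for this example.

```python
from collections import Counter
from pathlib import Path


def count_extensions(root):
    """Count files under `root` by lower-cased file extension.

    Assumption: `root` is a local export of the repository content
    with original file names preserved.
    """
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


def format_report(counts):
    """Return lines such as '.pdf (3338)', sorted by extension."""
    return [f"{ext} ({n})" for ext, n in sorted(counts.items())]
```

Calling `format_report(count_extensions("export"))` on a hypothetical `export` directory would give one line per extension in the same style as the list above. Counting by extension only tells you what a file is called, not what it actually contains, so a proper review would follow up with a format identification tool.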

List of “designated communities” to be interviewed:
Department of Physics
Department of Materials Science and Metallurgy
Faculty of Classics
Judge Business School
Department of Pure Maths and Mathematical Statistics
Department of Modern and Medieval Languages
Department of Archaeology
Department of Plant Sciences
World Oral Literature Project
Department of Engineering
Department of Applied Maths and Theoretical Physics

It was then for me to identify individuals and schedule interviews. I decided to contact the person responsible for the deposits; in many cases this was the author, in others the Head of the Research Unit or Department. I also added a couple of support staff to the list to get a slightly different perspective. We considered conducting group interviews, since having more than one person thinking about complex issues would open up debate and might produce more detailed answers, particularly if the members had differing roles (an academic, a librarian, a computing officer and so on). We decided that the short duration of the project would not allow for this, but it is something to consider in future.

I was very pleased with the response rate to my interview requests – half the addressees responded positively without me even having to remind them. There must be some interest in preservation planning!

Barbara Bultmann


About epiccambridge

Project Manager of the JISC funded EPIC Project

3 Responses to Engaging with the user community

  1. Chris Rusbridge says:

    Barbara, this is interesting. I guess you’re treating the .cml files in the WWMM collection as “not text”, although in many ways they are (straight XML text files).

    However, I was mostly interested in your focus on .pdf as one of the “file types that are or might soon be at risk”, and also that you “had recently run into problems with the format in another context”. PDF has many faults, but I have only ever heard one person complaining about significant compatibility problems with PDF before. That person was at Cambridge, and I wasn’t at all sure he was right (in a preservation context, anyway).

    So, I wonder whether you could add, either in a comment or a subsequent post, what it is about PDF that you think makes it a format at risk (as opposed to a semantically opaque format for containing documents mostly oriented to printing rather than viewing on screen or processing)?

    • Chris

      Thanks for your comments. We are not suggesting that pdf is an “at risk” format but it is the most common textual format so it is being used for the interviews on significant properties.

      However we would like to standardise pdf ingests where possible on pdf/a and often encounter problems converting pdf to pdf/a so developing more detailed knowledge of the pdf format is a useful additional outcome for us.

      Dave Piper

  2. Hi Dave/Chris,

    We suffer from similar issues with pdf-only deposits. Conversion to pdf/a does not always succeed, which is frustrating to say the least. If only there were a foolproof batch conversion tool that could do this for us….!

    We are not comfortable with those files that we hold only as pdf and haven’t managed to convert to pdf/a. In many cases we get the Word doc originals from the depositor as well, so we can store an XML-based document format (docx) as our preservation format, which is something we are much happier with.

    We also have some old pdf deposits (dating back to the late 90s) where we need to start carrying out some migration because they are not working as well as they should (internal links between documents are failing, along with other issues that I can’t remember off the top of my head!), so this is something we need to address in our migration strategy.

