Center for Informatics Research in Science and Scholarship

Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign

e-Research Roundtable

Next Meeting

  • Next ERRT TBA

About e-Research Roundtable

  • The e-Research Roundtable (ERRT) is a CIRSS research study group focusing on information problems in the curation and integration of digital research data and the development of research cyberinfrastructure more generally.

    The ERRT will meet on most Wednesdays when classes are in session. Meetings will be held from 12:30 to 2:00 p.m. in 242 LISB unless announced otherwise.

    The ERRT is open to researchers, faculty, staff, students and others who are interested in e-Research issues. It is a very informal exchange about participants' research activities, open problems, and advances in the field.

    Announcements and meeting reminders for the ERRT are distributed via a mail list. To subscribe to the list, please visit the ERRT Mail List Web page.

    If you have any questions regarding ERRT, please contact Janet Eke at 217-333-4701 or jeke@illinois.edu.

Future Meetings

  • Future ERRT TBA

Archived Meetings

  • May 2, 2012--Modeling User Searching Behaviors and Search Assistance Usage via Transaction Logs
    Session Leaders: William Mischo, Mary Schlembach, Joshua Bishoff (Grainger Engineering Library, UIUC); Elizabeth German (University of Houston)
    Archive: slides, audio
    Location: 341 LISB
    Description: In order to optimize search and discovery services, it is important to develop evidence-based models of user information seeking behaviors within distributed retrieval environments. While a large number of user information seeking studies have been performed, our knowledge of user searching patterns, particularly in online catalogs (OPACs), is incomplete and often contradictory. The University of Illinois at Urbana-Champaign Library has been collecting custom transaction log data from its main gateway interface and its underlying Easy Search (ES) federated search system since 2007. ES provides contextual and adaptive search assistance mechanisms that present the user with search modification and reformulation suggestions and perform additional target searches in the background. The Illinois team performed a detailed analysis of the project’s custom transaction logs collected over the Fall 2010 and Spring 2011 semesters, covering approximately 1.4 million user searches and over 1.5 million user target clickthroughs. The analysis revealed rich information on user search characteristics, search assistance usage, and user clickthrough actions, and it has several implications for web-scale discovery system design.
    Among the findings: users of the Illinois gateway enter an average of 4.33 terms per search query, much higher than in previous studies; 48.05% of the search sessions contain more than one search term or a combination of search terms and search assistance actions, also higher than in other studies; and while 66% of all searches originate as default keyword searches, known-item (specific title/author) searches exceed 51% of the search queries. Known-item searches are performed in almost 55% of the search sessions. In addition, the ES search assistance suggestions and custom links are well accepted by users: in 32.45% of all search sessions and 58% of the sessions with more than a single search query, users employed one or more search assistance operations. The logs also revealed that users frequently enter complete or partial journal titles and then click through to an A-to-Z e-journal list link, and that the exact phrase/title word links added to selected results displays are heavily used. Users click on the presented journal title link 21.41% of the time it is suggested, and in over 6.86% of all search sessions. In addition, the journal title search option tab accounts for over 12% of the searches within the gateway. The use of publisher e-book matches is also high, with clickthroughs into all the e-book content targets totaling 9.31% of all result target clicks and taking place in 11.36% of all search sessions.
    This study will be published shortly. The full citation and an advance copy are posted here, under Resources.
    Resources: William H. Mischo, Mary C. Schlembach, Josh Bishoff, Elizabeth German, "User Search Activities within an Academic Library Gateway: Implications for Webscale Discovery Systems", in Planning and Implementing Resource Discovery Tools in Academic Libraries, edited by Mary Popp and Diane Dallis, IGI Global, 2012 (in press), 22 manuscript pages.
  • April 25, 2012--Bandits and Browsing: Data Mining and Network Analysis for Library Collections
    Session Leaders: Harriett Green, English and Digital Humanities Librarian and Assistant Professor of Library Administration, University Library; Kirk Hess, Digital Humanities Specialist, University Library; Richard Hislop, Economics PhD candidate, University of Illinois at Urbana-Champaign
    Archive: slides, audio
    Location: 242 LISB
    Description: Our project proposes to conduct network analyses and data regressions on a data set of 22 million items indexed in the University of Illinois Library catalog. Based on the resulting network analyses, we will begin development of an enhanced recommender system for library catalogs and digital libraries that retrieves richer search results from a library collection search, based on network analysis of subject relevancy, circulation data of items, and usage data for items that share interrelated subjects. To build this test bed for the recommender system's algorithms and functionality, we are using the advanced computing resources of XSEDE to develop self-optimizing search algorithms and network analyses that would run against the bibliographic and catalog data in the University of Illinois library catalog and digital library indexes. We have created initial prototypes of search algorithms, topic analyses, and network analyses using a 40,000-item sample set from the English literature collection. A core algorithm that we initially developed identifies items that are infrequently used, yet have a high degree of topical relevance to other heavily used works in a collection. Another search algorithm we have developed identifies subject relevancy between items in a library collection through a multi-faceted approach incorporating subject heading correlations, user behavior in circulation transactions, and probability of circulation usage. Based on these and other analyses conducted on the sample data set, we will test the scalability of these search algorithms and network analyses by expanding them to run against the full 22-million-item set of University of Illinois Library catalog data on the Blacklight cluster in XSEDE.
  • April 18, 2012--Early Career Reuse of Quantitative Social Science Data
    Session Leader: Ixchel Faniel, Post-Doctoral Researcher at OCLC
    Archive:
    Location: 341 LISB
    Description: Large-scale data reuse over the long term depends in part on uptake from early career researchers who are still learning the norms, conventions, and practices of their discipline. Yet we know less about the data reuse practices of these novice data consumers. Such knowledge is important for data repositories. This is particularly true for data repositories seeking Trustworthy Repository Audit and Certification (TRAC) or the Data Seal of Approval, which must demonstrate an understanding of their designated communities. This talk discusses preliminary findings describing how novice social science researchers make sense of quantitative social science data.
    Bio: Ixchel joined OCLC Research in 2011 as a Post-Doctoral Researcher on Lynn Silipigni Connaway's team. In this position she is working on projects associated with the User Behavior Studies & Synthesis activities theme. Ixchel also has an interest in the Research Information Management activities theme. She is currently studying the reuse of university research in industry. She is also Principal Investigator on the DIPIR (Dissemination Information Packages for Information Reuse) project, an IMLS-funded study. Along with Elizabeth Yakel at the University of Michigan (Co-Principal Investigator) and collaborators at The Inter-university Consortium for Political and Social Research, the University of Michigan Museum of Zoology, and Open Context, Ixchel is studying data reuse in three academic disciplines. The team plans to identify how contextual information about the data that supports reuse can best be created and preserved. Prior to joining OCLC Research, Ixchel was an Assistant Professor at the School of Information, University of Michigan. She earned an MBA and a Ph.D. in Business Administration (Information Systems) from the Marshall School of Business, University of Southern California. She also has a B.S. in Computer Science from Tufts University and has worked at Andersen Consulting (now Accenture) and IBM.
  • April 11, 2012--Recap of Research Data Access and Preservation Summit 2012
    Session Leaders: Karen Wickett, Andrea Thomer
    Archive: slides, audio
    Location: 341 LISB
    Description: Wickett and Thomer will discuss the recent ASIS&T RDAP Summit held in New Orleans on March 22 and 23, including the major data management themes raised at the summit: interoperability, policy development, strategies for deploying services, data citation, and education.

  • March 7, 2012--"Introduction to the Europeana Data Model: Framework & Requirements"
    Session Leader: Carlo Meghini, Institute of Information Science & Technology, Italy
    Archive: audio, Elluminate recording
    Location: 341LISB
    Description: The Europeana Data Model (EDM) is the data framework within which all Europeana data is ingested, managed, and published. This talk describes the fundamental parts of EDM and the EDM requirements that data providers need to meet to successfully produce EDM-compatible data. The talk will culminate with a discussion of how EDM can accommodate collection-level representations.
    Bio: Carlo Meghini is a prime researcher at ISTI, working in the area of digital libraries and digital preservation. In the area of digital libraries, he has been involved in the DELOS Network of Excellence in Digital Libraries, contributing to the DELOS Reference Model for Digital Libraries; he participated in the FP6 Integrated Project BRICKS, aimed at developing a distributed Digital Library Management System, and in the DL.org coordination action; and he has been involved in the making of Europeana since 2007, through the EDLnet, Europeana version 1.0, Europeana version 2.0 and ASSETS Best Practice Networks. In the area of digital preservation, he has been involved in the CASPAR project, an FP6 Integrated Project aimed at developing an OAIS-based architecture for preservation; he has also taught the OAIS Reference Model at several events organized by the CASPAR Project in conjunction with PLANETS and the DPE Network of Excellence in Digital Preservation. For more information: http://www.nmis.isti.cnr.it/meghini/.
    Resources: Suggested readings: Europeana documentation, including the Primer, is available at: http://version1.europeana.eu/web/europeana-project/technicaldocuments/
    Paper: The Europeana Linked Open Data Pilot; Haslhofer, Bernhard and Antoine Isaac. Proc. Int'l Conf. on Dublin Core and Metadata Applications 2011. dcpapers.dublincore.org/index.php/pubs/article/view/3625
  • February 29, 2012--National Parks, Biological Collections & Natural History Museum Informatics
    Session Leader: Andrea Thomer, GSLIS MS student
    Archive: audio
    Location: 242 LISB
    Description: According to the 2008 IWGSC report, "Scientific Collections: Mission-Critical Infrastructure for Federal Science Agencies," object-based scientific collections contain billions of biological specimens that comprise a "vital research infrastructure" capable of supporting research in everything from climate science to agriculture to paleontology. However, this "infrastructure" is poorly funded, often poorly described, and dispersed throughout the country in collections at National Parks, in universities and in museums. Andrea spent last summer working at the Petrified Forest National Park (PEFO) as a Biological Science Technician, learning about this dispersed infrastructure first hand. In this session, she will give an overview of her work at PEFO and talk about some of her research interests in the description, meaning and content of natural history museum collections as they intersect with LIS.
    Resources: National Science and Technology Council, Committee on Science, Interagency Working Group on Scientific Collections. (2009). Scientific Collections: Mission-Critical Infrastructure for Federal Science Agencies: A Report of the Interagency Working Group on Scientific Collections. Washington, D.C. Retrieved from www.whitehouse.gov/sites/default/files/sci-collections-report-2009-rev2.pdf
  • February 22, 2012--An Introduction to the Campus Shared Computing Cluster
    Session Leaders: Brynnen Owen & Jennifer Anderson, GSLIS IT
    Archive: audio
    Location: 341 LISB
    Description:Brynnen and Jennifer will provide an overview of the University of Illinois campus cluster computing resources (https://campuscluster.illinois.edu/), including the basics of using parallel computing and how to start jobs on the cluster. Brynnen and Jennifer are also happy to provide additional information and answer questions on other campus computing resources.
    Resources:
  • February 15, 2012--Exploring Problems of Data Mobility, Sharing and Reuse
    Session Leader: Rob Procter, University of Manchester, UK (via Skype in room 242)
    Archive: slides, audio
    Location: This session will take place in 242 LISB.
    Description: The e-Research vision sees increased data re-use and sharing as key to future scientific advances. This talk explores some problems that may make achieving this difficult in practice. It draws on experiences in two related projects involving the creation and curation of an archive of digital mammograms and its use in training.
    Bio: Rob Procter is Professor and Director of the Manchester eResearch Centre (MeRC) at the University of Manchester, having previously been Director of Research at NCeSS. Before that, he was leader of the Social Informatics Cluster, a multi-disciplinary research group within the School of Informatics, University of Edinburgh. His research focuses on socio-technical issues in the design, implementation, evaluation and use of interactive computer systems, with a particular emphasis on ethnographic studies of work practices, computer-supported cooperative work and participatory design. http://www.merc.ac.uk/?q=rob
    Resources:
  • February 1, 2012--GSLIS workforce study requirements: the past, present and future
    Session Leader: Cathy Blake
    Archive: slides, audio
    Location: This session will take place in 242 LISB.
    Description: There was (perhaps) a time when the training that you received during your first degree was sufficient to sustain your entire career. The information age has drastically altered the time-frame for re-tooling, particularly in the field of library and information science. The goal of this e-research roundtable is to identify GSLIS infrastructure requirements that will (a) provide a resource for GSLIS students to align their interests with positions and curricula and (b) enable longitudinal analyses of workforce needs and issues. Achieving this goal will require a collective GSLIS effort that includes faculty, staff, and students. Dr. Blake will lead this working session by providing a framework that would enable us to leverage text mining for the initial activities. Faculty, staff, and students who have conducted workforce analyses are particularly welcome.
    Resources:
  • November 30, 2011--Mining Biomedical Multiple-Ontology Patterns
    Session Leader: Samir E. AbdelRahman (with Cathy Blake)
    Archive: audio
    Location: This session will take place in 242 LISB.
    Description: Ontologies provide a powerful way to organize information and understand the world in which we live. Despite their usefulness, articulating a complete and consistent ontology is difficult and time-consuming. Moreover, knowledge evolves, and the effort required to keep an ontology current and thus relevant is often underestimated. Our goal in this project is to infer new ontological concepts based on an existing ontology and full-text documents. In this presentation we focus on the Unified Medical Language System (UMLS), which maps biomedical concepts to surface-level features in text (words). For example, the kidney cancer concept in the UMLS includes the phrases ‘cancer of kidney’ and ‘kidney cancer’, which can be framed as the text transformation ‘X of Y’ → ‘Y X’. We also explore word patterns and transformations between parent and child concepts in the UMLS. For example, kidney cancer and breast cancer are both children of cancer and take the form <body part> cancer. We report frequent patterns in the UMLS and preliminary results on text.
    Resources:
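    The ‘X of Y’ → ‘Y X’ transformation and the <body part> cancer pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenters' method: the helper names (x_of_y_to_yx, has_x_of_y_variant, child_patterns) are hypothetical, and plain lowercase strings stand in for UMLS concept names and parent/child relations.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Matches phrases of the form "X of Y", e.g. "cancer of kidney".
# X is non-greedy, so the phrase is split at the first " of ".
OF_PATTERN = re.compile(r"^(?P<x>.+?) of (?P<y>.+)$")


def x_of_y_to_yx(phrase: str) -> Optional[str]:
    """Hypothetical helper: rewrite 'X of Y' as 'Y X'; None if there is no ' of '."""
    m = OF_PATTERN.match(phrase.strip().lower())
    if m is None:
        return None
    return f"{m.group('y')} {m.group('x')}"


def has_x_of_y_variant(synonyms: Iterable[str]) -> bool:
    """True if one synonym of a concept is the 'X of Y' form of another synonym."""
    names = {s.strip().lower() for s in synonyms}
    return any(x_of_y_to_yx(s) in names for s in names)


def child_patterns(parent: str, children: Iterable[str]) -> Counter:
    """Count child names of the form '<slot> parent', e.g. 'kidney cancer' under 'cancer'."""
    parent = parent.lower()
    counts = Counter()
    for child in children:
        if child.lower().endswith(" " + parent):
            counts["<slot> " + parent] += 1
    return counts
```

    For example, x_of_y_to_yx("cancer of kidney") returns "kidney cancer", and child_patterns("cancer", ["kidney cancer", "breast cancer", "leukemia"]) counts two instances of "<slot> cancer". Working on real UMLS data would also require tokenization and a policy for nested "of" phrases, which this sketch simply splits at the first " of ".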


  • November 9, 2011--IMLS DCC and the Europeana Data Model: Convergences and next steps
    Session Leaders: Carole Palmer, Katrina Fenlon, Allen Renear
    Archive: Audio
    Location: 242 LISB
    Description: IMLS DCC is collaborating with Europeana, a massive international digital cultural heritage aggregation. This session will be a conversation about next steps toward meeting the challenges to interoperability between these two aggregations, particularly addressing the Europeana Data Model and its potential for accommodating IMLS DCC collections.

  • October 19, 2011--A Cross-Disciplinary Typology of Topical Relevance Relationships and Its Implications 
    Session Leader: Xiaoli Huang, Assistant Professor, Business School, Sun Yat-sen University, China

    Archive: audio

    Location: This session will take place in 341 LISB.

    Description: This presentation reports on a cross-disciplinary inquiry into topicality and relevance, involving an in-depth literature analysis and an inductive development of a faceted typology (containing 227 fine-grained topical relevance relationships arrayed in three facets and 33 types of presentation relationships). This inquiry reveals a large variety of topical connections beyond topic matching (the common assumption of topical relevance in the field), takes a closer look at the structure of a topic, and induces a generic topic-oriented information architecture that is meaningful across topics and domain boundaries. The findings contribute to foundational work in information organization, metadata development, intellectual access/information retrieval, and knowledge discovery.

    Resources:


  • October 5, 2011 -- The Prototype Open Emblem Book Portal: Leveraging the Emblem Community’s Spine Metadata Schema
    Session Leaders: Tim Cole, Myung-Ja Han, Jordan Vannoy

    Archive: Audio

    Location: This session will take place in room 109 LIS.

    Description: In 2003 Stephen Rawles (Glasgow University Centre for Emblem Studies) outlined an approach for creating metadata records for digitized emblem books in a Web-published paper entitled, A Spine of Information Headings for Emblem-Related Electronic Resources [1]. As compared to many other classes of retrospectively digitized texts, digitized emblem books offer added challenges for description. A genre of European literature popular between 1530 and 1750, emblems unite three elements: a motto, a picture, and poetry. These three components create puzzles that carry metaphors and messages for readers. Individual books may contain only a handful of emblems or may contain more than 1,000 emblems. To support scholarship, emblems (as well as emblem books) need to be discoverable, retrievable and citable individually, further complicating issues of descriptive granularity. Rawles’s paper became the foundation for the Spine metadata XML schema [2] created by Thomas Stäcker of the Herzog August Bibliothek (Wolfenbüttel, Germany), with subsequent modifications and additions by Tim Cole and Myung-Ja Han. As part of an NEH/DFG-funded grant project [3] (Mara Wade, U.S. PI), Cole, Han, and Jordan Vannoy have created a functioning prototype of a new Open Emblem Book Portal [4]. The new design leverages unique features of the Spine schema and is intended to be responsive to the evolving needs of the Emblem Studies community. Scholars expect more of digital libraries today than in years past. For digitized special collections materials such as our digitized emblem books collection, this has required the UIUC Library to reexamine our digital content processing workflows and rethink how we provide access to such digitized special collections content. For this project, our workflows have become more in keeping with standard Semantic Web and Linked Data principles, and now make use of globally-scoped, persistent and precise identifiers for our digitized emblem resources.
    This roundtable will start off with a reprise of a 20-minute presentation given by Cole and Han at the recent triennial meeting of the Society for Emblem Studies in Glasgow. Cole, Han and Vannoy will then lead an in-depth discussion of the Spine schema and the design choices implemented in the Open Emblem Book Portal to date.

    Abstract Notes:

    [1] http://www.ces.arts.gla.ac.uk/html/spine.htm
    [2] http://diglib.hab.de/rules/schema/emblem/emblem-1-2.xsd
    [3] http://emblematica.grainger.illinois.edu/
    [4] http://emblematica.grainger.illinois.edu/OEBP/UI/SearchForm

    Resources:

    Digital Collections and Management of Knowledge: Renaissance Emblem Literature as a case study for the digitization of rare texts and images. 2004. Mara R. Wade (ed.), DigiCULT. Available online: http://www.digicult.info/downloads/dc_emblemsbook_lowres.pdf

    Iconclass, a multilingual classification system for cultural content: http://www.iconclass.org/ and http://www.iconclass.org/rkd/9/

    Sample metadata record in Spine and METS:

    Spine metadata record

    METS metadata record
  • September 28, 2011--The DLF-DCC Linked Data Prototype for the Digital Public Library of America
    Session Leader: Richard Urban

    Archive:

    Location: This session will take place in 242 LISB.

    Description: This ERRT session will review efforts to translate IMLS Digital Collections and Content (DCC) collection-level XML records into Linked Data.

    • Mapping from XML to RDF/XML syntaxes
    • Analyzing Collection-level metadata patterns using SIMILE Gadget
    • Reconciling values against Linked Data vocabularies, such as the LoC Thesaurus of Graphic Materials and Freebase Locations, using Google Refine
    • Providing different serialization formats using TALIS Morph

    While much of the session will be devoted to pragmatic how-to topics, we will conclude by relating these activities to ongoing research agendas on collection-level description, collection development, and the logical forms of metadata records.
  • September 14, 2011--No ERRT meeting this week
    ERRT members may be interested to attend the following GSLIS event:

    History Salon: The Database and its Discontents
    Noon-12:50pm, room 131 LIS

    In this salon, Bonnie Mak will share some thoughts about the curious slippage between facsimile and fact in the database. More information will be posted at: http://www.lis.illinois.edu/events/2011/09/14/history-salon.
    Archive:
    Description: Overlooked in the excitement about the possibility of “digitizing everything” is the day-to-day manual, and often menial, labour demanded of volunteers, student workers, devoted enthusiasts, scholars, and professionals. Theirs is important work, and supports the lofty promises to save the cultural heritage of the world. This paper sheds light on some of the very human efforts that underpin the production of digitally-encoded materials. Drawing upon Bruno Latour and Steven Woolgar’s work on the social construction of scientific fact, the paper scrutinizes the processes by which information is generated and marketed in the digital environment. An analysis of materials from the database, “Early English Books Online,” among others, will help lay bare the dynamics by which the status of transcriptions and facsimiles of texts and books shifts from interpretation to fact as they are recontextualized and remediated online. By investigating the production and circulation of digitally-encoded materials with this critical lens, the paper seeks to develop a richer understanding of information and knowledge in the twenty-first century.


  • September 7, 2011--"Meditations on the Logical Form of a Metadata Record"
    Session Leaders: Allen Renear, Richard Urban, Karen Wickett

    Archive: Audio

    Location: This session will take place in 109 LISB.

    Description: Open linked data and semantic technologies promise support for information integration and inferencing. But taking advantage of this support often requires that the information currently carried by ordinary "colloquial" metadata records be made explicit and available for computer processing. Given the fairly simple structure of metadata records, this looks easy to do. It turns out, though, that it is not at all easy. A number of very fundamental puzzles arise, some of them related to identifier elements, others to knowledge representation in general. Although related problems have been studied here at GSLIS for some time, the current systematic development is largely new: its first exposure was just a few weeks ago as a "Late Breaking" report at "Balisage: The Markup Conference" (Montreal). It is also very much a work in progress (with suspected flaws), so this is an invitation to participate in evolving this account of what metadata records really are, and how they do what they do.


    Resources:


  • August 31, 2011 -- ERRT Planning Session
    Session Leader: Carole Palmer

    Archive:

    Location: This session will take place in 242 LISB.

    Description: This ERRT meeting will be a planning session. Please bring your ideas for sessions that you would like to see on the schedule for this year. This includes new topics and ideas, as well as previously suggested sessions that have not yet made it on the schedule, and updates to previous sessions.



  • May 25, 2011 -- Linked Open Data for Libraries
    Session Leader: Richard Urban

    Archive: Audio (mp3)

    Location: This session will take place in 341 LISB.

    Description: Richard Urban will participate in the Linked Open Data for Libraries, Archives and Museums Summit in June. This ERRT will introduce attendees to the Linked Data movement and the W3C Linked Library Data Incubator. At the Summit, Urban will lead a discussion about LOD and current approaches to sharing metadata for cultural heritage collections through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).

    Resources:

    LOD-LAM Summit

    W3C Linked Library Data Incubator

    Tim Berners-Lee - Linked Data

    Haslhofer, B. & Schandl, B. (2010). Interweaving OAI-PMH data sources with the linked data cloud. International Journal of Metadata, Semantics and Ontologies 5(1), pp. 17-31.

  • May 11, 2011 -- Europeana Data Model
    Session Leaders: Katrina Fenlon and Peter Organisciak

    Archive: Audio (mp3)

    Description: With more than 10 million items, Europeana is Europe's largest aggregation of digital cultural heritage resources from libraries, archives, and museums. This session will explore the Europeana Data Model (EDM), a new proposal for structuring the data that Europeana will be ingesting, managing and publishing. EDM is designed to replace the Europeana Semantic Elements (ESE), the basic data model that Europeana began life with. Each of the different heritage sectors represented in Europeana uses different data standards, and ESE reduced these to the lowest common denominator. EDM reverses this reductive approach and is an attempt to transcend the respective information perspectives of the sectors represented in Europeana: the museums, archives, audiovisual collections and libraries. EDM is not built on any particular community standard but rather adopts an open, cross-domain Semantic Web-based framework that can accommodate the range and richness of particular community standards such as LIDO for museums, EAD for archives or METS for digital libraries.

    Resource: Europeana Data Model primer

  • May 4, 2011 -- What's in a name? Problems with Relationships in FRAD
    Session Leader: Liza Coburn

    Archive: Audio (mp3)

    Location: This session will take place in 341 LISB.

    Description: FRAD (Functional Requirements for Authority Data), a product of the FRANAR working group, is an extension of FRBR. The goal of a project undertaken last fall by University of Illinois Library Senior Coordinating Cataloger Qiang Jin and GSLIS student Liza Coburn has been to explain FRAD through entity-relationship diagramming, the way that Robert Maxwell did with FRBR (FRBR: A Guide for the Perplexed, 2009).

    Along the way they have discovered some problems with the FRAD model, and these problems will be the focus of this ERRT session. In a departure from the usual ERRT format, Coburn will present a brief introduction to FRAD (participants are encouraged to review the model documentation on their own ahead of time), the project, and the problems encountered, and will then open the floor for discussion to see what we can come up with.

    Resource:

    Functional Requirements for Authority Data (FRAD): A Conceptual Model. Final Report, December 2008. IFLA Working Group on Functional Requirements and Numbering of Authority Records (FRANAR).

  • April 27, 2011 -- Units, measures, and physical quantities in Wolfram|Alpha
    Session Leader: Michael Trott (Content manager for physics at Wolfram|Alpha)


    Description: All quantitative measurement values come with units (like meters, kilograms, pascals, volts, and so on). In addition to the modern SI, there are thousands of different units in use, sometimes for historical and sometimes for geographic reasons. Recognizing units and converting between them is very important for dimensional calculations, data statistics, and more. The session will discuss the structure of the Wolfram|Alpha unit system and statistics about the use of units.
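    The general idea of converting between units only when their dimensions agree can be sketched as follows. This is a generic illustration, not Wolfram|Alpha's unit system: the UNITS table, the (length, mass, time) exponent tuple, and the convert function are hypothetical, and the table holds only a handful of units whose SI factors are exact definitions.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Exponents of the base dimensions (length, mass, time); a deliberately tiny,
# hypothetical subset of the seven SI base dimensions.
Dimension = Tuple[int, int, int]


@dataclass(frozen=True)
class Unit:
    name: str
    to_si: float  # multiply a value in this unit by to_si to express it in SI
    dim: Dimension


# Illustrative table; every factor below is an exact definition.
UNITS: Dict[str, Unit] = {
    "meter": Unit("meter", 1.0, (1, 0, 0)),
    "foot": Unit("foot", 0.3048, (1, 0, 0)),
    "mile": Unit("mile", 1609.344, (1, 0, 0)),
    "kilogram": Unit("kilogram", 1.0, (0, 1, 0)),
    "pound": Unit("pound", 0.45359237, (0, 1, 0)),
    "second": Unit("second", 1.0, (0, 0, 1)),
    "hour": Unit("hour", 3600.0, (0, 0, 1)),
    # pascal = kg / (m * s^2)
    "pascal": Unit("pascal", 1.0, (-1, 1, -2)),
    "bar": Unit("bar", 1.0e5, (-1, 1, -2)),
}


def convert(value: float, src: str, dst: str) -> float:
    """Convert value from unit src to unit dst, refusing mismatched dimensions."""
    s, d = UNITS[src], UNITS[dst]
    if s.dim != d.dim:
        raise ValueError(f"cannot convert {src} to {dst}: dimensions differ")
    # Go through SI: value * (SI per src) / (SI per dst)
    return value * s.to_si / d.to_si
```

    For example, convert(2, "hour", "second") gives 7200.0, while convert(1, "meter", "second") raises ValueError because the dimension tuples differ. Because each unit is stored as a single multiplicative factor, affine scales such as degrees Celsius or Fahrenheit, which also need an offset, are outside this sketch.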

  • April 13, 2011 -- Enabling Long-Term Access to Born-Digital Materials on CD-ROMs: Migration, Emulation, and the Imperative to Pool Technical Knowledge
    Session Leader: Geoffrey Brown, Professor of Computer Science at the School of Informatics and Computing, Indiana University


    Description: For the past 20 years, CD-ROMs have been the primary media for distributing key economic, scientific, environmental, and societal data as well as educational and scholarly work. Indeed, tens of thousands of titles have been published, including thousands distributed by the United States and other governments. Yet no viable strategy has been developed to ensure that these materials will be accessible to future generations of scholars. In the short term, these materials are subject to physical degradation that will ultimately make them unreadable; in the long term, technological obsolescence will make their contents unusable.

    The diaries of H.R. Haldeman, Richard Nixon's chief of staff, were published in their entirety on CD-ROM, but only in abridged form on paper. References by Haldeman to Mark Felt, who was unveiled as the Watergate source, appear only on the CD-ROM version. This CD-ROM no longer operates in modern Windows environments, but can be accessed, with some effort, in an emulation environment. In other cases, the files on a CD-ROM can still be accessed, but may be in obsolete formats. Finally, many publications of government agencies are available only for local use in a few libraries.

    I will discuss two aspects of our work in digital preservation: the creation of a browsable networked archive of the approximately 5,000 CD-ROMs published by the United States Government Printing Office, and the development of emulation technologies to give future scholars ready access to materials such as the Haldeman diaries.

    The goals for this roundtable are to discuss the limits of the available technological solutions, the social implications of their implementation, and the legal constraints on deploying them.



    Resources:

    Kam Woods and Geoffrey Brown. Creating Virtual CD-ROM Collections

    Stewart Granger. "Emulation as a Digital Preservation Strategy".

    Copyright Law Section 108

  • March 30, 2011 -- The Digital Public Library of America Initiative: Considering Content and Scope
    Session Leader: Carole Palmer (Director of CIRSS)

    Archive: Audio (mp3), Slides

    Description: The Digital Public Library of America (DPLA) initiative began in December 2010 with support from the Alfred P. Sloan Foundation. This ERRT session will provide an overview of the initiative and of its first working meeting, held at Harvard on March 1 on content and scope issues. We will discuss themes that emerged from the meeting, questions about Europeana as a model for DPLA, and possible roles for our Digital Collections and Content project, including the inclusion of its aggregation in DPLA.

    Resources:

    DPLA Wiki: Please review the Content and Scope section, and the workshop links, in particular. Workshop participants are listed here.

    See also a recently released Concept Note, an outcome of the March 1st meeting.

    See the main DPLA website at the Berkman Center for Internet & Society at Harvard for additional information and context.

  • March 16, 2011 -- Progress Report: Revisiting the Dublin Core 1:1 Principle
    Session Leader: Richard Urban

    Archive: Audio (mp3), Slides

    Description: The Dublin Core 1:1 Principle exhorts metadata creators to create descriptions that describe one, and only one, resource. But how is it that metadata describes anything at all, let alone one and only one thing? This session will explore how traditional puzzles about description and reference help us understand violations of the 1:1 Principle.

    Resources:

    Miller, S. "The One-to-One Principle: Challenges in Current Practice." International Conference on Dublin Core and Metadata Applications, North America, Sep. 2010. Available at: http://dcpapers.dublincore.org/ojs/pubs/article/view/1043. Date accessed: 05 Mar. 2011.

    Ludlow, Peter, "Descriptions", The Stanford Encyclopedia of Philosophy (Spring 2011 Edition), Edward N. Zalta (ed.), forthcoming.

  • March 2, 2011 -- Disciplinary Culture and Interoperability: An Incompatible Mix?
    Session Leader: Carl Lagoze, Associate Professor of Information Science at Cornell University

    Archive: Slides, Audio (mp3)

    Location: This session will take place in 341 LISB.

    Description: A key enabler of cyberinfrastructure development is interoperability: the ability to discover and deploy functionality that leverages commonalities among the practices of scientists in diverse fields, thereby allowing data sharing and other collaborative activities among them. However, there is good evidence from the literature that individual disciplinary cultures are deeply embedded and shaped by a number of factors, including the nature of the research, the economic value of the research products, and in some cases dysfunctional, historically based path dependencies. Our own work examining the research practices and collaborative patterns of chemists and physicists has shown strong evidence of this. Designers and researchers of cyberinfrastructure are faced with two unpleasant alternatives. They can ignore these disciplinary idiosyncrasies and possibly create cyberinfrastructure that its target communities resist. Or they can accommodate these differences by creating lowest-common-denominator cyberinfrastructure that fails to provide enough functionality to really facilitate new scientific practices. These are some of the questions we face in the Data Conservancy project, which is funded by the National Science Foundation to research, prototype, and possibly develop new cyberinfrastructure for data curation. I certainly don't know the answers to these questions and look forward to a stimulating discussion on the best way to approach this problem.

    Background Material:
    • P.N. Edwards, S.J. Jackson, G.C. Bowker, and C.P. Knobel, Understanding Infrastructure: Dynamics, Tensions, and Design, National Science Foundation, 2007.
    • C.L. Palmer and M.H. Cragin, Scholarship and disciplinary practices, Annual Review of Information Science and Technology, vol. 42, 2008, pp. 163-212.
    • T. Velden and C. Lagoze, Communicating Chemistry, Nature Chemistry, vol. 1, 2009.
    • T. Velden, A.-ul Haque, and C. Lagoze, A new approach to analyzing patterns of collaboration in co-authorship networks: mesoscopic analysis and interpretation, Scientometrics, Apr. 2010.


  • February 16, 2011 -- Technology's positive impact on the cultural heritage of Native American tribes
    Session Leader: Biagio Arobba

    Archive: Audio (mp3)

    Description: At this session, Biagio Arobba will introduce his background in semantic middleware and Native American communities, discuss his heritage (and answer any questions), and explain his interest in social media, the Web, and mobile devices and why he believes they will help Native American communities with culture, language, and heritage preservation.

    Semantic middleware, originally developed for e-science, has the potential to be transformational for Native American communities. We all know (or assume) that Native American languages are disappearing. You might be surprised to learn that over half of the pre-colonial Native American languages in the United States are still spoken today, but that number is changing dramatically. Many Native American people are concerned about the disappearance of their spoken languages, and there is a desire for something to help people in Native American communities and local government organizations increase fluency among their peers.

    There are both problems and opportunities. For example, many places in the United States are resistant to multilingual education. Also, working with local government and tribal agencies can be a nightmare. On the other hand, tribes in the United States have far better access to digital media and the Internet than a community in the Amazon rain forest would. Additionally, Native American children in tribal communities, or in communities with relatively high Native American populations, are drawn to social media, gaming, and mobile devices. Most elders and young parents want today's generation to take part in the digital age.

    There is also a great deal of research, and many methods, for teaching major world languages, but many of these techniques aren't quite right for smaller minority-language communities. In recent years, a growing number of tools have been popping up across the Internet (possibly because more attention is being paid to minority languages, or simply because computers, best practices, and the Internet have reached the critical mass needed to make this possible). In his work, Mr. Arobba looks for ways to avoid reinventing the database for every application, to reduce time-to-deployment, and to make user interfaces easier for everyday users.

    Resources:

    Arobba, B., R.E. McGrath, J. Futrelle, and A.B. Craig, "A Community-Based Social Media Approach for Preserving Endangered Languages and Culture" In: "The Changing Dynamics of Scientific Collaborations" workshop at 44th Hawaii International Conference on System Sciences, January 3, 2011.

    Live and Tell
  • January 26, 2011 -- Report on IDCC 2010
    Session Leaders: Tiffany Chao, Liza Coburn, Simone Sacchi, Nic Weber, Laurence Cook, and Trevor Munoz

    Archive: Audio (mp3)

    Description: Student participants in IDCC 2010 will report on the conference, summarizing the workshops and other conference sessions they attended.



  • January 19, 2011 -- Briefings from the front: RDA testing at GSLIS
    Session Leaders: MJ Han and Kathryn La Barre

    Archive: Audio (mp3), Slides

    Description:
    MJ Han and Kathryn La Barre will discuss preliminary results of the recently concluded RDA test practicum and the experiences of the 3 instructors, 5 library faculty, and 8 students who participated in the test. We will pay particular attention to MJ's experience creating RDA/Dublin Core records.

    Resources: To learn more about the RDA test, please view the first slides of this presentation. The later slides compare the existing code (AACR2) with RDA and cover changes to MARC and preparation strategies.

    Judith Kuhagen, RDA: Resource Description and Access Essentials
    • The U.S. National Libraries RDA Test Plan
    • Critical differences between AACR2 and RDA
    • Changes to MARC21
    • How to best prepare yourself, your colleagues, and your library

    Resources:

    Download presentation with speaker notes (ppt 3 MB)

    RDA bibliography (doc)

  • December 1, 2010 -- The Impact of Massive Data on Astronomy
    Session Leader: Robert Brunner (Astronomy)

    Archive: Slides

    Description:
    As we tackle ever more difficult questions, Astronomy is evolving from a data-poor to a data-rich scientific discipline. In this presentation, I will discuss the questions we are trying to address, introduce the projects and data that are being (or soon will be) produced, and present some of the challenges and opportunities that we now face.

    Resources:

    The Post-Singularity Future Of Astronomy: Astronomy could be the first discipline in which the rate of discovery by machines outpaces humans' ability to interpret it

    Next-generation astronomy

    We regret that the recording of Robert's session was lost due to technical difficulties.


  • November 17, 2010 -- Working with Supplementary Materials (Data, Software, Scripts) for Dissertations and Theses
    Session Leader: Sarah Shreeves

    Archive: Audio (mp3)

    Description: Illinois has allowed electronic deposit of theses and dissertations since 2009 and began mandating such deposit in Fall 2010. We now allow deposit of supplementary materials alongside these ETDs, and all of this material will appear in IDEALS. I will discuss the range of materials we're seeing and how we are approaching some of the stewardship issues. We can also discuss more generally the successes of and obstacles to the ETD program.

    Resources:

    The IDEALS collection for theses and dissertations

    Illinois Graduate College Thesis Office Page

  • November 10, 2010 -- Europeana Semantic Elements: supporting cross-domain, European metadata exchange
    Session Leader: Katrina Fenlon

    Archive: Audio (mp3)

    Description: Europeana is Europe's multimedia online library/museum/archive: an ambitious aggregation of digital resources from all heritage sectors in all 27 European Union member states. This presentation will introduce the Europeana Semantic Elements (ESE) version 3.3, the Dublin Core-based metadata set underlying the portal. ESE supports cross-domain metadata provision to the current version of the Europeana aggregation. The presentation will also briefly introduce the Europeana Data Model, the Semantic Web-based data model intended to replace ESE as the standard for description and exchange in the next release of the Europeana portal.

    Resources:

    Europeana Semantic Elements Specification, Version 3.3, 19/07/2010.

    Metadata Mapping & Normalisation Guidelines for the Europeana Semantic Elements, Version 2.0, 19/07/2010.

  • October 20, 2010 -- Linked Data Issues
    Session Leader: Joe Futrelle

    Archive: Audio (mp3), Prep Notes

    Description: Joe has been planning this session on Linked Data Issues based upon questions and issues submitted by ERRT members. It promises to be a very interesting session.

  • October 13, 2010 -- Describing artifacts: A look at the concepts of CDWA
    Session Leader: Peter Organisciak

    Archive: Audio (mp3), Slides

    Description: The description of artwork and other material culture carries with it a unique set of challenges. CDWA (Categories for the Description of Works of Art), represented in XML with the CDWA Lite schema, is one framework for classifying such artifacts. We will look at the features of CDWA and discuss the principles that inform it. Finally, we may consider varying definitions of art itself and the implications of the act of classification.

    Resources:

    Baca, Murtha (Ed.). (2006). Cataloging Cultural Objects: A Guide to Describing Cultural Works and Their Images. Available as an e-book from http://www.library.illinois.edu/

    J. Paul Getty Trust. Categories for the Description of Works of Art. Available at http://www.getty.edu/research/conducting_research/standards/cdwa/index.html

  • October 6, 2010 -- Glimpses of future research practice: a musical study
    Session Leader: David De Roure (Professor of e-Research, Oxford e-Research Centre, Oxford University)

    Archive: Audio (mp3), Slides, Demo

    Description: Ten years ago, we saw a few early adopters of e-Science technology; now we see research accelerating through broader adoption and sharing of tools, techniques, and artifacts, both for 'big science' and for the 'long tail scientist'. Will this incremental trend continue, or are we seeing glimpses of a phase change ahead, where researchers harness these emerging digital capabilities to address research questions in ways that simply were not possible before? This talk will draw on examples in music information retrieval and linked data from the NEMA and SALAMI projects, together with glimpses of research from the myExperiment social website, to suggest that we are now moving into the next (and very exciting!) phase of research practice.

  • September 29, 2010 -- Dispatches From the Field (Part 2)
    Session Leaders: Liza Coburn, Aaron Collie, Tracy Popp, Lynn Yarmey

    Description: GSLIS MS and CAS students will be presenting summaries of their data curation internship work. Each presentation will be followed by a brief roundtable discussion.

  • September 22, 2010 -- I Think Therefore I Am Someone Else: Understanding the confusion of granularity with Continuant/Occurrent and related perspective shifts
    Session Leader: Jim Myers

    Resources:

    Galton, A., Mizoguchi, R.: "The water falls but the waterfall does not fall: New perspectives on objects, processes and events", Applied Ontology 4, 71-107 (2009).

    Grenon, P., Smith, B., "SNAP and SPAN: Towards Dynamic Spatial Ontology", Spatial Cognition & Computation: An Interdisciplinary Journal, Vol. 4, No. 1. (2004), pp. 69-104.

    Description: Over the past few years, there has been a broad effort to define common requirements for provenance, to outline real-world use cases, to define core models of provenance, and to assess the interoperability of existing systems. In these discussions, there has been recognition that there are a variety of levels of granularity and a variety of types of processes for which provenance is a critical enabler. Further, there has been recognition that many use cases of interest require integration of provenance information across these dimensions. To a large extent, the issues involved in such integration have been viewed as simple matters of aggregation, i.e., requiring concepts such as "collections" of artifacts and composite processes. However, the need for constructs such as agents (as in the Open Provenance Model) hints at deeper issues related to the concepts of identity, the distinction between continuant and occurrent (or endurant and perdurant, respectively), and versions and replicas. This work develops a set of concrete examples where such issues arise in provenance, discusses the core conceptual distinctions involved, and postulates a basic mechanism for extending provenance models to enable integration across granularities and process types, recognizing the OPM "agent" concept as a special case.

  • September 15, 2010 -- ERRT Planning Session
    Session Leaders: Carole Palmer, Kevin Trainor

    Description: Please bring your ideas about roundtable sessions that you would like included on the schedule. This includes completely new ideas, as well as previously suggested sessions that have not made it onto the schedule.

  • September 1, 2010 -- Dispatches From the Field (Part 1)
    Session Leaders: Naomi Bloch, Ana Lucic, Trevor Munoz, Dana Muvceski, Gina Reis, Karen Wickett

    Description: GSLIS MS, CAS, and PhD students will be presenting summaries of their data curation internship work and conference workshop participation. Each presentation will be followed by a brief roundtable discussion.

  • April 14, 2010 -- The Claim Framework (Part 2)
    Session Leader: Cathy Blake

    Resources: http://bibapp.org/

    Description: [see Part 1 description]

  • April 28, 2010 -- BibApp project
    Session Leader: Sarah Shreeves

    Resources: http://bibapp.org/

  • March 10, 2010 -- The Claim Framework (Part 1)
    Session Leader: Cathy Blake

    Resources: Beyond genes, proteins, and abstracts: Identifying scientific claims from full-text biomedical articles

    Description: Massive increases in electronically available text have spurred a variety of natural language processing methods to automatically identify relationships from text; however, existing annotated collections comprise only bioinformatics (gene-protein) or clinical informatics (treatment-disease) relationships. This paper introduces the Claim Framework, which reflects how authors across the biomedical spectrum communicate findings in empirical studies. The Framework captures different levels of evidence by differentiating between explicit and implicit claims, and by capturing under-specified claims such as correlations, comparisons, and observations. The results from 29 full-text articles show that authors report fewer than 7.84% of scientific claims in an abstract, revealing the urgent need for text mining systems to consider the full text of an article rather than just the abstract. The results also show that authors typically report explicit claims (77.12%) rather than observations (9.23%), correlations (5.39%), comparisons (5.11%), or implicit claims (2.7%). Informed by the initial manual annotations, we introduce an automated approach that uses syntax and semantics to identify explicit claims automatically, and we measure the degree to which each feature contributes to the overall precision and recall. Results show that a combination of semantics and syntax is required to achieve the best system performance.

  • February 24, 2010 -- Data is the network: link or die
    Session Leader: Joe Futrelle

    Resources: RDF FAQ; Linked Data; Science Commons; Audio (.mp3); Slides

    Description: In a world dominated by social networking and wireless communication, most scientific information remains stubbornly locked up in specialized databases, repositories, and domain-specific applications. New strategies are needed to free this information from the rigid containers, frameworks, and work processes in which it is born and, increasingly, dies. Can data be organized as an active, evolving, open network of heterogeneous concerns and affordances, free of the control of any single software agent or framework? Joe Futrelle will describe promising new opportunities for making data radically portable and worthy of long-term preservation and access, drawing on several projects in the "semantic grid," e-science, and digital preservation communities.

  • January 27, 2010 -- Joint Metadata Roundtable and E-Research Roundtable
    Session Leader: Oksana Zavalina; Kevin Trainor

    Resources: ERRT; MDRT

    Description: An informal discussion of our members' current research interests. This joint ERRT/MDRT planning meeting will take place on Wednesday, January 20, from 12:30 to 2:00 pm in LIS341 (ISRL Fishbowl) on the third floor of the GSLIS building.

  • October 28, 2009 -- Metadata for a web 2.0 software marketplace
    Session Leader: John Unsworth, Loretta Auvil
    Resources: Reading 1; Audio

    Description: The Mellon Foundation is interested in supporting the sharing of web services and academic software widgets, and would like SEASR (the NCSA Software Environment for Advancement of Scholarly Research) to be able to track whose web services, software widgets, etc. are being used, and by whom, so that a system of professional credit and/or exchange of value could be developed across the universities whose faculty and staff contribute to the system. This has a near-term practical possibility of implementation as part of Project Bamboo (http://projectbamboo.org/).

  • October 7, 2009 -- NIF Resource Registry and Ontology
    Session Leader: Anita Bandrowski (NIF, UCSD)
    Resources: Slides; audio; Reading 1; Reading 2; Reading 3

    Description: The Neuroscience Information Framework (NIF) has a resource registry of over 2,200 resources, including software tools, databases, atlases, services, teaching tools, and other things that we deemed "interesting to neuroscientists". We will discuss the main classes of metadata, including the data model and NIF's resource ontology, which was recently harmonized with the Biomedical Resource Ontology.

  • Sept 16, 2009 -- Introduction to the Neuroscience Information Framework (NIF)
    Session Leader: Anita Bandrowski (NIF, UCSD)
    Resources: Audio; Slides; NIF Web Site; NIF Federated Access Article

    Description: NIF is a dynamic inventory of web-based neuroscience resources, data, and tools accessible via any computer connected to the Internet. An initiative of the NIH Blueprint for Neuroscience Research, NIF advances neuroscience research by enabling discovery and access to public research data and tools worldwide through an open source, networked environment.

  • July 01, 2009 -- Using Pliny to Annotate Digital Resources
    Session Leader: Tim Cole (UIUC Library); Yan Wang (GSLIS student)
    Resources: Reading 1; Reading 2; Reading 3; Reading 4; Reading 5

    Description: For our first roundtable related to the new Mellon-funded Open Annotation Collaboration grant project, we will examine John Bradley's Pliny annotation tool. In particular, we will discuss how and to what extent Pliny can be used to perform some of the scholarly functions described in Renear, Allen H.; DeRose, Steve J.; Mylonas, Elli; van Dam, Andries (1999), "An Outline for a Functional Taxonomy of Annotation". Yan Wang and Tim Cole will lead the discussion, which will include demonstrations of Pliny.

  • June 10, 2009 -- Science and Sceptics: blogs, climate science and reproducible research.
    Session Leader: Dave Nichols
    Resources: Slides; Audio; Reading 1; Reading 2;

    Description: The scientific consensus on global warming is well known. Less widely known is the sceptical online community that attacks diverse aspects of this consensus. Irrespective of the validity of their criticisms of the science, their activities involve many interesting aspects of knowledge work and public policy, including: reproducibility of research, policies of academic journals, citizen science, freedom of information and scientific work practices.

  • May 27, 2009 -- What Defines a Data Community?
    Session Leader: Carole Palmer; Melissa Cragin
    Resources: Slides; Audio

  • May 6, 2009 -- Open Provenance Model
    Session Leader: Jim Myers; Joe Futrelle
    Resources: Slides; Audio; OPM Definition; Provenance Challenge Wiki; Whitepaper

    Description: The discussion will include a general overview of the technical scope of the Open Provenance Model, the international community and "Provenance Challenge" activities driving its development, and NCSA's provenance management technologies. While OPM has been driven primarily by scientific workflow interests, NCSA's interest is broader; the discussion will also include OPM's potential value in electronic notebooks/electronic records, community model validation and reference data development, 'active' curation, and long-term preservation.

  • April 29, 2009 -- SEASR Analytics via Zotero
    Session Leader: Loretta Auvil; Michael Welge
    Resources: Slides; Audio

  • April 15, 2009 -- NASA EOS Data Levels and Traditional Text Editing
    Session Leader: Allen Renear
    Resources: Slides

CIRSS
Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
501 E. Daniel Street, MC-493, Champaign, IL 61820-6211 USA
cirssinfo@cirss.lis.uiuc.edu | (217) 333-1980 | [fax] (217) 244-3302