Next Meeting
- May 4, 2012--Metrics for Openness
Session Leader: David Nichols, Department of Computer Science, University of Waikato, New Zealand.
Time: 4:00 - 5:00 p.m.
Location: 126 LISB
Archived:
Description: Metrics in information science have been largely based around publications and citations. The altmetrics proposal has highlighted that citations alone are inadequate for a holistic description of the impact of scholarly communication. This talk will present some further metrics to characterize research publications emphasizing open access and open science.
Bio: David Nichols is a faculty member in the Department of Computer Science at the University of Waikato, New Zealand. His research interests include digital libraries, usability and open source software. He co-authored (with Ian Witten and David Bainbridge) the textbook How to Build a Digital Library (2010, Second Edition) and is a member of the research group that develops Greenstone, a suite of software for building and distributing digital library collections.
About the CIRSS Seminar Series
The CIRSS Seminar Series will meet on most Friday afternoons when classes are in
session. Meetings will be held from 4:00 - 5:00 in 126 LISB unless announced otherwise.
The aim of the CIRSS Seminar Series is to provide a relaxed, no-prep venue
for sharing our current research by presenting reruns of our recent conference presentations. All CIRSS faculty
and student affiliates are welcome to present, and session attendance
is open to the entire campus community.
Announcements and meeting reminders for the CIRSS Seminar Series will be distributed widely via the regular GSLIS
mail lists for faculty, students, and staff.
If you have any questions regarding the CIRSS Seminar Series, or,
if you would like to volunteer to lead a future session, please contact Janet Eke at 217-333-4701 or
jeke@illinois.edu.
Future Meetings
- Future CIRSS Seminar Series TBA
Archived Meetings
- April 20, 2012--Probabilistic identification of author-inventors across PubMed and USPTO
Session Leader: Vetle Torvik
Time: 4:00 - 5:00 p.m.
Location: 126 LISB
Archived: audio
Description: Large-scale studies of named entities like people, organizations, genes, or drugs can suffer from severe bias introduced by name ambiguity. The assumption that a name uniquely identifies an entity is often made because disambiguation is time-consuming and error-prone when done manually, and simple computational approaches fail to capture the complexity of an identity that can also change over time. In an effort to enable unambiguous studies of biomedical scientists, their collaborative networks, and the flow of knowledge at the intersection of science and technology, this talk will focus on a project aimed at identifying the individuals who both publish and patent across two large bibliographic databases: PubMed and USPTO. At the heart of our approach is a multi-dimensional model that combines many explicit and implicit dimensions of similarity between the publication profile of an author and the patenting profile of an inventor in order to estimate the probability that the two profiles refer to the same individual. Our results show that, even though the overlap among the authors and inventors is relatively low, this approach can capture the great majority of the actual author-inventors with high precision.
- April 13, 2012--Knee-Deep in the Data: Practical Problems in Applying the OAIS Reference Model to the Preservation of Computer Games
Session Leader: Jerome McDonough
Time: 4:00 - 5:00 p.m.
Location: 126 LISB
Archived: audio, slides
Description: The Reference Model for an Open Archival Information System has been extraordinarily influential within the digital preservation community. The Preserving Virtual Worlds project explored the application of the OAIS Reference Model for the preservation of computer games, videogames and electronic literature within a research library setting. This paper identifies practical problems in determining the appropriate range of representation and context information needed to preserve computer games and discusses possible solutions to those problems.
Best paper award winner at the 2012 45th Hawaii International Conference on System Sciences (HICSS), held in Maui, January 2012.
- April 6, 2012--Characterizing Authorship Style Using Linguistic Features
Session Leader: Ana Lucic
Time: 4:00 - 5:00 p.m.
Location: 126 LISB
Archived: recording and slides from this presentation will be available after the Digital Humanities conference in late July.
Description: Of the five categories of features generally used in authorship attribution studies (lexical, character, syntactic, semantic, and application-specific; Stamatatos, 2009), it is the lexical and character features that have been tested most extensively. Syntactic, semantic, and application-specific features, also called high-level features, have been tested with far less consistency. The process of extraction of high-level features is more complex, usually depends on the availability of a parser, and generally is not as reliable as the extraction of surface-level features. High-level features, however, provide a more complex view of the text and thus have the potential to be a stronger marker of difference between different authorial styles than surface-level features. In this study, we explore the potential of syntactic dependencies which hold between two words in a sentence to separate writing styles in an authorship attribution study which uses a collection of movie reviews downloaded from the film database service imdb.com in 2009 (Seroussi et al., 2010). Rather than focusing on syntactic dependencies on the level of the entire text of the review, we focus on personal names and on syntactic dependencies which occur immediately before and immediately after personal names. The references to personal names and the grammatical structures that govern these references thus become the key feature for this analysis. The experiment conducted using this feature revealed its high variability and a potential to be a strong marker of difference between authorial styles.
- March 16, 2012--Building Topic Models in a Federated Digital Library Through Selective Document Exclusion
Session Leader: Miles Efron
Time: 4:00 - 5:00 p.m.
Location: 126 LISB
Archived: audio, slides
Description: Building topic models in federated digital collections presents numerous challenges due to metadata inconsistencies. The quality of topical metadata is difficult to ascertain and is interspersed with often irrelevant administrative metadata. In this study, we propose a way to improve topic modeling in large collections by identifying documents that convey only weak topical information. These documents are ignored when training topic models. Their topical associations are instead inferred after model training. A method is outlined for identifying weakly topical documents by defining runs of similar documents in a collection. In preliminary evaluation using a corpus from the Institute of Museum and Library Services Digital Collections and Content aggregation, results show an increase in coherence among words in topics. In showing this, we demonstrate that it may be beneficial to induce topic models using less, higher-quality data.
Originally presented at ASIST 2011 Annual Meeting (best paper award winner), with co-authors Peter Organisciak and Katrina Fenlon
New Orleans, LA, October 11, 2011
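The abstract above mentions identifying weakly topical documents by defining runs of similar documents. The abstract does not specify how runs are defined, so the following is only a minimal sketch under stated assumptions: the similarity measure (Jaccard over whitespace tokens), the threshold, the minimum run length, and the function names are all hypothetical choices made for illustration, not the authors' method.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similar_runs(records, threshold=0.8, min_length=3):
    """Return (start, end) index pairs (end exclusive) for runs of consecutive
    records whose neighbouring token sets have Jaccard similarity >= threshold.

    Assumption: records arrive in collection order, so near-duplicate
    administrative metadata tends to appear as adjacent records.
    """
    token_sets = [set(r.lower().split()) for r in records]
    runs = []
    start = 0
    for i in range(1, len(token_sets) + 1):
        at_end = i == len(token_sets)
        # A run ends at the end of the list or when similarity drops.
        if at_end or jaccard(token_sets[i - 1], token_sets[i]) < threshold:
            if i - start >= min_length:
                runs.append((start, i))
            start = i
    return runs


# Made-up metadata records, for illustration only.
records = [
    "civil war letters from illinois soldiers",
    "scanned image page 1",
    "scanned image page 2",
    "scanned image page 3",
    "railroad maps of the midwest",
]

print(similar_runs(records, threshold=0.5))
```

Records inside a detected run could then be held out of topic model training and assigned topics afterwards, as the abstract describes.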
- March 9, 2012--Special two-hour, two part Seminar: Europeana Researchers Carlo Meghini and Antoine Isaac Part 1: A Model for Digital Libraries and its Translation to RDF
Session Leader: Carlo Meghini
Time: 3:00pm
Location: 126 LISB
Archive: audio, slides
Description: With the advent of the Web, the traditional concept of library has undergone a profound change: from a collection of physical information resources (mostly books) to a collection of digital resources. Additionally, the notion of digital resource includes not only texts in digital form, but in general any kind of multimedia resources. In a traditional library, physical information resources are managed through well-understood manual procedures, whereas in a digital library digital resources are organized according to a data model, discovered through a query language and managed in a highly automated way. In this paper, we present a data model and query language for digital libraries supporting identification, structuring, metadata support, re-use and discovery of digital resources. The model that we propose is inspired by the Web and it is formalized as a first-order theory, certain models of which correspond to the notion of digital library. We demonstrate the feasibility of the model and its suitability for practical applications by providing a full translation of the model to RDF and of the query language to SPARQL.
Bio: Carlo Meghini is a prime researcher at ISTI, working in the area of digital libraries and digital preservation. In the area of digital libraries, he has been involved in the DELOS Network of Excellence in Digital Libraries, contributing to the DELOS Reference Model for Digital Libraries; he participated in the FP6 Integrated Project BRICKS, aiming at developing a distributed Digital Library Management System, in the DL.org coordination action, and has been involved in the making of Europeana since 2007, through the EDLnet, Europeana version 1.0, Europeana version 2.0 and ASSETS Best Practice Networks. In the area of digital preservation, he has been involved in the CASPAR project, an FP6 Integrated Project aiming at developing an OAIS-based architecture for preservation; he has also taught the OAIS Reference Model in several events organized by the CASPAR Project in conjunction with PLANETS and DPE Network of Excellence in Digital Preservation. For more information: http://www.nmis.isti.cnr.it/meghini/
Part 2: Europeana and Linked Open Data
Session Leader: Antoine Isaac
Time: 3:00 - 5:00 p.m.
Location: 126 LISB
Archived:
Description: Europeana recently launched a small animation advertising Linked Open Data to its network of partner cultural institutions: http://vimeo.com/36752317. Why? Why now? This talk will briefly introduce Linked Data technology, as especially seen from the perspective of cultural institutions like libraries. It will then present the particular perspective of Europeana, which has recently openly released part of its metadata as Linked Data -- see http://data.europeana.eu . How can we benefit? How can we contribute? A technical perspective on exchanging and enriching data will be presented, and complemented with a more strategic one, where the Linked Data paradigm goes hand in hand with parallel efforts to make cultural data open (http://pro.europeana.eu/support-for-open-data).
Bio: Antoine Isaac works as scientific coordinator for Europeana (http://www.europeana.eu/) and researcher in the Web and Media group (http://wiki.cs.vu.nl/web-media/) at the Vrije Universiteit Amsterdam. He has been investigating and promoting the use of Semantic Web and Linked Data technology in the Cultural Heritage environment since his PhD studies in Computer Science at the Université Paris IV Sorbonne http://www.paris-sorbonne.fr/en/ and the Institut national de l'audiovisuel http://www.ina.fr/. His work focuses especially on the representation and interoperability of collections and their vocabularies.
- February 17, 2012--Je t'aime, moi non plus: the tension between technology and documentation practices.
Session Leader: Seth van Hooland, Université Libre de Bruxelles (ULB), Belgium
Time: 4:00 - 5:00 p.m.
Location: 126 LISB
Archived: audio, slides
Description: The early-to-mid 2000s economic downturn in the US and Europe forced digital cultural heritage projects to adopt a more pragmatic stance towards metadata creation and to deliver short-term results towards grant providers. It is precisely in this context that the concept of Linked and Open Data (LOD) has gained momentum. Unfortunately, Semantic Web projects sometimes tend to be the victim of a technologically driven vision, where the means becomes an end in itself. The presentation will put this tension between technologies and documentation practices in a larger context by presenting a hermeneutical framework for the analysis of metadata quality.
Bio:
Seth van Hooland holds the chair in Digital Information at the Information and Communication Science department of the Université Libre de Bruxelles (ULB), Belgium, and he is the president of the Master in Information and Communication Technologies (http://mastic.ulb.ac.be). His research focuses on metadata quality, digitization projects within the cultural heritage sector and digital humanities at large. Van Hooland also works as a consultant and a trainer for diverse European, national and local institutions in the domain of digital cultural heritage and document management. He is a member of the Dublin Core Metadata Initiative (DCMI) Advisory Board and co-chair of the DCMI Tools Community. An overview of his research activities can be found on http://homepages.ulb.ac.be/~svhoolan/
- February 3, 2012--Scaffolding & Embodiment - Perspectives in Human Computer Interaction
Session Leader: Christopher Lueg
Time: 4:00 - 5:00 p.m.
Location: 131 LISB
Archived: audio
Description: In this talk Professor Lueg will discuss how scaffolding and embodiment concepts originating from the cognitive sciences can be used to look at, and re-interpret, research topics in human computer interaction ranging from interaction in online communities to information behaviors in the real world. In his work Professor Lueg understands human computer interaction as interaction with pretty much any kind of computer-based system ranging from desktop computers and mobile phones to microwave ovens and parking meters.
Bio: Dr. sc.nat. Christopher Lueg is a Professor of Computing at the University of Tasmania (UTAS) and convenor of the Information and Interaction (I2) Research Group where he and his students research topics in Human Computer Interaction (HCI), Computer Supported Cooperative Work (CSCW), Mobile Computing, Ubiquitous Computing and Information Research. In 2010-2011 Professor Lueg served as Interim Co-Director (Development) of the newly established Human Interface Technology Lab (HITlab) Australia. From 2008 to 2011 he organised for UTAS to be the only Australian participant in the global ShanghAI lectures about Natural and Artificial Intelligence that were held via videoconference at the University of Zurich in Switzerland and broadcast simultaneously to some 20 universities around the globe.
- December 9, 2011--Are Collections Sets?
Session Leader: Karen Wickett
Time: 4:00 - 5:00 p.m.
Location: 126 LISB
Archived:
Description: The concept of a collection plays key roles in library, museum, and archival practice, and is arguably fundamental to information organization systems in general. Locating collections concepts in a reasonably robust ontology should have a number of practical advantages, including revealing inferencing opportunities on the one hand, and supporting consistency and coherence in system design on the other. However, although practices involving collections have been studied empirically there has been surprisingly little attention given to the formal analysis of the concept itself, or related notions like collection membership. With this paper we hope to convene that discussion, beginning with the question: Are collections sets? We consider in detail the substantial arguments against collections being a kind of set, but recognize that at least one version of that claim, one based on considerations from Guarino and Welty's Ontology evaluation rules, cannot be ruled out. We recognize though that ontology decisions, whether practical or theoretical, ultimately come down to weighing competing considerations and not decisive formal arguments. Any conclusions therefore must await the development of alternative theories in subsequent papers. We invite the information science community to join us in this effort.
First presented at ASIST 2011 Annual Meeting, with co-authors Allen Renear and Jonathan Furner
New Orleans, LA, October 11, 2011
- November 11, 2011--Two part series: Jevin West and Peter Organisciak
Part 1: Document Discovery: Advancing Research with Large Knowledge Networks
Session Leader: Jevin West
Time: 2:45-3:45
Location: 126 LISB
Archive: Audio
Description: By putting the world's scholarly literature online, publisher websites and digital archives have made millions of articles instantly available anywhere, any time, in digital form. This is a breakthrough in document delivery; we now await comparable breakthroughs in document discovery. As De Solla Price noted in 1965, the scholarly literature forms a vast network -- where the nodes are the millions of papers published in scholarly journals and the links are the hundreds of millions of citations connecting these papers. Can we use this vast network of trails, in combination with intelligent algorithms, to help researchers navigate the scholarly landscape? Can we develop research tools that not only deliver the content but facilitate the content? New approaches to measuring, mapping and evaluating documents are creating new forms of value that can be derived from the digital research content already available to the research community. In this presentation, I will talk about the Eigenfactor Project and the tools we have developed to rank and map scientific knowledge.
Part 2: When to ask for help: Evaluating projects for crowdsourcing
Session Leader: Peter Organisciak
Time: 4:00 - 5:00 p.m.
Location: 126 LISB
Archive: Audio
Description: A growing online phenomenon is that of crowdsourcing, where groups of disparate people, connected through technology, contribute to a common product. It refers to the collaborative possibilities of a communications medium as flexible and as populated as the Internet. If many hands make light work, crowdsourcing websites show how light the work can be, breaking tasks into hundreds of pieces for hundreds of hands. Building from the growing body of research in the area including the author’s work on crowd motivations, this presentation outlines the necessary steps and considerations in enriching projects through crowdsourcing. Presented at Digital Humanities 2011.
Resources: Full text of Peter's thesis
- November 4, 2011--Tinker, tailor, searcher, bricoleur: Studying software ecosystem
Time: 4:15 - 5:15 p.m. in Rm. 131 LISB
Session Leader: Mike Twidale
Archived:
Description: Lots of people, including those who don't think of themselves as computer scientists or programmers, are now able to get things done in their life by assembling a set of computer applications and web services, often for free or at low cost. For example, scientists can build good-enough cyber-infrastructures out of things like email, Skype, Doodle, Google Docs, Microsoft Office, DropBox, ManyEyes, Google Maps, etc. The same kind of thing can happen in domestic and business settings. How does it work? How can we help it work better? How can we study it when it keeps changing all the time? How come searching for tech help, for ideas, for applications and for code plays such a big role?
- October 14, 2011--IMLS DCC Contribution to the Digital Public Library of America Beta Sprint
Session Leaders: Carole Palmer and Jacob Jett
Archive: Audio
Description: We will present highlights from our contribution to the Digital Public
Library of America (DPLA) Beta Sprint, an initiative to solicit models,
prototypes, tools, and interfaces that demonstrate how the DPLA might
index and provide access to a wide range of broadly distributed content.
The IMLS Digital Collections and Content (DCC) project collaborated with
the Digital Library Federation to leverage the DCC's 1000+ cultural
heritage collections from libraries, museums, and
archives from across the U.S. The team experimented with the DCC national
aggregation model to extend the collections, make technical advances, and
redesign the resource for the DPLA community.
Resources: http://www.diglib.org/community/collaborations/dpla-beta-sprint/
- September 30, 2011--Disciplinary Research: Investigating the Impact of Dataset Reuse in the Earth Sciences
Session Leader: Tiffany Chao
Time: 3:00 - 4:00 p.m. (please note the earlier start time)
Location: 126 LISB
Archive: Audio
Description: In the realm of scholarly communication, scientific datasets are becoming more widely recognized for their scholarly and reuse value. However, given the investment toward maintaining and storing research data for long-term access, there is no clear strategy or metric for determining the reuse of research datasets. This study proposes a novel approach to track use and measure the impact of publicly accessible datasets in scholarly publications through disciplinary reach: the number of unique journals and related subject categorizations in which articles are published. Using affiliated publication(s), described by the author as the works identified by the dataset creator or curator related to a dataset, the principles underlying the bibliometric technique of citation analysis are leveraged and applied. Preliminary results show that affiliated publications are primarily in physical science and multidisciplinary journals, indicating these earth datasets may have an impact on a number of different research areas. Continued refinement of these approaches, measures, and the design will serve to broaden our understanding of the reuse potential of scientific data and their influence on advancing scholarship.
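The disciplinary reach measure described above (unique journals and unique subject categorizations among a dataset's affiliated publications) can be sketched in a few lines. This is an illustrative reading of the abstract, not the study's implementation: the `Publication` record, its field names, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Publication:
    """One affiliated publication of a dataset (hypothetical record layout)."""
    title: str
    journal: str
    subject_categories: Tuple[str, ...]


def disciplinary_reach(publications: Iterable[Publication]) -> Tuple[int, int]:
    """Return (number of unique journals, number of unique subject categories)."""
    pubs = list(publications)
    journals = {p.journal for p in pubs}
    categories = {c for p in pubs for c in p.subject_categories}
    return len(journals), len(categories)


# Made-up records, for illustration only.
sample = [
    Publication("Paper A", "Journal X", ("Physical Sciences",)),
    Publication("Paper B", "Journal X", ("Physical Sciences",)),
    Publication("Paper C", "Journal Y", ("Multidisciplinary",)),
]

print(disciplinary_reach(sample))
```

Counting distinct values rather than raw publication counts means that many papers in a single journal do not inflate the measure.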
- September 23rd-- Automatic Subject Metadata Quality Assessment in IDEALS and MEDLINE Using the Conformance to Expectation Metric
Session Leader: Walker Weyerhaeuser
Description: This week CAS student Walker Weyerhaeuser will deliver a presentation based on his final project.
A proposed automatically-calculable subject metadata quality metric is evaluated in the context of two small collections, one an Institutional Repository with diverse content and metadata provenance, another a random sample of a larger, professionally indexed and more topically coherent database. The intuition behind and empirical basis of the metric is that information content correlates with human perception of quality. The talk discusses the dominant factors in the scoring functions and implications for collection managers who want to use the metric.
Archived: audio
- September 16th -- Data Curation, Infrastructure, and Services at Sheridan Libraries: An On-the-ground View of Professional Roles and Practice
Session Leaders: Elliot Metsger, Tim DiLauro, and Sayeed Choudhury
Digital Research and Curation Center (DRCC)
Sheridan Libraries, Johns Hopkins University
Archive: Audio
Description: For our CIRSS Seminar this Friday we’ve invited our colleagues from Sheridan Libraries at Johns Hopkins University to talk with students and faculty about work “in the trenches” at the Digital Research and Curation Center. They will discuss the day-to-day activities and responsibilities involved in their positions and share their perspectives on the kinds of skills needed by new professionals working on the technical side of data curation, infrastructure, and services.
Tim DiLauro is a Digital Library Architect; Elliot Metsger is a Repository Programmer and the Team Lead for Infrastructure R&D; and Sayeed Choudhury is the Associate Dean for Library Digital Programs and leads the Data Conservancy project, a multi-million-dollar NSF DataNet initiative. As a partner on the Data Conservancy, CIRSS is conducting research on scientific data practices and foundational data concepts, in collaboration with DiLauro, Metsger, and others on the Infrastructure Research and Development team, and with other partners at the National Snow and Ice Data Center, Marine Biological Laboratory, and a number of other institutions.
More information about project activities can be found at http://dataconservancy.org, and on the CIRSS web site at http://cirss.lis.illinois.edu/SciCom/DataConservancy.html.
- September 9th -- Estimation Methods for Ranking Recent Information
Session Leader: Miles Efron
Archived: Audio
Description: Temporal factors often constitute a crucial dimension of relevance during information seeking. In addition to topically relevant documents, a good information retrieval (IR) system should retrieve documents that are timely. This talk will address statistical approaches to handling a common scenario--the case when a searcher wants to find topically relevant documents that are also recently published. Searching for recent information is common in domains such as current events and social media (e.g. blogs and microblogs). Handling so-called "recency queries" presents a keen challenge. We must balance topical and temporal evidence during document ranking. But we must do so without reducing search quality when queries lack a temporal component. During this talk I will present research that brings results from Bayesian parameter estimation to bear on the problem of admitting temporal evidence into information retrieval. The talk will focus on theoretical developments and experimental findings obtained from news and microblog data sets.
Originally presented at SIGIR 2011, the 34th annual international conference on research and development in information retrieval; co-author Gene Golovchinsky.
Resources: None.
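To make the idea of balancing topical and temporal evidence concrete, here is a minimal sketch. It assumes an exponential prior on document age, a device known from the information retrieval literature; the listing does not describe the estimators presented in the talk, so the function names, the scoring form, and the numbers below are hypothetical.

```python
import math


def recency_score(topical_log_likelihood, age_days, rate):
    """Log of P(query | doc) * P(doc), with an exponential prior on document age.

    `rate` controls how strongly recent documents are favoured; a value close
    to zero makes the prior nearly flat (assumed here to suit queries that
    lack a temporal component).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    log_prior = math.log(rate) - rate * age_days
    return topical_log_likelihood + log_prior


def rank(docs, rate):
    """Sort (doc_id, topical_log_likelihood, age_days) tuples, best first."""
    return sorted(docs, key=lambda d: recency_score(d[1], d[2], rate), reverse=True)


# Made-up scores and ages, for illustration only.
docs = [
    ("old_but_on_topic", -4.0, 300.0),
    ("fresh_and_close", -4.5, 2.0),
]

print([d[0] for d in rank(docs, rate=0.01)])    # strong recency preference
print([d[0] for d in rank(docs, rate=0.0001)])  # nearly flat prior
```

With the larger rate the fresher document ranks first; with the nearly flat prior the ordering falls back to topical evidence alone, which is the behaviour wanted for queries without a temporal component.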
- Friday, September 2, 4:00 - 5:00 p.m.
Session Leader: Kathryn La Barre
Title: "But I Still Haven't Found What I'm Looking For:" Searching For Folktales and Films in Cultural Heritage Repositories
Archived: Audio, Slides
Description: “But I still haven't found what I'm looking for” is the title of the 1987 smash hit by the rock band U2. This plaintive refrain is commonly voiced by subjects in two ongoing research projects – Facets and Folktales, and Facets and Films. Insights gained from the facet analytical approach taken by these two projects may well inform the creation of better access structures for digital cultural heritage materials. This presentation will be a mash-up of three summer talks given during June and July at the Third North American Symposium of Knowledge Organization [Toronto, Ontario, Canada], The Tenth Congress of the Spanish Chapter of the International Society for Knowledge Organization (ISKO) [Ferrol, Spain], and the ISKO United Kingdom Biennial Conference [London, England]. I’ll extend this talk with material I plan to discuss as an invited panelist at IBERSID (International Conference on Information and Documentation Systems) in Zaragoza, Spain this coming October.
Resources:
- June 17th -- Comparing the Similarities and Differences between Two Translations
Session Leader: Ana Lucic
Archived: audio
Description: Ana will be making a pre-conference presentation of the paper co-authored with Dr. Catherine Blake
which will be presented at Digital Humanities 2011 conference. The paper explores the degree to which automated text analysis tools can
capture the different styles used by Burton Pike and Stephen Mitchell in their respective translations of the only prose novel written by
the famous German-language poet Rainer Maria Rilke, The Notebooks of Malte Laurids Brigge. Two candidate analysis tools were used to identify
similarities and differences between the Pike and Mitchell translations: the first approach used a syntactic representation of the texts
which was generated using the Stanford lexical parser (http://nlp.stanford.edu/index.shtml) and the second approach used principal component
analysis. So far, the main areas of difference were found in the use of negation modifiers, prepositional modifiers, object of preposition,
parataxis, and in the word choices for adjectival and adverbial modifiers.
Resources:
Comparing the Similarities and Differences between Two Translations, by Blake and Lucic
Stanford typed dependencies manual
McKenna, W., Burrows, J., Antonia, A. (1999). ‘Beckett’s Trilogy: Computational Stylistics and the Nature of Translation’
- April 1, 2011 -- How LIS Faculty Respond to Library Service Innovations: A Case Study
Session Leader: Sue Searing
Archived: Slides, Audio (mp3). We regret that the last 15 minutes of this recording were lost due to technical difficulties.
Description: The University of Illinois at Urbana-Champaign closed its Library & Information Science Library
and replaced it with a virtual library and an embedded librarian. A year later, a survey of the faculty and staff of the Graduate
School of Library and Information Science (GSLIS) and the University Library assessed how well the new service model meets faculty
needs. The data provide a snapshot of how LIS scholars discover and access new publications, how they seek reference assistance,
and what they desire from the library. The survey also captured faculty attitudes toward the realignment of library support for
their research and teaching.
Resources:
Background documents on the LIS library new service model are available
here.
See especially: Team implementation report (February 2009)
The librarian's library in transition from physical to virtual place
- March 11, 2011 -- Definitions of Dataset in the Scientific and Technical Literature
Session Leader: Simone Sacchi
Description: The integration of heterogeneous data in varying formats and from
diverse communities requires an improved understanding of the concept
of a dataset, and of key related concepts, such as format, encoding,
and version. Ultimately, a normative formal framework of such concepts
will be needed to support the effective curation, integration, and use
of shared multi-disciplinary scientific data. To prepare for the
development of this framework we reviewed the definitions of dataset
found in technical documentation and the scientific literature. Four
basic features can be identified as common to most definitions:
grouping, content, relatedness, and purpose. In this summary of our
results we describe each of these features, indicating the directions
a more formal analysis might take.
Resources:
Renear, A., Sacchi, S., & Wickett, K. (2010, October 22-27). Definitions of Dataset in the
Scientific and Technical Literature. Proceedings of the Annual Meeting of the American Society for Information Science &
Technology (ASIS&T), October 22-27, 2010, Pittsburgh, PA.
- February 25, 2011 -- Beyond Size and Search: Building Contextual Mass in Digital Aggregations for Scholarly Use
Session Leaders: Katrina Fenlon and Carole Palmer
Archived: Audio (mp3)
Description: At present there are no established collection development methods for building large-scale digital aggregations.
However, to realize the potential of the collective base of digital content and advance scholarship, aggregations must do more than provide
search of sizable bodies of content. Informed by empirical understanding of scholarly information practices, the IMLS Digital Collections and
Content project developed an aggregation strategy for building Opening History, one of the largest digital cultural heritage aggregations
in the country. The strategy applied policy-driven collecting, based on the principle of contextual mass, and conspectus-style
evaluation of collection-level metadata to identify strong subject areas within the aggregation.
Analysis of density, interconnectedness, diversity, and small/large collection complementarity determined
subject concentrations and thematic strengths to be prioritized for future collection development and used as
organizational structures for browsing and visualization. The approach models how scholars build their own personal
research collections, as they follow leads from collection to collection across institutions near and far, and adds value
that cannot be achieved through conventional retrieval and browsing at the item-level. This presentation is based on a paper presented at ASIS&T
2010, in Pittsburgh in late October.
Resources:
Palmer, C., Zavalina, O., & Fenlon, K. (2010). Beyond Size and Search: Building Contextual Mass in Digital Aggregations for Scholarly Use. ASIST 2010, October 22-27, 2010, Pittsburgh, PA.
- February 4, 2011-- Making Digital Curation a Systematic Institutional Function
Session Leader: Chris Prom (University Library)
Archived: Audio (mp3), Slides
Description:
Over the past decade, a rich body of research and practice has emerged under the rubrics of electronic records, digital preservation, and digital curation, but few traditional archives have implemented systematic methods to capture, preserve, and provide access to the complete range of documentation that end users need to understand and interpret past human activity.
This talk describes the Practical E-Records Method, which attempts to address this problem by providing easy-to-implement software reviews, guidance/policy templates, and program recommendations that blend digital curation research findings with traditional archival processes and workflows. Using the method discussed in this paper, archives and manuscript repositories can use existing resources to incrementally develop digital curation skills, building a collaborative, expanding program in the process. Archival programs that make digital curation a systematic institutional function will systematically gather, preserve, and provide access to genres of documentation that are contextually rich and highly susceptible to loss, complementing efforts undertaken by librarians, information scientists, and external service providers. Over the next year, the suggested techniques will be tested and refined at the University of Illinois Archives and possibly elsewhere.
- January 28, 2011 -- Rule Categories for Collection/Item Metadata Relationships
Session Leader: Karen Wickett
Archived: Audio (mp3), Slides
Description:
Collections of artifacts, images, texts, and other cultural objects are not arbitrary aggregations, but are designed to
support specific research and scholarly activities. Collection-level metadata directly supports this objective, providing
critical contextual information. However, exploiting this information, especially in a semantic web environment of linked
data, requires a precise formalization of the rules that characterize collection/item metadata relationships. Toward this
end we are developing a logic-based framework of relationship rule categories for collection/item metadata. This framework
will support metadata specification developers, metadata catalogers, and system designers. This presentation summarizes the
results of a three-year effort, part of the IMLS Digital Collections and Content project.
Resources:
Wickett, K., Renear, A., & Urban, R. (2010, October
22-27). Rule categories for collection/item metadata
relationships. Proceedings of the Annual Meeting of
the American Society for Information Science &
Technology (ASIS&T), October 22-27, 2010,
Pittsburgh, PA.
- November 12, 2010 -- What We Learned Trying to Save the Worlds
Session Leader: Jerome McDonough
Archived: Audio (mp3), Slides
Description:
This talk provided a summary of the recently completed Preserving Virtual Worlds project,
including a discussion of issues of collection management, bibliographic description, intellectual
property law, preservation strategies for video games and interactive fiction, and metadata and
archival packaging. Possible future directions for both research and professional activity were
also discussed.
Resources:
McDonough et al. (2010). Preserving Virtual Worlds Final Report (Library of Congress, UleRA#2008-01111-00-00). Available at https://www.ideals.illinois.edu/handle/2142/17097.
- October 15, 2010 -- Contouring Curation in Research Libraries: Defining “Working” Data Units and Communities
Session Leaders: Carole Palmer and Melissa Cragin
Description:
Palmer & Cragin will present on their research related to data repositories, providing an overview
of the range of organizational structures investigated in a series of studies and then focusing on their
current work on the Data Conservancy, a research-library-based initiative to develop a broad data curation
strategy for scientific data. They will discuss their approach to studying data practices in “small science,”
which addresses the research question: What are the meaningful social units for organization and use of
data over the long term?
Resources:
Cragin, M.H., C.L.Palmer, J.R. Carlson, and M. Witt (2010). “Data sharing, small science,
and institutional repositories.” Philosophical Transactions of the Royal Society A 368(1926), 4023-4038.
- October 1, 2010 -- Bootstrapping Location Relations from Text
Session Leader: Wu Zheng (GSLIS doctoral student)
Description:
Wu will give a pre-conference presentation of this paper, which he will
present at ASIS&T 2010.
This paper presents a semi-supervised bootstrapping algorithm that, when provided with a seed term,
automatically induces relations from text. This project is part of Cathy Blake's Evidence Based Discovery
project, which aims to develop new text mining methods that are consistent with the manual processes
that experts currently use to resolve contradictory and redundant evidence.
Abstract:
Ontologies play a critical role in information organization
and can be used for a range of applications from
information retrieval to knowledge discovery. However,
manual ontology construction is extremely labor intensive.
This paper describes a bootstrapping algorithm that, when
provided with a seed term, automatically induces relations
from text. We describe a series of experiments that explore
the role of sentence syntax during the bootstrapping process
and demonstrate the feasibility of this approach by
identifying a primitive instance-level relation – the location
relation, which is of interest because locations are described
in multiple genres, such as in the news, novels and
scientific articles. Our results suggest that syntax plays a
critical role in identifying location relations.
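Illustrative sketch (not the authors' algorithm): the general idea of bootstrapping from a seed term can be shown with a toy Python example. It alternates between learning surface patterns that precede known locations and applying those patterns to find new entity/location pairs. The function name, the two-token pattern window, and the sample sentences are all assumptions made for illustration; the paper itself studies the role of sentence syntax, which this sketch ignores.

```python
def bootstrap_locations(sentences, seed, rounds=2):
    """Toy bootstrapping loop (hypothetical): learn patterns, then apply them."""
    tokenized = [s.rstrip(".").split() for s in sentences]
    locations = {seed}   # known locations, starting from the seed term
    patterns = set()     # learned two-token contexts, e.g. ("lies", "in")
    pairs = set()        # induced (entity, location) relations
    for _ in range(rounds):
        # Learn: record the two tokens that precede any known location.
        for tokens in tokenized:
            for i in range(3, len(tokens)):
                if tokens[i] in locations:
                    patterns.add((tokens[i - 2], tokens[i - 1]))
        # Apply: where a learned pattern occurs, treat the token before it as
        # the entity and the capitalized token after it as a location.
        for tokens in tokenized:
            for i in range(len(tokens) - 3):
                context = (tokens[i + 1], tokens[i + 2])
                if context in patterns and tokens[i + 3][0].isupper():
                    pairs.add((tokens[i], tokens[i + 3]))
                    locations.add(tokens[i + 3])
    return pairs, patterns


# Made-up example sentences; only "Paris" is given as the seed.
sample = [
    "Louvre lies in Paris.",
    "Prater lies in Vienna.",
    "Belvedere stands in Vienna.",
    "Altgeld stands in Urbana.",
]
pairs, patterns = bootstrap_locations(sample, "Paris", rounds=2)
```

In this toy run the first round learns the context "lies in" from the seed and discovers Vienna; the second round uses Vienna to learn "stands in" and so reaches Urbana, which never co-occurs with the original pattern.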
Resources:
Zheng, W. & Blake, C. (2010). Bootstrapping Location Relations from Text. ASIST 2010. Forthcoming 2010, October 22-27.
- September 24, 2010 -- The Modifiability Puzzle
Session Leaders: Allen Renear, Karen Wickett (GSLIS doctoral student)
Description:
We summarize presentations given at ASIS&T 2008 (Allen Renear, Karen Wickett, Dave Dubin) and Balisage 2009
and 2010 (Renear and Wickett) that analyze problems in our commonsense understanding of digital objects. The
discussion of these problems began some years ago in the GSLIS Electronic Publishing Research Group and has more
recently continued at times in the Conceptual Foundations Group. Related topics are now being actively pursued
within the Data Concepts group (Dubin, Renear, Wickett, and Simone Sacchi) of the NSF-funded Data Conservancy.
Abstract:
The digital world seems to be a place of constant change. Documents are edited, databases updated,
files modified, datasets reformatted, and so on. But apparently we are deluded. Standard theories
of what digital objects are entail that those objects are immutable and cannot undergo any genuine
modification at all. It gets worse. Arguments against modifiability do not apply only to digital
objects and do not depend upon specialized definitions -- in a few simple steps ordinary beliefs
lead to paradoxes about many things.
Embedded inconsistencies in our commonsense beliefs have long entertained philosophers, but our
problem here is more than an idle Milesian amusement. While for the most part human beings manage
quite well with inconsistent conceptual schemes, the emerging world of linked data and semantic
technologies depends on precise definitions and straightforward logical reasoning, and carries
out automatic inferencing, based on those definitions, often with few opportunities for human
intervention and correction.
How can we reconcile our commonsense concepts of documents, databases, datasets, and the like
with the unforgiving demands of semantic technologies? We believe this is a profound and urgent
open question in information science and that the success of semantic technologies and linked
data depends on its resolution. On Friday, we will not defend a specific answer but rather
try to make the problem clear, and show that none of the known resolutions are without difficulties.
We present you with the puzzle -- you tell us how to solve it.
Resources:
Renear, A. H.,
Dubin, D. and Wickett, K. M. (2008), When digital objects change — exactly what changes?.
Proceedings of the American Society for Information Science and Technology, 45:1-3. doi: 10.1002/meet.2008.14504503143
Renear, Allen H., and Karen M. Wickett. “Documents Cannot Be Edited.” Presented at Balisage: The Markup
Conference 2009, Montréal, Canada, August 11-14, 2009. In Proceedings of Balisage: The Markup Conference
2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:10.4242/BalisageVol3.Renear01.
Renear, Allen H., and Karen M. Wickett. “There are No Documents.” Presented at Balisage: The Markup
Conference 2010, Montréal, Canada, August 3-6, 2010. In Proceedings of Balisage: The Markup
Conference 2010. Balisage Series on Markup Technologies, vol. 5 (2010). doi:10.4242/BalisageVol5.Renear01.
- September 17, 2010 -- Hashtag Retrieval in a Microblogging Environment
Session Leader: Miles Efron
Archived: Slides
Description:
The paper, which Miles presented at this summer's SIGIR conference in Switzerland, will report
on how information retrieval (IR) systems can improve access to microblog information through
user-generated metadata, such as hashtags in Twitter. Miles' project is part of a larger effort
to identify key challenges and opportunities in the emerging domain of microblog IR.
Abstract:
Microblog services such as Twitter let users broadcast brief textual messages to people who "follow" their activity. Often these posts contain terms called hashtags, markers that signal a post's topic, audience, etc. Hashtags constitute user-generated metadata. This talk will report work based on the question, how can information retrieval
(IR) systems use this metadata to improve access to microblog information? The talk will report research on two problems.
- Ad hoc hashtag retrieval: Given a topical query Q, find a ranked list of hashtags that are useful
to a person interested in learning about or staying abreast of the topic underpinning Q.
- Text expansion using hashtags: Microblog IR systems traffic in a variety of text types
(e.g. tweets, queries, hashtags, representations of people). Given a text T,
create an expanded representation T' by probabilistic hashtag assignment.
Is retrieval based on T' more effective than retrieval from T?
The work I will describe is part of a larger effort to identify key challenges and opportunities in the emerging domain of microblog IR.
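Illustrative sketch (not the method from the paper): one simple way to picture ad hoc hashtag retrieval is to pool the words of all tweets carrying a hashtag into a bag of words and rank hashtags by the smoothed likelihood of the query under each bag. The function name, the additive smoothing parameter alpha, and the sample tweets below are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def rank_hashtags(tweets, query, alpha=0.1):
    """Rank hashtags by smoothed query log-likelihood (hypothetical sketch)."""
    tag_words = defaultdict(Counter)  # hashtag -> counts of co-occurring words
    vocab = set()
    for tweet in tweets:
        tokens = tweet.lower().split()
        tags = [t for t in tokens if t.startswith("#")]
        words = [t for t in tokens if not t.startswith("#")]
        vocab.update(words)
        for tag in tags:
            tag_words[tag].update(words)

    vocab_size = max(len(vocab), 1)  # guard against an empty vocabulary
    query_terms = query.lower().split()
    scores = {}
    for tag, counts in tag_words.items():
        total = sum(counts.values())
        # Additive smoothing keeps unseen query terms from zeroing the score.
        scores[tag] = sum(
            math.log((counts[w] + alpha) / (total + alpha * vocab_size))
            for w in query_terms
        )
    return sorted(scores, key=scores.get, reverse=True)


# Made-up tweets for illustration.
tweets = [
    "earthquake hits coast #quake",
    "strong earthquake felt downtown #quake",
    "new phone released today #tech",
]
ranking = rank_hashtags(tweets, "earthquake")
```

Here the hashtag whose tweets mention the query term most often ranks first; a real system would need better tokenization and a principled smoothing choice.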
Resources:
Efron, M. (2010). Hashtag retrieval in a microblogging environment.
SIGIR '10: Proceeding of the 33rd International ACM SIGIR Conference
on Research and Development in Information Retrieval, Geneva,
Switzerland. 787-788.