About e-Research Roundtable
The ERRT is a new CIRSS research study group focusing on information problems in the curation and integration of digital research data and the development of research cyberinfrastructure more generally. The seminar is modeled after the Metadata Roundtable (MDRT), which we’ve held regularly since 2003. It will share the MDRT time slot, with the ERRT and MDRT each meeting about twice per month on Wednesdays from 12:30 – 2:00.
The ERRT is open to researchers, faculty, staff, students and others who are interested in e-Research issues. It will be a very informal exchange around participants’ research activities and open problems and advances in the field. Meetings will be held in 341 LIS. This is located at the east end of the 3rd floor of the Library and Information Science Building (inside the Information Science Research Laboratory).
Announcements and meeting reminders for the ERRT are distributed via a mail list. To subscribe to the list, please visit the mail list Web page at https://mail.lis.uiuc.edu/mailman/listinfo/errt.
If you have any questions regarding ERRT, please contact Kevin Trainor at 217-333-5881 or firstname.lastname@example.org.
- Wednesday, April 24 2013--Information Retrieval and Text Mining Meet e-Research: Towards a Researcher's Workbench for e-Research
Session Leader: ChengXiang Zhai, Associate Professor of Computer Science at the University of Illinois at Urbana-Champaign, affiliated with GSLIS, the Institute for Genomic Biology, and the Department of Statistics
Location: 242 LIS
Abstract: As more and more scientific data sets and literature articles are being accumulated, the natural question to ask is what kind of information systems do we need to develop in order to enable researchers to exploit all the relevant data to improve their research productivity. In this talk, I will discuss the vision of developing a researcher's workbench to integrate scattered data and knowledge and enable researchers to interact with data and knowledge effectively for hypothesis generation and testing. Using this vision as a framework, I will present some relevant work on information retrieval and text data mining done in the Text Information Management and Analysis (TIMAN) group in the Department of Computer Science at UIUC. Finally, I will discuss some open challenges that have to be solved in order to make such a researcher's workbench really work.
Bio: ChengXiang Zhai is an Associate Professor of Computer Science at the University of Illinois at Urbana-Champaign (UIUC), where he is also affiliated with the Graduate School of Library and Information Science, Institute for Genomic Biology, and Department of Statistics. He received a Ph.D. in Computer Science from Nanjing University in 1990, and a Ph.D. in Language and Information Technologies from Carnegie Mellon University in 2002. He worked at Clairvoyance Corp. as a Research Scientist and a Senior Research Scientist from 1997 to 2000. His research interests include information retrieval, text mining, natural language processing, machine learning, and biomedical informatics, in which he has published over 150 research papers. He is an Associate Editor of ACM Transactions on Information Systems, and Information Processing and Management, and serves on the editorial board of Information Retrieval Journal. He was a program co-chair of ACM CIKM 2004, NAACL HLT 2007, and ACM SIGIR 2009. He is an ACM Distinguished Scientist and a recipient of multiple best paper awards, the Alfred P. Sloan Research Fellowship, the IBM Faculty Award, the HP Innovation Research Program Award, and the Presidential Early Career Award for Scientists and Engineers (PECASE). He also received the Rose Award for Teaching Excellence from the College of Engineering at UIUC.
- Wednesday, April 10 2013--Scientific Integrity and Transparency
Session Leader: Sayeed Choudhury, Director of the Digital Research and Curation Center and Associate Dean for Research Data Management at Johns Hopkins University
Location: 242 LIS
Archived: audio with slides, audio
Abstract: The Research Subcommittee of the Committee on Science, Space and Technology held a hearing on Scientific Integrity and Transparency. Sayeed Choudhury testified as one of the expert witnesses who was asked to focus on data sharing, access, and preservation. Choudhury chose to do so from the perspective of infrastructure development, comparing the development of data infrastructure to historical infrastructure development efforts such as railroads, automobiles and banking. During this ERRT, Choudhury will describe the rationale behind his comparisons and the process for such Congressional hearings generally and specifically as it relates to the most recent Office of Science and Technology Policy executive memorandum on data management.
Research Subcommittee of the Committee on Science, Space and Technology hearing on Scientific Integrity and Transparency
Biosketch: G. Sayeed Choudhury is the Associate Dean for Research Data Management and Hodson Director of the Digital Research and Curation Center at the Sheridan Libraries of Johns Hopkins University. He is also the Director of Operations for the Institute of Data Intensive Engineering and Science (IDIES) based at Johns Hopkins. He is a member of the National Academies Board on Research Data and Information, the ICPSR Council, DuraSpace Board, and a Senior Presidential Fellow with the Council on Library and Information Resources. Previously, he was a member of the Digital Library Federation advisory committee, Library of Congress' National Digital Stewardship Alliance Coordinating Committee and Federation of Earth Scientists Information Partnership (ESIP) Executive Committee. He has been a Lecturer in the Department of Computer Science at Johns Hopkins and a Research Fellow at the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign. He is the recipient of the 2012 OCLC/LITA Kilgour Award.
Choudhury has served as principal investigator for projects funded through the National Science Foundation, Institute of Museum and Library Services, the Andrew W. Mellon Foundation and Microsoft Research. He is the Principal Investigator for the Data Conservancy, one of the awards through NSF's DataNet program. He has oversight for data curation research and development at the Sheridan Libraries at Johns Hopkins University. Choudhury has published articles in journals such as the International Journal of Digital Curation, D-Lib, the Journal of Digital Information, First Monday, and Library Trends.
Full bio at http://www.educause.edu/members/sayeed-choudhury.
- Wednesday, March 27 2013--Combining Historical Empirical Data with the Study of Networks: Rethinking Anglo-American and German-Jewish Cooperation Before the First World War
Session Leader: Susie Pak, Assistant Professor in the Department of History at St. John's University
Location: 242 LIS
Abstract: Like scientific paradigms, historical narratives can be extremely resilient unless new evidence is introduced to reevaluate and replace them. In business history, certain paradigms have been so effective in explaining causality that they have reached the status of clichés: "It's only business; It's all about the money." Combining extensive archival research in private and public libraries with the theories of network analysis, Susie Pak discusses the ways in which a quantitative approach to traditional sources of information can yield new understandings of historical subjects. Focusing on the relationship between Anglo-American and German Jewish elite banks in the United States before and during the First World War, Pak will present a paper that studies the foundation of business cooperation between diverse ethnic groups in environments characterized by ethnic, racial, or national conflict. Using diverse sources of historical information from kinship ties to banking records to geographic analysis, she will analyze the conditions that make it possible for the distrust created by ethno-racial conflict to be minimized and allow the pursuit of economic self-interest to serve as the primary explanation for the presence of trust between economic competitors.
Biosketch: SUSIE J. PAK is an Assistant Professor in the Department of History at St. John's University where she teaches U.S. American history (late-19th and 20th centuries). She is a graduate of Dartmouth College (B.A., 1994) and Cornell University (M.A., 1999; Ph.D., 2004). She has been the recipient of the Harvard Business School Alfred D. Chandler Jr. Traveling Fellowship in Business History and Institutional Economy History and the Einstein Fellowship of the Jacob Rader Marcus Center of the American Jewish Archives. She specializes in the study of American business networks and her research bridges the divide between social and economic history through the innovative use of quantitative and qualitative methods. Her book, Gentlemen Bankers: The World of J.P. Morgan, is forthcoming (Harvard University Press, May 2013).
- Wednesday, March 6 2013--Scholarly Social Machines
Session Leader: David De Roure, Director and Professor of e-Research at the University of Oxford e-Research Centre
Location: 109 LIS
Abstract: Scholarly communication underpins the processes of research and innovation, but what if it fundamentally restricts them? I will argue for research communication mechanisms that better support today's digital scholarship, and suggest a new perspective on the scholarly ecosystem as interacting "social machines". The talk will draw on the music domain as an informative exemplar of pervasive digital practice.
Biosketch: David De Roure is Professor of e-Research at University of Oxford, Director of the Oxford e-Research Centre and coordinates Digital Humanities at Oxford. Focused on advancing digital scholarship, he works closely with multiple disciplines including social sciences (concentrating on social machines), digital humanities (computational musicology) and previously bioinformatics (in silico experimentation), chemistry (smart labs) and environmental science (sensor networks). He is an expert in big data analytics and has an extensive background in distributed computing, Web, Linked Data and social computing, runs the myexperiment.org social website for sharing scientific workflows and promotes innovation in scholarly communication. For the last 3 years he has also held a national role as National Strategic Director for Digital Social Research.
- Wednesday, February 27 2013--Deep Carbon Observatory Data Science: Developing large science networks on modern technology steroids
Session Leader: Peter Fox, Professor and Tetherless World Research Constellation Chair, Climate Variability and Solar-Terrestrial Physics, Rensselaer Polytechnic Institute (RPI)
Abstract: The Deep Carbon Observatory (DCO) is a decadal, multidisciplinary, international initiative dedicated to achieving a transformational understanding of Earth's deep carbon cycle. While DCO is still in its very early years, embedded in DCO science is what is now called data science. DCO researchers are beginning to experience a change in the conduct of their carbon-related research. DCO data science will rest upon a 21st century data science platform, and a series of aggregate data holdings that have never existed before. The platform must also coexist with, and fundamentally enhance key community (academic, commercial and agency) data resources already in existence. Initially DCO data science efforts will be unified in the Deep Carbon Virtual Observatory (DCVO): a collaborative scalable education and research environment for searching, accessing, integrating, and analyzing distributed observational, experimental, and model databases. A shift in the conduct of science toward data science will require a transition to data and software infrastructures that facilitate networked science. Built on current and future technologies, these infrastructures must scale from more traditional small investigator/student efforts, all the way to large international and multi-disciplinary teams. Many of the initial activities involve data and information structuring guidance, framework adaptation, network and collaboration stimulation, and integration and support of outreach/engagement goals and priorities. Ultimately the opportunity to assess the value of how such data platforms facilitate collaboration, data generation and use, and how additional peer norms (e.g. credit for data production) emerge for a new generation of researchers, is one topic to discuss during this roundtable.
Biosketch: Peter Fox joined the Tetherless World Constellation in 2008. Formerly, he was the Chief Computational Scientist at the High Altitude Observatory (HAO) of the National Center for Atmospheric Research (NCAR).
Fox's research specializes in the fields of solar and solar-terrestrial physics, computational and computer science, information technology, and grid-enabled, distributed semantic data frameworks. This research utilizes state-of-the-art modeling techniques, internet-based technologies, including the semantic web, and applies them to large-scale distributed scientific repositories addressing the full life-cycle of data and information within specific science and engineering disciplines as well as among disciplines.
Fox is currently PI for the Virtual Solar-Terrestrial Observatory (VSTO), the Semantically-Enabled Scientific Data Integration, and Semantic Provenance Capture in Data Ingest Systems projects. Since 1985 Fox has been bridging science and distributed data and information systems to support community activities utilizing use case driven design. Fox leads working groups for: Virtual Observatories for the Electronic Geophysical Year, semantic web for NASA technology infusion as well as the Earth Science Information Partnership federation, is chair of the AGU Special Focus Group on Earth and Space Science Informatics, is an associate editor for the Earth Science Informatics journal, is a member of the editorial board for Computers in Geosciences and lead editor for the AGU monograph Virtual Observatories in Geosciences currently in preparation. Fox is a member of the ad-hoc International Council for Science's Strategic Committee for Information and Data and chair of the International Union of Geodesy and Geophysics's Union Commission on Data and Information. Fox also currently serves as President for the not-for-profit Open source Project for a Network Data Access Protocol (OPeNDAP).
- Wednesday, February 20 2013--Building Community as a Curation Activity: Sharing Knowledge for Game Preservation
Session Leader: Jerry McDonough, Associate Professor at GSLIS
Abstract: The OAIS Reference Model emphasizes the need for archivists to monitor their designated community's level of knowledge and their common practices and tools for creating and using digital information. Unfortunately this model assumes an existing (and relatively well-defined) community of practice which an archive serves, an assumption which is not warranted for a large number of digital repositories. The Preserving Virtual Worlds 2 project has been investigating possible mechanisms for sharing knowledge among game curators, game developers and gamers as a way of 'self-organizing' a community of practice to support game preservation efforts. This talk will report on the PVW Curation Survey Wiki, an effort to prototype a tool for sharing knowledge needed to support the preservation of computer and video games.
- Wednesday, February 6 2013--Geobiology Site-Based Data Curation at Yellowstone National Park and Its Connection to Global Coral Reefs
Session Leader: Bruce Fouke, Director of the Roy J. Carver Biotechnology Center and Associate Professor in Geology, Microbiology and the Institute for Genomic Biology at Illinois
Location: 126 LIS
Abstract: A newly funded Institute of Museum and Library Services Site-Based Data Curation (IMLS-SBDC) project is developing a framework for the curation of research data generated at scientifically significant research sites. The framework will be based on geobiology research conducted at Yellowstone National Park, as an exemplar site producing data with long-term value. Yellowstone is a tremendously important and rich site for geobiology data collection, drawing scientists investigating research questions ranging from the origin of life on Earth to the search for life on other planets. Modern research in the earth sciences increasingly depends on the development of systematic accounts of the interactions of physical, chemical and biological phenomena and the integration of diverse measurements and observations. Making data accessible and functional for these purposes will depend on: (1) principled curation practices early in the data lifecycle; and (2) curating cohesive and usable sets of data for transfer to repositories. The Fouke lab at Illinois has ongoing Systems Geobiology research on Yellowstone hot springs and Caribbean and Pacific coral reef ecosystems. While at first glance, these seem like wildly different and unrelated environments, closer examination indicates a host of striking similarities and scientific parallels. The types of data collected in these geobiology studies and their interpretation will be evaluated and discussed.
Biosketch: Bruce Fouke is a professor in the Departments of Geology and Microbiology, and the Biocomplexity Theme in the Institute for Genomic Biology, at the University of Illinois Urbana-Champaign. He also serves as Director of the Roy J. Carver Biotechnology Center. Bruce specializes in integrated geological and biological studies of: (1) the control of sea surface temperature on coral reef ecosystems in the Caribbean and the global emergence of infectious marine diseases; (2) the response of heat-loving (thermophilic) bacteria in Yellowstone and Turkey to changes in hot-spring water flow rate, chemistry and temperature; (3) microbially enhanced hydrocarbon recovery in deep subsurface oil and gas rock reservoirs of Canada, Alaska and Ireland; and (4) the timing and cause of the last flow of water in the aqueducts of ancient Rome and Pompeii.
- Wednesday, December 12 2012--In search of non-text: Films, music, data visualizations, and more
Session Leader: Diane Rasmussen, Assistant Professor in the Faculty of Information and Media Studies at The University of Western Ontario
Location: 109 LIS
Abstract: Diane Rasmussen will discuss the diverse research agenda that drives her new book Indexing and retrieval of non-text information, an edited volume of peer-reviewed research which explores the issues surrounding documents that exist in a format other than (or in addition to) text. They appear in many contexts, such as user-generated content websites and authoritative digital collections, as well as in many formats, including photographs, videos, music, data visualizations, technical drawings, and video games. These contexts and formats present unique and timely challenges to information researchers. Diane will lead a discussion about the research area after an informal presentation.
Diane Rasmussen is an assistant professor in the Faculty of Information and Media Studies at The University of Western Ontario. Diane holds an MS and a PhD in information science from the University of North Texas. Additionally, she has been a systems librarian and a corporate information technology professional. She is a Director-at-Large for the American Society for Information Science and Technology and the incoming president of the Canadian Association for Information Science. Details about her teaching and research are available at http://bit.ly/fDmIsq, and information about her book can be found at http://bit.ly/RjmmLH.
- Wednesday, December 5 2012--‘Big data’ challenges for social research
Session Leader: Rob Procter, Professor and Director of the Manchester eResearch Centre (MeRC) at the University of Manchester
Location: 242 LIS
Abstract: The explosion of social media in the form of blogs, micro-blogs, social networking platforms and other ‘born-digital’ social data means that more economic and social data than ever before is now available to researchers. Where once the main problem was a scarcity of data, researchers must now cope with its abundance.
In this talk I will present a study of a large corpus of tweets sent during the UK August 2011 riots. I will outline the methodology and tools used and summarise some of the findings. I will conclude with a discussion of methodological issues it raises and how these might be addressed.
Bio: Rob Procter is the Director of the Manchester eResearch Centre (MeRC) at the University of Manchester. His research focuses on socio-technical issues in the design, implementation, evaluation and use of interactive computer systems, with a particular emphasis on ethnographic studies of work practices, computer-supported cooperative work and participatory design. Recently he has also been applying his deep expertise in social science to better understand micro-blogging. More information available at http://www.merc.ac.uk/?q=rob.
- Wednesday, November 28 2012--How do we build a better Biodiversity Informatics Workbench?
Session Leader: Matt Yoder, Biological Informatician, Illinois Natural History Survey
Location: 242 LIS
Abstract: Bioinformatics generally concerns informatics problems associated with managing vast quantities of strings of letters (DNA). Biodiversity Informatics focuses on the phenome (the expression of said DNA) and pays particular attention to the challenges that arise from the vast quantity of life's diversity. The challenges of Biodiversity Informatics are immense because of the near-infinite number of ways we can describe life: how should we digitize these descriptions, and what should our goals be post-digitization? I'll attempt to drive conversation on these issues by outlining some existing workbenches that deal specifically with biological taxonomy, and then introduce some ideas as to where we might be headed in the near future. I'll pay particular attention to the role that biological ontologies (anatomical, nomenclatural) may have within these efforts.
- Wednesday, November 7 2012--Collection Examples for Europeana Data Model
Session Leaders: Karen Wickett, Katrina Fenlon and Jacob Jett
Location: 242 LIS
Description: We will be discussing example collections for modeling collections from the IMLS Digital Collections and Content aggregation in the Europeana Data Model. We are hoping to use these examples to show the benefits of integrating collections and collection description into the design of a digital library or aggregation system, by increasing access and by allowing users to understand the context of items and collections.
Our collection narratives are currently under construction, and can be found here:
- Wednesday, October 24 2012--Active and Social Data Curation: Reinventing the Business of Community-scale Lifecycle Data Management
Session Leader: Jim Myers, Director of the Computational Center for Nanotechnology Innovations (CCNI) supercomputing facility at Rensselaer Polytechnic Institute
Location: 242 LIS
Description: Effective long-term curation and preservation of data for community use has historically been limited to high value and homogeneous collections produced by mission-oriented organizations. The technologies and practices that have been applied in these cases, e.g. relational data bases, development of comprehensive standardized vocabularies, and centralized support for reference data collections, are arguably applicable to the much broader range of data generated by the long tail of investigator-led research, with the logical conclusion of such an argument leading to the call for training, evangelism, and vastly increased funding as the best means of broadening community-scale data management. Within the Sustainable Environments-Actionable Data (SEAD) project, we question this reasoning and are exploring how alternative approaches focused on the overall data lifecycle and the sociological and business realities of distributed multidisciplinary research communities might dramatically lower costs, increase value, and consequently drive dramatic advances in our ability to use and reuse data, and ultimately enable more rapid scientific advance. Specifically, we've introduced the concepts of active and social curation as a means to decrease coordination costs, align costs and values for individual data producers and data consumers, and improve the immediacy of returns for data curation investments. In this presentation, I'll describe our thinking and present a bit of the specific architecture and services for active and social curation that are being prototyped within the SEAD project within NSF's DataNet network and discuss how they are motivated by the long-tail dynamics in the cross-disciplinary sustainability research community we're supporting.
- Wednesday, October 10 2012--Working through Significance 2.0: A guide to assessing the significance of collections
Session Leaders: Katrina Fenlon, Karen Wickett and Carole Palmer
Location: 109 LIS
Abstract: We will walk through this 2009 document from the Collections Council of Australia, available at http://www.environment.gov.au/heritage/publications/significance2-0/ . From the introduction: "Significance 2.0 outlines the theory, practice and many applications of the concept of significance in collection management. It takes readers through the key concepts and steps in assessing significance, for single items, collections and cross-collection projects. With examples and case studies it shows significance in action, in a wide range of applications. This is a new and revised edition of Significance; a guide to assessing the significance of cultural heritage collections, published in 2001 by the Commonwealth of Australia on behalf of the Heritage Collections Council."
- Wednesday, October 3 2012--Informatics at its logical conclusion: how do we educate the current generation, and in what?
Session Leaders: Peter Fox (Tetherless World Constellation, RPI) and Carole Palmer (Director, CIRSS)
Location: 109 LIS
Description: In the last 5 years, discipline-based informatics (e.g. geo-, bio-, astro-) have demonstrably changed research practices in increasingly data intensive science pursuits. Even recognition and reward structures are broadening (e.g. the new data citation metric from Thomson Reuters). As currently envisioned, a not too far off logical conclusion may be a core informatics capability that is common across a large number of disciplines as well as the full exploitation of the creative tension between informatics research and informatics applications. However, on the people side of informatics, significant implications are now apparent: the significantly multi-disciplinary skill and knowledge set required. Thus, we are left with a rhetorical and age-old question: how do we educate the current informatics generation, and in what?
We suggest discussions around: combining research and application training and experience for students; exploring possible student roles in curation, integration, information modeling, architecture, analytics, visualization, etc. In particular we consider the "language barriers" that the new type of student being attracted to the various informatics career options will experience, to be a major initial challenge for existing education and degree offerings.
With many thanks to our GSLIS Instructional Technology & Design folks, we're pleased to provide a remote attendance option for this week's ERRT session. Below is a link for participating online, along with other connection details.
* SESSION TIME: 12:30pm to 2pm CENTRAL TIME, Wednesday 3 October 2012
-- Please note, we suggest you connect to the Blackboard Collaborate participant link below by 12:15pm CT, in case of any technical difficulties.
* PARTICIPANT LINK: https://sas.elluminate.com/d.jnlp?password=GSLIS-Mtg1Participant&sid=407
-- This link will prompt you to enter your name, which will appear in the participant list during the session.
-- Note that the system will need to install a Java applet.
-- Once you are fully logged in, our slides will run directly in the Blackboard Collaborate content area once the presentation starts.
-- Please run the audio set up wizard in the system to ensure your audio inputs and outputs are working properly.
-- When ready, you'll hear the session audio over your computer speakers or headset, and a microphone feature can be toggled on to speak to us (click talk to speak and click again to turn off your mic).
- Wednesday, September 26 2012--Exploring The Provenance of XSLTs
Session Leader: Ashley Clark
NOTE EARLIER TIME: 12:30pm-2:00pm
Location: 242 LIS
Description: When documents are transformed with XSLT, what methods can be used to understand and record those transformations? Though they weren't created for provenance capture, existing tools and informal practices can be used to manually piece together the provenance of XSLTs. However, a meta-stylesheet approach can generate provenance information by creating a copy of XSLT stylesheets with provenance-specific instructions. Even with the complications and limitations of the method, XSLT itself enables a surprising amount of provenance capture.
- Wednesday, September 12 2012--e-Research Roundtable Planning Session
Location: 242 LIS
Description: Please bring your ideas for sessions that you would like to see on the schedule for this year. This includes new topics and ideas, as well as previously suggested sessions that have not yet made it on the schedule, and updates to previous sessions.
- Wednesday, August 22 2012 -- SPECIAL NOON SESSION: NISO/DCMI Webinar on Metadata for Managing Scientific Research Data
Session Leaders: Webinar with Jane Greenberg and Thomas Baker
Location: 109 LIS
Description: CIRSS has registered for this NISO/DCMI webinar on Metadata for Managing Scientific Research Data led by Jane Greenberg and Thomas Baker, and we're pleased to offer group participation for anyone who would like to join us in room 109 LIS from noon to 1:30pm next Wednesday 22 August 2012. Further details from the NISO announcement follow below.
WEBINAR: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data
DATE: August 22, 2012
TIME: 12pm-1:30pm (revised to Central Time); locally hosted in room 109 LIS
EVENT WEBPAGE: http://www.niso.org/news/events/2012/dcmi/scientific_data/
ABOUT THE WEBINAR
The past few years have seen increased attention to national and international policies for data archiving and sharing. Chief motivators include the proliferation of digital data and a growing interest in research data and supplemental information as a part of the framework for scholarly communication. Key objectives include not only preservation of scientific research data, but making data accessible to verify research findings and support the reuse and repurposing of data.
Metadata figures prominently in these undertakings, and is critical for the success of any data repositories or archiving initiative, hence increased attention to metadata for scientific data -- specifically for metadata standards development and interoperability, data curation and metadata generation processes, data identifiers, name authority control (for scientists), Linked Data, ontology and vocabulary work, and data citation standards.
This NISO/DCMI webinar will provide a historical perspective and an overview of current metadata practices for managing scientific data, with examples drawn from operational repositories and community-driven data science initiatives. It will discuss challenges and potential solutions for metadata generation, identifiers, name authority control, Linked Data, and data citation.
Jane Greenberg, professor at the School of Information and Library Science (SILS) at the University of North Carolina at Chapel Hill and director of the SILS Metadata Research Center, is well known for research and writing on topics ranging from automatic metadata creation to metadata best practices, ontology research, Semantic Web, data repositories, thesauri, and scientific data curation. She has served as Principal Investigator or partner on a number of grants from the Institute of Museum and Library Services, National Science Foundation, and the National Institutes of Health, and actively participates in organizations such as the American Library Association, American Society for Information Science and Technology, and the Dublin Core Metadata Initiative. Jane is the recipient of the 2012 Margaret Mann Citation from the Association for Library Collections & Technical Services (ALCTS).
Thomas Baker, Chief Information Officer of the Dublin Core Metadata Initiative, has recently co-chaired the W3C Semantic Web Deployment Working Group and the W3C Incubator Group on Library Linked Data.
REGISTRATION [note, this section not relevant if you plan to join the CIRSS group in LIS 109 for the session]
Registration is per site (access for one computer) and closes at 12:00 pm Eastern on August 22, 2012. Discounts are available for NISO and DCMI members and students.
Can't make it on the webinar date/time? Register now and gain access to the recorded archive for one year.
Visit the event webpage to register and for more information: http://www.niso.org/news/events/2012/dcmi/scientific_data/
- Monday, 25 June 2012--Social Machines
Session Leader: David De Roure
Location: 242 LIS
Description: While supercomputing and cyberinfrastructure have taken us towards greater scales of computation, the increasing engagement of scholars and citizens with the Web has taken us to greater scales of social participation, and e-Science is now in the space where these come together – whether that be Citizen Science, Twitter analytics or Cloud applications. In fact we now think so much in terms of the sociotechnical system that it's interesting to ask whether we might redefine the fundamental notion of "machine" to be something intrinsically sociotechnical – i.e. "Social Machines". We are setting out to observe the Social Machines that are out there in order to seek insights into their classification and behaviour. Ultimately we hope to influence the design and construction of new Social Machines and observe them "in the wild", for which we need another sociotechnical system: the Web Observatory.
Biosketch: David De Roure is interim Director and Professor of e-Research in the Oxford e-Research Centre. He is National Strategic Director for Digital Social Research and has a coordinating role in Digital Humanities @ Oxford. Focused on advancing digital scholarship, he has worked closely with multiple disciplines including bioinformatics (in silico experimentation), chemistry (smart labs), environmental science (sensor networks), social sciences (social statistics, behavioural interventions and social machines) and digital humanities (computational musicology). He has an extensive background in distributed computing, Web, Linked Data and social computing, runs the myexperiment.org social website for sharing scientific workflows and promotes new forms of scholarly communication. David has been closely involved in the UK e-Science programme and is chair of the UK e-Science Forum. He is a champion for the Web Science Trust and in 2011 was elected as a Research Fellow at the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign. He is a Fellow of the British Computer Society and a Member of the Institute of Mathematics and its Applications.
- Wednesday, May 2 2012--Modeling User Searching Behaviors and Search Assistance Usage via Transaction Logs
Session Leaders: William Mischo, Mary Schlembach, Joshua Bishoff (Grainger Engineering Library, UIUC); Elizabeth German (University of Houston)
Archive: slides, audio
Description: In order to optimize search and discovery services, it is important to develop evidence-based models of user information seeking behaviors within distributed retrieval environments. While a large number of user information seeking studies have been performed, our knowledge of user searching patterns, particularly in online catalogs (OPACs), is incomplete and often contradictory. The University of Illinois at Urbana-Champaign Library has been collecting custom transaction log data from their main gateway interface and its underlying Easy Search (ES) federated search system since 2007. ES provides contextual and adaptive search assistance mechanisms that present the user with search modification and reformulation suggestions and perform additional target searches in the background. The Illinois team performed a detailed analysis of the project’s custom transaction logs collected over the Fall 2010 and Spring 2011 semesters. This analysis looked at approximately 1.4 million user searches and over 1.5 million user target clickthroughs. This analysis has revealed rich information on user search characteristics, search assistance usage, and user clickthrough actions. This transaction log analysis provides several implications for web-scale discovery system design.
Among the findings: Users of the Illinois gateway enter an average of 4.33 terms per search query – much higher than previous studies; 48.05% of the search sessions contain more than one search term or a combination of search terms and search assistance actions – also higher than other studies; and while 66% of all searches originate as default keyword searches, the percentage of known-item or specific title/author searches exceeds 51% of the search queries. Known-item searches are performed in almost 55% of the search sessions. In addition, the ES search assistance suggestions and custom links are well-accepted by users; in 32.45% of all search sessions and 58% of the sessions with more than a single search query, users employed one or more search assistance operations. The logs also revealed that users are entering complete or partial journal titles and then clicking through at a high frequency into an A-to-Z e-journal list link and that the exact phrase/title words added links shown in selected results displays are heavily used. Users click on the presented journal title link 21.41% of the time that they are suggested and in over 6.86% of all search sessions. In addition, the journal title search option tab constitutes over 12% of the searches within the gateway. The use of publisher e-book matches is also high -- with clickthroughs into all the e-book content targets totaling 9.31% of all result target clicks and taking place in 11.36% of all search sessions.
This study will be published shortly. The full citation and an advance copy are posted here, under Resources.
Resources: William H. Mischo, Mary C. Schlembach, Josh Bishoff, Elizabeth German, "User Search Activities within an Academic Library Gateway: Implications for Webscale Discovery Systems", in Planning and Implementing Resource Discovery Tools in Academic Libraries, edited by Mary Popp and Diane Dallis, IGI Global, 2012 (in press), 22 ms pages.
- Wednesday, April 25 2012--Bandits and Browsing: Data Mining and Network Analysis for Library Collections
Session Leaders: Harriett Green, English and Digital Humanities Librarian and assistant professor of library administration, University Library; Kirk Hess, Digital Humanities Specialist, University Library; Richard Hislop, Economics PhD candidate, University of Illinois at Urbana-Champaign
Archive: slides, audio
Description: Our project proposes to conduct network analyses and data regressions on a data set of 22 million items indexed in the University of Illinois Library catalog. Based on the network analyses resulting from this project, we will begin development for an enhanced recommender system for library catalogs and digital libraries that retrieves richer search results from a library collection search based on network analysis of subject relevancy, circulation data of items, and usage data for items that share interrelated subjects.
In order to build this test bed for the recommender system's algorithms and functionality, we are utilizing the advanced computing resources of XSEDE to develop self-optimizing search algorithms and network analyses that would run against the bibliographic and catalog data in the University of Illinois library catalog and digital library indexes.
We have created initial prototypes of search algorithms, topic analyses, and network analyses using the English literature collection's 40,000 item sample set. A core algorithm that we initially developed identifies items that are infrequently used, yet have a high degree of topical relevance to other heavily used works in a collection. Another search algorithm we have developed identifies subject relevancy between items in a library collection, through a multi-faceted approach incorporating subject heading correlations, user behavior in circulation transactions, and probability of circulation usage.
Based on these and other analyses conducted on the sample data set, we will test the scalability of these search algorithms and network analyses by expanding them to run against the full 22-million-item set of University of Illinois Library catalog data on the Blacklight cluster in XSEDE.
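The core idea described above (surfacing items that rarely circulate yet are topically close to heavily used works) can be illustrated with a minimal sketch. This is not the project's algorithm: the `Item` fields, the Jaccard overlap of subject headings as a stand-in for topical relevance, and the thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical catalog record: a title, subject headings, and a checkout count."""
    title: str
    subjects: frozenset
    checkouts: int


def jaccard(a, b):
    """Share of subject headings two items have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def underused_but_relevant(items, low_use=2, high_use=50, min_overlap=0.5):
    """Return (title, score) pairs for rarely used items that overlap popular ones.

    The thresholds are illustrative assumptions, not values from the project.
    """
    popular = [i for i in items if i.checkouts >= high_use]
    hits = []
    for item in items:
        if item.checkouts > low_use:
            continue  # only consider infrequently used items
        best = max((jaccard(item.subjects, p.subjects) for p in popular), default=0.0)
        if best >= min_overlap:
            hits.append((item.title, best))
    # Highest topical overlap first
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample = [
        Item("A", frozenset({"Poetry", "Romanticism"}), 120),
        Item("B", frozenset({"Poetry", "Romanticism", "Criticism"}), 1),
        Item("C", frozenset({"Botany"}), 0),
    ]
    print(underused_but_relevant(sample))
```

A real recommender would likely combine several signals, as the description notes (subject heading correlations, circulation behavior, and probability of use); the single overlap score here only shows the shape of the filter.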
- Wednesday, April 18 2012--Early Career Reuse of Quantitative Social Science Data
Session Leader: Ixchel Faniel, post-doc researcher at OCLC
Description: Large scale data reuse over the long term depends in part on uptake from early career researchers who are still learning the norms, conventions, and practices of their discipline. Yet we know less about the data reuse practices of these novice data consumers. Such knowledge is important for data repositories. This is particularly true for data repositories seeking the Trustworthy Repository Audit and Certification (TRAC) or the Data Seal of Approval that must demonstrate an understanding of their designated communities. This talk discusses preliminary findings describing how novice social science researchers make sense of quantitative social science data.
Biosketch: Ixchel joined OCLC Research in 2011 as a Post-Doctoral Researcher on Lynn Silipigni Connaway's team. In this position she is working on projects associated with the User Behavior Studies & Synthesis activities theme. Ixchel also has an interest in the Research Information Management activities theme. She is currently studying the reuse of university research in industry.
She is also Principal Investigator on the DIPIR (Dissemination Information Packages for Information Reuse) project, an IMLS-funded study. Along with Elizabeth Yakel at the University of Michigan (Co-Principal Investigator) and collaborators at The Inter-university Consortium for Political and Social Research, the University of Michigan Museum of Zoology, and Open Context, Ixchel is studying data reuse in three academic disciplines. The team plans to identify how contextual information about the data that supports reuse can best be created and preserved.
Prior to joining OCLC Research, Ixchel was an Assistant Professor at the School of Information, University of Michigan. She earned an MBA and a Ph.D. in Business Administration (Information Systems) from the Marshall School of Business, University of Southern California. She also has a B.S. in Computer Science from Tufts University and has worked at Andersen Consulting (now Accenture) and IBM.
- Wednesday, April 11 2012--Recap of Research Data Access and Preservation Summit 2012
Session Leaders: Karen Wickett, Andrea Thomer
Archive: slides, audio
Description: Wickett and Thomer will discuss the recent ASIS&T RDAP Summit held in New Orleans on March 22 and 23. We will cover the major data management themes raised at the summit, including interoperability, policy development, strategies for deploying services, data citation, and education.
- Wednesday, March 7 2012--Introduction to the Europeana Data Model: Framework & Requirements
Session Leader: Carlo Meghini, Institute of Information Science & Technology, Italy
Archive: audio, elluminate recording
Description: The Europeana Data Model (EDM) is the data framework within which all Europeana data is ingested, managed, and published. This talk describes the fundamental parts of EDM and the EDM requirements that data providers need to meet to successfully produce EDM compatible data. The talk will culminate with a discussion of how EDM can accommodate collection-level representations.
Bio: Carlo Meghini is a prime researcher at ISTI, working in the area of digital libraries and digital preservation. In the area of digital libraries, he has been involved in the DELOS Network of Excellence in Digital Libraries, contributing to the DELOS Reference Model for Digital Libraries; he participated in the FP6 Integrated Project BRICKS, which aimed to develop a distributed Digital Library Management System, and in the DL.org coordination action, and he has been involved in the making of Europeana since 2007, through the EDLnet, Europeana version 1.0, Europeana version 2.0 and ASSETS Best Practice Networks. In the area of digital preservation, he has been involved in the CASPAR project, an FP6 Integrated Project aiming at developing an OAIS-based architecture for preservation; he has also taught the OAIS Reference Model in several events organized by the CASPAR Project in conjunction with PLANETS and DPE Network of Excellence in Digital Preservation. For more information: http://www.nmis.isti.cnr.it/meghini/.
Suggested Readings: Europeana documentation, including Primer, is available at: http://version1.europeana.eu/web/europeana-project/technicaldocuments/
Paper: The Europeana Linked Open Data Pilot; Haslhofer, Bernhard and Antoine Isaac. Proc. Int'l Conf. on Dublin Core and Metadata Applications 2011.
- Wednesday, February 29 2012--National Parks, Biological Collections & Natural History Museum Informatics
Session Leader: Andrea Thomer, GSLIS MS student
Location: 242 LIS
Description: According to the 2008 IWGSC report, "Scientific Collections: Mission-Critical Infrastructure for Federal Science Agencies," object-based scientific collections contain billions of biological specimens that comprise a "vital research infrastructure" capable of supporting research in everything from climate science to agriculture to paleontology. However, this "infrastructure" is poorly funded, often poorly described, and dispersed throughout the country in collections at National Parks, in universities and in museums. Andrea spent last summer working at the Petrified Forest National Park (PEFO) as a Biological Science Technician and learning about this dispersed infrastructure first hand. In this presentation, she will present an overview of her work at PEFO and talk about some of her research interests in the description, meaning and content of natural history museum collections as they intersect with LIS.
Resources: National Science and Technology Council, Committee on Science, Interagency Working Group On Scientific Collections. (2009). Scientific Collections: Mission-Critical Infrastructure for Federal Science Agencies A Report of the Interagency Working Group on Scientific Collections. Washington, D.C. Retrieved from www.whitehouse.gov/sites/default/files/sci-collections-report-2009-rev2.pdf
- Wednesday, February 22 2012--An Introduction to the Campus Shared Computing Cluster
Session Leaders: Brynnen Owen & Jennifer Anderson, GSLIS IT
Location: 341 LIS
Description: Brynnen and Jennifer will provide an overview of the University of Illinois campus cluster computing resources (https://campuscluster.illinois.edu/), including the basics of using parallel computing and how to start jobs on the cluster. Brynnen and Jennifer are also happy to provide additional information and answer questions on other campus computing resources.
- Wednesday, February 15 2012--Exploring Problems of Data Mobility, Sharing and Reuse
Session Leader: Rob Procter, University of Manchester, UK (via Skype in room 242)
Archive: slides, audio
Location: 242 LIS
Description: The e-Research vision sees increased data re-use and sharing as key to future scientific advances. This talk explores some problems that may make achieving this difficult in practice. It draws on experiences in two related projects involving the creation and curation of an archive of digital mammograms and its use in training.
Bio: Rob Procter is Professor and Director of the Manchester eResearch Centre (MeRC) at the University of Manchester, having previously been Director of Research at NCeSS. Before that, he was leader of the Social Informatics Cluster, a multi–disciplinary research group within the School of Informatics, University of Edinburgh. His research focuses on socio–technical issues in the design, implementation, evaluation and use of interactive computer systems, with a particular emphasis on ethnographic studies of work practices, computer-supported cooperative work and participatory design.
- Wednesday, February 1 2012--GSLIS work force study requirements: the past, present and future
Session Leader: Cathy Blake
Archive: slides, audio
Location: 242 LIS
Description: There was (perhaps) a time when the training that you received during your first degree was sufficient to sustain your entire career. The information age has drastically altered the time-frame for re-tooling, particularly in the field of library and information science. The goal of this e-research roundtable is to identify GSLIS infrastructure requirements that will (a) provide a resource for GSLIS students to align their interests with positions and curricula and (b) enable longitudinal analyses of workforce needs and issues. Achieving this goal will require a collective GSLIS effort that includes faculty, staff, and students.
Dr. Blake will lead this working session by providing a framework that would enable us to leverage text mining for the initial activities. Faculty, staff, and students who have conducted work-force analyses are particularly welcome.
- Wednesday, November 30 2011--Mining Biomedical Multiple-Ontology Patterns
Session Leader: Samir E. AbdelRahman (with Cathy Blake)
Location: 242 LIS
Description: Ontologies provide a powerful way to organize information and understand the world in which we live. Despite their usefulness, articulating a complete and consistent ontology is difficult and time-consuming. Moreover, knowledge evolves and the effort required to keep an ontology current and thus relevant is often underestimated. Our goal in this project is to infer new ontological concepts based on an existing ontology and full text documents. In this presentation we focus on the Unified Medical Language System (UMLS), which maps biomedical concepts to surface level features in text (words). For example, the kidney cancer concept in the UMLS includes the phrases ‘cancer of kidney’ and ‘kidney cancer’, which can be framed as a text transformation X of Y = YX. We also explore word patterns and transformations between parent and child relationships in the UMLS. For example, kidney cancer and breast cancer are both children of cancer and take the form <body part> cancer. We report frequent patterns in the UMLS and preliminary results on text.
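The "X of Y = YX" transformation and the "<body part> cancer" pattern mentioned above can be sketched in a few lines. This is only an illustration of the idea, not the presenters' method: the regular expression, the function names, and the lower-casing are assumptions, and the sketch does not touch the UMLS itself.

```python
import re
from typing import Optional

# Hypothetical surface pattern "X of Y" (optionally "X of the Y").
OF_PATTERN = re.compile(r"^(?P<x>\w+(?: \w+)*?) of (?:the )?(?P<y>\w+(?: \w+)*)$")


def of_to_compound(phrase: str) -> str:
    """Rewrite 'X of Y' as 'Y X'; return the normalized phrase unchanged otherwise."""
    normalized = phrase.strip().lower()
    match = OF_PATTERN.match(normalized)
    if match is None:
        return normalized
    return f"{match.group('y')} {match.group('x')}"


def slot_filler(term: str, parent: str) -> Optional[str]:
    """For a child term of the form '<slot> <parent>', return the slot filler."""
    compound = of_to_compound(term)
    suffix = " " + parent.lower()
    if compound.endswith(suffix):
        return compound[: -len(suffix)]
    return None


if __name__ == "__main__":
    print(of_to_compound("cancer of kidney"))
    print(slot_filler("breast cancer", "cancer"))
```

Both surface forms of the kidney cancer concept normalize to the same string, and the slot filler ("kidney", "breast") is what a pattern such as <body part> cancer would generalize over.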
- Wednesday, November 9 2011--IMLS DCC and the Europeana Data Model: Convergences and next steps
Session Leaders: Carole Palmer, Katrina Fenlon, Allen Renear
Location: 242 LIS
Description: IMLS DCC is collaborating with Europeana, a massive international digital cultural heritage aggregation. This session will be a conversation about next steps toward meeting the challenges to interoperability between these two aggregations, particularly addressing the Europeana Data Model and its potential for accommodating IMLS DCC collections.
- Wednesday, October 19 2011--A Cross-Disciplinary Typology of Topical Relevance Relationships and Its Implications
Session Leader: Xiaoli Huang, Assistant professor, Business School, Sun Yat-sen University, China
Location: 341 LIS
Description: This presentation reports on a cross-disciplinary inquiry into topicality and relevance, involving an in-depth literature analysis and an inductive development of a faceted typology (containing 227 fine-grained topical relevance relationships arrayed in three facets and 33 types of presentation relationships). This inquiry reveals a large variety of topical connections beyond topic matching (the common assumption of topical relevance in the field), renders a closer look into the structure of a topic, and induces a generic topic-oriented information architecture that is meaningful across topics and domain boundaries. The findings from the analysis contribute to the foundation work of information organization, metadata development, intellectual access /information retrieval, and knowledge discovery.
- Wednesday, October 5 2011--The Prototype Open Emblem Book Portal: Leveraging the Emblem Community’s Spine Metadata Schema
Session Leaders: Tim Cole, Myung-Ja Han, Jordan Vannoy
Location: 109 LIS
Description: In 2003 Stephen Rawles (Glasgow University Centre for Emblem Studies) outlined an approach for creating metadata records for digitized emblem books in a Web-published paper entitled A Spine of Information Headings for Emblem-Related Electronic Resources. As compared to many other classes of retrospectively digitized texts, digitized emblem books offer added challenges for description. A genre of European literature popular between 1530 and 1750, emblems unite three elements: a motto, a picture, and poetry. These three components create puzzles that carry metaphors and messages for readers. Individual books may contain only a handful of emblems or may contain more than 1,000 emblems. To support scholarship, emblems (as well as emblem books) need to be discoverable, retrievable and citable individually, further complicating issues of descriptive granularity. Rawles’s paper became the foundation for the Spine metadata XML schema created by Thomas Stäcker of the Herzog August Bibliothek (Wolfenbüttel, Germany), with subsequent modifications and additions by Tim Cole and Myung-Ja Han. As part of a NEH/DFG funded grant project (Mara Wade, U.S. PI), Cole, Han, and Jordan Vannoy have created a functioning prototype of a new Open Emblem Book Portal. The new design leverages unique features of the Spine schema and is intended to be responsive to the evolving needs of the Emblem Studies community. Scholars expect more of digital libraries today than in years past. For digitized special collections materials such as our digitized emblem books collection, this has required the UIUC Library to reexamine our digital content processing workflows and rethink how we provide access to such digitized special collections content. For this project, our workflows have become more in keeping with standard Semantic Web and Linked Data principles, and now make use of globally-scoped, persistent and precise identifiers for our digitized emblem resources.
This roundtable will start off with a reprise of a 20-minute presentation given by Cole and Han at the recent triennial meeting of the Society of Emblem Studies in Glasgow. Cole, Han and Vannoy will then lead an in-depth discussion of the Spine schema and the design choices implemented in the Open Emblem Book Portal to date.
and Management of Knowledge: Renaissance Emblem Literature as a case study for the digitization of rare texts and images. 2004. Mara R. Wade (ed.), DigiCULT. Available online: http://www.digicult.info/downloads/dc_emblemsbook_lowres.pdf
Iconclass, a multilingual classification system for cultural content: http://www.iconclass.org/ and http://www.iconclass.org/rkd/9/
Sample metadata record in Spine and METS:
Spine metadata record
METS metadata record
- Wednesday, September 28 2011--The DLF-DCC Linked Data Prototype for the Digital Public Library of America
Session Leader: Richard Urban
Location: 242 LIS
Description: This ERRT session will review the efforts to translate IMLS Digital Collection and Content collection-level XML records into Linked Data.
• Mapping from XML to RDF/XML syntaxes
• Analyzing Collection-level metadata patterns using SIMILE Gadget
• Reconciling values against Linked Data Vocabularies, such as the LoC Thesaurus of Graphic Materials and Freebase Locations using Google Refine
• Providing different serialization formats using TALIS Morph
While much of the session will be devoted to pragmatic how-to topics, we will conclude by relating these activities to ongoing research agendas on collection-level description, collection development, and the logical forms of metadata records.
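The first step listed above, mapping collection-level XML into RDF, can be sketched with the Python standard library alone. Everything specific here is an assumption for illustration: the sample record, its `id` attribute, and the example.org URI are invented and do not reflect the actual IMLS DCC schema, and the sketch emits N-Triples rather than the RDF/XML syntax named in the session outline because N-Triples is simpler to produce by hand.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical collection-level record; not the real IMLS DCC format.
SAMPLE = """<collection xmlns:dc="http://purl.org/dc/elements/1.1/" id="http://example.org/collection/1">
  <dc:title>Example Photograph Collection</dc:title>
  <dc:subject>Photographs</dc:subject>
</collection>"""


def escape_literal(text):
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_to_ntriples(xml_text):
    """Turn each Dublin Core child element into one N-Triples statement."""
    root = ET.fromstring(xml_text)
    subject = root.attrib["id"]  # assumed to hold the collection's URI
    triples = []
    for child in root:
        # ElementTree expands prefixes to "{namespace}localname"
        if child.tag.startswith("{" + DC + "}") and child.text:
            local = child.tag.split("}", 1)[1]
            literal = escape_literal(child.text.strip())
            triples.append(f'<{subject}> <{DC}{local}> "{literal}" .')
    return triples


if __name__ == "__main__":
    for line in record_to_ntriples(SAMPLE):
        print(line)
```

Plain string literals are where a reconciliation step would come in: a value such as "Photographs" could later be replaced by the URI of a matching vocabulary term.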
- Wednesday, September 14 2011--No ERRT this week
ERRT members may be interested in attending the following GSLIS event instead:
History Salon: The Database and its Discontents
Location: 131 LIS
In this salon, Bonnie Mak will share some thoughts about the curious slippage between facsimile and fact in the database. More information will be posted at: http://www.lis.illinois.edu/events/2011/09/14/history-salon.
Description: Overlooked in the excitement about the possibility of “digitizing everything” is the day-to-day manual —and often menial —labour demanded of volunteers, student workers, devoted enthusiasts, scholars, and professionals. Theirs is important work, and supports the lofty promises to save the cultural heritage of the world. This paper sheds light on some of the very human efforts that underpin the production of digitally-encoded materials. Drawing upon Bruno Latour and Steven Woolgar’s work on the social construction of scientific fact, the paper scrutinizes the processes by which information is generated and marketed in the digital environment. An analysis of materials from the database, “Early English Books Online,” among others, will help lay bare the dynamics by which the status of transcriptions and facsimiles of texts and books shifts from interpretation to fact as they are recontextualized and remediated online. By investigating the production and circulation of digitally-encoded materials with this critical lens, the paper seeks to develop a richer understanding of information and knowledge in the twenty-first century.
- Wednesday, September 7 2011--Meditations on the Logical Form of a Metadata Record
Session Leaders: Allen Renear, Richard Urban, Karen Wickett
Location: 109 LIS
Description: Open linked data and semantic technologies promise support for information integration and inferencing. But taking advantage of this support often requires that the information currently carried by ordinary "colloquial" metadata records be made explicit and available for computer processing. Given the fairly simple structured nature of metadata records, this looks easy to do. It turns out, though, that it is not at all easy. A number of very fundamental puzzles arise, some of them related to identifier elements, others to knowledge representation in general. Although related problems have been studied here at GSLIS for some time, the current systematic development is largely new -- its first exposure was just a few weeks ago as a "Late Breaking" report at "Balisage: The Markup Conference" (Montreal). It is also very much a work in progress (with suspected flaws) and so this is an invitation to participate in evolving this account of what metadata records really are, and how they do what they do.
- Wednesday, August 31 2011--ERRT Planning Session
Session Leader: Carole Palmer
Location: 242 LIS
Description: This ERRT meeting will be a planning session. Please bring your ideas for sessions that you would like to see on the schedule for this year. This includes new topics and ideas, as well as previously suggested sessions that have not yet made it on the schedule, and updates to previous sessions.
- Wednesday, May 25 2011--Linked Open Data for Libraries
Session Leader: Richard Urban
Archive: Audio (mp3)
Location: 341 LIS
Description: Richard Urban will participate in the Linked Open Data for Libraries, Archives, and Museums Summit in June. This ERRT will introduce attendees to the Linked Data movement and the W3C Linked Library Data Incubator. At the Summit, Urban will be leading a discussion about LOD and current approaches for sharing metadata for cultural heritage collections through the Open Archives Initiative - Protocol for Metadata Harvesting.
W3C Linked Library Data Incubator
Tim Berners Lee - Linked Data
Haslhofer, B. & Schandl, B. (2010) Interweaving OAI-PMH data sources with the linked data cloud. International Journal of Metadata, Semantics and Ontologies 5(1), pp. 17-31
- Wednesday, May 11 2011--Europeana Data Model
Session Leaders: Katrina Fenlon and Peter Organisciak
Archive: Audio (mp3)
Description: With more than 10 million items, Europeana is Europe's largest aggregation of digital cultural heritage resources from libraries, archives, and museums. This session will explore the Europeana Data Model, a new proposal for structuring the data that Europeana will be ingesting, managing and publishing. The Europeana Data Model is designed to replace the Europeana Semantic Elements (ESE), the basic data model that Europeana began life with. Each of the different heritage sectors represented in Europeana uses different data standards, and ESE reduced these to the lowest common denominator. EDM reverses this reductive approach and is an attempt to transcend the respective information perspectives of the sectors that are represented in Europeana – the museums, archives, audiovisual collections and libraries. EDM is not built on any particular community standard but rather adopts an open, cross-domain Semantic Web-based framework that can accommodate the range and richness of particular community standards such as LIDO for museums, EAD for archives or METS for digital libraries.
Europeana Data Model primer
- Wednesday, May 4 2011--What's in a name? Problems with Relationships in FRAD
Session Leader: Liza Coburn
Archive: Audio (mp3)
Location: 341 LIS
Description: FRAD (Functional Requirements for Authority Data), a product of the FRANAR working group, is an extension of FRBR. The goal of a project undertaken last fall by University of Illinois Library Senior Coordinating Cataloger Qiang Jin and GSLIS student Liza Coburn has been to explain FRAD through entity-relationship diagramming, the way that Robert Maxwell did with FRBR (FRBR: A Guide for the Perplexed, 2009).
Along the way they have discovered some problems with the FRAD model, and these problems will be the focus in this session of ERRT. In a deviation from the usual ERRT format, Coburn will present a brief introduction to FRAD (with the hope that participants will be able to review the model documentation on their own, ahead of time) and the project, the problems encountered, and then will open it up for discussion to see what we can come up with.
Functional Requirements for Authority Data (FRAD) A Conceptual Model. Final Report. December 2008 IFLA Working Group on Functional Requirements and Numbering of Authority Records (FRANAR).
- Wednesday, April 27 2011--Units, measures, and physical quantities in WolframAlpha
Session Leader: Michael Trott, Content manager for physics at Wolfram|Alpha
Description: All quantitative measurement values come with units (like meters, kilograms, pascals, volts, ...). In addition to the modern SI, there are thousands of different units in use, sometimes for historical, sometimes for geographic reasons. Recognizing units and converting between them is very important for dimensional calculations, data statistics, and more. The structure of the unit system of Wolfram|Alpha and the statistics about the use of units will be discussed.
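As a small illustration of why recognizing units matters for dimensional calculations, the sketch below converts between units by way of an SI base unit and refuses conversions across dimensions. It says nothing about how Wolfram|Alpha structures its unit system; the table layout and function name are assumptions, and only a handful of units with exactly defined factors are included.

```python
# Hypothetical unit table: symbol -> (dimension, factor to the SI base unit).
UNITS = {
    "m": ("length", 1.0),
    "km": ("length", 1000.0),
    "ft": ("length", 0.3048),    # international foot, exact by definition
    "kg": ("mass", 1.0),
    "lb": ("mass", 0.45359237),  # avoirdupois pound, exact by definition
}


def convert(value, src, dst):
    """Convert value from unit src to unit dst, checking that dimensions agree."""
    src_dim, src_factor = UNITS[src]
    dst_dim, dst_factor = UNITS[dst]
    if src_dim != dst_dim:
        raise ValueError(f"cannot convert {src_dim} ({src}) to {dst_dim} ({dst})")
    # Go through the SI base unit: src -> base -> dst
    return value * src_factor / dst_factor


if __name__ == "__main__":
    print(convert(1, "km", "m"))
    print(convert(10, "lb", "kg"))
```

Routing every conversion through one base unit per dimension keeps the table linear in the number of units, which matters when thousands of historical and regional units are in play.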
- Wednesday, April 13 2011--Enabling Long-Term Access to Born-Digital Materials on CD-ROMs: Migration, Emulation, and Imperative to Pool Technical Knowledge
Session Leader: Geoffrey Brown, Professor of Computer Science at the School of Informatics and Computing, Indiana University
Description: For the past 20 years, CD-ROMs have been the primary media for distributing key economic, scientific, environmental, and societal data as well as educational and scholarly work. Indeed, tens of thousands of titles have been published, including thousands distributed by the United States and other governments. Yet no viable strategy has been developed to ensure that these materials will be accessible to future generations of scholars. In the short term, these materials are subject to physical degradation which will make them ultimately unreadable and, in the long term, technological obsolescence will make their contents unusable.
The diaries of H.R. Haldeman, Richard Nixon's chief of staff, were published in their entirety on CD-ROM, but only in abridged form on paper. References by Haldeman to Mark Felt, who was unveiled as the Watergate source, appear only on the CD-ROM version. This CD-ROM no longer operates in modern Windows environments, but can be accessed, with some effort, in an emulation environment. In other cases, the files on a CD-ROM can still be accessed, but may be in obsolete formats. Finally, many publications of government agencies are available only for local use in a few libraries.
I will discuss two aspects of our work in digital preservation: the creation of a browsable networked archive of the approximately 5000 CD-ROMs published by the United States Government Printing Office and the development of emulation technologies to enable future scholars ready access to materials such as the Haldeman diaries.
The goals for this roundtable are to discuss the limits of the available technological solutions, the social implications of their implementation, and the legal constraints on deploying them.
Kam Woods and Geoffrey Brown. Creating Virtual CD-ROM Collections
Stuart Granger. "Emulation as a Preservation Strategy".
Copyright Law Section 108
- Wednesday, March 30 2011--The Digital Public Library of America Initiative: Considering Content and Scope
Session Leader: Carole Palmer (Director of CIRSS)
Archive: Audio (mp3), Slides
Description: The Digital Public Library of America (DPLA) initiative began in December 2010 with support from the Alfred P. Sloan Foundation. This ERRT session will provide an overview of the initiative and the first working meeting held at Harvard on March 1st on content and scope issues. We will discuss themes that emerged from the meeting and questions about Europeana as a model for DPLA and possible roles for our Digital Collections and Content project and the inclusion of the aggregation in DPLA.
DPLA Wiki: Please review the Content and Scope section, and the workshop links, in particular. Workshop participants are listed here.
See also a recently released Concept Note, an outcome of the March 1st meeting.
See the main DPLA website at the Berkman Center for Internet & Society at Harvard for additional information and context.
- Wednesday, March 16 2011--Progress Report: Revisiting the Dublin Core 1:1 Principle
Session Leader: Richard Urban
Archive: Audio (mp3), Slides
Description: The Dublin Core 1:1 Principle exhorts metadata creators to create descriptions that describe one, and only one, resource. But how is it that metadata describes anything at all, let alone one and only one thing? This session will explore how traditional puzzles about description and reference help us understand 1:1 Principle violations.
MILLER, S. The One-To-One Principle: Challenges in Current Practice. International Conference on Dublin Core and Metadata Applications, North America, 0, sep. 2010. Available at: http://dcpapers.dublincore.org/ojs/pubs/article/view/1043. Date accessed: 05 Mar. 2011.
Ludlow, Peter, "Descriptions", The Stanford Encyclopedia of Philosophy (Spring 2011 Edition), Edward N. Zalta (ed.), forthcoming.
- Wednesday, March 2 2011--Disciplinary Culture And Interoperability, An Incompatible Mix?
Session Leader: Carl Lagoze, Associate Professor of Information Science at Cornell University
Archive: Slides, Audio (mp3)
Description: Interoperability: A key enabler of cyberinfrastructure development is the ability to discover and deploy functionality that leverages commonalities amongst the practices of scientists in diverse fields, thereby allowing data sharing and other collaborative activities amongst them. However, there is good evidence from the literature that individual disciplinary cultures are deeply embedded and based on a number of factors including the nature of the research, the economic value of the research products, and in some cases dysfunctional, historically-based path dependencies. Our own work examining the research practices and collaborative patterns of chemists and physicists has shown strong evidence of this.
Designers and researchers of cyberinfrastructure are faced with two unpleasant alternatives. Ignore aspects of these disciplinary idiosyncrasies and possibly create cyberinfrastructure that its target communities resist. Or, accommodate these differences by creating lowest common denominator cyberinfrastructure that fails to provide sufficient functionality to really facilitate new scientific practices.
These are some of the questions we face in the Data Conservancy project, which is funded by the National Science Foundation to research, prototype, and possibly develop new cyberinfrastructure for data curation. I certainly don't know the answers to these questions and look forward to a stimulating discussion on the best way to approach this problem.
- P.N. Edwards, S.J. Jackson, G.C. Bowker, and C.P. Knobel, Understanding Infrastructure: Dynamics, Tensions, and Design, National Science Foundation, 2007.
- C.L. Palmer and M.H. Cragin, Scholarship and disciplinary practices, Annual Review of Information Science and Technology, vol. 42, 2008, pp. 163-212.
- T. Velden and C. Lagoze, Communicating Chemistry, Nature Chemistry, vol. 1, 2009.
- T. Velden, A.-ul Haque, and C. Lagoze, A new approach to analyzing patterns of collaboration in co-authorship networks: mesoscopic analysis and interpretation, Scientometrics, Apr. 2010.
- Wednesday, February 16 2011--Technology's positive impact on the cultural heritage of Native American tribes
Session Leader: Biagio Arobba
Description: At this session, Biagio Arobba will introduce his background in semantic middleware and Native American communities, discuss his heritage (and answer any questions), and explain his interest in social media, the Web, and mobile devices and why he believes they will help Native American communities with culture, language, and heritage preservation.
Semantic middleware, originally developed for e-science, has the potential to be transformational for Native American communities. We all know (or assume) that Native American languages are disappearing. You might be surprised to learn that over half of the pre-colonial Native American languages in the United States are still spoken today, but that number is changing dramatically. Many Native American people are concerned with the disappearance of their spoken languages, and there is a desire for ... something ... to help the people in Native American communities, and local government organizations, increase fluency among their peers.
There are both problems and opportunities. For example, many places in the United States are resistant to multi-lingual education. Then, working with local government and tribal agencies can be a nightmare. On the other hand, tribes in the United States have far better access to digital media and the Internet than would a community in the Amazon rain forest. Additionally, Native American children in either tribal communities or communities with relatively high Native American populations are drawn to social media, gaming, and mobile devices. The majority of elderly and young parents want the digital age for today's generation.
Also, there is a great deal of research on, and many methods for, teaching major world languages, but many of these same techniques aren't quite right for smaller minority language communities. In recent years, a growing number of tools have been popping up across the Internet (possibly because more attention is being paid to minority languages, or simply because computers, best practices, and the Internet have reached the necessary critical mass to make this possible). In his work, Mr. Arobba looks for any way to reduce the need for reinventing the database for every application, to reduce time-to-deployment, and to make user interfaces easier for everyday users.
Arobba, B., R.E. McGrath, J. Futrelle, and A.B. Craig, "A Community-Based Social Media Approach for Preserving Endangered Languages and Culture" In: "The Changing Dynamics of Scientific Collaborations" workshop at 44th Hawaii International Conference on System Sciences, January 3, 2011.
Live and Tell
- Wednesday, January 26 2011--Report on IDCC 2010
Session Leaders: Tiffany Chao, Liza Coburn, Simone Sacchi, Nic Weber, Laurence Cook, and Trevor Munoz
Description: These student participants in the IDCC 2010 will report out on the conference, summarizing the workshops and the other conference sessions that they attended.
- Wednesday, January 19 2011--Briefings from the front: RDA testing at GSLIS
Session Leaders: MJ Han and Kathryn La Barre
Archive: audio, slides
Description: MJ Han and Kathryn La Barre will be discussing preliminary results of the recently concluded RDA test practicum, and the experiences of the 3 instructors, 5 library faculty and 8 students who participated in the test. We will pay particular attention to MJ's experience creating RDA/Dublin Core records.
If you want to know more about the test of RDA please view the first slides from this presentation. The later slides offer comparisons between the existing code AACR2 and RDA, changes to MARC, and preparation strategies.
Judith Kuhagen & RDA: Resource, Description and Access Essentials
- The U.S. National Libraries RDA Test Plan
- Critical differences between AACR2 and RDA
- Changes to MARC21
- How to best prepare yourself, your colleagues, and your library
Download presentation with speaker notes (ppt 3 MB)
RDA bibliography (doc)
- Wednesday, December 1 2010--The Impact of Massive Data on Astronomy
Session Leader: Robert Brunner (Astronomy)
Description: As we tackle ever more difficult questions, Astronomy is evolving from a data-poor to a data-rich scientific discipline. In this presentation, I will discuss the questions we are trying to address, introduce the projects and data that are being (or soon will be) produced, and present some of the challenges and opportunities that we now face.
The Post-Singularity Future Of Astronomy: Astronomy could be the first discipline in which the rate of discovery by machines outpaces humans' ability to interpret it
We regret that the recording of Robert's session was lost due to technical difficulties.
- Wednesday, November 17 2010--Working with Supplementary Materials - data, software, scripts - to Dissertations and Theses
Session Leader: Sarah Shreeves
Description: Illinois has allowed electronic deposit of theses and dissertations since 2009 and is now mandating such deposit with Fall 2010. We now allow deposit of supplementary materials alongside these ETDs and all of this material will appear in IDEALS. I will discuss the range of materials we're seeing and how we are approaching some of the stewardship issues. We can also discuss more generally the successes and obstacles to the ETD program.
The IDEALS collection for theses and dissertations
Illinois Graduate College Thesis Office Page
- Wednesday, November 10 2010--Europeana Semantic Elements: Supporting cross-domain, European metadata exchange
Session Leader: Katrina Fenlon
Description: Europeana is Europe's multimedia, on-line library/museum/archive: an ambitious aggregation of digital resources from all heritage sectors in all 27 European Union member states. This presentation will introduce the Europeana Semantic Elements (ESE) version 3.3, the Dublin Core-based metadata set underlying the portal. ESE supports cross-domain metadata provision to the current version of the Europeana aggregation. The presentation will also give a very brief introduction to the Europeana Data Model, the Semantic Web-based data model intended to replace ESE as the standard for description and exchange in the next release of the Europeana portal.
Semantic Elements Specification, Version 3.3, 19/07/2010.
Metadata Mapping & Normalisation Guidelines for the Europeana Semantic Elements, Version 2.0, 19/07/2010.
- Wednesday, October 20 2010--Linked Data Issues
Session Leader: Joe Futrelle
Archive: audio, prep notes
Description: Joe has been planning this session on Linked Data Issues based upon questions and issues submitted by ERRT members. It promises to be a very interesting session.
- Wednesday, October 13 2010--Describing artifacts: A look at the concepts of CDWA
Session Leader: Peter Organisciak
Archive: audio, slides
Description: The description of artwork and other material culture carries with it a unique set of challenges. CDWA, represented in XML with the CDWA-Lite schema, is one framework for classifying such artifacts. We will look at the features of CDWA and discuss the principles that inform it. Finally we may consider varying definitions of art itself and the implications that the act of classification brings.
Baca, Murtha. (Ed.) (2006) Cataloging cultural objects: a guide to describing cultural works and their images. Available as e-book from http://www.library.illinois.edu/
J. Paul Getty Trust. Categories for the Description of Works of Art. Available at http://www.getty.edu/research/conducting_research/standards/cdwa/index.html
- Wednesday, October 6 2010--Glimpses of future research practice: A musical study
Session Leader: David De Roure (Professor of e-Research, Oxford e-Research Centre, Oxford University)
Archive: audio, slides, emo
Description: Ten years ago we saw a few early adopters of e-Science technology; now we see acceleration of research through broader adoption and sharing of tools, techniques and artifacts, both for 'big science' and the 'long tail scientist'. Will this incremental trend continue or are we seeing glimpses of a phase change ahead, where researchers harness these emerging digital capabilities to address research questions in ways that simply were not possible before? This talk will draw on examples in music information retrieval and linked data from the NEMA and SALAMI projects, together with glimpses of research from the myExperiment social website, to suggest we are now moving into the next (and very exciting!) phase of research practice.
- Wednesday, September 29 2010--Dispatches From the Field (Part 2)
Session Leaders: Liza Coburn, Aaron Collie, Tracy Popp, Lynn Yarmey
Description: GSLIS MS and CAS students will be presenting summaries of their data curation internship work. Each presentation will be followed by a brief roundtable discussion.
- Wednesday, September 22 2010--I Think Therefore I Am Someone Else: Understanding the confusion of granularity with Continuant/Occurrent and related perspective shifts
Session Leader: Jim Myers
Description: Over the past few years, there has been a broad effort to define common requirements for provenance, to outline real-world use cases, to define core models of provenance, and to assess interoperability of existing systems. In these discussions, there has been recognition that there are a variety of levels of granularity and a variety of types of processes for which provenance is a critical enabler. Further, there has been a recognition that many use cases of interest require integration of provenance information across these dimensions. To a large extent, the issues involved in such integration have been viewed as simple matters of aggregation, i.e. requiring concepts such as "collections" of artifacts and composite processes. However, the need for constructs such as agents (as in the Open Provenance Model) hints at deeper issues related to the concepts of identity and distinctions between continuant and occurrent (or endurant and perdurant respectively), and of versions and replicas. This work develops a set of concrete examples where such issues arise in provenance, discusses the core conceptual distinctions involved, and postulates a basic mechanism for extending provenance models to enable integration across granularities and process types, recognizing the OPM "agent" concept as a special case.
Galton, A., Mizoguchi, R.: The water falls but the waterfall does not fall: New Perspectives on objects, processes, and events, Applied Ontology 4 71-107 (2009)
Grenon, P., Smith, B., "SNAP and SPAN: Towards Dynamic Spatial Ontology", Spatial Cognition & Computation: An Interdisciplinary Journal, Vol. 4, No. 1. (2004), pp. 69-104.
- Wednesday, September 15 2010--ERRT Planning Session
Session Leaders: Carole Palmer, Kevin Trainor
Description: Please bring your ideas about roundtable sessions that you would like included on the schedule. This includes completely new ideas, as well as previously suggested sessions that have not made it onto the schedule.
- Wednesday, September 1 2010--Dispatches From the Field (Part 1)
Session Leaders: Naomi Bloch, Ana Lucic, Trevor Munoz, Dana Muvceski, Gina Reis, Karen Wickett
Description: GSLIS MS, CAS, and PhD students will be presenting summaries of their data curation internship work and conference workshop participation. Each presentation will be followed by a brief roundtable discussion.
- Wednesday, April 28 2010--BibApp Project
Session Leader: Sarah Shreeves
- Wednesday, April 14 2010--The Claim Framework (Part 2)
Session Leader: Cathy Blake
Description: [see Part 1 description]
- Wednesday, March 10 2010--The Claim Framework (Part 1)
Session Leader: Cathy Blake
Description: Massive increases in electronically available text have spurred a variety of natural language processing methods to automatically identify relationships from text; however, existing annotated collections comprise only bioinformatics (gene-protein) or clinical informatics (treatment-disease) relationships. This paper introduces the Claim Framework that reflects how authors across the biomedical spectrum communicate findings in empirical studies. The Framework captures different levels of evidence by differentiating between explicit and implicit claims, and by capturing under-specified claims such as correlations, comparisons, and observations. The results from 29 full-text articles show that authors report fewer than 7.84% of scientific claims in an abstract, thus revealing the urgent need for text mining systems to consider the full text of an article rather than just the abstract. The results also show that authors typically report explicit claims (77.12%) rather than observations (9.23%), correlations (5.39%), comparisons (5.11%) or implicit claims (2.7%). Informed by the initial manual annotations, we introduce an automated approach that uses syntax and semantics to identify explicit claims automatically and measure the degree to which each feature contributes to the overall precision and recall. Results show that a combination of semantics and syntax is required to achieve the best system performance.
Beyond genes, proteins, and abstracts: Identifying scientific claims from full-text biomedical articles
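For readers unfamiliar with the precision and recall measures mentioned in the description above, the following Python sketch shows how they are commonly computed when automatically identified items are compared against manual annotations. It is a generic illustration, not the evaluation code from the paper; the function name and the claim identifiers are hypothetical.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for predicted items against a gold standard.

    precision = correct predictions / all predictions
    recall    = correct predictions / all gold-standard items
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical claim identifiers: what a system extracted vs. what annotators marked.
    system_claims = {"c1", "c2", "c3"}
    annotated_claims = {"c2", "c3", "c4", "c5"}
    p, r = precision_recall(system_claims, annotated_claims)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Running the same comparison with one feature removed at a time is one common way to estimate how much each feature contributes to the two scores.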
- Wednesday, February 24 2010--Data is the network: Link or die
Session Leader: Joe Futrelle
Description: In a world dominated by social networking and wireless communication, most scientific information remains stubbornly locked up in specialized databases, repositories and domain-specific applications. New strategies are needed to free all of this information from the rigid containers, frameworks and work processes in which it is born and increasingly dies. Can data be organized as an active, evolving, open network of heterogeneous concerns and affordances, free of the control of any single software agent or framework? Joe Futrelle will describe promising new opportunities making data radically portable and worthy of long-term preservation and access, drawing on several projects in the "semantic grid," e-science and digital preservation communities.
RDF FAQ; Linked Data; Science Commons; Audio (.mp3); Slides
- Wednesday, January 27 2010--Joint Metadata Roundtable and E-Research Roundtable
Session Leaders: Oksana Zavalina; Kevin Trainor
Description: An informal discussion of our members' current research interests. This ERRT/MDRT joint planning meeting will take place on Wednesday, January 20, from 12:30 to 2:00 pm in LIS341 (ISRL Fishbowl) on the third floor of the GSLIS building.
- Wednesday, October 28 2009--Metadata for a web 2.0 software marketplace
Session Leaders: John Unsworth, Loretta Auvil
Description: The Mellon Foundation is interested in supporting the sharing of web services and academic software widgets, and they would like SEASR (the NCSA software environment for advancement of scholarly research) to be able to keep track of whose web services, software widgets, etc. are being used, by whom, in order that some system of professional credit and/or a system of exchange of value could be developed across the universities whose faculty and staff contribute to the system. This has a near-term practical possibility of implementation, as part of Project Bamboo (http://projectbamboo.org/).
- Wednesday, October 7 2009--NIF Resource Registry and Ontology
Session Leader: Anita Bandrowski (NIF, UCSD)
Description: The Neuroscience Information Framework (NIF) has a resource registry of over 2200 resources that include software tools, databases, atlases, services, teaching tools and other things that we deemed "interesting to neuroscientists". The main classes of metadata will be discussed including the data model and NIF's resource ontology, recently harmonized with the Biomedical Resource Ontology.
slides; audio; reading 1; reading 2; reading 3
- Wednesday, September 16 2009--Introduction to the Neuroscience Information Framework (NIF)
Session Leader: Anita Bandrowski (NIF, UCSD)
Description: NIF is a dynamic inventory of web-based neuroscience resources, data, and tools accessible via any computer connected to the Internet. An initiative of the NIH Blueprint for Neuroscience Research, NIF advances neuroscience research by enabling discovery and access to public research data and tools worldwide through an open source, networked environment.
audio; slides; NIF website; NIF Federated Access Article
- Wednesday, July 1 2009--Using Pliny to Annotate Digital Resources
Session Leaders: Tim Cole (UIUC Library); Yan Wang (GSLIS student)
Description: For our first roundtable related to the new Open Annotation Collaboration Mellon-funded grant project, we will examine John Bradley's PLINY annotation tool. In particular we will discuss how and to what extent PLINY can be used to perform some of the scholarly functions described in Renear, Allen H.; DeRose, Steve J.; Mylonas, Elli; van Dam, Andries (1999) _An Outline for a Functional Taxonomy of Annotation_. Yan Wang and Tim Cole will lead the discussion which will include demonstrations of PLINY.
reading 1; reading 2; reading 3; reading 4; reading 5
- Wednesday, June 10 2009--Science and Sceptics: blogs, climate science and reproducible research
Session Leader: Dave Nichols
Description: The scientific consensus on global warming is well known. Less widely known is the sceptical online community that attacks diverse aspects of this consensus. Irrespective of the validity of their criticisms of the science, their activities involve many interesting aspects of knowledge work and public policy, including: reproducibility of research, policies of academic journals, citizen science, freedom of information and scientific work practices.
slides; audio; reading 1; reading 2
- Wednesday, May 27 2009--What Defines a Data Community?
Session Leaders: Carole Palmer; Melissa Cragin
- Wednesday, May 6 2009--Open Provenance Model
Session Leaders: Jim Myers; Joe Futrelle
Description: The discussion will include a general overview of the technical scope of the Open Provenance Model, the international community and "Provenance Challenge" activities driving its development, and NCSA's provenance management technologies. While OPM has been driven primarily by scientific workflow interests, NCSA's interest is broader; the discussion will also include OPM's potential value in electronic notebooks/electronic records, community model validation and reference data development, 'active' curation, and long-term preservation.
audio; OPM definition; Provenance Challenge Wiki; Whitepaper
- Wednesday, April 29 2009--SEASR Analytics via Zotero
Session Leaders: Loretta Auvil; Michael Welge
- Wednesday, April 15 2009--NASA EOS Data Levels and Traditional Text Editing
Session Leader: Allen Renear