Technical Program: Symposium Summaries

Global Initiatives in Research Data Management and Discovery

The topic of research data is a major feature of current conversations about science and scholarly communication. Governments and funders are promoting policies and mandates that aim to ensure that research data is properly managed and openly shared. Doing so aids in the validation and reproducibility of results and creates opportunities for reuse of data across disciplines, which, in turn, advances science. The principles and benefits of data sharing are easy to understand, but putting them into practice presents many challenges. These are not specific to chemistry and are shared by other disciplines across the globe.

The CINF symposium in San Diego "Global Initiatives in Research Data Management and Discovery" brought together thought leaders and practitioners from a variety of scientific domains who are engaged in tackling research data challenges on a global basis. This two-day symposium highlighted the experiences of scientists, international and national scientific data organizations, tool vendors, and service professionals. They brought their perspectives on issues of globalization, standards, community practice, technical infrastructure, and cultural shift. One general sentiment that emerged was that chemistry touches every area of science and that initiatives in this arena can both benefit from and contribute to collaborations among a wide variety of stakeholders.

Global organizations and international initiatives represented included the Research Data Alliance; CODATA and the World Data System, both of which were established by the International Council for Science (ICSU); DataCite; FORCE11; and the International Union of Pure and Applied Chemistry (IUPAC). Attendees also heard about disciplinary initiatives, including NIF (neuroscience), the Resource Identification Initiative (biomedicine), and DataONE (earth sciences). Recurrent themes revolved around data publication, interdisciplinary interoperability, persistent identifiers, certifying trust, machine actionability, community engagement, cooperation among initiatives, training, and service support.

Examples of scientific community standards of practice in depositing and managing collections of chemistry-related data featured a number of proven data systems, including PubChem, the National Institute of Standards and Technology Thermodynamics Research Center (NIST/TRC), the Cambridge Crystallographic Data Centre (CCDC), the STRENDA and MIRAGE community standards developed by the Beilstein Institute, and the CODATA Nanomaterials project. An emerging theme was the expressed need for best practices for establishing and sustaining guidelines and for engagement with the research community around data reporting and citation: could this be a role for the global initiatives?

Perspectives on digitally managing scientific data and practically addressing some of these issues “where the rubber meets the road” were presented by a group of researchers, including Henry Rzepa, Simon Coles, Stuart Chalk, and John Kitchin; tool developers, including Mestrelab and Dotmatics; and research service providers, including the California Digital Library (CDL), the Royal Society of Chemistry (RSC), and the Pistoia Alliance. They emphasized development of capabilities and supporting services for data analysis before, during, and after experimental documentation, including downstream scientific use of the data. Such practices involve publishing ‘live’ data; supporting compilation and integration at scale; and supporting workflows that allow individual scientists to manage their data at capture, assembly, analysis, and publication, as well as subsequent organization, delivery, discovery, and functionality for reuse of the data by the broader community.

Many tractable solutions are emerging across this diverse scientific community. The challenges facing academia are fairly similar to those addressed by industry and government, and there is precedent for worldwide, cross-sector collaboration on common issues. Is it time for an international chemistry data interest group to tackle the digital data challenges particular to chemistry standards and community practice and to unpack questions about which data and metadata can be contributed to the pre-competitive, collective chemistry data pool for the benefit of the global enterprise? Are there lower-barrier, higher-impact opportunities for improving usability of chemical data in both research and business, such as machine-actionable and extensible chemical identifiers and machine-accessible patent documentation? Do we need to change the chemistry research and reporting culture to enable researchers to share data, or align funder expectations and publishing workflows to scientific workflows? The newly formed RDA Chemistry Research Data Interest Group (CRDIG or DIG Chemistry) aims to connect and promote these conversations through symposia, workshops, networking, and other venues around the world.

The "Global Initiatives in Research Data Management and Discovery" symposium established the context for the San Diego meeting’s program track on chemistry data, which included many other data-related symposia that covered the impact of funder policies on the research data and publication landscape, the challenges of incompatibilities in scientific data, opportunities for linking chemical and biological data, and the role of semantic technologies in enabling richer representation of chemistry data and knowledge. The topic of data management is at once timely and timeless for the field of chemistry, and the conversation is set to continue in several venues, including at the ACS national meeting in Philadelphia, PA, in August 2016, and during International Data Week in Denver, CO, in September 2016.

Ian Bruno
Cambridge Crystallographic Data Centre
ibruno@ccdc.cam.ac.uk
Leah McEwen
Cornell University
lrm1@cornell.edu

Chemistry, Data, and the Semantic Web: An Important Triple to Advance Science
(or Challenges and Opportunities in Chemical Knowledge Management)

I (Stuart Chalk) agreed to organize this symposium because it was exactly the area I am interested in, the representation of chemical knowledge using semantics and ontologies. I came up with the title on the spur of the moment and had not even thought about exactly whom I would invite. Luckily, I started talking with Stephen Boyer from IBM about the workshop he and Evan Bolton were planning for October 2015 in Basel, and it wasn’t long before Evan was on board as co-organizer. It turns out that Evan knows some people.

We invited more than 50 speakers to participate in the symposium, and in the end 43 speakers presented over three days in San Diego, organized into the six sessions below.

  • Tuesday morning: Chemical Classification
  • Tuesday afternoon: Chemical Information
  • Wednesday morning: Informatics Application
  • Wednesday afternoon: Knowledge Representation Evolution
  • Thursday morning: Informatics Evolution & Use
  • Thursday afternoon: Ontology Evolution & Use

Looking back on it now, I am still floored by how many experts in their respective areas contributed, by the quality and rigor of the sessions, by the wonderful audience questions (during the sessions and at the breaks), and by the size of the audience. We had audiences of up to 80 for some sessions, and there were even 40 on the last day (Thursday). Since the end of the conference, there have been many requests to do more symposia like this at future meetings, and we are certainly working on that.

It would be impossible to report on each and every talk in the six sessions that made up the symposium, so we will not try here. We are planning to write up an editorial with contributions from many of the speakers, so there will be more to read about the perspectives of the presenters and the outcomes of the symposium, but it is appropriate to report the “pain points” that the speakers and the audience contributed. Many folks took photos of the pages we had up on the walls of the conference room (that is how important the issues were), so below is a cleaned-up list of the issues articulated.

  • Access Issues
    • accessing good data that's not published
    • data exchange standards, protocols, and best practices
    • data locked in text
    • extracting data from literature (time consuming)
    • interoperability (systems, data sets)
    • machine learning for data extraction
    • quality of search results
    • querying across resources (global search across heterogeneous data sets)
    • specificity versus generality
    • too many results affect search granularity (consequence of limited metadata?)
  • Audience Issues
    • answering complicated questions
    • context of user: perspective on data is important and impacts usage and needs
    • speaking the same language: chemistry is different from biology
    • targeting data and metadata to specific users
    • wide range of users with different backgrounds
  • Chemical Structure Representation Issues
    • dealing with large structures
    • different chemical entities in the same crystal
    • identifying parts of large structures (partial IDs? InChI parts?)
    • organic compound stereochemistry
    • protein mapping issues
    • representation of inorganics
    • representation of stereo centers, and cis/trans isomers
    • structural representation of organometallics
    • symmetry
  • Community Issues
    • agreement on best practices for interoperability (of data, data systems, data formats)
    • approaches to normalization
    • community agreement on ... everything...
    • community standards change
    • community standards don’t change fast enough
    • incentives for generating metadata
    • policies for making systems open
    • where are the data and metadata resources?
  • Data Issues
    • bioactivity multiplexing
    • biological data different from chemical data
    • cannot interpret data
    • context of chemical data (Complete? Accurate? Usable?)
    • data formats not correctly used
    • data standardization
    • data without context
    • finding what we need to find
    • help with annotation
    • identifier mapping (across different identifier systems)
    • integrating legacy identifiers (important for historical data)
    • interoperability
    • make open data really open
    • metadata of experiments
    • normalization of data (crosswalking to standard format)
    • obvious data gaps
    • quality of data (trust in data)
    • scale of data; too many data (difficult to find what you need quickly and accurately)
    • share your data, please! (including “dark data,” the data that “did not work”)
    • structural multiplexing (equivalent drug forms cause problems in analysis)
    • the data is always changing (better versioning needed?)
    • time dependent identifiers and classifications
    • what are best sources for specific pieces of data?
    • units: clean-up needed
    • units normalization important for data interoperability
  • Ontology/Vocabulary Issues
    • consistent mapping of terms
    • coverage of ontologies (where are the gaps?)
    • creating an ontology: coming to agreement on terms
    • don’t make terms too specific (start by creating more generic terms that can be broadly used)
    • finding the (domain) experts to develop ontologies
    • gaps in terminology
    • harmonization and reaching consensus (of terms)
    • how do we deal with ontology evolution? (new terms added, some terms deprecated)
    • linking chemical terminology to biological terminology
    • ontology convergence (where there are multiple ontologies in the same domain)
    • stability of ontologies (are they being actively managed?)
    • user friendly vocabularies
    • vocabulary clean-up (who does it?)
    • (we need to encourage scientists to) invent ontologies for areas where there are none
    • what ontologies are out there? knowing which ontologies to use (we need a list of ontologies and coverage)
    • who has authority/responsibility to coordinate ontology development? (IUPAC)
  • Tools to Help Data and Metadata Capture Issues
    • best systems? (for storing, accessing data)
    • better tools that help data curation (i.e., research data plus metadata)
    • consistency of identifiers (naming)
    • curation of lists difficult
    • effective GUI for data input
    • globally synchronized data (across multiple sites); availability to third parties
    • improvements needed in machine learning
    • limiting options for users so they make sensible choices (provide contextual enumerated lists)
    • no feedback on tools/usability
    • old code and legacy systems

To wrap up, if you missed the symposium, you missed an invigorating, stimulating tour de force of talks addressing all the issues around data and how to glean knowledge from it. We truly believe this symposium, and what follows on from it, will be the catalyst for a lot of activity that will end up changing how we store and represent chemical data and knowledge. Bring it on!

Stuart Chalk
University of North Florida
schalk@unf.edu

From Data to Prediction: Applying Structural Knowledge in Drug Discovery & Development

This session brought together a mixture of presentations with a central theme: namely, the authors had a particular interest in using structural data from either 2D or 3D sources to tackle key problems in the drug design arena. The presenters were a mix of end-users of data and providers of data and applications.

3D Methods in Drug Discovery & Development

The session began with a presentation by Marcel Verdonk, who used PDB data to analyze the relative likelihood of atoms being involved in protein binding, given their degree of exposure or flexibility. He highlighted how such information could be useful in ligand design applications. The talk was interesting, very current, and challenged the audience to consider how such information could be used in future to produce better scoring functions.

Matthew Segall (Optibrium) then gave an exposition of how the StarDrop platform can be useful in drug design programs by providing a powerful system for visualizing and understanding data from both 2D and 3D sources, and Christian Lemmen (BioSolveIT) presented an analysis of how their platform, in particular, Hyde scoring, can aid and abet the interpretation of protein structures, specifically focusing on solvent molecules.

Matthias Rarey (University of Hamburg) presented recent work on ASCONA and SIENA, methods for protein binding site alignment and protein ensemble selection respectively. The methods were shown to be useful in tackling complex alignment and selection problems where current sequence-alignment-based methods generally fail.

Tobias Brinkjost, a third-year PhD student from Oliver Koch’s group at the Technical University of Dortmund, bravely presented his PhD work using secondary structure information to facilitate searching for ligand-sensing cores. His graph-based methods allow very fast comparisons of pockets, not driven by sequence, but based on secondary structure assignment. Tobias showed that his methods could predict cases of non-homologous cavity similarities.

On a different tack, Colin Groom of the Cambridge Crystallographic Data Centre (CCDC) presented the results of the recent CCDC Crystal Structure Prediction blind test and described how structural knowledge from the CSD can be used to aid and abet predictions of crystal structures, a field that still challenges computational chemists. He highlighted how this technology is coming of age and may now be becoming more relevant to the pharmaceutical industry, particularly in drug development.

2D Methods in Drug Discovery & Development

2D methods were illustrated by a number of speakers during the day. Susanne Stalford from Lhasa showed how collaborative efforts between pharmaceutical companies can help in risk assessments associated with mutagenic impurities. Valery Polyakov showed how better random forest algorithms could be applied to large QSAR data sets to derive more meaningful models. In particular, Valery and co-workers have carefully assessed ‘novelty’ to show the true worth of an underlying model, and have defined methods for picking the best training sets to improve the prospective outcomes of models generated. The results appeared promising; it was surprising (at least to the chair) how much change one could observe in the value of a model by removing seemingly small biases in the underlying training data.

Data Provisioning

Barry Bunin showed how the Collaborative Drug Discovery (CDD) Vault was aiding collaborative efforts between research groups across the globe, with a number of example projects. He explained how data sharing and data security are not necessarily mutually exclusive.

We had an excellent presentation from Marian Brodney. She has been deeply involved at Pfizer with integrating the disparate data from many projects and many sources in a useful and coherent platform. This presentation really highlighted the challenges faced by large pharma to produce a coherent system that serves a large community with very disparate needs. It also highlighted the breadth of data sources that an information provider has to consider in a large organization.

Finally, despite some technical difficulties, Asta Gindulyte showed how we all could be making far better use of Google for finding chemically relevant results. The presentation included a demo, where Asta created an on-the-fly search engine to search PubMed in a more targeted way. The session chair will certainly be trying to create his own customized engines in the future!

Jason Cole
Cambridge Crystallographic Data Centre
cole@ccdc.cam.ac.uk

Driving Change: Impact of Funders on the Research Data and Publications Landscapes

In the last few years, there has been a renewed emphasis from funders worldwide on making research results more widely available. In the United States, the Office of Science and Technology Policy (OSTP) released a memorandum in 2013 requiring federal agencies to develop public access plans for federally funded research. Likewise, the Research Councils UK (RCUK) has an open access (OA) policy for the publication of peer-reviewed articles, and Canada recently introduced the Tri-Agency policy, which mandates OA to articles funded by the country’s major research agencies. Some private foundations, such as the Gates Foundation, also have OA policies for publications and data.

Roughly coinciding with the third anniversary of the OSTP memorandum, the ACS Division of Chemical Information held the symposium “Driving Change: Impact of Funders on the Research Data and Publications Landscapes” to explore how these new policies and requirements are shaping scholarly communications. Organized by Andrea Twiss-Brooks (University of Chicago) and Elsa Alvaro (Northwestern University), the symposium was part of the CINF program of the 251st ACS National Meeting, and took place on Tuesday, March 15, 2016. The goal of the symposium was to foster a conversation between the stakeholders involved in scholarly communications and to discuss the challenges and opportunities posed by these new policies and requirements.

The symposium opened with talks outlining the public access plans of several US federal agencies. Neil Thakur discussed NIH’s public access policy, which requires all peer-reviewed journal articles arising from NIH funds to be posted to PubMed Central (http://www.ncbi.nlm.nih.gov/pmc/). Dr. Thakur explained the different submission methods that currently exist and how to track compliance at the institutional level. Carly Robinson of the Department of Energy (DOE) explained how public access to DOE publications happens through the Public Access Gateway for Energy & Science (PAGES, http://www.osti.gov/pages/). PAGES has centralized metadata, but it has decentralized distribution of full-text articles and manuscripts, relying instead on a partnership with publishers (CHORUS; http://www.chorusaccess.org/) or, in the cases where publishers do not provide public access, a link to the author-submitted, full-text, accepted manuscript. Dr. Robinson also talked about digital research data management plans and the DOE Data ID Service (http://www.osti.gov/home/doe-data-id-service). Leah McEwen of Cornell University described NSF’s open data policy and her participation in a series of workshops aimed at understanding the community’s views on public access to research data in the Mathematical and Physical Sciences (MPS) directorate.

These initial presentations on federal agencies’ requirements were followed by talks from different stakeholders, including librarians, publishers, researchers, and tool-makers. Sharon Kipphut-Smith of Rice University and Betty Rozum of Utah State University described an ongoing project aimed at understanding how libraries and research offices are supporting compliance with federal mandates. According to their results, institutions are leveraging existing resources, and there is a general recognition that campus collaboration is important. Ho Jung Yoo of UC San Diego described how the University of California is facilitating public access to scholarly publications and how the Research Data Curation Program (http://libraries.ucsd.edu/services/data-curation/) of UC San Diego Library is supporting public access to research data.

Judy Ruttenberg of the Association of Research Libraries (ARL) talked about SHARE (http://www.shareresearch.org/). SHARE’s mission is to build a free, open data set about research and scholarly activities across the research life cycle. Ms. Ruttenberg gave an update on the progress of SHARE Notify and discussed Phase II of SHARE, which involves expanding the number of providers and enhancing the metadata. ARL’s partner for the development of SHARE is the Center for Open Science (COS). Sara Bowman of COS talked about the Open Science Framework (OSF; https://osf.io/), which is a Web application to manage the research lifecycle, including data archiving and dissemination. Dr. Bowman also gave an overview of the Transparency and Openness Promotion (TOP; https://osf.io/9f6gx/) Guidelines and their adoption by different journals.

Darla Henderson of ACS and Ann Gabriel of Elsevier represented the publishers’ perspective. Darla Henderson discussed ACS Open Access strategy (http://acsopenaccess.org/), which encompasses the OA journals ACS Central Science and the new ACS Omega, and the programs ACS Editors’ Choice, ACS AuthorChoice, and ACS Author Rewards. Ann Gabriel described Elsevier’s efforts to comply with the new mandates; their new content types, including data, software, methods articles, and more; and the importance of supporting new workflows (https://www.elsevier.com/about/open-science).

Jeremy Frey (http://www.southampton.ac.uk/chemistry/about/staff/jgf.page) of the University of Southampton addressed the challenges posed by the new landscape from the perspective of the UK researcher. He discussed how this new situation is placing new obligations on the researchers who are securing funding but also is opening up new opportunities.

Bringing the perspective of the data center and database, Amy Sarjeant explained how the Cambridge Crystallographic Data Centre (CCDC; http://www.ccdc.cam.ac.uk/) is supporting researchers who wish to comply with mandates. Dr. Sarjeant outlined the principles of data management, as well as its challenges, including maintaining quality, attribution, funding acknowledgments, and flexibility for the future.

Maryann Martone talked about FORCE11 (https://www.force11.org/), which is a grassroots movement aimed at accelerating the pace of scholarly communications through technology, education, and community. The membership of FORCE11 includes all stakeholders of scholarly communication, including publishers, scholars, tool builders, and librarians.

Finally, the tool-makers’ perspective was represented by Dan Valen of Figshare (https://figshare.com/) and Kortney Capretta of Altmetric (https://www.altmetric.com/). Mr. Valen emphasized that Figshare is building tools that align with funders’ requirements to support publishers, researchers, and institutions storing and sharing research outputs. Ms. Capretta introduced the approaches that Altmetric is using to measure and to track the impact of research.

Elsa Alvaro
Northwestern University
elsa.alvaro@northwestern.edu

Andrea Twiss-Brooks
University of Chicago
atbrooks@uchicago.edu

Linking Big Data with Chemistry: Databases Connecting Genomics, Biological Pathways and Targets to Chemistry

This symposium was concerned with linking disparate sources of database information, primarily linking chemical molecule (drug or toxin) bioactivity and structural information with biological pathway (disease) information and genomic information. There is an increasing amount of genomic information, and with growing interest in “personalized medicine,” there is significant emphasis on correlating genomic information, disease-causing mutations, and biological pathway information with protein targets and drug development. In the past few years, there has been significant growth in the number of databases connecting and correlating this type of information, including pathway and network-based processes, prediction of toxicity and bioassay information, and target linkage to disease. This symposium addressed these issues with a group of researchers representing the major database developers on the cutting edge of correlating this type of information: the Cambridge Crystallographic Data Centre, NCBI’s PubChem, Elsevier, EMBL-EBI (ChEMBL), IUPHAR/BPS, and others.

Ian Bruno from the Cambridge Crystallographic Data Centre (CCDC) spoke regarding “Connecting 3D Chemical Data with Biological Information” and discussed the CRESTANO project, a Common REST API for Structural Annotation. This is a collaboration between the PDB (Protein Data Bank) and the CCDC to improve the small-molecule information within the PDB. He also discussed the CCDC’s APIs and linking crystal structures to PubChem’s data, as only 8% of the structures in the Cambridge Structural Database are currently present in PubChem. The Cambridge Structural Database will also have links to ChemSpider and OpenPHACTS, which will help answer questions regarding the biological pathways with which an entry in the Cambridge Structural Database is associated. There are now Biovia Pipeline Pilot components based on the Cambridge APIs.

The talk by Yanli Wang of NCBI, “PubChem BioAssay: Link Chemical Research to GenBank and Beyond,” discussed the chemical structure and connectivity information available within PubChem. She discussed data types, metadata, and linking the chemistry data in PubChem to genomics, to the NIH Molecular Library Small Molecule Chemical Probes, and to the biological pathways from KEGG and MedGene. Using NCBI’s tools, it is now possible to map BioAssay data to target structures. Assay targets can be aligned to structure, and it is possible to visualize the binding site and interactions between the ligand and protein structure. RNAi BioAssay records, gene target IDs, and kinase selectivity profiling assays are included. There are APIs available to link to the ChEMBL ontology, genes, and IUPHAR and KEGG data. PubChem BioAssay data can also be linked to Entrez Gene to verify gene function with RNA data.
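To give a flavor of the kind of programmatic access described above, the sketch below uses PubChem’s public PUG REST interface to resolve a compound name to a CID and retrieve its BioAssay summary table. This is only a minimal illustration, not the workflow presented in the talk; the aspirin query is arbitrary, and the endpoints are assumed to follow the standard documented PUG REST pattern.

```python
# Minimal sketch of linking chemistry to bioassay data via PubChem PUG REST.
# Assumes network access; endpoints follow the documented PUG REST pattern.
import requests

PUG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def name_to_cid(name):
    """Resolve a compound name to its first PubChem CID."""
    r = requests.get(f"{PUG}/compound/name/{name}/cids/JSON", timeout=30)
    r.raise_for_status()
    return r.json()["IdentifierList"]["CID"][0]

def assay_summary(cid):
    """Fetch the BioAssay summary table (CSV) for a compound CID."""
    r = requests.get(f"{PUG}/compound/cid/{cid}/assaysummary/CSV", timeout=60)
    r.raise_for_status()
    return r.text

if __name__ == "__main__":
    cid = name_to_cid("aspirin")            # illustrative query compound
    print(f"PubChem CID: {cid}")
    print(assay_summary(cid).splitlines()[0])  # print the CSV header row
```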

James Rinker (Wuxi Business Development) discussed “Predicting Adverse Drug Events Using Literature Based Pathway Analysis,” based on work that he had done while at Elsevier. Elsevier offers Pathway Studio and Reaxys as tools to provide mechanism-based information regarding drug toxicity and targets, including factors influencing adverse events. Pathway Studio can be used for text mining; James presented a text-mining example on JAK kinase family inhibition, relating the biological target to failed compounds and adverse events. Reaxys can be used to understand target families and to build a network for a target or target class in pathway studies. The literature can be scanned for possible adverse events related to the target with these tools. James gave an example, scanning the literature for possible adverse events related to target modulation by JAK2.

Chris Southan of the International Union of Basic and Clinical Pharmacology (IUPHAR) discussed “Intersecting Different Databases to Define the Inner and Outer Limits of the Data Supported Druggable Proteome,” in which he covered the curated druggability database sources: BindingDB, ChEMBL, DrugBank, and the IUPHAR Guide to Pharmacology. UniProt chemistry cross-references collate these curated sources, which have complementary selectivity. The IUPHAR/BPS Guide to Pharmacology has curated quantitative interactions between 1300 protein targets and 6000 ligands. The NIH launched the Illuminating the Druggable Genome (IDG) program and a public-private partnership to unlock the untargeted genome, which spearheaded advances in this work.

“Applications of Drug-Target Data in Translating Genomic Variation into Drug Discovery Opportunities” was the topic of a talk by Anna Gaulton (EMBL-EBI), in which she discussed linking ChEMBL druggability and drug-target data with results of genome-wide association studies to facilitate drug discovery and repurposing. She discussed using the combined information to identify non-active-site pockets within drug targets, broadening their druggability, and using drug-target information for the design of new genotyping arrays around the druggable genome.

“How Can Genomic Databases Be Linked to Chemical Structural Information?” was the topic of the talk given by Rachelle Bienstock, in which she focused largely on a discussion of the Cancer Genome Atlas project and the NCI cancer genomics cloud pilot projects. Three project awards were given by NCI to develop cloud pilots: one to the Broad Institute, one to the Institute for Systems Biology (ISB), and one to Seven Bridges Genomics. The objective of these cloud projects is to develop tools to navigate the genomic information, to link the molecular basis of cancer to cancer-target drug discovery, and to use the Cancer Therapeutics Response Portal to correlate specific genomic characteristics of tumors with drug treatment outcomes.

Robin Haw (OICR, Canada) spoke about “Reactome Pathway Knowledgebase: Connecting Pathways, Networks and Disease” and presented the Systems Biology Graphical Notation, a pathway browser, and protein and chemical structures with external data linkages to ZINC and ChEMBL. The Reactome knowledgebase (http://www.reactome.org) is an open-access, public-domain bioinformatics resource with reaction and biological pathway information. The resource visualizes interactions of gene products and supports the application of bioinformatics tools to find patterns in genomic data sets. ReactomeFIViz is a Cytoscape application that uses functional interactions (FIs) to combine pathway and network analysis to identify genes. An example of clustering and annotating The Cancer Genome Atlas breast cancer mutations was shown.

Huijun Wang of Merck discussed the “Competitive Intelligence Workbench,” a tool developed and used within Merck to integrate multiple sources of data through project dashboards. “Using Systems Biology in Computational Drug Design Workflows” was the topic of the presentation by George Nicola from Afecta Pharmaceuticals, which is a drug repurposing company. Afecta uses KNIME workflows to enumerate derivatives from patents and generates a combinatorial library of analogues and fingerprints to profile compounds and screen them within a virtual library.

In “Combining Semantic Triples across Domains to Identify New and Novel Relationships and Knowledge,” Michael Clark concluded the session by presenting some of the Elsevier Pathway Studio tools to link chemical compound data with biological pathway information.

In summary, the symposium provided state-of-the-art information regarding database searching and linking of diverse chemistry activity and structure information with bioactivity, biological pathway, and genomics information and the use of workflow tools and APIs to integrate and use the data effectively.

Rachelle J. Bienstock
rachelleb1@gmail.com

Reimagining Libraries as Innovation Centers: Enabling, Facilitating & Collaborating throughout the Research Life Cycle

This symposium on March 16, 2016, had an exciting variety of presentations focused on collaborative work between libraries and researchers. Some of the topics included new services in libraries, the future of libraries, scholarly communication trends, 3D printing, data visualization, researcher profiles, and chemical safety.

In the morning, Jeremy Garritano (University of Virginia; formerly University of Maryland) first presented “Expanding the Research Commons Model into Disciplinary Instances.” Jeremy discussed the ongoing development of learning and research commons from one small space to multiple-location spaces enhanced by virtual portals at the University of Maryland Library. The renaming of the physical space to Research Commons eliminated silos and facilitated collaborations among a variety of disciplines. The virtual presence not only provided integrated library services but also enabled campus partnerships with the Division of Research, IT, and the Library through a virtual portal, Integrated Research Resources on Campus (IRRoC). The challenges of establishing the research commons included no consensus on key terminology, such as research versus search, steep learning curves with grant-funded research for some collaborators, and communication obstacles across units.

Jeremy Frey (University of Southampton) discussed “Libraries for the Future: A Digital Economy Perspective.” Jeremy started with a comparison between the traditional view of the library and the current and future roles of librarians and libraries. Researchers turn to libraries for guidance in efficient and accurate access to information, in handling data, and in using information and data. However, libraries can no longer handle the scale required to be the sole quality control center for information. To address this issue, librarians should continue curating information, while teaching other people to curate information, as well. In addition, librarians can help establish amongst researchers a mindset of sharing research, starting from the very beginning of the research process. Libraries will continue to be the focal point of universities and provide connectivity to all.

Kiyomi Deards (University of Nebraska-Lincoln) presented “Leveraging the Interdisciplinarity of Chemistry: Building Interdisciplinary Collaborations.” Kiyomi introduced the SciPop Talks! outreach program and other efforts to create collaborations with Nebraska STEM literacy programs. Ensuring speaker-audience engagement and good publicity was key to the success of the science outreach program, SciPop Talks! She also demonstrated how librarians could be involved in statewide initiatives to facilitate STEM literacy, such as organizing a statewide survey and retreats for supporters to create an opt-in STEM directory and hosting it on the sustainable Web site of a statewide nonprofit organization.

Ye Li (University of Michigan, U-M) presented “Predicting Local Trends in Scholarly Communication for Decision-Making in Collection Development: An Exploration Beyond Citation Analysis.” Ye analyzed a large set of research articles (61,269) from U-M researchers working on chemistry-related topics as an aid to understanding the research community and developing collections. The bibliographic data were retrieved from the Michigan Expert system provided by Elsevier. The analysis revealed the top journals in which U-M chemists choose to publish, reflecting campus research focus and journal scale, and provided quantitative evidence of the broad distribution of chemistry-related research across campus. The network plot of publications and research themes could help identify units with which to consult for decision-making about a specific journal and identify the best journals to recommend to local researchers working in a specific field. Ye also classified journals using time-series clustering in an attempt to predict the journals in which U-M chemists would tend to publish more in the future. The current research suggests increased publishing in PLoS ONE, as the growth of publications in this open access journal has been extremely large in recent years.
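As a rough illustration of what time-series clustering of journals can look like, the sketch below groups journals by the shape of their yearly publication-count trajectories using k-means. The journal names and counts are invented for demonstration; this is not the data or analysis from the talk.

```python
# Illustrative sketch: cluster journals by their yearly publication-count trajectories.
# The data below are invented; the real analysis used U-M bibliographic data.
import numpy as np
from sklearn.cluster import KMeans

# rows = journals, columns = publication counts per year (e.g., 2010-2015)
counts = {
    "Journal A": [12, 15, 18, 22, 27, 33],   # steadily growing
    "Journal B": [40, 38, 39, 37, 36, 35],   # flat or slightly declining
    "Journal C": [2, 4, 8, 16, 30, 55],      # rapid growth
    "Journal D": [30, 28, 29, 31, 30, 29],   # flat
}
names = list(counts)
X = np.array([counts[n] for n in names], dtype=float)

# Normalize each trajectory so clustering reflects trend shape, not journal size.
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for name, label in zip(names, labels):
    print(f"{name}: cluster {label}")
```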

Vincent Scalfani (University of Alabama) presented “Academic Technologies: A New Library Service to Offer Advanced Software Training.” Vin introduced the partnership between the library and other campus divisions in offering robust software training. All science librarians share the responsibility of delivering software training for tools such as MS PowerPoint, MS Excel, QtiPlot, Adobe InDesign, ChemDraw, 3D printing, MATLAB, etc. Vin took opportunities to fold cheminformatics skills and promotion of library resources into the training as well, including preparing scripts in MATLAB to interact with chemistry databases through APIs and performing text analysis with MATLAB and Mathematica. The major challenges for the library in hosting the software training were access, licensing, maintenance of software, impact measures, and time limitations.
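As a small taste of the text-analysis side of such training, the sketch below computes term frequencies from a couple of sample abstracts. It is written in Python rather than MATLAB or Mathematica purely for illustration, and the example texts and stopword list are made up.

```python
# Rough Python analogue of a simple text-analysis training exercise
# (the original workshops used MATLAB and Mathematica).
import re
from collections import Counter

abstracts = [
    "Semantic annotation of chemical data improves discovery and reuse of data.",
    "Open chemical data and shared ontologies support reproducible research.",
]

STOPWORDS = {"of", "and", "the", "a", "an", "in"}

def term_frequencies(texts):
    """Lowercase, tokenize on letters, drop stopwords, and count terms."""
    tokens = re.findall(r"[a-z]+", " ".join(texts).lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

print(term_frequencies(abstracts).most_common(5))
```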

Amy Sarjeant (Cambridge Crystallographic Data Centre) presented “Enhanced Chemical Understanding through 3D-printed Models.” Amy introduced how the partnership among CCDC, librarians, and faculty could help develop 3D prints of molecules to facilitate research and learning. Amy showcased several 3D prints of mechanically interlocked molecules, molecules with rotatable bonds, and molecules with flexible bonds, all of which are much easier to appreciate with physical 3D models in hand. Amy also discussed the challenges in converting the 3D models exported from the Cambridge Structural Database System into an appropriate format for a local 3D printer and how librarians could help in the process, aside from providing easy access to 3D printers.

In the afternoon, Danielle Bodrero Hoggan (The Scripps Research Institute) presented “Leveraging the VIVO Research Networking System to Facilitate Collaboration and Data Visualization.” VIVO is a Web platform research networking system initially developed at Cornell University that supports searching, browsing, and advanced visualizations. Danielle and her colleague, Michaeleen Trimarchi, used the VIVO networking system to create profiles for Scripps faculty members. Several interactive graphics were created, including a co-author network and a network of scientific discipline areas. The Science of Science (Sci2) Tool was used to plot the VIVO data into networked visualizations that showed faculty-department collaborative connections.

Grace Baysinger (Stanford University) presented on “Stanford Profiles Created to Support the University’s Scholarly Community.” Stanford Profiles is a Web platform that allows faculty, students, staff, and post-docs to create personal profiles highlighting their research, teaching, and publications. The system auto-populates names, appointments, courses, advisor names, and publication citations. Users are able to search Stanford Profiles via name and keyword queries for research area, thus making communication and collaboration easier. Stanford Profiles also enables users to easily generate a CV, and Stanford University developers to access the data via an API.

Linda Galloway (formerly of Syracuse University; now at Chapman University) presented on “Managing Researchers’ Reputations throughout the Research Life Cycle.” Linda discussed the changing landscape of scholarly communications and how discovery of literature is changing (e.g., discovery via social media). Social networks such as Academia.edu, LinkedIn, Mendeley, and ORCID are becoming very popular with researchers, as they allow control over shared materials and help to promote scholarship, but Linda also noted some drawbacks to social media platforms: users must monitor them frequently, be aware of copyright policies, and understand their metrics. Finally, best practices for managing social media profiles and accounts were highlighted.

Leah McEwen (Cornell University) presented on “Anatomy of the Chemistry Research Enterprise in the Academic Sector: Serving the Underserved in a Large Research Institution.” Leah talked about creating a digital data culture. Interestingly, much of the chemistry data is still typed manually and organized by researchers (e.g., NMR shifts, mass spectral data). Workflows and systems should make this process easier and more efficient for researchers. How can service groups (e.g., instrument labs, health and safety, and libraries) support researchers with digital management of data? Different levels of chemistry data management were discussed, including highly curated post-publication data, at-publication data, and pre-publication laboratory data. Lastly, several case studies were discussed highlighting potential methods of supporting chemistry data management. The future will likely need to focus on methods that improve accuracy of data collection, verification of data, streamlined report generation, and increased communication.

Ralph Stuart (Keene State College) presented “The Safety Use Case for Chemical Safety Information.” Limitations in finding chemical safety information were discussed, including the use of generic safety information in textbooks followed by the notation “see MSDS,” outdated links on Wikipedia, and no clear description of the rationale for choosing a particular source. Ralph discussed a project on which he is working, comparing the safety information in Wikipedia to PubChem using the following methodology: locating chemicals in the PubChem Laboratory Chemical Safety Summary (LCSS), converting the PubChem CIDs into InChIKeys, and using these InChIKeys to compare the LCSS to Wikipedia data. Future directions include studying the Wikipedia chembox structure, developing a Wikipedia-PubChem link for safety information, and evaluating which chemical safety data should be included within the Wikipedia chemboxes.
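One step of that methodology, converting PubChem CIDs into InChIKeys so they can be matched against Wikipedia entries, can be scripted against PubChem’s PUG REST property lookup roughly as follows. This is a hedged sketch of that single step, not Ralph’s actual code, and the example CIDs are arbitrary.

```python
# Sketch of one step in the comparison workflow: convert PubChem CIDs to InChIKeys
# using the standard PUG REST property lookup (not the speaker's actual code).
import requests

def cids_to_inchikeys(cids):
    """Return a {CID: InChIKey} mapping for a list of PubChem CIDs."""
    url = ("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/"
           + ",".join(str(c) for c in cids) + "/property/InChIKey/JSON")
    r = requests.get(url, timeout=30)
    r.raise_for_status()
    props = r.json()["PropertyTable"]["Properties"]
    return {p["CID"]: p["InChIKey"] for p in props}

if __name__ == "__main__":
    # 2244 (aspirin) and 887 (methanol) are illustrative CIDs.
    print(cids_to_inchikeys([2244, 887]))
```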

Yanli Wang (NCBI, NLM, NIH) presented “PubChem BioAssay: Grow with the Community.” Yanli discussed the current PubChem BioAssay tools and how PubChem has grown in depositions since 2005. Usage of the PubChem database and citations to PubChem also continue to grow. Next, Yanli discussed the benefits of data sharing, such as increased discoverability, value-added integration, data management, and building collaborations. Finally, the PubChem data submission tool was discussed, as well as the increasing number of journal publishers requiring deposition of data into PubChem.

Ye Li
University of Michigan
liye@umich.edu
Vin Scalfani
University of Alabama
vfscalfani@ua.edu

Future Thematic Programming at ACS National Meetings

  • 252nd, August 21-25, 2016, Philadelphia, PA: Chemistry of the People, by the People and for the People. Rudy Baum, r_baum@acs.org
  • 253rd, April 2-6, 2017, San Francisco, CA: Advanced Materials, Technologies, Systems and Processes. Kathryn Beers, kathryn.beers@nist.gov
  • 254th, August 20-24, 2017, Washington, DC: Chemistry’s Impact on the Global Economy. Nancy Jackson, nbjacks@sandia.gov
  • 255th, March 2018, New Orleans, LA: The Food, Energy, Water Nexus
  • 256th, August 2018, San Diego, CA: Nanotechnology
  • 257th, March 31-April 4, 2019, Orlando, FL: Chemistry for New Frontiers
  • 258th, August 25-29, 2019, San Diego, CA: Chemistry of Water
  • 259th, March 2020, San Francisco, CA: TBD
  • 260th, August 2020, Philadelphia, PA: Chemistry from Bench to Market