Vol. 62, No. 3: Fall, 2010

Chemical Information Bulletin

A Publication of the Division of Chemical Information of the ACS

Volume 62 No. 3 Fall 2010

Boston's Battery Wharf

ISSN: 0364-1910
Chemical Information Bulletin,
©Copyright 2010 by the Division of Chemical Information of the American Chemical Society.

Message from the Chair

Dear Colleagues,

The Fall 2010 ACS National Meeting is just around the corner. In this issue you will find all the information you need to organize your schedule in Boston. Early Saturday morning we begin our Division business meetings with long range planning followed by committee and executive meetings. On Sunday the CINF and COMP Divisions invite you to the Joint CINF/COMP Welcoming Reception and CINF Scholarship for Scientific Excellence Posters. ACS Publications is generously sponsoring the event, in recognition of the 50th anniversary of the Journal of Chemical Information and Modeling.

The Program Committee has assembled a terrific technical program for us. We will be awarding a Best Presentation award once again at this meeting, thanks to an ACS Innovation Grant. Papers presented at the Data Intensive Drug Discovery session, organized by John Van Drie on Sunday afternoon, will be eligible, and the author of the winning paper will receive the award at the CINF Luncheon on Tuesday. We are also trying an experiment on Monday afternoon. The CINFlash session offers speakers a chance to present truly new research and results.

Tuesday will be dedicated to the Herman Skolnik awardee, our esteemed colleague Professor Anton Hopfinger. The all-day session is entitled “The Marriage, or at least Dating, of Molecular Simulation and Modeling with QSAR Analysis.” The celebration will be capped off with the Evening Reception at the Seaport Hotel.

Make sure you reserve your seat at the Tuesday CINF Luncheon (order your tickets when you register, or see me at the meeting). We have lined up the journalist and author Mike Capuzzo as our luncheon speaker. His first book, Close to Shore, was a non-fictional account of the first US shark attacks off the shores of New Jersey. He will be speaking about his latest book, The Murder Room: The Heirs of Sherlock Holmes Gather to Solve the World's Most Perplexing Cold Cases, due out August 10th.

In other Division news, I am pleased to report that CINF was granted a $5,000 innovation grant to work on web design of the eCIB (thanks to Bill Town for preparing the submission). This will help us as we consider the redesign of the entire CINF web presence. I can also report that we have submitted a grant proposal to support remote attendance/participation for our programming. To be more specific, we are requesting funds to allow live blogging and tweeting during our lightning sessions. Find out more about our current and future divisional activities by joining the CINF business meetings on Saturday.

I see that 37 CINF Division members have joined the CINF group on the ACS Network. I encourage all of you to sign up before the meeting, so that we can use the network as our primary division communication mechanism as well as reduce our web infrastructure costs. All committee reports will be posted there in the Documents area in the next few weeks.

See you in Boston!

Carmen Nitsche
Chair, ACS Division of Chemical Information

Letter from the Editor

This issue of the Chemical Information Bulletin (CIB) is the third one we are publishing online. The first (Spring) issue was posted on the CINF web page (www.acscinf.org) before the ACS Spring National Meeting. It contained the technical program, the abstracts, awards information, book reviews, and three interviews. The second (Summer) issue, which is the equivalent of the former CINF eNews newsletter, was edited by Svetlana Korolev and contained information about the ACS Spring National Meeting.

As you will notice, CIB is still in a period of transition from print to online format — the functionality of the product is not what we are used to seeing from the commercial publishers. I hope that for the next issues the process will become smoother and easier.

In this issue you will find the usual categories: a Message from the Chair, Awards and Scholarships, Book Reviews, Technical and Social programs, Abstracts, and a list of the CINF functionaries. Rajarshi Guha, chair of the Technical Program Committee, highlights the technical program for the upcoming ACS Fall National Meeting (p. 11). Maryadele O’Neil, editor-in-chief of The Merck Index, discusses in an interview (p. 5) the history of this famous resource and what it takes to publish each new edition. The two book reviews (p. 23) were written by Bob Buntrock: (1) Bibliometrics and Citation Analysis: From the Science Citation Index to Cybermetrics, by Nicola De Bellis and (2) Patents for Chemicals, Pharmaceuticals, and Biotechnology, by Grubb & Thomsen. On p. 4 you will be able to see who the CINF sponsors are. As always, we highly appreciate their financial support for the Division.

In the near future, the CINF Executive Committee will be looking for my replacement, and those who are interested and believe that they are qualified for this job should consider this exciting opportunity. Being the editor of CIB for five years has allowed me to become acquainted with many interesting people. The editor’s position is a voluntary one, and I have been greatly helped by many colleagues from the Division who submitted materials and proofread the issues. The journal is now published online, and it will be much easier for the next editor to put together the issues — there will be no strict deadlines, as there will be no need to send it to the printer.

I hope you will find this issue useful and enjoyable.

Svetla Baykoucheva, Editor

CINF Sponsors

The American Chemical Society Division of Chemical Information (CINF) is very fortunate to receive generous financial support from our sponsors to maintain the high quality of the Division’s programming and to promote communication between members at social functions at the ACS Fall 2010 National Meeting in Boston, and to support other divisional activities during the year, including scholarships to graduate students in Chemical Information.

The Division gratefully acknowledges contributions from the following sponsors:

Platinum

ACS Publications
FIZ CHEMIE Berlin

Gold

Elsevier/Reaxys®
Procter & Gamble

Silver

Bio-Rad Laboratories

Bronze

ACS Corporation Associates
CambridgeSoft
InfoChem
RSC Publishing
Thieme Publishers

Opportunities are available to sponsor Division of Chemical Information events, speakers, and material. Our sponsors are acknowledged on the CINF web site, in the Chemical Information Bulletin, on printed meeting materials, and at any events for which we use your contribution.

If you would like more information about supporting the CINF, please feel free to contact:

The ACS CINF Division is a non-profit tax-exempt organization with taxpayer ID no. 52-6054220.

Interested in becoming a CINF Sponsor?

Interviews

The Merck Index, an Encyclopedia of Chemicals and Natural Products:
Interview with Maryadele O’Neil

By Svetla Baykoucheva

Maryadele O’Neil is the Senior Editor of The Merck Index and the Director of Scientific Nomenclature Services. She obtained her Master’s of Library Science from Rutgers University and her BA in Bacteriology (minor in Chemistry) from Douglass College at the same university. After working as a lab-bench researcher for several years, she became involved with The Merck Index. Now she directs all aspects of researching, writing, and publishing of this famous book and is responsible for its content and editorial style, as well as for the authorized product nomenclature and the development of official non-proprietary names for all Merck products.

Ms. O’Neil founded the Women in Chemistry Scholarship Program to encourage women to pursue PhDs in medicinal or synthetic organic chemistry. This program has provided $5,000 scholarships, plus travel stipends for winners to present their research at American Chemical Society national meetings. She has also started several initiatives that give scholarship winners opportunities to network with their colleagues at the ACS.

SB: The Merck Index has been an icon for chemists for decades, and it has served them very well. Could you tell our readers a little bit of its history and the people who have contributed to it?

MO: The first edition, known as Merck’s Index, was published in 1889 by the German chemical company, E. Merck, in order to communicate with customers in the United States. The company’s American subsidiary was established two years later as Merck & Company. While the first edition was little more than a sales catalog, the second edition, published as Merck’s 1896 Index, grew in scope to include important medicines from the US Pharmacopeia and the National Formulary as well as common laboratory and manufacturing chemicals. Already, The Index was more of a reference handbook than a price list and included physical properties and medicinal uses for the compounds. Because of popular demand, a Third Edition was published in 1907. Chemists were now an important part of the readership, and additional physical properties and line formulae were added for their use.

Ties between the German and American companies were severed by the impact of World War I, but Merck & Co. continued to publish The Merck Index. An ever-growing number of compounds would be added in succeeding editions to increase the utility of the information for researchers. Still today, The Index is published on a not-for-profit basis as a service to the scientific community.

The material in The Index has been written over the last 120+ years by generations of Merck scientists. No editors were specifically named until Paul G. Stecher was appointed as such for the Seventh Edition in 1960. Martha Windholz, also a Merck chemist, was named Editor for the Ninth Edition, followed by Susan Budavari, from whom I took over in 1999.

The scientists of Merck Research Laboratories have always been very generous with their expertise and advice, and I am most grateful to them for their continued support.

SB: As the editor of The Merck Index, what do your responsibilities include?

MO: Most importantly, I author a significant amount of the material in the monograph section and tables. I am responsible for the selection of new compounds to be included as well as monographs that are retired from each edition. It is very difficult to select the compounds that will no longer appear in print, but it is necessary to maintain the single volume size. The retired monographs are still available in the online editions, although the information is no longer updated.

If you look through previous print editions of The Index, you will notice that the contents are really an encapsulation of what was happening in science during a particular period of time. I so much enjoy reading and scanning journals and news articles for the latest developments, and I have been privileged to read and write about a number of significant scientific advancements.

Perhaps the most enjoyable aspect of my role as editor is the opportunity to interact directly with those who use The Index in their daily work. They span a wide range of scientific disciplines and include researchers in the lab, emergency first responders, information professionals, educators, and, of course, students. It is most gratifying to speak with these individuals to learn how The Index has contributed to their ability to do their work. Many of them have great suggestions for us that they are willing to share. We try to incorporate their ideas into our workflow to continually improve both the information we provide and the way we deliver it. Some of my team and I regularly attend the ACS National Meetings and would welcome everyone to stop by our booth in the exhibition center.

SB: With the many new resources available online, what is the niche that The Merck Index occupies now and which is its audience today? What is unique about it? Why would people use it rather than go to other resources?

MO: There is an overwhelming amount of information available now through the internet and some of it is very good. However, it is often difficult to find authoritative information while weeding through thousands of leads in your search engine of choice. Scientists have come to rely on the accuracy of the physical properties and chemical structures presented in The Index. It’s also viewed as a key to the literature — a first stop for an overview of a compound’s use or its significance.

The Merck Index centralizes a lot of information that is critical to many different research disciplines. Approximately half of the monographs cover human and veterinary drugs, traditional and herbal medicines, and diagnostic aids. The remainder includes standard lab reagents, agricultural and commercial chemicals, plants and natural products, and compounds of environmental significance. We try to select the most important compounds across this broad sweep of science to retain the ready reference characteristic.

In our user survey conducted last year, lab scientists surprisingly told us that they still prefer to keep the printed handbook near them at the bench. The encyclopedic organization of the data makes it easy for them to look up a particular compound and find all of the information together in a single monograph.

SB: What is the publication process for the manual? How are new data added and how are these data verified?

MO: All of the data included in The Index come from published sources, usually peer-reviewed journal articles that are cited within the monographs. We evaluate multiple sources before selecting the best values to report. If it is not possible to choose between varying data, both values are reported. Citations are also given so that our users may read the experimental details on their own. The online versions now include digital object identifiers (DOIs) and PubMed IDs for many of the cited articles to make it even easier to review the original sources.

After a new compound has been selected for inclusion, the writer of the monograph performs a search of the scientific literature to immerse himself or herself in the topic. Chemical names, trademarks, and generic names are gathered and verified as well as registry numbers and drug codes, if applicable. The chemical structure is drawn to TMI specifications and physical properties are gleaned as described above. Finally, literature citations, such as syntheses and analytical methods, are selected based on the information they provide and their applicability to our end users’ needs.

The Index has been available as an online database since 1984. We knew very early on that a publication cycle of 5-6 years had become too long for many of our readers to wait for new material. The online versions are updated twice per year with new and revised monographs. To accommodate this rapid cycle time, our manuscript is always publication-ready. Each monograph is written or updated and released into the manuscript as a completed entity. New monographs are appended in the order of completion, which is why they are not in alphabetical order in the electronic editions. Database extracts are prepared by our technical staff and exported to our online partners to be processed and published in their format.

Publishing the print edition is somewhat more complicated. After the retired material is removed, monographs are re-alphabetized and re-numbered. The various indices for the printed edition are automatically created by the attributes assigned to specific data elements, such as registry number, formula, and synonyms. Extracts for each section and index are prepared and sent to the typesetter to produce the pages according to our specifications. Once the page proofs are reviewed to ensure accuracy, the work of the editorial staff is essentially done. Our good colleagues in the Merck Publishing Group take over and handle the printing and production. They also are responsible for the sales, marketing, and distribution of the new books.

SB: Who are the people who are involved with The Merck Index now?

MO: You may be surprised that the editorial staff of The Index is quite small. There are only 4 editors who write original content, each with a specific area of expertise. For example, Dr. Peter Dobbelaar is an experienced synthetic chemist who prepares the monographs on chemical reagents and is responsible for the Organic Name Reactions (ONR) section. We are fortunate to be able to call upon the expertise of chemists from outside of Merck as well. For the upcoming 15th edition, Dr. David MacMillan has been of great assistance in reviewing the ONRs. In previous editions, we have been privileged to work with Dr. David Evans and Dr. Barry Trost.

Senior Associate Editor Patricia Heckelman has an advanced degree in toxicology. In addition to writing, she is responsible for all of the operational activities, manages the interactions with the typesetter and co-publishers, and ensures the accuracy of each of the electronic editions.

Assistant Editor Kristin Roman utilizes her Master’s degree in Biotechnology to author new material pertaining to this field. Rounding out the team, we have excellent support from the senior editorial assistants, Catherine Kenny and Edwin Enraca, and the technical specialist, Linda Karaffa. We also have assistance from Merck’s nomenclature expert, Margaret Hill, who is always on call to answer naming and structure questions.

The Merck Index is part of a family of publications that includes The Merck Manual of Diagnosis and Therapy, The Merck Veterinary Manual, and the Home Health and Pet Health editions. As publisher, Gary Zelko brings his years of experience in print publications to oversee the production and marketing of the entire set of handbooks. I would be remiss if I did not also mention my manager, Matthew Cahill, who brings valuable expertise from the publishing world.

SB: You have been passionate about creating opportunities for women to get involved and advance in science. What are the main issues women scientists are facing now?

MO: I do believe that it is easier today for women to pursue a career in science than it was 30 or 40 years ago. While the number of women chemists at the master’s level continues to grow, the number of female PhD chemists is dropping precipitously. The choice to continue past the master’s level is a difficult one for women who continue to worry about balancing their work with a family. The availability of day care options is certainly more prevalent now, but it is still difficult for young mothers to cope with long hours in the laboratory. Experienced, successful women have a responsibility to mentor and encourage young talent. I have had the opportunity to speak with many young women as they are beginning their advanced studies. Overwhelmingly, they tell me that the opportunity to network with their female colleagues is invaluable.

SB: Could you tell us what your background and professional and personal interests are?

MO: My undergraduate degree is from Douglass College, which is part of Rutgers University and sister to the then all-male Rutgers College. While I had planned to pursue a career in the medical field, my professors convinced me that I had an aptitude for chemistry and I found myself spending more and more time in the chemistry building. I still maintained my love of the biological sciences and ended up with a degree in Microbiology.

My first position at Merck was as a research assistant in the laboratories, working in the immunology department. There was an opening on The Index staff after the 10th Edition was published, and I thought that it would be the ideal job for someone like me who loved to read and write science. After joining the staff, I returned to Rutgers University part-time to pursue my Master’s Degree in Information Management, with two small children in tow.

The decision to come to The Index was absolutely the right one for me. I have enjoyed my many years on the staff and have continued to learn something new every day.

Not surprisingly, I do have a passion for reading, and enjoy non-fiction as well as novels of many different genres. One of my favorite hobbies is working in my perennial garden which is probably why I enjoy writing material on natural products and traditional medicines for The Index. I am an avid baseball fan and have been able to attend baseball games in the wonderful stadiums throughout the US, whenever my travel coincided with the home team’s schedule.

SB: What is your relationship with the ACS?

MO: I have personally been a member of ACS and the Chemical Information Division for many years and regularly attend the ACS national meetings. When not at the booth in the exhibit hall, you will more than likely find me attending one or more of the CINF sessions. I have also enjoyed being associated with the Women Chemists’ Committee and the great work they are doing to advance the careers of women in science.

The Merck Index has had a long partnership with the ACS, particularly in support of various educational initiatives. Last year, in conjunction with National Chemistry Week, Merck donated 12,000 copies of The Index to local ACS sections to distribute to high school students and teachers across the country. I had the privilege of attending a science fair at Ballou High School in Washington DC with ACS President Tom Lane when the first books were distributed. The students and their teachers were most inspiring and I was delighted to be able to meet them and present them with copies of The Merck Index for use in their classrooms.

SB: Why is the online version of The Merck Index accessible through different platforms rather than having it available like other similar resources directly from a web page?

MO: The editorial staff prepares all of the material and maintains our content management system, but we do not have any search software with which to produce an online edition ourselves. Because of the richness of our indexing and the granularity of the data, The Merck Index Online is much more than a text-searchable eBook. Our co-publishers process our data and apply their specific tools to enhance the searchability of the material.

The Merck Index has a diverse user base ranging from healthcare professionals, to engineers, to bench researchers. It is important to us that our end users are able to easily access the information, regardless of their discipline. The various platforms have been strategically selected based on the way the data are presented and/or paired with other resources to meet the needs of their specific target audience.

SB: As science is becoming more and more interdisciplinary, how do you see the evolution of The Merck Index?

MO: In the preface to the Tenth Edition (1983), Martha Windholz described her most important challenge as being able to “effectively report major developments at the forefront of the life sciences and to reflect the complex and inextricable interdependence of chemistry, biology, and medicine.” This began the incorporation of biologics into The Index to provide research chemists with access to a broader scope of information than they previously needed.

Speaking selfishly, I was quite happy with this decision, since I was purposefully hired to bring my blended perspective of biology and chemistry to the staff. There is always the temptation to stray too far from our true niche, and we must not forget that our readers have relied on The Index as a key core reference for organic chemistry.

Future editors of The Merck Index must balance the need for this interdisciplinary information without abandoning the original purpose set forth in the First Edition to provide a summary of whatever chemical products are today adjudged as being useful in either medicine or technology.

SB: Thank you, Maryadele, for giving this interesting interview. Our readers will have a better idea now how this “icon” of chemical information is being published.

National Meetings

Technical Program Highlights

ACS Chemical Information Division (CINF)
Fall 2010 ACS National Meeting
Boston, MA (August 22-26)

The Fall ACS National Meeting is nearly upon us and we are looking forward to attending an exciting lineup of symposia and talks. As in recent meetings, we’ve got a packed program covering a diverse set of topics ranging from the Semantic Web to structure-activity relationships.

Sunday starts off with a symposium on the use of the Semantic Web in chemistry, organized by Egon Willighagen and Martin Braendle. This is the first session of the three-session symposium that extends to Monday. Topics being covered range from tools & technologies for enabling the use of semantic concepts with chemical data, to actual chemical and biological applications of these concepts. In parallel, Sunday also is the day for the joint CINF-SLA symposium that will address how collections and information resources can be assessed — quantitatively and qualitatively. On Sunday afternoon we will also have the third session of the Best Presentation Award Symposium, organized by John Van Drie. It will focus on methods and approaches to handling the problems associated with the data deluge in drug discovery settings. The winner will receive an invitation to the CINF Luncheon, a plaque, as well as $1,000 towards registration and expenses.

On Monday, we continue with the Semantic Web symposium and concurrently will have the first session of the symposium celebrating 50 years of the Journal of Chemical Information. This is being organized by Prof. William Jorgensen and has a lineup of chemical information luminaries who will be describing some of the pioneering work that was published in the journal.

On Monday afternoon we will have a brand new symposium, CINFlash. This is an experimental symposium where we’ve tried to have people speak about recent work by allowing them to bypass the ACS abstract system. In addition, each talk is going to be strictly limited to six minutes (with the help of a loud horn). So, you should expect some fun stuff!

Tuesday is a day devoted to the Herman Skolnik Symposium, honoring Prof. Anton Hopfinger. This whole-day event features talks from his past students and collaborators covering a range of topics in molecular modeling. The symposium will be followed by the Skolnik Reception.

On Wednesday, we will have a symposium to discuss recent developments in the structure-activity landscape (SAL) concept. Organized by Jürgen Bajorath, Gerry Maggiora, and Mic Lajiness, it will cover topics ranging from novel descriptions of landscapes to new applications of the concept in molecular modeling. Wednesday also sees a symposium addressing recent developments in chemical structure representation, organized and run by Richard Apodaca; it will cover new tools and applications that address traditional and novel chemical structure representations.

I’m quite excited about the upcoming program and at the same time I am grateful for the contributions from the Program Committee members and symposium organizers.

Thanks to everybody and I look forward to meeting you in Boston.

Rajarshi Guha
Chair, Technical Programming

Technical Program #240

CINF Symposia

ACS Chemical Information Division (CINF)
Fall, 2010 ACS National Meeting
Boston, MA

R. Guha, Program Chair

SUNDAY MORNING

Section A
Boston Convention & Exhibition Center
156A

Semantic Web in Chemistry - Cosponsored by COMP
E. Willighagen, Organizer, Presiding
M. Braendle, Organizer
8:45   Introductory Remarks.
8:55 1 Semantic envelopment of cheminformatics resources with SADI.
L. L. Chepelev, E. Willighagen, M. Dumontier
9:40 2 RESTful RDF web services for predictive toxicology.
N. Jeliazkova
10:10   Intermission.
10:25 3 Linking the resource description framework to cheminformatics and proteochemometrics.
E. L. Willighagen, J. E. Wikberg
10:55 4 Chemical e-Science Information Cloud (ChemCloud): A semantic web based eScience infrastructure.
A. Paschke, S. Heineke
11:25 5 Use of semantic web services to access small molecule ligand database.
A. P. Tamhankar, A. S. Ausekar

Section B
Boston Convention & Exhibition Center
155

Assessing Collections & Information Resources in Science & Technology - Cosponsored by SLA
E. Kajosalo, Organizer, Presiding
9:00   Introductory Remarks.
9:05 6 Usage metrics: Tools for evaluating science monograph collections.
M. M. Foss, V. Kisling, S. Haas
9:30 7 Happily ever after or not: E-book collection usage analysis and assessment at USC Library.
N. Xiao
9:55 8 From Chemical Abstracts to SciFinder: Transitioning to SciFinder and assessing customer usage.
S. Makar, S. Bruss
10:20   Intermission.
10:35 9 Using Web of Knowledge to identify publishing and citation patterns of campus researchers at the University of Arkansas.
L. Salisbury, J. S. Smith
11:00 10 Don't forget the qualitative: Including focus groups in the collection assessment process.
S. Shepherd, T. M. Vogel

SUNDAY AFTERNOON

Section A
Boston Convention & Exhibition Center
156A

Data-intensive Drug Design - Cosponsored by COMP
J. Van Drie, Organizer, Presiding
1:45   Introductory Remarks.
1:50 11 Strategies for the identification and generation of informative compound sets.
M. S. Lajiness
2:15 12 Public-domain data resources at the European Bioinformatics Institute and their use in drug discovery.
C. Steinbeck
2:40 13 Decision making in the face of complicated drug discovery data using the Novartis system for virtual medicinal chemistry (FOCUS).
D. Chin
3:05 14 Integrating chemical and biological data: Insights from 10 years of VERDI.
S. Roberts, W. P. Walters, R. McLoughlin, P. Gabriel, J. Willis, T. Kramer
3:30   Intermission.
3:45 15 Collaborative database and computational models for tuberculosis drug discovery decision making.
S. Ekins, J. Bradford, K. Dole, A. Spektor, K. Gregory, D. Blondeau, M. Hohman, B. A. Bunin
4:10 16 Data drive life sciences: The Pyramids meet the Tower of Babel.
R. Guha
4:35 17 Design principles for diversity-oriented synthesis: Facilitating downstream discovery with upfront design.
L. Marcaurelle
5:00 18 Overview: Data-intensive drug design.
J. H. Van Drie

Section B
Boston Convention & Exhibition Center
155

Assessing Collections & Information Resources in Science & Technology - Cosponsored by SLA
E. Kajosalo, Organizer, Presiding
2:00 19 Data-driven development: How ACS Publications uses data to enhance products and services, and respond to customer needs.
M. Blaney, S. Rouhi
2:25 20 Objective collections evaluation using statistics at the MIT Libraries.
M. Willmott, E. Kajosalo
2:50 21 Getting the biggest bang for your buck: Methods and strategies for managing journal collections.
G. Baysinger
3:15 22 Taking a collection down to its elements: Using various assessment techniques to revitalize a library.
L. Solla
3:40   Panel discussion.

SUNDAY EVENING

2010 CINF Scholarship for Scientific Excellence - Financially supported by FIZ Chemie Berlin
G. Grethe, Organizer
6:30 - 9:30
  23 Predicting specific inhibition of cyclophilins A and B using docking, growing, and free energy perturbation calculations.
S. V. Sambasivarao, O. Acevedo
  24 Using aggregative web services for drug discovery.
Q. Zhu, M. S. Lajiness, D. J. Wild
  25 Semantifying polymer science using ontologies.
E. O. Cannon, A. Nico, P. Murray-Rust
  26 Toxicity reference database (ToxRefDB) to develop predictive toxicity models and prioritize compounds for future toxicity testing.
H. Tang, H. Zhu, L. Zhang, A. Sedykh, A. Richard, I. Rusyn, A. Tropsha
  27 OrbDB: A database of molecular orbital interactions.
M. A. Kayala, C. A. Azencott, J. H. Chen, P. F. Baldi
  28 Novel approach to drug discovery integrating chemogenomics and QSAR modeling: Applications to anti-Alzheimer's agents.
R. Hajjo, S. Wang, B. L. Roth, A. Tropsha
  29 Cheminformatics improvements by combining semantic web technologies, cheminformatical representations, and chemometrics for statistical modeling and pattern recognition.
E. L. Willighagen
  30 Prediction of consistent water networks in uncomplexed protein binding sites based on knowledge-based potentials.
M. Betz, G. Neudert, G. Klebe
  31 Functional binders for non-specific binding: Evaluation of virtual screening methods for the elucidation of novel transthyretin amyloid inhibitors.
C. J. Simões, T. Mukherjee, R. M. Jackson, R. M. Brito

MONDAY MORNING

Section B
Boston Convention & Exhibition Center
155

Semantic Web in Chemistry - Cosponsored by COMP
E. Willighagen, Organizer
M. Braendle, Organizer, Presiding
8:30 32 Using the oreChem experiments ontology: Planning and enacting chemistry.
J. G. Frey, M. I. Borkum, C. Lagoze, S. J. Coles Abstract
9:15 33 CHEMINF: Community-developed ontology of chemical information and algorithms.
L. L. Chepelev, J. Hastings, E. Willighagen, N. Adams, C. Steinbeck, P. Murray-Rust, M. Dumontier Abstract
9:45   Intermission
10:00 34 Chemical entity semantic specification: Knowledge representation for efficient semantic cheminformatics and facile data integration.
L. L. Chepelev, M. Dumontier Abstract
10:30 35 Semantic assistant for lipidomics researchers.
A. Kouznetsov, R. Witte, C. J. Baker Abstract
11:00 36 ChemicalTagger: A tool for semantic text-mining in chemistry.
L. Hawizy, D. M. Jessop, P. Murray-Rust Abstract

Section A
Boston Convention & Exhibition Center
156A

The Journal of Chemical Information and Modeling's 50th Anniversary Symposium - Cosponsored by COMP
W. Jorgensen, Organizer, Presiding
8:45   Introductory Remarks.
8:55 37 From canonical numbering to the analysis of enzyme-catalyzed reactions: 32 years of publishing in JCIM (JCICS).
J. Gasteiger Abstract
9:25 38 Fifteen years of JCICS.
G. W. Milne Abstract
9:55 39 Fifteen years in chemical informatics: Lessons from the past, ideas for the future.
D. Agrafiotis Abstract
10:25   Intermission.
10:40 40 Applications of wavelets in virtual screening.
V. Gillet, R. Martin, E. Gardiner, S. Senger Abstract
11:10 41 Privileged substructures revisited: Target community-selective scaffolds.
J. Bajorath Abstract Presentation (pdf)
11:40 42 Automated retrosynthetic analysis: An old flame rekindled.
P. Johnson, A. P. Cook, J. Law, M. Mirzazadeh, A. Simon Abstract
12:10   Lunch
1:30   Introductory remarks
1:40 139 Configurational entropy and mechanical stress in molecular recognition
M. K. Gilson Abstract
2:10 140 Advancing anthrax toxin countermeasures using topomeric searching and virtual screening methodologies
E. A. Amin, T.-L. Chiu, D. J. Hook, M. A. Walters, B. C. Finzel, J. Solberg, S. Patil, T. W. Geders, S. Rangarajan, R. Francis, X. Zhang Abstract
2:40 141 Model-free drug-like filters
T. I. Oprea, O. Ursu, C. G. Bologa Abstract
3:10   Intermission
3:25 142 Chemocentric informatics: Enabling bioactive compound discovery through structural hypothesis fusion
A. Tropsha Abstract
3:55 143 Computers and drug discovery: From duds to $5B drugs
R. C. Glen Abstract
4:25 144 Weighting and fusion methods for similarity-based virtual screening
P. W. S. Arif, J. Holliday, N. Malim, C. Mueller Abstract

MONDAY AFTERNOON

Section B
Boston Convention & Exhibition Center
155

Where's the Good Stuff? Consumer Health Information, and Social Networking Resources and Services - Cosponsored by CHED
A. Twiss-Brooks, Organizer, Presiding
1:00 43 Dietary supplements: Free evidence-based resources for the cautious consumer.
B. Erb Abstract Presentation (pdf)
1:25 44 What lessons learned can we generalize from evaluation and usability of a health website designed for lower literacy consumers?
M. J. Moore, R. G. Bias Abstract
1:50 45 National Library of Medicine resources for consumer health information.
M. Eberle Abstract
2:15 46 Better prescription for information: Dietary supplements online.
G. Y. Hendler Abstract Presentation (pdf)

Section A
Boston Convention & Exhibition Center
156A

Semantic Web in Chemistry - Cosponsored by COMP
M. Braendle, Organizer
E. Willighagen, Organizer, Presiding
1:15 47 Overview of the linking open drug data task.
E. Prudhommeaux, E. Willighagen, S. Stephens Abstract
2:00 48 Control, monitoring, analysis and dissemination of laboratory physical chemistry experiments using semantic web and broker technologies.
J. G. Frey, S. Wilson Abstract
2:30   Intermission.
2:45 49 Semantic analysis of chemical patents.
D. M. Jessop, L. Hawizy, P. Murray-Rust, R. C. Glen Abstract
3:15 50 Data mining and querying of integrated chemical and biological information using Chem2Bio2RDF.
D. J. Wild, B. Chen, Y. Ding, X. Dong, H. Wang, D. Jiao, Q. Zhu, M. Sankaranarayanan Abstract
3:45 51 Mining and visualizing chemical compound-specific chemical-gene/disease/pathway/literature relationships.
Q. Zhu, P. Purohit, J. Youl Choi, S. Bae, J. Qiu, Y. Ding, D. Wild Abstract
4:15   Intermission.
4:20   CINF Open Meeting
4:30   Open Meeting. Committees on Publications and Chemical Abstracts Service

Section B
Boston Convention & Exhibition Center
155

CINFlash: Can You Present Faster Than a Femtosecond Laser?
R. Guha, Organizer, Presiding
2:45   Panel Discussion

MONDAY EVENING

Sci-Mix
R. Guha, Organizer
8:00 - 10:00   See listings: 2, 6, 20, 28, 31, 78, 91.

TUESDAY MORNING

Section A
Boston Convention & Exhibition Center
156A

Herman Skolnik Award Symposium: The Marriage, or at Least Dating, of Molecular Simulation and Modeling with QSAR Analysis
E. X. Esposito, Organizer
A. Hopfinger, Organizer, Presiding
8:15   Introductory Remarks.
8:30 52 What makes polyphenols good antioxidants? Alton Brown, you should take notes...
E. X. Esposito Abstract
9:15 53 Engineering and 3D protein-ligand interaction scaling of 2D fingerprints
J. Bajorath Abstract Presentation (pdf)
10:00   Intermission.
10:15 54 In silico binary QSAR models based on 4D-fingerprints and MOE descriptors for prediction of hERG blockage.
Y. Tseng Abstract
11:00 55 Telling the good from the bad and the ugly: The challenge of evaluating pharmacophore model performance.
R. D. Clark Abstract

TUESDAY AFTERNOON

Section A
Boston Convention & Exhibition Center
156A

Herman Skolnik Award Symposium: The Marriage, or at Least Dating, of Molecular Simulation and Modeling with QSAR Analysis
A. Hopfinger, Organizer
E. X. Esposito, Organizer, Presiding
2:00 56 Creative application of ligand-based methods to solve structure-based problems: Using QSAR approaches to learn from protein crystal structures.
C. M. Breneman, S. Das, M. Sundling, M. Krein, S. Cramer, K. P. Bennett, C. Bergeron, J. Zaretzki Abstract
2:45 57 Computer-aided drug discovery.
W. L. Jorgensen Abstract
3:30   Intermission.
3:45 58 Structure-based discovery and QSAR methods: A marriage of convenience.
J. S. Duca Abstract
4:30 59 Extending the QSAR Paradigm using molecular modeling and simulation.
A. J. Hopfinger Abstract
5:15   Presentation of Award

WEDNESDAY MORNING

Section A
Boston Convention & Exhibition Center
156A

The Emerging Concepts of Activity Landscapes and Activity Cliffs and their Role in Drug Research - Cosponsored by COMP and MEDI
G. Maggiora, M. Lajiness, Organizers
J. Bajorath, Organizer, Presiding
8:50   Introductory remarks.
9:00 60 Overview of activity landscapes and activity cliffs: Prospects and problems.
G. M. Maggiora Abstract
9:30 61 Exploring and exploiting the potential of structure-activity cliffs.
G. M. Maggiora Abstract
10:00 62 What makes a good structure activity landscape? Network metrics and structure representations as a way of exploring activity landscapes.
R. Guha Abstract
10:30   Intermission.
10:45 63 Consensus model of activity landscapes and consensus activity cliffs.
J. L. Medina-Franco, K. Martinez-Mayorga, F. Lopez-Vallejo Abstract
11:15 64 R-Cliffs: Activity cliffs within a single analog series.
D. Agrafiotis Abstract

Section B
Boston Convention & Exhibition Center
155

Recent Progress in Chemical Structure Representation
R. Apodaca, Organizer, Presiding
9:00   Introductory Remarks.
9:05 65 Chemical structure representation in the DuPont Chemical Information Management Solutions database: Challenges posed by complex materials in a diversified science company.
M. A. Andrews, E. S. Wilks Abstract Presentation (pdf)
9:35 66 From deposition to application: Technologies for storing and exploiting crystal structure data.
C. R. Groom, J. Cole, S. Bowden, T. Olsson Abstract
10:05 67 Recent IUPAC recommendations for chemical structure representation: An overview.
J. Brecher Abstract
10:35   Intermission
10:50 68 Orbital development kit.
E. L. Willighagen Abstract Presentation (pdf)
11:20 69 Line notations as unique identifiers.
K. Boda Abstract

WEDNESDAY AFTERNOON

Section A
Boston Convention & Exhibition Center
156A

The Emerging Concepts of Activity Landscapes and Activity Cliffs and their Role in Drug Research - Cosponsored by COMP and MEDI
J. Bajorath, M. Lajiness, Organizers
G. Maggiora, Organizer, Presiding
2:00 70 Analysis of activity landscapes, activity cliffs, and selectivity cliffs.
J. Bajorath Abstract Presentation (pdf)
2:30 71 Using Activity Cliff Information in structure-based design approaches.
B. Seebeck, M. Wagener, M. Rarey Abstract
3:00 72 Exploring activity cliffs using large scale semantic analysis of PubChem.
D. J. Wild, B. Chen, Q. Zhu Abstract
3:30 73 Quantifying the usefulness of a model of a structure-activity relationship: The SALI Curve Integral.
J. H. Van Drie, R. Guha Abstract
4:00   Concluding Remarks

Section B
Boston Convention & Exhibition Center
155

Recent Progress in Chemical Structure Representation
R. Apodaca, Organizer, Presiding
2:00 74 Status of the InChI and InChIKey algorithms.
S. Heller Abstract Presentation (pdf)
2:30 75 Self-contained sequence representation (SCSR): Bridging the gap between bioinformatics and cheminformatics.
K. T. Taylor, W. L. Chen, B. D. Christie, J. L. Durant, D. L. Grier, B. A. Leland, J. G. Nourse Abstract Presentation (pdf)
3:00   Intermission.
3:15 76 Representation of Markush structures: From molecules toward patents.
S. Csepregi, N. Máté, R. Wágner, T. Csizmazia, S. Dóránt, E. Bíró, T. Dudgeon, A. Baharev, F. Csizmadia Abstract Presentation (pdf)
3:45 77 CSRML: A new markup language definition for chemical substructure representation.
C.H. Schwab, B. Bienfait, J. Gasteiger, T. Kleinoeder, J. Marucszyk, O. Sacher, A. Tarkhov, L. Terfloth, C. Yang Abstract Presentation (pdf)

THURSDAY MORNING

Section A
Boston Convention & Exhibition Center
156A

General Papers
R. Guha, Organizer, Presiding
8:45 78 Prediction of solvent physical properties using the hierarchical clustering method.
T. M. Martin, D. M. Young Abstract
9:10 79 Scaffold diversity analysis using scaffold retrieval curves and an entropy-based measure.
J. L. Medina-Franco, K. Martinez-Mayorga, A. Bender, T. Scior Abstract
9:35 80 Nonsubjective clustering scheme for multiconformer databases.
A. B. Yongye, A. Bender, K. Martinez-Mayorga Abstract
10:00   Intermission.
10:10 81 Finding drug discovery "rules of thumb" with bump hunting.
T. Hashimoto, M. Segall Abstract
10:35 82 Machine learning in discovery research: Polypharmacology predictions as a use case.
N. Wale, K. McConnell, E. M. Gifford Abstract
11:00 83 Interpretable correlation descriptors for quantitative structure-activity relationships.
J. D. Hirst Abstract

THURSDAY AFTERNOON

Section A
Boston Convention & Exhibition Center
156A

General Papers
X. Wang, Presiding
R. Guha, Organizer
1:30 84 Chemistry in your hand: Using mobile devices to access public chemistry compound data.
A. J. Williams, V. Tkachenko Abstract
1:55 85 Feature analysis of ToxCast™ compounds.
P. Volarath, S. Little, C. Yang, M. Martin, D. Reif, A. Richard Abstract
2:20 86 Extracting information from the IUPAC Green Book.
J. G. Frey, M. I. Borkum Abstract
2:45 87 Biologics and biosimilars: One and the same?
R. Schenck Abstract
3:10   Intermission.
3:20 88 Intelligent mining of drug information resources.
R. Jain, A. Tamhankar, A. Ausekar, Y. Dixit Abstract
3:45 89 Cheminformatics semantic grid for neglected diseases.
P. J. Kowalczyk Abstract
4:10 90 Extraction and integration of chemical information from documents.
H. O. Villar, J. Betancort, M. R. Hansen Abstract
4:35 91 SAR and the role of active-site waters in blood coagulating serine proteases: A thermodynamic analysis of ligand-protein binding.
N. K. Salam, W. Sherman, R. Abel Abstract

 

Abstracts #240

CINF Symposia

ACS Chemical Information Division (CINF)
Fall, 2010 ACS National Meeting
Boston, MA

R. Guha, Program Chair

SUNDAY MORNING

Section A
Boston Convention & Exhibition Center
156A

Semantic Web in Chemistry - Cosponsored by COMP
E. Willighagen, Organizer, Presiding
M. Braendle, Organizer
8:45   Introductory Remarks.
8:55 1 Semantic envelopment of cheminformatics resources with SADI.
L. L. Chepelev, E. Willighagen, M. Dumontier
Department of Biology, School of Computer Science, and Institute of Biochemistry, Carleton University, Ottawa, Ontario, Canada; Department of Pharmaceutical Sciences, Uppsala University, Uppsala, Sweden

The distribution of computational resources as web services and their execution as workflows has enabled facile computation and data integration for bio- and cheminformatics. The Semantic Automated Discovery and Integration (SADI) framework addresses many shortcomings of similar frameworks, such as SSWAP and BioMoby, while allowing for more efficient semantic envelopment of computational chemistry services, resource discovery, and automated workflow organization. In this work, we apply the CHEMINF ontology and Chemical Entity Semantic Specification and demonstrate the usability of the SADI framework in solving common cheminformatics problems starting from RDF-based chemical entity representations. Our eventual goal is to convert all of the functions and functionalities of the Chemistry Development Kit (CDK) into distinct SADI services. This would enable the formulation of all cheminformatics problems currently addressed by CDK as SPARQL queries, returning meaningful RDF output that can then be easily integrated with existing RDF-based knowledgebases or used for further processing.
9:40 2 RESTful RDF web services for predictive toxicology.
N. Jeliazkova
Ideaconsult Ltd., Sofia, Bulgaria

The Open Source Predictive Toxicology Framework (http://www.opentox.org), developed by partners of the EC FP7 OpenTox project, aims at providing unified access to toxicity data and predictive models, as well as validation procedures. This is achieved by i) an information model, based on a common OWL-DL ontology (http://www.opentox.org/api/1.1/opentox.owl); ii) flexibility by linking with related ontologies; iii) availability of data and algorithms via a standardized REST web services interface, where every compound, data set or predictive method has a unique web address, used to retrieve its RDF representation or to initiate calculations. The OpenTox framework allows building user-friendly applications for toxicological experts or model developers, or direct access by an application programming interface for development, integration and validation of new algorithms. The work presented describes the experience of building RESTful web services, based on RDF representation of resources, to incorporate diverse IT solutions into a distributed and interoperable system.
Presentation (pdf)
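As a minimal sketch of the idea in abstract 2 that every resource has its own web address from which an RDF representation can be retrieved: the snippet below only builds an HTTP request with an RDF media type in the Accept header, and sends nothing. The resource URI and the helper name rdf_request are hypothetical and are not taken from the OpenTox API; a real deployment defines its own addresses and supported media types.

```python
from urllib.request import Request

# Hypothetical resource address, used for illustration only.
COMPOUND_URI = "http://example.org/opentox/compound/1"


def rdf_request(resource_uri: str, method: str = "GET") -> Request:
    """Build (but do not send) a request asking for RDF/XML for one resource."""
    return Request(
        resource_uri,
        headers={"Accept": "application/rdf+xml"},
        method=method,
    )


# GET asks for the RDF representation of the resource.
fetch = rdf_request(COMPOUND_URI)
# POST to a resource address is one common REST convention for starting work;
# whether a given service uses it this way is an assumption here.
start = rdf_request(COMPOUND_URI, method="POST")

print(fetch.get_method(), fetch.full_url, fetch.get_header("Accept"))
print(start.get_method(), start.full_url)
```

Passing either request to urllib.request.urlopen would perform the call; that step is left out because the address above is a placeholder.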
10:10   Intermission.
10:25 3 Linking the resource description framework to cheminformatics and proteochemometrics.
E. L. Willighagen, J. E. Wikberg
Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden

Background: Semantic web technologies are finding their way into the life sciences. Ontologies and semantic markup have already been used for more than a decade in molecular sciences, but have not found widespread use yet. The semantic web technology Resource Description Framework (RDF) and related methods prove to be sufficiently versatile to change that situation. Results: The work presented here focuses on linking RDF approaches to existing molecular chemometrics fields, including cheminformatics, QSAR modeling and proteochemometrics. Applications are presented that link RDF technologies to methods from statistics and cheminformatics, including data aggregation, visualization, chemical identification, and property prediction. They demonstrate how this can be done using various existing RDF standards and cheminformatics libraries. For example, we show how IC50 and Ki values are modeled for a number of biological targets using data from the ChEMBL database. Conclusions: We have shown that existing RDF standards can suitably be integrated into existing molecular chemometrics methods. Platforms that unite these technologies, like Bioclipse, make this even simpler and more transparent. Being able to create and share workflows that integrate data aggregation and analysis (visual and statistical) is beneficial to interoperability and reproducibility. The current work shows that RDF approaches are sufficiently powerful to support molecular chemometrics workflows.
Presentation (pdf)
10:55 4 Chemical e-Science Information Cloud (ChemCloud): A semantic web based eScience infrastructure.
A. Paschke, S. Heineke
FIZ Chemie, Berlin, Germany; Department of Mathematics and Computer Science, FU Berlin, Berlin, Germany

Our Chemical e-Science Information Cloud (ChemCloud) - a Semantic Web based eScience infrastructure - integrates and automates a multitude of databases, tools and services in the domain of chemistry, pharmacy and biochemistry available at the Fachinformationszentrum Chemie (FIZ Chemie), at the Freie Universitaet Berlin (FUB), and on the public Web. Based on the approach of the W3C Linked Open Data initiative and the W3C Semantic Web technologies for ontologies and rules, it semantically links and integrates knowledge from our W3C HCLS knowledge base hosted at the FUB; our multi-domain knowledge base DBpedia (Deutschland) implemented at FUB, which is extracted from Wikipedia (De) and provides a public semantic resource for chemistry; and our well-established databases at FIZ Chemie, such as ChemInform for organic reaction data, InfoTherm, the leading source for thermophysical data, Chemisches Zentralblatt, the complete chemistry knowledge from 1830 to 1969, and ChemgaPedia, the largest and most frequented e-Learning platform for chemistry and related sciences in the German language.

11:25 5 Use of semantic web services to access small molecule ligand database.
A. P. Tamhankar, A. S. Ausekar
Software Solutions Group, Evolvus, Pune, Maharashtra, India

Resource Description Framework (RDF) and a set of associated technologies such as OWL and SPARQL, which form the W3C's semantic web technology stack, are renewing interest in semantic chemistry. Semantic Web Services not only specify syntactic interoperability but also specify and enforce the semantic constraints of messages being transmitted and objects being accessed. Liceptor database is a small molecule ligand database consisting of approximately 4 million compounds. The database schema consists of fields like molecular properties (2D-structure, molecular weight, molecular formula, etc.), molecular descriptors (H-donors, H-acceptors, logP, logD, number of rotational bonds, etc.) and pharmacological properties (bio-assays, receptors, enzymes, parameters, animal models, therapeutic indications, etc.). Pharmaceutical and Bio-Technology companies use this database to mine chemical space for internal research, to prioritize QSAR and pharmacophore studies, for synthetic chemistry endeavors and for advancing hit-to-lead patterns. The database records are available in multiple formats (relational database, XML, Rdfile, etc.) as well as online through an interactive web application (html format). The soon to be released version of the database includes access using semantic web services. The ontology is expressed in OWL, and RDF defines the overall framework. Typical consumers of the data using this access mechanism are expected to be third-party tool vendors and data aggregators. Use of semantic web services allows evolution of the schema over time without explicitly communicating the change or requiring all data consumers to be changed.



Section B
Boston Convention & Exhibition Center
155

Assessing Collections & Information Resources in Science & Technology - Cosponsored by SLA
E. Kajosalo, Organizer, Presiding
9:00   Introductory Remarks.
9:05 6 Usage metrics: Tools for evaluating science monograph collections.
M. M. Foss, V. Kisling, S. Haas
Department of Marston Science Library, University of Florida, Gainesville, FL, United States

As academic libraries are increasingly supported by a matrix of database functions, the use of data mining and visualization techniques offers significant potential for future collection development based on quantifiable data. While data collection techniques are not standardized and results may be skewed because of granularity problems or faulty algorithms, useful baseline data can be extracted and broad trends identified. The purpose of the study is to provide an initial assessment of data associated with the science monograph collection at the Marston Science Library (MSL), University of Florida. The sciences fall within the major Library of Congress Classification schedules of Q, S, and T, excluding TN, TR, TT, and R. The overall strategy of this project is to analyze audience-based circulation patterns, e-book usage, purchases, and interlibrary loan statistics from the academic year July 1, 2008 to June 30, 2009. Such analyses provide an evidence-based framework for future collection decisions.
Presentation (pdf)
9:30 7 Happily ever after or not: E-book collection usage analysis and assessment at USC Library.
N. Xiao
University of Southern California, United States

With more and more e-book collections being launched by publishers, USC Science and Engineering Library began acquiring e-book collections in late 2008, and one of the first and biggest acquired collections is Springer e-books. Now after two years, are users satisfied with this e-book collection? Are they accessing and using it? As with any other e-collection, how well have we, librarians and staff, been coping with this collection in collection development (e.g. e-book packages from other publishers), access services (e.g. interlibrary loan, off-campus access, e-book technical issues), outreach (e.g. e-book marketing strategies), and information literacy? This presentation will give an overview of our assessment of this e-book collection after 2 years. What have we learned from the usage data? And by analyzing the data, how did and can we improve our services to users? It is hoped that our experience can present a proactive implementation plan for others considering comprehensive digital migration of their content, with the goal of not only better coping with the current economic environment, but of spurring development, innovation, and efficiency in the long run.
Presentation (pdf)
9:55 8 From Chemical Abstracts to SciFinder: Transitioning to SciFinder and assessing customer usage.
S. Makar, S. Bruss
National Institute of Standards and Technology, United States

The Research Library of the National Institute of Standards and Technology (NIST) monitors SciFinder usage to ensure customers have ready access to the database and to determine who uses it. Usage statistics played a critical role in determining whether to increase the number of seats and which heavy users should help pay for those additional seats. While most NIST researchers were very excited to acquire access to this product, many who were well acquainted with the print version of Chemical Abstracts needed to learn the best techniques for searching and browsing the chemistry literature using SciFinder. Transitioning from the printed Chemical Abstracts to SciFinder posed significant challenges to one research project. This presentation will describe how the NIST Research Library used SciFinder usage statistics to make collection development decisions and how library staff worked with NIST researchers to successfully transition from the printed Chemical Abstracts to SciFinder.
Presentation (pdf)
10:20   Intermission
10:35 9 Using Web of Knowledge to identify publishing and citation patterns of campus researchers at the University of Arkansas.
L. Salisbury, J. S. Smith
University of Arkansas, United States

This presentation will provide information on a project undertaken at the University of Arkansas in Fayetteville to study publications by the campus researchers, with an emphasis on the STEM disciplines (agricultural sciences, physical sciences, biological sciences, engineering, mathematics, etc.) at the macro-level for a three-year period. The overall objective of the study was (1) to provide an overview of the productivity of faculty and researchers in the various departments which could be used in allocating resources for collection development and (2) to provide evidence-based data of periodical use to assist with collection decisions and to identify collection strengths at the university level. We used the Web of Knowledge database (Science Citation Index, Social Science Citation Index and Arts and Humanities Citation Index) to identify the periodical literature in which our researchers published and the periodicals that they cite in their publications, and to perform several analyses, including determining the extent to which our researchers are publishing in and citing periodicals from the Elsevier, Wiley and IEEE journal packages. A methodology for extracting citations from Web of Knowledge into an Excel spreadsheet will also be presented. The strengths and weaknesses of the Web of Knowledge for this study will also be highlighted.
Presentation (pdf)
11:00 10 Don't forget the qualitative: Including focus groups in the collection assessment process.
S. Shepherd, T. M. Vogel
University of California San Diego, United States

To complement our ongoing quantitative collection evaluations based on cost and usage data, the UC San Diego Science & Engineering Library conducted a series of focus groups with graduate students and faculty in our core departments. Our objective was to learn more about how they use the collection for research and teaching, so that we could make more informed decisions about collection management, as well as how best to deploy our staff resources for increased promotion, outreach and instruction. Participants were asked about the resources they use, how they use them, and what gaps they perceived. We also probed their familiarity with the top licensed resources in their fields. In this presentation we will discuss our focus group methods, results and the next steps we have taken in this assessment, including a follow-up survey to the same departments to obtain more quantitative information about usage of the collection.
Presentation (pdf)

SUNDAY AFTERNOON

Section A
Boston Convention & Exhibition Center
156A

Data-intensive Drug Design - Cosponsored by COMP
J. Van Drie, Organizer, Presiding
1:45   Introductory Remarks.
1:50 11 Strategies for the identification and generation of informative compound sets.
M. S. Lajiness
Computer Aided Drug Discovery, Eli Lilly & Company, Indianapolis, IN, United States

Mounting pressures in drug discovery research dictate more efficient methods of picking the winners: molecules that actually have a chance to be the drugs of the future. Clearly, these methods need to navigate a highly multi-dimensional landscape. It is also clear that hard filters should never be used and that a more continuous treatment or prioritization has clear advantages. Further, structural diversity needs to be considered in order for the best structural ideas to be found most efficiently. In addition, history and external sources of information also must be examined. This presentation will describe some of the methods, techniques, and strategies that the author has employed over the past 25 years working in cheminformatics to identify compounds that are likely to provide the most useful information, so that one might discover solid leads more rapidly.
2:15 12 Public-domain data resources at the European Bioinformatics Institute and their use in drug discovery.
C. Steinbeck
European Bioinformatics Institute, EMBL Outstation - Hinxton, Hinxton, Cambridge, United Kingdom

Small molecules are of increasing interest for bioinformatics in areas such as metabolomics and drug discovery. The recent release of large open chemistry databases into the public domain calls for flexible, open toolkits to process them. These databases and tools will, for the first time, create opportunities for academia and third-world countries to perform state-of-the-art open drug discovery and translational research - endeavors so far a domain of the pharmaceutical industry. This talk will describe a couple of relevant data resources at the European Bioinformatics Institute and will also outline our research on and development of toolkits such as the Chemistry Development Kit and CDK-Taverna to support the exploitation of these data sources.
2:40 13 Decision making in the face of complicated drug discovery data using the Novartis system for virtual medicinal chemistry (FOCUS).
D. Chin
Global Discovery Chemistry, Novartis Institutes for BioMedical Research, Cambridge, MA, United States

This talk will describe some of the broad concepts that led to the development of the Novartis software system for data analysis & virtual medicinal chemistry (FOCUS). The system, which is routinely used globally, is designed to present the scientist with an accessible interface that permits iterative hypothesis testing of many possible chemical candidates while accounting for undesirable ADMET properties. Some of the key principles are to present the data in a way that reflects stored knowledge and facilitates the decision about what compound to make next. We will highlight some of these concepts in applications spanning the range from target identification to drug optimization.
3:05 14 Integrating chemical and biological data: Insights from 10 years of VERDI.
S. Roberts, W. P. Walters, R. McLoughlin, P. Gabriel, J. Willis, T. Kramer
Vertex Pharmaceuticals, Cambridge, MA, United States

VERDI is a software system, originally developed in 2000 at Vertex Pharmaceuticals, for integrating chemical and biological data and delivering this information to drug discovery teams. In addition to traditional table views, VERDI incorporated a number of modules designed to enable scientists to understand relationships between chemical structure and biological data. Over the last 10 years, VERDI has been the primary data access tool for hundreds of scientists at multiple sites around the world. A retrospective evaluation of VERDI has provided us with a number of 'lessons-learned', which come from a multitude of revisions, improvements and new feature additions. Some of these lessons, which are being used as the basis for development of the next generation of data analysis and visualization tools at Vertex, will be presented and discussed in detail.
3:30   Intermission.
3:45 15 Collaborative database and computational models for tuberculosis drug discovery decision making.
S. Ekins, J. Bradford, K. Dole, A. Spektor, K. Gregory, D. Blondeau, M. Hohman, B. A. Bunin
Collaborative Drug Discovery, Burlingame, CA, United States; Collaborations in Chemistry, Jenkintown, PA, United States; Department of Pharmaceutical Sciences, University of Maryland, Baltimore, MD, United States; Department of Pharmacology, Robert Wood Johnson Medical School, University of Medicine & Dentistry of New Jersey, Piscataway, NJ, United States

Drug discovery is being re-shaped by large-scale collaborations that connect individual researchers through collaborative computational approaches and crowdsourcing. Future drug discovery decisions will ultimately still be made based on massive multidimensional datasets. As an example, the search for molecules with activity against Mycobacterium tuberculosis (Mtb) is employing many approaches in collaborating national and international laboratories. We have developed a database (CDD TB) to capture public and private Mtb data while enabling data mining and collaborations with other researchers. We have also used the public data along with several computational approaches including Bayesian classification models for 220,463 molecules and tested them with external molecules, enabling the discrimination of active or inactive substructures from other datasets in CDD TB. The combination of the database, dataset analysis, and computational models provides new insights into molecular properties and features that are determinants of whole cell activity, allowing prioritization and decision making around molecules.
4:10 16 Data driven life sciences: The Pyramids meet the Tower of Babel.
R. Guha
Department of Informatics, NIH Chemical Genomics Center, Rockville, MD, United States

A characteristic feature of modern life science research is the fact that it has become data intensive. As a result we are faced with datasets of massive size and wide variety in terms of the type of data. Examples include massive datasets from next generation sequencing to more complex datasets of chemical structure and activity from high-throughput small molecule screens. In this talk I will discuss some aspects of how one can handle datasets of such size and variability. I will consider examples from computational science and distributed services that allow us to easily and cheaply handle massive datasets to integration approaches that attempt to merge data from multiple sources to obtain a systems level view of the biological effects of small molecules. In all cases, the focus will be data generated from and for small molecule studies.
4:35 17 Design principles for diversity-oriented synthesis: Facilitating downstream discovery with upfront design.
L. Marcaurelle
Chemical Biology Platform, Broad Institute, Cambridge, MA, United States

To expand the diversity of our screening collection to access a broad range of biological targets, we aspire to produce libraries of small-molecules that combine the structural complexity of natural products and the efficiency of high-throughput processes. Moreover, we aim to synthesize the complete matrix of stereoisomers for all library members. We reason that this unique collection will enable the rapid development of stereo-structure/activity relationships (SSAR) upon biological testing providing valuable information for the prioritization and optimization of hit compounds. Although our library products may be distinct compared to traditional compound collections, we are faced with fundamental questions relevant to library design: How do you prioritize scaffolds for synthesis? How do you select products with desirable physicochemical properties? In designing DOS libraries we employ a number of cheminformatic methods to tackle such issues and select compounds for synthesis/screening. An overview of our design criteria and decision-making process will be presented.
5:00 18 Overview: Data-intensive drug design.
J. H. Van Drie
R&D, Van Drie Research, Andover, MA, United States

How do we best make med chem decisions in the face of a lot of data? This is an issue that confronts us at many stages of the drug discovery process: screening, hit-to-lead, early lead optimization, and late-stage lead optimization. In this session, speakers representing each of these stages will describe how they have successfully tackled these issues, emphasizing general principles over specific computational tools. Our brains can conveniently handle only about 7 things at a time, and most traditional med chem decision-making processes reflect that. Already when the number of molecules being considered is in the range of dozens, things get tricky; when that number is in the thousands to hundreds of thousands, one must re-orient one's perspective.


Section B
Boston Convention & Exhibition Center
155

Boston Convention & Exhibition Center - Cosponsored by SLA
E. Kajosalo, Organizer, Presiding
2:00 19 Data-driven development: How ACS Publications uses data to enhance products and services, and respond to customer needs.
M. Blaney, S. Rouhi
ACS Publications, United States

As the scholarly publishing landscape continues to rapidly transform in unprecedented ways, publishers and libraries have had to quickly pivot to accommodate the changing preferences that users have for accessing, collecting, and consuming digital information. ACS Publications has used a data-driven approach to handle these changing customer and end-user needs. Everything from our ACS Mobile iPhone application to our transition from print to online Web products has been shaped by this approach. This presentation will address the role of data in developing new products, enhancing our web presence, and responding to user behavior on the ACS Web Editions Platform.
2:25 20 Objective collections evaluation using statistics at the MIT Libraries.
M. Willmott, E. Kajosalo
Engineering & Science Libraries, Massachusetts Institute of Technology, United States

Recent budget pressures have forced many libraries to reevaluate their collections and substantially cut back on their subscription spending. The task of evaluating a large collection of subscription-based materials, however, is a difficult one. Journals from different subject areas are used differently, and journals from different publishers have their usage measured differently. Evaluating each individual journal subscription separately would be a monumental task bordering on infeasibility. This paper will discuss the approach taken by the MIT Engineering and Science Libraries in the spring of 2009 and 2010 to evaluate their journal collections, specifically for Springer, Elsevier, and Wiley-Blackwell, the three journal publishers with which these libraries hold the most subscriptions. Discussion will include the gathering and analysis of usage data, publication data, and citation data, as well as the process by which these data were combined to create an objective ranking for each journal. These objective rankings were not final decisions; librarians with subject expertise then evaluated the lower-ranked journals to determine if they were appropriate choices for cancellation, often taking into consideration many additional factors. However, these objective evaluations helped librarians to more efficiently use their time by indicating which journals may be strong candidates for cancellation, and they helped department liaisons to defend final cancellation choices to a very data-driven faculty. The end result was a more efficient cancellation process as well as a more comprehensive understanding of the library's journal collections.
2:50 21 Getting the biggest bang for your buck: Methods and strategies for managing journal collections.
G. Baysinger
Stanford University, United States

Chemistry journals have the highest average cost per title of all subject areas. Library collection budgets have not kept pace with price increases and funds to acquire new titles are scarce. Signing big deals for journals has limited flexibility in adapting to changes. These factors have made acquiring journals to support programmatic needs more of a challenge than ever before. This presentation will cover methods, strategies, and tools that can be used to help assess how resources are allocated when developing and managing journal collections.
3:15 22 Taking a collection down to its elements: Using various assessment techniques to revitalize a library.
L. Solla
Cornell University, 283 Clark Hall, Ithaca, NY, United States

What are the elements of a research literature collection in the physical sciences? How are they being used and what roles are they playing in research and teaching and learning? Who is using them: students, faculty, related disciplines? These are the questions that drove the extensive analyses conducted on the print and electronic literature collections in the Physical Sciences Library at Cornell University in preparation for transitioning the service model from a print-based facility to electronic collections and services. General trends indicated the usage of the collection had been well over 90% electronic for years and the acquisition of books and journals in print had been reduced to minimal levels under budget pressures. But there were significant gaps in the electronic holdings and there remained a small but very active core of the print collection, both of which warranted further study to enable us to provide the best possible access to these crucial materials in the new service model. The library management system was mined for a variety of data points and complemented with external data sources and user input to build the transition map for the physical sciences literature collections.
3:40   Panel discussion

SUNDAY EVENING

2010 CINF Scholarship for Scientific Excellence - Financially supported by FIZ Chemie Berlin
G. Grethe, Organizer
6:30 - 9:30
  23 Predicting specific inhibition of cyclophilins A and B using docking, growing, and free energy perturbation calculations.
S. V. Sambasivarao, O. Acevedo
Department of Chemistry and Biochemistry, Auburn University, Auburn, AL, United States

Cyclophilins (Cyp) belong to the enzyme class of peptidyl-prolyl isomerases which catalyze the cis-trans conversion of prolyl bonds in peptides and proteins. Twenty human Cyp isoenzymes have been reported and many are excellent targets for the inhibition of hepatitis C virus replication and multiple inflammatory diseases and cancers. Given the complete conservation of all active site residues between many of the enzymes, i.e., CypA, CypB, CypC and CypD, a better understanding of how to specifically inhibit individual targets could potentially reduce reported side effects in current treatments. Docking and growing programs have been used to construct protein-ligand complexes for a variety of reported selective inhibitors, including acylurea and aryl 1-indanylketone derivatives. Free-energy perturbation/Monte Carlo (FEP/MC) calculations have been utilized to quantitatively reproduce the free energies of binding for the inhibitors in multiple Cyp active sites in order to elucidate the origin of the specificity for the compounds.
  24 Using aggregative web services for drug discovery.
Q. Zhu, M. S. Lajiness, D. J. Wild
School of Informatics and Computing, Indiana University, Bloomington, IN, United States

Recent years have seen a huge increase in the amount of publicly available information pertinent to drug discovery, including online databases of compound and bioassay information; scholarly publications linking compounds with genes, targets and diseases; and predictive models that can suggest new links between compounds, genes, targets and diseases. However, there is a distinct lack of data mining tools available to harness this information, and in particular to look for information across multiple sources. At Indiana University we are developing an aggregative web service framework to solve this kind of problem. It offers a new approach to data mining that crosses information source types to look at the "big picture" and to identify corroborating or conflicting information from models, assays, databases and publications.
  25 Semantifying polymer science using ontologies.
E. O. Cannon, A. Nico, P. Murray-Rust
Department of Chemistry, Unilever Centre for Molecular Science Informatics, University of Cambridge, Cambridge, Cambridgeshire, United Kingdom

Ontologies are graph based, formal representations of information in a domain. Currently, there is a large interest in ontologies for biology and medicine, though little effort has been concentrated in the field of chemistry, let alone polymer science. We have developed a number of ontologies for polymer science: properties, measurement techniques and measurement conditions, using the Web Ontology Language. These ontologies will help facilitate the standardization of data exchange formats in polymer science by providing a common domain of knowledge. The properties ontology contains over 150 properties and has been integrated with the measurement techniques and conditions ontology, to give information on how a property is measured and under what conditions. The ontologies will be of use to polymer scientists wishing to reach a consensus in this area of knowledge. The ontologies also have the advantage that they can be integrated into software applications to leverage this knowledge.
  26 Toxicity reference database (ToxRefDB) to develop predictive toxicity models and prioritize compounds for future toxicity testing.
H. Tang, H. Zhu, L. Zhang, A. Sedykh, A. Richard, I. Rusyn, A. Tropsha
Division of Medicinal Chemistry and Natural Products, School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Chapel Hill, NC, United States; Department of Environmental Sciences and Engineering, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States

EPA's ToxCast program aims to use in vitro assays to predict chemical hazards and prioritize chemicals for toxicity testing. We employed the predictive QSAR workflow to develop computational toxicity models for ToxCast compounds with historical animal testing results available from ToxRefDB. To ensure model stability and robustness, multiple classifiers and 5-fold external cross-validation were applied. Results show that for three of the 78 toxicity endpoints, including one chronic and two reproductive endpoints, the Correct Classification Rate for external validation datasets was above 0.6 for all types of QSAR models. Our studies suggest that it is feasible to develop QSAR models for some endpoints, which could be further augmented by in vitro assay measures. The validated toxicity models were used for virtual screening of 50,000 chemicals compiled for the REACH program. The compounds predicted as toxic could be regarded as candidates for future toxicity testing. Abstract does not reflect EPA policy.
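As an illustration of the validation measure mentioned in this abstract, the following is a minimal sketch, not the authors' workflow. It assumes the Correct Classification Rate (CCR) is the mean of sensitivity and specificity for a binary toxic/non-toxic endpoint, and it uses a simple interleaved 5-fold split; the function names and the toy labels are hypothetical.

```python
def correct_classification_rate(y_true, y_pred):
    """Assumed definition: CCR = (sensitivity + specificity) / 2.

    Labels are 1 (toxic) or 0 (non-toxic); both classes must be present
    in y_true, otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


def k_fold_indices(n_items, k=5):
    """Yield (train, test) index lists; each item is held out exactly once."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


# Hypothetical labels and predictions for one external fold.
observed = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1, 1]
ccr = correct_classification_rate(observed, predicted)
```

With this definition, a model that guesses at random scores about 0.5 regardless of class balance, which is why a threshold such as 0.6 is meaningful for unbalanced toxicity data.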
  27 OrbDB: A database of molecular orbital interactions.
M. A. Kayala, C. A. Azencott, J. H. Chen, P. F. Baldi
Department of Computer Science, University of California - Irvine, Irvine, CA, United States

The ability to anticipate the course of a reaction is essential to the practice of chemistry. This aptitude relies on the understanding of elementary mechanistic steps, which can be described as the interaction of filled and unfilled molecular orbitals. Here, we create a database of mechanistic steps from previous work on a rule-based expert system (ReactionExplorer). We derive 21,000 priority ordered favorable elementary steps for 7800 distinct reactants or intermediates. All other filled to unfilled molecular orbital interactions yield 106 million unfavorable elementary steps. To predict the course of reactions, one must recover the relative priority of these elementary steps. Initial cross-validated results for a neural network on several stratified samples indicate we are able to retrieve this ordering with a precision of 98.9%. The quality of our database makes it an invaluable resource for the prediction of elementary reactions, and therefore of full chemical processes.
  28 Novel approach to drug discovery integrating chemogenomics and QSAR modeling: Applications to anti-Alzheimer's agents.
R. Hajjo, S. Wang, B. L. Roth, A. Tropsha
Department of Medicinal Chemistry and Natural Products, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States

Chemogenomics is an emerging interdisciplinary field relating the receptorome-wide biological screening to functional or clinical effects of chemicals. We have developed a novel chemogenomics approach combining QSAR modeling, virtual screening (VS), and gene expression profiling for drug discovery. Gene signatures for Alzheimer's disease (AD) were used to query the Connectivity Map (cmap, http://www.broad.mit.edu/cmap/) to identify potential anti-AD agents. Concurrently, QSAR models were developed for the serotonin, dopamine, muscarinic and sigma receptor families implicated in AD. The models were used for VS of the World Drug Index database to identify putative ligands. Twelve common hits from QSAR/VS and cmap studies were subjected to parallel binding assays against a panel of GPCRs. All compounds were found to bind to at least one receptor with binding affinities between 1.7 and 9000 nM. Thus, our approach afforded novel experimentally confirmed GPCR ligands that may be considered putative treatments for AD.
  29 Cheminformatics improvements by combining semantic web technologies, cheminformatical representations, and chemometrics for statistical modeling and pattern recognition.
E. L. Willighagen
Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Uppland, Sweden

My research focuses on the methods needed for large-scale molecular property prediction, using semantic web, cheminformatics, and chemometrics methods. Originally starting with a Dictionary on Organic Chemistry website, research was started to find methods to accurately disseminate molecular knowledge, resulting in participation in Open Source cheminformatics projects, including Jmol, JChemPaint, and the Chemical Markup Language project, and an oral presentation at the "2000 Chemistry & Internet" conference. In that year, the applicant founded together with the Jmol and JChemPaint project leaders the Chemistry Development Kit (CDK), which is now a highly cited Open Source cheminformatics toolkit. Between 2001 and 2006 the applicant continued research in the area of data analysis with a PhD thesis on the "Representation of Molecules and Molecular Systems in Data Analysis and Modeling" with Prof. dr L.M.C. Buydens at the Analytical Chemistry Department at the Radboud University Nijmegen. The thesis studies the interaction of representation and the statistics and shows how tightly these need to match. Topics of the thesis include: a critical analysis of the use of proton and carbon NMR in QSAR; the use of Open Source, Open Data, and Open standards in interoperability in cheminformatics; the clustering of crystal structures using a novel similarity measure; and, the use of new supervised self-organizing maps in pattern recognition in crystallography. Part of the research was performed in the group of dr P. Murray-Rust at Cambridge University. Later research focused on the use of semantic technologies to reduce error in the aggregation and exchange of molecular data. Recent work applies developed technologies to cheminformatics in general and QSAR and metabolite identification in particular, with dr C. Steinbeck at Cologne University in Germany, and with dr R. van Ham at Wageningen University within the Netherlands Metabolomics Center. 
The applicant recently joined the development team of the award-winning cheminformatics-platform Bioclipse in Uppsala with Prof. J. Wikberg in Sweden, to continue his research in improving interoperability and reproducibility in cheminformatics and pharmaceutical bioinformatics and proteochemometrics in particular. This implies continued CDK development, development of semantic methods in computational chemistry, and making these technologies accessible to the non-programming chemist by supporting the development of cheminformatics in bench-chemist-oriented platforms such as Bioclipse and Taverna.
  30 Prediction of consistent water networks in uncomplexed protein binding sites based on knowledge-based potentials.
M. Betz, G. Neudert, G. Klebe
Pharmaceutical Chemistry, Philipps-University Marburg, Marburg, Germany

Within the active site of a protein water fulfills a variety of different roles. Solvation of hydrophilic parts stabilizes a distinct protein conformation, whereas desolvation upon ligand binding may lead to a gain of entropy. In an overwhelming number of cases, water molecules mediate interactions between protein and the bound ligand. Therefore, a reliable prediction of water molecules participating in ligand binding is essential for docking and scoring, and is necessary to develop strategies in ligand design. We require some reasonable estimates about the free energy contributions of water to binding. Useful parameters for such estimations are the total number of displaceable water molecules and the probabilities for their displacement upon ligand binding. These parameters depend on specific interactions with the protein and other water molecules, and thus the positions of individual water molecules. The high flexibility of water networks makes it difficult to observe distinct water molecules at well defined positions in structure determinations. Thus, experimentally observed positions of water molecules have to be assessed critically, bearing in mind that they represent an average picture of a highly dynamic equilibrium ensemble. Moreover, there are many structures with inconsistent and incomplete water networks. To address these deficiencies we developed a tool that predicts possible configurations of complete water networks in binding pockets in a consistent way. It is based on the well established knowledge-based potentials implemented into DrugScore, which also allow for a reasonable differentiation between "conserved" and "displaceable" water molecules. The potentials used were derived specifically for water positions as observed in small molecule crystal structures in the CSD. To account for the flexibility and high intercorrelation we apply a clique-based approach, resulting in water networks maximizing the total DrugScore. 
To incorporate as much known information as possible about a given target, we also allow the inclusion of constraints defined by experimentally observed water positions. Our tool provides a useful starting point whenever a possible configuration of water molecules needs to be estimated in an uncomplexed protein, and suggests their spatial positions and their classification with respect to some kind of affinity prediction. In first tests we were able to get classifications and positional predictions which are in good agreement with crystallographically observed water molecules with remarkably small deviations.
  31 Functional binders for non-specific binding: Evaluation of virtual screening methods for the elucidation of novel transthyretin amyloid inhibitors.
C. J. Simões, T. Mukherjee, R. M. Jackson, R. M. Brito
Department of Chemistry, Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal; Institute of Molecular and Cellular Biology, University of Leeds, Leeds, West Yorkshire, United Kingdom

Inhibition of fibril formation by stabilization of the native form of transthyretin (TTR) is a viable approach for the treatment of Familial Amyloid Polyneuropathy that has been gaining momentum in the field of amyloid research. Herein, we present a benchmark of five virtual screening strategies to identify novel TTR stabilizers: (1) 2D similarity searches with chemical hashed fingerprints, pharmacophore fingerprints and UNITY fingerprints, (2) 3D-searches based on shape, chemical and electrostatic similarity, (3) LigMatch, a ligand-based method employing multiple templates, (4) 3D- pharmacophore searches, and (5) docking to consensus X-ray crystal structures. By combining the best-performing VS protocols, a small subset of molecules was selected from a tailored library of 2.3 million compounds and identified as representative of multiple series of potential leads. According to our predictions, the retrieved molecules present better solubility, halogen fraction and binding affinity for both TTR pockets than the stabilizers discovered to date.
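The 2D similarity searches listed under strategy (1) can be sketched as follows. This is an illustrative example only: it assumes fingerprints are represented as sets of "on" bit positions and uses the Tanimoto coefficient, a common choice for hashed fingerprints that the abstract itself does not name. The compound names and bit sets are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def rank_by_similarity(query_fp, library):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical query (a known stabilizer) and a three-compound library.
query = {1, 2, 3, 4}
library = {
    "cmpd_a": {1, 2, 3, 4},
    "cmpd_b": {1, 2, 5},
    "cmpd_c": {7, 8},
}
ranking = rank_by_similarity(query, library)
```

In a screening setting, only the top-ranked fraction of such a list would be carried forward to the more expensive 3D and docking steps.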

MONDAY MORNING

Section B
Boston Convention & Exhibition Center
155

Semantic Web in Chemistry - Cosponsored by COMP
E. Willighagen, Organizer
M. Braendle, Organizer, Presiding
8:30 32 Using the oreChem Experiments Ontology: Planning and enacting chemistry
J. G. Frey, M. I. Borkum, C. Lagoze, S. J. Coles
School of Chemistry, University of Southampton, Southampton, Hants, United Kingdom; Department of Information Science, Cornell University, Ithaca, NY, United States

This paper presents the oreChem Experiments Ontology, an extensible model that describes the formulation and enactment of scientific methods (referred to as “plans”), designed to enable new models of research and facilitate the dissemination of scientific data on the Semantic Web. Currently, a high level of domain-specific knowledge is required to identify and resolve the implicit links that exist between digital artefacts, constituting a significant barrier-to-entry for third parties that wish to discover and reuse published data. The oreChem ontology radically simplifies and clarifies the problem of representing an experiment to facilitate the discovery and re-use of the data in the correct context. We describe the main parts of the ontology and detail the enhancements made to the Southampton eCrystals repository to enable the publication of oreChem metadata.
9:15 33 CHEMINF: Community-developed ontology of chemical information and algorithms.
L. L. Chepelev, J. Hastings, E. Willighagen, N. Adams, C. Steinbeck, P. Murray-Rust, M. Dumontier
Department of Biology, School of Computer Science, and Institute of Biochemistry, Carleton University, Ottawa, Ontario, Canada; Chemoinformatics and Metabolism Team, European Bioinformatics Institute, Cambridge, United Kingdom; Department of Pharmaceutical Sciences, Uppsala University, Uppsala, Sweden; Department of Chemistry, Unilever Centre for Molecular Informatics, University of Cambridge, Cambridge, United Kingdom

In order to truly convert RDF-encoded chemical information into knowledge and break out of domain- and vendor-specific data silos, reliable chemical ontologies are necessary. To date, no standard ontology that addresses all chemical information representation and service integration needs has emerged from previously proposed ontologies, ironically threatening yet another “Tower of Babel” event in cheminformatics. To avoid resultant substantial ontology mapping costs, we hereby propose CHEMINF, a community-developed modular and unified ontology for chemical graphs, qualities, descriptors, algorithms, implementations, and data representations/formalisms. Further, CHEMINF is aligned with ontologies developed within the OBO Foundry effort, such as the Information Artifact Ontology. We present the application of CHEMINF to efficiently integrate two RDF-based chemical knowledgebases with different representation structures and aims, but common classes and properties from CHEMINF. Finally, we discuss the steps taken to ensure applicability of this ontology in the semantic envelopment of computational chemistry resources, algorithms, and their output.
9:45   Intermission
10:00 34 Chemical entity semantic specification: Knowledge representation for efficient semantic cheminformatics and facile data integration.
L. L. Chepelev, M. Dumontier
Department of Biology, School of Computer Science, and Institute of Biochemistry, Carleton University, Ottawa, Ontario, Canada

Though the nature of RDF implies the ability to interoperate and integrate diverse knowledgebases, designing adequate and efficient RDF-based representations of knowledge concerning chemical entities is non-trivial. We hereby describe Chemical Entity Semantic Specification (CHESS), which captures chemical descriptors, molecular connectivity, functional composition, and geometric structure of chemical entities and their components. CHESS also handles multiple data sources and multiple conformers for molecules, as well as reactions and interactions. We demonstrate the generation of a chemical knowledgebase from disparate data sources, using which we conduct an analysis of the implications of design choices taken in CHESS on the efficiency of solutions for some classical cheminformatics problems, including molecular similarity searching and subgraph detection. We do this through automated conversion of SMILES-encoded query fragments into SPARQL queries and DL-Safe rules. Finally, we discuss approaches to identification of potential reaction participants and class members in chemical entity knowledgebases represented with CHESS.
10:30 35 Semantic assistant for lipidomics researchers.
A. Kouznetsov, R. Witte, C. J. Baker
Department of Computer Science and Applied Statistics, University of New Brunswick, Saint John, New Brunswick, Canada; Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada

Lipid nomenclature has yet to become a robust research tool for lipidomics or lipid research in general. This is in part because no rigorous structure-based definitions for membership of specific lipid classes have existed. Recent work on the OWL-DL Lipid Ontology with defined axioms for class membership has provided new opportunities to revisit the lipid nomenclature issue [1], [2]. Also necessary is a framework for sharing these axioms with scientists during scientific discourse and the drafting of publications. To achieve this we introduce here a new paradigm for lipidomics researchers in which a client-side application tags raw text about lipids with information, such as canonical name or relevant functional groups, derived from the ontology and delivered using web services. Our approach includes the following core components: (i) Semantic Assistant Framework [6]; (ii) Lipid ontology [4]; (iii) Ontological NLP methodology; (iv) Ontology Axiom-extractor for the GATE framework. The Semantic Assistant Framework is a service-oriented architecture used to enhance existing end-user clients, such as OpenOffice Writer, with online lipidomics text analysis capabilities provided as a set of web services. The Ontological NLP methodology links lipid named entities occurring in a document opened on the client side with existing ontologies on the server side. The Ontology Axiom-extractor annotates each named entity with canonical name, class name and related class axioms, providing annotation for documents on the client side. The proposed system is scalable and extensible, allowing researchers to easily customize the information to be delivered as annotations depending on the availability of chemical ontologies with defined axioms linked to canonical names for chemical entities.
[1] Baker CJO, Low HS, Kanagasabai R, and Wenk MR (2010), Lipid Ontologies, 3rd Interdisciplinary Ontology Conference, Tokyo, Japan, February 27-28, 2010
[2] Low HS, Baker CJO, Garcia A, and Wenk M (2009), OWL-DL Ontology for Classification of Lipids, International Conference on Biomedical Ontology, Buffalo, New York, July 24-26
[3] Witte R and Gitzinger T (2008), A General Architecture for Connecting NLP Frameworks and Desktop Clients Using Web Services, 13th International Conference on Applications of Natural Language to Information Systems
[4] Lipid Ontology, available at http://bioportal.bioontology.org/ontologies/39503
11:00 36 ChemicalTagger: A tool for semantic text-mining in chemistry.
L. Hawizy, D. M. Jessop, P. Murray-Rust
The Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom

The primary method for scientific communication is in the form of published scientific articles and theses, which use natural language combined with domain-specific terminology. As such, they contain unstructured data. Given the unquestionable usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt makes their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. Using chemical synthesis procedures as an exemplar, we present ChemicalTagger. ChemicalTagger is a tool that combines chemical entity recognisers such as OSCAR with tokenisers, part-of-speech taggers and shallow parsing tools to produce a formal structure of reactions. This extracted data can then be expressed in RDF, which allows for the generation of highly informative visualisations, such as visual document summaries, and for structured querying. Further enrichment can be provided by linking with domain-specific ontologies.

Section B
Boston Convention & Exhibition Center
156A

The Journal of Chemical Information and Modeling's 50th Anniversary Symposium - Cosponsored by COMP
W. Jorgensen, Organizer, Presiding
8:45   Introductory Remarks.
8:55 37 From canonical numbering to the analysis of enzyme-catalyzed reactions: 32 years of publishing in JCIM (JCICS).
J. Gasteiger
Computer-Chemie-Centrum, University of Erlangen-Nuremberg, Erlangen, Germany; Molecular Networks GmbH, Erlangen, Germany

In 1972 we embarked on the development of a program for computer-assisted synthesis design which eventually led to the present system THERESA. Along the way many fundamental problems had to be solved such as the unique representation of chemical structures published in 1977. This work laid the foundation for building the Beilstein database. Methods had to be developed for the computer representation of chemical reactions which formed the basis for constructing the ChemInform reaction database. Recent work has concentrated on the analysis of biochemical reactions, the prediction of metabolism and the risk assessment of chemicals.
9:25 38 Fifteen years of JCICS.
G. W. Milne
NCI, NIH (Retd), Williamsburg, VA, United States

During the period 1989-2004 when I was Editor of the Journal of Chemical Information and Computer Sciences (JCICS), the predecessor of the Journal of Chemical Information and Modeling (JCIM), many papers appeared addressing contemporary problems in computational chemistry. Some of these problems were completely settled and significant progress was made with others. A third group, in spite of numerous publications, defied attempts at resolution and remain to this day as challenges to computational chemists. As JCIM, aka JCICS, aka J. Chem. Doc embarks upon its second 50 years, the progress recorded during the 1990s and the advances in computer hardware and software are reviewed. With a longer perspective, the impact of computers on chemistry is considered.
9:55 39 Fifteen years in chemical informatics: Lessons from the past, ideas for the future.
D. Agrafiotis
Pharmaceutical Research & Development, Johnson & Johnson, Spring House, Pennsylvania, United States

A unique aspect of chemical informatics is that it has been heavily influenced and shaped by the needs of the pharmaceutical industry. As this industry undergoes a profound transformation, so will the field itself. In this talk, we reflect on the experiences of the past and explore the possibilities we see for the future. These possibilities lie at the convergence of chemistry, biology, and information technology, and will require thinking and working across scientific and organizational boundaries in a way that has never been previously possible.
10:25   Intermission.
10:40 40 Applications of wavelets in virtual screening.
V. Gillet, R. Martin, E. Gardiner, S. Senger
Department of Information Studies, University of Sheffield, Sheffield, United Kingdom; Computational and Structural Chemistry, GlaxoSmithKline, Stevenage, Hertfordshire, United Kingdom

The interactions which a small molecule can make with a receptor can be modelled using three-dimensional molecular fields, such as GRID fields; however, the cumbersome nature of these fields makes their storage and comparison computationally expensive. Wavelets are a family of multiresolution signal analysis functions which have become widely used in data compression. We have applied the non-standard wavelet transform to generate low-resolution approximations (wavelet thumbnails) of finely sampled GRID fields, without loss of information. We demonstrate various applications of wavelet thumbnails including the development of an alignment method to enable the comparison of the wavelet representations of GRID fields in arbitrary orientation.
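As a rough illustration of the multiresolution idea described above, the following sketch applies an unnormalised Haar transform to a one-dimensional list of field samples. The choice of the Haar wavelet, the one-dimensional input, and the function names `haar_step` and `thumbnail` are assumptions made for illustration only; the abstract does not state which wavelet family or implementation the authors used, and real GRID fields are three-dimensional.

```python
def haar_step(signal):
    """One level of an unnormalised Haar transform.

    Returns (averages, details): pairwise means and pairwise half-differences.
    The input length must be even.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages, details


def thumbnail(signal, levels):
    """Low-resolution approximation: keep only the averages after `levels` steps."""
    approx = list(signal)
    for _ in range(levels):
        approx, _details = haar_step(approx)
    return approx


# Hypothetical samples of a field along one grid axis.
samples = [4, 6, 10, 12, 8, 8, 0, 2]
coarse = thumbnail(samples, levels=2)  # two halvings: 8 samples -> 2 values
```

Each level halves the number of stored values. The detail coefficients returned by `haar_step` are what the thumbnail leaves out; keeping them allows the original samples to be rebuilt exactly (`a + d` and `a - d` for each pair).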
11:10 41 Privileged substructures revisited: Target community-selective scaffolds.
J. Bajorath
Department of Life Science Informatics, University of Bonn, Germany

Molecular scaffolds that preferentially bind to a given target family, so-called “privileged” substructures, have long been of high interest in drug discovery. Many privileged substructures have been proposed, in particular, for G protein coupled receptors and protein kinases. However, the existence of truly privileged structural motifs has remained controversial. Frequency-based analysis has shown that many scaffolds thought to be target class-specific also occur in compounds active against other types of targets. In order to explore scaffold selectivity on a large scale, we have carried out a systematic survey of publicly available compound data and defined target communities on the basis of ligand-target networks. The analysis was based on compound potency data and target pair potency-derived selectivity. More than 200 hierarchical scaffolds were identified, each represented by at least five compounds, which exclusively bound to targets within one of ca. 20 target communities. By contrast, currently available compound data is too sparsely distributed to assign target-specific scaffolds. Most scaffolds that exclusively bind to a single target within a community are only represented by one or two compounds in public domain databases. However, characteristic selectivity patterns are found to evolve around community-selective scaffolds that can be explored to guide the design of target-selective compounds.
11:40 42 Automated retrosynthetic analysis: An old flame rekindled.
P. Johnson, A. P. Cook, J. Law, M. Mirzazadeh, A. Simon
School of Chemistry, University of Leeds, Leeds, United Kingdom; Simbiosys Inc, Toronto, Ontario, Canada

The last century saw truly innovative research aimed at the creation of systems for computer aided organic synthesis design (CAOSD). However, such systems have not achieved significant user acceptance, perhaps because they required manual creation of reaction knowledge bases, a time-consuming task which requires considerable synthetic chemistry expertise. More recent systems like ARChem [1] circumvent this problem by automated abstraction of transformation rules from very large databases of specific examples of reactions. ARChem is still a work in progress and specific problems which are being addressed include:

  • a) identification of precise structural characteristics of each reaction, often requiring knowledge of reaction mechanism;
  • b) treatment of interfering functional groups;
  • c) minimising the combinatorial explosion inherent in automated multistep retrosynthesis;
  • d) treatment of the results of extensive recent research into enantioselective and stereoselective reactions.

[1] Law et al., J. Chem. Inf. Model., 2009, 49 (3), pp 593-602

12:10   Lunch
1:30   Introductory remarks
1:40 139 Configurational entropy and mechanical stress in molecular recognition
M. K. Gilson
School of Pharmacy, University of California, San Diego, La Jolla, CA, United States

I will present molecular dynamics simulations consistent with long-ranged entropy effects throughout a protein upon binding a peptide. The results are somewhat preliminary, given the challenge of generating converged simulation results, but are qualitatively consistent with the long-ranged changes in orientational order parameters due to binding, which have been observed in NMR studies of binding. These apparent long-ranged effects raise questions regarding the mechanisms by which binding affects remote parts of the protein. I will explain why the concept of mechanical stress may be useful in thinking about such long-ranged consequences, and will describe our initial computational studies of stress at the molecular level. An accompanying image shows computed stress tensors as a guest molecule is pulled from its cucurbituril host in a simulated single-molecule pulling experiment.

2:10 140 Advancing anthrax toxin countermeasures using topomeric searching and virtual screening methodologies
E. A. Amin, T.-L. Chiu, D. J. Hook, M. A. Walters, B. C. Finzel, J. Solberg, S. Patil, T. W. Geders, S. Rangarajan, R. Francis, X. Zhang
Department of Medicinal Chemistry, University of Minnesota, Minneapolis, Minnesota, United States; Institute for Therapeutics Discovery and Development, University of Minnesota, United States; Department of Chemistry, University of Minnesota, United States

One of the most dangerous bioterror agents is the rod-shaped, spore-forming bacterium Bacillus anthracis, which is the causative agent of anthrax. Concentrated anthrax spores have been deployed as biological weapons in the United States and elsewhere, resulting in high mortality rates among those exposed. The lethal factor (LF) enzyme is secreted by the bacillus as part of the anthrax lethal toxin, and is mainly responsible for anthrax-related cytotoxicity. As LF can remain in the system long after antibiotics have eradicated the bacilli, the preferred therapeutic modality would be the administration of antibiotics together with an effective LF inhibitor. To date, however, no LF inhibitor is available as a therapeutic or preventive agent. Here we present an original high-throughput computational protocol that successfully identified five promising novel LF inhibitor scaffolds with low micromolar inhibition against that target, demonstrating a 12.8% experimental hit rate. This protocol incorporated topomeric shape-based searching techniques that were particularly effective in identifying potential new leads. Three of the five new hits exhibited experimental IC50 values less than 100 µM and may potentially serve as scaffolds for lead optimization. Virtual screening simulations predicted that these preliminary hits are likely to engage in critical ligand-receptor interactions with nearby residues in at least two of the three (S1', S1-S2, and S2') subsites in the LF binding area. Notably, it was found that micromolar-level LF inhibition can be attained by compounds with non-hydroxamate zinc-binding groups that exhibit monodentate zinc chelation as long as key hydrophobic interactions with at least two LF subsites are retained.
2:40 141 Model-free drug-like filters
T. I. Oprea, O. Ursu, C G. Bologa
Department of Biochemistry and Molecular Biology, Division of Biocomputing, University of New Mexico School of Medicine, Albuquerque, NM, United States

Extended connectivity descriptors computed by the Morgan algorithm have been used for the classification of various molecular properties. The information content encoded by such descriptors can be used to compute any 2D descriptors [1]. As these atom environments are canonical, we extracted them as molecular substructure (SMARTS) queries. Rooted in the information gain concept, already applied to derive selection rules in decision trees [2], we aimed at a better separation between classes of chemicals such as “drugs” and “non-drugs”. The most discriminating atom environments (having the highest information gain) were selected as model-free drug-like filters. These can be used to evaluate third party chemical libraries to assess drug-likeness.

[1] JL Faulon, DP Visco, RS Pophale. J. Chem. Inf. Comput. Sci. 2003, 43:707-720

[2] JR Quinlan. Machine Learning 1986, 1:81-106
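The information gain criterion mentioned in the abstract can be illustrated with a short calculation. The sketch below computes the gain obtained by splitting a two-class set ("drugs" versus "non-drugs") on the presence or absence of a single atom environment. The function names and the counts are hypothetical and are not taken from the authors' work; only the standard entropy-based definition of information gain is assumed.

```python
from math import log2


def entropy(pos, neg):
    """Shannon entropy (in bits) of a two-class set with the given counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * log2(p)
    return h


def information_gain(with_feature, without_feature):
    """Gain from splitting on one feature.

    Each argument is a (drugs, non_drugs) pair of counts for the molecules
    that do or do not contain the atom environment.
    """
    pos = with_feature[0] + without_feature[0]
    neg = with_feature[1] + without_feature[1]
    total = pos + neg
    gain = entropy(pos, neg)
    for drugs, non_drugs in (with_feature, without_feature):
        subset = drugs + non_drugs
        if subset:
            gain -= subset / total * entropy(drugs, non_drugs)
    return gain


# Hypothetical counts: an environment found only in drugs separates the
# classes perfectly, while one found equally often in both tells us nothing.
perfect = information_gain((10, 0), (0, 10))
useless = information_gain((5, 5), (5, 5))
```

Ranking candidate atom environments by this value and keeping the top scorers is one plausible way to arrive at a filter set of the kind described.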

3:10   Intermission
3:25 142 Chemocentric informatics: Enabling bioactive compound discovery through structural hypothesis fusion
A. Tropsha
School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States

Historically, computational drug discovery studies have relied on limited sources of data such as biological assays of compound libraries tested against single targets with results published in print. Nowadays, the information resources have broadened dramatically, including large chemical genomics databases (e.g., ChEMBL, PubChem, PDSP, ToxCast), digital libraries (e.g., PubMed), gene expression profiles (e.g., cmap), and others. I shall describe a chemocentric informatics strategy integrating different information resources and diverse computational methodologies towards discovering novel bioactive compounds. I shall describe the use of digital libraries for establishing new datasets to analyze the relationships between chemical structure and biological activity; highlight the importance of chemical data curation; and illustrate how computational models help to spot and correct erroneous data. I will describe a study combining Quantitative Structure Activity Relationship (QSAR) modeling, virtual screening (VS), text mining, and gene expression profiling of chemicals for identifying novel experimentally confirmed high-affinity GPCR ligands as potential anti-Alzheimer drug candidates.
3:55 143 Computers and drug discovery: From duds to $5B drugs
R. C. Glen
Department of Chemistry, University of Cambridge, Cambridge, Cambridgeshire, United Kingdom

Despite what you may think, given the investment in industrial scale pharmacology and chemistry, drug discovery is still a cottage industry. Small focussed groups of scientists combine diverse expertise from pharmacology and biology to synthesis and design, wrestling with complex and uncertain data. It is a poorly defined science, with undefined outcomes, often guided by rule-of-thumb, intuition and sheer luck. Bringing the logic of computation to the chaos of biology is very difficult, but every so often we succeed beyond our wildest dreams. Since this is the 50th anniversary of The Journal of Chemical Information and Modeling, I would like to review some of our work on novel algorithms and drug discovery, focussing on GPCRs, over the past twenty years and in particular identify some things that worked, some that didn't and also challenge some views of where modelling and computation should be applied, and where it shouldn't (yet).
4:25 144 Weighting and fusion methods for similarity-based virtual screening
P. Willett, S. Arif, J. Holliday, N. Malim, C. Mueller
Information School, University of Sheffield, Sheffield, South Yorkshire, United Kingdom

Recent work in Sheffield on similarity searching has focussed on the use of data fusion and fragment weighting methods to search the MDDR, WOMBAT and MUV databases. Data fusion involves the combination of multiple similarity searches. The overlap between multiple searches is shown to follow a Zipf-like, power law distribution, with very few molecules (or active molecules) common to multiple searches; and a comparison of a large number of different group-fusion algorithms shows that one based on molecules' inverse rank positions is the most effective of those tested. Information about the frequencies with which fragments occur in molecules can be used in two ways to increase search effectiveness (when compared with using just the presence or absence of fragments in molecules): using functions of the frequencies of fragment occurrences in individual molecules, and using inverse functions of the frequency of fragment occurrences in the database as a whole.
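To make the idea of fusing searches by inverse rank position concrete, here is a minimal sketch in which each similarity search contributes 1/rank to a molecule's fused score. This is one common reading of "inverse rank" fusion and is offered as an assumption; the abstract does not give the exact formula, and the function name and molecule identifiers are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings):
    """Fuse several ranked lists of molecule identifiers.

    Each ranking lists identifiers from most to least similar. A molecule's
    fused score is the sum of 1/rank over every list in which it appears.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for position, molecule in enumerate(ranking, start=1):
            scores[molecule] += 1.0 / position
    return sorted(scores, key=lambda m: (-scores[m], m))


# Three hypothetical searches, each started from a different reference structure.
searches = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["B", "C", "A"],
]
fused = reciprocal_rank_fusion(searches)
```

Because the contribution falls off quickly with rank, a molecule near the top of several lists outranks one that is first in a single list and poorly placed elsewhere.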

MONDAY AFTERNOON

Section B
Boston Convention & Exhibition Center
155

Where's the Good Stuff? Consumer Health Information, and Social Networking Resources and Services - Cosponsored by CHED
A. Twiss-Brooks, Organizer, Presiding
1:00 43 Dietary supplements: Free evidence-based resources for the cautious consumer.
B. Erb
McGoogan Library of Medicine, University of Nebraska Medical Center, Omaha, NE, United States

Vitamin, mineral and dietary supplements are a 70 billion dollar industry. With marginal FDA regulation, it can be difficult to evaluate the health claims of a given product. How can the skeptical consumer distinguish a promising nutritional supplement from a substance that lacks the evidence to back its nutritional claims? This short presentation will highlight some evidence-based Internet sources that will help the consumer navigate the dietary supplement minefield. These sources will not only help the consumer separate bogus claims from research supported evidence, but also help the consumer make informed nutritional decisions regarding which supplements might be a relevant and useful part of their healthy diet and lifestyle. The resources to be explored have been collected in a UNMC libguide at http://unmc.libguides.com/supplements for ease of navigation and dissemination.
1:25 44 What lessons learned can we generalize from evaluation and usability of a health website designed for lower literacy consumers?
M. J. Moore, R. G. Bias
Department of Health Informatics, University of Miami Miller School of Medicine, Miami, FL, United States; Department of Information, University of Texas at Austin, Austin, Texas, United States

Objectives: Researchers conducted multifaceted usability testing and evaluation of a website designed for use by those with lower computer literacy and lower health literacy. Methods included heuristic evaluation by a usability engineer, remote usability testing and face-to-face testing. Results: Standard usability testing methods required modification, including interpreters, increased flexibility for time on task, presence of a trusted intermediary, and accommodation for family members who accompanied participants. Participants suggested website redesign, including simplified language, engaging and relevant graphics, culturally relevant examples, and clear navigation. Conclusions: User-centered design was especially important for this audience. Some lessons learned from this experience are echoed in usability and evaluation of commercial sites designed for similar audiences, and may be generalizable.
1:50 45 National Library of Medicine resources for consumer health information.
M. Eberle
National Network of Libraries of Medicine - New England, Shrewsbury, MA, United States

Come learn about free, high quality web resources for consumer health information from the National Library of Medicine. We will cover MedlinePlus, a resource for health information for the public. The presenter will take you on a guided tour of http://medlineplus.gov and other specialized web resources for consumer health information including the Drug Information Portal, DailyMed and the Dietary Supplements Labels Database. The program will wrap up with a brief introduction to ClinicalTrials.gov. You will leave this program equipped with expertise to find, critically appraise, and use online health information more effectively.
2:15 46 Better prescription for information: Dietary supplements online.
G. Y. Hendler
Hirsh Health Sciences Library, Tufts University, Boston, MA, United States

Dietary supplements are becoming staples in the health regimens of a growing number of consumers worldwide. According to the most recent National Health and Nutrition Examination Survey, 52% of adults in the United States reported taking a nutraceutical in the past month. Consumers turn to these products believing they are safe and effective because they are “all natural.” Supplementing knowledge about the benefits and the potential risks associated with nutraceutical use requires information resources that are authoritative, accurate and readable to a large and general audience. This presentation will provide recommendations for locating high-quality, freely available online resources that today's consumers need to support decision-making. Featured resources will include books, databases and websites that discuss the pros and cons and provide the evidence for better use of dietary supplements, herbs and functional foods.

Section A
Boston Convention & Exhibition Center
156A

Semantic Web in Chemistry - Cosponsored by COMP
M. Braendle, Organizer
E. Willighagen, Organizer, Presiding
1:15 47 Overview of the linking open drug data task.
E. Prudhommeaux, E. Willighagen, S. Stephens
W3C/MIT, Cambridge, MA, United States; Uppsala University, Uppsala, Sweden; Johnson and Johnson, United States

There is much interesting information about drugs that is available on the Web. Data sources range from medicinal chemistry results, to the impacts of drugs on gene expression, through to the results of drugs in clinical trials. Linking Open Drug Data (LODD) is a task within the W3C's Health Care Life Sciences Interest Group. LODD has surveyed publicly available data sets about drugs, created Linked Data representations of the data sets and interlinked them together, and identified interesting scientific and business questions that can be answered once the data sets are connected. The task also actively explores best practices for exposing data in a Linked Data representation. The figure below shows part of the data sets that have been published and interlinked by the task so far.

The LODD data sets are represented in dark gray, while light gray represents other Linked Data from the life sciences, and white indicates data sets from different domains. Collectively, the LODD data sets consist of over 8 million RDF triples, which are interlinked by more than 370,000 RDF links. This presentation will introduce the LODD task and show examples of recent.

2:00 48 Control, monitoring, analysis and dissemination of laboratory physical chemistry experiments using semantic web and broker technologies.
J. G. Frey, S. Wilson
School of Chemistry, University of Southampton, Southampton, Hants, United Kingdom

A suite of software was developed to control and monitor experimental and environmental data and used for probing of the air/water interface using Second Harmonic Generation. A centralised message broker enabled a common communication protocol between all objects in the system: experimental apparatus, data loggers, storage solutions and displays. The data and context are captured and represented in ways compatible with the Semantic Web. Experimental plans and their enactment are described using the oreChem experiments ontology; this provides the means to capture the metadata associated with the experimental process and the resulting data. Environmental data was stored in the Open Geospatial Consortium Sensor Observation Service (SOS). The SOS is part of the Sensor Web Enablement architecture, which describes a number of interoperable interfaces and metadata encodings for integrating sensor webs into the cloud. A mashup web interface was produced to link all these sources of information from a single point.
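The role of a centralised message broker can be sketched with a small in-process publish/subscribe class. This is only an illustration of the pattern: the class name `Broker`, the topic string, and the message layout are hypothetical, and the abstract does not say which broker product or protocol the authors used.

```python
from collections import defaultdict


class Broker:
    """Minimal topic-based publish/subscribe hub (in-process, synchronous)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register callback(topic, message) for one topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to every subscriber of the topic."""
        for callback in self._subscribers.get(topic, ()):
            callback(topic, message)


# A data logger and a display both listen to the same hypothetical sensor topic.
broker = Broker()
logged, displayed = [], []
broker.subscribe("lab/temperature", lambda topic, msg: logged.append(msg))
broker.subscribe("lab/temperature", lambda topic, msg: displayed.append(msg["value"]))

# The apparatus publishes once and does not need to know who is listening.
broker.publish("lab/temperature", {"value": 21.4, "unit": "C"})
```

The design point is decoupling: instruments, loggers, storage and displays share only topic names and message formats, so any one of them can be replaced without changing the others.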
2:30   Intermission.
2:45 49 Semantic analysis of chemical patents.
D. M. Jessop, L. Hawizy, P. Murray-Rust, R. C. Glen
The Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom

Chemical patents are a rich source of technical and scientific information. They include meta-data, such as bibliographic information, as well as scientific data relating to reactions and synthesis experiments. However, they are lengthy, largely unstructured and rich in technical terminology, such that it takes a significant amount of human effort to analyse them. This makes them an ideal candidate for 'semantification'. As a demonstration, an RDF triplestore of chemical patents is created. The patents, provided by the European Patent Office, are in an XML format. Document segmentation is used initially to extract the relevant information, mainly bibliographic information and experimental paragraphs. The experimental paragraphs are then processed using Natural Language Processing tools to extract the various components of the chemical reaction; roles, such as reactant, product or solvent, are then assigned. This extracted information is then converted into RDF and stored in a triplestore where it can be queried and visualised, and where basic inferences can be made. The ultimate goal of this semantic representation is to make data available and re-usable by the scientific community.
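The last step of the pipeline, turning role-tagged reaction components into triples that can be queried, might look roughly like the sketch below. It uses plain Python tuples in place of a real triplestore, and the namespace, predicate names and sample reaction are all hypothetical; the abstract does not specify the vocabulary or software used.

```python
# Hypothetical namespace; the abstract does not name the vocabulary used.
EX = "http://example.org/patent-reaction/"


def to_triples(reaction_id, components):
    """Convert (chemical_name, role) pairs into (subject, predicate, object) triples."""
    subject = EX + reaction_id
    return [
        (subject, EX + "has" + role.capitalize(), name)
        for name, role in components
    ]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every term that is not None."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Components as they might come out of the NLP stage for one experimental paragraph.
store = to_triples("r1", [
    ("benzaldehyde", "reactant"),
    ("benzyl alcohol", "product"),
    ("ethanol", "solvent"),
])

# Query: which solvents were used in any reaction?
solvents = [obj for _, _, obj in match(store, p=EX + "hasSolvent")]
```

In a real triplestore the `match` call would be a SPARQL query, but the shape of the question (fix some terms, leave others open) is the same.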
3:15 50 Data mining and querying of integrated chemical and biological information using Chem2Bio2RDF.
D. J. Wild, B. Chen, Y. Ding, X. Dong, H. Wang, D. Jiao, Q. Zhu, M. Sankaranarayanan
School of Informatics and Computing, Indiana University, Bloomington, IN, United States; School of Library and Information Science, Indiana University, Bloomington, IN, United States

We have recently developed a freely-available resource called Chem2Bio2RDF (http://chem2bio2rdf.org) that consists of chemical, biological and chemogenomic datasets in a consistent RDF framework, along with SPARQL querying tools that have been extended to allow chemical structure and similarity searching. Chem2Bio2RDF allows integrated querying that crosses chemical and biological information including compounds, publications, drugs, genes, diseases, pathways and side-effects. It has been used for a variety of applications including investigation of compound polypharmacology, linking drug side-effects to pathways, and identifying potential multi-target pathway inhibitors. In the work reported here, we describe a new set of tools and methods that we have developed for querying and data mining in Chem2Bio2RDF, including: Linked Path Generation (a method for automatically identifying paths between datasets and generating SPARQL queries from these paths); an ontology for integrated chemical and biological information; a Cytoscape plugin that allows dynamic querying and network visualization of query results; and a facet-based browser for browsing results.
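One way to picture Linked Path Generation is as a graph search over the links between datasets, followed by mechanical translation of the path found into a SPARQL query. The sketch below does this with a breadth-first search. The dataset names, the `ex:` predicates and the prefix URI are invented for illustration and do not reflect the actual Chem2Bio2RDF schema or the authors' algorithm.

```python
from collections import deque

# Hypothetical directed links between datasets: (source, target) -> predicate.
LINKS = {
    ("compound", "protein"): "ex:bindsTo",
    ("protein", "pathway"): "ex:participatesIn",
    ("pathway", "disease"): "ex:implicatedIn",
    ("compound", "sideEffect"): "ex:causes",
}


def find_path(start, goal):
    """Breadth-first search for a shortest chain of dataset links."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for src, dst in LINKS:
            if src == path[-1] and dst not in seen:
                seen.add(dst)
                queue.append(path + [dst])
    return None


def path_to_sparql(path):
    """Write one triple pattern per hop, using dataset names as variables."""
    patterns = [
        f"  ?{a} {LINKS[(a, b)]} ?{b} ." for a, b in zip(path, path[1:])
    ]
    return (
        "PREFIX ex: <http://example.org/>\n"
        f"SELECT ?{path[0]} ?{path[-1]} WHERE {{\n"
        + "\n".join(patterns)
        + "\n}"
    )


path = find_path("compound", "disease")
query = path_to_sparql(path)
```

Printing `query` gives a three-pattern SELECT that joins compounds to diseases through proteins and pathways, which is the kind of cross-dataset question the abstract describes.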
3:45 51 Mining and visualizing chemical compound-specific chemical-gene/disease/pathway/literature relationships.
Q. Zhu, P. Purohit, J. Youl Choi, S. Bae, J. Qiu, Y. Ding, D. Wild
School of Informatics and Computing, Indiana University, Bloomington, IN, United States; School of Library & Information Science, Indiana University, Bloomington, IN, United States; Department of Computer Science, Indiana University, Bloomington, IN, United States

In common with most scientific disciplines, there has in the last few years been a huge increase in the amount of publicly-available and proprietary information pertinent to drug discovery, owing to a variety of factors including improvements in experimental technologies. The big challenge is how to use all of this information together in an intelligent, integrative fashion. We are developing an application to mine and visualize relationships between chemical compounds and genes, diseases, pathways and literature. It aims to help answer the question “what else should I know about this compound?” from a medicinal chemistry perspective, based on the full picture of chemicals. For the mining part, we have already developed an aggregating web service, named WENDI, which calls multiple individual, or atomic, web services, including a diversity of compound-related data sources, predictive models and self-developed algorithms, and aggregates the results from these services in XML. For visualization there are two routes. First, we create an RDF reasoner to convert XML from WENDI to RDF, find inferred relationships based on the RDF, rank evidence focused on chemical-disease relationships, and display all evidence using the SWP faceted browser based on Longwell (http://simile.mit.edu/wiki/Longwell), which mixes the flexibility of the RDF data model with the faceted browser to enable users to browse complex RDF triples in a user-friendly and meaningful manner. Second, we place all relationships from WENDI into a chemical space consisting of 60M PubChem compounds, then cluster and highlight particular chemical compounds with specific attributes, such as gene, disease, pathway or literature associations, using PubChemBrowse, a customized visualization tool for cheminformatics research that provides a novel 3D data point browser, displays complex properties of massive data on commodity clients and supports fast interaction with an external property database via a semantic web interface.
4:15   Intermission.
4:20   CINF Open Meeting
4:30   Open Meeting. Committees on Publications and Chemical Abstracts Service

Section B
Boston Convention & Exhibition Center
155

CINFlash: Can You Present Faster Than a Femtosecond Laser?
R. Guha, Organizer, Presiding
2:45   Panel Discussion

MONDAY EVENING

Sci-Mix
R. Guha, Organizer
8:00 - 10:00   See previous listings (2, 6, 20, 28, 31, 78, 91)

TUESDAY MORNING

Section A
Boston Convention & Exhibition Center
156A

Herman Skolnik Award Symposium: The Marriage, or at Least Dating, of Molecular Simulation and Modeling with QSAR Analysis
E. X. Esposito, Organizer
A. Hopfinger, Organizer, Presiding
8:15   Introductory Remarks.
8:30 52 What makes polyphenols good antioxidants? Alton Brown, you should take notes...
E. X. Esposito
The Chem21 Group, Inc, Lake Forest, Illinois, United States

The dominant physical features of antioxidants are phenols; polyphenols according to Alton Brown. The proposed antioxidant-tyrosinase mechanism, based on a series of experimentally determined mushroom tyrosinase structures, provides insight into the molecular interactions that drive the reaction. While the enzyme structures illustrate the important molecular interactions for tyrosinase inhibition, the enzyme structures do not always facilitate the understanding of what makes a good inhibitor or the mechanism of the reaction. Using an antioxidant (tyrosinase inhibitor) dataset of 626 compounds (from the linear discriminant analysis research of Martín et al. Euro J Med Chem 42 p1370-1381, 2007) we constructed binary QSAR models to indicate the important antioxidant molecular features. Exploring models constructed from molecular descriptors based on fingerprints (MACCS keys), traditional molecular descriptors (2D and 2½D), VolSurf-like molecular descriptors (3D) and molecular dynamics (4D-Fingerprints), the relationship between polyphenols' biologically relevant molecular features - as determined by each set of descriptors - and their antioxidant abilities will be discussed.
9:15 53 Engineering and 3D protein-ligand interaction scaling of 2D fingerprints
J. Bajorath
Department of Life Science Informatics, University of Bonn, Bonn, Germany

Different concepts are introduced to further refine and advance molecular descriptors for SAR analysis. Fingerprints have long been among preferred descriptors for similarity searching and SAR studies. Standard fingerprints typically have a constant bit string format and are used as individual database search tools. However, by applying “engineering” techniques such as “bit silencing”, fingerprint reduction, and “recombination”, standard fingerprints can be tuned in a compound class-directed manner and converted into size-reduced versions with higher search performance. It is also possible to combine preferred bit segments from fingerprints of distinct design and generate “hybrids” that exceed the search performance of their parental fingerprints. Furthermore, effective 2D fingerprint representations can be generated from strongly interacting parts of ligands in complex crystal structures. These “interacting fragment” fingerprints focus search calculations on pharmacophore elements without the need to encode interactions directly. Moreover, 3D protein-ligand interaction information can implicitly be taken into account in 2D similarity searching through fingerprint scaling techniques that emphasize characteristic bit patterns.
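The effect of switching off selected bit positions before a similarity search can be shown with a toy example. Fingerprints are modelled here as Python sets of on-bit positions and compared with the Tanimoto coefficient. The function names, the bit positions and the simplified notion of "silencing" as removing bits from both fingerprints are assumptions for illustration, not a description of the speaker's actual procedure.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def silence(fingerprint, silenced_bits):
    """Return a copy of the fingerprint with the given bit positions switched off."""
    return fingerprint - silenced_bits


# Hypothetical fingerprints for a query and a database compound.
query = {1, 2, 3, 4}
candidate = {2, 3, 4, 5}

before = tanimoto(query, candidate)  # 3 shared bits out of 5 distinct bits

# Suppose bits 1 and 5 are judged uninformative for this compound class.
noise = {1, 5}
after = tanimoto(silence(query, noise), silence(candidate, noise))
```

Here the similarity rises from 0.6 to 1.0 once the two uninformative positions are removed, which is the kind of class-directed tuning the abstract refers to.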
Presentation (pdf)
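
As a rough illustration of the "bit silencing" idea described in the abstract above, the following minimal Python sketch compares two toy fingerprints before and after a set of bits is switched off. The bit positions, the choice of silenced bits, and the helper names `tanimoto` and `silence_bits` are all hypothetical; this is not the author's implementation, only a sketch of how removing uninformative bits can change a similarity score.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def silence_bits(fp, silenced):
    """Return a copy of the fingerprint with the given bit positions switched off."""
    return frozenset(fp) - frozenset(silenced)


# Toy fingerprints (hypothetical bit positions, not real MACCS keys).
query = frozenset({1, 4, 7, 9})
candidate = frozenset({1, 4, 8, 9})

# Similarity with the full fingerprints.
raw = tanimoto(query, candidate)

# Assumption for illustration: bits 7 and 8 were judged uninformative
# for this compound class and are therefore silenced in both fingerprints.
tuned = tanimoto(silence_bits(query, {7, 8}), silence_bits(candidate, {7, 8}))

print(raw, tuned)
```

In this toy case the two fingerprints differ only in the silenced bits, so the tuned similarity is higher than the raw one.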
10:00   Intermission.
10:15 54 In silico binary QSAR models based on 4D-fingerprints and MOE descriptors for prediction of hERG blockage.
Y. Tseng
Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan, Republic of China

Blockage of the human ether-a-go-go related gene (hERG) potassium ion channel is a major factor related to cardiotoxicity. Hence, drugs binding to this channel have become an important biological endpoint in side effects screening. We have collected all available biologically active hERG compounds from the hERG literature for a total of 250 structurally diverse compounds. This data set was used to construct a set of two-state hERG QSAR models. The descriptor pool used to construct the models consisted of 4D-fingerprints generated from the thermodynamic distribution of conformer states available to a molecule, 204 traditional 2D descriptors and 76 3D VolSurf-like descriptors computed using the Molecular Operating Environment (MOE) software. One model is a continuous partial least squares (PLS) QSAR hERG binding model. Another related model is an optimized binary QSAR model that classifies compounds as active or inactive. This binary model achieves 91% accuracy over a large range of molecular diversity spanning the training set. An external test set was constructed from the condensed PubChem bioassay database containing 816 compounds and successfully used to validate the binary model. The binary QSAR model permits a structural interpretation of possible sources for hERG activity. In particular, the presence of a polar negative group at a distance of 6 to 8 Å from a hydrogen bond donor in a compound is predicted to be a quite structure-specific pharmacophore that increases hERG blockage. Since a data set of high chemical diversity was used to construct the binary model, it is applicable for performing general virtual hERG screening.
11:00 55 Telling the good from the bad and the ugly: The challenge of evaluating pharmacophore model performance.
R. D. Clark
Simulations Plus, Inc., Lancaster, California, United States

Pharmacophore models are useful when they provide qualitative insight into the interactions between ligands and their target macromolecules, and therefore are more akin in many ways to molecular simulations than to quantitative structure activity relationships (QSARs) based on the partition of activity across a set of molecular descriptors. When the performance of a pharmacophore model is assessed quantitatively, it is usually in terms of its ability to recover known ligands or, less often, in terms of how well it distinguishes ligands from non-ligands. This status as a classification technique also sets it apart from more numerical QSAR methods, in part because of fundamental differences in what being "good" means. Carefully defining what "good" classification is, however, can make creative combination with other techniques a productive way to capture the value of their intrinsic complementarity.

TUESDAY AFTERNOON

Section A
Boston Convention & Exhibition Center
156A

Herman Skolnik Award Symposium: The Marriage, or at Least Dating, of Molecular Simulation and Modeling with QSAR Analysis
A. Hopfinger, Organizer
E. X. Esposito, Organizer, Presiding
2:00 56 Creative application of ligand-based methods to solve structure-based problems: Using QSAR approaches to learn from protein crystal structures.
C. M. Breneman, S. Das, M. Sundling, M. Krein, S. Cramer, K. P. Bennett, C. Bergeron, J. Zaretzki
Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy, NY, United States; Department of Chemical and Biological Engineering, Rensselaer Polytechnic Institute, Troy, NY, United States; Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY, United States

In practice, there is no inherent disconnect between the descriptor-based cheminformatics methods commonly used for predicting small molecule properties and those that can be used to understand and predict protein behaviors. Examples of such connections include the development of predictive models of protein/stationary phase binding in HIC and ion-exchange chromatography, protein/ligand binding mode characterization through PROLICSS analysis of crystal structures, and the use of PESD binding site signatures for pose scoring and predicting off-target drug interactions. In all of these cases, models were created using descriptors based on protein electronic and structural features and modern machine learning methods that include model validation tools and domain of applicability assessment metrics.

2:45 57 Computer-aided drug discovery.
W. L. Jorgensen
Department of Chemistry, Yale University, New Haven, CT, United States

Drug development is being pursued through computer-aided structure-based design. For de novo lead generation, the BOMB program builds combinatorial libraries in a protein binding site using a selected core and substituents, and QikProp is applied to filter all designed molecules to ensure that they have drug-like properties. Monte Carlo/free-energy perturbation simulations are then executed to refine the predictions for the best scoring leads including ca. 1000 explicit water molecules and extensive sampling for the protein and ligand. FEP calculations for optimization of substituents on an aromatic ring and for choice of heterocycles are now common. Alternatively, docking with Glide is performed with the large databases of purchasable compounds to provide leads, which are then optimized via the FEP-guided route. Successful application has been achieved for HIV reverse transcriptase, FGFR1 kinase, and macrophage migration inhibitory factor (MIF); micromolar leads have been rapidly advanced to extraordinarily potent inhibitors.
3:30   Intermission.
3:45 58 Structure-based discovery and QSAR methods: A marriage of convenience.
J. S. Duca
Novartis, Cambridge, MA, United States

The art of building predictive models of the relationships between structural descriptors and molecular properties has been historically important to drug design. In recent years there has been an extraordinary amount of experimental data available from processes designed to accelerate drug discovery in pharma, from high throughput screening and automation applied to library design and synthesis to chemogenomics and microarray analysis. QSAR methods are one of the many tools to predict affinity-related, physicochemical, pharmacokinetic and toxicological properties through analyzing and extracting information from molecular databases and HTS campaigns. This presentation will cover case studies in which QSAR and Structure-Based Drug Design (SBDD) have worked in concert during the discovery process of pre-clinical candidates. The importance of incorporating time-dependent sampling to improve the quality of the nD-QSAR models (n=3,4) will also be discussed and compared to simplified low dimensional QSAR models. For those cases where structural information is not readily available, an extension of these methodologies will be discussed in relation to ligand-based approaches.
4:30 59 Extending the QSAR Paradigm using molecular modeling and simulation.
A. J. Hopfinger
College of Pharmacy, MSC 09 5360, University of New Mexico, Albuquerque, NM, United States; Computational Chemistry, The Chem21 Group, Inc., Lake Forest, IL, United States

QSAR analysis and molecular modeling/simulation methods are often complementary, and when combined in a study yield results greater than the sum of their parts. Modeling and simulation offer the ability to design custom, information-rich trial descriptors for a QSAR analysis. In turn, QSAR analysis is able to discern which of the custom descriptors most fully relate to the behavior of an endpoint of interest. One useful set of custom QSAR descriptors from modeling and simulation for describing ligand-receptor interactions is the grid cell occupancy descriptors, GCODs, of 4D-QSAR analysis. These descriptors characterize the relative spatial occupancy of all the atoms of a molecule over the set of conformations available to the molecule when in a particular environment. GCODs permit the construction of a 4D-QSAR equation for virtual screening, as well as a spatial pharmacophore of the 4D-QSAR equation for exploring mechanistic insight. Applications that can particularly benefit from combining QSAR analysis and modeling/simulation tools are those in which a model chemical system is needed to determine the sought-after property. One such application is the transport of molecules through biological compartments, an integral part of many ADMET properties. The reliable estimation of eye irritation is greatly enhanced by simulating the transport of test solutes through membrane bilayers, and using extracted properties from the simulation trajectories as custom descriptors to build eye irritation QSAR models. These key descriptors of the QSAR models, in turn, also permit the investigator to probe and postulate detailed molecular mechanisms of action.
5:15   Presentation of Award

WEDNESDAY MORNING

Section A
Boston Convention & Exhibition Center
156A

The Emerging Concepts of Activity Landscapes and Activity Cliffs and their Role in Drug Research - Cosponsored by COMP and MEDI
G. Maggiora, M. Lajiness, Organizers
J. Bajorath, Organizer, Presiding
8:50   Introductory remarks.
9:00 60 Overview of activity landscapes and activity cliffs: Prospects and problems.
G. M. Maggiora
Department of Pharmacology & Toxicology, University of Arizona College of Pharmacy, Tucson, AZ, United States; BIO5 Institute, University of Arizona, Tucson, AZ, United States; Translational Genomics Research Institute, Phoenix, AZ, United States

Substantial growth in the size and diversity of compound collections and the capability to subject them to an increasing variety of different high-throughput assays manifests the need for a more systematic and global view of structure-activity relationships. The concepts of chemical space and molecular similarity, which are now well known to the drug-research community, provide a suitable framework for developing such a view. Augmenting a chemical space with activity data from various assays generates a set of activity landscapes, one for each assay. The topography of these landscapes contains important information on the structure-activity relationships of compounds that inhabit the chemical space. Activity cliffs, which arise when similar compounds possess widely different activities, are a particularly informative feature of activity landscapes with respect to SAR. The talk will present an overview of activity landscapes and cliffs and will describe some of the prospects and problems associated with these important concepts.
9:30 61 Exploring and exploiting the potential of structure-activity cliffs.
M. S. Lajiness, G. M. Maggiora
Department of Pharmacology & Toxicology, University of Arizona College of Pharmacy, Tucson, Arizona, United States; Scientific Informatics, Eli Lilly & Co, Indianapolis, IN, United States

It's well known that small structural changes sometimes result in large changes in activity. There have been some recent efforts to identify such changes but little work on defining which structural changes are most informative or even real. Also, the missing value problem often obscures relevant patterns, if in fact they exist. This presentation will offer several ideas and applications for exploring and exploiting Structure-Activity Cliffs. In addition, various visualizations and approaches to communicate the information contained in these "cliffs" will be shared. Examples will be drawn from PubChem.
10:00 62 What makes a good structure activity landscape? Network metrics and structure representations as a way of exploring activity landscapes.
R. Guha
Department of Informatics, NIH Chemical Genomics Center, Rockville, MD, United States

The representation of SAR data in the form of landscapes and the identification of activity cliffs in such landscapes is well known. A number of approaches have been described for identifying activity cliffs, including several network-based methods such as the SALI approach (JCIM, 2008, 48, 646-658). While a network representation of an SAR landscape moves away from the intuitive idea of rolling hills and steep gorges, it allows us to apply a variety of quantitative analyses. In this talk I will first examine some of the properties of SALI networks using various measures of network structures and attempt to correlate these features with features of the SAR data. While most examples are from relatively small datasets, I will highlight some examples from larger datasets from high-throughput screens. While such data can be noisy and contain artifacts, I will examine whether the underlying network structure can shed light on specific molecules that may be worth following up. The second focus of the talk will be the effect of structure representations on the smoothness of the landscape and how one can derive ideas from the SALI characterization to suggest good or bad landscapes.
10:30   Intermission.
10:45 63 Consensus model of activity landscapes and consensus activity cliffs.
J. L. Medina-Franco, K. Martinez-Mayorga, F. Lopez-Vallejo
Torrey Pines Institute for Molecular Studies, Port St Lucie, FL, United States

Characterization of activity landscapes is a valuable tool in lead optimization, virtual screening and computational modeling of active compounds. As such, understanding the activity landscape and early detection of activity cliffs [Maggiora, G. M. J. Chem. Inf. Model. 2006, 46, 1535] can be crucial to the success of computational models. Similarly, characterizing the activity landscape will be critical in future ligand-based virtual screening campaigns. However, the chemical space and activity landscape are influenced by the particular representation used, and certain representations may lead to apparent activity cliffs. A strategy to address this problem is to consider multiple molecular representations in order to derive a consensus model for the activity landscape and in particular identify consensus activity cliffs [Medina-Franco, J. L. et al. J. Chem. Inf. Model. 2009, 49, 477]. The current approach can be extended to identify consensus selectivity cliffs.
11:15 64 R-Cliffs: Activity cliffs within a single analog series.
D. Agrafiotis
Pharmaceutical Research & Development, Johnson & Johnson, Spring House, Pennsylvania, United States

The concept of activity cliffs has gained popularity as a means to identify and understand discontinuous SAR, i.e., regions of SAR where minor changes in structure have unpredictably large effects on biological activity. To the best of our knowledge, activity cliffs have been invariably evaluated using global measures of molecular similarity that do not take into account the presence of finer substructure among a series of related analogs. In this talk, we look at activity cliffs within a congeneric series, by decomposing them into R-groups and analyzing how activity is affected by changes in a single variation site. The analysis is greatly enhanced by R-group-aware visualization tools such as the SAR maps, which have been enhanced to specifically highlight such discontinuities.
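
The single-site analysis described in the abstract above can be pictured with a small Python sketch. It assumes each analog has already been decomposed into a tuple of R-groups on a common core, and it flags pairs that differ at exactly one variation site while showing a large activity difference. The data, the threshold `min_delta`, and the function name `single_site_cliffs` are hypothetical; this is one plausible reading of the idea, not the speaker's method.

```python
from itertools import combinations


def single_site_cliffs(series, min_delta=2.0):
    """Find analog pairs that differ at exactly one R-group site
    and whose activities differ by at least min_delta (log units).

    series: dict mapping compound name -> (tuple of R-groups, activity)
    Returns a list of (name1, name2, site_index, activity_difference).
    """
    cliffs = []
    for (n1, (r1, a1)), (n2, (r2, a2)) in combinations(series.items(), 2):
        # Indices of the variation sites where the two analogs differ.
        diff_sites = [k for k, (x, y) in enumerate(zip(r1, r2)) if x != y]
        if len(diff_sites) == 1 and abs(a1 - a2) >= min_delta:
            cliffs.append((n1, n2, diff_sites[0], abs(a1 - a2)))
    return cliffs


# Hypothetical congeneric series: (R1, R2) substituents and a pIC50-like value.
series = {
    "cpd1": (("H", "OMe"), 5.1),
    "cpd2": (("Cl", "OMe"), 7.6),
    "cpd3": (("Cl", "Me"), 7.4),
    "cpd4": (("F", "Me"), 5.0),
}

cliffs = single_site_cliffs(series)
for name1, name2, site, delta in cliffs:
    print(name1, name2, "site", site, "delta", round(delta, 2))
```

Restricting the comparison to pairs with a single changed site is what ties each flagged cliff to one specific substitution.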

Section B
Boston Convention & Exhibition Center
155

Recent Progress in Chemical Structure Representation
R. Apodaca, Organizer, Presiding
9:00   Introductory Remarks.
9:05 65 Chemical structure representation in the DuPont Chemical Information Management Solutions database: Challenges posed by complex materials in a diversified science company.
M. A. Andrews, E. S. Wilks
CR&D, Information & Computing Technologies, DuPont, Wilmington, DE, United States

This talk will describe the novel ways we have developed to represent precisely the structures of the diverse chemical materials of interest to DuPont. These range from simple organics and inorganics to polymers, mixtures, formulations, multi-layer films, composites, and even devices and incompletely defined substances. Part of the solution involves evaluating trade-offs, which may be situation dependent, between details captured in the structure vs. details captured at the sample history level, e.g., ratios of components, polymer molecular weights and microstructures, and the existence of “fairy dust” components. An important aspect of the solution involves ensuring robust structure standardization and duplicate checking for complex and ill-defined substances. We believe that our needs and solutions have challenged and inspired a number of chemical software vendors to provide significant upgrades to the functionalities of their drawing packages and database cartridges.
Presentation (pdf)
9:35 66 From deposition to application: Technologies for storing and exploiting crystal structure data.
C. R. Groom, J. Cole, S. Bowden, T. Olsson
Cambridge Crystallographic Data Centre, United Kingdom

In December 2009 The Cambridge Crystallographic Data Centre (CCDC) archived the 500,000th small-molecule crystal structure to the Cambridge Structural Database (CSD). The passing of this milestone highlights the rate of growth of the CSD in recent years and the continuing challenges this represents in terms of information storage and exchange. This talk will describe the development of a number of tools for the processing, validation, and storage of crystal structure data. Recent developments that will aid this growing body of structural knowledge to be exploited in a range of applications and the provision of additional services that can assist the scientific community will also be illustrated.
10:05 67 Recent IUPAC recommendations for chemical structure representation: An overview.
J. Brecher
CambridgeSoft Corporation, Cambridge, MA, United States

Accurate and unambiguous depiction of chemical information is a key step in communicating that information. Such depiction is equally important whether the intended audience is a human chemist (as in a journal article or patent) or a computer (as in a chemical registration system). Recent IUPAC publications provide chemists a practical guide for producing chemical structure diagrams that accurately convey the author's intended meaning. A summary of those recommendations will be presented. As part of that summary, common pitfalls in producing chemical structure diagrams will be discussed. Solutions to those pitfalls will also be described, with an emphasis on solutions that are simple, straightforward, and accessible to the majority of practicing chemists.
10:35   Intermission
10:50 68 Orbital development kit.
E. L. Willighagen
Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden

Understanding properties of molecular structures requires a computer representation, and quantum mechanical and chemical graph representations have been used abundantly. Both have found their own areas of application in chemistry, and their fields are best described as theoretical chemistry and cheminformatics, respectively. The Orbital Development Kit (ODK) positions itself in between these two representations, though closest to chemical graph theory, and addresses shortcomings of the latter. In particular, it replaces coloring of the nodes and edges in the chemical graph with explicit atom hybridization and bond order, making the representation more precise in how it represents geometrical features of the molecule. The ODK does so by replacing the atom as a single node in the chemical graph by a central atomic core surrounded by valence orbitals, possibly hybridized. Using this approach, the definition of an atom type is reformulated as a core element with a particular and well-defined set of identifiable orbitals with an implied, though relative, geometrical orientation. Bonding is now the connection of two orbitals, and a lone pair becomes a single orbital, and is therefore directional too. This approach means that the classical double bond in ethene is now represented by one sigma bonding between two sp2 orbitals of the two carbons, and one bonding of their two pz orbitals. This ODK representation also leaves room for representations beyond the chemical graph, such as proposed by Dietz in 1995: more than two orbitals can be combined into a set to represent delocalization. The presentation will cover the ODK data model, serialization and deserialization into a Resource Description Framework-based file format, and a bridge to the Chemistry Development Kit, for visualization and molecular property calculation.
Presentation (pdf)
11:20 69 Line notations as unique identifiers.
K. Boda
OpenEye Scientific Software, Santa Fe, New Mexico, United States

A wide variety of structure representation formats have been devised to encode molecular information in order to register, store and manipulate molecules in silico. One class of these formats, called line notations, is designed to express molecules as compact, unambiguous strings that can be used as unique identifiers for compound registration, eliminating the computationally more expensive graph matching. The presentation will provide an overview of popular line notations, such as canonical SMILES, isomeric SMILES, and InChI, discussing their merits and shortcomings with regard to using them as robust lossless unique identifiers. We will present results of testing a variety of line notations on a diverse set of 10M compounds generated by combining organic and inorganic vendor databases. We will also examine the information loss of various molecular normalization procedures with regard to line notation generation.

WEDNESDAY AFTERNOON

Section A
Boston Convention & Exhibition Center
156A

The Emerging Concepts of Activity Landscapes and Activity Cliffs and their Role in Drug Research - Cosponsored by COMP and MEDI
J. Bajorath, M. Lajiness, Organizers
G. Maggiora, Organizer, Presiding
2:00 70 Analysis of activity landscapes, activity cliffs, and selectivity cliffs.
J. Bajorath
Department of Life Science Informatics, University of Bonn, Germany

The concept of activity landscapes (ALs) is of fundamental importance for the exploration of structure-activity relationships (SARs). ALs are best rationalized as biological activity hypersurfaces in chemical space. When reduced to three dimensions, ALs display characteristic topologies that determine the SAR behavior of compound sets. Prominent features of ALs are activity cliffs that are formed by structurally similar compounds having large potency differences, giving rise to SAR discontinuity. ALs and activity cliffs can be analyzed in different ways including similarity-potency diagrams, approximate three-dimensional landscape representations, or molecular networks integrating compound similarity and potency information. Annotated similarity-based compound networks that incorporate results of numerical SAR analysis functions, termed Network-like Similarity Graphs (NSGs) are designed to explore relationships between global and local SAR features in compound data sets of any source. For collections of analogs, substitution patterns that introduce activity cliffs are identified in Combinatorial Analog Graphs (CAGs) that make it also possible to study additive and non-additive effects of compound modifications. Activity cliffs identified in CAGs can frequently be rationalized on the basis of complex crystal structures. When studying multi-target SARs using the NSG framework, the concept of activity cliffs can be extended to selectivity cliffs, i.e. similar compounds having significant differences in target selectivity.
Presentation (pdf)
2:30 71 Using Activity Cliff Information in structure-based design approaches.
B. Seebeck, M. Wagener, M. Rarey
Center for Bioinformatics (ZBH), University of Hamburg, Hamburg, Germany; Molecular Design and Informatics, MSD, Oss, The Netherlands

Activity cliffs are often the pitfall of QSAR modeling techniques, but at the same time they exhibit key features of a SAR. Based on the principles of the structure-activity landscape index (SALI) [1], here we present an approach to use the valuable information of activity cliffs in a structure-based design scenario, analyzing key interactions between protein-ligand complexes in activity cliff events. We visualize those interaction “hot spots” directly in the active site of target proteins. In addition, we use the activity cliff information to derive target-specific scoring models and pharmacophoric hypotheses, which are validated in enrichment experiments on independent external test sets. The results show an improved enrichment in comparison to the standard score for various protein targets.
1. Guha R. and Van Drie J.H., J. Chem. Inf. Model., 2008, 48, 646-658.
3:00 72 Exploring activity cliffs using large scale semantic analysis of PubChem.
D. J. Wild, B. Chen, Q. Zhu
School of Informatics and Computing, Indiana University, Bloomington, IN, United States

Identification of Activity Cliffs, defined as the ratio of the difference in activity of two compounds to their “distance” of separation in a given chemical space [1], has been established as important in the creation of robust quantitative structure-activity relationship models. Previously, a method, SALI, for identifying and visualizing these activity cliffs was developed at Indiana University, and applied successfully to several established QSAR datasets [2]. In the work reported here, we have extended this work in two ways. First, we have used structure and activity data from the public PubChem BioAssay dataset to evaluate the method on a much larger scale, and second, we have integrated it with a project called Chem2Bio2RDF to look not just for activity cliffs based on reported assay values, but also on computationally established relationships between compounds and genes and diseases. We thus propose an extended application of SALI which can be used in a systems chemical biology and chemogenomic context.

[1] J. Chem. Inf. Model., 2006, 46 (4), p 1535
[2] J. Chem. Inf. Model., 2008, 48 (3), pp 646-658
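
The ratio described in the abstract above is commonly written as SALI(i, j) = |A_i - A_j| / (1 - sim(i, j)), where A is the activity and sim is a similarity between 0 and 1. The Python sketch below computes it for one pair of toy compounds. The fingerprints, activities, and helper names are hypothetical, and Tanimoto similarity on bit sets is an assumed choice of similarity measure.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sali(act_i, act_j, sim_ij):
    """Structure-activity landscape index for one compound pair.

    SALI = |A_i - A_j| / (1 - sim(i, j)). For identical representations
    (sim == 1) the ratio is undefined; returning infinity here is an
    assumption made for this sketch.
    """
    if sim_ij >= 1.0:
        return float("inf")
    return abs(act_i - act_j) / (1.0 - sim_ij)


# Hypothetical pair: similar fingerprints, activities two log units apart.
fp_a = frozenset({1, 2, 3, 4})
fp_b = frozenset({1, 2, 3, 5})
similarity = tanimoto(fp_a, fp_b)
value = sali(6.0, 8.0, similarity)
print(similarity, value)
```

Pairs that are very similar yet far apart in activity receive large values, which is what marks them as activity cliffs.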

3:30 73 Quantifying the usefulness of a model of a structure-activity relationship: The SALI Curve Integral.
J. H. Van Drie, R. Guha
Research LLC, Andover, MA, United States; Chemical Genomics Center, NIH, Bethesda, MD, United States

In 2008, in two papers Guha and Van Drie introduced the notion of structure-activity landscape index (SALI) curves as a way to assess a model and a modeling protocol, applied to structure-activity relationships. The starting point is to study a structure-activity relationship pairwise, based on the notion of "activity cliffs"--pairs of molecules that are structurally similar but have large differences in activity. The basic idea behind the “SALI Curve” is to tally how many of these pairwise orderings a model is able to predict. Empirically, testing these SALI curves against a variety of models, ranging over structure-based and non-structure-based models, the utility of a model seems to correspond to characteristics of these curves. In particular, the integral of these curves, denoted as SCI and being a number ranging from -1.0 to 1.0, approaches a value of 1.0 for two literature models, which are both known to be prospectively useful.
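
The tallying idea behind the SALI curve can be sketched in Python as follows. For each threshold X (a fraction of the largest SALI value), the sketch keeps the compound pairs whose SALI value is at least that threshold and scores each pair +1 if the model predicts the measured ordering, -1 if it predicts the reverse, and 0 if it predicts a tie. The curve is the mean score at each X, and its integral is taken by the trapezoid rule. The exact weighting and normalization in the original papers may differ; the data and all function names here are hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sali_curve(measured, predicted, sali_pairs, n_points=11):
    """Return [(X, S(X)), ...] for X evenly spaced in [0, 1].

    measured, predicted: dict compound -> activity value
    sali_pairs: dict (i, j) -> finite SALI value for that pair
    """
    max_sali = max(sali_pairs.values())
    curve = []
    for k in range(n_points):
        x = k / (n_points - 1)
        # Keep only the pairs at or above the current SALI threshold.
        edges = [pair for pair, s in sali_pairs.items() if s >= x * max_sali]
        score = sum(
            sign(measured[i] - measured[j]) * sign(predicted[i] - predicted[j])
            for i, j in edges
        )
        curve.append((x, score / len(edges)))
    return curve


def curve_integral(curve):
    """Trapezoid-rule integral of the curve over X in [0, 1]."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(curve, curve[1:])
    )


# Hypothetical data: three compounds and their pairwise SALI values.
measured = {"a": 5.0, "b": 7.0, "c": 9.0}
sali_pairs = {("a", "b"): 4.0, ("a", "c"): 10.0, ("b", "c"): 2.0}

perfect = curve_integral(sali_curve(measured, measured, sali_pairs))
inverted_model = {name: -value for name, value in measured.items()}
inverted = curve_integral(sali_curve(measured, inverted_model, sali_pairs))
print(perfect, inverted)
```

Under these assumptions a model that orders every pair correctly integrates to about 1.0 and one that reverses every pair to about -1.0, matching the range quoted in the abstract.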
4:00   Concluding Remarks

Section B
Boston Convention & Exhibition Center
155

Recent Progress in Chemical Structure Representation
R. Apodaca, Organizer, Presiding
2:00 74 Status of the InChI and InChIKey algorithms.
S. Heller
CBRD, MS - 8320, NIST, Gaithersburg, MD, United States

The Open Source chemical structure representation standard, the IUPAC InChI/InChIKey project, has evolved considerably in the past two years. The project is now being supported and widely used by virtually all major publishers of chemical journals, databases, and structure drawing and related software. This usage of the InChI/InChIKey in their products enables them to link information between their products and other (fee-free and fee-based) chemical information available on the world wide web via the Internet. These organizations are now providing a stable and financially viable structure for the project. This is enabling the world-wide chemistry community to expand its use of the InChI knowing that this freely available Open Source algorithm will be widely accepted and used as a mainstream standard. The mission of the Trust is quite simple and limited; its sole purpose is to create and support administratively and financially a scientifically robust and comprehensive InChI algorithm and related standards and protocols. This presentation will describe the current technical state of the InChI and InChIKey algorithms.
Presentation (pdf)
2:30 75 Self-contained sequence representation (SCSR): Bridging the gap between bioinformatics and cheminformatics.
K. T. Taylor, W. L. Chen, B. D. Christie, J. L. Durant, D. L. Grier, B. A. Leland, J. G. Nourse
Symyx Technologies Inc, San Ramon, CA, United States

In this paper we will discuss the benefits and disadvantages of the current approaches for storing biological sequence information. We have developed a hybrid representation that uses the compactness of the sequence, together with the detail of chemical connectivity information for modified regions. It represents standard residues with substructure. All instances of the same residue are represented by a single template. This hybrid approach is compact and scalable. We have developed a converter that takes a UniProt format file, extracts the sequence information, and derives the modifications, producing an SCSR record. The SCSR is encoded as a molfile and registered into a Symyx Direct database. Duplicate checking, exact matching (with and without the modifications), molecular weight calculation, and substructure searching are all available with these structures. We are using this representation for peptides and oligonucleotides, and we are now extending it to oligosaccharides. Non-natural residues can be included in an SCSR.
Presentation (pdf)
3:00   Intermission.
3:15 76 Representation of Markush structures: From molecules toward patents.
S. Csepregi, N. Máté, R. Wágner, T. Csizmazia, S. Dóránt, E. Bíró, T. Dudgeon, A. Baharev, F. Csizmadia
ChemAxon Ltd., Budapest, Hungary

Cheminformatics systems usually focus primarily on handling specific molecules and reactions. However, Markush structures are also indispensable in various areas, like combinatorial library design or chemical patent applications for the description of compound classes. The presentation will discuss how an existing molecule drawing tool (Marvin) and chemical database engine (JChem Base/Cartridge) are extended to handle generic features (R-group definitions, atom and bond lists, link nodes and larger repeating units, position and homology variation). Markush structures can be drawn and visualized in the Marvin sketcher and viewer, registered in JChem databases and their library space is searchable without the enumeration of library members. Different enumeration methods allow the analysis of Markush structures and their enumerated libraries. These methods include full, partial and random enumerations as well as calculation of the library size. Furthermore, unique visualization techniques will be demonstrated on real-life examples that illustrate the relationship between Markush structures and the chemical structures contained in their libraries (involving substructures and enumerated structures). Special attention will be given to file formats and how they were extended to hold generic features.
Presentation (pdf)
3:45 77 CSRML: A new markup language definition for chemical substructure representation.
C. H. Schwab, B. Bienfait, J. Gasteiger, T. Kleinoeder, Joerg Marucszyk, O. Sacher, A. Tarkhov, L. Terfloth, C. Yang
Molecular Networks GmbH, Erlangen, Bavaria, Germany; Altamira LLC, Columbus, Ohio, United States

Although chemical subgraphs, or substructures, are popular and have long been used in chemoinformatics, the existing and well-established standards still have some limitations. In general, these standards are suited even for complex substructure queries, but they show some insufficiencies, e.g., for the inclusion of physicochemical properties or the annotation of meta information. In addition, the existing standards are not fully interconvertible and specify no validation techniques to check the semantic correctness of a query definition. This paper proposes an approach for the representation of chemical subgraphs that aims to overcome the limitations of existing standards. The approach presents a well-structured, XML-based standard specification, the Chemical Subgraph Representation Markup Language (CSRML), that supports a flexible annotation mechanism of meta information and properties at each level of a substructure as well as user-defined extensions. Furthermore, the specification foresees a mandatory inclusion and use of test cases. In addition, it can be used as an exchange format.
Presentation (pdf)

THURSDAY MORNING

Section A
Boston Convention & Exhibition Center
156A

General Papers
R. Guha, Organizer, Presiding
8:45 78 Prediction of solvent physical properties using the hierarchical clustering method.
T. M. Martin, D. M. Young
National Risk Management Research Laboratory, Environmental Protection Agency, Cincinnati, OH, United States

Recently a QSAR (Quantitative Structure Activity Relationship) method, the hierarchical clustering method, was developed to estimate acute toxicity values for large, diverse datasets. This methodology has now been applied to estimate solvent physical properties, including surface tension and the normal boiling point. The hierarchical clustering method divides a chemical dataset into a series of clusters containing similar compounds (in terms of their 2D molecular descriptors). Multilinear regression models are fit to each cluster. The toxicity or property is estimated using the prediction value from several different cluster models. The physical properties are estimated using 2D molecular structure only (i.e., without the use of critical constants). The hierarchical clustering methodology was able to achieve excellent predictions for the external prediction sets. A freely available software tool to estimate toxicity and physical properties has been developed. The software tool is based on the open source Chemistry Development Kit (written in Java).
9:10 79 Scaffold diversity analysis using scaffold retrieval curves and an entropy-based measure.
J. L. Medina-Franco, K. Martinez-Mayorga, A. Bender, T. Scior
Torrey Pines Institute for Molecular Studies, Port St. Lucie, FL, United States; Leiden University, Leiden, The Netherlands; Benemerita Universidad Autonoma de Puebla, Puebla, Mexico

Scaffold diversity analysis of compound collections has several applications in medicinal chemistry and drug discovery. Applications include, but are not limited to, library design, compounds acquisition and assessment of structure-activity relationships. The scaffold diversity is commonly measured based on frequency counts. Scaffold retrieval curves are also employed. Further information can be obtained by considering the specific distribution of the molecules in those scaffolds. To this end, we present an entropy-based information metric to assess the scaffold diversity of compound databases [Medina-Franco, J. L. et al. QSAR Comb. Sci. 2009, 28, 1551]. The entropy-based information metric takes into account the frequency distribution of the different scaffolds and is a complementary measure of scaffold diversity enabling a more comprehensive analysis.
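The abstract does not give the formula for the metric, so the following is only a minimal sketch of one way an entropy-based measure over a scaffold frequency distribution can be computed. It uses the standard Shannon entropy; the scaling by the logarithm of the number of distinct scaffolds and the function name are assumptions made for illustration.

```python
from collections import Counter
from math import log2

def scaffold_entropy(scaffold_ids):
    """Return (Shannon entropy, scaled entropy) of a scaffold distribution.

    scaffold_ids holds one scaffold identifier per molecule.
    Assumption: the scaled value divides by log2(number of distinct scaffolds),
    so 1.0 means molecules are spread evenly and 0.0 means one scaffold only.
    """
    if not scaffold_ids:
        raise ValueError("at least one molecule is required")
    counts = Counter(scaffold_ids)
    total = sum(counts.values())
    # Shannon entropy of the relative scaffold frequencies, in bits.
    entropy = sum(-(c / total) * log2(c / total) for c in counts.values())
    n_scaffolds = len(counts)
    scaled = entropy / log2(n_scaffolds) if n_scaffolds > 1 else 0.0
    return entropy, scaled

# Hypothetical collections with the same scaffold count but different spread.
even_entropy, even_scaled = scaffold_entropy(["A", "A", "B", "B"])
skewed_entropy, skewed_scaled = scaffold_entropy(["A", "A", "A", "B"])
```

Both example collections contain two scaffolds, so a frequency count alone would not separate them, while the entropy is lower for the skewed one.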
9:35 80 Nonsubjective clustering scheme for multiconformer databases.
A. B. Yongye, A. Bender, K. Martinez-Mayorga
Torrey Pines Institute for Molecular Studies, Port St. Lucie, FL, United States; Medicinal Chemistry Division and Pharma-IT Platform, Leiden/Amsterdam Center for Drug Research, Leiden University, Leiden, The Netherlands

Representing the 3D structures of ligands in virtual screening via multi-conformer ensembles can be computationally intensive, especially for compounds with a large number of rotatable bonds. While clustering and RMSD filtering methods are employed in existing conformer generators, the novelty of this work is the inclusion of a non-subjective clustering scheme. This algorithm simultaneously optimizes the number and the average spread of the clusters. Using this method, ten times fewer conformers per compound were obtained on average, and the resulting ensembles performed as well as those from OMEGA. Furthermore, we propose thresholds for root-mean-square filtering depending on the number of rotors in a compound: 0.8, 1.0 and 1.4 for structures with low (1-4), medium (5-9) and high (10-15) numbers of rotatable bonds, respectively. The protocol employed is general and can be applied to reduce the number of conformers in multi-conformer compound collections and alleviate the complexity of downstream data processing in virtual screening experiments.
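The proposed thresholds can be written as a small lookup. This is only a sketch of the stated ranges; the function name is hypothetical, the abstract does not give units, and it does not say what to do outside 1-15 rotatable bonds, so this version raises an error in that case.

```python
def rmsd_threshold(n_rotors):
    """Return the filtering threshold proposed in the abstract.

    Ranges and values come from the abstract: 0.8 for 1-4 rotors,
    1.0 for 5-9 and 1.4 for 10-15. Behaviour outside 1-15 is an
    assumption of this sketch (it raises ValueError).
    """
    if 1 <= n_rotors <= 4:
        return 0.8
    if 5 <= n_rotors <= 9:
        return 1.0
    if 10 <= n_rotors <= 15:
        return 1.4
    raise ValueError("no threshold proposed for %d rotatable bonds" % n_rotors)

# Hypothetical usage: thresholds for three compounds of increasing flexibility.
thresholds = [rmsd_threshold(n) for n in (3, 7, 12)]
```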
10:00   Intermission.
10:10 81 Finding drug discovery "rules of thumb" with bump hunting.
T. Hashimoto, M. Segall
Department of Statistics, Harvard University, Cambridge, MA, United States; Optibrium, Cambridge, United Kingdom

Rules-of-thumb for evaluating potential drug molecules, such as Lipinski's Rule of Five, are commonly used because they are easy to understand and translate into practice. These rules have traditionally been constructed by observation or by following simple statistical analysis. However, application of these techniques to QSAR models or early screening data often ignores the underlying statistical structure. Conversely, when machine learning algorithms are used to classify 'drug-like' molecules, they often result in black-box classifiers that cannot be modified to suit a particular target drug profile. We propose a novel hybrid approach to constructing rules-of-thumb from existing data to match a given target product profile for any therapeutic objective. These rules are easily interpretable and can be rapidly modified to reflect expert opinions before application.
10:35 82 Machine learning in discovery research: Polypharmacology predictions as a use case.
N. Wale, K. McConnell, E. M. Gifford
Computational Sciences Center of Emphasis, Pfizer Inc, Groton, CT, United States

In this talk I will lay out the increasing role of machine learning technology in discovery research at Pfizer. Specifically, I will talk about how algorithms and methods inspired by (Machine) Learning Theory are playing an increasing role in in-silico predictive technologies in pharmaceutical research. These methods will be put in the context of other popular methods based on classical statistical approaches, and their overlap and contrasts will be discussed. I will use poly-pharmacology predictions as an important use case to demonstrate the power of large-scale machine learning methods for such applications. In particular, prospective validation of these methods will be emphasized and discussed.
11:00 83 Interpretable correlation descriptors for quantitative structure-activity relationships.
J. D. Hirst
School of Chemistry, University of Nottingham, Nottingham, Nottinghamshire, United Kingdom

Highly predictive Topological Maximum Cross Correlation (TMACC) descriptors for the derivation of quantitative structure-activity relationships (QSARs) are presented, based on the widely used autocorrelation method. They require neither the calculation of three-dimensional conformations, nor an alignment of structures. Open source software for generating the TMACC descriptors is freely available from our website: http://comp.chem.nottingham.ac.uk/download/TMACC. We illustrate the interpretability of the TMACC descriptors, through the analysis of the QSARs of inhibitors of angiotensin converting enzyme (ACE) and dihydrofolate reductase. In the case of the ACE inhibitors, the TMACC interpretation shows features specific to C-domain inhibition, which have not been explicitly identified in previous QSAR studies.

THURSDAY AFTERNOON

Section A
Boston Convention & Exhibition Center
156A

General Papers
X. Wang, Presiding
R. Guha, Organizer
1:30 84 Chemistry in your hand: Using mobile devices to access public chemistry compound data.
A. J. Williams, V. Tkachenko
ChemSpider, Royal Society of Chemistry, Wake Forest, North Carolina, United States

Mobile devices that allow browsing of the internet to access chemistry-related data come in many forms: phones, music players and, increasingly, “tablets” and “pads”. With the permanently online connectivity of these mobile devices, the browser now being the default environment for much of our computer-based interaction, and the increasing availability of rich datasets online, these offerings mesh together to provide chemists with the capabilities to query and search for chemistry in ways that were the stuff of science fiction only a few years ago. Using the ChemSpider platform as a foundation, and with the intention of continuing to enable the community to access chemistry, we have delivered mobile chemistry applications to search across over 20 million compounds sourced from over 300 data sources to retrieve data including properties, spectra and links to patents and publications. This presentation will discuss Mobile ChemSpider and the challenges of delivering such a tool.
1:55 85 Feature analysis of ToxCast™ compounds.
P. Volarath, S. Little, C. Yang, M. Martin, D. Reif, A. Richard
National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, NC, United States; Center for Food Safety and Nutrition, U.S. Food and Drug Administration, Bethesda, MD, United States

ToxCast™ was initiated by the US Environmental Protection Agency (EPA) to prioritize environmental chemicals for toxicity testing. Phase I generated data for 309 unique chemicals, mostly pesticide actives, that span diverse chemical feature/property space, as determined by quantum mechanical, feature-/QSAR-based, and ADME-based descriptors. Results in over 450 high-throughput screening assays were generated for the chemicals. Deriving associations across such a structurally diverse and information-rich dataset is challenging. Approaches to determine relationships between the bioassay data and chemistry-/biology-informed structural features, and methods to meaningfully represent this knowledge are being developed. We initially focus on the Phase I data set. Successful approaches will be applied to the much larger chemical libraries in ToxCast Phase II and Tox21 projects (the latter to screen approximately 10,000 chemicals). These approaches will be used to develop data mining approaches to inform toxicity testing and risk assessment modelling. This abstract does not reflect EPA or FDA policy.
2:20 86 Extracting information from the IUPAC Green Book.
J. G. Frey, M. I. Borkum
School of Chemistry, University of Southampton, Southampton, Hants, United Kingdom

The IUPAC manual of Symbols and Terminology for Physicochemical Quantities and Units (the Green Book) was first published in 1969. One of the fundamental principles of the IUPAC Green Book is the reuse of existing symbols and terminology, in order to enable the accurate exchange of information and data. Accordingly, there is a need for the IUPAC Green Book to be repurposed as a machine-processable resource. This paper reports an experiment where we define a syntax for the subject index of the IUPAC Green Book in the Parsing Expression Grammar (PEG) formalism. We repurpose the resulting Abstract Syntax Tree (AST) as the primary data source for a Ruby on Rails application and Simple Knowledge Organization System (SKOS) concept scheme. We demonstrate a metric that gives prominence to the most significant terms and pages in the subject index, and reflect upon the usefulness and relevance of the information obtained.
2:45 87 Biologics and biosimilars: One and the same?
R. Schenck
Chemical Abstracts Service, Columbus, OH, United States

Biopharmaceuticals (or biologics) and generic follow-on biosimilars currently account for more than 10% of the revenue in the pharmaceutical market. As patent protection for first generation biotherapeutics begins to expire, follow-on biosimilars have begun to appear. This presentation will provide insights on how the CAS databases handle biologics and biosimilars, how these substances are treated differently in patents, and how biosimilars are viewed by different patenting authorities. What the CAS databases reveal about trends in biopharmaceutical research and development will be discussed along with specific examples.
3:10   Intermission.
3:20 88 Intelligent mining of drug information resources.
R. Jain, A. Tamhankar, A. Ausekar, Y. Dixit
Evolvus Group, Pune, India

A fundamental aspect of any research is to understand and keep track of progress made by peer groups in terms of scientific discoveries. Research conferences form a definitive source of this information. Annually, thousands of papers are presented at such conferences for any given disease vertical from therapeutic, biological, pharmacological and clinical perspectives. At first glance, the problem of finding relevant conference proceedings and then organizing the information into a format that is easily analyzed, stored and efficiently retrieved seems difficult and chaotic, as there are no patterns by which a process can be defined; furthermore, conference presentations are highly fragmented and non-standardized. A hybrid approach, in which machine-learning-based text-extraction software is coupled with assisted annotation by expert human editors, comes to the rescue. In the first stage, an in-house machine learning software system classifies the conference proceedings based on keywords, segments them, and converts them into a standardized format. The software then uses a proprietary, heuristic-based learning algorithm to extract relevant data from the segments. Since it is well known that no automated approach can be 100% accurate, in the second step the software is assisted by a team of expert human editors who analyze the extracted and segmented data and perform any necessary corrections. In the third step, the software pushes each segment to a team of expert human editors who analyze the segment, extract information relevant to the area of research, and store the information in our internal databases.
3:45 89 Cheminformatics semantic grid for neglected diseases.
P. J. Kowalczyk
Department of Computational Chemistry, SCYNEXIS, Durham, NC, United States

We present a summary of our progress towards establishing a cheminformatics semantic grid for neglected diseases. Our efforts are based on using public data and open-source programs to generate both descriptive and predictive models, which are themselves made publicly available. There are three modes of model access: as web services, via web portals, and as downloads. Models are saved in Predictive Model Markup Language (PMML) format. Information stored for each model includes the training set, test set, descriptors and model tuning parameters. This information is provided so that researchers may determine a model's domain, and its applicability to their data. Examples will be presented for two data sets retrieved from PubChem: enzyme inhibition of dihydroorotate dehydrogenase (AID:1175), and a cytochrome panel assay with activity outcomes (AID:1851).
4:10 90 Extraction and integration of chemical information from documents.
H. O. Villar, J. Betancort, M. R. Hansen
Altoris, Inc., La Jolla, California, United States

Effective chemical research requires that all sources of information be incorporated in the decision making. Here we introduce a tool that saves time when building chemical databases from web information or the chemical literature, including patents. We discuss some of the challenges faced in automating the identification and extraction of chemicals named in patents, and their conversion into chemical databases that can be mined effectively. The integration of external sources of data can be valuable for research informatics. To that end we have integrated the conversion of IUPAC names with chemical optical character recognition. We show examples where such integration can provide useful competitive information.
4:35 91 SAR and the role of active-site waters in blood coagulating serine proteases: A thermodynamic analysis of ligand-protein binding.
N. K. Salam, W. Sherman, R. Abel
Schrodinger, Inc., San Diego, CA, United States; Schrodinger, Inc., New York, New York, United States

The prevention of blood coagulation is important in treating thromboembolic disorders. Several serine proteases involved in the coagulation cascade are classified as pharmaceutically relevant and are the focus of structure-based drug design campaigns. Here, we investigate the serine proteases thrombin and factors VIIa, Xa, and XIa, using a computational method called WaterMap that describes the thermodynamic properties of the water solvating the active site. We show that the displacement of key waters from specific subpockets (e.g. S1, S2, S3 and S4) of the active site by the ligand is a dominant term governing potency, providing insights into SAR cliffs observed in several compound series. Furthermore, we describe how WaterMap scoring can be supplemented with terms from an MM-GBSA calculation to improve the overall predictive capabilities.

 

Division Meetings #240

ACS National Meeting #240 (Boston, MA) - Committee Meetings

Sat, Aug 21
7:30 - 9:30 AM Long Range Planning Breakfast
BCEC 156 A
Agenda (doc)
9:00 - 10:00 AM Awards Committee
BCEC 156 B
 
9:00 - 10:30 AM Membership Committee
BCEC 155
 
9:00 AM - 12:00 PM Communications & Publications Committee
BCEC 156 C
Agenda (doc)
9:00 AM - 12:00 PM Education Committee
BCEC 157 A
 
9:00 AM - 12:00 PM Program Committee
BCEC 156 A
 
10:00 - 11:00 AM Fundraising Committee
BCEC 156 B
 
10:30 AM - 12:00 PM Careers Committee
BCEC 155
 
11:00 AM - 12:00 PM Finance Committee
BCEC 156 B
 
12:00 - 1:00 PM CINF Functionaries Luncheon
BCEC 156 A
 
1:00 - 5:30 PM CINF Executive Committee (closed meeting)
BCEC 156 A
Agenda (doc)
Sun, Aug 22
12:00 - 2:00 PM CINF-CSA Trust Meeting (closed meeting)
BCEC 153 A
 
Wed, Aug 25
8:00 AM - 12:00 PM ACS Council Meeting
Sheraton Boston Hotel Grand Ballroom
Agenda (doc)
12:00 - 5:00 PM CINF-CIC Collaborative Working Group (closed meeting)
BCEC 104 A
 
  • The CINF Executive Committee is a closed meeting; if you wish to attend, please contact the CINF chair.
  • CINF members are encouraged to attend any of the other committee meetings and all of our social functions.  If you would like information about any of these committees, please contact the committee chair or the CINF chair.
     

Social Events #240

ACS National Meeting #240 (Boston, MA) - Social Events

Joint CINF/COMP Welcoming Reception & Scholarship for Scientific Excellence Posters
Sun, Aug 22, 6:30PM - 8:30PM
Westin Boston Waterfront, Harbor Ballroom I
Sponsored by: ACS Publications (reception), FIZ CHEMIE Berlin (poster session)
Celebrating the 50th Anniversary of JCIM

Harry's Party
Mon, Aug 23, 5:30PM - 8:00PM
Westin Boston Waterfront, Presidential Suite
Sponsored by: FIZ CHEMIE Berlin

CINF Luncheon
Tue, Aug 24, 12:00PM - 1:30PM
BCEC, 162 A
Sponsored by: Bio-Rad Laboratories, CambridgeSoft, Thieme
Ticketed event (SE-22; $15) Speaker: Mike Capuzzo, author of “The Murder Room”

Herman Skolnik Award Reception
Tue, Aug 24, 6:30PM - 8:30PM
Seaport Hotel, Plaza Ballroom C
Sponsored by: Elsevier/Reaxys®, Procter & Gamble, InfoChem, RSC Publishing

Committee & Council Reports

Report on the Council Agenda for August 25, 2010

The Council of the American Chemical Society will meet in Boston, MA on Wednesday, August 25, 2010 from 8:00am until approximately 12:00pm in the Grand Ballroom of the Sheraton Boston Hotel. All ACS members are welcome to attend, although only Councilors are permitted to vote. A continental breakfast is usually available at 7:00am for all attendees.

The few items for Council Action are summarized below. There are no contentious petitions for bylaw changes. There is a special discussion item on the possibility of moving Council meetings to Tuesday, and this is potentially a hot issue.

Nominations and Elections

Council Policy Committee: Council will vote to fill four slots on the Council Policy Committee. There are eight nominees as follows: John E. Adams, Lawrence Barton, Alan B. Cooper, Alan M. Ehrlich, Mary Virginia Orna, Sally B. Peters, Dorothy J. Phillips, and Donivan R. Porterfield.

Committee on Committees: Council will vote to fill five slots on the Committee on Committees. There are ten nominees as follows: Janet L. Bryant, H. N. Cheng, Alan W. Elzerman, Amber S. Hinkle, Roland F. Hirsch, Ann H. Hunt, V. Michel Mautino, Roger A. Parker, Yorke E. Rhodes, and Steven W. Yates.

Committee on Nominations and Elections: Council will vote to fill five slots on the Committee on Nominations and Elections. There are ten nominees as follows: Jeannette E. Brown, Martha L. Casey, D. Richard Cobb, Lissa Dulany, John W. Finley, Martin L. Gorbaty, Melanie J. Lesko, David J. Lohse, Herbert B. Silber, and Angela K. Wilson.

Committee on Committees

The Committee on Committees will recommend the continuation or dissolution of the committees for which they have completed their review. The committees were not identified in the Council Agenda book.

Special Discussion

In August 2009, Council-related committee chairs were asked to consider how to improve the efficiency and effectiveness of their committee’s work. These considerations include holding virtual meetings, shorter meetings, etc. Linked to this is the proposal to reschedule Council meetings from Wednesday to Tuesday, as many Councilors are finding it difficult to be away from the workplace for a long period of time. When a group of Councilors, committee members, etc. were surveyed, there were 546 responses, of which 52.4% favored such a move. However, it is a complicated issue, as there would be a domino effect and other meetings would have to be changed as well. This discussion is meant to give everyone who wants to speak for or against the proposal a chance to do so. If the change ultimately is made, it would not likely take place until the spring of 2012.

Bylaw Changes for Action

There is one Bylaw change on the agenda for Council Action:

Petition on Recorded Votes

The objective of this petition is to allow Council votes to be recorded by audience response devices (clickers) and not be limited to written votes. It eliminates the time required for a manual counting of hand-written votes. The information stored in the voting system would be subsequently retrieved and printed, showing the recorded vote of each Councilor as prescribed by the Bylaws. According to the Committee on Budget and Finance, the financial implications of this petition are minimal ($0 - $100K).

Local Section Activities Committee

The Local Section Activities Committee (LSAC) has two petitions for a change in section territory that must be approved by Council. First, the Northeast Oklahoma Local Section requests approval to include the territory of the North Central Oklahoma Local Section, which will automatically dissolve on 12/31/2010 as it has fallen below the minimum membership requirements. If included in the Northeast Oklahoma Section, the members of the dissolving section will remain members of a local section. Members of both sections have given their approval.

Second, the Binghamton Local Section requests approval to change its territory to include the Norwich Local Section, which will automatically dissolve on 12/31/2010 as it has fallen below the minimum membership requirements. If included in the Binghamton Section, the members of the dissolving section will remain members of a local section. Members of both sections have given their approval.

Town Hall Meeting

A Town Hall meeting organized by the Committee on Nominations and Elections is scheduled for Sunday, August 22, 2010 in the Sheraton Boston Hotel, Back Bay Ballroom C, from 4:45pm - 5:45pm. All ACS members are encouraged to attend. It is a great way to gather first-hand information and decide for whom you might want to vote in the fall election.

Respectfully submitted August 3, 2010

CINF Councilors

Bonnie Lawlor
Andrea Twiss-Brooks

 

CINF Awards

Scientific Excellence

 

CINF Scholarship for Scientific Excellence Sponsored by Accelrys®

The scholarship program of the Division of Chemical Information (CINF) of the American Chemical Society (ACS) funded by Accelrys is designed to reward graduate and postdoctoral students in chemical information and related sciences for scientific excellence and to foster their involvement in CINF.

Up to two scholarships valued at $1,000 each will be presented at the 241st ACS National Meeting in Anaheim, CA, March 27 – 31, 2011. Applicants must be enrolled at a certified college or university, and they will present a poster during the Welcoming Reception of the division on Sunday evening at the National Meeting. Additionally, they will have the option to show their poster at the Sci-Mix session on Monday night. Abstracts for the poster must be submitted electronically through PACS, the new abstract submission system of ACS.

To apply, please inform the Chair of the selection committee, Guenter Grethe at ggrethe@comcast.net, that you are applying for a scholarship. Submit your abstract at http://abstracts.acs.org using your ACS ID. If you do not have an ACS ID, follow the registration instructions and submit your abstract for “CINF Scholarship for Scientific Excellence”. PACS will be open for abstract submissions on August 3, 2010, and close on October 18, 2010. Additionally, please send a 2,000-word abstract describing the work to be presented in electronic form to the Chair of the selection committee by January 31, 2011. Any questions related to applying for one of the scholarships should be directed to the same e-mail address.

Winners will be chosen based on contents, presentation and relevance of the poster and they will be announced during the reception. The contents shall reflect upon the student’s work and describe research in the field of cheminformatics and related sciences.

Winning posters will be marked “Winner of Accelrys-CINF Scholarship for Scientific Excellence” at the poster session.

Guenter Grethe
ggrethe@comcast.net

Herman Skolnik Award

2010 Herman Skolnik Award Recipient Announced

At the ACS Fall National Meeting there will be a Skolnik Award Symposium entitled "The Marriage, or at Least Dating, of Molecular Simulation and Modeling with QSAR Analysis: Exploring Chemometric Methods," organized by Tony Hopfinger and Emilio Esposito. It will be held on Tuesday, August 24th, and abstracts of the papers to be presented are listed under “Abstracts” in this issue.

Anton (Tony) J. Hopfinger, Distinguished Research Professor of Pharmacy, University of New Mexico, Professor Emeritus of Medicinal Chemistry and Pharmacognosy, University of Illinois, and co-Founder and Chief Science Officer of The Chem21 Group, Inc. is the recipient of the 2010 Herman Skolnik Award presented by the ACS Division of Chemical Information (CINF). The award recognizes outstanding contributions to and achievements in the theory and practice of chemical information science and related disciplines. The prize consists of a $3,000 honorarium and a plaque. Tony Hopfinger is recognized as a pioneer and major contributor in the fields of quantitative structure activity relationship (QSAR) and quantitative structure property relationship (QSPR) techniques employing three and higher dimensional levels of information derived from modeling and simulation. Tony has addressed chemical information and modeling problems in the pharmaceutical, polymer and materials sciences, in both industry and academia, and he is generally acknowledged as having fathered the development of QSPR modeling in polymer and materials science, including coining the acronym QSPR. The breadth of his interests and the applicability of the techniques he has developed are reflected in the topics covered in some of his recent papers, including drug discovery, ADME-Tox property prediction, nanotoxicity, cheminformatic descriptors and molecular similarity analysis.

Tony has made many contributions to the field of cheminformatics through publication, teaching, mentoring, advising and organizing. He has authored or co-authored more than 270 peer-reviewed (and highly cited) papers and delivered almost 360 invited lectures. He has served on many journal editorial boards and has been an associate editor of the Journal of Chemical Information and Modeling (previously Journal of Chemical Information and Computer Sciences) for the past 16 years. He has been a member of government and industrial advisory boards, and he chaired a Gordon Research Conference on Quantitative Structure-Activity Relationships in Biology. He has coordinated and taught at short courses in North and South America and Europe; more than 50 computational scientists earned their Ph.D. degrees under Tony's mentoring; and he has also provided advanced training to more than 70 postdoctoral students.

Tony Hopfinger received a B.S. in Math and Physics from the University of Wisconsin in 1966, and a Ph.D. in Biophysical Chemistry from Case Western Reserve University in 1969. He started his career in 1969 as an NIH Postdoctoral Fellow, Department of Biological Chemistry, Harvard Medical School, and from there moved to Case Western Reserve University in 1970 as Assistant Professor of Macromolecular Science. He held increasingly senior positions at Case Western, eventually becoming Professor of Macromolecular Science in 1978 and Director, Research Computing Laboratory in 1979. In 1981 he moved from academia to industry, joining G.D. Searle (now part of Pfizer) as Director, Department of Drug Design, and later Director, Department of Medicinal Chemistry. Tony maintained links with academia, holding several adjunct and visiting professorships, and in his spare time founded, or co-founded, a number of software and pharmaceutical companies including Intersoft, ChemLab, Receptor Laboratories and DNACodes. He returned to academia in 1985 and was Professor of Bioengineering, Chemistry and Medicinal Chemistry, University of Illinois at Chicago until 2005. Since then he has divided his time as Distinguished Research Professor of Pharmacy, University of New Mexico, Chief Science Officer of The Chem21 Group, Inc. and Professor Emeritus of Medicinal Chemistry and Pharmacognosy, University of Illinois.

Tony Hopfinger is highly respected by all of his colleagues worldwide and this Award is a well-deserved recognition of the outstanding career of an unstinting and generous pioneer and practitioner of cheminformatics.

Phil McHale,
Chair, CINF Awards Committee
pmchale@cambridgesoft.com

Lucille Wert Scholarship

Call for Applications: 2011 Lucille M. Wert Scholarship

Deadline: February 1, 2011

Designed to help persons with an interest in the fields of Chemistry and Information to pursue graduate study in Library, Information, or Computer Science, the Scholarship consists of a $1,500 honorarium. This scholarship is given yearly by the Division of Chemical Information of the American Chemical Society. The applicant must have a bachelor’s degree with a major in Chemistry or related disciplines (related disciplines are, for example, Biochemistry or Chemical Informatics).

The applicant must have been accepted into (or be currently enrolled in) a graduate Library, Information, or Computer Science program in an accredited institution. Work experience in Library, Information or Computer Science is preferred.
The deadline to apply for the 2011 Lucille M. Wert Scholarship is February 1, 2011. Details on the application procedures can be found at: http://www.acscinf.org and once there click on “Awards” and then click on “Lucille M. Wert Student Scholarship”.

Applications (e-mail preferred) can be sent to: margaret.matthews@thomsonreuters.com


Contact:
Marge Matthews

CINF Awards Committee
633 Dayton Rd.
Bryn Mawr, PA 19010-3801
Phone: 215-823-3922

Other Awards

Other Chemical Information Awards

CAS/STI Student Travel Award
(sponsored by Chemical Abstracts Service and American Society for Information Science)
     
Chemical Structure Association Trust Grant
(sponsored by Chemical Structure Association Trust)
     
Patterson-Crane Award
(sponsored by Columbus and Dayton, Ohio, Sections of the American Chemical Society)
     
SLA DCHE Marion E. Sparks Award for Professional Development
(sponsored by Special Libraries Association, Chemistry Division)

Other Awards

ACS ChemLuminary Awards
(sponsored by American Chemical Society)
     
 

CSA Trust

Applications Invited for CSA Trust Jacques-Émile Dubois Grants for 2011

The Chemical Structure Association (CSA) Trust is an internationally recognized organization established to promote the critical importance of chemical information to advances in chemical research. In support of its charter, the Trust has created a unique Grant Program, renamed in honor of Professor Jacques-Émile Dubois who made significant contributions to the field of cheminformatics. The Trust is currently inviting the submission of grant applications for 2011.

Purpose of the Grants:

The Grant Program has been created to provide funding for the career development of young researchers who have demonstrated excellence in their education, research or development activities that are related to the systems and methods used to store, process and retrieve information about chemical structures, reactions and compounds. A Grant will be awarded annually up to a maximum of four thousand U.S. dollars ($4,000). Grants are awarded for specific purposes, and within one year each grantee is required to submit a brief written report detailing how the grant funds were allocated. Grantees are also requested to recognize the support of the Trust in any paper or presentation that is given as a result of that support.

Who is Eligible?

Applicant(s), age 35 or younger, who have demonstrated excellence in their chemical information related research and who are developing careers that have the potential to have a positive impact on the utility of chemical information relevant to chemical structures, reactions and compounds, are invited to submit applications. While the primary focus of the Grant Program is the career development of young researchers, additional bursaries may be made available at the discretion of the Trust. All requests must follow the application procedures noted below and will be weighed against the same criteria.

What Activities are Eligible?

Grants may be awarded to acquire the experience and education necessary to support research activities; e.g., for travel to collaborate with research groups, to attend a conference relevant to one’s area of research, to gain access to special computational facilities, or to acquire unique research techniques in support of one’s research.

Application Requirements:

Applications must include the following documentation:

  1. A letter that details the work upon which the Grant application is to be evaluated as well as details on research recently completed by the applicant;
  2. The amount of Grant funds being requested and the details regarding the purpose for which the Grant will be used (e.g. cost of equipment, travel expenses if the request is for financial support of meeting attendance, etc.). The relevance of the above-stated purpose to the Trust’s objectives and the clarity of this statement are essential in the evaluation of the application;
  3. A brief biographical sketch, including a statement of academic qualifications;
  4. Two reference letters in support of the application.

Additional materials may be supplied at the discretion of the applicant only if relevant to the application and if such materials provide information not already included in items 1-4. Three copies of the complete application document must be supplied for distribution to the Grants Committee.

Deadline for Applications:

Applications must be received no later than March 14, 2011. Successful applicants will be notified no later than May 2, 2011.

Address for Submission of Applications:

Two copies of the application documentation should be forwarded to:

Bonnie Lawlor,
CSA Trust Grant Committee Chair
276 Upper Gulph Road
Radnor, PA 19087, USA

If you wish to submit your application by e-mail, please contact Bonnie Lawlor at blawlor@nfais.org prior to submission so that she can contact you if the e-mail does not arrive.

Bonnie Lawlor, Chair, CSA Trust Grant Committee

Book Reviews

Patents for Chemicals, Pharmaceuticals, and Biotechnology  


Grubb, Philip W., Thomsen, Peter R. Patents for Chemicals, Pharmaceuticals, and Biotechnology; Oxford University Press: New York, 2010, $190.00 (Hardcover).  592 pp. ISBN:  978-0-19-957523-7.
 
This exhaustive monograph covers the whole of the field of patents on an international scale.  History, patent law and procedure, patentability, patenting in practice (including drafting), and commercial exploitation are covered in depth in 25 chapters.  However, it is directed at patent agents and other practitioners in patent law including portfolio management.  Significantly, the chapter in previous editions on patents and information (chapter 20 in the 4th edition) has been eliminated.  For a treatment of the field of patent information, including a good primer on patents in general, the interested reader is referred to, among others, “Information Sources in Patents” (1) by Stephen Adams.

Robert E. Buntrock
Buntrock Associates
16 Willow Drive
Orono, ME  04473
207-866-7930
buntrock16@myfairpoint.net
 
1.  Adams, S. R.  Information Sources in Patents, 2nd ed.; K. G. Saur, Munich, 2006.

 

Bibliometrics and Citation Analysis: From the Science Citation Index to Cybermetrics   


De Bellis, Nicola.  Bibliometrics and Citation Analysis: From the Science Citation Index to Cybermetrics; Scarecrow Press: Lanham, MD, 2009, $55.00  (Paperback).  394 pp.  ISBN:  978-0-8108-6713-0.

This rather massive monograph is the outgrowth of a research project on a related topic and is an English translation, encouraged by Eugene Garfield, of the Italian original.  The history and philosophy behind citation indexing and other bibliometric measures are documented in chapters one and three.  The empirical basis, the literary antecedents, and comparisons with concept indexing and other full text retrieval are described in chapter two, including some discussion of Salton’s work.  The work of the giants in these portions of the information industry, Bernal, Merton, Price, Garfield, and Small, is documented in detail.  The mathematics of bibliometrics are described in chapter four including skewness, Lotka’s Law, Bradford’s Law, Zipf’s Law, and the work of Mandelbrot.  Chapter five is titled “Maps and Paradigms” and discusses the involvement of bibliographic citation with the history and sociology of science using co-citation analysis and other methods.  Chapter six, titled “Impact Factor and the Evaluation of Scientists: Bibliographic Citation at the Service of Science Policy and Management”, probably has the most relevance to the most scientists.  The various metrics are discussed including the Hirsch (h) index.  Chapter seven, with the intriguing title, “On the Shoulders of Dwarfs: Citation as a Rhetorical Device …”, describes reasons for citation, professed and actual.  Chapter eight evolves the discussion into cybermetrics including the involvement of citation or linking in the performance of search engines.

Errors and omissions do occur.  Reference 5 for the introduction is missing.  Recall and relevance are only discussed briefly in the context of “improvement” of results from concept indexing (manual) and retrieval by means of Salton’s geometric machine indexing and other full text indexing methods.  No mention is made in chapter seven (or apparently elsewhere) of the relevance of citation retrieval, since it should be commonly known among searchers that authors don’t always cite other references for the same reasons that the searcher is interested in.  Due to the multitude of topics and concepts that can appear in a single article, many of us searchers can cite instances where a citation was made for a non-relevant concept.  Curiously, the discussion of citation searching in patents is the last section of chapter six and has no discussion of the validity of the bibliometric value of citations in patents.  The work of Narin is described and referenced, but that of critics of the method, including Edlyn Simmons, Stu Kaback, and Nancy Lambert, is missing.  The existence of other uses of citation indexing and searching, e.g. in the CA file on versions of STN, is not mentioned.

In the Conclusions, the author provides an either/or summation of the evaluation controversy.  Either you believe that citations are “Mertonian” or you don’t.  If you do, researchers, organizations, and journals can be evaluated.  If you don’t, none of the evaluations can be made and the Citation Index itself may not be of value.  This reviewer instead takes an intermediate attitude.  Bibliometric evaluations can be a valuable supplement in a larger, more personal evaluation scheme.  As for searching, use of citation indexes is a valuable supplement to other methods of searching (index, full text, etc.) and all methods should be used, none exclusively.  This book is of interest to those interested in or researching the fields of information science or history of science.  Chapter six should be made available to the management of academic and other organizations that use citation analysis for personnel evaluation.

Robert E. Buntrock
Buntrock Associates
16 Willow Drive
Orono, ME  04473
207-866-7930
buntrock16@myfairpoint.net

 

Contact Us

Executive Committee

Chair Dr. Gregory M. Banik, 2011
Bio-Rad Laboratories, Inc., Informatics Division
Two Penn Center Plaza, Suite 800, 1500 John F. Kennedy Blvd.
Philadelphia, PA  19102
267-322-6952 (voice)
267-322-6953 (fax)

Chair-Elect Dr. Rajarshi Guha, 2011
NIH Chemical Genomics Center,
9800 Medical Center Drive,
Rockville, MD  20852
814-404-5449 (voice)
812-856-3825 (fax)

Past-Chair Ms Carmen Nitsche, 2011
Symyx Technologies, Inc.,
254 Rockhill Drive,
San Antonio, TX  78209
210-820-3459 (voice)
210-820-3459 (fax)
510-589-3555 (cell)

Secretary Ms Leah R. Solla, 2010-2011
Cornell University, Clark Library
283 Clark Hall,
Ithaca, NY  14853-2501
607-255-1361 (voice)
607-255-5288 (fax)
607-229-0287 (cell)

Treasurer Position open

Councilor Ms Bonnie Lawlor, 2010-2012
National Federation of Advanced Information Services (NFAIS),
276 Upper Gulph Road,
Radnor, PA  19087-2400
215-893-1561 (voice)
215-893-1564 (fax)

Councilor Miss Andrea Twiss-Brooks, 2009-2011
University of Chicago,
4824 S. Dorchester Avenue, Apt. 2,
Chicago, IL  60615-2034
773-702-8777 (voice)
773-702-3317 (fax)

Alternate Councilor Dr Guenter Grethe, 2010-2012
352 Channing Way,
Alameda, CA  94502-7409
510-865-5152 (voice)
510-865-5152 (fax)
510-333-7526 (cell)

Alternate Councilor Mr Charles F. Huber, 2009-2011
University of California, Santa Barbara, Davidson Library,
Santa Barbara, CA  93106-9010
805-893-2762 (voice)
805-893-8620 (fax)

Program Chair Rachelle Bienstock, 2011-2012

Membership Chair Ms Jan Carver, 2009-2011
University of Kentucky, Chemistry Physics Library
150 Chem Phys Bldg,
Lexington, KY  40506-0001
859-257-4074 (voice)
859-323-4988 (fax)

 

Committee Chairs

Audit Jody Kempf
2009-2011

612-624-9399 (voice)
612-625-5583 (fax)

University of Minnesota, Science and Engineering Library
108 Walter Library, 117 Pleasant St. SE
Minneapolis, MN 55455
Awards Dr. Phil J. McHale
2009-2011

650-235-6169 (voice)
650-362-2104 (fax)

Cambridgesoft Corporation,
375 Hedge Road,
Menlo Park, CA 94025-1713
Careers Ms. Patricia Meindl
2009-2011

416-978-3587 (voice)
416-946-8059 (fax)

University of Toronto, A. D. Allen Chemistry Library
80 St George Street, Rm 480,
Toronto, ON M5S 3H6
Constitution, Bylaws, and Procedures Ms. Susanne Redalje
2007-

206-543-2070 (voice)

University of Washington, Chemistry Library
BOX 351700,
Seattle, WA 98195
Membership Ms. Jan Carver
2009-2011

859-257-4074 (voice)
859-323-4988 (fax)

University of Kentucky, Chemistry Physics Library
150 Chem Phys Bldg,
Lexington, KY 40506-0001
Program Rachelle Bienstock
2011-2012

 

 
Publications Dr. William G. Town
2009-2011

+44 20 8699 9764 (voice)

Kilmorie Consulting,
24A Elsinore Rd.,
London, SE23 2SL

 

Liaison

Divisional Representatives and Liaisons

SLA DCHE Ms. Susan K. Cardinal
2006-

585-275-9007 (voice)
585-273-4656 (fax)

University of Rochester, Carlson Library
Box 270236,
Rochester, NY 14627
ACS Multidisciplinary Program Planning Group Dr. Guenter Grethe
2007-

510-865-5152 (voice)
510-865-5152 (fax)

352 Channing Way,
Alameda, CA 94502-7409
Biotechnology Secretariat Dr. Guenter Grethe
2002-

510-865-5152 (voice)
510-865-5152 (fax)

352 Channing Way,
Alameda, CA 94502-7409
ASIS&T STI Ms. Erja Kajosalo
2006-

617-253-9795 (voice)
617-253-6365 (fax)

Massachusetts Institute of Technology, MIT Libraries 14S-134
77 Massachusetts Ave.,
Cambridge, MA 02139-4307
ACS Committee on Nomenclature, Terminology, and Symbols Dr. Peter F. Rusch
2006-

650-961-8120 (voice)
650-961-8120 (fax)

Rusch Consulting Group,
162 Holland Court,
Mountain View, CA 94040-3864
ACRL STS Mitchell C. Brown
2009-

949-824-9732 (voice)
949-824-3114 (fax)

University of California at Irvine,
Irvine, CA 92697-8200

 

Others

Other Functionaries

Archivist /
Historian

Ms. Bonnie Lawlor
2006 -

National Federation of Advanced Information Services (NFAIS),
276 Upper Gulph Road,
Radnor, PA 19087-2400

215-893-1561 (voice)
215-893-1564 (fax)
Webmaster Ms. Danielle Dennie
2011 - 2013
Concordia University, Vanier Library Building
7141 Sherbrooke St. W.,
Montréal (QC), H4B 1R6
514.848.2424 x 5237 (voice)

 
