CINF Technical Program: Rajarshi Guha

CINF Technical Program Reflections: An Interview With Rajarshi Guha

Dr. Rajarshi Guha is currently a research scientist at the NIH Chemical Genomics Center in Rockville, MD and is an Adjunct Professor of Informatics at Indiana University. He works on cheminformatics and bioinformatics topics related to high throughput screening for small molecules and RNAi. He also has active interests in novel data exchange and data analysis paradigms ranging from the use of Google services to large-scale parallel processing for chemical data.

Dr. Guha has been a member of the ACS and CINF since 2003 and gave outstanding service as CINF Division Program Chair in 2009-2010.

Svetlana Korolev: Rajarshi, the Fall 2010 ACS National Meeting was your last as CINF Program Chair. Could you share highlights of the CINF technical program in Boston?

Rajarshi Guha: The Fall 2010 meeting had a pretty extensive program, covering topics ranging from big data to the applications of RDF and the semantic web in chemistry. At this meeting we also co-hosted the JCIM 50th anniversary symposium with COMP. Sunday started off with the first session of the RDF symposium, highlighting tools and methods for handling this type of data, alongside a session on the assessment of collections and information resources. In the afternoon, we had the Best Presentation Award symposium, which this time was the Data Intensive Drug Discovery symposium run by Dr. John Van Drie. This was a great symposium, with a variety of very stimulating talks, all addressing how the data deluge is affecting the drug discovery workflow. Monday saw the JCIM symposium, with talks from luminaries of cheminformatics describing their research over the years. In parallel, the RDF symposium continued with various applications of RDF in chemistry. The afternoon session saw a symposium on consumer health information and the impact of social networking services. Monday also saw the first version of CINFlash, the lightning talks session. As an experiment it ran pretty well, and we got a lot of excellent feedback on improving it in the future. On Tuesday, we had the Herman Skolnik Award symposium, organized by Tony Hopfinger and Emilio Esposito, who put together a great set of talks. Wednesday saw two excellent symposia: one on structure activity landscapes and one on chemical structure representations. My only complaint was that I couldn’t attend both simultaneously. The meeting closed on Thursday with a good selection of General Papers, covering modeling, prediction, and the characterization and integration of chemical information.

We also had a number of novel technological features at this meeting. It turned out that a speaker with three talks on the program was unable to travel to the USA. Rather than leave three holes in the program, we decided to connect to the speaker via Skype. I advanced his slides from the room, and we got crystal clear audio and video of the remote speaker (thanks to great wireless at the convention center). These remote talks went very well, and while I wouldn’t want future speakers to get ideas, Skype is a very nice backup for emergencies.

This was also the first meeting at which I got involved with Twitter. We had three or four people at the CINF symposia tweeting from the sessions they were attending. While I’ve only recently joined the Twitter bandwagon, I was pleasantly surprised to see how useful it was for keeping track of parallel sessions. As part of the fun, I also put together a real-time aggregator that summarized all the messages emanating from the ACS meeting (http://rguha.net/atv/atv.html). As you can see, CINF was pretty prolific.

SK: Would you assess the last meeting as the most successful program during your tenure?

RG: The last meeting was quite successful: we had a good set of symposia, including a new experimental symposium, and I’m happy that I was able to end on a high note. But I think I’d consider the Spring 2010 meeting my most successful one. While the Program Chair can’t really control how many papers get submitted, I was very pleased to see 140 papers submitted for that meeting (the highest since 2004, I think). Obviously, the venue being San Francisco helped! But beyond that, we had an excellent selection of symposia, including a great one on materials informatics. That one was a gratifying example of recognizing a subfield through submissions to General Papers and turning it into a fully fledged symposium, which, judging from the varied feedback I’ve had, will be run in various incarnations in the future. I was also very pleased to see the visualization symposium run, the topic being close to my heart. I think one of the most memorable features was that these two symposia were run by newcomers to the Division, and they did a fantastic job. From what I understand, they will continue to contribute to CINF programming in various ways in the future. The Spring 2010 meeting was also pretty hectic, since we had 21 sessions running, resulting in triple tracking. Another nice aspect was the large size of General Papers: 24 papers. Over the last two years I’ve seen a general upward trend in the size of General Papers, which is encouraging, as it indicates increasing interest in the Division and also serves as a source of future programming topics.

SK: Do you think that you have gained maturity in handling the Division’s technical program over the past two years?

RG: I think one of the main things that happened over these two years is that I developed a higher-level view of CINF programming in terms of the topics that are relevant to the Division. This has been helpful, especially with the thematic programming initiatives from the ACS, as it lets us match our programming to proposed themes (as far as possible). In addition, while topics such as “Chemical Structure Representation” may be regarded as old hat, having an overview of CINF programming over the years allows us to rerun these topics while addressing the latest issues. Getting feedback from members about whether certain topics have been addressed, or whether a symposium didn’t work too well, has been useful for future programming. Obviously, this type of feedback takes some time to work its way into the program, so its effects have shown up mainly toward the end of my tenure. Finally, having a high-level view of programming is very useful for exploring new topics and areas that may not have been traditionally considered in the Division but are becoming more and more relevant in our field. Of course, having two years of experience certainly helps when organizers need to make last-minute changes or PACS is not being co-operative: no panic attacks!

SK: Have your expectations of the Program Committee Chair position matched the experience? Were there any unforeseen aspects? Would you agree that the Program Committee Chair is the most prestigious and challenging position in the CINF Division? What have you learned from this experience?

RG: I’m very thankful that Leah Solla provided a brain-dump when I started! She was very thorough in bringing me up to speed. And given that she was always very responsive to my questions, I must say that I didn’t hit too many unexpected things. Probably the most unexpected thing was the amount of socializing I had to do to solicit program topics and organizers. Not being a very social person, I found this a little tricky at first, but over time it became easier. I think the effort paid off, as I was able to get a number of people contributing to CINF programming.

Probably someone other than the ex-Program Chair should comment on the prestige of the position; I’m biased! The role is certainly one of the central roles in the Division. At the same time, without the help of the Fundraising Chair and the Program Committee, I would not have been able to put together high-quality programs. But I will admit that it is one of the more challenging, and certainly one of the most visible, positions. In the end, members attend national meetings, in large part, for the technical programming. While networking is a vital part of any meeting, I think having relevant and interesting symposia provides “hubs” around which people congregate. Given the diversity of the membership, it is certainly challenging to develop a balanced program that addresses a broad variety of interests. Also, being responsible for scheduling sessions makes it challenging to keep people happy: nobody likes to be scheduled on Thursday!

I’ve learned a number of things from this experience, time management probably being a major one. Since the program is developed on nights and weekends, I’ve had to be quite efficient to keep everything on track. I’ve also gained a much broader view of the field of chemical information. Prior to taking on this role, I was primarily focused on cheminformatics and computational aspects of the field. Over the last two years I’ve gained a deeper appreciation for the information science side of things. Coupled with my involvement in Open Source, Data & Access, I think this has given me a much better idea of the interrelated issues that are currently of interest in the field.

SK: Please talk about initiatives that you have implemented during your tenure, e.g., a call for new speakers, CINFlash. What were the driving forces for these innovations? How have they worked out?

RG: When taking over as Program Chair, one of my concerns was how to expand the range of programming topics and get more people involved with CINF. While we have a pretty large collection of topics that we can run from time to time, as well as a Program Committee with a diverse range of expertise, I thought that crowd-sourcing topics would be useful. With this in mind, I put out a “Call for Symposia” on various mailing lists to solicit symposium topics (and organizers). That first call went out in April 2009, and we got a decent response: 4 proposals. Because we prepare programs in advance, we could not fit in all the proposals we got. However, we did manage to incorporate two of the proposed symposia into subsequent programs. More importantly, a number of people were encouraged to contribute to CINF programming; though they could not organize a symposium in 2009, they have shown interest in working on later meetings, and I believe they will be organizing in 2011 and 2012.

My other effort was CINFlash, a lightning talk symposium that we ran for the first time in Fall 2010. The idea of short (6 to 8 minute) talks had been floated in the past by Dave Martinsen, and though I had heard that it might be difficult to run one during an ACS meeting, we decided to try. I had seen videos of Ignite talks (5 minutes, 20 automatically advancing slides), and they seemed like a lot of fun. Another motivation was that abstracts for ACS meetings must be finalized 6 months before the meeting, so either the material is a year old or else the author expects (hopes!) to have the results described in the abstract by the time the meeting comes around. One of the key features of CINFlash was that it did not go through PACS. Instead, we accepted short (less than 100 words) abstracts on pretty much any topic in chemical information and cheminformatics, from mid-June until two weeks before the meeting. Given that we were considering 6 to 8 minute talks, we weren’t looking for scientifically heavy talks; rather, we wanted people to have some fun. Rob McFarland, Roger Schenck and I reviewed the abstracts, which were received via Gmail. In the end we had 5 people willing to go along with our experiment, which I think worked out relatively well. I must admit that some of the speakers were impressively creative in their use of 8 minutes! The audience was pretty unanimous that we should run the session again. The session ended with a lively audience discussion in which we exchanged ideas on how this type of symposium could be improved, and I think we got a lot of excellent suggestions. One of the main issues with the symposium was that it wasn’t publicized very well, so expect to see a revamped version of CINFlash next fall in Denver.

SK: Rajarshi, as Program Chair have you been getting any data about CINF programming from ACS? Could you share with us what sorts of data those are and whether they reveal any curious facts?

RG: Laura Mehlon was kind enough to provide me with a dump of the abstract data from the 220th National Meeting (Fall 2000) through the 238th National Meeting (Fall 2009). The data include presentation dates, abstract titles, author titles and affiliations, and so on. This is a great resource for examining how the CINF program has changed from meeting to meeting. I won’t go into too much detail, but will highlight a few interesting aspects of the data.

To begin with, Figure 1 shows how the number of symposia and the median number of papers per symposium have varied over the years. In general, there is a negative correlation between the number of symposia scheduled at a meeting and the median number of papers in a given symposium (R = -0.69, p = 0.0009). This is an expected result, and it suggests that the overall number of contributions is fairly static: scheduling more symposia spreads roughly the same pool of papers more thinly.
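
As a rough illustration of the statistic quoted for Figure 1, the sketch below computes Pearson’s R between the number of symposia per meeting and the median number of papers per symposium, along with the t statistic (with n - 2 degrees of freedom) from which a p-value would be derived. It is a minimal sketch that assumes the abstract dump has already been reduced to a hypothetical `meetings` mapping from a meeting label to per-symposium paper counts; the numbers in it are invented and are not the ACS data, and the exact procedure behind Figure 1 is not described in the interview.

```python
from math import sqrt
from statistics import mean, median


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def t_statistic(r, n):
    """t statistic for testing r != 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical input: meeting label -> number of papers in each CINF symposium.
# These values are made up for illustration; they are NOT the ACS data.
meetings = {
    "A": [12, 9, 15, 8],
    "B": [10, 11, 7, 9, 6, 8],
    "C": [14, 16, 11],
    "D": [9, 7, 8, 10, 6],
}

n_symposia = [len(papers) for papers in meetings.values()]
median_papers = [median(papers) for papers in meetings.values()]

r = pearson_r(n_symposia, median_papers)
print(f"R = {r:.2f}, t = {t_statistic(r, len(n_symposia)):.2f}")
```

If SciPy is available, `scipy.stats.pearsonr(x, y)` returns both the coefficient and a two-sided p-value, which avoids working with the t distribution by hand.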


Figure 1. Summary of the number of CINF symposia per meeting and the median number of papers per symposium

If we consider the number of papers per meeting (Figure 2), the spikes in the graph correspond to West Coast locations such as San Diego (229th meeting, Spring 2005) and San Francisco (232nd meeting, Fall 2006).


Figure 2. Total number of CINF papers per meeting.

The low value for the 238th National Meeting (Fall 2009) is due to incomplete data. I know that submissions have increased since that meeting, but at this time I do not have the data to plot. As I mentioned before, we saw the highest number of CINF submissions since 2004 (140 papers) for the Spring 2010 San Francisco meeting.

Next we consider some aspects of the authors contributing papers to the CINF program. For instance, if we consider the fraction of authors from industry, government and academia (based on email domains) at each meeting (Figure 3), we see a cyclical trend for industry and government, but a slight upward trend for academia. Interestingly, contributors from industry outnumbered those from academia and government prior to the 225th meeting (Spring 2003). Note that this does not account for the fact that many papers may involve a mix of industry, government and academic authors, but it does provide some information on who’s contributing to CINF programming.
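
The interview does not say exactly how email domains were mapped to sectors for Figure 3. The sketch below shows one plausible heuristic, offered as an assumption rather than the method actually used: an `edu` label (or an `ac` label before the country code) counts as academia, a `gov` or `mil` label counts as government, and everything else falls through to industry. The function names and example addresses are hypothetical.

```python
from collections import Counter

SECTORS = ("industry", "academia", "government")


def classify_email(email: str) -> str:
    """Map an author's email address to a sector (hypothetical heuristic)."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    # Drop the first label (the organization's own name), e.g. "example" in "example.edu".
    labels = domain.split(".")[1:]
    if "edu" in labels or "ac" in labels[:-1]:  # e.g. example.edu, example.ac.uk
        return "academia"
    if "gov" in labels or "mil" in labels:      # e.g. lab.example.gov, example.gov.uk
        return "government"
    return "industry"                           # default bucket: .com, .org, .net, ...


def sector_fractions(emails):
    """Fraction of authors in each sector for a single meeting."""
    counts = Counter(classify_email(e) for e in emails)
    total = sum(counts.values())
    if total == 0:
        return {s: 0.0 for s in SECTORS}
    return {s: counts[s] / total for s in SECTORS}


# Hypothetical author list for one meeting.
example_emails = [
    "author1@example-pharma.com",
    "author2@example.edu",
    "author3@lab.example.gov",
    "author4@example.ac.uk",
]
print(sector_fractions(example_emails))
```

A rule like this misfiles personal domains and non-profit or research institutes whose domains end in `.org` or `.com`, so the resulting fractions are best read as trends rather than exact shares.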


Figure 3. Fraction of authors from industry, academia and government per meeting

Thursday programming (which is usually General Papers) has always been a thorny issue for CINF. Because of the Division’s smaller size and correspondingly smaller pool of papers, we tend not to have many papers on Thursdays. This, coupled with the generally low attendance on that day, doesn’t make for very interesting sessions. Over the last three meetings, however, we’ve seen a steady increase in the number of submissions to General Papers and a lot of interesting content. Unfortunately, I don’t have the 2010 data for this analysis, so up to the end of 2009 we see a general decline in the share of papers allotted to the last day.
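
Since the abstract dump includes presentation dates, the share of papers presented on the last day (Figure 4) can be computed per meeting. The sketch below is a minimal version under two stated assumptions: the data have been reduced to hypothetical `(meeting, presentation_date)` pairs, one per paper, and “last day” means the last day on which CINF had any paper at that meeting, which is usually, but not necessarily, the Thursday.

```python
from collections import defaultdict
from datetime import date


def last_day_percentage(records):
    """Percentage of each meeting's papers presented on its last CINF day.

    records: iterable of (meeting, presentation_date) pairs, one per paper.
    """
    dates_by_meeting = defaultdict(list)
    for meeting, day in records:
        dates_by_meeting[meeting].append(day)

    result = {}
    for meeting, days in dates_by_meeting.items():
        last_day = max(days)  # latest date with a CINF paper at this meeting
        on_last_day = sum(1 for d in days if d == last_day)
        result[meeting] = 100.0 * on_last_day / len(days)
    return result


# Hypothetical records, not taken from the ACS data:
# meeting "A" has 4 papers (1 on its final day), meeting "B" has 2 (1 on its final day).
records = [
    ("A", date(2001, 4, 1)),
    ("A", date(2001, 4, 1)),
    ("A", date(2001, 4, 2)),
    ("A", date(2001, 4, 5)),
    ("B", date(2001, 8, 26)),
    ("B", date(2001, 8, 30)),
]
print(last_day_percentage(records))
```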


Figure 4. Percentage of CINF papers per meeting presented on the last day (mostly General Papers)

It’s also interesting to look at who the most prolific contributors to the CINF program have been.

Table. Top 10 authors by number of appearances in CINF program abstracts, 2000-2009

The table above lists the top 10 authors based on how many times their names appeared in the CINF program abstracts from 2000 to 2009. This is not a completely rigorous analysis (readers of this article are surely familiar with the problem of name disambiguation!).
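
Counting author appearances is easy; deciding when two name strings refer to the same person is the hard part. The sketch below uses a deliberately crude key (surname plus first initial, with accents stripped), assumes names arrive in “Given [Middle] Surname” order, and counts each author at most once per abstract. The helper names and sample names are hypothetical, and this is not necessarily how the table was produced.

```python
import unicodedata
from collections import Counter


def name_key(name: str) -> str:
    """Crude disambiguation key 'surname, first initial' (assumes 'Given Surname' order)."""
    # Strip accents so that, e.g., "José" and "Jose" produce the same key.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    parts = ascii_name.replace(".", " ").split()
    if not parts:
        return ""
    surname = parts[-1].lower()
    initial = parts[0][0].lower() if len(parts) > 1 else ""
    return f"{surname}, {initial}"


def top_authors(author_lists, k=10):
    """Return the k most frequent author keys; each author counts once per abstract."""
    counts = Counter()
    for authors in author_lists:
        counts.update({name_key(a) for a in authors if a.strip()})
    return counts.most_common(k)


# Hypothetical abstracts, each given as its list of author names.
abstracts = [
    ["Jane Doe", "John Roe"],
    ["J. Doe"],
    ["John Roe", "J Roe"],
]
print(top_authors(abstracts))
```

The same key also merges different people who share a surname and first initial (a hypothetical “Jane Doe” and “John Doe” would collide), which is exactly the kind of ambiguity that makes such counts approximate.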

Finally, I’ll end with a brief visualization of how topics have varied over the years. To do this, I’ve generated Wordles (http://www.wordle.net/) for two meetings (220th and 228th) based on the abstract titles for each meeting. In the first case, the most prominent words, such as “chemical” and “information”, are pretty generic. From one point of view this is not surprising, since CINF programming is quite diverse. But at the same time, some meetings have one or two large symposia. The 228th meeting is an example of this: it had a large symposium on federated search, which shows up in the Wordle visualization.
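
Wordle sizes each word according to how often it occurs in the input text, after dropping common English words. The sketch below reproduces only that counting step for a list of abstract titles; the stop-word list and sample titles are hypothetical, and Wordle’s layout and rendering are out of scope.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; Wordle's own list is larger).
STOP_WORDS = {
    "a", "an", "and", "as", "at", "by", "for", "from", "in", "into",
    "of", "on", "or", "the", "to", "using", "via", "with",
}


def title_word_counts(titles):
    """Count non-stop-words across titles; these counts would drive word size."""
    counts = Counter()
    for title in titles:
        # Lower-case and keep alphanumeric tokens, allowing internal hyphens.
        words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", title.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


# Hypothetical titles, not taken from the ACS abstract data.
titles = [
    "Federated search of chemical information",
    "Chemical information in the era of federated search",
    "Structure-based searching of patent literature",
]
print(title_word_counts(titles).most_common(4))
```

A single large symposium inflates a handful of topic words (here “federated” and “search”) relative to generic terms, which is the effect described for the 228th meeting.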


Figure 5. 220th National Meeting


Figure 6. 228th National Meeting

SK: Has the ACS thematic programming initiative had a major impact on CINF programming? Please share your thoughts about this initiative.

RG: The thematic programming hasn’t really affected the CINF programming workflow much. One of the main reasons is that we aren’t bound to construct a program in line with the theme. This is fortunate, since for some themes it can require significant creativity to link CINF topics to the theme! At the same time, a number of themes have been very CINF-friendly. Examples include “Chemistry for Life” and “Chemistry for Health and Disease.” Given the many links that CINF topics and members have with the pharmaceutical industry, these two themes are quite easy to satisfy. But it’s important to note that CINF programming is not explicitly constructed to match a theme. Our first priority is to create a program that is of interest to our membership and the community. The theme of a meeting certainly informs that process, and if we can connect multiple symposia to the theme, all the better. Given the multidisciplinary nature of CINF, this is probably an easier job for us than for other divisions, as we can connect to themes from various directions (information sources, legal issues, informatics approaches, modeling and so on).

SK: Would you like to comment on the new ACS abstract submission system (PACS)?

RG: Prior to 2009, the OASYS system was used to submit and organize abstracts. While a bit clunky, it was a known system, and preparing the program was relatively straightforward. For various reasons, ACS shifted to a new abstract submission system called PACS. Given that it’s a new system, there is a learning curve for people who are used to OASYS. But I will admit that it has been a painful experience: at least until the Boston meeting, it was an unpolished system that had been released to users (program chairs, symposium organizers, etc.) in an untested form. However, to their credit, ACS held open meetings to listen to complaints and issues regarding the system. Major bugs have been fixed in time for the Anaheim and subsequent meetings, so I guess we’ll have to see how it turns out. I will note that there is a PACS Advisory Board, of which I am a member, and I have been providing input on usability issues, training of users, and so on. I do think that as ACS and the vendor fix the bugs, it could become a very powerful system for managing, reporting and generally keeping track of programming-related data. The ability to directly access programming data from the ACS database would be especially powerful for determining the long-term impact of programming decisions and policies.

SK: Who is going to be your successor as the CINF Program Chair? Could you give us a sneak preview of the CINF technical program planned for the 2011 National meetings?

RG: Dr. Rachelle Bienstock from the National Institute of Environmental Health Sciences will be the next CINF Program Chair. She has been an active member of the Program Committee for the past three meetings and has also organized symposia in CINF. In other words, she knows the magic that goes into putting together a program! She’s provided great input on various aspects of our program and has a number of interesting program topics lined up for future meetings.

As for upcoming programming, I won’t go into too much detail, as Rachelle will be providing highlights of upcoming programming separately. Anaheim (Spring 2011) will see symposia on open data, combinatorial chemistry, and reaction modeling, as well as symposia relating to natural resources. Denver (Fall 2011) will include symposia ranging from the state of openness in science to high content screen analysis. As you can see, we’re maintaining the CINF tradition of topical diversity.

SK: Thank you, Rajarshi, for sharing your experience and insights as CINF Program Committee Chair. Please accept my sincere congratulations on moving up to the Division Chair-Elect position in 2011!