Chris Pentago

The New Computing Pioneers in Chemistry/Biology

Blog Post created by Chris Pentago on Jul 29, 2013

While in the past it was quite common to characterize large pharmaceutical companies as late adopters of IT, nowadays this may no longer be the case.


After spending the last five years catching up with other industries in deploying enterprise software systems that link laboratories and researchers company-wide, the big drug companies are now moving data storage and processing online, where it can be managed for them by large corporations such as Microsoft, Google, and Amazon on computers in undisclosed locations.



Genentech, Johnson & Johnson, Eli Lilly & Co., and Pfizer are among the drug makers that have decided to pilot an emerging area of information technology services called cloud computing. In this model, big computing companies offer time on their massive, distributed infrastructures on a pay-as-you-go basis. In this largely uncharted territory, these drug companies are among the first to gauge the time and cost savings as well as the potential security and management drawbacks.


During the past year, cloud computing, which is based on technologies that already support search services and e-mail, has burst onto the information technology scene. Success stories were quick to follow across a wide range of industries and government organizations.


One of them involves the White House, which used Google's cloud services to handle the questions sent to the President for his town hall meeting on March 26. In the 48 hours before the meeting, 92,934 people submitted 104,073 questions and cast a whopping 3,695,984 votes, with traffic peaking at 700 hits per second.


Cloud computing offers drug companies many advantages, including lower costs, faster data processing, and the ability to store large amounts of data. Even better, they will be able to run almost any kind of internet-based computing application. For instance, researchers at the Bioengineering and Biotechnology Center at the Medical College of Wisconsin recently submitted a paper arguing for the viability of using Amazon's cloud services for low-cost, scalable proteomics data processing.

On top of that, according to Dave Powers, Lilly has already demonstrated the viability of cloud computing in pharmaceutical research and development.



He said that the company was able to launch a 64-machine cluster to work on bioinformatics sequence information. The cluster completed the task in just 20 minutes and yielded the results the company was looking for.

This was done using Amazon's Elastic Compute Cloud service, and the price was just $6.40. To put that in perspective, going from nothing to an installed 64-machine cluster would otherwise take around 12 weeks.
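To make the Lilly numbers easier to picture, here is a rough back-of-the-envelope sketch in Python. The three input figures (64 machines, 20 minutes, $6.40) come from the post. The function name and the whole-hour billing rule are my own assumptions for illustration, not something Lilly or Amazon is quoted as saying.

```python
import math


def run_cost_summary(machines, run_minutes, total_cost_usd):
    """Break a cluster bill down per machine and per machine-hour.

    Hypothetical helper for illustration only.
    """
    cost_per_machine = total_cost_usd / machines
    # Machine-hours actually consumed by the job.
    machine_hours_used = machines * run_minutes / 60
    # ASSUMPTION (not from the post): each machine is billed in whole hours,
    # so a 20-minute run is charged as one full hour per machine.
    billed_machine_hours = machines * math.ceil(run_minutes / 60)
    implied_hourly_rate = total_cost_usd / billed_machine_hours
    return {
        "cost_per_machine_usd": cost_per_machine,
        "machine_hours_used": machine_hours_used,
        "billed_machine_hours": billed_machine_hours,
        "implied_hourly_rate_usd": implied_hourly_rate,
    }


if __name__ == "__main__":
    # Figures quoted in the post for the 64-machine run.
    summary = run_cost_summary(machines=64, run_minutes=20, total_cost_usd=6.40)
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```

Under that billing assumption, the bill works out to about ten cents per machine for the whole job, which is why the cost is hard to compare with twelve weeks of procurement and setup.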


Andrew Kaczorek, senior systems analyst for discovery IT, said that although Lilly has a large installed base of computers, the company's information technology infrastructure is already running at 100% of its capacity.

He and Powers said that although the cost savings are quite hard to calculate, they are certainly significant, as are the time savings.


According to Giles Day, head of informatics at Pfizer's Biotherapeutics & Bioinnovation Center (BBC), the center considered Amazon's cloud services a few months ago for developing and refining models in antibody docking runs.

Day said that they used the cloud services to become more efficient, cutting the time needed to complete certain processes from two to three days to two to three hours. At a price of just $300, the cost is almost too small to mention.

Applied Biosystems

The Applied Biosystems SOLiD system is among the next generation of DNA sequencing systems that will generate dozens of terabytes of data in labs.

Powers says that Pfizer has already begun employing cloud computing in various research operations, but he also pointed out a few disadvantages. One is that users need to write their own programs to coordinate with cloud service providers. The corporation has teamed up with a consulting firm called BioTeam to connect its work to the cloud. To access Amazon's network and manage the transfer of data onto and off the cloud, Lilly is currently using two suppliers, RightScale and CycleComputing.


Powers, for his part, pointed out that security remains a concern for the company, which limits all or most of its cloud activity to the manipulation of public data that doesn't involve patents or intellectual property. According to Powers, the company sees policing individual researchers' access to the cloud as a much bigger concern.


Wes Rishel, a vice president in Gartner Group's healthcare provider information technology division, says that many issues still need to be worked out. Rishel stated that cloud computing is currently too high on the hype cycle and might not be as shiny and great as it is made out to be. Rishel did agree that companies can save a lot of money and time by using cloud technology, but said the lack of standards for entering and processing data makes it a lot more complicated than it first appears.

Industry watchers agree that the technology is best characterized by the uses to which the service companies' machines are put. John Wilbanks, executive director of Science Commons, stated that “cloud” simply refers to the internet. He added that it is just a fancy name for online processing and storage.


The drug companies are hesitant as well, because they are still deciding what type of data they would consider for cloud processing and storage. One factor that will greatly influence whether these companies ever say “yes” to cloud technology is where the data resides in the discovery process.


Wilbanks later stated that the applicability of cloud computing in the pharmaceutical sector is going to be limited quite a lot by the large amounts of sensitive data currently kept behind very strong firewalls. However, the sector is still drawn to cloud computing, because life sciences data is being created at an ever faster rate, especially in genomics research.

Eric Schadt, executive scientific director for genetics at Rosetta Inpharmatics, agrees that next-generation technologies for DNA sequencing and gene expression profiling will generate large-scale data at a very fast pace, which will be overwhelming to many.

While in the past it was believed that high-density SNP arrays and microarrays were generating high-dimensional data that was difficult to handle, the next-generation technologies will produce 1 to 2 orders of magnitude more. Schadt said that the data should offer answers to questions about how complex disease systems manifest themselves in the human body.


Merck, however, does not seem ready to take the plunge. The company acquired a computer 12 years ago with around 10,000 CPUs and a complex internet-based architecture that gives researchers working on thousands of projects at Merck easy access to stored data.


However, it seems that this is going to change soon. In the coming months, the corporation will have to hand the Rosetta computer cluster, and large parts of the data it contains, to Sage Bionetworks, a nonprofit bioinformatics database.


As more and more drug companies come to understand how cloud computing will affect their operations, service suppliers are amassing a distributed utility infrastructure to accommodate booming demand. According to Adam Selipsky, vice president of Amazon Web Services, Amazon's cloud storage offering has seen massive growth, and the company has increased its capacity from 18 billion files to 52 billion files in the last year.


The life sciences community, in which companies facing financial constraints are running into growing computing burdens, has shown a lot of interest in the service. Selipsky noted that one of the reasons drug researchers are interested in the cloud is the speed of processing, especially thanks to the Elastic MapReduce service that Amazon recently introduced. This is a cloud computing utility that speeds up processing by enabling computers to work with petabytes of data. Just for the record, 1 PB equals 1024 TB, or around one million GB.
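For anyone who wants to check the petabyte arithmetic, here is a tiny Python sketch. It uses the binary convention quoted above (1 PB = 1024 TB) and assumes the same factor of 1024 between TB and GB. The constant and function names are just illustrative.

```python
TB_PER_PB = 1024  # binary convention, as quoted in the post
GB_PER_TB = 1024  # ASSUMPTION: same binary convention one step down


def petabytes_to_terabytes(pb):
    """Convert petabytes to terabytes (binary convention)."""
    return pb * TB_PER_PB


def petabytes_to_gigabytes(pb):
    """Convert petabytes to gigabytes (binary convention)."""
    return petabytes_to_terabytes(pb) * GB_PER_TB


if __name__ == "__main__":
    tb = petabytes_to_terabytes(1)
    gb = petabytes_to_gigabytes(1)
    print(f"1 PB = {tb:,} TB = {gb:,} GB")
```

With the binary convention, one petabyte comes to 1,048,576 GB, which is where the "around one million GB" figure comes from.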


Mike Naimoli, director of Microsoft's life sciences business in the United States, says that besides allowing massive time and cost savings, cloud computing may also enable data sharing between drug makers, contract research organizations, and other partners. Just last year the company introduced its dedicated cloud computing platform, called Azure Services.


Rishi Chandra, senior product manager for Google App Engine, said that drug companies will eventually take advantage of the cloud to meet their acute need for data computing and storage. He mentioned that it makes more sense to operate distributed networks where spikes in demand can be handled with ease. But although Google itself will handle the security of its physical computer infrastructure, data access control, data encryption, and other security measures on the cloud will be the responsibility of users.


However, some of this work will be done by third-party software suppliers, one of them being CycleComputing. The company began focusing its efforts four years ago on developing open source software for high performance computing. According to its chief executive, Jason Stowe, over the years it has launched a business security management and application service for cloud computing. CycleComputing also appears to have a partnership with the computational chemistry software company Schrodinger.


Stowe stated that the company is currently focused on developing applications for next-generation genome sequencing, which should make it much easier to condense and process raw data from researchers' laboratories.


Although vendors and users alike agree that cloud computing is still in its early days, they do view these types of information technology services as a feasible option for a wide range of work that stretches beyond the processing and storage of non-proprietary data.

Karen Riley, a spokeswoman for the FDA, says that cloud technology is simply new territory, and that if medical data is stored in the cloud, the main concern would be safeguarding that information so that it cannot be tampered with. She added that it would be impractical to audit companies like Google, because the responsibility for the security of data will most likely stay with the trial sponsors. The Food and Drug Administration seems to already use cloud technology for e-mail communication, but the servers are external.


Powers said that it is nice for them to know that they are on the cutting edge. Both the potential for more advanced uses in the future and the proven acceleration of research and cost savings make cloud computing a really great choice to consider.

The need for larger storage systems and more processing power is pushing a lot of companies to the limit of their resources, and many of them don't want to figure out new solutions to these problems on their own. In this regard, cloud computing offers an infrastructure with enough advantages that the drug companies will eventually be unable to resist it.


While the aforementioned companies are global and serve the needs and markets of the entire world, I'm particularly interested in what companies like Ninefold, with their dedicated/cloud/VPS server business, can do for country-specific markets (specifically Australia) and the science scene over there.

Hope you didn't get overwhelmed by a long post.