Today we're proud to announce another Open Science Data Cloud milestone. Over 700 resource allocations have been
granted to researchers to use the OSDC and the list of publications
that utilized the OSDC continues to grow.
As demand for resources to store, share, and analyze terabyte- and petabyte-scale datasets continues to grow, so too does
the OSDC ecosystem. Individuals granted OSDC resource allocations can house and share their own scientific data, access
datasets in our Public Data Commons directly mounted to their virtual
machines, build and share customized virtual machines with tools for data analysis, and then perform the analysis to
answer their research questions. The OSDC is a one-stop shop for making scientific research faster and easier.
The OSDC continues to support a number of large projects from OCC working groups and OCC members. Remaining available resources
are allocated to other applicants based on merit.
Learn more about the OSDC resource allocation process or apply for your own resource allocation.
The University of Chicago announced today their collaboration with the National Cancer Institute to establish the
Genomic Data Commons.
The Genomic Data Commons project will help researchers around the country assess genetic information from more than
10,000 cancer patients, which could be used to develop more effective treatments, said Robert Grossman, a professor
of medicine at the University of Chicago who is directing the project.
The establishment of the NCI Genomic Data Commons (GDC) will expand access for scientists around the country, speeding
up research and, in turn, leading to faster discoveries for patients. The GDC will provide an interactive system for
researchers, making the data easier to use; it also will provide resources to facilitate the identification of subtypes
of cancer as well as potential therapeutic targets.
"The Genomic Data Commons has the potential to transform the study of cancer at all scales," said Robert Grossman,
PhD, director of the GDC project and professor in the Department of Medicine at the University of Chicago.
"It supplies the data so that any researcher can test their ideas, from comprehensive 'big-data' studies to
genetic comparisons of individual tumors to identify the best potential therapies for a single patient."
We’re gearing up for another successful Supercomputing conference. This year's conference will be in New Orleans, and the OCC and Center for Data Intensive Science will have a research booth in the exhibition hall. Throughout the conference, we'll be giving a variety of presentations on our many projects. Please see the full schedule below or here.
If you're in New Orleans and attending the conference, please stop by the booth, say hi, and learn more about how we've been making research with substantial computing and storage needs easier and more accessible for scientists across the globe. We'll be giving away free gourmet coffee and OSDC-embossed Belgian chocolates.
SC14 PRESENTATION SCHEDULE
Please join us at any of the following presentations to learn more about CDIS and OCC activities. Unless otherwise noted, presentations will take place at Exhibition Booth #1639.
We’re proud to announce that CliQr is now a member of the Open Cloud Consortium!
As a member of the OCC, CliQr will contribute their expertise in APIs to the Biomedical
Commons Cloud (BCC) working group. As the BCC ecosystem matures, researchers
analyzing genomic data, EMRs, medical images, and other PHI data will enjoy
CliQr's solutions to manage and govern their pipelines and workflow across resources.
Learn more about how your organization can become a member of the OCC here.
Today we're proud to announce a major milestone in the Open Science Data Cloud's history. For the first time, we're
retiring an OSDC user resource after many years of service.
OSDC Adler served the general science and bioinformatics communities for many years and was instrumental to a number of
researchers exploring the modENCODE and ENCODE datasets. It was the initial home for
Project Matsu, a collaboration between NASA and the Open Cloud Consortium to develop open
source technology for cloud-based processing of satellite imagery to support the earth sciences.
The Adler resource was paid for with generous support from the Moore Foundation and continuously
maintained with support from our other generous sponsors. OSDC Adler
users had access to 312 cores and approximately 1PB of raw storage. Adler's software stack included
Eucalyptus / OpenStack. Active researchers on OSDC Adler were provided allocations on the OSDC Sullivan public resource to continue their research.
Lifespans for intensively used computing hardware are generally estimated at three to four years; OSDC Adler served faithfully for nearly five.
The Adler hardware will continue to be used internally by the OSDC team for testing and development.
You can learn more about available OSDC resources here.
We've recently finished a short video to help describe the services provided by
the Open Science Data Cloud and the need that drives our interest in providing this service.
If you're new to the OSDC ecosystem or just want to learn more about what the OSDC
offers, watch the video here.
Interested researchers can apply for an OSDC resource allocation here.
Big data is important to transforming research and the OCC is giving away a limited number of Discovery Awards to encourage scientists
to experiment with developing novel technology for analyzing big data. We also think it’s important to encourage use of big data in the
business community and are giving away a limited number of Innovation Awards.
Both awards will give users free computing resources on the Open Science Data Cloud.
Our Discovery Awards (for scientific research) are for 50,000 OSDC core hours and are available to selected
scientists and researchers. OCC Innovation Awards (for businesses) are for 30,000 OSDC core hours. We especially encourage small businesses to apply.
To learn more or to apply for a Discovery or Innovation Award, first apply for an OSDC resource allocation, then
send an email noting your application and a short paragraph describing what you’d like to do with the core hours awarded to firstname.lastname@example.org.
We’re proud to announce that The Ontario Institute for Cancer Research (OICR) is now a member of the
Open Cloud Consortium!
OICR will be involved with several OCC Working Groups, including the Open Science Data Cloud Working
Group and the Biomedical Commons Cloud Working Group, to build systems for cancer genomics analysis
and biomedical data sharing.
“Cancer genomics data sets are now too large to download over the Internet, and the compute resources needed to mine them for knowledge are out of reach for many researchers. Our collaboration will enable researchers from around the world to get the data, perform sophisticated analyses over it, and to extract knowledge that can be used to improve cancer diagnosis and care,” said Dr. Lincoln Stein, Director of the Informatics and Bio-computing Program at OICR.
“The Biomedical Commons Cloud (BCC) provides a medical research center a quick and easy way to get access to a secure and compliant cloud that contains a critical mass of biomedical data,” said Dr. Robert Grossman, Director of the OCC. “We are very excited that OICR will be one of the founding partners of this effort.”
The full joint press release is available here.
Learn more about how your organization can become a member of the OCC here.
The OSDC has a very active community of BETA users and demand for OSDC services is growing. To better
distribute available resources among interested researchers, the OSDC moved on August 1st to a new
resource allocation paradigm.
In the new paradigm, OSDC resource allocations generally run for 3 months at a time and begin on January 1, April 1, July 1,
and October 1. Incoming applications for resources are reviewed ahead of each start date and are due
on the 15th of the month prior (e.g., December 15th for the allocation period starting January 1st). During
the survey process, a resource allocation extension can be requested if your research is not yet complete.
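For applicants planning ahead, the quarterly schedule described above can be sketched in a few lines of Python. This is only an illustration, not an OSDC tool: the function name `next_allocation_period` is hypothetical, and it assumes an application submitted on the deadline day itself still counts as on time.

```python
from datetime import date

# Allocation periods start on the 1st of these months (January, April, July, October).
START_MONTHS = (1, 4, 7, 10)


def next_allocation_period(today):
    """Return (application_deadline, period_start) for the next open period.

    Hypothetical helper. Assumes applications received on the 15th of the
    month before the period starts are still on time.
    """
    for year in (today.year, today.year + 1):
        for month in START_MONTHS:
            start = date(year, month, 1)
            # The deadline is the 15th of the month before the period starts;
            # for a January start that is December 15th of the previous year.
            if month == 1:
                deadline = date(year - 1, 12, 15)
            else:
                deadline = date(year, month - 1, 15)
            if today <= deadline:
                return deadline, start
    raise ValueError("no upcoming allocation period found")


deadline, start = next_allocation_period(date(2014, 8, 20))
print(deadline.isoformat(), start.isoformat())  # 2014-09-15 2014-10-01
```

An application prepared on August 20th, for example, would be due September 15th for the period beginning October 1st.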
Established partner projects and Labs and OCC members that have contributed hardware will be given first priority.
To apply for a resource allocation during the period beginning on October 1st please use the OSDC Resource Allocation Application.
Special protected resources like the Bionimbus-PDC have their own separate application process.
Recipients of OSDC resource allocations are expected to:
- Make appropriate use of OSDC resources and practice good social behavior (e.g., terminating VMs when not in use).
- Cite the OSDC in any papers and publications.
- Regularly respond to quarterly OSDC allocation surveys.
- Submit tickets to the OSDC support ticketing system when encountering technical issues not covered by the OSDC support documentation.
Open Science Data Cloud researchers from all over the world gathered June 16-20
in the Netherlands at the University of Amsterdam (UvA) Science Park for the
annual OSDC Partnerships for International Research and Education (PIRE)
Workshop. At the workshop, this year's selected OSDC PIRE fellows kicked off
their fellowships by meeting their international summer research hosts
and being trained in the basics of data science and cloud computing from experts
in the field.
Over the course of the week, the fellows learned about open data repositories
such as the OSDC Public Data Commons,
the ENVRI project, the Global Biodiversity Information Facility,
data.tt out of Trinidad and Tobago, and Japan's Landsat-8
Real-time Release site. They worked through tutorials
on tools for data intensive research such as the Open Science Data Cloud
and projects like SAGA (Simple API for Grid Applications).
The fellows also learned best practices for data visualization and research
Armed with these new skills, the fellows formed teams to compete in a data
science hack-a-thon challenge with great results. Teams worked on projects
aimed at facilitating cross-disciplinary data analysis, using OSDC public datasets
for educating the public on extreme weather conditions, developing mobile apps
using public geospatial datasets, and making clouds like OSDC easier for scientists to use.
The first place team, Cody Buntain (University of Maryland) and Nelson Auner (University of Chicago),
created a program they call "Mayfly," a toolkit that enables
reproducible research by allowing researchers to easily publish and share their
analysis, data visualizations, and results to Dropbox for others to view.
The team installed their toolkit on an OSDC public virtual machine snapshot
for any OSDC user to adopt and also made the source code and documentation
available on GitHub for other users.
All teams delivered impressive results after only a few short days of work during
the workshop. Imagine what else could be accomplished!
The OSDC and Bionimbus were featured in a June 2014 article in Scientific American
called "Bioinformatics: Big Data Versus the Big C."
"Analysing the genomes of 8,200 tumours is just a start. Researchers are “trying to figure out
how we can bring together and analyse, over the next few years, a million genomes”, says Robert
Grossman, who directs the Initiative in Data Intensive Science at the University of Chicago in
Illinois. This is an immense undertaking; the combined cancer genome and normal genome from a
single patient constitutes about 1 terabyte (10^12 bytes) of data, so a million genomes would
generate an exabyte (10^18 bytes). Storing and analysing this much data could cost US$100
million a year, Grossman says."
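The storage estimate in the quote above is easy to check with a few lines of Python. This is a back-of-the-envelope sketch: the function names are hypothetical, and the per-patient cost is derived here from the quoted figures and is not stated in the article.

```python
TERABYTE = 10**12  # bytes
EXABYTE = 10**18   # bytes


def total_storage_bytes(patients, bytes_per_patient=TERABYTE):
    """Total storage for a cohort at roughly 1 terabyte per patient."""
    return patients * bytes_per_patient


def cost_per_patient(annual_cost_usd, patients):
    """Annual cost spread evenly across patients (a derived illustration)."""
    return annual_cost_usd / patients


total = total_storage_bytes(1_000_000)
print(total == EXABYTE)                          # True
print(cost_per_patient(100_000_000, 1_000_000))  # 100.0
```

A million patients at about 1 terabyte each comes to exactly one exabyte, and US$100 million a year spread across them works out to about US$100 per patient per year.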
University of Chicago Pathologist and OSDC user Megan McNerney's discoveries (M. E. McNerney et al.
Blood 121, 975–983; 2012) are featured as a bioinformatics project that has shown the benefits of mining data.