Open Science Data Cloud researchers from all over the world gathered June 16-20
in the Netherlands at the University of Amsterdam (UvA) Science Park for the
annual OSDC Partnerships for International Research and Education (PIRE)
Workshop. At the workshop, this year's selected OSDC PIRE fellows kicked off
their fellowships by meeting their international summer research hosts
and receiving training in the basics of data science and cloud computing from experts
in the field.
Over the course of the week, the fellows learned about open data repositories
such as the OSDC Public Data Commons,
the ENVRI project, the Global Biodiversity Information Facility,
data.tt out of Trinidad and Tobago, and Japan's Landsat-8
Real-time Release site. They worked through tutorials
on tools for data-intensive research, such as the Open Science Data Cloud,
and on projects like SAGA (Simple API for Grid Applications).
The fellows also learned best practices for data visualization and research.
Armed with these new skills, the fellows formed teams to compete in a data
science hackathon, with great results. Teams worked on projects
aimed at facilitating cross-disciplinary data analysis, using OSDC public datasets
to educate the public about extreme weather conditions, developing mobile apps
with public geospatial datasets, and making clouds like OSDC easier for scientists.
The first-place team, Cody Buntain (University of Maryland) and Nelson Auner (University of Chicago),
created a toolkit they call "Mayfly" that supports
reproducible research by letting researchers easily publish and share their
analyses, data visualizations, and results to Dropbox for others to view.
The team installed the toolkit on an OSDC public virtual machine snapshot
that any OSDC user can adopt, and also made the source code and documentation
available on GitHub.
All teams delivered impressive results after only a few short days of work during
the workshop. Imagine what else could be accomplished!
The OSDC and Bionimbus were featured in a June 2014 article in Scientific American
called "Bioinformatics: Big Data Versus the Big C."
"Analysing the genomes of 8,200 tumours is just a start. Researchers are ‘trying to figure out
how we can bring together and analyse, over the next few years, a million genomes’, says Robert
Grossman, who directs the Initiative in Data Intensive Science at the University of Chicago in
Illinois. This is an immense undertaking; the combined cancer genome and normal genome from a
single patient constitutes about 1 terabyte (10^12 bytes) of data, so a million genomes would
generate an exabyte (10^18 bytes). Storing and analysing this much data could cost US$100
million a year, Grossman says."
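The exabyte figure in the quote follows from simple scaling. The sketch below assumes decimal SI prefixes (1 TB = 10^12 bytes, 1 EB = 10^18 bytes) and treats "a million genomes" as one million patients at roughly 1 TB each, as the quote implies:

$$10^{6}\ \text{patients} \times 10^{12}\ \frac{\text{bytes}}{\text{patient}} = 10^{18}\ \text{bytes} \approx 1\ \text{exabyte}$$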
University of Chicago pathologist and OSDC user Megan McNerney's discoveries (M. E. McNerney et al.
Blood 121, 975–983; 2012) are featured in the article as an example of a bioinformatics project that has shown the benefits of data mining.
Members of the Open Cloud Consortium (OCC) and OSDC team attended the recent symposium of The Cancer Genome Atlas (TCGA),
where OCC Founder and Director Robert Grossman gave a keynote address on the
future of genomics and bioinformatics research.
Dr. Grossman framed the future of bioinformatics research and the sharing of large genomic datasets as an
extension of Garrett Hardin's 1968 essay, "The Tragedy of the Commons."
The Bionimbus PDC and the OSDC's Public Data Commons
are excellent examples of Dr. Grossman's and the OCC's efforts to provide shared, public resources in an open-source environment to
both the genomics community and researchers across all disciplines in order to facilitate discovery.
You can watch the full speech here.
This week OSDC lead Maria Patterson will participate in the 2014 HyspIRI Symposium
in Maryland as part of Project Matsu, the OCC’s collaboration with NASA. Dr.
Patterson will give a talk on the Matsu Wheel, an analytics framework that processes
large volumes of satellite data nightly. Stuart Fry, Dan Mandl, Pat Cappelaere, and Vuong
Ly of Project Matsu will also be presenting and organizing.
The symposium will focus on advancing land imaging through new
approaches and products. Participants will discuss ways the HyspIRI mission and
other new technologies can help address sustainable land imaging requirements.
The HyspIRI mission includes two instruments mounted on a single satellite: an
imaging spectrometer measuring from the visible to the shortwave infrared, and a
multispectral imager measuring the mid-infrared and thermal infrared (TIR). You can
learn more about the HyspIRI mission here: http://hyspiri.jpl.nasa.gov/
The University of Chicago, one of the OCC’s key members, is hiring for four types of positions in its Center for Data Intensive Science. People in these roles will work closely with our OSDC and OCC team.
If you’re interested or know someone qualified who might be, applications are being accepted for the following positions:
- Director of Security x1
- Bioinformaticians x4
- Linux System Administrators x4
- Software Engineers x4
To learn more:
Members of the OCC team are in Texas this week at the Open Big Cloud Symposium. The Symposium
aims to bring together the brightest minds in industry, academia, and research to discuss the
future of cloud computing and Big Data.
The conference will explore bringing the cloud to the enterprise (models and benefits), the cloud
operation model (DevOps), open technologies and best practices including software and hardware
disaggregation, and cloud and big data for scientific and engineering workloads.
To learn more, visit: http://www.opencompute.org/community/events/ocp-on-the-road/open-bigcloud-symposium-and-ocp-workshop-2014
Maria Patterson, a research scientist at the Center for Data Intensive Science at the University
of Chicago and a lead for the Open Science Data Cloud, will give a talk on April 17 on working
with large scientific datasets.
The talk will give an overview of the OSDC, one of the world’s largest general-purpose science clouds,
which is managed by the Open Cloud Consortium (OCC), and will explain how to collaborate with the OSDC on
research projects involving data-intensive computing. It will also discuss the NSF-funded
Partnerships for International Research and Education (PIRE) fellowship opportunities for summer 2014.
Find out more about the OSDC at https://www.opensciencedatacloud.org/
and about the NSF PIRE fellowship at http://pire.opensciencedatacloud.org/.
Interested in getting hands-on experience working with big data? Apply now for an OSDC
PIRE fellowship, and study abroad with one of our international partners to enhance your skills.
The application deadline is April 30, 2014.
You can learn more about the PIRE Program by watching these informative videos:
If you want to learn what the OSDC is, how to use it for research, and how to apply for
a resource allocation, watch this video demo.
Some of the topics covered include:
- What is the OCC?
- Why use the OSDC?
- How do I apply for a Resource Allocation on the OSDC?
- Learning about Pubkeys, VMs, Images, and Snapshots
- Example: Analysis of Data from the OSDC Public Data Commons
- Sample of OSDC partner projects
The Open Cloud Consortium is pleased to announce its membership in the UK Federation. This allows researchers
from participating UK organizations to gain authenticated access to the Open Science Data Cloud. The UK
Federation uses the standards-based Shibboleth software to facilitate the sharing of web resources that
are subject to access control.
If your institution is an Identity Provider (IdP) associated with the UK Federation but is not listed
on the OSDC application page, please contact us at firstname.lastname@example.org and we’ll work with you
to get your organization to release the appropriate attributes.