Google Cloud and FDA MyStudies: Harnessing real-world data for medical research

Google Cloud is committed to helping customers conduct life-saving research that results in new medications, devices and therapeutics by unlocking the knowledge hidden in real-world data. That’s why we’re supporting the goals of the U.S. Food & Drug Administration by making the FDA’s open-source MyStudies platform available on Google Cloud Platform. By building on the platform developed by the FDA, we hope to stimulate an open ecosystem that will improve the ability of organizations to perform research that leads to better patient outcomes. This collaboration continues our long history of open-source work, and our commitment to producing easy-to-use tools that serve the healthcare and life sciences community.

Because of the FDA’s focus on real-world evidence, drug and device organizations are increasingly looking to incorporate patient-generated data into regulatory submissions for new products and treatment indications. But until recently, there haven’t been mobile technologies or methodologies to help collect, store and submit this kind of data in a regulatory-compliant manner. To address this gap, the FDA developed MyStudies, an open-source technology platform that supports drug, biologic and device organizations as they collect and report real-world data for regulatory submissions.

Google Cloud is now working to expand the FDA’s MyStudies platform with built-in security and configurable privacy controls, and the ability for research organizations to automatically detect and protect personally identifying information. When an organization deploys FDA MyStudies on Google Cloud, a unique and logically isolated instance of the platform is created that only that organization and its delegates are authorized to access. These technologies will allow a research organization to select which of its researchers and clinicians are able to access what data, and to help optimize the use of that data as directed by participants. By leveraging Google Cloud as the underlying infrastructure for their FDA MyStudies deployments, organizations gain additional safeguards for the ownership and management of the data in their studies.

Further, Google Cloud is providing sponsorship to bring Stanford University’s MyHeart Counts cardiovascular research study onto the FDA MyStudies platform, enabling this groundbreaking virtual clinical study to begin enrolling users of both Android and iOS devices. Since it launched as one of the initial iOS research applications, MyHeart Counts has enrolled more than 60,000 participants and significantly advanced understanding of the feasibility of conducting large-scale, smartphone-based clinical trials.

Enabling patient-reported data with MyStudies

The FDA relies on clinical trials and studies submitted by study sponsors to determine whether to approve, license or clear a drug, biologic or device for marketing in the United States. Historically, this information has been obtained almost exclusively through traditional clinical trials conducted under tightly controlled conditions. However, the increased digitalization of patient healthcare data may help to improve health with high-quality real-world evidence and more efficient clinical trials.

The FDA has recognized this opportunity. For example, the agency’s Patient Engagement Advisory Committee is now helping ensure that the experiences of patients are included as part of the FDA’s deliberations on complex issues involving the regulation of medical devices. And, in 2017, the FDA Center for Devices and Radiological Health released a guidance document addressing real-world evidence generation for medical devices. The FDA has also released several draft Patient-Focused Drug Development guidance documents addressing how stakeholders can collect and submit patient experience data to support regulatory decision-making. Finally, in 2018, the FDA released a Real-World Evidence Framework, which details the agency’s efforts to evaluate real-world evidence for drugs and biologics as mandated by the 21st Century Cures Act.

Originally launched as a publicly available resource in November 2018, FDA’s MyStudies platform includes important features supporting patient accessibility and privacy. The patient-facing mobile application was built for Android using the open-source ResearchStack framework, and for iOS using Apple’s ResearchKit framework. By using these frameworks, developers can expand the capabilities of open-source mobile applications or create their own proprietary and branded applications. MyStudies mobile applications are configurable for different therapeutic areas and health outcomes through a web-based interface that reduces the need for custom software development. The overall platform has been designed to support auditing requirements for compliance with 21 CFR Part 11, allowing the platform to be used for trials under Investigational New Drug (IND) oversight.

Study sponsors have already leveraged the FDA’s existing MyStudies platform to build branded and customized mobile applications to administer questionnaires that assess patient-reported outcomes, patient reports of prescription and over-the-counter medication use, trial medication diaries and other patient experience data. Supporting MyStudies on Google Cloud will make it even easier for new study sponsors to benefit from the MyStudies platform.

New platform, new opportunities

Now, Google Cloud is equipping the FDA’s MyStudies platform with an additional set of capabilities that reduce complexity and overhead, allowing pharma and medtech organizations to get up and running fast. For study designers who do not want to configure a compliant environment from scratch, a ‘click-to-deploy’ option will be available in the Google Cloud Marketplace later this year. When deploying FDA MyStudies on Google Cloud using this option, a private MyStudies instance is built from the open-source repository. That instance is then configured following best practices to operate with selected Google Cloud services. This allows research groups to establish their own, preconfigured instance of the FDA’s MyStudies platform in minutes.

“Consistent with our obligations under the 21st Century Cures Act, FDA engages in public-private demonstration projects to advance the regulatory science around real-world evidence. The Patient Centered Outcomes Research Trust Fund investment that launched FDA MyStudies is a step toward this goal,” said David Martin, MD, associate director for Real-World Evidence Analytics, Office of Medical Policy, FDA Center for Drug Evaluation and Research. “FDA MyStudies is publicly available, but it requires professional expertise and time to progress from open-source resources to deployment of a new re-branded platform. As a company may do, Google Cloud is taking these resources and creating a click-to-deploy option linked to additional health data management and analytics.”

Besides streamlined deployment of the open-source software, drug and device companies running FDA MyStudies on Google Cloud can benefit from integration with other Google Cloud offerings, such as managed services that support HIPAA compliance like the Healthcare API and our serverless data warehouse, BigQuery. More information about compliance on Google Cloud and an up-to-date list of products covered by our BAA can be found here.

In addition to HIPAA compliance, Google Cloud can support customer compliance with 21 CFR Part 11 regulations when using Google Cloud in a prescribed manner to handle related data and workloads. While Google has a cloud technology stack that is ready for many 21 CFR Part 11 compliant workloads, the ultimate compliance determination depends on configuration choices made by the customer.

MyHeart Counts + FDA MyStudies on Google Cloud

Stanford University made mobile health history when it launched MyHeart Counts in 2015 as part of the inaugural cohort of iOS research applications. As an open enrollment study, any eligible individual who downloads the MyHeart Counts app may consent to participate in cardiovascular research. Once enrolled, participants are asked survey questions related to their health and physical activity. Participants may allow MyHeart Counts to collect physical activity data from their phone and other wearable devices. If participants are physically able, they will be asked to perform a 6-minute walk test, then enter information about risk factors and blood tests, which is used to determine a cardiovascular risk score.

The current version of MyHeart Counts is only available on iOS devices. By using FDA MyStudies on Google Cloud, the Stanford researchers behind MyHeart Counts will conduct a multi-arm, randomized controlled trial that runs on both Android and iOS devices—the first of its kind. Additional improvements to the FDA MyStudies platform will allow researchers like those conducting MyHeart Counts to configure and deploy studies in days rather than months, without needing to develop any software.

The study is being overseen by Professor Euan Ashley, MBChB, DPhil, professor of medicine, of genetics and of biomedical data science at Stanford. “In this digital era where everyone uses a smartphone, hosting a trial on an app lets us tap into a huge population. We are grateful for Google’s support because it enables us to expand our reach to include Android participants in addition to iOS, and incorporate an open-enrollment randomized controlled trial into a mobile application for the first time,” Prof. Ashley said.

“MyHeart Counts and digital apps like it allow experts to connect directly to patients in a way that’s more immediate and more extensive, through direct, sensor-based measurement collection. Google Cloud’s support of these efforts not only helps researchers organize and deploy important research programs faster and more reliably, but ultimately will help patients and doctors notice health issues early, so they can address them sooner,” said Prof. Ashley.

What’s next?

In the spirit of our commitment to healthcare and open source, Google Cloud will continue investing in MyStudies to bring general improvements to the platform, expand the number of supported assessments and enable integration with downstream analytics and visualization tools.

Advancing the medical imaging field with cloud-based solutions at RSNA

The healthcare industry is increasingly embracing the cloud, and to help, we’ve developed healthcare and life sciences solutions that make it easier for organizations to transition to cloud technologies. Today, at the annual meeting of the Radiological Society of North America (RSNA), we’re excited to share the ways we’re enabling our customers and partners, through managed DICOM services, analytics, and AI, to make advances toward their clinical and operational goals in medical imaging.

At RSNA, we’ll be showcasing a number of end-to-end solutions and partner offerings. Specifically, we’ll be demonstrating solutions that enable de-identification of data in DICOM images and HIPAA-supported deployments so that our customers and partners can focus on their core business—not on managing and implementing infrastructure. 

Sharing the work of our customers and partners

More than a dozen customers and partners will be joining us this week to give live demos, host lightning talks, and share their innovations at RSNA. Some of the topics include:

  • Disaster recovery and vendor neutral archiving solutions running on Google Cloud.
  • Google Cloud as an enabler for next-generation PACS solutions.
  • A real-world evidence platform on Google Cloud.
  • A zero-footprint teleradiology solution. 
  • Machine learning to optimize workflow solutions and reduce annual costs.

You can find a full agenda below. Stop by booth #11318 in the North Hall Level 2 in the AI Showcase to see these solutions in action.

Advancing research and AI in radiology

The importance and impact of AI in radiology have been rapidly expanding over the past few years—as can be seen in the growing size of the AI Showcase at RSNA. As an “AI first” company, we are committed to growing the ecosystem of AI developers, fostering new talent, and advancing research. Through Kaggle, and together with RSNA, we have hosted a number of medical imaging AI-based competitions to help encourage AI-based innovation in areas of medical need.

Last year, we hosted an AI competition in which over 1,400 teams participated in building algorithms to detect a visual signal for pneumonia, one of the 15 leading causes of death in the United States. Earlier this year, we launched another healthcare AI competition in collaboration with RSNA. For this challenge, Kaggle participants built algorithms to detect acute intracranial hemorrhage and its subtypes. This year’s competition drew 1,345 teams, 1,787 individuals across those teams, and over 22,000 submissions. By supporting these competitions, we hope to inspire more AI researchers to build algorithms and models that positively impact the healthcare community.

Visit us at RSNA

If you’re planning to attend RSNA, we’d love to connect! Stop by booth #11318 in the North Hall Level 2 in the AI Showcase to say hello and learn more about how we’re working with customers, partners and patients to engineer a healthier world together. 

You’re invited to join our corporate symposium “Journey to the Cloud.” A number of our customers and partners will be on hand to share their experiences using Google Cloud to drive innovation in the PACS industry, enable real-world evidence, and accelerate the delivery of new imaging solutions. The session is scheduled for Dec 3 at 9am CT (room S102AB, South Building, Level 1).

For a full list of Google Cloud activities, partners, demos, and presentations at RSNA, please review the Google Cloud guide to RSNA 2019.

We look forward to seeing you in Chicago!

Prescriptions for healthcare data management systems on GCP

Like many other industries, healthcare has seen rapid adoption of cloud-based resources to store, process, and analyze vast amounts of data. However, given the healthcare industry’s complexity, many healthcare organizations have found it particularly challenging to create cloud-based solutions.

Beyond the technical challenges of implementing highly scalable and highly available systems, a critical consideration for any healthcare solution is also how to work with protected health information (PHI). Regulations around the world (for example, HIPAA in the United States) dictate how patient information must be handled and stored. In addition, while it’s important for healthcare organizations to be able to share data across the industry, they use a large variety of data formats and schemas. This can make it complex to combine data types.

To tackle this challenge, Google has built a number of healthcare-specific products that use common healthcare data formats, such as FHIR, HL7v2, and DICOM. In addition, to make it easier to understand and design healthcare data-management systems on Google Cloud Platform (GCP), we’ve published a number of solutions documents that address the issues that are important to the industry. These documents give you background in some of the issues that you might face when implementing healthcare solutions, along with prescriptive guidance on how to move your healthcare systems and data to GCP.

Building out a healthcare solution

Building out a GCP-based system for your healthcare data involves a lot of components and moving parts. Understanding all the pieces and assembling them yourself takes time and study. It’s especially important that your system has the controls required to address data-privacy concerns.

To get you started, the GCP Cloud Healthcare team has created the Google Cloud Healthcare Data Protection Toolkit—a set of scripts and procedures that walk you through the process and that do a lot of the work. The toolkit is available as an open source project under the Apache License, Version 2.0, on the GCP GitHub repository.

To put the toolkit into context and show how it fits into a full healthcare system, we’ve published a pair of accompanying solutions. The first is Architecture: HIPAA-aligned Cloud Healthcare by Adrish Sannyasi, a Google Cloud solutions consultant. This document provides background on the unique concerns for creating healthcare solutions and shows you the architecture that the toolkit and accompanying solutions help you build:

[Figure: reference architecture for a HIPAA-aligned Cloud Healthcare deployment]

As the document notes, this is not a final, production-ready system; it’s a reference architecture that’s designed to illustrate the components you need and how they fit together. The expectation is that you’d use this as a base for creating an architecture that incorporates your own requirements and usage. 

The document also delves into different facets of a full healthcare solution. It describes security and permissions, connectivity with your on-premises system, and logging and monitoring—all from the perspective of what’s necessary for a system that aligns with healthcare concerns.

From there, you can turn to the related solution Setting up a HIPAA-aligned project. This document provides detailed instructions for using the toolkit to build out an instance of the reference architecture. The tutorial walks you through every step, from creating a new project all the way through examining BigQuery logs for suspicious activity. When you’re done, you’ll not only have exercised the toolkit, but you’ll have a system that you can extend to meet your own needs.

Ingesting medical records

In the last few years, the Fast Healthcare Interoperability Resources (FHIR) standard has emerged as a way to store and share medical records. FHIR defines both a way to represent data (JSON, XML, RDF) and a protocol for sharing records (REST, HTTP).

We recently published a solution that explains in detail how you can use the Cloud Healthcare API to work with FHIR in GCP. In Importing FHIR clinical data into the cloud using the Cloud Healthcare API, we lay out the benefits of a Cloud Healthcare API FHIR store. For example, the store can become a source for other GCP-based apps, for analysis in BigQuery, and for machine learning. The API can also help with de-identifying data if you want to use it for apps that require anonymous data.

The solution then gets into the details of how to load (ingest) data into GCP, covering the following scenarios:

  • Near real-time ingestion, which loads one record at a time.

  • Bundled ingestion, in which you pass a set of records to be ingested. These can either be a simple batch of individual records, or a set of records that you ingest using transactions in order to have an all-or-nothing import of related records.

  • Batch ingestion, in which the import process reads a series of prepared files from Cloud Storage (see the sketch after this list).
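
To make the batch scenario concrete, here is a minimal Python sketch of a batch import using the generated Cloud Healthcare API client. The project, dataset, store, and bucket names are hypothetical placeholders, and the request structure should be checked against the current API reference:

    # Batch-import FHIR resources staged in Cloud Storage (hypothetical names).
    from googleapiclient import discovery

    client = discovery.build("healthcare", "v1")

    fhir_store = (
        "projects/my-project/locations/us-central1/"
        "datasets/my-dataset/fhirStores/my-fhir-store"
    )
    body = {
        "contentStructure": "RESOURCE",  # one FHIR resource per line (ndjson)
        "gcsSource": {"uri": "gs://my-bucket/fhir/*.ndjson"},
    }

    # "import" is a reserved word in Python, so the generated client exposes import_.
    operation = (
        client.projects().locations().datasets().fhirStores()
        .import_(name=fhir_store, body=body)
        .execute()
    )
    print("Started long-running operation:", operation["name"])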

The solution explores additional options, such as automating the ingestion process, using Cloud Functions to pre- or post-process records, using Cloud Pub/Sub to create subscriptions that watch events on buckets or other data stores, and using Cloud Dataflow to work with streaming data. 
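
As one illustration of that automation, the sketch below shows what a Cloud Function could look like that kicks off an import whenever a new file lands in a staging bucket. It assumes the same hypothetical project and store names as above and a storage-finalize trigger; it is a sketch, not the solution's prescribed implementation:

    # Hypothetical Cloud Function: start a FHIR import when a file is uploaded.
    from googleapiclient import discovery

    FHIR_STORE = (
        "projects/my-project/locations/us-central1/"
        "datasets/my-dataset/fhirStores/my-fhir-store"
    )

    def ingest_on_upload(event, context):
        """Triggered by a google.storage.object.finalize event on the bucket."""
        uri = f"gs://{event['bucket']}/{event['name']}"
        client = discovery.build("healthcare", "v1")
        body = {"contentStructure": "RESOURCE", "gcsSource": {"uri": uri}}
        operation = (
            client.projects().locations().datasets().fhirStores()
            .import_(name=FHIR_STORE, body=body)
            .execute()
        )
        print(f"Started import {operation['name']} for {uri}")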

The solution explicitly notes where you should be careful with security and permissions. For example, it lays out the Cloud IAM roles that the Cloud Healthcare API uses for creating and managing the data, and what permissions those roles need. 

De-identifying medical data

Finally, healthcare information isn’t used only for patient care. For example, medical images that help medical professionals diagnose patients are also valuable to researchers as data. But clearly, privacy concerns mean that researchers should not share or publish their research in a way that shows patient information. Therefore, any personally identifying information (PII) and protected health information (PHI) should be removed from the data before publication. 

In GCP, researchers can perform this process, known as de-identification, by using the Cloud Healthcare API. The following image shows an x-ray after it’s been de-identified:

[Figure: an x-ray image after de-identification]

We have two solutions that describe the de-identification process. De-identification of medical images through the Cloud Healthcare API is an overview that covers how the process works, including what GCP services you can use to ingest images, and how to store them before and after you de-identify them. The solution also discusses how to use DICOM tag keywords to specify what data to de-identify.

A second solution, Using the Cloud Healthcare API to de-identify medical images, is a step-by-step tutorial that shows you how to de-identify data using the Cloud Healthcare API on your own DICOM dataset. The tutorial discusses two use cases. In the first, you leave only the minimum amount of data in the images. The second use case involves removing and modifying metadata and redacting any text in the images. This de-identification process maintains medically or scientifically relevant information. 
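
As a rough illustration of these use cases, the sketch below calls the dataset-level de-identify operation, keeping a small whitelist of DICOM tag keywords and redacting text burned into the pixels. All resource names are hypothetical, and the configuration options should be verified against the Cloud Healthcare API documentation:

    # De-identify a dataset into a new destination dataset (hypothetical names).
    from googleapiclient import discovery

    client = discovery.build("healthcare", "v1")

    source = "projects/my-project/locations/us-central1/datasets/source-dataset"
    body = {
        "destinationDataset": (
            "projects/my-project/locations/us-central1/datasets/deid-dataset"
        ),
        "config": {
            # Keep only an illustrative whitelist of DICOM tag keywords...
            "dicom": {"keepList": {"tags": ["PatientSex", "StudyDate"]}},
            # ...and redact any text burned into the image pixels.
            "image": {"textRedactionMode": "REDACT_ALL_TEXT"},
        },
    }

    operation = (
        client.projects().locations().datasets()
        .deidentify(sourceDataset=source, body=body)
        .execute()
    )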

The tutorial not only shows you how to invoke the API to perform these tasks, but also explains details like how to set appropriate permissions to allow the API access to the data. 

Check out our Google Cloud healthcare solutions page to learn more.

Change Healthcare: Building an API marketplace for the healthcare industry

Today we hear from Gautam M. Shah, Vice President, API and Marketplaces at Change Healthcare, one of the largest independent healthcare technology companies in the United States. Change Healthcare provides data and analytics-driven solutions and services that address the three greatest needs in healthcare today: reducing costs, achieving better outcomes, and creating a more interconnected healthcare system.

Healthcare is a rapidly evolving industry. There is an urgent need to bridge gaps and connect multiple data sources, transactions, data owners and data users to improve all parts of the healthcare system. At Change Healthcare, we are rethinking and transforming how we approach our products and how we use APIs to achieve this goal. Taking a user-centered, outside-in approach, we identify, develop, and productize “quanta of value” within our portfolio (“quanta,” the plural of “quantum,” refers to small but crucial pockets of value). 

We connect and integrate those quanta into our own and our partners’ products to create a broader set of more impactful solutions. This approach to creating productized APIs enables us to bridge workflows and remove data silos. We bundle productized APIs into solutions that open new possibilities for delivering exceptional patient experiences, enhancing patient outcomes, and optimizing payer and provider workflows and efficiencies.

To support this goal, we needed a way to support a large population of API producers, engage several segments of API consumers, and rethink how we bring API products to market at scale. We aren’t just delivering code; we’re creating and managing a broad product portfolio throughout its lifecycle. We take our APIs from planning, design, and operation through evolution and to retirement. 

Operating these products requires meeting the needs of many API producers, allowing for marketing and product enablement, supporting different distribution channels and pricing, and enabling rapid product and solution creation. We also have to do all of this while prioritizing security and requiring a minimum of added platform development or customization. In short, we need an enterprise marketplace enablement platform. We chose the Apigee API Management Platform because it allows us to do all this.

Why Apigee?

Change Healthcare is building a marketplace to advance API usage across the healthcare ecosystem. This marketplace, the API & Services Connection, is a destination where our internal users, customers, partners, and the healthcare ecosystem can readily discover, interact with, and consume our broad portfolio of clinical, financial, operational, and patient experience products and solutions in a secure, simple, and scalable manner.

Using Google Cloud’s enterprise-class Apigee API Management Platform to power our marketplace allows us to support our entire organization with a standard set of tools, patterns, and processes. Using these common (and in some cases pre-established) security, performance, and operational standards frees our API producers from worrying about the mechanics of deploying their products and allows them to focus on creating the best possible solutions. It also provides us with robust proxy development and management capabilities, allowing us to access and distribute existing APIs and assets, thereby eliminating the need for complex migrations.

We empower our diverse mix of API producers by leveraging the full range of Apigee capabilities to automate engagement, integrate with different development methods, support visibility of products and pricing models, and measure usage, engagement, and adoption. By taking a “self-service first” approach, we allow our API producers to operate in line with their business processes and needs of the enterprise, while at the same time giving them the tools and metrics they need to create and optimize their products. 

We also use the Apigee bundling capabilities to allow our producers to easily create and productize API bundles, which are then used to develop solutions that incorporate leading-edge technologies to solve more complex problems. 

Our customer-facing marketplace makes the most of how Apigee supports distribution of APIs to multiple marketplaces, including a fully customizable developer portal. This capability gives us the ability to build private API user communities, create experiences for multiple customer segments, and distribute our APIs across multiple storefronts. 

Apigee lets us do all this while maintaining a common enterprise platform from which to control availability, monetization, and monitoring. In this way we can distribute our API assets internally and also allow our API producers to target how they want to manage their API products externally. Producers also benefit from rich engagement and usage data to better segment and target product availability and pricing. Apigee also supports creating a more immersive and interactive experience for API consumers, enabling us to provide technical and marketing documentation, a sandbox, and connections to our product teams and other users.

Fulfilling a bold vision

At Change Healthcare, we believe APIs are the present and the future. Today, our APIs power our products and enable us to serve the needs of the entire healthcare ecosystem. Looking forward, our APIs will power growth by enabling internal users to take advantage of valuable capabilities we’ve created, as well as make those capabilities easily available to external users. Armed with these productized APIs, our developers, customers, partners—ultimately all parts of the ecosystem—will be able to deliver new and innovative products that combine interoperable data, differentiated experiences, optimized workflows, and new technologies such as AI and blockchain.

We’re just getting started with APIs! We’ve launched the first version of the API & Services Connection developer portal, and now have a standard method of engagement with our API producers and a place to drive internal visibility and external discovery. Our partnership with Apigee works well for us because we can demonstrate that we share the same goals internally and externally, and ultimately use the same set of tools to drive transformation. As our vision becomes a reality, we look forward to engaging not only more of our internal teams, but our partners and customers as well. Together we will use APIs to break down silos in healthcare, and ultimately create a more interoperable healthcare system for patients, providers, and payers. 

Learn more about API management on Google Cloud.

How Google and Mayo Clinic will transform the future of healthcare

Every year, more than 1 million patients from 140 countries visit Mayo Clinic. Ranked the number one hospital in the nation by U.S. News and World Report, the renowned hospital and research center exists to solve the world’s most challenging medical problems, one patient at a time. 

As the global expert in solving rare and complex disease, Mayo Clinic has a long history of excellence in healthcare and medical innovation. This rich legacy has long been supported by Mayo Clinic’s focus on innovation, research, and cutting-edge science. As healthcare increasingly embraces digital technology, the collection, management and analysis of complex healthcare data has become a critical factor in providing advanced care to patients worldwide. 

With these factors in mind, Mayo Clinic has chosen to partner with Google to positively transform patient and clinician experiences, improve diagnostics and patient outcomes, and enable unparalleled clinical research.

This strategic partnership will combine Google’s cloud and AI capabilities and Mayo’s world-leading clinical expertise to improve the health of people—and entire communities—through the transformative impact of understanding insights at scale. Ultimately, we will work together to solve humanity’s most serious and complex medical challenges. 

Google Cloud will be the cornerstone of Mayo Clinic’s digital transformation. We’ll enable Mayo Clinic to lay out a roadmap of cloud and AI-enabled solutions and will help Mayo Clinic develop a bold, new digital strategy to advance the diagnosis and treatment of disease.

“When selecting a technology partner, Mayo Clinic was looking for an organization with the engineering talent, focus and cloud technology to collaborate with us on a shared vision to deliver digital healthcare innovation at a global scale,” said Christopher Ross, Chief Information Officer, Mayo Clinic. “With Google Cloud’s secure and compliant digital platform, we will be able to leverage innovative cloud technology, industry leading AI and healthcare specific solutions, so we can focus on revolutionizing healthcare delivery and taking care of our patients.”

In addition to building its data platform on Google Cloud, Mayo’s world-class physician leadership is partnering with Google to create machine-learning models for serious and complex diseases. Eventually, Mayo Clinic hopes to share these models and other joint solutions with caregivers across the globe to improve healthcare delivery. Mayo also looks forward to exploring additional points of collaboration with Google Health in the future.

As part of our partnership, Google will be opening a new office near Mayo Clinic’s headquarters in Rochester, Minn. Working alongside Mayo Clinic’s world-leading medical experts and researchers, we look forward to bringing Google Cloud’s data analytics and AI engineering capabilities to the forefront of patient care.

We are excited that Google Cloud will be core to Mayo Clinic’s digital transformation, and we look forward to combining our world-class cloud technology and engineers with Mayo Clinic’s industry-leading clinicians, researchers and innovators. Working together, we can transform healthcare and improve lives.

How Moorfields is using AutoML to enable clinicians to develop machine learning solutions

The democratization of AI and machine learning holds the promise for outcomes with enormous human benefit, and nowhere is this more apparent than in health and life sciences. One such example is Moorfields Eye Hospital NHS Foundation Trust, the leading provider of eye health services in the UK and a world-class centre of excellence for ophthalmic research and education.

In 2016, Moorfields announced a five-year partnership with DeepMind Health to explore whether artificial intelligence (AI) technology could help clinicians improve patient care. Last year, as a result of this partnership, Moorfields announced a major milestone for the treatment of eye disease. Its AI system could quickly interpret eye scans from routine clinical practice for over 50 sight-threatening eye diseases—as accurately as world-leading expert doctors.

Today, Moorfields has announced another advancement, published in The Lancet Digital Health. Using Google Cloud AutoML Vision, clinicians without prior experience in coding or deep learning were able to develop models that accurately detect common diseases from medical images.

As Pearse Keane, the consultant ophthalmologist at Moorfields Eye Hospital who led this project, said:

“At present, the development of AI systems requires highly specialised technical expertise. If this technology can be used more widely—in particular by healthcare professionals without computer programming experience—it will really speed up the development of these systems with the potential for significant patient benefits.”

Although the ability to create classification models without a deep understanding of AI is attractive, comparative performance against expertly designed models is still limited to simpler classification tasks. Pearse adds: “The process needs refining and regulation, but our results show promise for the future expansion of AI in medical diagnosis.”

Google Cloud AutoML is a set of products that allows users without ML expertise to develop and train high-quality machine learning models. By applying Google’s cutting-edge research in transfer learning and neural architecture search technology, users can leverage the results of existing state-of-the-art ML models to build new ones with brand new data. Because the most complex part of the model—feature extraction—is pre-trained, classification in a new dataset is fast and accurate. The team at Moorfields was able to quickly train and evaluate five different models using Cloud AutoML.

The Moorfields team started by identifying five public open-source datasets that their researchers could use to test and train models. These included de-identified medical images from the fields of ophthalmology, radiology, and dermatology, such as eye scans, chest x-rays and photos of skin lesions. After learning how to use Cloud AutoML Vision by reviewing ten hours of online documentation, two researchers assembled and reviewed datasets simultaneously. They then worked together to build the models. After the images were uploaded to Google Cloud, AutoML Vision was used to train each model for up to 24 hours.
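
For readers curious about what that data assembly involves, AutoML Vision classification ingests a simple CSV index of image URIs and labels. The sketch below, with hypothetical bucket paths and label names, builds such a file in Python:

    # Build the CSV index that AutoML Vision uses to import labeled images.
    # One row per image: gs:// path, then the label (paths/labels hypothetical).
    import csv

    rows = [
        ("gs://my-bucket/oct/scan_0001.png", "choroidal_neovascularization"),
        ("gs://my-bucket/oct/scan_0002.png", "normal"),
    ]

    with open("automl_import.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(rows)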

The resulting models were then compared to published results from deep learning studies. All of the models the researchers created except one performed as well as state-of-the-art deep learning algorithms. The research demonstrates the potential for clinicians without AI expertise to explore and develop technologies to transform patient care. Beyond allowing clinicians to build and test diagnostic models, AutoML can be used to train physicians in the basics of deep learning. Although this research did not focus on interpretability, interpretability is understood to be of critical importance for medical applications.

AI continues to pave the way for advancements that improve lives on a global scale—from business to healthcare to education. Cloud AutoML has already been used by researchers to assess and track environmental change, by scientists to help monitor endangered species, and by The New York Times to digitize and preserve 100 years of history in its photo archive. We’re excited to see how businesses and organizations across the world apply AI to solve the problems that matter most.

McKesson chooses Google Cloud to help it chart a course to the future

From centralizing data management to using artificial intelligence (AI) to make healthcare predictions, advances in technology are transforming all medical disciplines. And as healthcare organizations strive to keep up with increasing patient expectations, many are looking to the cloud to find new ways to deliver quality, affordable services to patients, members and customers.

Today, we are thrilled to announce that McKesson has selected Google Cloud as its preferred cloud provider. A Fortune 6 company, McKesson is a global leader in healthcare supply chain management solutions, retail pharmacy, community oncology and specialty care, and healthcare information technology. Its aim is to deliver more value to its customers and the healthcare industry—quickly and efficiently—through common platforms and resources.

McKesson will take advantage of Google Cloud in numerous ways. The company will use Google Cloud Platform’s managed services, as well as healthcare-specific services such as the Cloud Healthcare API, to help enhance its platforms and applications. It will use analytics on Google Cloud to make data-driven decisions for product manufacturing, specialty drug distribution, and pharmacy retail operations. Also, McKesson will migrate and modernize the mission-critical SAP environment it uses to run its business to Google Cloud. Through the power of the cloud, McKesson hopes to create and modernize next-generation solutions to deliver better healthcare—one patient at a time.

“This partnership will support our continued digital transformation,” said Andrew Zitney, senior vice president and CTO of McKesson Technology. “It will not only accelerate and expand our strategic objectives, it will also help fuel next generation innovation by driving new technologies, advancing new business models and delivering insights.”

As we evolve to a more digitally-based healthcare environment, cloud computing will change how healthcare providers deliver quality, affordable services to their patients, members and customers. We believe our collaboration with McKesson will bring significant value to the healthcare ecosystem by building on Google Cloud’s secure, flexible and connected infrastructure to create and deploy better healthcare solutions.

Announcing the Cloud Healthcare API beta: Improving data access and shareability across organizations

At Google Cloud, we are focused on providing healthcare and life sciences organizations with innovative technology needed to improve our healthcare system. Through our customers and partners, we are working to improve healthcare for patients, providers, payers, and the many organizations involved in the discovery, development, and delivery of healthcare products and services.

Today, we’re pleased to announce that our Cloud Healthcare API is now in beta. From the beginning, our primary goal with Cloud Healthcare API has been to advance data interoperability by breaking down the data silos that exist within care systems. The API enables healthcare organizations to ingest and manage key data—and better understand that data through the application of analytics and machine learning in real time, at scale.   

Cloud Healthcare API offers a managed solution for storing and accessing healthcare data in Google Cloud Platform (GCP), providing a critical bridge between existing care systems and applications hosted on Google Cloud. Using the API, customers can unlock significant new capabilities for data analysis, machine learning, and application development. These capabilities, in turn, enable the next generation of healthcare solutions.

While our product and engineering teams are focused on building products to solve challenges across the healthcare and life sciences industries, our core mission embraces close collaboration with our partners and customers. This is why we are also excited to share how some of our newest customers and partners are leveraging Google Cloud to transform the healthcare industry.   

Next week at Google Cloud Next, we’ll hear from many healthcare customers, including:

  • American Cancer Society will outline how it is using Cloud ML Engine on GCP to accurately and quickly identify novel patterns in digital pathology images.

  • Hunterdon Health will discuss how cloud-native endpoints like Chrome Enterprise can be deployed throughout your healthcare network to increase information access, reduce operational costs, and deliver a better patient experience.

  • Stratus Medicine will review a serverless architecture for generating real-time clinical predictions using Cloud Healthcare API to feed FHIR and DICOM data into Cloud Machine Learning Engine.

  • CareCloud will discuss mapping X12 EDI transactions to FHIR as part of a broader approach to building a comprehensive Clinical Data Warehouse.

  • Kaiser Permanente will talk about how it leverages Google’s CI-CD process, API best practices, and Apigee API management to power its API-first strategy.

  • LifeImage will demo how they are enabling point-of-care epidemiology and secure image sharing networks on GCP.

  • iDigital will present their architecture for a zero-footprint teleradiology solution on top of the Cloud Healthcare HL7v2 and DICOM APIs.

We look forward to continuing to bring innovative products to the healthcare and life sciences space, and partnering with organizations to improve our healthcare system. Visit our website to learn more about Google Cloud’s solutions in healthcare and life sciences.

Analyzing 3,024 rice genomes characterized by DeepVariant

Rice is an ideal candidate for study in genomics, not only because it’s one of the world’s most important food crops, but also because centuries of agricultural cross-breeding have created unique, geographically-induced differences. With the potential for global population growth and climate change to impact crop yields, the study of this genome carries important societal implications.

This post explores how to identify and analyze different rice genome mutations with a tool called DeepVariant. To do this, we performed a re-analysis of the Rice 3K dataset and have made the data publicly available as part of the Google Cloud Public Dataset Program, pre-publication, under the terms of the Toronto Statement.

We aim to show how AI can improve food security by accelerating genetic enhancement to increase rice crop yield. According to the Food and Agriculture Organization of the United Nations, crop improvements will reduce the negative impact of climate change and loss of arable land on rice yields, as well as support an estimated 25% increase in rice demand by 2030.

Why catalog genetic variation for rice on Google Cloud?

In March 2018, Google AI showed that deep convolutional neural networks can identify genetic variation in aligned DNA sequence data. This approach, called DeepVariant, outperforms existing methods on human data, and we showed that the approach used to call variants on humans can also call variants on other animal species. This blog post shows that DeepVariant is also effective at calling variants on a plant, demonstrating the effectiveness of deep neural network transfer learning in genomics.

In April 2018, three research institutions—the Chinese Academy of Agricultural Sciences (CAAS), the Beijing Genomics Institute (BGI) Shenzhen, and the International Rice Research Institute (IRRI)—published the results of a collaboration to sequence and characterize the genomic variation of the Rice 3K dataset, which consists of genomes from 3,024 varieties of rice from 89 countries. Variant calls used in this publication were identified against a Nipponbare reference genome using best practices and are available from the SNP-Seek database (Mansueto et al, 2017).

We recharacterized the genomic variation of the Rice 3K dataset with DeepVariant. Preliminary results indicate a larger number of variants discovered at a similar or lower error rate than those detected by conventional best-practice methods (i.e. GATK).

In total, the Rice 3K DeepVariant dataset contains ~12 billion variants at ~74 million genomic locations (SNPs and indels). These are available in a 1.5 terabyte (TB) table that uses the BigQuery Variants Schema.

Even at this size, you can still run interactive analyses, thanks to the scalable design of BigQuery. The queries we present below run on the order of a few seconds to a few minutes. Speed matters, because genomic data are often being interlinked with data generated by other precision agriculture technologies.
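
As a flavor of such an interactive analysis, here is a minimal Python sketch that counts distinct variant sites per chromosome. The table name is a hypothetical placeholder, and the column names follow the BigQuery Variants Schema but should be checked against the published public dataset:

    # Count distinct variant sites per chromosome (table name hypothetical).
    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
    SELECT
      reference_name,
      COUNT(DISTINCT start_position) AS variant_sites
    FROM `my-project.rice3k.deepvariant_variants`
    GROUP BY reference_name
    ORDER BY reference_name
    """

    for row in client.query(query).result():
        print(row.reference_name, row.variant_sites)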

Illustrative queries and analyses

Below, we present example queries and visualizations that show how to analyze the Rice 3K dataset. Our analyses focus on two topics:

  • The distribution of genome variant positions across 3,024 rice varieties.
  • The distribution of allele frequencies across the rice genome.

For a step-by-step tutorial on how to work with variant data in BigQuery using the Rice 3K data or another variant dataset of your choosing, consider trying out the Analyzing variants with BigQuery codelab.

Analysis 1: Genetic variants are not uniformly distributed

Genomic locations with very high or very low levels of variation can indicate regions of the genome that are under unusually high or low selective pressure.

In the case of these rice varieties, high selective pressure (which corresponds to low genetic variation) indicates regions of the genome under high artificial selective pressure (i.e. domestication). Moreover, these regions contain genes responsible for traits that govern important agronomic or nutritional properties of the plant.

We can measure the magnitude of the regional pressure by calculating at each position the Z statistic of each individual variety vs. all varieties. Here’s the query we used to produce the heatmap below, which shows the distribution of genetic variation across all 1Mbase-sized regions across all 12 chromosomes as columns (labeled by the top colored row), vs. all 3,024 rice varieties as rows. Red indicates very low variant density relative to other samples within a particular genomic region, while pale yellow indicates very high variant density within a particular genomic region. The dendrogram below shows the similarity among samples (branch length) and groups similar rice varieties together:

[Figure: heatmap of variant density per 1Mbase region (columns) across 3,024 rice varieties (rows), with dendrogram]

A high resolution PDF of this plot is available, as well as the R script used to generate it.
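
For readers who want to reproduce the statistic rather than the plot, the sketch below outlines one way to compute it in BigQuery from Python: count variants per variety in 1-Mbase bins, then standardize each count against all varieties in the same bin. The table and column names are hypothetical stand-ins for the published dataset:

    # Per-variety variant counts in 1 Mbase bins, standardized within each bin.
    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
    WITH counts AS (
      SELECT
        c.name AS variety,
        reference_name,
        CAST(FLOOR(start_position / 1000000) AS INT64) AS bin,
        COUNT(1) AS n_variants
      FROM `my-project.rice3k.deepvariant_variants` AS v, v.call AS c
      GROUP BY variety, reference_name, bin
    )
    SELECT
      variety,
      reference_name,
      bin,
      (n_variants - AVG(n_variants) OVER (PARTITION BY reference_name, bin))
        / STDDEV(n_variants) OVER (PARTITION BY reference_name, bin) AS z
    FROM counts
    """

    df = client.query(query).to_dataframe()  # rows feed the heatmap script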

Some interesting details of the dataset are highlighted (in yellow) in the heatmap above:

  1. Closer inspection of chromosome 5 (cyan columns, 1Mbase blocks 9-12) shows that the distinct distribution of Z scores across samples likely occurs due to two factors:

    1. this region includes many centromeric satellites resulting in a high false-positive rate of variants detected, and

    2. a genomic introgression present in some of the rice varieties multiplies this effect (yellow rows).

  2. Nearly all of the 3,024 rice varieties included in the Rice 3K dataset are from rice species Oryza sativa. However, 5 Oryza glaberrima varieties were also included. These have a high level of detected genetic variation because they are from a different species, and are revealed as a bright yellow band at the top of the heatmap.

  3. The majority of samples can be partitioned into one group with high variant density and another group with low variant density. This partition fits with previously used methods for classification by admixture. For example, the bottom rows that are mostly red correspond to rice varieties in the japonica and circum-basmati (aromatic) groups that are similar to the Nipponbare reference genome we used.

Analysis 2: Some specific regions are under selective pressure

According to the Hardy-Weinberg Principle, the expected proportion of genotype frequencies within a randomly mating population, in the absence of selective evolutionary pressure, can be calculated from the component allele frequencies. For a bi-allelic position having alleles P and Q and corresponding population frequencies p and q, the expected genotype proportions for PP, PQ, and QQ can be calculated with the formula p² + 2pq + q² = 1. However, we need to modify this formula by adding an inbreeding coefficient F to reflect the population structure (see: Wahlund effect) and the self-pollination tendency of rice: PP = p² + Fpq; PQ = 2(1 - F)pq; QQ = q² + Fpq, where F = 0.95.

The significance of genomic positions deviating from the expected genotype distribution follows a χ² distribution, allowing a p-value to be derived and thus identification of positions that are either under significant selective pressure or neutral. In short, this analysis highlights the fact that rice is highly inbred.
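
To make the test concrete, here is a small worked sketch for a single bi-allelic site, with made-up genotype counts. It estimates p from the observed counts, forms the inbreeding-adjusted expected proportions, and applies the χ² test (one degree of freedom is consumed by estimating p):

    # Inbreeding-adjusted Hardy-Weinberg test at one site (illustrative counts).
    from scipy.stats import chisquare

    F = 0.95  # inbreeding coefficient reflecting rice's self-pollination
    obs = {"PP": 2000, "PQ": 24, "QQ": 1000}
    n = sum(obs.values())

    # Allele frequency of P estimated from the observed genotype counts.
    p = (2 * obs["PP"] + obs["PQ"]) / (2 * n)
    q = 1 - p

    # Expected counts: PP = p^2 + Fpq, PQ = 2(1-F)pq, QQ = q^2 + Fpq.
    expected = [
        (p * p + F * p * q) * n,
        (2 * (1 - F) * p * q) * n,
        (q * q + F * p * q) * n,
    ]

    # ddof=1 because the allele frequency p was estimated from the data.
    stat, pvalue = chisquare(
        [obs["PP"], obs["PQ"], obs["QQ"]], f_exp=expected, ddof=1
    )
    print(f"chi2 = {stat:.2f}, p = {pvalue:.3g}")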

Below you can find a plot of 10-kilobase genome regions from the Oryza sativa genome, colored according to the proportion of variant positions that are significantly (p<0.05) out of (inbreeding modified) Hardy-Weinberg equilibrium, with white regions corresponding to those under low selective pressure and red regions corresponding to those under high selective pressure:

[Figure: Oryza sativa genome plot of 10-kilobase regions colored by deviation from Hardy-Weinberg equilibrium]

The data shown above were retrieved using this query and plotted using this R script. The query used to make this figure was adapted to the BigQuery Variants Schema from one of a number of quality control metrics found in the Google Genomics Cookbook.

Note that selective pressure on the genome is not uniformly distributed, as indicated by the clumps of red visible in the plot. Interestingly, there is little correspondence between the prevalence of variants within a region (previous figure) and the proportion of variants within that same region that are under significant selective pressure. The bin size (10 kilobases) used in this visualization is on the order of the average Oryza sativa gene size (3 kilobases). Given the low correlation between high selective pressure and variant density, this view may be useful for guiding a gene-hunting expedition aimed at identifying genomic loci associated with phenotypes of interest (i.e. those that affect caloric areal yield, nutritive value, and drought- and pest-resistance).

Data availability and conclusion

Genome sequencer reads in FastQ format from Sequence Read Archive Project PRJEB6180 were aligned to the Oryza sativa Os-Nipponbare-Reference-IRGSP-1.0 reference genome using the Burrows-Wheeler Aligner (BWA), producing a set of aligned read files in BAM format.

Subsequently, the BAM files were processed with the Cloud DeepVariant Pipeline, a Cloud TPU-enabled, managed service that executes the DeepVariant open-source software. The pipeline produced a list of variants detected in the aligned reads, and these variants were written out to storage as a set of variant call files in VCF format.

Finally, all VCF files were processed with the Variant Transforms Cloud Dataflow Pipeline, which wrote records to a BigQuery Public Dataset table in the BigQuery Variants Schema format.

For additional guidance on how to use DeepVariant and BigQuery to analyze your own data on Google Cloud, please check out the following resources:

Acknowledgments

We’d like to thank our collaborators and their organizations—both within and outside Google—for making this post possible:

  • Allen Day, Google Cloud

  • Ryan Poplin, Google AI

  • Ken McNally, IRRI

  • Dmytro Chebotarov, IRRI

  • Ramil Mauleon, IRRI

Recursion Pharmaceuticals accelerates drug discovery with Google Cloud

Despite advances in scientific research and medical technology, the process of drug discovery has become increasingly slow and expensive over the last decade. While the pharmaceutical industry has spent more money on research and development each year, this has not resulted in an increase in the number of FDA-approved new medicines. Recursion, headquartered in Salt Lake City, is looking to address this declining productivity by combining rich biological datasets with the latest in machine learning to reinvent the drug discovery and development process.

Today, Recursion has selected Google Cloud as its primary public cloud provider as it builds a drug discovery platform that combines chemistry, automated biology, and cloud computing to reveal new therapeutic candidates, potentially cutting the time to discover and develop a new medicine by a factor of 10.

To fulfill its mission, Recursion developed a data pipeline that incorporates image processing, inference engines and deep learning modules, supporting bursts of computational power that weigh in at trillions of calculations per second. In just under two years, Recursion has created hundreds of disease models, generated a shortlist of drug candidates across several diseases, and advanced drug candidates into the human testing phase for two diseases.

Starting with wet biology—plates of glass-bottom wells containing thousands of healthy and diseased human cells—biologists run experiments on the cells, applying stains that help characterize and quantify the features of the cellular samples: their roundness, the thickness of their membrane, the shape of their mitochondria, and other characteristics. Automated microscopes capture this data by snapping high-resolution photos of the cells at several different light wavelengths. The data pipeline, which sits on top of Google Kubernetes Engine (GKE) and Confluent Kafka, all running on GCP, extracts and analyzes cellular features from the images. Then, data are processed by deep neural networks to find patterns, including those humans might not recognize. The neural nets are trained to compare healthy and diseased cell signatures with those of cells before and after a variety of drug treatments. This process yields promising new potential therapeutics.

To train its deep learning models, Recursion uses on-premises GPUs; it then uses CPUs on GCP to perform inference on new images in the pipeline using these models. Recursion is currently evaluating cloud-based alternatives, including using Cloud TPU technology to accelerate and automate image processing. Since Recursion is already using TensorFlow to train its neural networks in its proprietary biological domains, Cloud TPUs are a natural fit. Additionally, Recursion is exploring using GKE On-Prem, the foundation of Cloud Services Platform, to manage all of its Kubernetes clusters from a single, easy-to-use console.

We’re thrilled to collaborate with Recursion in their quest to more rapidly and inexpensively discover new medicines for dozens of diseases, both rare and common. Learn more about how Recursion is using Google Cloud solutions to better execute its mission of “decoding biology to radically improve lives” here. You can also learn more about solutions for life sciences organizations and our Google Cloud for Startups Program.