
Accelerating Digital Health Data Transformation for a Leading Healthcare Technology and Pharma Company
Challenge
A leading healthcare technology and pharmaceutical company needed to extract, transform, and load over 400 TB of raw time-series physiological sensor data for data science projects in clinical studies. The existing process was manual and multi-step, relying on three separate tools for data ingestion, cleaning, and feature extraction, which made it slow and inefficient. This led to several issues:
- Inefficiency and high cost: The manual steps and the need to coordinate between multiple tools caused significant delays in data processing and high resource costs.
- Reliance on sampled data: Throughput limitations necessitated sampling the data, potentially impacting the accuracy and comprehensiveness of the analysis.
- Complex integration: Adding new devices to the system required multi-month engineering efforts, further complicating and delaying data integration.
Solution
Nexla offered out-of-the-box connectors for seamless data integration from digital sources to any storage or API. Automated processes and a cloud-distributed engine ensured efficient data handling, with robust encryption for compliance.
- Out-of-the-box connectors: Seamlessly integrated data from various digital sources (e.g., ActiGraph, Emerald) into any data store, data frame, or API.
- Automation: Automated data ingestion, transformation, and monitoring.
- Efficient processing: A cloud-distributed data processing engine that applies multiple algorithms across multiple data files.
- Compliance: Robust encryption mechanisms and strict access controls to ensure regulatory compliance.
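To illustrate the "multiple data files with multiple algorithms" idea from the list above, here is a minimal sketch in Python. It is not Nexla's engine or API. The file names, the sample values, the three algorithms, and the `transform_all` helper are all hypothetical, and a local thread pool stands in for a cloud-distributed engine. The point is only the fan-out pattern: every (file, algorithm) pair becomes an independent task.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from statistics import mean

# Hypothetical in-memory stand-ins for sensor files: file name -> samples.
FILES = {
    "device_a.csv": [0.1, 0.4, 0.2, 0.9],
    "device_b.csv": [1.0, 1.2, 0.8, 1.1],
}

# Hypothetical algorithms; a real pipeline would plug in feature extractors.
ALGORITHMS = {
    "mean": mean,
    "peak": max,
    "range": lambda xs: max(xs) - min(xs),
}


def run_task(task):
    """Apply one algorithm to one file and return a labeled result."""
    file_name, algo_name = task
    value = ALGORITHMS[algo_name](FILES[file_name])
    return file_name, algo_name, value


def transform_all(max_workers=4):
    """Fan out every (file, algorithm) pair across a pool of workers."""
    tasks = list(product(FILES, ALGORITHMS))
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for file_name, algo_name, value in pool.map(run_task, tasks):
            results[(file_name, algo_name)] = value
    return results
```

Calling `transform_all()` returns one entry per (file, algorithm) pair. Because the tasks share no state, the same structure could in principle be spread across many machines rather than threads.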
Results
- Converted over 400 TB of complex time-series physiological sensor data into usable formats in just 10 days.
- Performed feature engineering on 120 million triaxial acceleration data records in under 3 hours.
- Improved data processing performance by roughly 45x, reducing record processing time from 6 minutes to 8 seconds.
- Enabled integration of new devices within a few weeks instead of several months.
- Maintained data integrity with automated monitoring and encryption, saving significant time and resources for data scientists and engineers.
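For readers unfamiliar with feature engineering on triaxial acceleration data, the following sketch shows one common approach. It is not the method used in this project. It assumes each record is an (x, y, z) acceleration sample, computes the vector magnitude of each sample, and summarizes fixed-size epochs. The function names, the epoch size, and the chosen summary statistics are illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def vector_magnitude(x, y, z):
    """Euclidean norm of one triaxial acceleration sample."""
    return math.sqrt(x * x + y * y + z * z)


def epoch_features(samples, epoch_size):
    """Summarize consecutive, non-overlapping epochs of (x, y, z) samples.

    Trailing samples that do not fill a complete epoch are dropped.
    """
    features = []
    for start in range(0, len(samples) - epoch_size + 1, epoch_size):
        window = samples[start:start + epoch_size]
        mags = [vector_magnitude(*sample) for sample in window]
        features.append({
            "start_index": start,
            "mean_magnitude": mean(mags),
            "std_magnitude": pstdev(mags),  # population standard deviation
            "peak_magnitude": max(mags),
        })
    return features
```

For example, `epoch_features([(3, 4, 0), (0, 0, 5), (1, 2, 2), (0, 0, 1)], epoch_size=2)` yields two epochs, the first with a mean magnitude of 5 and the second with a mean magnitude of 2.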