Healthcare Analytics “Q&A” Series
Every day, I receive emails from readers of HealthcareAnalytics.info asking questions about healthcare analytics topics, trends, and issues. These questions arise out of posts they’ve read on this website or encountered elsewhere. Although I endeavour to answer each question individually, I’ve created this “Question & Answer” series on HealthcareAnalytics.info to provide answers to some of the most frequently asked questions I’ve received.
If you have a specific question you’d like to see posted to the Q&A series, email it to me at trevor@HealthcareAnalytics.info and I’ll do my best to include it in the series.
A: Volume, Variety, and Velocity – the three dimensions of Big Data as defined by Gartner.
“Big Data” is an important topic in healthcare analytics, and there are many definitions and opinions about what “Big Data” actually is. A definition that I have found useful and informative comes from Jimeng Sun and Chandan Reddy: “A collection of large and complex data sets which are difficult to process using common database management tools or traditional data processing applications.” This definition implies that healthcare organizations (HCOs) reach the “big data barrier” once their existing data management approaches and tools no longer work efficiently or effectively.
More specifically, Gartner defines Big Data as “high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” Descriptions of these terms are below:
- Volume – The amount of data being generated and stored. Typical “big data” datasets can range from terabytes (10¹² bytes) to petabytes (10¹⁵ bytes) and exabytes (10¹⁸ bytes). Traditional database technologies (such as Relational Database Management Systems, or RDBMS) and query tools (such as SQL) are unable to scale efficiently to such volumes, necessitating new approaches to data storage, management, and analysis.
- Variety – The number and range of different data sources, which has grown from more traditional EMR data to include website clickstream data and data from social media sites (e.g., Twitter).
- Velocity – The speed at which data is generated (through its numerous sources), accumulated (in associated storage systems), and must be processed.
- Veracity – Not part of the original Gartner “big data” definition, but often added as an indication of data quality (i.e., accuracy and completeness), trust (credibility of the source), uncertainty, and suitability (of the data for the target audience).
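To make the Volume scale above more concrete, here is a minimal Python sketch that expresses a raw byte count in decimal (SI) units. The function name `describe_volume` and the imaging workload figures are hypothetical, chosen only for illustration; they are not drawn from any real HCO.

```python
# Decimal (SI) byte units, largest first: 10^18, 10^15, 10^12, 10^9 bytes.
UNITS = [
    ("exabytes", 10**18),
    ("petabytes", 10**15),
    ("terabytes", 10**12),
    ("gigabytes", 10**9),
]


def describe_volume(num_bytes: int) -> str:
    """Express a byte count in the largest SI unit that fits."""
    for name, size in UNITS:
        if num_bytes >= size:
            return f"{num_bytes / size:.1f} {name}"
    return f"{num_bytes} bytes"


# Hypothetical workload (an assumption, not real data):
# 40 million imaging studies at roughly 50 MB each.
studies = 40_000_000
bytes_per_study = 50 * 10**6
print(describe_volume(studies * bytes_per_study))  # prints "2.0 petabytes"
```

Even with these modest, made-up numbers, the total lands in petabyte territory, which is the range where the definition above suggests traditional tools begin to struggle.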
Despite the market “hype” around the subject, Big Data is indeed relevant to healthcare data and information management. This is because, according to HIMSS, healthcare is entering a ‘post EMR’ deployment phase in which HCOs are “keen on gaining insights and instituting organizational change from the vast amounts of data being collected from their EMR systems”. This means that larger volumes of structured and unstructured data can now be managed and analyzed through “faster, more efficient and cheaper computing (processors, storage, and advanced software) and through pervasive computing (telecomputing, mobile devices and sensors)”.
Keep reading HealthcareAnalytics.info for continued updates, information, and research on big data analytics in healthcare. To receive future updates, sign up for regular email updates using the sign-up form on the right side of the website, subscribe to our RSS feed, and follow me @tstrome on Twitter.