Avoiding big data pitfalls

Pablo Lamata*

*Corresponding author for this work

Research output: Contribution to journal; Article; peer-reviewed



Clinical decisions are based on a combination of inductive inference built on experience (ie, statistical models) and deductions provided by our understanding of the workings of the cardiovascular system (ie, mechanistic models). In a similar way, computers can be used to discover hidden patterns in (big) data and to make predictions based on our knowledge of physiology or physics. Surprisingly, unlike humans throughout history, computers seldom combine inductive and deductive processes. An explosion of expectations surrounds the computer's inductive method, fueled by "big data" and popular trends. This article reviews the risks and potential pitfalls of this inductive approach, where lack of generality, selection or confounding biases, overfitting, and spurious correlations are among the commonplace flaws. Recommendations to reduce these risks include examining data through the lens of causality, choosing and describing statistical techniques carefully, and fostering an open, transparent research culture. Finally, the synergy between mechanistic and statistical models (ie, the digital twin) is discussed as a promising pathway toward precision cardiology that mimics the human experience.

Original language: English
Pages (from-to): 33-35
Number of pages: 3
Journal: Heart and Metabolism
Issue number: 82
Publication status: Published - 2020


  • Artificial intelligence
  • Big data
  • Digital twin

