More Data for Anemia than for Cancer? Why AI Is Being Held Back Where It’s Needed Most

In medical practice, we often face a surprising paradox: For common and relatively simple diagnoses like anemia, there is an abundance of structured data available. In contrast, when it comes to complex or rare diseases – precisely where artificial intelligence (AI) could offer the greatest support – the data needed to train such systems is often missing or insufficient.

Diagnosing anemia typically requires just a few lab values, such as hemoglobin. These parameters are part of routine care, recorded in large volumes and often in standardized formats across hospitals. This creates ideal conditions for training AI systems that can detect patterns and make accurate predictions based on this rich data landscape.
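To illustrate how little input such a diagnosis needs, here is a minimal sketch of a rule-based anemia check on a single hemoglobin value. The cut-offs are the commonly cited WHO values for non-pregnant adults at sea level and are used here as a simplifying assumption; real thresholds also depend on age, pregnancy, altitude, and smoking status. The function and variable names are made up for this example.

```python
# Commonly cited WHO hemoglobin cut-offs in g/dL for non-pregnant adults
# at sea level (simplified; an assumption for this sketch).
HB_CUTOFF_G_DL = {
    "male": 13.0,
    "female": 12.0,
}


def is_anemic(hemoglobin_g_dl: float, sex: str) -> bool:
    """Return True if the hemoglobin value lies below the cut-off for the given sex."""
    return hemoglobin_g_dl < HB_CUTOFF_G_DL[sex]


print(is_anemic(11.4, "female"))  # True
print(is_anemic(13.6, "male"))    # False
```

A single structured number and one comparison are enough here, which is exactly why this kind of data is so easy to collect and reuse at scale.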

Now compare this to complex diseases, such as certain types of cancer or rare disorders. These conditions are not only less frequent, but their diagnosis is also far more complex. They require a combination of lab values, imaging data, molecular profiles, clinical findings, and contextual information, and this data is often scattered, unstructured, or stored in incompatible formats. As a result, AI systems lack the very foundation they need in exactly the scenarios where they could be most valuable.

Why AI Could Be Especially Helpful for Complex Diagnoses

There’s a common misconception that AI is most useful for simple tasks. In reality, its greatest potential lies in areas where humans reach cognitive limits – when dealing with vast amounts of data, uncertain diagnoses, or highly individualized clinical cases.

AI can be a powerful ally in differential diagnosis, where many possible causes must be considered simultaneously. For example, a patient presenting with chronic fatigue, weight loss, and intermittent fever might have a benign viral infection or a rare autoimmune disease. An AI model trained on thousands of similar cases could help highlight plausible diagnostic options that might otherwise be overlooked.

Cancer diagnostics is another domain where AI can make a meaningful difference. Modern oncology often relies on genetic tumor profiles, multi-modal diagnostics, and ever-evolving clinical research. AI can support the interpretation of radiological and pathological images, predict therapy responses, and help match patients to personalized treatment options. But these systems need a solid data foundation – and that is where the challenges begin.

Data Gaps Are Slowing Innovation

The effectiveness of any AI application hinges on the availability of high-quality data. Machine learning models, especially deep learning approaches, require large, clean datasets to generate robust and generalizable predictions. Without enough data, models can overfit, become unreliable, or be simply unusable.

For rare diseases, the issue is particularly pressing. If only a few hundred documented cases exist for a given condition, it becomes nearly impossible to train an accurate, standalone model. Even for more common diseases with highly individualized presentations (e.g., rare subtypes of a common cancer), single institutions typically do not have access to enough relevant data.
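Small case numbers do not only hamper training. They also make it hard to know how good a model really is. As a rough illustration, the sketch below computes a 95% Wilson score interval for a detection rate measured on few versus many cases. The case numbers are invented for the example, and the helper name is our own.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Invented numbers: the same observed 90% detection rate on 20 vs. 2000 cases.
for detected, total in [(18, 20), (1800, 2000)]:
    low, high = wilson_interval(detected, total)
    print(f"{detected}/{total}: plausible range {low:.3f} to {high:.3f}")
```

With only 20 cases, the plausible range for the true detection rate is many times wider than with 2000 cases, even though the observed rate is identical.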

On top of that, much of the existing medical data is unstructured – found in free-text clinical notes, image files, or local documentation systems. These formats aren’t readily machine-readable and require complex processing to be usable for AI. So, although the data may technically exist, it remains inaccessible for AI training and deployment.
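As a small illustration of the processing this requires, the sketch below pulls a hemoglobin value out of a free-text note with a regular expression. The pattern, the function name, and the sample notes are assumptions made for this example. Real clinical text needs far more robust handling of units, negations, abbreviations, and languages.

```python
import re
from typing import Optional

# Matches e.g. "Hb 10.2 g/dL" or "Hemoglobin: 9,8 g/dl" (decimal point or comma).
HB_PATTERN = re.compile(
    r"\b(?:hemoglobin|haemoglobin|hgb|hb)\b\s*(?:is|of|was|:|=)?\s*"
    r"(\d+(?:[.,]\d+)?)\s*g/dl",
    re.IGNORECASE,
)


def extract_hemoglobin_g_dl(note: str) -> Optional[float]:
    """Return the first hemoglobin value (g/dL) found in the note, or None."""
    match = HB_PATTERN.search(note)
    if match is None:
        return None
    return float(match.group(1).replace(",", "."))


print(extract_hemoglobin_g_dl("Pt reports fatigue. Hb 10.2 g/dL on admission."))  # 10.2
print(extract_hemoglobin_g_dl("No labs available yet."))  # None
```

Even this single, well-defined lab value needs a hand-written pattern. Findings such as tumor stage or symptom descriptions are much harder to extract reliably.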

Three Key Approaches to Unlock AI’s Potential

Despite these challenges, there are promising ways to improve the usability and availability of data for AI development. At medicalvalues, we focus on three key strategies:

1. Data Harmonization and Interoperability

Data is often not lacking – it’s just poorly connected. Differences in documentation practices, incompatible systems, and inconsistent formats create barriers that prevent data from being used effectively. Interoperability standards like HL7 FHIR (Fast Healthcare Interoperability Resources) can address this by enabling structured, consistent, and machine-readable data exchange between systems.

Standardized data allows healthcare providers and researchers to pool information across sites, enabling AI to learn from larger and more diverse datasets. This is especially important for rare or complex conditions, where no single organization can generate enough data alone.
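To make this more concrete, here is a minimal sketch of how a locally stored hemoglobin result could be expressed as a FHIR Observation resource. The field names follow the FHIR Observation structure, and 718-7 is the LOINC code for hemoglobin mass concentration in blood. The helper function, the patient identifier, and the timestamp are hypothetical, and the output has not been validated against any specific FHIR profile.

```python
import json
from typing import Any, Dict


def to_fhir_hemoglobin_observation(
    patient_id: str, value_g_dl: float, effective_datetime: str
) -> Dict[str, Any]:
    """Build a FHIR-style Observation dict for one hemoglobin result (sketch only)."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "category": [{
            "coding": [{
                "system": "http://terminology.hl7.org/CodeSystem/observation-category",
                "code": "laboratory",
            }]
        }],
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "718-7",
                "display": "Hemoglobin [Mass/volume] in Blood",
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": effective_datetime,
        "valueQuantity": {
            "value": value_g_dl,
            "unit": "g/dL",
            "system": "http://unitsofmeasure.org",
            "code": "g/dL",
        },
    }


# Hypothetical patient ID and timestamp, for illustration only.
observation = to_fhir_hemoglobin_observation("example-123", 10.2, "2024-03-01T08:30:00+01:00")
print(json.dumps(observation, indent=2))
```

Because the code, unit, and patient reference are expressed in shared vocabularies, a receiving system can interpret the value without knowing anything about the sending hospital's internal format.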

2. Leveraging Existing Knowledge and Pretrained Models

AI doesn’t always have to start from scratch. There are many pretrained models that already include medical knowledge drawn from broad datasets or scientific literature. These models can be fine-tuned with relatively little additional data, significantly reducing the burden of data collection for specific applications.

In addition, medical knowledge from guidelines, ontologies, or structured databases can be integrated directly into the model design. This hybrid approach – combining data-driven learning with expert knowledge – leads to more robust and clinically relevant models, especially in data-scarce scenarios.

3. Building Collaborative Data Networks

The best way to overcome data scarcity is to join forces. Collaborative networks such as the Medical Informatics Initiative (MII) in Germany or MEDICUS offer secure, structured, and scalable platforms for sharing anonymized healthcare data across institutions.

By contributing to and drawing from these networks, hospitals and research centers can collectively create large datasets even for rare and complex cases. Such collaboration not only fuels AI development, but also improves diagnostic standards, encourages feedback loops, and accelerates innovation in healthcare.
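The effect of pooling can be shown with simple arithmetic. In the sketch below, the site names, case counts, and the minimum of 500 cases are all invented for illustration. The point is only that a condition too rare for any single site may cross a usable threshold once counts are combined.

```python
from collections import Counter
from typing import Dict, List, Mapping

MIN_CASES_FOR_TRAINING = 500  # hypothetical threshold, not a clinical standard


def pool_case_counts(site_counts: Dict[str, Dict[str, int]]) -> Counter:
    """Sum per-diagnosis case counts over all participating sites."""
    pooled: Counter = Counter()
    for counts in site_counts.values():
        pooled.update(counts)
    return pooled


def trainable_diagnoses(
    counts: Mapping[str, int], min_cases: int = MIN_CASES_FOR_TRAINING
) -> List[str]:
    """Diagnoses with at least min_cases documented cases, sorted by name."""
    return sorted(dx for dx, n in counts.items() if n >= min_cases)


# Invented example numbers.
site_counts = {
    "hospital_a": {"iron_deficiency_anemia": 4200, "rare_sarcoma_subtype": 180},
    "hospital_b": {"iron_deficiency_anemia": 3900, "rare_sarcoma_subtype": 210},
    "hospital_c": {"iron_deficiency_anemia": 5100, "rare_sarcoma_subtype": 160},
}

for site, counts in site_counts.items():
    print(site, trainable_diagnoses(counts))
print("pooled", trainable_diagnoses(pool_case_counts(site_counts)))
```

In this toy setup, each hospital alone has enough cases only for anemia, while the pooled network also reaches the threshold for the rare subtype.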

Conclusion: Bridging the Gap Between AI’s Promise and Real-World Practice

AI holds tremendous potential in medicine – especially in areas where human decision-making is most challenged. It is paradoxical that these very areas are currently underserved due to insufficient data. But this gap is not insurmountable.

By making healthcare data more interoperable, leveraging pretrained models, and fostering collaborative data sharing, we can begin to realize the full power of AI even in complex or rare disease contexts.

At medicalvalues, we are committed to building the infrastructure, models, and partnerships that enable this vision. Because ultimately, better data means better diagnoses – and better outcomes for every patient, no matter how complex their case may be.
