The value of (Accurate and Quality) Data in AI for Pathology. Article by Chen Sagiv Ph.D.

Ten years ago, when the AI and deep learning revolution had just started, there were two main paradigms: data should be BIG, and when your model does not work well enough, you should put your efforts into improving the model.

Well, we have come a long way, and by 2023 concepts have changed.

Big data was replaced with Good Data, and the model centric approach was replaced with a data centric approach.

The field of pathology is going through a significant revolution. The introduction of high resolution scanners a few years ago enabled the digitization of pathology slides at a quality equivalent to that of a microscope. Digital slides can be viewed on screens, sent for a second opinion as email attachments, and submitted for algorithmic analysis.

There are three main needs for AI solutions in pathology: the first is diagnostics; the second is research, e.g. drug development and clinical trials; and the third is the generation of huge annotated datasets as a facilitator of personalised healthcare, e.g. creating meaningful features to assist in drug outcome prediction.

Developing high quality algorithms requires high quality data. The procedure to improve AI solutions relies on curating your data and making sure that it is clean and accurate. The demand for big data arises from two main needs. The first is the need to train huge models. However, for most image analysis tasks, good models have already been developed, so for many tasks you can take a known model and tune it to solve your specific problem. While doing so, you may encounter the second need for big data: noise. Noisy data spoils your AI solutions, and therefore you need large amounts of data to compensate for the noise. If you have a procedure to clean your data and create a high quality, minimal-noise dataset, then you will not need large amounts of data. Good data will be sufficient.
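To make the idea of a cleaning procedure concrete, here is a minimal sketch of one common heuristic: flag a sample for human review when its label disagrees with the majority label of its nearest neighbours in feature space. This is an illustration only, not the method used by any particular product. The function name `flag_suspect_labels`, the toy 2-D features, and the "tumour"/"stroma" labels are all hypothetical.

```python
from collections import Counter
from math import dist


def flag_suspect_labels(features, labels, k=3):
    """Return indices whose label disagrees with the majority label
    of their k nearest neighbours (Euclidean distance)."""
    suspects = []
    for i, point in enumerate(features):
        # Rank every other sample by its distance to sample i and keep the k closest.
        neighbours = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: dist(point, features[j]),
        )[:k]
        majority_label, _ = Counter(labels[j] for j in neighbours).most_common(1)[0]
        if majority_label != labels[i]:
            suspects.append(i)
    return suspects


# Hypothetical toy data: two well-separated clusters of 2-D features.
features = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (0.1, 0.2),
            (5.0, 5.1), (5.1, 5.0), (5.2, 5.1), (5.1, 5.2)]
# Index 3 sits in the first cluster but carries the second cluster's label.
labels = ["tumour", "tumour", "tumour", "stroma",
          "stroma", "stroma", "stroma", "stroma"]

suspects = flag_suspect_labels(features, labels, k=3)
print(suspects)  # indices to send back to an annotator for review
```

In practice the flagged samples would be returned to an expert for re-annotation rather than silently dropped, so the dataset becomes more accurate without needing to grow.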

The DeePathology STUDIO is the only data centric AI platform for pathology. It has an infrastructure of four generic modules: cell detection, region segmentation, object classification, and tiles mode. Users upload their slides and tune the generic modules by providing annotations. The way to create and improve the AI algorithm is to provide an accurate, high quality dataset.

Data centric AI is gaining momentum. This trend is of great importance for medical applications, where data is scarce and there is a need to obtain AI solutions to critical problems on a day-to-day basis.

Author Chen Sagiv, Ph.D. is the Co-founder & Co-CEO of DeePathology Ltd.

About DeePathology

DeePathology is an Israel-based company that was founded in 2018 by Jacob Gildenblat, Nizan Sagiv and Chen Sagiv. The company has developed the STUDIO, a Data Centric AI platform for the creation of solutions to image analysis problems in pathology. The STUDIO allows pathologists and researchers to create their own AI solutions without coding.

https://deepathology.ai/