Abstract:
[en] Over the last ten years, the culture of medical data sharing has expanded widely, producing databases that are sufficiently representative. This worthwhile initiative has encouraged researchers from various fields of expertise to address medical questions and, notably, to support clinicians in making a diagnosis. Accessible online, open medical datasets may include, for each subject, clinical data (e.g. age, gender), images (e.g. magnetic resonance images), signals (e.g. electrocardiography, electroencephalography) and/or other biological data (e.g. genotypes).
In particular, childhood mental disorders lack universally recognized pathophysiological bases. Hence, the diagnosis relies on subjective observations collected from the child's environment (parents, teachers, etc.). Moreover, childhood disorders share common features, which makes them difficult to distinguish from one another. Incidentally, the agreement among practitioners on a given diagnosis, as measured by the Kappa statistic, still leaves room for improvement. The development of predictive models based on physiological data could help clinical neuroscience make assessments more objective. To that end, several studies have applied Data Mining techniques to medical data. Yet beyond simple data analysis, it is critically important to integrate the questions and requirements specific to the medical domain into the Data Mining process.
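To make the notion of inter-practitioner agreement concrete, the following is a minimal sketch of a Kappa computation. It assumes the two-rater form (Cohen's kappa); the abstract does not state which Kappa variant is meant, and the function name, labels and ratings below are hypothetical illustrations, not data from this work.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same subjects.

    Assumption: two raters only; the source does not specify the variant.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of subjects")
    n = len(rater_a)
    # Observed agreement: fraction of subjects given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both raters always use the same single label: agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings from two practitioners on four subjects.
practitioner_1 = ["disorder", "disorder", "healthy", "healthy"]
practitioner_2 = ["disorder", "healthy", "healthy", "healthy"]
print(cohen_kappa(practitioner_1, practitioner_2))  # 0.5
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so in this hypothetical case the two practitioners agree only moderately.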
The foremost requirement of practitioners is to be able to justify and validate a diagnosis based on the recommendation provided by Data Mining. In addition, the predictive model should provide, on a per-patient basis, a readable relationship between the input data and the diagnosis through a decision chain. For such purposes, the recommendation must be based on an interpretable (and preferably, interpreted) model. It is therefore essential to define the properties of interpretability in the sense of medical Data Mining. Furthermore, in the case of a neurological diagnosis, should predictive models make errors, it must be ensured that these wrong predictions have the lowest possible impact on patients. Indeed, a model with a low ability to detect healthy subjects is far from cautious, in the sense that those subjects are exposed to the risk of being prescribed unnecessary medication. The opposite situation is less critical: if a model has a low ability to detect pathological patients, the detection of the disorder may merely be delayed. Finally, medical data include several sources of heterogeneity: for example, protocols are not harmonized, and cultural and social factors influence the etiology and epidemiology of disorders. In view of these specifics of medical Data Mining, we propose a critical examination of the learning schemes, algorithms and features that should be used to provide the most appropriate aid-in-diagnosis models for neurological disorders.
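The two abilities contrasted above correspond to the standard notions of specificity (ability to detect healthy subjects) and sensitivity (ability to detect pathological patients). The sketch below shows one way they could be computed from predictions; the function name, label strings and example data are hypothetical and are not taken from this work.

```python
def sensitivity_specificity(y_true, y_pred, positive="disorder"):
    """Return (sensitivity, specificity) for a binary diagnosis task.

    `positive` is the label of the pathological class (an assumption here).
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # pathological patient correctly detected
            else:
                fn += 1  # pathological patient missed: detection is delayed
        else:
            if pred == positive:
                fp += 1  # healthy subject flagged: risk of unnecessary medication
            else:
                tn += 1  # healthy subject correctly detected
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical ground truth and model predictions for seven subjects.
truth = ["disorder", "disorder", "disorder", "healthy", "healthy", "healthy", "healthy"]
preds = ["disorder", "disorder", "healthy", "healthy", "healthy", "healthy", "disorder"]
sens, spec = sensitivity_specificity(truth, preds)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Under the argument made in the abstract, a cautious model would favor a high specificity, since false positives expose healthy subjects to unnecessary treatment, whereas false negatives mainly delay detection.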