GECCO Data Set

GECCO at a glance

To cope with the current pandemic and the associated treatment of patients, the Federal Ministry of Education and Research (BMBF) is funding a national network of university medicine in the fight against COVID-19. The network systematically collects and pools the data of treated COVID-19 patients. Its researchers collect, track, and analyze the treatment of COVID-19 patients in a standardized way.

The high threat level has led to intensive scientific activity on COVID-19, including numerous regional, national, and international epidemiological surveys and registry studies. Without coordinated action and assured semantic and syntactic interoperability, this multitude of efforts could lead to a fragmentation of information. That fragmentation could delay or even prevent the acquisition of urgently needed scientific knowledge.

To achieve high professional quality and broad acceptance of the COVID-19 consensus data set, proposals and concepts for the data set were collected by members of an expert panel. This expert panel is composed of specialists from university hospitals, professional associations, and other relevant organizations. The result of this joint venture was the creation of the "German Corona Consensus Data Set" (GECCO).

The consensus data set gives the science around COVID-19 a common language and working basis.

The GECCO core is a pioneering dataset that gives researchers a unified approach to acquiring and processing consistent clinical data for use across all participating projects. The GECCO core dataset has become an eminently important part of four related projects: NAPKON, CODEX, COMPASS, and the Europe-wide ORCHESTRA.

The GECCO core dataset has been accessible to the entire Netzwerk Universitätsmedizin (NUM) and to the general public on ART-DECOR and Simplifier since May 2020.

Content parameters

  1. Anamnesis and risk factors: Collection of data of pre-existing conditions e.g. cardiovascular diseases, cancer diseases, HIV or diabetes
  2. Imaging: Collection of data of imaging procedures e.g. CT, Radiography, Ultrasonography
  3. Demographics: Collection of data such as gender, date of birth, weight, height, ethnic group
  4. Epidemiological factors: Did the patient knowingly have contact with a person with probable or proven COVID-19 disease within 14 days before the onset of his/her symptoms?
  5. Complications: Collection of data referring to e.g. pulmonary embolism, thromboembolic complications, myocardial infarction, infection of the bloodstream
  6. Onset of illness/admission: Stage of illness when COVID-19 was diagnosed
  7. Laboratory values: Collection of data of various laboratory values
  8. Medication: Questions referring to pharmaceuticals used for COVID-19 therapy and additional diseases
  9. Outcome at discharge: Respiratory outcome and type of discharge
  10. Study enrolment/inclusion criteria: Confirmed COVID-19 diagnosis as main reason for admission
  11. Symptoms: Collection of data regarding loss of taste, abdominal pain, diarrhea, vomiting, cough, nausea, fever and dyspnea
  12. Therapy: Which therapy did the patient receive (e.g. intensive care, ventilation)?
  13. Vital signs: Collection of data e.g. body temperature, heart rate, blood pressure and oxygen saturation

The data set is based on the Pa-COVID-19 study, the ISARIC protocol of the WHO, and the data concepts of the LEOSS registry.

Pa-COVID-19 is a central phenotyping platform and a longitudinal data and sample collection for patients with confirmed SARS-CoV-2 infection treated at Charité in Berlin. This platform was established for harmonized, scalable data collection, pathophysiological analysis, and deep phenotyping of COVID-19.

The purpose of ISARIC is to prevent disease and death from outbreaks of infectious diseases such as COVID-19. ISARIC brings together clinical research networks worldwide to provide the fastest possible research response to an outbreak of an infectious disease.

The ESCMID Emerging Infections Task Force (EITaF), the German Centre for Infection Research (DZIF), and the German Society for Infectiology have initiated the LEOSS network to establish a clinical patient registry for patients infected with SARS-CoV-2. LEOSS aims to provide a comprehensive, cross-sectoral survey of the clinical epidemiology of patients with COVID-19 in order to better understand the disease, identify prognostic factors, and evaluate common interventions, and thereby improve patient care.

Approximately 80 data set concepts with 280 possible answers were selected according to defined criteria in a transparent process. The semantic annotations in SNOMED CT, LOINC, UCUM, ICD, and ATC, as well as the initial work on FHIR modeling, can be accessed online at any time.
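To illustrate what such a semantic annotation can look like in practice, the following minimal Python sketch builds a generic FHIR Observation for a body temperature measurement, coded with LOINC and with its unit expressed in UCUM. It is only an illustration under stated assumptions: the function name, the patient identifier, and the choice of fields are hypothetical, and the structure follows the generic FHIR Observation resource, not the official GECCO profiles published on Simplifier.

```python
import json


def body_temperature_observation(patient_id: str, value_celsius: float) -> dict:
    """Build a generic FHIR Observation as a plain dict.

    Hypothetical helper for illustration; it does not reproduce
    the official GECCO FHIR profiles.
    """
    return {
        "resourceType": "Observation",
        "status": "final",
        # What was measured: LOINC 8310-5 is "Body temperature".
        "code": {
            "coding": [
                {
                    "system": "http://loinc.org",
                    "code": "8310-5",
                    "display": "Body temperature",
                }
            ]
        },
        # Who was measured (the patient id is a made-up example value).
        "subject": {"reference": f"Patient/{patient_id}"},
        # The measured value, with its unit coded in UCUM ("Cel" = degree Celsius).
        "valueQuantity": {
            "value": value_celsius,
            "unit": "°C",
            "system": "http://unitsofmeasure.org",
            "code": "Cel",
        },
    }


observation = body_temperature_observation("example-1", 38.4)
print(json.dumps(observation, indent=2, ensure_ascii=False))
```

Because both the concept and the unit are expressed as codes from shared terminologies, a receiving system can interpret the value without knowing the local field names of the sending system.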

Response options, reference ranges, units of measurement, and semantic annotations are listed in the detailed description in ART-DECOR.
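As a rough sketch of how response options, units, and reference ranges could be represented and checked in software, consider the following Python example. All names (Concept, check_answer) are hypothetical, and the option lists and the numeric range are illustrative placeholders that are not taken from the GECCO specification; the authoritative definitions are those in ART-DECOR.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Concept:
    """Hypothetical, simplified model of one data set concept."""
    name: str
    response_options: Tuple[str, ...] = ()  # allowed coded answers, if any
    unit: Optional[str] = None              # UCUM unit for numeric values
    reference_range: Optional[Tuple[float, float]] = None  # (low, high)


def check_answer(concept: Concept, answer: Union[str, float]) -> str:
    """Classify an answer against the concept definition."""
    if concept.response_options:
        return "ok" if answer in concept.response_options else "invalid option"
    if concept.reference_range is not None:
        low, high = concept.reference_range
        return "within range" if low <= float(answer) <= high else "outside range"
    return "ok"


# Illustrative placeholder definitions, NOT the official GECCO content.
contact = Concept(
    name="Contact with a probable or proven COVID-19 case",
    response_options=("Yes", "No", "Unknown"),
)
temperature = Concept(
    name="Body temperature",
    unit="Cel",
    reference_range=(36.0, 37.5),  # illustrative values only
)

print(check_answer(contact, "Yes"))
print(check_answer(contact, "Maybe"))
print(check_answer(temperature, 38.9))
```

Keeping the allowed answers and ranges in one declarative definition per concept means that data entry forms and validation rules can be derived from the same source.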


The development will be carried out in close cooperation with the subject- and organ-specific working groups (Fach- und Organspezifische Arbeitsgruppen, FOSA) and the advisory board (Fachbeirat), which enables constant monitoring and quality assurance of the process.

All parameters and possible answers are presented in our detailed GECCO report, which is linked under publications.

The development of GECCOplus in cooperation with the NAPKON cohorts HAP (Hochauflösende Plattform, high-resolution platform), SÜP (Sektorenübergreifende Plattform, cross-sectoral platform), and POP (Populationsbasierte Plattform, population-based platform) will produce a cross-cohort data set. This data set is meant to create a joint interface across all three cohort-specific data sets.

Alongside the development of GECCOplus for the three NAPKON cohorts, the GECCO team develops customized modules for individual medical departments, in close and constant exchange with the FOSAs and other involved specialists. These modules include department-specific parameters that help to collect more in-depth data.

In 2021, the focus lies on the development of three modules:

  • Pediatrics
  • Cardiology
  • A third module, currently being coordinated with experts from the NUM

Current ideas and suggestions for future modules are:

  • Infectiology
  • Intensive care
  • Neurology
  • Emergency care
  • Pneumology

We are designing GECCO as a responsive dataset that will allow harmonized data exchange with real-world data platforms, health agencies, and international partners.

Because it is built on international standards and terminologies, GECCO supports the semantic and syntactic interoperability of different systems.