Capturing cancer stage & recurrence for population-based cancer registries

A research and development project to capture information on clinical stage, metastases at diagnosis and cancer recurrence using natural language processing.


Accurate information on cancer is critical to improving cancer outcomes. In Australia, jurisdictional population-based cancer registries receive notifications of cancer from hospitals and pathology laboratories. The information these registries collect is limited, particularly on how advanced the cancer is, whether it has spread, and whether it recurs after treatment. These factors are important for a better understanding of cancer screening, treatment and outcomes.

Diagnostic imaging reports, specifically CT, MRI and PET reports, have been identified as a rich source of information on cancer stage and recurrence.

Natural language processing (NLP) offers a potentially effective and efficient way to extract this information from existing information systems, including imaging information systems, for population-based cancer registries.


The project aims to trial and evaluate a natural language processing solution to capture, code, process and store information on clinical stage, metastases at diagnosis and cancer recurrence for population-based cancer registries.


The project is investigating the collection of three data elements to augment existing population-based cancer registry data:

  • Clinical stage at diagnosis for lung and pancreatic cases
  • Metastases at diagnosis for all cancers
  • Recurrence for all cancers

Cancer Council Victoria is working with Health Language Laboratories to develop and implement an NLP solution (TumourTExtract) to text mine existing diagnostic imaging reports.

TumourTExtract will be implemented in up to four pilot diagnostic imaging services to transfer information from CT, MRI and PET reports to the population-based cancer registry.

New South Wales and Victorian population-based registries will link the imaging information with existing records to determine the stage and metastases at diagnosis, and recurrence.
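As an illustration only, the linkage step described above could be sketched as follows. The record layouts, the exact match on a patient identifier and the 120-day "at diagnosis" window are all assumptions made for this sketch; they are not the registries' actual linkage rules or the TumourTExtract output format.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical rule: a finding within this window of the diagnosis date
# is treated as "at diagnosis". The real registry rule is not stated.
AT_DIAGNOSIS_WINDOW = timedelta(days=120)


@dataclass
class RegistryCase:          # hypothetical registry record
    patient_id: str
    diagnosis_date: date


@dataclass
class ImagingFinding:        # hypothetical output extracted from one report
    patient_id: str
    report_date: date
    metastases: bool
    recurrence: bool


def link_findings(cases, findings):
    """Link findings to cases by patient_id and summarise each case."""
    by_patient = {}
    for finding in findings:
        by_patient.setdefault(finding.patient_id, []).append(finding)

    summary = {}
    for case in cases:
        linked = by_patient.get(case.patient_id, [])
        # Metastases reported close to the diagnosis date.
        mets_at_diagnosis = any(
            f.metastases
            and abs(f.report_date - case.diagnosis_date) <= AT_DIAGNOSIS_WINDOW
            for f in linked
        )
        # Recurrence reported after the diagnosis window has passed.
        recurrence_dates = [
            f.report_date
            for f in linked
            if f.recurrence
            and f.report_date > case.diagnosis_date + AT_DIAGNOSIS_WINDOW
        ]
        summary[case.patient_id] = {
            "metastases_at_diagnosis": mets_at_diagnosis,
            "first_recurrence": min(recurrence_dates, default=None),
        }
    return summary
```

In practice, registry linkage usually relies on several identifying fields rather than a single shared identifier, so the exact-match step here is a simplification.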


The evaluation is expected to assess the feasibility of acquiring data using TumourTExtract and the extent to which the acquired data meet quality standards for completeness and accuracy.
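The project does not define how completeness and accuracy will be measured. One simple reading, shown here as a sketch under that assumption, treats completeness as the share of cases with a recorded value and accuracy as agreement with a manually coded reference value. The function names and the use of None for a missing value are illustrative choices.

```python
def completeness(values):
    """Share of records with a non-missing value (None means missing)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def accuracy(extracted, reference):
    """Share of records where the extracted value equals the reference,
    counted only where both values are present."""
    pairs = [
        (e, r)
        for e, r in zip(extracted, reference)
        if e is not None and r is not None
    ]
    if not pairs:
        return 0.0
    return sum(e == r for e, r in pairs) / len(pairs)
```

For example, if stage is extracted for three of four cases and two of those three agree with the reference, completeness is 0.75 and accuracy is two-thirds.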

Project impact

Accurate information on cancer is critical to cancer control and to improving cancer outcomes. Collecting high-quality information on cancer stage and recurrence will augment existing population-based data on cancer incidence, mortality and survival, providing better evidence for research, policy development and action in cancer prevention, screening and treatment.

For example, the population-based collection of cancer stage and recurrence could inform:

  • Survival by stage at diagnosis including trends over time
  • Time from diagnosis to recurrence
  • Patterns of disease including incidence and mortality by stage at diagnosis


  • Lake Imaging Ballarat - first pilot site
  • Project funded by Cancer Australia

Contact details

Georgina Marr
Project Manager
Victorian Cancer Registry
Cancer Council Victoria
(03) 9514 6232