Overview of the Melbourne Collaborative Cohort Study (Health 2020)

Following careful planning, instrument development, pilot studies, and extensive review by international experts, The Melbourne Collaborative Cohort Study was set up in the early 1990s to investigate prospectively the role of diet and other lifestyle factors in causing common chronic diseases – especially prostate cancer, breast cancer and bowel cancer – and to investigate possible interactions between these exposures and common genetic variants.

Cohort recruitment was funded by VicHealth ($1,430,000) and Cancer Council Victoria ($3,570,000). Continuing data management and follow-up have been funded by Cancer Council Victoria ($300,000 p.a. since 1995).

The Melbourne Collaborative Cohort Study was designed to be the largest prospective cohort study conducted in Australia. Because about 30% of participants are Southern European migrants, it covers an unusually wide range of lifestyle exposures. It is also unusual in having collected blood samples and physical measurements from all subjects.

Of special significance is the face-to-face follow-up, which enabled the collection of further blood samples, repeated measures of exposures, and assessment of a number of non-fatal, non-cancer endpoints. Other major international cohort studies cannot achieve this and do not plan to. The repeated measures of key lifestyle exposures and selected molecules in plasma reduce measurement error and further increase statistical power and temporal relevance.


Between 1990 and 1994, 41,500 people (24,500 women and 17,000 men) aged 40-69 were recruited. About 30% of the cohort are Southern European migrants to Australia, who were deliberately over-sampled to extend the range of lifestyle exposures and to increase genetic variation.


Extensive information was collected at baseline in face-to-face interviews that included questionnaires (diet, physical activity etc.) and physical measurements, including lean and fat mass by bioelectric impedance, and blood pressure. A food frequency questionnaire was developed specifically to measure dietary intake in the cohort, with the food list based on weighed food records in a group of around 800 men and women reflecting the main country of birth groups within The Melbourne Collaborative Cohort Study. Blood samples were drawn and whole blood and plasma stored for analysis of DNA and other molecules of interest (e.g. sex hormones and growth factors, carotenoids and fatty acids involved in disease pathways).


Cancer cases and deaths are identified by regular matching of The Melbourne Collaborative Cohort Study to cancer registries and death indices. Non-cancer, non-fatal health events were captured by following up the cohort with a mailed questionnaire and by telephone 3 to 4 years after baseline. Of the incident cases of type 2 diabetes, 76% had their diagnosis confirmed by their doctor.

Statistical analysis

For cancer and cardiovascular disease (CVD) endpoints, analyses of exposures measured in the whole cohort (for example diet and physical measures) use Cox's proportional hazards models with age as the time axis to estimate hazard ratios for the associations between lifestyle factors and outcomes. For diabetes, logistic regression models have been used, as length of follow-up was relatively constant and dates of diagnosis were not well documented for all cases.
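Using age as the time axis means that each participant contributes follow-up from the age at baseline attendance (delayed entry) to the age at diagnosis, death or end of follow-up. The following is a minimal sketch of that data preparation step only, not the study's own code; the function names, the 365.25-day year and the example dates are assumptions made for illustration.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumed conversion from days to years


def age_in_years(birth, on):
    """Fractional age in years on a given date."""
    return (on - birth).days / DAYS_PER_YEAR


def age_interval(birth, baseline, end):
    """Return (entry_age, exit_age) for one participant.

    With age as the time axis, a participant is at risk only between
    the age at baseline and the age at the end of their follow-up.
    """
    if end < baseline:
        raise ValueError("follow-up cannot end before baseline")
    return age_in_years(birth, baseline), age_in_years(birth, end)


# Hypothetical participant: born 1 July 1940, attended 1 July 1992,
# follow-up ended 1 July 2002.
entry_age, exit_age = age_interval(
    date(1940, 7, 1), date(1992, 7, 1), date(2002, 7, 1)
)
```

A survival model fitted on these intervals compares participants at the same age rather than at the same time since recruitment.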


Since the study commenced, a large number of analyses have been completed and nearly 800 papers published on diet, body size and various outcomes. 

Analysis of baseline biospecimens

In order to preserve plasma samples and reduce costs, assays of plasma and genotyping have been (or will be) conducted in a limited sub-sample of participants as described below.

In 1999, together with collaborators, we wrote four NHMRC project grant applications relating to prostate cancer, breast cancer, diabetes and cardiovascular disease. All were funded to begin in 2000. They included analysis of stored plasma and measurement of genetic polymorphisms, and the cancer grants included funds for the retrieval and molecular analysis of archival tumour tissue. The plasma assays included biomarkers of dietary intake (carotenoids, vitamin E and phospholipid fatty acid profiles), steroid sex hormones, homocysteine, insulin, triglycerides, HDL cholesterol, insulin-like growth factor 1 (IGF-1) and insulin-like growth factor binding protein 3 (IGFBP3).

The study of each outcome was originally intended to be a nested case-control study, but because of the similarity of 'exposure' measures across outcomes, we explored the possibility of conducting a case-cohort study. By 2000, we had sufficient participants with colorectal cancer to include it in our program, which further tipped the balance in favour of a case-cohort study. (We wrote a successful project grant on colorectal cancer in 2001 in which we discussed the case-cohort study, but it was rolled into our program grant.) We performed extensive simulations, based on our own data, and decided that a case-cohort study was indeed more efficient. To date, no measurement of genetic polymorphisms has occurred in relation to breast, bowel or prostate cancer.

The main potential weakness of a case-cohort study relative to a nested case-control study relates to the possibility of differential measurement error. In a case-control study, differential error is unlikely if the samples of each matched set are handled together and treated identically. However, more work is required in a case-cohort study to balance the measurements for cases and non-cases over time. We achieved this by randomly ordering samples for assay. Because we were performing assays while still identifying cases, we selected subjects and assayed their samples in small batches. We achieved good balance throughout.
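The random ordering of samples into small assay batches can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the sample identifiers, batch size and seed are hypothetical, and the final line only counts cases per batch so that balance can be inspected.

```python
import random


def randomised_batches(sample_ids, batch_size, seed):
    """Shuffle sample IDs and split them into consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rng = random.Random(seed)  # fixed seed makes the order reproducible
    order = list(sample_ids)
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


# Hypothetical IDs: 6 cases and 14 non-cases selected in one round.
samples = [f"case-{i}" for i in range(6)] + [f"noncase-{i}" for i in range(14)]
batches = randomised_batches(samples, batch_size=5, seed=2000)

# Number of cases in each batch, for checking balance over time.
cases_per_batch = [sum(s.startswith("case-") for s in b) for b in batches]
```

Random ordering does not guarantee equal numbers of cases in every batch, which is why the counts are worth checking after each round.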

During the baseline sample collection, pooled plasma from several volunteers was stored in 1 mL aliquots alongside the participants' samples. We included eight pooled plasma aliquots (blinded) with each batch of participants' samples as an independent means of checking within and between batch variations in assay results.
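One way to summarise the pooled plasma results is with coefficients of variation (CVs) within and between batches. The sketch below assumes a simple summary (the mean of the within-batch CVs, and the CV of the batch means); the source does not say which summary was used, and the values are invented, with four aliquots per batch rather than eight to keep the example short.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation: sample SD as a percentage of the mean."""
    return 100 * stdev(values) / mean(values)


def pooled_plasma_cvs(batches):
    """Summarise pooled-plasma results for one analyte.

    batches maps a batch label to the results for its pooled aliquots.
    Returns (mean within-batch CV, CV of the batch means).
    """
    within = mean(cv_percent(v) for v in batches.values())
    between = cv_percent([mean(v) for v in batches.values()])
    return within, between


# Hypothetical results for one analyte in three batches.
qc = {
    "batch-01": [10.0, 10.2, 9.8, 10.0],
    "batch-02": [11.0, 11.2, 10.8, 11.0],
    "batch-03": [9.0, 9.2, 8.8, 9.0],
}
within_cv, between_cv = pooled_plasma_cvs(qc)
```

In this invented example the aliquots agree closely within each batch but the batch means drift, so the between-batch CV is much larger than the within-batch CV.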

Pilot study of reliability of plasma measurements: Before beginning the main study, we conducted reliability studies of the assays we wished to perform. These had two purposes: to identify assays that were too unreliable to use, since we did not want to waste samples and money on poor measurements, and to enable us to correct for measurement error in our statistical analyses. About 200 participants had a second blood sample taken around 12 months after their baseline attendance. Each sample was thawed and divided into two parts.

All four aliquots from each person were refrozen and sent blind to the laboratories, which performed the assays in separate runs at least two weeks apart. Most of the reliability coefficients were sufficiently high for us to commence measurement in the main study, but the initial analyses of insulin and oestradiol (E2) had poor reliability. We changed kits for measurement of E2 and a second reliability study gave results adequate for us to proceed. For insulin, we performed a second study at another laboratory, which achieved much better results, and we proceeded to measure insulin at this laboratory in the main study. Initial measurements of some carotenoids also had low reliability. We changed methods and conducted a second study, which measured laboratory error only (we no longer had any subjects with multiple samples). These results showed much less variability than the first set, so we proceeded with this method in the main study.
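The text does not say how the reliability coefficients were estimated. One common choice is the intraclass correlation coefficient (ICC), the share of total variance that is due to real differences between people. The sketch below uses the one-way random-effects ICC and, as a simplifying assumption, treats each person's repeated values as exchangeable; a full analysis of the design described above would separate variation between the two blood draws from variation between aliquots. All data values are hypothetical.

```python
from statistics import mean


def icc_oneway(measurements):
    """One-way random-effects ICC with k repeats for each of n people.

    measurements: list of per-person lists of repeated values.
    """
    n = len(measurements)
    k = len(measurements[0])
    if n < 2 or k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("need at least 2 people with the same number "
                         "(at least 2) of repeats each")
    grand = mean(v for row in measurements for v in row)
    person_means = [mean(row) for row in measurements]
    # Between-person mean square
    msb = k * sum((m - grand) ** 2 for m in person_means) / (n - 1)
    # Within-person mean square
    msw = sum(
        (v - m) ** 2
        for row, m in zip(measurements, person_means)
        for v in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical values: two repeats for each of four people.
reliable = [[5.0, 5.1], [7.0, 6.9], [9.0, 9.1], [11.0, 10.9]]
noisy = [[5.0, 9.0], [7.0, 5.0], [9.0, 11.0], [11.0, 6.0]]
icc_reliable = icc_oneway(reliable)
icc_noisy = icc_oneway(noisy)
```

An ICC near 1 means repeated measurements on the same person agree closely compared with the spread between people; a value near 0 means the assay cannot distinguish one person from another.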


The sub-cohort is a random sample of the entire cohort, stratified by sex. The sampling was originally set so that the sub-cohort contained 3.6 times as many people as the largest sex-specific group of cases (prostate cancer in men and breast cancer in women). However, at the beginning of the study we counted in situ breast cancers as cases; these were later excluded, so the ratio in women is higher than in men. Our simulation studies had shown that these ratios were necessary for the case-cohort analysis of breast and prostate cancers to have the same power as nested case-control studies with two controls per case. For the less common outcomes, the case-cohort study has substantially greater power than nested case-control studies with two controls per case.
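Drawing a random sample stratified by sex can be sketched as below. The toy cohort, the stratum sizes and the seed are hypothetical and chosen only to show the mechanics; in the study, the stratum sizes followed from the ratio of sub-cohort members to cases described above.

```python
import random


def stratified_subcohort(cohort, n_per_stratum, seed):
    """Draw a simple random sample of the requested size within each sex.

    cohort: list of (participant_id, sex) pairs.
    n_per_stratum: dict mapping sex to the number of people to draw.
    """
    rng = random.Random(seed)
    chosen = []
    for sex, n in sorted(n_per_stratum.items()):
        stratum = [pid for pid, s in cohort if s == sex]
        chosen.extend(rng.sample(stratum, n))  # without replacement
    return chosen


# Hypothetical toy cohort of 60 women and 40 men.
cohort = [(f"F{i:03d}", "F") for i in range(60)]
cohort += [(f"M{i:03d}", "M") for i in range(40)]
subcohort = stratified_subcohort(cohort, {"F": 12, "M": 9}, seed=1990)
```

Because selection ignores disease status, some sub-cohort members will also turn out to be cases.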

The number of participants in each group is shown in the table below:

         Sub-cohort   Breast   Colorectal   Prostate   Type 2     Cardiovascular   Total*
                      cancer   cancer       cancer     diabetes   disease
Female   2492         574      251          -          193        150              3522
Male     2167         -        269          604        209        342              3387
Total    4659         574      520          604        402        492              6909

* The total is less than the sum of the individual categories because of overlap. For example, 53 women with breast cancer were selected for the sub-cohort, as were 74 men with prostate cancer.
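The overlap noted in the footnote, and the sub-cohort to case ratios, can be checked directly from the table. In this short sketch the counts are copied from the table, a "-" entry is treated as zero (an assumption), and the variable names are illustrative.

```python
# Counts copied from the table; "-" is treated as 0 (assumption).
groups = ["sub-cohort", "breast", "colorectal", "prostate", "diabetes", "cvd"]
female = dict(zip(groups, [2492, 574, 251, 0, 193, 150]))
male = dict(zip(groups, [2167, 0, 269, 604, 209, 342]))
totals = {"female": 3522, "male": 3387}

# Group memberships minus distinct people gives the number of
# memberships beyond each person's first, that is, the overlap.
extra_female = sum(female.values()) - totals["female"]
extra_male = sum(male.values()) - totals["male"]

# Sub-cohort size relative to the largest sex-specific case group.
ratio_female = female["sub-cohort"] / female["breast"]
ratio_male = male["sub-cohort"] / male["prostate"]
```

The row sums exceed the stated totals by 138 for women and 204 for men. The ratios are about 4.3 in women and 3.6 in men, which matches the statement that the ratio in women ended up higher than in men.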

Progress: Measurements of all plasma analytes except carotenoids, vitamin E and retinol are complete. We have had problems with the carotenoid assays. Originally, the assays were performed in Adelaide. The coefficients of variation (CVs) for our pooled plasma samples were high and indicated problems with the measurement of the most labile carotenoids. We then arranged for the remaining samples to be assayed at Monash. Unfortunately, we lost about 1000 of these because inappropriate tubes were used.

We have not commenced analysis of genetic polymorphisms in relation to breast, prostate and bowel cancer. Our reasons include: the study of genetic polymorphisms in relation to risk of cancer has not been particularly enlightening with respect to understanding causal pathways; many of the genes in which we are interested have multiple polymorphisms and there is currently a great deal of work being done to identify haplotypes in these genes; the cost of doing large scale genotyping has been high but is reducing.