In addition, it is important to anticipate what supplementary analyses might be needed in the IPD meta‐analysis project to explore the main results. For example, for a question about the effects of chemotherapy on long‐term cancer survival, it may be helpful to collect data which would allow the investigation of the effects of treatment on different (competing) causes of death, such as those due to cancer, treatment‐related side effects or co‐morbid conditions.
In many cases, it will only be necessary to collect outcomes and participant characteristics as defined in the individual trials. However, additional variables might be required to provide greater granularity (e.g. sub‐scales in quality of life instruments), or to allow outcomes or other variables to be defined in a consistent way for each trial. For example, in an IPD meta‐analysis of anti‐platelet therapy for pre‐eclampsia in pregnancy, data on systolic and diastolic blood pressure plus presence of proteinuria were collected. This was to allow the central research team to analyse pre‐eclampsia according to both a pre‐defined meta‐analysis definition and the individual trial definitions (of which there were many variations) [97].
Furthermore, if the IPD are to be maintained in perpetuity, to address new questions that might arise, additional data may be requested to effectively ‘future‐proof’ the database. For example, if there is a plan to use the IPD collected to produce conditional treatment effects (Chapter 5), to identify predictors of treatment effect (Chapter 7), or to identify prognostic factors (Chapter 16), then it would be sensible to request more detailed baseline data than might be necessary if just the overall (unadjusted) effects of treatments were of interest. Having said that, it is important to avoid collecting extraneous data, as these will still need to be checked and managed, and if not used, this represents an unnecessary burden for the trial teams who have spent time preparing data. Of course, it may be easier for trial teams to provide a complete trial data file, and let the IPD meta‐analysis research team extract what they need.
Box 4.3 Example of typical data obtained for trials to be included in an IPD meta‐analysis project
At a minimum, the IPD requested for each trial would typically include variables that:
‘Identify’ participants, e.g. de‐identified participant ID (Section 4.4.1), centre ID
Describe the participant population, facilitate data checking and allow analyses by participant characteristics, e.g. age, sex, demographic variables, disease or condition characteristics and key prognostic factors
Describe the intervention, e.g. date of randomisation; intervention allocation; if appropriate, the interventions participants received and the dates of administration
Record all outcomes of interest and relevant to the objectives, e.g. survival, toxicity, pre‐eclampsia, healing, hospital stay, last follow‐up date
Describe whether participants were excluded from the primary trial analysis and reasons, e.g. ineligible, protocol violation, missing outcome data, withdrawal, ‘early’ outcome
Source: Jayne Tierney and Lesley Stewart.
4.2.7 Developing a Data Dictionary for the IPD
In addition to preparing a list of variables that will be required for the analyses, it is important to consider carefully how best to define, collect and store these in an appropriate and unambiguous manner. The development of a detailed data dictionary for an IPD meta‐analysis project effectively establishes the structure of the meta‐analysis database, facilitates processing of IPD from each trial and ensures that the analyses can proceed as planned, with the greatest degree of flexibility. It also helps guide the trial teams in the preparation of IPD prior to transfer, and gives them the responsibility for modifying variables, lessening the likelihood of misinterpretation or coding errors. However, trial teams may not have the time to adhere to the data dictionary, and they should not be compelled to do so, particularly if their resources for preparing the IPD are limited. In such instances, it is advisable that the central research team accepts trial IPD in any (reasonable) workable form, and takes responsibility for reformatting and re‐coding it according to the data dictionary.
Table 4.1 provides an excerpt from a data dictionary used in an IPD meta‐analysis examining the effects of chemoradiation for cervical cancer [93]. Age at randomisation was collected straightforwardly as a continuous variable, with a missing data code of 999. Tumour stage was collected as a categorical variable with a single code for each stage and sub‐stage, and with a missing data code of 9. This afforded the greatest flexibility for subsequent analysis, as the sub‐stages could be used as supplied, or collapsed into broader‐stage categories as needed. While trial eligibility criteria indicate which participants a trial intends to recruit, it is worth suggesting a wider range of possibilities in the data dictionary, because recruitment of some ineligible participants might be inevitable. This could arise, for example, if eligibility is predicated on a positive diagnostic test, and false positives are identified at subsequent review, or as a result of a later diagnostic procedure. In the aforementioned cervical cancer IPD meta‐analysis, women with stage IVB disease were not eligible for any of the included trials. However, they were sometimes randomised erroneously, because initial clinical staging did not identify them as such, but subsequent surgical staging did, and so the data dictionary allowed for that possibility. If particular participant characteristics are collected on different scales, then it may be possible to convert to a common scale. In the cervical cancer IPD meta‐analysis, the included trials recorded performance status on different scales, so in the data dictionary it was made clear that all were permitted, and these were later converted into a common meta‐analysis scale.
Table 4.1 Excerpt from a data dictionary developed for an IPD meta‐analysis of chemoradiation for cervical cancer [93].
Source: Claire Vale and Jayne Tierney.
| Variable | Variable name | Definition |
|---|---|---|
| Age at randomisation | Age | Numeric; age in years; 999 = unknown |
| Tumour stage | TumStage | Numeric; tumour stage categories: 1 = Stage Ia, 2 = Stage Ib, 3 = Stage IIa, 4 = Stage IIb, 5 = Stage IIIa, 6 = Stage IIIb, 7 = Stage IVa, 8 = Stage IVb, 9 = unknown |
| Performance status | PerfStat | Numeric; provide the data as defined in the trial and supply full details of the system used |
| Survival status | SurvStat | Numeric; 0 = Alive, 1 = Dead |
| Date of death or last follow‐up | DOLF | Date in dd/mm/yy format; unknown day = ‐‐/mm/yy; unknown month = ‐‐/‐‐/yy; unknown date = ‐‐/‐‐/‐‐ |
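To illustrate how the coding in Table 4.1 supports flexible analysis, the following minimal Python sketch recodes the missing data codes (999 for Age, 9 for TumStage) and collapses tumour sub‐stages into broader stages. The variable names and codes are taken from the table; the dictionary layout and the function names `clean_value` and `broad_stage` are hypothetical, chosen only for illustration, and do not describe how the cervical cancer project actually processed its data.

```python
from typing import Optional

# Missing data codes from the Table 4.1 excerpt.
# Storing them in a dict is an illustrative assumption, not the authors' method.
MISSING_CODES = {"Age": 999, "TumStage": 9}

# TumStage codes 1-8 and their sub-stage labels, as listed in Table 4.1.
STAGE_LABELS = {
    1: "Ia", 2: "Ib", 3: "IIa", 4: "IIb",
    5: "IIIa", 6: "IIIb", 7: "IVa", 8: "IVb",
}


def clean_value(variable: str, value: int) -> Optional[int]:
    """Return None when the value equals the variable's missing data code."""
    return None if value == MISSING_CODES.get(variable) else value


def broad_stage(tum_stage: Optional[int]) -> Optional[str]:
    """Collapse a sub-stage code (1-8) into a broad stage: I, II, III or IV."""
    if tum_stage is None:
        return None
    if tum_stage not in STAGE_LABELS:
        raise ValueError(f"TumStage code {tum_stage} is not in the data dictionary")
    # Drop the trailing sub-stage letter, e.g. "IIb" -> "II".
    return STAGE_LABELS[tum_stage][:-1]
```

Because each sub‐stage keeps its own code, the same stored value can be analysed as supplied or passed through `broad_stage`, and a participant coded as stage IVb (code 8) is retained rather than rejected, even though such participants were ineligible for the included trials.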
The data dictionary should use accepted coding conventions wherever possible, not only to facilitate the provision of data by trial teams, but also to avoid errors. For example, for binary and time‐to‐event outcomes, 0 is most commonly used to indicate no event, and 1 to indicate an event has happened. For time‐to‐event outcomes such as survival in cancer, or time free of seizures in epilepsy, it is important to collect the three component variables that make up the outcome for each participant (Table 4.1). These would comprise: a variable that indicates whether an event has happened (e.g. a death or a seizure); another that provides the date the event happened (e.g. date of death or date of seizure); and finally one that gives the date that the participant was last assessed for the outcome of interest (e.g. the date last seen in clinic). If an event has not occurred, the latter allows the participant to be included in the analysis, and censored at that time‐point. Together with the date of randomisation, these variables allow the time to event for each participant to be calculated, and provide the greatest flexibility for data checking (Section 4.5), risk of bias assessment (Section 4.6) and analysis (Part 2). Alternatively, the date of event and date of last follow‐up (censoring time) can be collected as a composite. As a bare minimum, the collection of an indicator variable for the occurrence of an event (yes/no) and the time to event (or censoring) will suffice. In fact, the latter may be all that trial teams are able to provide, for example, if they originate from a country or institute bound by stringent data protection regulations, or if the data are downloaded from a repository that prohibits the supply of exact dates in order to help to preserve participant confidentiality.
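The calculation of time to event from these component variables can be sketched as follows. This is a minimal Python illustration under stated assumptions: complete dates are available, time is measured in days, and the function name `time_to_event` and its parameters are hypothetical rather than part of any particular project's tooling.

```python
from datetime import date
from typing import Optional, Tuple


def time_to_event(
    randomised: date,
    event: int,
    event_date: Optional[date],
    last_assessed: date,
) -> Tuple[int, int]:
    """Return (days from randomisation, event indicator) for one participant.

    event follows the usual convention: 0 = no event, 1 = event.
    """
    if event not in (0, 1):
        raise ValueError("event must be 0 (no event) or 1 (event)")
    if event == 1:
        if event_date is None:
            raise ValueError("an event is recorded but its date is missing")
        end = event_date
    else:
        # No event: the participant is censored at the date last assessed.
        end = last_assessed
    days = (end - randomised).days
    if days < 0:
        # A negative time suggests a data error to raise with the trial team.
        raise ValueError("end date precedes the date of randomisation")
    return days, event


# A death one year after randomisation, and a participant censored at 30 days.
died = time_to_event(date(2001, 3, 1), 1, date(2002, 3, 1), date(2002, 3, 1))
censored = time_to_event(date(2001, 1, 1), 0, None, date(2001, 1, 31))
```

Keeping the dates separate, rather than receiving only a pre‐computed time, is what makes checks such as the negative‐time test possible.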