Business Intelligence Strategy and Big Data Analytics: A General Management Perspective
In statistics, survival analysis deals with the analysis of one or more nonnegative event times. Such observations arise in many research areas, so the topic goes by several names, for example reliability theory, duration modelling, or event history analysis. The data collected in such studies are called survival data, where each observation is the time to the event of interest: in medical research, for example, the time from a patient’s infection with a particular disease to death, or in engineering, the lifetime of a particular item. In short, the time to a particular event is of main interest. Research questions include the probability distribution of the event time, how the rate of event occurrence depends on certain risk factors, and so on.
In practice, when measuring time to events, we usually have a starting calendar time point and an ending calendar time point. The difference between the two is the time to event. The starting calendar time point is usually treated as time 0 for all subjects, although subjects may actually have different starting calendar time points. Some studies have a natural time 0. For example, in drug experiment studies time 0 is the start of the experiment, and usually all subjects start at the same time. In epidemiology studies, time 0 could be the time at which the subject was infected with an infectious disease.
Because survival data are collected over a long period, biased data or data with missing information are often observed. For example, event times can be right censored when the study is stopped before the event actually occurs for a subject. We then only know that the subject will experience the event after the stopping time (the censoring time), i.e. the actual event time is greater than the censoring value. Some studies involve left censoring: for example, a measurement on a subject (say the tumor size of a patient) cannot be recorded if it is smaller than a certain threshold. Under left censoring, we only know that the actual value is less than the censoring value. If left censoring and right censoring can occur simultaneously, we have interval censoring or middle censoring. Various types of censoring have been well studied over the last several decades.
Apart from censoring, another important type of missing data in survival analysis is truncation. The main difference between truncation and censoring is that truncated data carry a selection bias in the sample, whereas censored data do not. In a censored sample, the observations are randomly selected without any selection bias, but some information may be missing from the observed values. The observations in a truncated data set, on the other hand, are selected with a certain bias, and it is problematic to use the biased sample to represent the whole population.
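The right- and left-censoring mechanisms described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the exponential event-time distribution, the mean of 5, the censoring time of 8, the tumor sizes, and the helper names right_censor and left_censor are all hypothetical choices made for illustration and do not come from the text.

```python
import random


def right_censor(event_times, censor_time):
    """Return (observed_time, event_seen) pairs for a fixed study end.

    If the true event time exceeds censor_time we only record censor_time
    and flag the observation as censored (event_seen is False).
    """
    return [(min(t, censor_time), t <= censor_time) for t in event_times]


def left_censor(measurements, detection_limit):
    """Return (recorded_value, fully_measured) pairs for a detection limit.

    Values below the limit are recorded as the limit itself, so we only
    know that the true value is smaller than the recorded one.
    """
    return [(max(x, detection_limit), x >= detection_limit) for x in measurements]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Assumed model: exponential event times with mean 5.
true_times = [rng.expovariate(1 / 5.0) for _ in range(1000)]

censored = right_censor(true_times, censor_time=8.0)
n_censored = sum(1 for _, seen in censored if not seen)
print(f"{n_censored} of {len(censored)} event times are right censored")

sizes = [0.2, 1.5, 0.05, 3.1]  # hypothetical tumor sizes
print(left_censor(sizes, detection_limit=0.5))
```

Note that every subject still appears in the output: censoring hides part of the value but does not remove the subject from the sample, which is the contrast with truncation drawn in the paragraph above.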
The most common form of truncation is left-truncation, which usually involves an entry time into the study. For example, time 0 could be the calendar time at which a patient catches a particular disease, and the research interest is in the event time T at which the patient is cured. Patients, however, may enter the study (and so be recorded) only after some delay, say at time L. A patient may be cured, without any treatment or with some self-treatment, before the date on which they planned to go to hospital. In other words, a patient may only go to see a doctor if his/her symptoms last for, say, L days. The patient’s information is then recorded only if T ≥ L (the patient goes to see a doctor). Therefore, the observed event times tend to be larger than the event times in the whole population, and the observed data are selected with bias. In summary, left-truncation is usually caused by an entry time, which makes smaller event times unlikely to be selected.
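The selection effect of left-truncation can be checked numerically. The sketch below assumes, purely for illustration, that both the event time T and the entry time L are independent exponential variables with means 5 and 3; the function name simulate_left_truncation and all parameter values are hypothetical. A subject is recorded only when T ≥ L.

```python
import random


def simulate_left_truncation(n, mean_event_time, mean_entry_time, seed=0):
    """Simulate n subjects and keep only those with T >= L.

    Returns the full list of event times and the list of recorded
    (entry_time, event_time) pairs. Subjects with T < L never enter
    the study, so they leave no record at all.
    """
    rng = random.Random(seed)
    population, recorded = [], []
    for _ in range(n):
        t = rng.expovariate(1.0 / mean_event_time)      # event time T (assumed exponential)
        entry = rng.expovariate(1.0 / mean_entry_time)  # entry time L (assumed exponential)
        population.append(t)
        if t >= entry:
            recorded.append((entry, t))
    return population, recorded


population, recorded = simulate_left_truncation(
    n=10000, mean_event_time=5.0, mean_entry_time=3.0
)
mean_population = sum(population) / len(population)
mean_recorded = sum(t for _, t in recorded) / len(recorded)
print(f"recorded {len(recorded)} of {len(population)} subjects")
print(f"mean event time, whole population: {mean_population:.2f}")
print(f"mean event time, recorded sample:  {mean_recorded:.2f}")
```

Under these assumptions the recorded sample has a noticeably larger mean event time than the population, so averaging the recorded values directly would overestimate the typical time to cure.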
Right-truncation often occurs when the study has an end of recruitment time. For example, in epidemiology studies the research interest may be the event time T from infection to the development of a particular disease. If there is an end of recruitment calendar time, a subject is recorded only if it develops the disease by that time. Suppose two subjects are infected at the same time: only the one that develops the disease before the end of recruitment can be recorded, and the subject with the longer disease time cannot. The observed data can therefore be biased, i.e. subjects with shorter disease times are more likely to be observed. Such a bias can be very serious in an epidemiology study, since many subjects may be infected towards the end of the study, and among those only the ones with shorter disease times (developing the disease before the end of recruitment) are likely to be recorded; those with longer disease times are unlikely to be recruited.
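The same kind of check works for right-truncation. In the sketch below, infection times are assumed to be uniform over the recruitment window and disease times T exponential with mean 5; these distributions, the window length of 10, and the function name simulate_right_truncation are hypothetical choices for illustration only. A subject is recorded only if the disease develops before the end of recruitment.

```python
import random


def simulate_right_truncation(n, study_end, mean_event_time, seed=0):
    """Simulate n infected subjects and keep those diagnosed by study_end.

    Returns all (infection_time, event_time) pairs and the recorded subset,
    i.e. the pairs with infection_time + event_time <= study_end.
    """
    rng = random.Random(seed)
    everyone, recorded = [], []
    for _ in range(n):
        infected_at = rng.uniform(0.0, study_end)   # calendar time of infection (assumed uniform)
        t = rng.expovariate(1.0 / mean_event_time)  # time from infection to disease
        everyone.append((infected_at, t))
        if infected_at + t <= study_end:
            recorded.append((infected_at, t))
    return everyone, recorded


everyone, recorded = simulate_right_truncation(
    n=10000, study_end=10.0, mean_event_time=5.0
)
mean_all = sum(t for _, t in everyone) / len(everyone)
mean_recorded = sum(t for _, t in recorded) / len(recorded)
# Subjects infected in the last two time units of the window.
late_recorded = [t for infected_at, t in recorded if infected_at > 8.0]
print(f"mean disease time, all infected: {mean_all:.2f}")
print(f"mean disease time, recorded:     {mean_recorded:.2f}")
print(f"longest recorded disease time among late infections: {max(late_recorded):.2f}")
```

Subjects infected after time 8 can only be recorded with a disease time of at most 2, which is the late-infection effect described in the paragraph above.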
If left-truncation and right-truncation occur simultaneously, we have interval truncated data. A more complicated scenario arises when both censoring and truncation occur. It is more complicated because the censoring variable and the truncation variables are usually correlated, which makes the analysis difficult. For example, the end of recruitment can cause both a potential right-truncation and a potential right-censoring, or the censoring time (which could be the last follow-up time) may fall a certain period of time after the truncation time (the end of recruitment). Another type of biased data is length-biased data, which can be viewed as a special case of truncated data. Some discussion of this is provided in later chapters.
Upload Date: November 20, 2016