Survival analysis and germination data: an overlooked connection

Survival analysis and germination data: an overlooked connection. r-bloggers July, 2019.

The background. Seed germination data describe the time until an event of interest occurs. In this sense, they are very similar to survival data, except that the event is different (and less sad): germination instead of death. Seed germination data are also similar to failure-time data, phenological data and time-to-remission data. The first point is: germination data are time-to-event data. You may wonder: what is special about time-to-event data? With few exceptions, all time-to-event data are affected by a particular form of uncertainty called ‘censoring’: the exact time of the event may not be precisely known. An example helps. Take a germination assay where we put, say, 100 seeds in a Petri dish and make daily inspections. At each inspection, we count the number of germinated seeds. In the end, what have we learnt about the germination time of each seed? We do not have a precise value, only an uncertainty interval. Consider three cases. If we find a germinated seed at the first inspection, we only know that germination took place before that inspection (left-censoring). If we find a germinated seed at the second inspection, we only know that germination took place somewhere between the first and the second inspection (interval-censoring). If we find an ungerminated seed at the end of the experiment, we only know that its germination time, if any, is longer than the duration of the experiment (right-censoring). Censoring adds uncertainty on top of other, more common sources, such as seed-to-seed variability or random errors in the manipulation process. Is censoring a problem? Yes, it is, although it is usually overlooked in seed science research.
I made this point in a recent review (Onofri et al., 2019) and I would like to come back to this issue here. The second point is that the analysis of data from germination assays should always account for censoring. Data analyses for germination assays. A quick search of the literature shows that seed scientists are often interested in describing the time-course of germination for different plant species in different environmental conditions. In simple terms, if we take a population of seeds and give it enough water, the individual seeds will start germinating. Their germination times will differ, due to natural seed-to-seed variability; therefore, the proportion of germinated seeds will increase progressively and monotonically over time. However, this proportion will almost never reach 1, because there will often be a fraction of seeds that does not germinate in the given conditions, being either dormant or nonviable. To describe this progress to germination, a log-logistic function is often used: \[ G(t) = \frac{d}{1 + \exp\left\{ -b \left[ \log(t) - \log(e) \right] \right\}} \] where \(G\) is the fraction of germinated seeds at time \(t\), \(d\) is the germinable fraction, \(e\) is the median germination time for the germinable fraction and \(b\) is the slope around the inflection point. The above model is sigmoidally shaped and symmetric on a log-time scale. The three parameters are biologically relevant, as they describe the three main features of seed germination: capability (\(d\)), speed (\(e\)) and uniformity (\(b\)).
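As a rough illustration of the log-logistic curve described above, here is a minimal Python sketch (the post itself works in R). The function name `log_logistic_germination` and the parameter values are assumptions made for this example only. It evaluates the germinated fraction at a few times and shows that the curve passes through half of the germinable fraction at the median time and levels off below 1.

```python
import math


def log_logistic_germination(t, d, e, b):
    """Germinated fraction G(t) = d / (1 + exp(-b * (log(t) - log(e))))."""
    if t <= 0:
        return 0.0  # nothing has germinated at or before time zero
    return d / (1.0 + math.exp(-b * (math.log(t) - math.log(e))))


# Illustrative parameter values (assumptions for this sketch):
# germinable fraction, median germination time in days, slope.
d, e, b = 0.85, 4.5, 1.6

for day in (1, 2, 4.5, 10, 30):
    fraction = log_logistic_germination(day, d, e, b)
    print(f"day {day:>4}: G = {fraction:.3f}")
```

At the median time the exponent is zero, so the curve equals d / 2; for large times it approaches d rather than 1, which is how the model accounts for the dormant or nonviable fraction.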
My third point in this post is that the analysis of germination data is often based on fitting a log-logistic (or similar) model to the observed counts. Motivating example: a simulated dataset. Considering the above, we can simulate the results of a germination assay. Let’s take a 100-seed sample from a population with 85% germinable seeds (\(d = 0.85\)), a median germination time \(e = 4.5\) days and \(b = 1.6\). Obviously, this sample will not necessarily reflect the characteristics of the population. We can do this sampling in R, using a three-step approach. Step 1: the ungerminated fraction. First, let’s simulate the number of germinated seeds, assuming a binomial distribution with a proportion of successes equal to 0.85. We use the random number generator ‘rbinom()’: #Monte Carlo simulation - Step 1 d
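The abstract is cut off before the R code appears, so the original listing is not reproduced here. As a hedged stand-in, the binomial draw of Step 1 can be sketched in Python with the standard library only. This is an analogue of what a call to ‘rbinom()’ would do, not the author's code; the function name `simulate_germinable_count` and the seed are assumptions for illustration.

```python
import random


def simulate_germinable_count(n_seeds, d, rng):
    """Draw the number of germinable seeds from Binomial(n_seeds, d)."""
    # A binomial draw is the number of successes in n_seeds
    # independent Bernoulli trials, each with success probability d.
    return sum(rng.random() < d for _ in range(n_seeds))


rng = random.Random(1234)  # arbitrary seed, only for reproducibility
n_germinable = simulate_germinable_count(100, 0.85, rng)
n_ungerminated = 100 - n_germinable
print(n_germinable, n_ungerminated)
```

The drawn count will usually differ from 85, which is the point made above: a sample of 100 seeds does not necessarily reflect the population value of 0.85.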

@misc{r-bloggers2019Survival,
	title = {Survival analysis and germination data: an overlooked connection},
	shorttitle = {Survival analysis and germination data},
	url = {https://www.r-bloggers.com/survival-analysis-and-germination-data-an-overlooked-connection/},
	abstract = {The background. Seed germination data describe the time until an event of interest occurs. In this sense, they are very similar to survival data, except that the event is different (and less sad): germination instead of death. Seed germination data are also similar to failure-time data, phenological data and time-to-remission data. The first point is: germination data are time-to-event data. You may wonder: what is special about time-to-event data? With few exceptions, all time-to-event data are affected by a particular form of uncertainty called ‘censoring’: the exact time of the event may not be precisely known. An example helps. Take a germination assay where we put, say, 100 seeds in a Petri dish and make daily inspections. At each inspection, we count the number of germinated seeds. In the end, what have we learnt about the germination time of each seed? We do not have a precise value, only an uncertainty interval. Consider three cases. If we find a germinated seed at the first inspection, we only know that germination took place before that inspection (left-censoring). If we find a germinated seed at the second inspection, we only know that germination took place somewhere between the first and the second inspection (interval-censoring). If we find an ungerminated seed at the end of the experiment, we only know that its germination time, if any, is longer than the duration of the experiment (right-censoring). Censoring adds uncertainty on top of other, more common sources, such as seed-to-seed variability or random errors in the manipulation process. Is censoring a problem? Yes, it is, although it is usually overlooked in seed science research. I made this point in a recent review (Onofri et al., 2019) and I would like to come back to this issue here. The second point is that the analysis of data from germination assays should always account for censoring. Data analyses for germination assays. A quick search of the literature shows that seed scientists are often interested in describing the time-course of germination for different plant species in different environmental conditions. In simple terms, if we take a population of seeds and give it enough water, the individual seeds will start germinating. Their germination times will differ, due to natural seed-to-seed variability; therefore, the proportion of germinated seeds will increase progressively and monotonically over time. However, this proportion will almost never reach 1, because there will often be a fraction of seeds that does not germinate in the given conditions, being either dormant or nonviable. To describe this progress to germination, a log-logistic function is often used: \[ G(t) = \frac{d}{1 + \exp\left\{ -b \left[ \log(t) - \log(e) \right] \right\}} \] where \(G\) is the fraction of germinated seeds at time \(t\), \(d\) is the germinable fraction, \(e\) is the median germination time for the germinable fraction and \(b\) is the slope around the inflection point. The above model is sigmoidally shaped and symmetric on a log-time scale. The three parameters are biologically relevant, as they describe the three main features of seed germination: capability (\(d\)), speed (\(e\)) and uniformity (\(b\)). My third point in this post is that the analysis of germination data is often based on fitting a log-logistic (or similar) model to the observed counts. Motivating example: a simulated dataset. Considering the above, we can simulate the results of a germination assay. Let’s take a 100-seed sample from a population with 85\% germinable seeds (\(d = 0.85\)), a median germination time \(e = 4.5\) days and \(b = 1.6\). Obviously, this sample will not necessarily reflect the characteristics of the population. We can do this sampling in R, using a three-step approach. Step 1: the ungerminated fraction. First, let’s simulate the number of germinated seeds, assuming a binomial distribution with a proportion of successes equal to 0.85. We use the random number generator ‘rbinom()’: \#Monte Carlo simulation - Step 1 d},
	language = {en-US},
	urldate = {2019-07-08},
	journal = {R-bloggers},
	author = {r-bloggers},
	month = jul,
	year = {2019},
	keywords = {leer}
}
