University of Arizona
College of Engineering
Client: Sayyed Mohsen Vazirizade (University of Arizona – Department of Civil Engineering)
Consultants: Elmira, Haozhe, Samir, Amy (author)
12 November 2019, 3-4:30pm
A. Overview of Client’s Problem
Sayyed is estimating the distribution of significant wave heights using only the histogram of those wave heights. Based on prior knowledge, he fitted a three-parameter Weibull distribution and estimated the parameters by least squares (LS), maximum likelihood estimation (MLE), and the method of moments (MM). To determine which of the estimated distributions is the best, he used the p-values from the goodness-of-fit test, RMSE, and the likelihood ratios. He wants to know whether there are better methods to 1) estimate the parameters of the Weibull distribution and 2) compare the performance of the different density estimates.
1) Data details
The only available data is a histogram of the counts of significant wave heights, which are the top 1/3 of the maximum wave heights, in a year. The data are grouped into bins. The histogram shows that the distribution is skewed to the right, and the heavy tail is an important feature of the distribution (but is not of interest for now).
2) Background information
There is no standard practice in the literature for working only with the information from a histogram of the data. He obtained the LS, MLE, and MM estimators based on the CDF of the histogram, not the PDF. The LS estimators were obtained by minimizing the distance between the fitted Weibull distribution and the end point of each bin in the histogram.
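The idea of fitting the Weibull CDF at the bin end points can be sketched as follows. This is a minimal illustration, not necessarily Sayyed's procedure: it swaps in the linearized form of the CDF, ln(-ln(1 - F)) = beta*ln(x - gamma) - beta*ln(alpha), so that ordinary least squares applies, and it holds the shift gamma fixed. The roles of alpha (scale) and beta (shape), the function names, and the bin counts in the usage lines are all assumptions made for the example.

```python
import math

def weibull_cdf(x, alpha, beta, gamma=0.0):
    """CDF of a Weibull with scale alpha, shape beta, and shift gamma (assumed roles)."""
    if x <= gamma:
        return 0.0
    return 1.0 - math.exp(-(((x - gamma) / alpha) ** beta))

def ls_fit_cdf(edges, counts, gamma=0.0):
    """Least squares on the linearized CDF at the right end point of each bin.

    Regresses ln(-ln(1 - F)) on ln(x - gamma); the slope estimates beta and
    the intercept equals -beta * ln(alpha).
    """
    total = sum(counts)
    xs, ys, cum = [], [], 0
    # The last bin is skipped: its empirical CDF is 1, which cannot be linearized.
    for right_edge, count in zip(edges[1:-1], counts[:-1]):
        cum += count
        frac = cum / total
        if 0.0 < frac < 1.0 and right_edge > gamma:
            xs.append(math.log(right_edge - gamma))
            ys.append(math.log(-math.log(1.0 - frac)))
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = math.exp(x_bar - y_bar / beta)
    return alpha, beta

# Hypothetical 9-bin histogram (made-up counts, NOT the client's data).
edges = [0.5 * i for i in range(10)]
counts = [120, 310, 280, 160, 80, 30, 12, 6, 2]
alpha_hat, beta_hat = ls_fit_cdf(edges, counts, gamma=0.0)
print(alpha_hat, beta_hat)
```

Because the regression is on a transformed scale, its estimates can differ from those obtained by minimizing the distance on the CDF scale directly, so the two should not be treated as interchangeable.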
3) Outcome of interest
- Alternative methods to estimate the parameters
- Performance metrics to determine which of the estimated densities, each with a different set of parameters, is the best
IV. Next steps
1) Our suggestions
- Simulation study: To validate his approaches (LS, MLE, and MM with the CDF of a histogram) and to generalize his methods, we suggested generating samples from the Weibull distribution with different sets of parameters. Using the generated samples, he can create histograms and apply his methods to see whether they work in general, not just for this specific dataset.
- Reliability of the p-values from the goodness-of-fit test: Since there are only 9 bins in the histogram, which is like a test with 9 observations, we advised him not to rely heavily on the p-values from the goodness-of-fit test. For a small sample, a Bayesian approach, which assumes a prior distribution for the parameters based on background knowledge, might be useful. However, according to Sayyed, the prior was assumed to be uniform for one of the parameters, which adds no information. If that is the case, the Bayesian approach might not be useful.
- We suggested starting with the simpler two-parameter case, excluding the third parameter, gamma, which accounts for the shift of the Weibull distribution. If he wants to use all three parameters, he could fix gamma and, conditional on that value of gamma, estimate the other two parameters, alpha and beta.
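A simulation study along these lines might look like the sketch below. It is only an illustration under stated assumptions: alpha is taken to be the scale and beta the shape, gamma is held fixed at its true value, the histogram uses 9 equal-width bins, and the estimator is a simple method of moments computed from bin midpoints, which may differ from the CDF-based estimators Sayyed used. All function names are hypothetical; any other estimator could be plugged into the same loop.

```python
import math
import random

def make_histogram(samples, edges):
    """Count samples per bin [edges[i], edges[i+1]); values outside are dropped."""
    counts = [0] * (len(edges) - 1)
    for s in samples:
        for i in range(len(counts)):
            if edges[i] <= s < edges[i + 1]:
                counts[i] += 1
                break
    return counts

def mm_fit_from_histogram(edges, counts, gamma=0.0):
    """Method of moments from bin midpoints, with the shift gamma held fixed."""
    total = sum(counts)
    mids = [(a + b) / 2.0 for a, b in zip(edges[:-1], edges[1:])]
    mean = sum(m * c for m, c in zip(mids, counts)) / total
    var = sum(c * (m - mean) ** 2 for m, c in zip(mids, counts)) / total
    cv2 = var / (mean - gamma) ** 2  # squared coefficient of variation

    def cv2_of(beta):
        g1 = math.gamma(1.0 + 1.0 / beta)
        g2 = math.gamma(1.0 + 2.0 / beta)
        return g2 / g1 ** 2 - 1.0

    # Bisection on the shape: the Weibull CV decreases as beta increases.
    lo, hi = 0.2, 20.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if cv2_of(mid) > cv2:
            lo = mid
        else:
            hi = mid
    beta = (lo + hi) / 2.0
    alpha = (mean - gamma) / math.gamma(1.0 + 1.0 / beta)
    return alpha, beta

def simulate(alpha, beta, gamma, n_samples, n_bins, n_reps, seed=0):
    """Average the histogram-based estimates over repeated simulated datasets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        samples = [gamma + rng.weibullvariate(alpha, beta) for _ in range(n_samples)]
        low, high = min(samples), max(samples)
        width = (high - low) / n_bins
        # Nudge the last edge so the largest sample falls inside the final bin.
        edges = [low + i * width for i in range(n_bins)] + [high + 1e-9]
        counts = make_histogram(samples, edges)
        estimates.append(mm_fit_from_histogram(edges, counts, gamma))
    mean_alpha = sum(e[0] for e in estimates) / n_reps
    mean_beta = sum(e[1] for e in estimates) / n_reps
    return mean_alpha, mean_beta

# Compare the averaged estimates with the true values (alpha=2.0, beta=1.5).
print(simulate(2.0, 1.5, 0.0, n_samples=5000, n_bins=9, n_reps=20))
```

Repeating this over several parameter sets and bin counts would show how far each estimator drifts from the truth. With only 9 bins, midpoint-based moments can be noticeably biased, which is the kind of effect the simulation is meant to expose.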
2) Things to be discussed in class
- What other methods are available for parameter estimation?
- Are there methods other than the goodness-of-fit test? How do we measure whether a density estimate is good? We need a method to determine which set of parameters is the best.