University of Arizona
The data come from real-time air monitors that measure particulate matter (PM). Each monitor is placed at a home for 24 hours and logs at 1-minute intervals. Homes in this study are sampled for 24 hours at two time points, once during the summer and again during the winter. At each household visit, participants complete a questionnaire about activities during the 24-hour sampling period that may contribute to elevated indoor PM concentrations. Additionally, household characteristics are collected from technician walkthroughs.
The goal is to identify household characteristics and behavioral factors that are associated with indoor PM concentrations across two seasons (winter and summer). We may use linear mixed effects models, with the 24-hour average indoor PM concentration as the outcome and household fuel type as the main explanatory variable.
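One way to write down the kind of linear mixed effects model described above is sketched here. The notation is an assumption made for illustration and is not the client's specification: $i$ indexes homes, $j$ indexes the two seasonal visits, $Y_{ij}$ is the 24-hour average indoor PM concentration, $\text{Fuel}_i$ is the household fuel type, $\text{Winter}_{ij}$ is a 0/1 season indicator, and $\mathbf{x}_{ij}$ holds any other household or behavioral covariates. The random intercept $b_i$ accounts for the two visits to the same home being correlated.

```latex
\begin{align}
  Y_{ij} &= \beta_0 + \beta_1\,\text{Fuel}_i + \beta_2\,\text{Winter}_{ij}
            + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + b_i + \varepsilon_{ij}, \\
  b_i &\sim N(0, \sigma_b^2), \qquad
  \varepsilon_{ij} \sim N(0, \sigma^2), \qquad
  j \in \{\text{summer}, \text{winter}\}.
\end{align}
```

If fuel type has more than two categories, $\beta_1\,\text{Fuel}_i$ would stand for a set of dummy-variable terms rather than a single coefficient.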
We are collecting a large number of variables. However, some monitors do not run for the entire 24 hours (some run for only 6, 10, or 14 hours), so we need to decide whether to include or exclude these homes. Additionally, some questionnaire data are missing, and some homes withdrew from the study (sampled at only 1 of 2 time points). We are also interested in exploring methods for imputing missing monitoring and questionnaire data.
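The inclusion question for short monitor runs can be made concrete with a small sketch. It assumes each run is stored as a list of 1-minute PM readings, and it uses a hypothetical 75% coverage cutoff (`MIN_COVERAGE`) that is not taken from the study; the function name and the example values are likewise made up for illustration.

```python
from statistics import mean

MINUTES_PER_DAY = 24 * 60
MIN_COVERAGE = 0.75  # hypothetical cutoff, NOT a value from the study


def summarize_run(readings):
    """Summarize one monitor run.

    readings: list of 1-minute PM values, with None where nothing was logged.
    Returns the fraction of the 24-hour period covered, the mean of the
    logged values, and whether the run meets the assumed coverage cutoff.
    """
    valid = [r for r in readings if r is not None]
    coverage = len(valid) / MINUTES_PER_DAY
    mean_pm = mean(valid) if valid else None
    return {
        "coverage": coverage,
        "mean_pm": mean_pm,
        "include": coverage >= MIN_COVERAGE,
    }


# Made-up example runs: one full 24-hour run and one 6-hour run.
full_run = [10.0] * MINUTES_PER_DAY
short_run = [20.0] * (6 * 60)

print(summarize_run(full_run))
print(summarize_run(short_run))
```

Note that the mean of a partial run is not directly comparable to a true 24-hour average if PM varies by time of day, so the cutoff alone does not settle the question.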
Is a linear mixed effects model an appropriate method for analyzing these data? What about CART or PCA?
What variable selection method can we use for building a multivariable model? We currently have 38 independent variables.
What should we do with homes that have missing or incomplete monitoring and questionnaire data? Which imputation method is best? Should we exclude cases with incomplete or missing data (which would severely reduce the sample size)?
If we are interested in seasonal differences in the household and behavioral factors related to indoor PM, can we split the data into 2 separate datasets, one for summer and one for winter, and then identify the factors associated with indoor PM within each season?
Should we run the analysis only with homes that have complete data and completed both sampling visits?
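To see how much a complete-case restriction would cost in sample size, a quick tally along the following lines may help. The record layout (`home_id`, season, and two completeness flags) and the example rows are hypothetical, not the study's data.

```python
from collections import defaultdict

# Hypothetical visit records: (home_id, season, monitor_ok, questionnaire_ok)
visits = [
    ("H01", "summer", True, True),
    ("H01", "winter", True, True),
    ("H02", "summer", True, False),  # questionnaire incomplete
    ("H02", "winter", True, True),
    ("H03", "summer", True, True),   # withdrew before the winter visit
]


def complete_case_homes(visits):
    """Return homes with complete monitoring and questionnaire data
    at both the summer and the winter visit."""
    usable = defaultdict(set)
    for home, season, monitor_ok, questionnaire_ok in visits:
        if monitor_ok and questionnaire_ok:
            usable[home].add(season)
    return sorted(h for h, s in usable.items() if s == {"summer", "winter"})


all_homes = sorted({v[0] for v in visits})
kept = complete_case_homes(visits)
print(f"{len(kept)} of {len(all_homes)} homes kept: {kept}")
```

A mixed effects model can still use a home that has only one usable visit, so this count is a lower bound on the usable sample rather than a requirement.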
The data are tribal data, so I need to ask the PI and the tribe for permission to send them to you. I will follow up later this week if I am able to provide the data.
Client: Steven Hadeed
Consultants: Lisa, Drew (author), Andy, Dave
II. When: September 16, 3:30-4:30
Client's Problem: The client is studying household PM2.5 levels and their relation to heating fuel type in Northern Arizona. Some PM measurements are missing, and the client is interested in methods for handling missing values in the outdoor PM concentrations. The meeting focused mainly on whether imputation is appropriate for this study. Further discussion is required on the recommended model and the variables to include in it.
B) Data Collection:
Data are collected in Northern Arizona at homes in both plateau locations (i.e., isolated) and more densely packed villages. PM measurements are recorded twice per year (once in summer and once in winter) by air monitors placed at two locations, one inside the house and one outside. These monitors record PM at one-minute intervals. Data are also collected through a technician walkthrough of the home. Lastly, the residents complete a survey about their activities inside the house. We are currently unsure whether all homes are sampled at the same time or at different time points within summer and winter.
The missing data are thought to be caused by failure of the PM air monitors at low temperatures, which means the values are not missing at random. A second meeting has been set to discuss whether the client sampled the houses at the same time or at different times, and how the client can deal with the missing data. The meeting also addressed how to compare seasons and their effects on PM measurements. Including a seasonal indicator variable (1 = winter, 0 = summer) was discussed as a way to assess the effect of time of year. However, heating fuel is not used during the summer, so we have left the discussion of this variable until the next meeting.
Next Steps: The consulting team will meet again with Steve, bringing imputed data for the missing PM values. The team will then compare models fit to the imputed and non-imputed data sets and discuss variable selection.