Erin Jonaitis
Applied statistician at UW-Madison, interested in longitudinal data analysis, reproducible workflows, and Alzheimer's disease. Views mine; all my errors are independent.
Posts
I'm analyzing Medicare data -- my first real experience with a large dataset, where the number of observations of interest to me is in the millions. We have repeated measures/clusters to worry about, each ranging from 2 to 10 observations, give or take.
I'm struggling with performance issues in pretty much every approach I take to this dataset. One outcome of interest is a proportion. The zoib package (MCMC-based zero/one-inflated beta regression) is painfully slow, even when I take a (stratified) random sample of 2% of rows -- in an hour it's only 4% done fitting my null model. Boundary values (exact 0s and 1s) are common in the data, which rules out "transform and just use lmer."
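For concreteness, here's the shape of the model I'm after, as a minimal sketch. It uses glmmTMB's ordbeta family (ordered beta regression) as one possible maximum-likelihood alternative that tolerates exact 0s and 1s; the data are simulated and all variable names are hypothetical stand-ins for the real Medicare fields:

```r
## Minimal sketch: a mixed-effects ordered beta regression fit by maximum
## likelihood with glmmTMB (requires glmmTMB >= 1.1.5 for ordbeta()).
## ML fitting is typically far faster than zoib's MCMC, and the ordered
## beta family puts probability mass on exact 0s and 1s directly.
library(glmmTMB)

set.seed(1)
n_clusters <- 500
obs_per    <- sample(2:10, n_clusters, replace = TRUE)  # 2-10 obs per cluster
d <- data.frame(
  id = rep(seq_len(n_clusters), times = obs_per),  # cluster identifier
  x  = rnorm(sum(obs_per))                         # hypothetical covariate
)

## Proportion outcome with boundary values, as in the real data
d$y <- plogis(0.5 * d$x + rnorm(nrow(d), sd = 0.5))
d$y[sample(nrow(d), 50)] <- 0
d$y[sample(nrow(d), 50)] <- 1

## Random intercept per cluster handles the repeated measures
fit <- glmmTMB(y ~ x + (1 | id), family = ordbeta(), data = d)
summary(fit)
```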
What general tools are available for modeling bigger datasets in R? Because of data privacy agreements I'm required to do all of the computing on-prem, so unfortunately I don't think I can take advantage of high-throughput computing on other servers, even if that were workable here.
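In case it matters, here's a sketch of how a 2% stratified sample can be drawn quickly in R using data.table; `stratum` is a hypothetical stand-in for the real stratification variable:

```r
## Sketch: stratified 2% row sample with data.table (fast on large tables).
library(data.table)

dt <- data.table(
  stratum = sample(LETTERS[1:5], 1e6, replace = TRUE),  # hypothetical strata
  y       = runif(1e6)                                  # hypothetical outcome
)

## Within each stratum, keep a 2% simple random sample of rows
samp <- dt[, .SD[sample(.N, ceiling(0.02 * .N))], by = stratum]
```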
Another GitHub migration question, this time for the scholcomm people and librarians:
I've got a repository associated with a publication. I'd like to do my due diligence to keep it out of large AI models as much as I can, while also keeping it available to any researchers who care to poke more deeply into what we did. (Yes, I recognize that these are in some sense incompatible goods.) Has anyone developed a standard workflow for removing a repository from GitHub while leaving a forwarding address?