Dylan Pieper
@dylanpieper.bsky.social
226
1604
95
Data scientist @ Pitt • Dog dad • Pilot •
#rstats
•
https://dylanpieper.github.io
reposted by
Dylan Pieper
about 2 months ago
Stretching DuckDB w/ Common Crawl, ~1.7B rows, ~300 parquet files. ~2-3s for single-column aggregations, ~2-3 mins to SUMMARIZE the data, peaking at ~12-14GB memory usage. Not exactly real-time, but the fact you can do this on a laptop with no server setups or Spark pipelines is still amazing.
1
44
10
reposted by
Dylan Pieper
Libby Heeren
2 months ago
Remember this
#rstats
post? I wasn't the only one talking about it & the tidyverse team was listening
#databs
New
#dplyr
functions? They're looking for feedback!! replace_when, recode_values, replace_values. Read this:
github.com/tidyverse/ti...
Comment on PR:
github.com/tidyverse/ti...
4
63
18
reposted by
Dylan Pieper
Nate O
2 months ago
I think pedocon theory is right. It's empirically adequate, parsimonious, fits within a broader theoretical framework, and has immense explanatory breadth and depth
www.liberalcurrents.com/we-need-to-t...
We Need to Talk About Pedocon Theory
The connection between Donald Trump and Jeffrey Epstein is no accident, but reveals a deep logic at the heart of reactionary politics.
https://www.liberalcurrents.com/we-need-to-talk-about-pedocon-theory/
2
73
28
reposted by
Dylan Pieper
Hadley Wickham
3 months ago
I am such a sucker for frivolous uses of AI. Here's an anthem for the tidyverse:
suno.com/s/iVMVs4IoyA...
https://suno.com/s/iVMVs4IoyAXEoZMo
3
9
3
reposted by
Dylan Pieper
Crystal Lewis
3 months ago
Very cool to see authors of this article mentioning the importance of sharing project-, data-, AND variable-level documentation alongside data in a repository, and linking to the templates I've provided on OSF as an example!
doi.org/10.1515/ling...
2
29
8
reposted by
Dylan Pieper
Crystal Lewis
3 months ago
As a data manager, good documentation not only helps me do my job better, but also helps me annoy you less! Good documentation about inclusion criteria, READMEs about oddities in the data, consort diagrams and tracking to explain missing data, and so on, are all ways to ensure I bug you less!
0
22
3
reposted by
Dylan Pieper
Hadley Wickham
4 months ago
New to me is the term "premature closure", where you too quickly latch on to the first solution you see. Always a danger in coding, but particularly so today when LLMs can give you a plausible fix so so quickly.
www.shayon.dev/post/2025/16...
Pitfalls of premature closure with LLM assisted coding
When LLM models generates clean, professional-looking code, it's tempting to stop exploring alternatives. But therein lies the risks that comes with premature closure. So what is premature closure?
https://www.shayon.dev/post/2025/164/pitfalls-of-premature-closure-with-llm-assisted-coding/
7
99
19
reposted by
Dylan Pieper
Charlie Gao
4 months ago
Bleeding edge update for the
#tidyverse
purrr package with even more seamless
#rstats
parallel maps. Introducing our shiniest new adverb: `in_parallel()`. Just wrap your function to take advantage of blazing fast parallel processing via mirai. pak::pak("tidyverse/purrr")
purrr.tidyverse.org/dev/
Functional Programming Tools
A complete and consistent functional programming toolkit for R.
https://purrr.tidyverse.org/dev/
6
103
33
reposted by
Dylan Pieper
Vincent Arel-Bundock
4 months ago
One cool thing you can/should do is sample from priors only, and plot the distribution of the actual quantity of interest (ex: risk ratio). I find this very useful. This is actually super easy with brms.
arelbundock.com/posts/margin...
Prior Predictive Checks with marginaleffects and brms - Vincent Arel-Bundock
https://arelbundock.com/posts/marginaleffects_priors/index.html
1
21
2
reposted by
Dylan Pieper
JD Long
4 months ago
This blog post about engineering not doing ETL is nine years old… it's worth reviewing
multithreaded.stitchfix.com/blog/2016/03...
Engineers Shouldn't Write ETL: A Guide to Building a High Functioning Data Science Department | Stitch Fix Technology - Multithreaded
"What is the relationship like between your team and the data scientists?" This is, without a doubt, the question I'm most frequently asked when conducting i...
https://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/
3
20
5
reposted by
Dylan Pieper
Alex Kraieski
4 months ago
Here's a functional programming trick for
#rstats
that I wish I started using sooner: if you need a
#ggplot2
scale to be reusable across multiple plots and dynamically configurable without relying on global state, consider using a function factory (a function that returns a function) to build it
6
36
6
reposted by
Dylan Pieper
Charlie Gao
5 months ago
mirai - minimalist async framework for
#RStats
- released as an 'r-lib' package. Blog post: Advancing Async Computing in R.
shikokuchuo.net/posts/26-mir...
mirai provides event-driven async for
#RShiny
and parallel processing for purrr
#tidyverse
. Really excited to be working on this at Posit!
shikokuchuo{net}: mirai 2.3.0
Advancing Async Computing in R
https://shikokuchuo.net/posts/26-mirai-230/
0
64
19
reposted by
Dylan Pieper
Carl T. Bergstrom
5 months ago
tl;dr - this EO co-opts the language of open science to implement a system of political control wherein presidential appointees are given broad latitude to designate any number of reasonable scientific activities and inferences as scientific misconduct, and to penalize those involved accordingly.
Restoring Gold Standard Science
By the authority vested in me as President by the Constitution and the laws of the United States of America, including section 7301 of title 5, United
https://www.whitehouse.gov/presidential-actions/2025/05/restoring-gold-standard-science/
102
2485
1177
reposted by
Dylan Pieper
Kevin Zollman
5 months ago
There's so much polarization around LLMs. They are way overhyped, I agree. But I also use them semi-regularly now. Here's a thread of genuine use cases where I find them helpful. Please add your own!
7
92
35
reposted by
Dylan Pieper
Jasmine Daly
5 months ago
I'm excited to share a new
#rstats
package I've been working on: {shinyfa} built to help folks working on large or unfamiliar
#rshiny
apps ✨ The package scans your app folders and extracts out details on render*(), reactive() and input$ to a dataframe!
www.dalyanalytics.com/blog/shinyfa...
Introducing {shinyfa}: Analyze Large Shiny App Codebases Faster with This R Package | Daly Analytics
Discover {shinyfa}, a new R package designed to improve developer experience by analyzing and summarizing the structure of large Shiny applications. Perfect for consultants, teams, and contributors wo...
https://www.dalyanalytics.com/blog/shinyfa-announcement
2
13
3
reposted by
Dylan Pieper
Michael Howe
5 months ago
Playing around with satellite imagery of
#madison
to make some office art.
#Rstats
0
5
1
reposted by
Dylan Pieper
Hadley Wickham
5 months ago
✨Use llms from
#rstats
with ellmer ✨ Version 0.2.0 is on CRAN now. No blog post yet because I'm about to go on vacation, but in the meantime you can check out the release notes:
github.com/tidyverse/el...
.
https://github.com/tidyverse/ellmer/blob/main/NEWS.md#ellmer-020
3
69
14
reposted by
Dylan Pieper
Crystal Lewis
5 months ago
The kind of Friday morning content I needed to see. ❤️
0
14
1
reposted by
Dylan Pieper
Posit
5 months ago
Registration for the posit::conf(2025) virtual experience is now open! Join us virtually, Sept 16–18, and access live-streamed keynotes and 100+ talks, on-demand recordings, Q&A sessions, and our virtual networking platform. Learn more in the blog post:
posit.co/blog/posit-c...
#RStats
#Python
1
21
19
reposted by
Dylan Pieper
easystats
5 months ago
In case you missed it, we recently updated some of our packages, including many new features (again) in the
#rstats
#easystats
{modelbased} package:
easystats.github.io/modelbased/n...
The last weeks we were working a lot on improving support and performance for Bayesian models and especially
Changelog
https://easystats.github.io/modelbased/news/index.html
1
14
5
reposted by
Dylan Pieper
Crystal Lewis
5 months ago
I'm still thinking about my favorite quote from the Posit Data Science Hangout today. It perfectly sums up what I hope I provide to the researchers I work with: a trusted partner, who is there to support them in their work. "Earn a reputation for being a good person to work with" - Cara Thompson
0
25
5
reposted by
Dylan Pieper
Terry Christiani
5 months ago
Great news! R/Medicine 2025 is providing a forum for sharing R based tools and approaches used to analyze and gain insights from health data. Join us for the premier R conference for health and medicine. Register today:
rconsortium.github.io/RMedicine_we...
#rstats
#opensource
#RMed25
register - R/Medicine 2025
https://rconsortium.github.io/RMedicine_website/Register.html
0
10
8
reposted by
Dylan Pieper
David Ho
5 months ago
I think a lot about what Carl Sagan said in one of his final interviews.
251
18636
6629
I'm happy to share that I'll be giving a talk at R/Medicine 2025! I work with a BIG REDCap database for substance use treatment (200+ locations) which makes extraction difficult. I developed {redquack}, an
#rstats
📦 that transfers REDCap data to DuckDB, and will talk about how to use it.
5 months ago
1
10
3
🕵️‍♂️
#rstats
community! Do you sometimes feel like you're just pretending to be a data scientist? I'm researching imposter syndrome for my upcoming talk at posit::conf(2025). I'd love to hear YOUR experiences in a short 5-10 minute anonymous survey:
forms.gle/YkJtwZWquyKM...
Please share!
Imposter Syndrome in Data Science
This survey is intended to gather community feedback from data scientists and students or recent graduates interested in data science as a career. Your responses will be anonymous and may be used for...
https://forms.gle/YkJtwZWquyKM84P67
5 months ago
3
3
7
reposted by
Dylan Pieper
Andrew Heiss
5 months ago
Since it's in Atlanta, I'll be here at my first posit::conf! I'll be speaking here with
@gosterhout.bsky.social
about election night reporting with
#rstats
and
#QuartoPub
(showing off {targets} and other neat tricks like this:
www.andrewheiss.com/blog/2024/11...
)
1
46
4
reposted by
Dylan Pieper
Gabe Osterhout
11 months ago
States spend too much on clunky election night reporting. We just replaced ours using
#rstats
. dbplyr backend + reactable & leaflet viz +
#quartopub
site. Real magic happened with programmatic code chunks & targets pipeline done by
@andrew.heiss.phd
.
#dataviz
results.voteidaho.gov
4
46
11
reposted by
Dylan Pieper
Julia M. Rohrer
5 months ago
Another q for the stats people! People worry about collinearity (cf blog post below). Consider a scenario in which the collinear predictors are just controls to account for confounding. Including both of them doesn't impair the precision with which the effect of interest is estimated, does it?
Jan Vanhove :: Blog - Collinearity isn't a disease that needs curing
https://janhove.github.io/posts/2019-09-11-collinearity/
14
90
27
reposted by
Dylan Pieper
Hadley Wickham
6 months ago
Happy reinstalling-all-your-R-packages day to all those who celebrate
#rstats
12
165
26
reposted by
Dylan Pieper
Vincent Arel-Bundock
6 months ago
Hive mind, please help me out! I need a more informative and explicit subtitle for my upcoming
#RStats
book "Model to Meaning" The premise is that analysts should often transform coefficient estimates into more meaningful / interpretable quantities like predictions, risk differences, slopes, etc.
23
86
19
reposted by
Dylan Pieper
Adam H. Smiley
6 months ago
My amazing independent study student wrote me a thank you card and drew this laptop with
#rstats
code on it 🥹
0
23
5
reposted by
Dylan Pieper
Crystal Lewis
6 months ago
When planning for data collection, especially in longitudinal studies, first consider how that data will be used. Ask yourself: - How will we combine data for analysis? - What unique IDs will allow us to do this? - How will we name/code items to combine data? - Will our data need restructuring?
1
31
6
reposted by
Dylan Pieper
Norm Matloff
6 months ago
Recently I posted a draft of my essay on banishing p-values/NHST. I've now refined and expanded it, partly based on feedback I received here. Please take a look and comment,
matloff.github.io/No-P-Values
Redirect to NPV.html
https://matloff.github.io/No-P-Values
4
22
8
reposted by
Dylan Pieper
Hadley Wickham
6 months ago
If you're using LLMs to write R code,
@simonpcouch.com
's blog posts are the best way to keep up with which model is best
0
54
13
reposted by
Dylan Pieper
Stand Up for Science!
6 months ago
Childhood lead exposure can cause serious developmental delays, hearing loss, and behavioral problems. The CDC has long had the technology and the people power to efficiently and effectively mitigate lead exposure. RFK Jr. cut the program. What's more un-American than that?
#StandUpForScience
7
288
96
reposted by
Dylan Pieper
Giles
6 months ago
use() is a pretty cool addition!
1
2
2
It's a good day. I got my first suit and ran my first Stan (brms) model in the same day. Except the full model is still running lol.
6 months ago
0
1
0
reposted by
Dylan Pieper
Thomas Lin Pedersen
6 months ago
For the last couple of months I've been working on something and I'm excited to finally share an early preview: Say hello to plumber2! plumber2 is a full rewrite of the plumber package for creating powerful web APIs in
#rstats
. It takes everything we have learned from plumber and adds even more
What the Package Does (One Line, Title Case)
What the package does (one paragraph).
https://posit-dev.github.io/plumber2/
5
141
41
Visualizing git commits for ellmer using hellmer for batching with structured data refinement.
#rstats
gist.github.com/dylanpieper/...
6 months ago
0
4
1
reposted by
Dylan Pieper
Rafe Meager (they/them)
6 months ago
today we will all read imbens 2021 on statistical significance and p values, which is a strong contender for having the best opening paragraph of any stats paper
pubs.aeaweb.org/doi/pdf/10.1...
27
715
122
reposted by
Dylan Pieper
DerMann51
6 months ago
From Minnesota State Capitol today -
#HandsOff
@handle.invalid
@indivisiblemnleg.bsky.social
1
154
27
reposted by
Dylan Pieper
Weremuskrat John
6 months ago
Madison, Wisconsin, this afternoon: wow!
#Handsoff
#Protest
(Photo via
@captimes.com
)
8
412
90
reposted by
Dylan Pieper
Ben Wikler
6 months ago
Madison, Wisconsin
26
1934
484
reposted by
Dylan Pieper
Karl (sad trombone noise enthusiast)
6 months ago
Shot a bit on black and white today for the first time in five years and it felt good.
2
58
4
reposted by
Dylan Pieper
Ilya Kashnitsky
6 months ago
ADHD - Attention Deficit Hey Dude
0
4
1
reposted by
Dylan Pieper
Chris Brownlie
6 months ago
this week in
#rstats
- evaluating LLMs in R
@simonpcouch.com
- Observable JS for R users
@nrennie.bsky.social
- a new Docker pkg for R
@coatless.bsky.social
- using duckdb & duckplyr
@rorylawless.com
- R pkg risk & QA
@jumpingrivers.com
and more!
www.linkedin.com/pulse/week-r...
This week in R (2025-04-04)
How good is Gemini at R?; a new way to integrate Docker to your R work; Observable JS for R devs; getting the most out of DuckDB; R package QA, and more!
https://www.linkedin.com/pulse/week-r-2025-04-04-chris-brownlie-xbgre/
0
22
8
reposted by
Dylan Pieper
Katie Martin
6 months ago
Your tariff rate is your star sign divided by the number of boys you've kissed, multiplied by one
398
16392
2915
reposted by
Dylan Pieper
George Pearkes
6 months ago
Just did the math and yes the tariff rate is *exactly* (abs(trade_deficit))/us_imports for each country. Where this number is <10%, it's 10%. Literally every single country matches precisely.
25
2024
481
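The formula quoted in the post above can be written out as a minimal sketch. The function name, argument names, and example figures are hypothetical and chosen only for illustration; the only parts taken from the post are the ratio abs(trade_deficit) / us_imports and the 10% floor.

```python
def tariff_rate(trade_deficit: float, us_imports: float, floor: float = 0.10) -> float:
    """Rate as described in the post: abs(trade_deficit) / us_imports, floored at 10%."""
    if us_imports <= 0:
        raise ValueError("us_imports must be positive")
    return max(floor, abs(trade_deficit) / us_imports)


# Hypothetical figures for illustration only (any consistent currency unit).
print(tariff_rate(trade_deficit=-50.0, us_imports=100.0))  # 0.5
print(tariff_rate(trade_deficit=5.0, us_imports=100.0))    # 0.1 (the floor applies)
```

Taking the absolute value means the sign convention used for the deficit does not change the result.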
reposted by
Dylan Pieper
Jeff Stevens
6 months ago
Huge Neb-RUG announcement! Yihui Xie will introduce his new litedown package in the next Nebraska R User Group talk: Creating Beautiful PDFs via HTML/CSS/JS 1-2pm CDT Wednesday, April 9th Register at
www.meetup.com/neb-rug/even...
Learn more about Yihui in the thread! (1/2)
#RStats
#statsky
Creating Beautiful PDFs via HTML/CSS/JS (Yihui Xie), Wed, Apr 9, 2025, 1:00 PM | Meetup
**Talk summary** *The usual way of creating a PDF document is through a typesetting tool such as LaTeX or Word. LaTeX is well-known for its steep learning curve and high-qu
https://www.meetup.com/neb-rug/events/306923229
1
16
8
reposted by
Dylan Pieper
Nicola Rennie
6 months ago
We're looking at Pokemon data for
#TidyTuesday
this week! - Stream plot showing speed and colour using {ggstream} - Image added with {ggimage} - Aiming for a minimalist, fun, and arty chart. Code:
github.com/nrennie/tidy...
#RStats
#DataViz
#ggplot2
0
54
13