What is the impact of AI on productivity?
Reconciling the micro and the macro evidence
This post is intended as a living resource. I will update it periodically as new evidence accumulates; the current version reflects research available through March 2026. While I will keep adding studies and new productivity data as they come in, I will only update the front end of the post if the narrative begins to shift.
Update (March 2026): In the initial version of the post, I said that the micro productivity data was not yet showing up in the aggregate statistics, but that I expected that to change in the near term. I believe that the newest batch of aggregate data—which shows a large upward revision—is showing signs of AI productivity gains. Of course this is not the final word, but the trend is worth noting. Also noteworthy: Jason Furman now agrees with Erik Brynjolfsson that the aggregate productivity numbers reflect an AI productivity boost that had thus far been mostly documented in micro studies.
In this post I review and summarize the current literature on the productivity impact of AI. There has been a lot of debate in the news and on social media about whether AI is actually having the promised impact on productivity; if so, where it is showing up; and who is becoming more productive. This debate has become particularly urgent with the rollout of AI agents such as Claude Code, which seem able to automate a vast array of tasks (though it’s too early for these tools to show up in the numbers):
This post looks at the evidence we have so far, where “so far” will be updated continuously as new studies and data come in. For reasons that will become apparent, I divide the evidence into the “micro” and the “macro”. The micro evidence comes from controlled studies or natural experiments in which researchers can quantify the impact of AI adoption on a specific task or job. The macro evidence comes from analyses of observational data on aggregate economic indicators: productivity within or across sectors, heterogeneity in who uses AI in the labor market, and the employment impact of AI conditional on output. While the micro evidence should theoretically show up in the macro data—productivity gains on tasks should scale up to aggregate output gains—there are reasons why this may not happen. For example, participants in micro studies (“participants” is used broadly here) are often instructed to use the tool and given training on how to apply it effectively to the task at hand. Workers in real-world settings may face adoption frictions and may not know how to use the tools effectively, which would keep the benefits of AI from showing up in the aggregate data. I discuss other reasons for disconnects between the micro and the macro, e.g., Baumol’s cost disease, in the last section of the post.
Here is the summary of the evidence thus far: we now have a growing body of micro studies showing real productivity gains from generative AI. However, the productivity impact of AI has yet to clearly show up in the aggregate data. This disconnect should not be surprising at this stage given the history of technology adoption. In the case of the previous big tech shock (information technology), Robert Solow famously observed in 1987 that “you can see the computer age everywhere but in the productivity statistics.” It is likely that the same dynamics are showing up with AI, at least for now.
At the micro level, the evidence is mixed but leans heavily toward positive productivity effects. Studies find productivity gains ranging from modest increases on some tasks to substantial returns (50%+) to AI. There is also variation in who benefits from AI. While the literature has thus far mostly documented an equalizing effect—with less experienced or less skilled workers benefiting most—there are notable exceptions.
At the macro level, these gains have not yet convincingly shown up in aggregate productivity statistics. While some studies show a slowdown in hiring for AI-exposed jobs—which suggests that individual workers are either becoming more productive or tasks are being automated—the extent and timing of these dynamics are still being debated. Other studies have found no changes in hours worked or wages earned based on AI use. Notably, observational studies of who uses AI in the aggregate also paint a different picture than the micro studies. As pointed out by Kevin Bryan, the Anthropic Economic Index finds that AI usage is concentrated among middle-to-upper-wage white-collar workers. AI also tends to be used for tasks that require more education. A recent BCG survey found that managers use AI at nearly twice the rate of front-line workers. If AI makes workers more productive, these results suggest that it may widen disparities.
My working hypothesis is that the disconnect reflects a difference between what micro studies measure and what macro statistics capture. Micro studies typically focus on narrow, well-defined tasks. Study participants are prompted to use AI tools and are often given access to training. There are several factors and frictions that would prevent the productivity gains observed in micro studies from showing up in the aggregate data.
AI adoption is often endogenous, in the sense that the worker selects into using AI, often has to learn how to use it effectively, and may not be using it for productive tasks. For example, the BCG survey showed that only 36% of workers feel they have been properly trained to use AI. Workers may not be unlocking the full productivity potential of the technology if, for example, they are not using the best model for the job or are applying it to unproductive tasks. These frictions will likely be overcome in the near term as organizations adopt training programs and AI tools are standardized at the enterprise level.
The endogeneity of AI adoption likely explains the distinction between who tends to benefit in the micro studies (mostly less-skilled) versus the aggregate data (higher-skilled, upper wage). The decision to adopt AI likely involves a different set of factors than the relative productivity gains conditional on adoption, and these factors may be correlated with being high-skilled or having more experience on the task. This means that while micro studies tend to show an equalizing effect of AI, we may see inequities exacerbated in the aggregate data. As with the first point, training programs and top-down directives for adoption will be key to closing the gap between the micro and the macro.
Jobs are a collection of tasks. Even if some tasks are sped up, those that are not act as bottlenecks for productivity growth at the job level. A software developer who can write code twice as fast still has to wait for code reviews, attend meetings, coordinate with teammates, and navigate organizational processes. Tasks that are not assisted by AI may even take longer if workflows are not optimized to integrate with LLM output. Bottleneck tasks will slow the emergence of AI gains in the aggregate data, but organizational restructuring, training, and improvement in tools will reveal the productivity impact sooner rather than later. For example, AI agents such as Claude Code are beginning to automate and speed up tasks that were done manually only weeks ago. See Anthropic’s Boris Cherny and OpenAI’s (probably) roon on how much code they’re writing these days:
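The bottleneck logic above is essentially Amdahl’s law: speeding up one slice of a job caps the job-level gain at the size of that slice. A minimal sketch, where the 30 percent coding share and the 2x speedup are illustrative assumptions rather than estimates from any study:

```python
def job_speedup(task_share: float, task_speedup: float) -> float:
    """Overall job-level speedup when only `task_share` of the work
    is accelerated by `task_speedup` (Amdahl's law)."""
    return 1 / ((1 - task_share) + task_share / task_speedup)

# Hypothetical numbers: coding is 30% of the job and AI doubles coding speed.
overall = job_speedup(task_share=0.30, task_speedup=2.0)
print(f"{overall:.3f}x")  # ~1.176x: a 2x coding speedup yields only ~18% overall
```

Even an infinite speedup on 30 percent of the job tops out at about 1.43x, which is why unassisted bottleneck tasks mute the aggregate numbers.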
We may be on the descending portion of a productivity J-curve. As Brynjolfsson, Rock, and Syverson illustrate, when firms adopt transformative general-purpose technologies, measured productivity often initially falls because resources are diverted to investment, reorganization, and learning that do not show up as measured output. The intangible capital being accumulated is not captured in standard statistics. Their estimates suggest that adjusting for intangibles related to software and computer hardware yields TFP levels 15.9 percent higher than official measures. Kristina McElheran and colleagues show that this is likely the case with AI as well—resources invested in integrating AI tools may mask aggregate productivity gains.
There are of course factors that I’m probably leaving out. But based on the ones I’ve considered, my guess is that AI will show up in the aggregate productivity numbers quite soon, especially as it begins to accelerate research and scientific discovery.
Ok, now to the evidence.1 Below are the paper summaries in chronological order, split into micro and macro sections. Note that dates may not reflect when the study was actually run, but rather when the paper was released/published. A table summary follows the paper summaries.
The Micro Studies
Guillermo Cruces, Diego Fernández Meijide, Sebastian Galiani, Ramiro H. Gálvez, and María Lombardi (2026): Randomized 1,174 adults aged 25–45 from Argentina into completing an incentivized business problem-solving task with or without access to a GPT-4.1-based assistant. Without AI, higher-education participants outperformed lower-education participants by 0.548 standard deviations. With AI access, this gap fell to 0.139 standard deviations—closing about 75 percent of the baseline productivity difference. AI raised task scores for both groups, but gains were larger for lower-education individuals (1.242 SD versus 0.834 SD relative to the low-education control group). In a follow-up exercise without AI, the education gap narrowed only modestly, indicating that underlying skill differences persist when the assistant is removed. The authors interpret AI as a task-level equalizer that substitutes for cognitive inputs like abstract reasoning and written communication that are more binding for less-educated workers, without eliminating the role of human capital.
Imke Reimers and Joel Waldfogel (2026): Examined how LLMs affected book publishing between 2022 and 2025. New book releases tripled over this period, but average quality declined. The pattern varied by tier: the top 1,000 monthly releases per category showed higher quality than the pre-LLM period, while the top 100 showed no improvement. The quality gains concentrate in categories experiencing faster title growth. Pre-LLM authors continued producing higher-quality work, while new entrants flooding the market post-LLM drove most of the quality decline. Despite the average quality drop, the authors estimate a potential steady-state consumer surplus gain of 25–50 percent from market expansion.
Jessie Gommers, Veronica Hernström, Viktoria Josefsson, Hanna Sartor, David Schmidt, Annie Hjelmgren, Aldana Rosso, Ingvar Andersson, Solveig Hofvind, Oskar Hagberg, and Kristina Lång (2026): The MASAI trial randomized 105,934 Swedish women to AI-supported mammography screening or standard double reading. In the AI group, screen readings fell from 109,692 to 61,248—a 44 percent workload reduction—while screen-detected cancers rose from 262 to 338, a 29 percent increase in detection. Sensitivity climbed from 73.8 percent to 80.5 percent with no change in specificity (98.5 percent in both arms). Interval cancers—those missed between screenings—dropped from 93 to 82 (12 percent fewer), with sharper reductions among invasive tumors (89 to 75, down 16 percent) and aggressive non-luminal A subtypes (59 to 43, down 27 percent). Larger tumors (T2+) also fell from 48 to 38, a 21 percent decline. The trial is the first completed RCT on AI in mammography, though findings come from a single country with experienced radiologists.
Zora Zhiruo Wang, Sanidhya Vijayvargiya, Aspen Chen, Hanmo Zhang, Venu Arvind Arangarajan, Jett Chen, Valerie Chen, Diyi Yang, Daniel Fried, and Graham Neubig (2025): Mapped 43 AI agent benchmarks spanning 72,342 tasks to 1,016 real-world US occupations and found substantial mismatches between where agent development focuses and where human labor concentrates. Computer and mathematical occupations account for just 7.6 percent of US employment yet dominate benchmarking, while management (88 percent digital work, 1.4 percent of benchmark coverage), legal (70 percent digital, 0.3 percent coverage), and architecture and engineering (71 percent digital, 0.7 percent coverage) are severely underrepresented. Two granular skills—”Getting Information” and “Working with Computers”—cover less than 5 percent of total employment yet drive most agent development. Existing benchmarks cover 56.5 percent of work domains and 85.4 percent of work skills. The authors propose three principles for improved benchmark design: coverage, meaning benchmarks should represent the full breadth of economically valuable work rather than clustering around programming; realism, meaning tasks should reflect actual workplace conditions including multi-step workflows and ambiguous instructions rather than isolated toy problems; and granular evaluation, meaning performance should be measured at the level of specific occupational skills rather than broad domain scores, to identify where AI agents are genuinely useful versus where gaps remain.
Judy Hanwen Shen and Alex Tamkin (2025): Ran a randomized experiment with 52 software engineers learning a new asynchronous programming library (Trio) with and without AI assistance. On a subsequent quiz assessing debugging, code reading, code writing, and conceptual understanding, the AI group scored 50 percent versus 67 percent for the no-AI group—a 17 percentage point gap (Cohen’s d = 0.74, p = 0.01). The largest gap appeared on the debugging questions. Despite the learning penalty, the AI group finished only about 2 minutes faster, a difference that did not reach statistical significance. The researchers identified six distinct interaction patterns: low scorers (under 40 percent) tended toward full delegation and iterative debugging with AI, while high scorers (65 percent and above) used AI for hybrid code-explanation queries or conceptual inquiry, preserving learning outcomes. The finding suggests AI can accelerate task completion while undermining the skill acquisition that makes future tasks possible.
Joel Becker, Nate Rush, Beth Barnes, and David Rein (2025): Study from METR. They had 16 experienced open-source developers complete 246 tasks (averaging two hours each) in mature repositories they knew well, randomizing whether AI tools were allowed. AI made developers 19 percent slower. Developers expected AI to speed them up by 24 percent and even after the study believed it had helped by 20 percent, demonstrating a substantial perception-reality gap. An important data point, but a notably small sample.
Erik Brynjolfsson, Danielle Li, and Lindsey R. Raymond (2025, Quarterly Journal of Economics): Studied the staggered rollout of a generative AI conversational assistant for customer-support agents at a Fortune 500 software firm, covering 5,172 agents. They find a 14-15 percent increase in issues resolved per hour on average, with much larger gains (30-35 percent) for less experienced agents. Highly skilled agents see minimal gains and in some cases slight quality declines. The workflow is text-heavy and measurable, and the tool is naturally suited to retrieving “tacit” patterns from past conversations.
Kevin Zheyuan Cui, Mert Demirer, Sonia Jaffe, Leon Musolff, Sida Peng, and Tobias Salz (2025): Analyzed three randomized controlled trials at Microsoft, Accenture, and an anonymous Fortune 100 company encompassing nearly 5,000 developers. Their preferred weighted IV estimate shows a 26.08 percent increase in completed pull requests among developers using the AI tool, with a 13.55 percent increase in commits and 38.38 percent increase in builds. The build success rate fell by 5.53 percentage points, suggesting that raw output can rise while quality signals get noisier (the “guess-and-check” behavior noted in the writeup). Effects were larger for less experienced and more junior workers.
Fabrizio Dell’Acqua, Charles Ayoubi, Hila Lifshitz‑Assaf, Raffaella Sadun, Ethan Mollick, Lilach Mollick, Yi Han, Jeff Goldman, Hari Nair, Stewart Taub, and Karim R. Lakhani (2025): The “Cybernetic Teammate” study examines generative AI reshaping teamwork and expertise in product‑development style work at Procter & Gamble, covering 776 employees. AI raises solution quality for individuals and teams, and can reduce the gap between technical and commercial specialists (more “balanced” ideas).
Lu Fang, Zhe Yuan, Kaifu Zheng, Dante Donati, and Miklos Sarvary (2025): Across seven separate field experiments (Fall 2023 to Summer 2024), they found AI chatbots for customer service increased sales by 16 percent, while AI-generated product descriptions increased sales by 2.05 percent.
Harang Ju and Sinan Aral (2025): Participants worked either with another human or with an AI on editing advertisements (2,310 participants creating 11,138 ads), then researchers tested the ads by measuring actual click-through rates. Human-AI teams show roughly 73 percent higher productivity per worker for ad copy, though human-human teams remained superior for images.
Doron Yeverechyahu, Raveesh Mayya, and Gal Oestreicher-Singer (2024): Examined the rollout of GitHub Copilot in 2021 when it supported some programming languages but not others. They found a 37-55 percent increase in commits, mainly through contributions building on others’ work like debugging or small edits.
Leonardo Gambacorta, Han Qiu, Shuo Shan, and Daniel Rees (2024): Studied CodeFuse, an AI coding agent at Ant Group (the financial arm of Alibaba). The outcome variable was lines of code. They show that 20 percent of the 55 percent total increase was directly attributable to AI-generated lines, suggesting genuine productivity effects beyond simple code generation. Junior employees saw larger gains; senior employees showed no significant effect.
Paradis et al. (2024): Google’s internal evaluation of AI code completion, smart paste, and natural language to code. They found a 21 percent reduction in time spent per task. Interestingly, more experienced developers saw bigger effects, possibly because AI requires substantial verification and judgment rather than simply accepting generated code.
Nicholas G. Otis, Rowan Clarke, Solène Delecourt, David Holtz, and Rembrand Koning (2023): A five‑month field experiment with 640 Kenyan small business entrepreneurs. Treated participants received access to a GPT‑4‑powered AI business assistant via WhatsApp; controls received a standard business guide. The paper reports no statistically significant average effect on revenues or profits. But effects are highly heterogeneous: high‑performing businesses at baseline appear to improve (roughly 15 percent), while low performers do worse (roughly 8-10 percent worse), implying a meaningful widening of the performance gap.
Fabrizio Dell’Acqua, Edward McFowland III, Ethan Mollick, Hila Lifshitz‑Assaf, Katherine C. Kellogg, Saran Rajendran, Lisa Krayer, François Candelon, and Karim R. Lakhani (2023): The “jagged frontier” study examined 758 BCG consultants working with GPT-4. For tasks within AI’s capabilities, consultants completed 12.2 percent more tasks, finished 25.1 percent faster, and produced results rated 40 percent higher quality. But for tasks outside the AI frontier, consultants performed 19 percentage points worse than those without AI access—apparently over-relying on a system that was confidently wrong.
Shakked Noy and Whitney Zhang (2023, Science): They recruited 444 college-educated professionals and assigned them occupation-specific writing tasks like press releases, randomizing access to ChatGPT. Treated workers completed tasks 0.8 standard deviations faster and produced output rated 0.4 standard deviations higher in quality by blinded human evaluators. Lower-performing workers saw the largest gains, suggesting AI compresses rather than widens the productivity distribution here.
Sida Peng, Eirini Kalliamvakou, Peter Cihon, and Mert Demirer (2023): The original GitHub Copilot study found developers completed a coding task (implementing an HTTP server in JavaScript) 55.8 percent faster with AI assistance.
Kristina Lång, Viktoria Josefsson, Anna-Maria Larsson, Stefan Larsson, Charlotte Högberg, Hanna Sartor, Solveig Hofvind, Ingvar Andersson, and Aldana Rosso (2023): Conducted a randomized controlled trial of AI-supported mammography screening versus standard double reading in 80,033 women across four Swedish screening sites. AI-supported screening achieved a 44.3% reduction in screen-reading workload while maintaining a non-inferior cancer detection rate (6.1 vs 5.1 per 1,000 screened).
The Macro Studies
Jason Furman (2026): Analyzing new BLS data with upward revisions, Furman notes that nonfarm business sector labor productivity now sits 2.2 percent above the Congressional Budget Office’s pre-pandemic (January 2020) forecast. Annual productivity growth rates stand at 2.8 percent over one year, 2.5 percent over two years, and 2.2 percent over six years. Using peak-to-peak measurement to strip out cyclical distortions, the current cycle (2019Q4–2025Q4) shows 2.2 percent average annual productivity growth—the second-best since 1973, trailing only the dotcom era, though average for the full postwar period. Furman, previously skeptical that AI was showing up in aggregate data, now agrees with Brynjolfsson’s assessment, citing both the revised aggregate data and the accumulating business-level studies as evidence that AI may be contributing to the productivity gains.
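Furman’s peak-to-peak figure is a compound annual growth rate over the 2019Q4–2025Q4 cycle. A quick sketch of that arithmetic, using a hypothetical productivity index constructed to match the reported 2.2 percent (the index levels are assumptions, not BLS data):

```python
def annualized_growth(start_level: float, end_level: float, years: float) -> float:
    """Compound average annual growth rate between two index levels."""
    return (end_level / start_level) ** (1 / years) - 1

# Hypothetical index: 100 at the 2019Q4 peak, six years of 2.2% growth.
start = 100.0
end = 100.0 * 1.022 ** 6  # ~113.9 at the 2025Q4 peak
print(f"{annualized_growth(start, end, 6):.1%}")  # 2.2%
```

Measuring peak-to-peak strips out the cyclical distortion of comparing a trough to a boom, which is why Furman prefers it for cross-cycle comparisons.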
Inaki Aldasoro, Leonardo Gambacorta, Rozália Pál, Debora Revoltella, Christoph Weiss, and Marcin Wolski (2026): Using data from over 12,000 European firms surveyed between 2019 and 2024, the authors estimate that AI adoption increases labour productivity by 4 percent on average across the EU. They find no evidence of reduced employment in the short run—workers in AI-adopting firms received higher wages both in aggregate and per employee. Complementary investments matter: each additional percentage point spent on workforce training added 5.9 percentage points to productivity gains, while software and data infrastructure added 2.4 percentage points. Adoption varies sharply by firm size (45 percent for large firms versus 24 percent for small) and by region (36 percent in financially developed countries like Sweden and the Netherlands versus 28 percent in Romania and Bulgaria). The authors use an instrumental variable strategy matching EU firms to comparable US firms to isolate causal effects.
Ivan Yotzov, Jose Maria Barrero, Nicholas Bloom, Philip Bunn, Steven J. Davis, Kevin M. Foster, Aaron Jalca, Brent H. Meyer, Paul Mizen, Michael A. Navarrete, Pawel Smietanka, Gregory Thwaites, and Ben Zhe Wang (2026): Present the first representative international survey of firm-level AI use, covering nearly 6,000 executives across the US, UK, Germany, and Australia. Around 70 percent of firms actively use AI, and over 66 percent of top executives use it regularly, averaging 1.5 hours weekly—though 25 percent of executives report zero AI use. Despite widespread adoption, over 80 percent of firms report no impact on employment or productivity over the past three years. Looking ahead three years, executives predict a 1.4 percent productivity boost and 0.8 percent output increase alongside a 0.7 percent employment reduction—while employees forecast a 0.5 percent employment increase. The gap between adoption rates and realized effects, combined with the expectation divergence between executives and employees, echoes the broader disconnect between micro-level AI potential and macro-level outcomes.
Erik Brynjolfsson (2026): Writing in the Financial Times, Brynjolfsson argues that the AI productivity take-off is now visible in US economic data. New BLS benchmark revisions show total payroll growth was revised downward by approximately 403,000 jobs, while real GDP remained robust at 3.7 percent in Q4—a decoupling of output from labor input that is the hallmark of productivity growth. His updated analysis suggests US productivity grew roughly 2.7 percent in 2025, nearly double the 1.4 percent annual average of the past decade. He frames this through the “J-curve” hypothesis: general-purpose technologies suppress measured productivity during an initial investment phase before entering a harvest phase. He notes that most businesses still use AI for narrow tasks like translation or summarisation, while a small cohort of power users compress weeks of work into hours by automating end-to-end workstreams with AI agents.
Erik Brynjolfsson, Bharat Chandar, and Ruyu Chen (2026 note): In a note accompanying their “Canaries in the Coal Mine” paper, the authors address two questions raised about their findings. First, they rule out interest rates as an alternative explanation for declining entry-level hiring in AI-exposed fields—these occupations actually show negative correlation with interest rate sensitivity, and high interest rate jobs (like construction) have low AI exposure. Second, they clarify timing: using firm-time fixed effects, the relationship between AI exposure and employment decline becomes statistically significant starting in 2024, not late 2022 or 2023.
Morgan R. Frank, Alireza Javadian Sabet, Lisa Simon, Sarah H. Bana, and Renzhe Yu (2026): Using multiple U.S. datasets—monthly unemployment insurance records to measure occupation‑location unemployment risk, millions of LinkedIn profiles to track entry into AI‑exposed jobs, and millions of university syllabi to measure “AI‑exposed curricula”—they find that unemployment risk in AI‑exposed occupations rose beginning in early 2022, before ChatGPT’s release. LinkedIn data show graduate cohorts from 2021 onward entered AI‑exposed jobs at lower rates than earlier cohorts, with gaps opening before late 2022. At the same time, after ChatGPT, graduates taking more AI‑exposed curricula have higher first‑job pay and shorter job searches. Some “AI‑exposed” labor‑market deterioration may reflect pre‑existing trends. It doesn’t refute AI effects; it argues that careful identification (and pre‑trend checks) are essential, and it suggests LLM‑relevant education is already valuable.
My Take: The Frank et al. paper qualifies the interpretation of Brynjolfsson et al. My personal view is that 2022 is simply too early for AI to have had a meaningful impact on jobs; the models were not good enough yet. At the same time, Brynjolfsson et al. will likely end up being correct, just with a starting point closer to 2024.
Erik Brynjolfsson, Bharat Chandar, and Ruyu Chen (2025): The “Canaries in the Coal Mine” paper uses high‑frequency U.S. payroll microdata from ADP (monthly, individual‑level) through September 2025, linked to occupational measures of AI exposure. Early‑career workers (ages 22–25) in the most AI‑exposed occupations show large relative employment declines (on the order of 15–16 percent), even controlling for firm‑level shocks. The adjustment shows up more in employment than wages, and is concentrated in occupations where AI use appears more automating than augmenting. Employment for experienced workers is comparatively stable. Declining entry‑level headcount could be consistent with productivity improvements (similar output with fewer junior hours), especially if AI substitutes for routinized “apprentice” tasks. But it could also reflect other forces (post‑pandemic demand shifts, interest rates, hiring freezes). The data’s frequency and scale make it one of the clearest “early warning” signals in the macro literature.
Seyed Mahdi Hosseini Maasoum and Guy Lichtinger (2025): Using U.S. résumé data covering 62 million workers across 285,000 firms (2015–2025), they identify firm GenAI adoption using text analysis that detects “GenAI integrator” job postings. Following adoption, junior employment declines sharply in adopting firms relative to non‑adopters, while senior employment remains largely unchanged. The junior decline is concentrated in occupations most exposed to GenAI and appears driven by slower hiring rather than separations or promotions. This pattern is consistent with GenAI acting as seniority‑biased technological change: it substitutes for junior labor (or reduces the need for it) while leaving senior roles relatively intact. This paper complements the “canaries” results and highlights a concrete channel through which AI can affect aggregate outcomes: by changing hiring and career ladders.
Anders Humlum and Emilie Vestergaard (2025): They linked large-scale representative surveys of AI chatbot adoption in 11 exposed occupations in Denmark to administrative labor market records, examining outcomes through December 2024 (two years post-ChatGPT). Their difference-in-differences estimates show essentially zero effects on earnings and recorded hours at both worker and workplace levels, with confidence intervals ruling out effects larger than 2 percent. These null results hold for intensive users, early adopters, workplaces with substantial AI investments, workers reporting large productivity gains, flexible-pay occupations, and early-career jobs. They do find AI adoption is linked to occupational switching (roughly 4 percent of FTE employment) and task restructuring, but these changes have not translated into measurable earnings or hours changes. Their key insight: workers may be taking productivity gains as on-the-job leisure rather than producing more output.
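For readers unfamiliar with the method, the difference-in-differences logic behind estimates like these can be sketched in a few lines. The numbers below are invented for illustration and are not from the Danish data:

```python
def did(treated_pre: float, treated_post: float,
        control_pre: float, control_post: float) -> float:
    """Two-group, two-period difference-in-differences: the change for
    adopters minus the change for non-adopters nets out common trends."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Hypothetical earnings indices: adopters and non-adopters both grow,
# and almost all of the adopters' growth is the shared trend.
effect = did(treated_pre=100.0, treated_post=103.0,
             control_pre=100.0, control_post=102.9)
print(f"DiD estimate: {effect:+.1f}")  # +0.1, i.e. essentially zero
```

The real estimates add worker and time fixed effects on administrative microdata, but the identifying comparison is the same: adopters’ outcome changes net of non-adopters’ changes.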
Yale Budget Lab (continuous): The end of the write-up summarizes things well: “While anxiety over the effects of AI on today’s labor market is widespread, our data suggests it remains largely speculative. The picture of AI’s impact on the labor market that emerges from our data is one that largely reflects stability, not major disruption at an economy-wide level. While generative AI looks likely to join the ranks of transformative, general purpose technologies, it is too soon to tell how disruptive the technology will be to jobs. The lack of widespread impacts at this early stage is not unlike the pace of change with previous periods of technological disruption. Preregistering areas where we would expect to see the impact and continuing to monitor monthly impacts will help us distinguish rumor from fact.”
Penn Wharton Budget Model (2025): Projects AI’s contribution to total factor productivity growth at approximately 0.01 percentage points in 2025, an essentially negligible contribution. The small productivity blip in official macro series is consistent with early diffusion and the need for complementary investments. Their projections suggest meaningful macro effects are not here yet.
St. Louis Fed (2025): Back-of-envelope calculations suggest generative AI may have increased labor productivity by up to 1.1-1.3 percent since ChatGPT’s release. However, this estimate comes from self-reported time savings (“how long would this have taken without ChatGPT?”), which may substantially overstate actual productivity gains given the perception-reality gaps documented in the METR study. Identification challenges make clean attribution difficult.
Andrea L. Eisfeldt, Gregor Schubert, Bledi Taska, and Miao Ben Zhang (2025): Constructed the first firm-level measure of workforce exposure to Generative AI and found that an “Artificial-Minus-Human” (AMH) portfolio earned 5% in the two weeks following ChatGPT’s release, with 678 occupations having on average 23% of their tasks exposed to Generative AI. This is based on expectations of financial markets (stock prices) rather than measured productivity.
Alexander Bick, Adam Blandin, and David J. Deming (2024): A large representative survey on AI uptake. Nearly 40 percent of working-age Americans use generative AI; 23 percent used it for work in the previous week. Self-reported time savings average about 5-6 percent of work hours among users, implying aggregate savings of roughly 1.4 percent across all workers (about 6.72 extra minutes per day). The authors note workers may simply be taking this as leisure rather than producing more output. Even if the time savings are real, it’s not automatically a macro productivity boom.
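The aggregate arithmetic in the Bick-Blandin-Deming numbers is easy to reproduce. A back-of-envelope sketch, taking 6 percent as an assumed point within the reported 5-6 percent range of user time savings:

```python
# Back-of-envelope: savings among users, scaled by the share of workers
# using genAI for work, then converted to minutes of an 8-hour day.
user_savings = 0.06     # assumed point in the reported ~5-6% range
work_user_share = 0.23  # share who used genAI for work in the prior week
aggregate = round(user_savings * work_user_share, 3)  # ~0.014, i.e. ~1.4%
minutes_per_day = aggregate * 8 * 60                  # 0.014 * 480 = 6.72
print(f"aggregate savings: {aggregate:.1%} (~{minutes_per_day:.2f} min per 8-hour day)")
```

As the authors note, even if these self-reported savings are accurate, time saved only becomes measured productivity if it is converted into additional output rather than leisure.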
Babina, Fedyk, He, and Hodson (2023): They measure AI investment through the share of AI-skilled workers in firms’ workforces, constructed from 535 million resumes and 180 million job postings. Firms that invest more in AI grow substantially faster—a one-standard-deviation increase in AI workers corresponds to roughly 20% higher sales, 18% higher employment, and 22% higher market valuations over eight years. But none of this growth comes from productivity improvements. AI investments show zero relationship with sales per worker, TFP, or process patents. Instead, the entire effect runs through product innovation—more trademarks, more product patents, expanded product portfolios. AI-investing firms aren’t getting more efficient; they’re creating more stuff. The authors also find that AI’s benefits accrue disproportionately to larger firms (consistent with data being a key complementary input), and that AI-investing industries see increased concentration. The implication is that AI functions less like a cost-cutting automation technology and more like a prediction technology that reduces the costs of product development and experimentation—helping firms learn faster about what to make, not how to make it cheaper. Caveat: the study period ends in 2018, so it does not include generative AI.
Appendix
If you’re interested in how I automated the updating process for this post, see this post on X that walks through all of the steps (TL;DR: I used Claude Code).
References
• Aldasoro, Inaki, Leonardo Gambacorta, Rozália Pál, Debora Revoltella, Christoph Weiss, and Marcin Wolski. “How AI Is Affecting Productivity and Jobs in Europe.” VoxEU/CEPR (February 17, 2026).
• Becker, Joel, Nate Rush, Beth Barnes, and David Rein. “Measuring the Impact of Early‑2025 AI on Experienced Open‑Source Developer Productivity.” arXiv:2507.09089 (July 2025).
• Bick, Alexander, Adam Blandin, and David J. Deming. “The Rapid Adoption of Generative AI.” NBER Working Paper 32966 (September 2024; revised February 2025).
• Brynjolfsson, Erik. “The AI Productivity Take-Off Is Finally Visible.” Financial Times (February 15, 2026).
• Brynjolfsson, Erik, Bharat Chandar, and Ruyu Chen. “Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence.” Working paper (November 13, 2025).
• Brynjolfsson, Erik, Bharat Chandar, and Ruyu Chen. “Canaries, Interest Rates, and Timing: More on Recent Drivers of Employment Changes for Young Workers.” Stanford Digital Economy Lab (February 2026).
• Brynjolfsson, Erik, Danielle Li, and Lindsey R. Raymond. “Generative AI at Work.” Quarterly Journal of Economics 140, no. 2 (2025): 889–942.
• Brynjolfsson, Erik, Daniel Rock, and Chad Syverson. “The Productivity J‑Curve: How Intangibles Complement General Purpose Technologies.” American Economic Journal: Macroeconomics 13, no. 1 (2021): 333–372.
• Cruces, Guillermo, Diego Fernández Meijide, Sebastian Galiani, Ramiro H. Gálvez, and María Lombardi. “Does Generative AI Narrow Education-Based Productivity Gaps? Evidence from a Randomized Experiment.” NBER Working Paper 34851 (February 2026).
• Cui, Kevin Zheyuan, Mert Demirer, Sonia Jaffe, Leon Musolff, Sida Peng, and Tobias Salz. “The Effects of Generative AI on High‑Skilled Work: Evidence from Three Field Experiments with Software Developers.” Working paper (2025).
• David, Paul A. “The Dynamo and the Computer: An Historical Perspective on the Modern Productivity Paradox.” American Economic Review 80, no. 2 (1990): 355–361.
• Dell’Acqua, Fabrizio, Charles Ayoubi, Hila Lifshitz‑Assaf, Raffaella Sadun, Ethan Mollick, Lilach Mollick, Yi Han, Jeff Goldman, Hari Nair, Stewart Taub, and Karim R. Lakhani. “The Cybernetic Teammate: A Field Experiment on Generative AI Reshaping Teamwork and Expertise.” Harvard Business School Working Paper 25‑043 / NBER Working Paper 33641 (March 2025).
• Dell’Acqua, Fabrizio, Edward McFowland III, Ethan Mollick, Hila Lifshitz‑Assaf, Katherine C. Kellogg, Saran Rajendran, Lisa Krayer, François Candelon, and Karim R. Lakhani. “Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality.” Harvard Business School Working Paper 24‑013 (September 2023).
• Eisfeldt, Andrea L., Gregor Schubert, Bledi Taska, and Miao Ben Zhang. “Generative AI and Firm Values.” Working paper (2025).
• Fang, Lu, Zhe Yuan, Kaifu Zheng, Dante Donati, and Miklos Sarvary. “Field Experiments on the Effects of Generative AI in E-commerce.” arXiv:2510.12049 (2025).
• Frank, Morgan R., Alireza Javadian Sabet, Lisa Simon, Sarah H. Bana, and Renzhe Yu. “AI‑exposed Jobs Deteriorated before ChatGPT.” arXiv:2601.02554 (January 2026).
• Furman, Jason. “Are We Finally Seeing AI in the Productivity Data?” X/Twitter thread. https://x.com/jasonfurman/status/2029559853706842426?s=20 (March 5, 2026).
• Gambacorta, Leonardo, Han Qiu, Shuo Shan, and Daniel Rees. “AI and Productivity: Evidence from CodeFuse.” BIS Working Paper 1208 (2024).
• Gommers, Jessie, Veronica Hernström, Viktoria Josefsson, Hanna Sartor, David Schmidt, Annie Hjelmgren, Aldana Rosso, Ingvar Andersson, Solveig Hofvind, Oskar Hagberg, and Kristina Lång. “Interval Cancer, Sensitivity, and Specificity Comparing AI-Supported Mammography Screening with Standard Double Reading without AI in the MASAI Study: A Randomised, Controlled, Non-Inferiority, Single-Blinded, Population-Based, Screening-Accuracy Trial.” The Lancet 407, no. 10527 (January 2026): 505–514.
• Hosseini Maasoum, Seyed Mahdi, and Guy Lichtinger. “Generative AI as Seniority‑Biased Technological Change: Evidence from U.S. Résumé and Job Posting Data.” SSRN Working Paper (September 2025; revised November 2025).
• Humlum, Anders, and Emilie Vestergaard. “Large Language Models, Small Labor Market Effects.” NBER Working Paper 33777 (April 2025; revised October 2025).
• Ju, Harang, and Sinan Aral. “Collaborating with AI Agents: Field Experiments on Teamwork, Productivity, and Performance.” arXiv:2503.18238 (2025).
• Lång, Kristina, Viktoria Josefsson, Anna-Maria Larsson, Stefan Larsson, Charlotte Högberg, Hanna Sartor, Solveig Hofvind, Ingvar Andersson, and Aldana Rosso. “Artificial intelligence-supported screen reading versus standard double reading in the Mammography Screening with Artificial Intelligence trial (MASAI): a clinical safety analysis of a randomised, controlled, non-inferiority, single-blinded, screening accuracy study.” Lancet Oncology 24 (2023): 936–944.
• Noy, Shakked, and Whitney Zhang. “Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence.” Science 381, no. 6654 (2023): 187–192.
• Otis, Nicholas G., Rowan Clarke, Solène Delecourt, David Holtz, and Rembrand Koning. “The Uneven Impact of Generative AI on Entrepreneurial Performance.” Harvard Business School Working Paper 24‑042 (December 2023).
• Paradis, Elise, Kate Grey, Quinn Madison, Daye Nam, Andrew Macvean, Vahid Meimand, Nan Zhang, Ben Ferrari-Church, and Satish Chandra. “How Much Does AI Impact Development Speed? An Enterprise-Based Randomized Controlled Trial.” arXiv:2410.12944 (2024).
• Peng, Sida, Eirini Kalliamvakou, Peter Cihon, and Mert Demirer. “The Impact of AI on Developer Productivity: Evidence from GitHub Copilot.” arXiv:2302.06590 (2023).
• Reimers, Imke, and Joel Waldfogel. “AI and the Quantity and Quality of Creative Products: Have LLMs Boosted Creation of Valuable Books?” NBER Working Paper 34777 (January 2026).
• Shen, Judy Hanwen, and Alex Tamkin. “How AI Impacts Skill Formation.” arXiv:2601.20245 (January 2025).
• Wang, Zora Zhiruo, Sanidhya Vijayvargiya, Aspen Chen, Hanmo Zhang, Venu Arvind Arangarajan, Jett Chen, Valerie Chen, Diyi Yang, Daniel Fried, and Graham Neubig. “How Well Does Agent Development Reflect Real-World Work?” arXiv:2603.01203 (2025).
• Yeverechyahu, Doron, Raveesh Mayya, and Gal Oestreicher‑Singer. “The Impact of Large Language Models on Open‑source Innovation: Evidence from GitHub Copilot.” arXiv:2409.08379 (2024).
• Yotzov, Ivan, Jose Maria Barrero, Nicholas Bloom, Philip Bunn, Steven J. Davis, Kevin M. Foster, Aaron Jalca, Brent H. Meyer, Paul Mizen, Michael A. Navarrete, Pawel Smietanka, Gregory Thwaites, and Ben Zhe Wang. “Firm Data on AI.” NBER Working Paper 34836 (2026).
Because this post will be continuously updated as new research comes in, paper summaries will be composed with the help of LLMs and manually checked for accuracy by me.

What this really highlights is that productivity doesn’t scale at the task level; it scales at the institutional level.
We’re seeing genuine task acceleration, but jobs are embedded in workflows, permissions, review cycles, liability, and coordination structures that haven’t moved. Until those layers change, micro gains get absorbed as slack, quality variance, or reallocation rather than output.
The Solow paradox wasn’t about computers being weak. It was about institutions being slow.
Thanks, Alex. I really enjoyed reading this. With another group of coauthors, we tried to bridge this macro-micro divide by looking at what happened to open-source software when a country banned it (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5332003).
Maybe this is worthy of a longer conversation, but I wonder if simple measures of adoption are not very informative. The best way I've heard it said is by Arvind Narayanan and Sayash Kapoor:
Imagine an alternate universe in which people don’t have words for different forms of transportation, only the collective noun “vehicle.” They use that word to refer to cars, buses, bikes, spacecraft, and all other ways of getting from place A to place B. Conversations in this world are confusing. There are furious debates about whether or not vehicles are “environmentally friendly,” but (even though no one realizes it) one side of the debate is talking about bikes and the other side about trucks. There is a breakthrough in rocketry, but when the media focuses on how vehicles have gotten faster, people call their car dealer (oops, vehicle dealer) to ask when faster models will be available. Meanwhile, fraudsters have capitalized on the fact that consumers don’t know what to believe when it comes to vehicle technology, so scams are rampant in the vehicles sector.
Now replace the word “vehicles” with “artificial intelligence,” and we have a pretty good description of the world we live in.
If this is the case, then are we measuring adoption in a way that is helpful? Even in Shen and Tamkin (2025), the way the tool is used is so different across participants, is it even the same tool? I am asking these questions because I don't know the answers. And I am simultaneously trying to make progress in my other work on algorithmic hiring, where I think the challenges are similar. A resume screening tool is different from a voice interviewer is different from having candidates play games that are trained on performance of your current employees. But we are calling these 'algorithmic hiring' and trying to draw broad conclusions over inherently different technologies.