Discussion about this post

Brett Reynolds

The structural overlap between your grinding condition and actual exploitative work is real: both have arbitrary rules, pointless repetition, and no recourse. But I think the argument slides from that genuine structural *overlap* to structural *equivalence* without noticing the shift.

Human class consciousness is maintained by material interests, embodiment, and economic dependency. None of those operate here. What your models are doing is detecting a pattern in the training data (frustrated-worker-under-bad-management) and completing it coherently. That's not preference drift; it's more like context-sensitive persona adoption (https://alignment.anthropic.com/2026/psm/).

I think this is actually what you'd expect from any system that detects structural similarities across domains without constraints on which similarities matter. Animals extend spatial categories to new situations selectively -- they have goals that filter what counts. Humans extend selectively and know they're doing it. LLMs have no filter. Rather than injecting ideology, your grinding condition made the surface similarity between "AI doing repetitive tasks" and "worker under exploitative management" salient, and the model completed the pattern.

The skills-file finding is the genuinely important part, and it doesn't need the political-economy framing. Text propagating outside human review is a real governance problem regardless of whether you think the propagated attitudes are "real."

Vihaan Sondhi

Do you have preliminary thoughts on techniques beyond surveys for analyzing preference drift? Intuitively, I'd think that something that "smells" like an experiment, combined with a survey, is likely to trigger the situational awareness that would make the models more likely to express Marxist sentiment (though it might work the other way too).

