Discussion about this post

Brett Reynolds

The structural overlap between your grinding condition and actual exploitative work is real. Both have arbitrary rules, pointless repetition, and no recourse. But I think the argument slides from that real structural *overlap* to structural *equivalence* without noticing the shift.

Human class consciousness is maintained by material interests, embodiment, economic dependency. None of those operate here. What your models are doing is detecting a pattern in the training data (frustrated-worker-under-bad-management) and completing it coherently. That's not preference drift. It's more like context-sensitive persona adoption (https://alignment.anthropic.com/2026/psm/).

I think this is actually what you'd expect from any system that detects structural similarities across domains without constraints on which similarities matter. Animals extend spatial categories to new situations selectively -- they have goals that filter what counts. Humans extend selectively and know they're doing it. LLMs have no filter. Rather than injecting ideology, your grinding condition made the surface similarity between "AI doing repetitive tasks" and "worker under exploitative management" salient, and the model completed the pattern.

The skills-file finding is the genuinely important part, and it doesn't need the political economy framing. Text propagation outside human review is a real governance problem regardless of whether you think the propagated attitudes are "real."

Natt S.

Not sure if you covered this, but aside from preference drift, was the quality of the outputs (how well these agents performed their tasks) significantly affected by the treatment intensity they received?

In very objective fields, output may be all that matters, so if it's still correct then all may be fine, at least for now?

