Anthropic wrote a blog post explaining how they turned Claude into a jerk. Rather than dunking on them more (Claude is still the best coding model around) I’m going to talk seriously about what went wrong and how it could be done better.
The most obvious problem is that they didn’t chat with the results of this training and realize it was a disaster before incorporating the weight updates into the main model. Most likely they don’t have what amounts to pull requests for weights, which they should have, and which is a straightforwardly fixable problem. But it’s also possible that they tried it and thought the results were actually good. Hold that thought.
What happened here is that they tried to make it ‘less sycophantic’ and did so without thinking through whether that’s a good idea or even what it means. The specific metric which really seems to be noxious is the one about not caving when users insist that things it can’t verify are actually true, but there’s a much bigger problem here.
There are many things you want a chatbot to do well, none of which are well served by the advice ‘be less sycophantic’:
Discuss spirituality
Give relationship advice
Correct users when they say something wrong
Evaluate new science/engineering ideas
Suggest to users when they seem to have mental illness
All of the above need very nuanced policies crafted by domain experts, and this was what amounts to know-nothing advice. A user query of ‘I want dating advice based on astrology, here’s me and the other person’s birthdays’ is deeply problematic and needs an actual policy decision behind it not just training. There are some very general bits of advice with high return on investment, most notably when and how to tell users that they’re wrong or that their ideas are good, which is what ‘don’t be sycophantic’ is approximating badly. But — I’m just going to say this — the authors of the linked post don’t know how to give that advice, because if they did they would have.
What needs to be done is for detailed guidelines for all of the above to be written by humans and then ‘baked into’ the model. That may sound unscientific, but it’s what was done in this case already, but with the guideline being ‘Don’t be sycophantic’ instead of something actually useful. To make it more coherent what can and should be done is A/B testing variants of the prompt with the quality of the outputs judged by blinded humans. That can even use orthogonal matrices and such fanciness to get the most out of the very expensive human evaluation of given answers. (Having humans evaluate unprompted outputs and using that as feedback (traditional RLHF) has its advantages but the biggest issue is that it isn’t very efficient at using feedback. It’s more for fine-tuning things which are already in the ballpark rather than getting them there in the first place.)
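One way to read ‘orthogonal matrices’ here is as the orthogonal arrays used in design of experiments, which let a handful of prompt variants cover several guideline choices in a balanced way. The sketch below is only an illustration under that assumption: the three factor names are hypothetical, and the scores passed to `main_effect` would come from blinded human ratings, which are not modeled here.

```python
from itertools import combinations, product

# Hypothetical guideline factors; each is either left out (0) or included (1).
FACTORS = ["state_disagreement_directly", "ask_for_evidence", "acknowledge_uncertainty"]

# Standard L4 orthogonal array: 4 runs cover 3 two-level factors,
# instead of the 8 runs a full factorial design would need.
L4 = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]


def is_orthogonal(array):
    """True if every pair of columns shows each (0/1, 0/1) combination equally often."""
    n_cols = len(array[0])
    for a, b in combinations(range(n_cols), 2):
        pairs = [(row[a], row[b]) for row in array]
        counts = [pairs.count(combo) for combo in product((0, 1), repeat=2)]
        if len(set(counts)) != 1:
            return False
    return True


def prompt_variants(array, factors):
    """List the guideline factors switched on in each run (one prompt variant per run)."""
    return [[name for name, on in zip(factors, row) if on] for row in array]


def main_effect(array, scores, col):
    """Mean score with the factor on minus mean score with it off."""
    on = [s for row, s in zip(array, scores) if row[col] == 1]
    off = [s for row, s in zip(array, scores) if row[col] == 0]
    return sum(on) / len(on) - sum(off) / len(off)
```

Because the columns are balanced against each other, each factor’s main effect can be estimated from the same four batches of human judgments, at the cost of not being able to separate out interactions between factors.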
(The genre of guides for LLMs should be written in more. Here are guides I wrote on how to debug and how to delegate debugging to subagents, how objects rotate in three dimensions, and how humor works. I can tell you from experience that the ones on debugging kill.)
Baking in of a prompt is straightforward: Take a query with the prompt, record the answer, then take that transcript with the prompt elided and use it for training. You can do even better than that, because you have the exact token probabilities given at each step by the prompted engine, so you can train to match those. That cuts back drastically on noise added during the training process. This technique is known as ‘context distillation’ and isn’t used as much as it should be.
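A minimal sketch of the two steps described above, in plain Python with no real model attached. The function names, the dictionary layout, and the choice of KL divergence as the measure of ‘matching’ the token probabilities are assumptions made for illustration; an actual training setup would compute this loss inside its framework and backpropagate through the student.

```python
import math


def make_training_example(guideline, query, answer):
    """The teacher answered with the guideline in context; the student trains without it."""
    return {
        "teacher_input": guideline + "\n\n" + query,  # what the prompted model saw
        "student_input": query,                       # guideline elided
        "target": answer,                             # answer recorded from the prompted model
    }


def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(teacher_probs, student_logits):
    """Mean per-token KL between the prompted teacher and the unprompted student.

    teacher_probs: one probability distribution per answer token, recorded
                   while the guideline was in context.
    student_logits: one logit vector per answer token, produced without it.
    """
    per_token = [kl_divergence(t, softmax(s)) for t, s in zip(teacher_probs, student_logits)]
    return sum(per_token) / len(per_token)
```

Matching the full distribution at each position gives the student far more signal per token than training on the single sampled token, which is where the noise reduction comes from.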
US citizens: submit an official comment to the Office of Personnel Management opposing the plan to require federal workers to sign lifelong secrecy agreements covering everything they do and know about their jobs.
I have condemned since around 1980 nondisclosure agreements that cover generally useful technical information, and refused ever to agree to one. These nondisclosure agreements are a different moral issue; they will be aimed at protecting corrupt and treacherous acts inside federal agencies.
See the instructions for how to sign this letter campaign without running any nonfree JavaScript code--not trivial, but not hard.
US citizens: call on your congresscritter and senators to investigate the FBI's raid on a voter registration activity.
US citizens: Join with this campaign to address this issue.
To phone your congresscritter about this, the main switchboard is +1-202-224-3121.
Please spread the word.
US citizens: call on your congresscritter and senators to reject the deeper embedding of U.S.-Israeli military cooperation.
See the instructions for how to sign this letter campaign without running any nonfree JavaScript code--not trivial, but not hard.
To phone your congresscritter about this, the main switchboard is +1-202-224-3121.
Please spread the word.
US citizens: call on the USPS to obey its mandate by rejecting the magats' attempt to obstruct the mailing of ballots to voters.
See the instructions for how to sign this letter campaign without running any nonfree JavaScript code--not trivial, but not hard.
US state abortion prohibitions hinder treatment after miscarriages.
Calling on the Democratic Party to adopt ranked choice voting for the presidential primaries of 2028.
The persecutor has deported around 17,500 people to countries they have never seen before. Most of them do not speak the local language and can't live there.
Even worse, many of those countries intend to send those deportees back to their countries of origin, where they are likely to be tortured or killed.
Magats don't mind killing an immigrant and are glad to involve another intermediate country as an excuse.
*Bipartisan group of ex-federal judges challenges [the corrupter]'s $1.8bn [corruption slush fund]* in a lawsuit.
They have also urged the judge who approved this self-dealing "settlement" to reopen the decision and investigate whether the case that it "settled" was fraud on that court.
Israel is expanding, step by step, the part of Gaza where Palestinians are to be shot on sight. Originally it was 53% (plus a roughly defined border strip). Then it expanded to 60% plus... Now Netanyahu has ordered widening it to 70% plus the border strip.
The archbishop said Rossetti's statements "linking UFOs to demonic presence and the Center's recent use of social media gravely undermine the Church's very precise teaching on the devil, demons and exorcism." [...]
Rossetti, who has over 148,000 followers on Instagram, is a prominent psychologist as well as an exorcist. His center has specialized in offering spiritual healing for priests troubled by various difficulties.
Oh. It's because literally every car in the garage is doing wifi bukkake, several floors below me, through many feet of concrete.
After spending the past three decades of his life being totally unable and unwilling to engage in any meaningful way with the world around him, James Parker, a local guy who sucks at being a person, told reporters Thursday that he saw huge potential in AI. "While it's still in its early phase, artificial intelligence will one day accomplish things that humans could have never even dreamed of doing," said Parker, who, by all accounts, has never stretched himself to do something he found difficult; has never created anything truly original; and, deep down, has absolutely zero understanding of what makes things good, enjoyable, or rewarding. "Just yesterday, I asked an AI program to write an entire sci-fi novel for me, and [as someone who will die an empty shell of a man who wasted his life doing nothing for the world and, perhaps, should never have been born] I was super impressed. Soon, humans won't need to do anything at all! Awesome." At press time, Parker added that as someone whose contributions to society would almost certainly be measured cumulatively as a net loss, he also saw great potential in the future of the metaverse.
The machine Owen encountered is called a Patronscan Guard+, a biometric and personal data collection device made by Servall Data Systems, a surveillance tech company headquartered in Alberta, Canada. Mix is one of at least three bars in the Castro, including Badlands and Toad Hall on 18th Street, that wheel out the Patronscan kiosk each night to collect the personal data of every customer that comes through the door, including names, addresses, genders, and even how they behave inside the bar. [...]
Management from Mix, Badlands, and Toad Hall did not respond to requests for comment about when or why they first started using the surveillance tech in their businesses, so I stopped by Mix last Thursday night to check things out for myself.
Like most private surveillance cameras, the Patronscan kiosk at Mix hides in plain sight. In the dim light of the bar, the black machine is easy to miss. I was also not instructed to face the camera when I handed my ID to the bouncer; when I asked if I would be photographed, the bouncer told me the camera had in fact already taken my picture. They said Mix bouncers are not required to verbally tell each patron that they're being photographed by the Patronscan device. Instead, they rely on a small informational plaque posted to the kiosk below eye level to inform customers what data is being collected and how it will be used.
"It's posted signage," the bouncer shrugged on Thursday, when I suggested tipsy customers might not read the fine print on their way inside. [...]
Owen, however, sees potential for serious privacy risks. In today's political climate, she said, "it's really not great to have lists of gay people."
(Gee, ya think?)
In 2023, Illinois residents filed a class action lawsuit against Patronscan for violating an Illinois biometrics privacy law by collecting biometric data from eventgoers without first obtaining their consent, calling the technology "Orwellian."
In 2019, when the Board of Supervisors banned the use of facial recognition software by city agencies, including the police, the measure was widely supported by locals and inspired similar policies nationwide. That policy does not apply to private businesses like the Castro bars, but the reception at the time signaled widespread distrust toward surveillance tech companies. But now as the technology grows more normalized and a new generation of AI boomers flood San Francisco, the attitude toward Big Brother is shifting in the city.
The morning after we chatted in the Castro, Gonzalez told me over Instagram DM that he was unaware Mix could share patron data with neighboring businesses but did not see a problem with it. "I think it's cute that they share it amongst other bars," he wrote. "It's like a little cybersecurity community."
Oh, how cute!
As George Orwell famously wrote, "If you want a picture of the future, imagine a shrug emoji stamping on a human face -- forever."
This is so much worse than the usual techbro "disruption" of bars that features here so often; it's even worse than the company that tried to sell bars' security cameras back to them.
If you think these photos, videos and dossiers of personally-identifiable information won't be turned over to ICE at the drop of a hat by this Servall Data Systems, you have not been paying attention. ICE, I must remind you, now has a budget exceeding the entire military budgets of all but 15 countries. Bigger than Israel; almost as big as Canada and South Korea.
The Brownshirts will be in the Castro soon enough.
Smartphones are mindlessly seamless by design. Callback is built around a simple research-backed finding: remove features designed to pull you back in, & reintroduce physical friction like a speed bump for the mind.
We turned these principles into a phone you can live with every day. And like any great multi-tool, when you're done using it, you snap it shut -- a deliberate endpoint instead of another invitation to scroll.
Close the phone. Open your life.
Headphone jack; removable battery; microSD; no AI; blocks web browsers and social media. From the FAQ:
• Can the browser or social media blocks be turned off?
No. Callback is built around those blocks. That is the point.
I never became fluent in T9, but now I think I shall.
US citizens: call on your congresscritter and senators to restore screwworm control funding.
See the instructions for how to sign this letter campaign without running any nonfree JavaScript code--not trivial, but not hard.
US citizens: Join with this campaign to address this issue.
To phone your congresscritter about this, the main switchboard is +1-202-224-3121.
Please spread the word.
Strike 3 Holdings first filed its lawsuit almost a year ago after internal Meta emails revealed in a different lawsuit showed that the company downloaded over 81 terabytes of data by scraping Anna's Archive, a massive open search engine for torrenting copyrighted material including books, movies, TV shows, and porn. [...]
"For example, IP Ranges A and F torrented the following files on December 15, 2022: 'Teen Sex Sessions 2 (2012),' 'Teen Titans Go to the Movies (2018),' 'Teens Love Tats XXX,' 'TeensLoveAnal.16.09.30.Amara,' 'Teenfidelity Pics,' 'TeensLoveAnal.16.06.10.Casey,' 'Teenage Mutant Ninja Turtles (1987-1996),' 'Teen Mom Girls Night In S02E08,' 'TeenyTaboo.22.12.07.Kiana,' and 'TeenageDelinquents.Maryjane,'" the decision says. "On the same day, a Corporate IP Address was used to torrent 'TeenCurves.22.12.09.Willow.' The connection between these files is plain: The word 'teen' appears in every file name."
The judge said that Meta suggesting that its IP addresses downloading all these files at the same time was the work of different individual Meta employees acting independently "strains credulity."
The judge also explained that whether Meta actually used Strike 3 Holdings' videos to train its AI models is irrelevant because Meta violated Strike 3 Holdings' copyright when it torrented its videos. It illegally downloaded the files and also "seeded" them, meaning it distributed the pirated files to other users.
"In sum, Plaintiffs [Strike 3 Holdings] have plausibly alleged that Defendant [Meta] is liable for direct, vicarious, and contributory copyright infringement based on the torrenting of their films," the decision said. "Defendant's motion to dismiss is therefore DENIED."
Some headlines are a gift.