Microsoft says its automated ai red teaming tool finds malicious content in a matter of hours

From ChatGPT to Gemini: how AI is rewriting the internet

See all Stories

Posted Feb 23, 2024 at 5:37 PM UTC

Microsoft says its automated AI red teaming tool finds malicious content “in a matter of hours.”

PyRIT, or Python Risk Identification Toolkit, can point human evaluators to “hot spot” categories in AI that might generate harmful prompt results.

Microsoft used PyRIT while redteaming (the process of intentionally trying to get AI systems to go against safety protocols) its Copilot services to write thousands of malicious prompts and score the response based on potential harm in categories that security teams can now focus on.

Follow topics and authors from this story to see more like this in your personalized homepage feed and to receive email updates.

Emilia David

Loading comments

Getting the conversation ready...

Most Popular

More in AI

Top Stories

Most Popular

The Verge Daily

More in AI

Top Stories