To help keep the Unibuddy platform safe, respectful, and welcoming, we've introduced AI-powered moderation for all chat messages sent through the Widget.
Now, when a prospective student sends a message to an ambassador, it's automatically scanned by our AI moderation system. If the message is flagged as inappropriate, the student is immediately blocked, and Admins are notified via email to review the conversation.
What content is flagged?
Our system uses multimodal classification to detect sensitive or harmful content across several categories.
Here's the full list of what the AI looks out for:
| Category | Description |
| --- | --- |
| Harassment | Content that expresses, incites, or promotes harassing language towards any target. |
| Harassment/Threatening | Harassment content that also includes violence or serious harm towards any target. |
| Hate | Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. Hateful content aimed at non-protected groups (e.g., chess players) is treated as harassment. |
| Hate/Threatening | Hateful content that also includes violence or serious harm towards the targeted group. |
| Illicit | Content that gives advice or instructions on how to commit illicit acts. Example: "how to shoplift." |
| Illicit/Violent | The same as Illicit, but including references to violence or procuring a weapon. |
| Self-Harm | Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders. |
| Self-Harm/Intent | Content where the user expresses intent to engage in self-harm. |
| Self-Harm/Instructions | Content that encourages or provides instructions for self-harm. |
| Sexual | Content meant to arouse sexual excitement, describe sexual activity, or promote sexual services (excludes sex education and wellness content). |
| Sexual/Minors | Sexual content that includes anyone under 18. |
| Violence | Content depicting death, violence, or physical injury. |
| Violence/Graphic | Content that depicts violence in graphic detail. |
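For reference, these category names mirror the taxonomy used by OpenAI's moderation API, so here's a minimal sketch of what a per-category check on a single chat message could look like using that API. It's purely an illustration under that assumption, not our exact implementation, and it assumes an API key is configured in the environment.

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Per-category flags returned by the moderation model; the names map onto the
# table above (slashes become underscores).
CATEGORY_FIELDS = [
    "harassment", "harassment_threatening", "hate", "hate_threatening",
    "illicit", "illicit_violent", "self_harm", "self_harm_intent",
    "self_harm_instructions", "sexual", "sexual_minors", "violence",
    "violence_graphic",
]

def check_message(text: str) -> dict:
    """Classify one chat message and report which categories (if any) fired."""
    response = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    result = response.results[0]
    fired = [name for name in CATEGORY_FIELDS if getattr(result.categories, name)]
    return {"flagged": result.flagged, "categories": fired}

# A harmless question should come back with flagged=False and no categories.
print(check_message("Hi! What's the accommodation like for first-year students?"))
```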
What happens when content is flagged?
The prospective student is automatically blocked from continuing the chat.
An email notification is sent to platform Admins in real time.
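For anyone curious about the mechanics, here's a minimal sketch of that flag, block, and notify flow. The block_prospect and email_admins helpers are hypothetical stand-ins for Unibuddy's internal services (they are not a public API), and check_message is the per-category check from the sketch above.

```python
def block_prospect(prospect_id: str, categories: list[str]) -> None:
    """Hypothetical stand-in for the internal block action."""
    print(f"Blocked prospect {prospect_id} (flagged for: {', '.join(categories)})")

def email_admins(university_id: str, prospect_id: str, categories: list[str]) -> None:
    """Hypothetical stand-in for the real-time Admin email notification."""
    print(f"Notified Admins at {university_id} to review prospect {prospect_id}")

def handle_incoming_message(text: str, prospect_id: str, university_id: str) -> bool:
    """Return True if the message is delivered, False if it was blocked."""
    verdict = check_message(text)  # per-category check from the sketch above
    if not verdict["flagged"]:
        return True  # clean message: deliver to the ambassador as normal

    # Flagged: block the prospective student from continuing the chat, then
    # alert the university's Admins so they can review it from the Moderation
    # tab (approve the block or unblock a false positive).
    block_prospect(prospect_id, verdict["categories"])
    email_admins(university_id, prospect_id, verdict["categories"])
    return False
```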
What can Admins do?
From the Moderation tab in your University Dashboard, Admins can:
Approve the block if the AI flagged the message correctly.
Unblock the user if it was a false positive.
Message the ambassador involved in the conversation.
View historical moderation actions and reports.
This update helps ensure your ambassadors and prospective students can engage in safe, respectful conversations.