THE PROBABILITY OF SPAM FILTERING
According to the website statista.com, of all email traffic in 2019 was made up of spam—those pesky, useless, and potentially dangerous messages that just clog our email inboxes. Most email servers these days can filter spam automatically. Spam messages often have certain suspicious phrases in the subject lines. For example, "You Have Been Selected" is one such phrase.
An incoming email is checked for key elements, such as this phrase, then the server decides whether to put the email in your mailbox or send it to the spam folder.
In this activity, you will estimate the probability that an email with a specific subject line is classified as spam. Let be the probability that an email you have received is spam and be the probability that the email is not spam.
- According to statista.com, what were the values of and in 2019?
Let's assume that of all spam messages contain the word "selected" in the subject line. To simplify our notation, we will name the events as follows.
email is spam
email is not spam
subject line contains the word "selected"
subject line does not contain the word "selected"
- Express the statement "of all spam messages contain the word 'selected' in the subject line" as a conditional probability.
- We will also assume that of all nonspam messages contain the word "selected" in the subject line. Express this statement as a conditional probability.
Since every message can be classified as either spam or not spam, the probability that any message has the word "selected" in the subject line is the following.
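Written out as the law of total probability, the relationship can be sketched as below. The event names are assumptions chosen here for illustration: S for "email is spam," N for "email is not spam," and W for "subject line contains the word selected."

```latex
P(W) = P(W \mid S)\,P(S) + P(W \mid N)\,P(N)
```

Each term weights the chance of seeing the word within one class of email by how common that class is, and because S and N cover every message without overlapping, the two terms add up to the overall probability.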
- Compute the value of .
- Finally, determine the probability that an email is spam, given that it has the word "selected" in the subject line. (Hint: Use Bayes' Theorem.)
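One way to check the arithmetic in this last step is a short Python sketch of Bayes' Theorem. The function name `posterior_spam` and its parameter names are hypothetical, chosen for illustration. The inputs are the three probabilities gathered in the earlier steps: the probability that an email is spam, and the probabilities that the word appears in a spam and in a nonspam subject line.

```python
def posterior_spam(p_spam, p_word_given_spam, p_word_given_not_spam):
    """Return P(spam | word in subject line) using Bayes' Theorem.

    All three arguments are probabilities between 0 and 1.
    """
    # Every email is either spam or not spam, so the two priors sum to 1.
    p_not_spam = 1.0 - p_spam

    # Law of total probability: overall chance that the word appears.
    p_word = (p_word_given_spam * p_spam
              + p_word_given_not_spam * p_not_spam)
    if p_word == 0:
        raise ValueError("the word never appears, so the posterior is undefined")

    # Bayes' Theorem: P(spam | word) = P(word | spam) * P(spam) / P(word).
    return p_word_given_spam * p_spam / p_word
```

For example, calling `posterior_spam(0.5, 0.2, 0.05)` uses made-up placeholder inputs, not the figures from this activity. Substitute the values you found in the earlier steps to check your own answer.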