MeeefTCD (Massive Enhanced Extracted Email Features Tailored for Cosine Distance) data set

Although the MeeefTCD data set was prepared by Farshad Barahimi, using data processing code he wrote to convert some of the emails in a folder structure into data features per email, the original emails used to create the data set do not belong to him; they were publicly available in a public data set. The folder structure of the emails used does not belong to him either; it was also publicly available in a public data set.

The structure of MeeefTCD.csv in meeeftcd.zip is as follows:
2400 lines, one per email, where each line contains 48558 numbers separated by commas. The first number on each line is a label number (between 0 and 7), and the remaining 48557 numbers on the line are the data features of the email used for that line.
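
The layout described above can be read with a few lines of Python. The following is only a minimal sketch, not the code used to prepare the data set: the function name load_meeeftcd and its expected_features parameter are hypothetical, and it assumes the file has no header row and that every value parses as a plain number.

```python
import csv
from array import array


def load_meeeftcd(lines, expected_features=48557):
    """Parse rows of 'label,feature_1,...,feature_n' into (labels, features).

    `lines` is any iterable of text lines, e.g. an open file object.
    Hypothetical helper; assumes no header row.
    """
    labels = []
    features = []
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if not row:
            continue  # skip blank lines, e.g. a trailing newline
        if len(row) != expected_features + 1:
            raise ValueError(
                f"line {row_number}: expected {expected_features + 1} "
                f"numbers, found {len(row)}"
            )
        # Go through float() first in case the label is written as "3.0".
        label = int(float(row[0]))
        if not 0 <= label <= 7:
            raise ValueError(f"line {row_number}: label {label} not in 0..7")
        labels.append(label)
        # A compact array of doubles keeps memory use far below a list of floats.
        features.append(array("d", map(float, row[1:])))
    return labels, features
```

After extracting meeeftcd.zip, the function could be called on a file opened with open("MeeefTCD.csv", newline=""); the result should then hold 2400 labels and 2400 feature rows of length 48557. Stored as 8-byte doubles, the features alone take a little under 1 GB of memory.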

For the original email data sets used to create this data set, visit https://www.cs.cmu.edu/~enron/ and https://www.cs.cmu.edu/~enron/enron_mail_20150507.tar.gz (Farshad Barahimi is not affiliated with https://www.cs.cmu.edu/~enron/ or https://www.cs.cmu.edu/~enron/enron_mail_20150507.tar.gz).

Label numbers used in MeeefTCD data set: