IS 428 Data Mining Techniques and Applications (Fall 2019)

Information Systems Department
University of Maryland Baltimore County
Baltimore, Maryland 21250
Departmental Office: Room ITE 404, ph. 410-455-3206

Course Description

Data mining techniques are used to discover hidden patterns and knowledge from large amounts of data. In an organizational context, data mining helps to understand customers and make better decisions. This course will provide a broad understanding of the technical, business, and research issues in the area of data mining, including classification, clustering, association rules, and data warehousing.

Student learning outcomes: By the end of this course, you will be able to:

Lecture time and venue: Thurs 4:30pm - 7:00pm Sherman Hall 108

Instructor: Dr. James Foulds
Instructor email: jfoulds [at] umbc [dot] edu. Please use Piazza for course-related questions, instead of email, so that everyone can benefit from the answers.
Instructor office hours: Tues/Thurs 3 - 4pm ITE 447 (other times by appointment)

Piazza: Sign up for this course at piazza.com/umbc/fall2019/is428
Poll Everywhere: Vote on in-class poll questions at PollEv.com/jamesfoulds656. To receive participation credit, register your account for the course by week 2 at https://PollEv.com/jamesfoulds656/register?group_key=nOgGItRN7jEYKyukQuhlg6lrf

Prerequisites

Required Textbook

Data Mining: Practical Machine Learning Tools and Techniques, Fourth Edition (Witten et al.) is the primary textbook. You will need this book for course readings. Until you obtain it, the UMBC library has an electronic copy for online access that you can use. Earlier editions of the textbook are acceptable but not recommended. Some material will be missing, and it will be up to you to convert chapter/section/page numbers to the older edition for the required readings.

Course Requirements and Grading

The project will be done in groups of 3-5. Project proposals are to be sent to me by email, and approved by the deadline.

In this course, participation means more than just showing up. It also refers to contributing to everyone's learning, through active engagement in peer instruction exercises, in-class discussions, and Piazza questions/answers. Participation grades will be assessed as a percentage of peer instruction questions answered (correctly or not), with a 90% response rate being sufficient for full points, and by Piazza contributions. Two or more contributions (either questions or answers) on Piazza will earn you 2% of the final grade.

With respect to final letter grades, UMBC's Undergraduate Catalog states that "A" indicates superior achievement; "B," good performance; "C," adequate performance; "D," minimal acceptable achievement; and "F," failure. The catalog does not associate any numerical scores with these letter grades. Below is how grades may be assigned based on your final points, accumulated over the semester. I do not grade on a curve, so everyone in the class has the opportunity to succeed.

Final Grade | Letter Grade | Points when calculating GPA
90 - 100 | A | 4.0
80 - 89.99 | B | 3.0
70 - 79.99 | C | 2.0
60 - 69.99 | D | 1.0
0 - 59.99 | F | 0.0
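
As a rough illustration only, the table above can be read as a simple lookup from final points to a letter grade. The sketch below is not an official grading tool. The names `letter_grade` and `gpa_points` are hypothetical, and it assumes the lower bound of each band acts as the cutoff, so a score that falls between two listed ranges (for example 89.995) lands in the lower band.

```python
from bisect import bisect_right

# Lower bounds of the D, C, B, and A bands, taken from the table above.
CUTOFFS = [60, 70, 80, 90]
LETTERS = ["F", "D", "C", "B", "A"]
GPA_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def letter_grade(final_points: float) -> str:
    """Map final points (0-100) to a letter grade using the table's bands.

    Assumption: each band starts at its lower bound, with no rounding.
    """
    if not 0 <= final_points <= 100:
        raise ValueError("final_points must be between 0 and 100")
    # bisect_right counts how many cutoffs are <= final_points,
    # which is exactly the index of the matching letter.
    return LETTERS[bisect_right(CUTOFFS, final_points)]


def gpa_points(final_points: float) -> float:
    """Return the GPA points listed in the table for the matching letter."""
    return GPA_POINTS[letter_grade(final_points)]


if __name__ == "__main__":
    for score in (95, 89.99, 72.5, 60, 41):
        print(score, letter_grade(score), gpa_points(score))
```

Since the text above says only how grades "may be assigned", treat the output as an estimate rather than a guarantee.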

Homework and Exam Policies

Schedule

Lecture | Summary | Details | Assessment | Required reading
8/29/2019, Week 1 | Course overview, introduction to data mining | Applications, the data mining process, intro to classification | | Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth, From Data Mining to Knowledge Discovery in Databases, AI Magazine 17(3) 1996, up to page 7. Optional alternative reading: Witten et al., Ch 1.
9/5/2019, Week 2 | Know your data | Instances and attributes, styles of machine learning tasks, ethical thinking: privacy | | Witten et al., Ch 2.
9/12/2019, Week 3 | Data preprocessing | Data cleaning, integration, transformation, reduction, discretization. | HW1 out | Kotsiantis, S. B., D. Kanellopoulos, and P. E. Pintelas. "Data preprocessing for supervised learning." International Journal of Computer Science 1.2 (2006): 111-117
9/19/2019, Week 4 | Data Warehousing | OLAP vs OLTP, data cubes. Project brainstorming. | Project groups formed by this date | Chaudhuri, S., & Dayal, U. (1997). An overview of data warehousing and OLAP technology. ACM SIGMOD Record, 26(1), 65-74, up to Section 4
9/26/2019, Week 5 | Knowledge representation | Linear models, trees, rules, nearest neighbors. Sharing project ideas | Project proposal due | Witten et al., Ch 3
10/3/2019, Week 6 | Supervised learning | Decision trees, decision rules, ethical thinking: fairness in classification | HW1 due, HW2 out | Witten et al., Ch 4.1, 4.3, 4.4
10/10/2019, Week 7 | Supervised learning (continued) | Naive Bayes, logistic regression, support vector machines. | | Witten et al., Ch 4.2 first subsection only, 4.6 (up to and including Logistic Regression), 7.2 (up to and including The Maximum Margin Hyperplane)
10/17/2019, Week 8 | Evaluation of supervised learning | Hold-out method, cross validation, ROC curves | | Witten et al., Ch 5 (up to and including 5.5, can skip The Bootstrap, 5.8 up to and including ROC Curves, can skip Lift Charts)
10/24/2019, Week 9 | Unsupervised learning | Association rule learning | HW2 due, HW3 out | Witten et al., Ch 4.5
10/31/2019, Week 10 | Unsupervised learning (continued) | K-means, hierarchical clustering | Project mid-term progress report due | Witten et al., Ch. 9.3
11/7/2019, Week 11 | Recommender systems | Content filtering, collaborative filtering | | Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8)
11/14/2019, Week 12 | Text mining | Bag of words representation, n-grams, word embeddings, topic models, ethical thinking: sentiment analysis | HW3 due | Witten et al. Ch 13.5
11/21/2019, Week 13 | Deep learning | Deep feedforward networks, backpropagation | | Witten et al., Ch 10.1 - 10.2
11/28/2019, Week 14 | Thanksgiving Holiday | No class | |
12/5/2019, Week 15 | Group project poster presentations | | Digital copies of posters due |
12/12/2019, Exam week | Final exam Thurs 12/12/19 6:00-8:00pm (Sherman Hall 108) | | Project final report due Tues 12/10/2019 (11:59pm) |

The schedule may be subject to change. The summary and details columns are only a guideline of the content likely to be covered, and the dates on which material is covered may shift.

Instructional Methods

Traditional lectures will be augmented with active learning methods, primarily in the form of peer instruction exercises. Research has strongly indicated that active learning improves student outcomes in STEM fields versus traditional lecturing (Freeman et al., 2013). We will be using the Poll Everywhere service for polls and quizzes. You will need to bring a mobile device, laptop, or tablet to class in order to participate in the exercises. If you do not have a suitable device, please let me know as soon as possible.

Pre-class reading assignments will be given for each lesson. They are very important for learning and for making the best use of our limited time together (a partially "flipped classroom" approach), so these readings are required.

Software

This course will make extensive use of the free, open source WEKA data mining toolkit.

Academic Integrity

UMBC's policies on academic integrity will be strictly enforced (see the University System of Maryland's policy document, UMBC's academic integrity overview page, the student academic conduct policy and the UMBC catalog). In particular, all of your work must be your own. Acknowledge and cite source material in your papers or assignments. While you may verbally discuss assignments with your peers, you may not copy or look at anyone else's written assignment work or code, or share your own solutions. Any exceptions will result in a zero on the assessment in question, and may lead to further disciplinary action. Some relevant excerpts from UMBC's policies, as linked to above, are:

Accessibility in the Classroom; Student Support / Disability Services

UMBC is committed to eliminating discriminatory obstacles that may disadvantage students based on disability. Student Support Services (SSS) is the UMBC department designated to:

If you have a disability and want to request accommodations, contact SSS in the Math/Psych Building, Room 213, or Sherman Hall, Room 345 (or call 410-455-2459 or 410-455-3250). SSS will require you to provide appropriate documentation of disability and complete a Request for Services form available at my.umbc.edu/groups/sss. If you require accommodations for this class, please make an appointment to meet with me to discuss your SSS-approved accommodations, so that we can best accommodate your needs in a confidential and timely manner.

Counseling Center

Diminished mental health can interfere with optimal academic performance. The source of symptoms might be related to your course work; if so, please speak with me. However, problems with other parts of your life can also contribute to decreased academic performance. UMBC provides cost-free and confidential mental health services through the Counseling Center to help you manage personal challenges that threaten your personal or academic well-being.

Remember, getting help is a smart and courageous thing to do -- for yourself and for those who care about you. For more resources, get the Just in Case mental health resources Mobile and Web App, available at counseling.umbc.edu/justincase.

The UMBC Counseling Center is in the Student Development & Success Center (between Chesapeake and Susquehanna Halls). Phone: 410-455-2472. Hours: Monday-Friday 8:30am-5:00pm.

Diversity Statement on Respect

Students in this class are encouraged to speak up and participate during our meetings. Because the class will represent a diversity of individual beliefs, backgrounds, and experiences, every member of this class must show respect for every other member of this class. (Statement from California State University, Chico’s Office of Diversity and Inclusion).

Family Educational Rights and Privacy Act (FERPA) Notice

Please note that as per federal law I am unable to discuss grades over email. If you wish to discuss grades, please come to my office hours.

Disclosures of Sexual Misconduct and Child Abuse or Neglect

Any student who has experienced sexual harassment or assault, relationship violence, and/or stalking is encouraged to seek support and resources. There are a number of resources available to you, which are listed below.

With that said, as an instructor, I am considered a Responsible Employee, per UMBC’s Policy on Prohibited Sexual Misconduct, Interpersonal Violence, and Other Related Misconduct (located at http://humanrelations.umbc.edu/sexual-misconduct/umbc-resource-page-for-sexual-misconduct-and-other-related-misconduct/). While my goal is for you to be able to share information related to your life experiences through discussion and written work, I want to be transparent that as a Responsible Employee I am required to report disclosures of sexual assault, domestic violence, relationship violence, stalking, and/or gender-based harassment to the University’s Title IX Coordinators.

As an instructor, I also have a mandatory obligation to report disclosures of or suspected instances of child abuse or neglect (www.usmh.usmd.edu/regents/bylaws/SectionVI/VI150.pdf). The purpose of these reporting requirements is for the University to inform you of options, supports and resources; you will not be forced to file a report with the police. Further, you are able to receive supports and resources even if you choose not to have any action taken. Please note that in certain situations, based on the nature of the disclosure, the University may need to take action.

If you need to speak with someone in confidence about an incident, UMBC has the following Confidential Resources available to support you:

Other on-campus supports and resources:

(Statement based on that of UMBC’s Title IX office.)

Campus Resources