Introduction to Data Mining CSE5/7331 & EMIS5/7332

Course Information

Lecture Time:
TTh 2:00PM-3:30PM, Junkins 0204
Office Hours:
F 1:00PM-3:00PM, Caruth 451
Text:
Introduction to Data Mining by Tan, Steinbach, and Kumar
First Edition, ISBN-10: 0321321367
TA:
None
Instructor Webpage:
eclarson.com
          
This class introduces the processes of managing, exploring, visualizing, and acting on large amounts of data. It provides an introduction to the data-mining techniques used in analytics: classification, regression, association analysis, and cluster analysis. All material will be reinforced through hands-on experience using state-of-the-art tools to design and execute data mining processes. Class examples will use Python and R. Prerequisites for this class are basic statistics and probability and introductory algorithm analysis (or a desire to learn quickly). Experience with databases is helpful but not required.

Assignments will use the Python programming language but can also be completed in R. Lectures may also include guest speakers (in person or by video) giving short demonstrations and/or presentations.


Learning Outcomes

This course is designed to help students build and use machine learning and data mining techniques. Students will hone their abilities to analyze data, visualize and explain data, and predict outcomes using a variety of learning algorithms. The course also covers techniques for mining rules and building production-ready recommendation systems, tools for working with massively parallel systems, and techniques for working with massive data sets. Finally, students will learn to communicate ideas about these technical areas effectively.

Topics covered include:
  • Data analysis in Python (scikit-learn, pandas, Dato, and IPython)
  • Visualization using matplotlib, seaborn, and mpld3
  • Feature dimension reduction and manipulation
  • Linear and non-parametric regression
  • Classification techniques
  • Gradient-based optimization techniques
  • Clustering techniques
  • Recommender systems
  • Association rule mining

Assignments

Lab Assignments. Lab assignments will be assigned periodically and submitted electronically. Lab assignments must be completed individually. Late labs will not be accepted. Lab assignments should be turned in as HTML webpages, with all images either embedded in the HTML or placed in a zipped directory together with the master HTML file. Using IPython is an extremely efficient way to complete the assignment and keep an HTML archive. Most assignments are due during a week when no formal lecture takes place; use this extra time to complete extended or time-consuming analyses of the data. Expectations for these assignments are high: comment your code and explain your reasoning in the HTML document.

In-Class Assignments. Periodically, there will be video lectures to watch before class. We will then use class time to complete an assignment (i.e., a flipped classroom): the specifications will be given at the start of class, and the assignment will be turned in at the end of class. Students can work individually or in teams. Come prepared to work! If working as a team, all team members must be present to receive a grade. An in-class assignment missed because of an absence cannot be made up afterward. In-class assignments should be turned in as HTML with all images and source code embedded (exactly like lab assignments).

Grading

Students will be evaluated on their lab assignments and in-class assignments, as follows:
Biweekly lab assignments:
75% of grade (3 labs @ 20% each, 1 lab @ 15%)
In-Class Assignments:
25% of grade (5 @ 5% each)
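As a quick check of the weighting above, the final grade can be computed as a weighted sum. This is only a sketch: it assumes every assignment is scored on a 0-100 scale and ignores any absence deductions; the function name weighted_grade is hypothetical.

```python
# Hypothetical sketch of the course weighting (assumes 0-100 scores).
LAB_WEIGHTS = [20, 20, 20, 15]   # percent: 3 labs @ 20%, 1 lab @ 15% (75% total)
IN_CLASS_WEIGHTS = [5] * 5       # percent: 5 in-class assignments @ 5% (25% total)


def weighted_grade(lab_scores, in_class_scores):
    """Return the course grade (0-100) before any absence deductions."""
    if len(lab_scores) != len(LAB_WEIGHTS) or len(in_class_scores) != len(IN_CLASS_WEIGHTS):
        raise ValueError("expected 4 lab scores and 5 in-class scores")
    # Integer percent weights keep the arithmetic exact for integer scores.
    total = sum(w * s for w, s in zip(LAB_WEIGHTS, lab_scores))
    total += sum(w * s for w, s in zip(IN_CLASS_WEIGHTS, in_class_scores))
    return total / 100


# The weights account for the whole grade.
assert sum(LAB_WEIGHTS) + sum(IN_CLASS_WEIGHTS) == 100

print(weighted_grade([90, 80, 100, 70], [100, 100, 0, 100, 100]))  # 84.5
```

In this example, one missed in-class assignment costs exactly its 5% weight.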

Absences

Class attendance is required. Students with three or fewer absences who actively participate in class will not receive any deduction for their absences. Starting with the fourth absence, 2 percentage points will be deducted from the final grade for each absence beyond the first three. Please note: these measures are rarely needed!
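The absence rule above amounts to a simple deduction. The sketch below assumes that "2% points" means 2 percentage points subtracted from a 0-100 final grade, and it does not model the active-participation condition; the function names are hypothetical.

```python
# Hypothetical sketch of the absence deduction.
FREE_ABSENCES = 3        # absences allowed without any deduction
POINTS_PER_ABSENCE = 2   # assumption: percentage points per extra absence


def absence_deduction(absences):
    """Points deducted for absences beyond the first three."""
    return POINTS_PER_ABSENCE * max(0, absences - FREE_ABSENCES)


def adjusted_grade(final_grade, absences):
    """Final grade (0-100 scale) after the absence deduction."""
    return final_grade - absence_deduction(absences)


print(absence_deduction(3))     # 0
print(adjusted_grade(84.5, 5))  # 80.5 (two extra absences, 4 points)
```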

Cheating

Cheating of any kind, such as plagiarism or direct copying, is strictly prohibited and violates the SMU Honor Code. However, collaboration is strongly encouraged. Most lab assignments can be done as a group and turned in as a group.

Disability Accommodations

Students needing academic accommodations for a disability must first be registered with Disability Accommodations & Success Strategies (DASS) to verify the disability and to establish eligibility for accommodations. Students may call 214-768-1470 or visit http://www.smu.edu/alec/dass.asp to begin the process. Once registered, students should then schedule an appointment with the professor to make appropriate arrangements.

Religious Observance

Religiously observant students wishing to be absent on holidays that require missing class should notify their professors in writing at the beginning of the semester, and should discuss with them, in advance, acceptable ways of making up any work missed because of the absence. (See University Policy No. 1.9.)

Excused Absences for University Extracurricular Activities

Students participating in an officially sanctioned, scheduled University extracurricular activity should be given the opportunity to make up class assignments or other graded assignments missed as a result of their participation. It is the responsibility of the student to make arrangements with the instructor prior to any missed scheduled examination or other missed assignment for making up the work. (See the University Undergraduate Catalog for details.)

Please note that this syllabus is subject to change. Any changes to the syllabus will be announced via Blackboard and displayed on the course website.