CSCI 1070 Introduction to Computer Science: Taming Big Data
This section of the course (Fall 2017) meets Monday, Wednesday, and Friday from 2:10PM-3:00PM in Ritter Hall 115.
How does Netflix recommend movies that you'll like? How does your email system recognize which messages are spam? How does Facebook decide what stories to post on your news feed? How does Google translate automatically between dozens of languages? The answer in each case is the same: the computer "learns" how to solve these problems using large datasets of known solutions (for example, a large collection of emails labeled as spam or not-spam, or historical movie ratings of every Netflix user).
In this course we'll learn the basic techniques and algorithms in Machine Learning and see how they can be applied to these problems and others like them. We'll learn how to assemble real-world datasets using various web APIs and will learn how to apply machine learning algorithms using the Python programming language.
- What is Machine Learning?
- Python Crash Course
- Databases and Web APIs
- Decision Tree Classifiers
- Bayesian Classifiers
- k-Means Clustering
- k Nearest Neighbors
- Recommendation Engines
- Neural Networks
- Introduction to Natural Language Processing
- Exploring Social Graphs
- Ethics in Machine Learning
Student Learning Outcomes
After successfully completing this course, students will be able to:
- recognize the use of bits in the low-level representation of digital data, with a particular focus on text data and the Unicode character encoding
- use control structures to process large datasets in a high-level programming language
- apply basic concepts of Software Engineering in the implementation of a computer program that satisfies a set of requirements, including thorough testing and iteration until requirements are met
- select and apply appropriate machine learning algorithms for classification or clustering to real world datasets
- formulate and solve a real-world problem in data science, including an appropriate evaluation, and present results in written form
The textbook for the course is Data Science from Scratch by Joel Grus, O'Reilly Media Inc., 2015. The ISBN is 978-1-491-90142-7. You can get it directly from O'Reilly, from Amazon, or from the SLU bookstore. The code examples from the book are available from this forkable GitHub repository.
For those of you who choose to use the lab computers, please read the department and university policies on appropriate use of computer systems.
I will give approximately eight in-class quizzes (roughly one every two weeks) throughout the semester; dates TBA, depending on our progress through the course material. The quizzes usually consist of true/false, multiple-choice, and short-answer questions, and take only about 10-15 minutes at the beginning of class. I'll drop your lowest quiz score, but I will not allow you to make up quizzes that you miss because you are absent or arrive late for class. Together the quizzes make up 50% of your final grade.
You will also be asked to do a semester software project related to some topic we cover in the course, accounting for 25% of your final grade. I'll give you some ideas as we approach the middle of the semester. Since we'll cover a lot of different things, this is a good opportunity for you to explore some particular topic in greater depth.
Finally, we'll have a machine learning "bakeoff" toward the end of the semester, in which you will do your best to solve a fixed machine learning problem of my choosing. I'll provide you with labeled training data to learn from, and then we will evaluate your algorithm against a hidden test set. The bakeoff will count for 25% of your final grade.
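Because the bakeoff test set is hidden, one common way to estimate how an algorithm will do is to set aside part of the labeled training data and score predictions on that held-out part. The Python sketch below illustrates the idea only; it is not the official bakeoff scoring procedure. The function names, the 20% holdout fraction, the (features, label) data layout, and the majority-label baseline are all assumptions made for illustration.

```python
import random

def split_data(labeled, holdout_fraction=0.2, seed=0):
    """Shuffle (features, label) pairs and split them into train and held-out lists."""
    rows = list(labeled)
    random.Random(seed).shuffle(rows)  # fixed seed so the split is repeatable
    n_holdout = round(len(rows) * holdout_fraction)
    cut = len(rows) - n_holdout
    return rows[:cut], rows[cut:]

def accuracy(predict, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)

def train_majority(train):
    """Hypothetical baseline 'model': always predict the most common training label."""
    counts = {}
    for _, label in train:
        counts[label] = counts.get(label, 0) + 1
    majority = max(counts, key=counts.get)
    return lambda features: majority

# Toy data (made up): one numeric feature, every fourth example labeled "spam".
data = [((i,), "spam" if i % 4 == 0 else "ham") for i in range(20)]
train, held_out = split_data(data)
baseline = train_majority(train)
print("held-out accuracy of baseline:", accuracy(baseline, held_out))
```

Any real classifier can be passed to accuracy in place of the baseline, as long as it is a function that maps features to a predicted label.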
There is no final exam for the course.
Letter grades will be based on each student's overall percentage of awarded points according to the following formula.
- Student percentage above 90% will result in a grade of A or better.
- Student percentage above 87% will result in a grade of A- or better.
- Student percentage above 83% will result in a grade of B+ or better.
- Student percentage above 80% will result in a grade of B or better.
- Student percentage above 77% will result in a grade of B- or better.
- Student percentage above 73% will result in a grade of C+ or better.
- Student percentage above 70% will result in a grade of C or better.
- Student percentage above 67% will result in a grade of C- or better.
- Student percentage above 60% will result in a grade of D or better.
- Student percentage below 60% will result in a grade of F.
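As a rough illustration of how the weights and cutoffs above combine, here is a minimal Python sketch. It makes several assumptions that the syllabus does not state: every component is scored on a 0-100 scale, all quizzes count equally, and a percentage exactly on a cutoff earns that cutoff's letter. Since each cutoff is stated as "or better", the letter returned is a minimum, not a guaranteed final grade. The function names are hypothetical.

```python
def final_percentage(quiz_scores, project, bakeoff):
    """Combine component scores (each assumed to be on a 0-100 scale)."""
    if not quiz_scores:
        raise ValueError("at least one quiz score is required")
    # Drop the single lowest quiz score before averaging.
    kept = sorted(quiz_scores)[1:] if len(quiz_scores) > 1 else list(quiz_scores)
    quiz_avg = sum(kept) / len(kept)
    # Weights from the syllabus: quizzes 50%, project 25%, bakeoff 25%.
    return 0.50 * quiz_avg + 0.25 * project + 0.25 * bakeoff

# (cutoff, letter) pairs taken from the list above, highest first.
CUTOFFS = [
    (90, "A"), (87, "A-"), (83, "B+"), (80, "B"), (77, "B-"),
    (73, "C+"), (70, "C"), (67, "C-"), (60, "D"),
]

def minimum_letter_grade(percentage):
    """Return the lowest letter grade the scale guarantees for a percentage."""
    for cutoff, letter in CUTOFFS:
        if percentage >= cutoff:  # assumption: a score on the boundary counts
            return letter
    return "F"

# Made-up example: the 60 is dropped, leaving a quiz average of 90.
pct = final_percentage([80, 90, 100, 60], project=90, bakeoff=70)
print(pct, minimum_letter_grade(pct))
```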
Academic Integrity Statement
Academic integrity is honest, truthful and responsible conduct in all academic endeavors. The mission of Saint Louis University is "the pursuit of truth for the greater glory of God and for the service of humanity." Accordingly, all acts of falsehood demean and compromise the corporate endeavors of teaching, research, health care, and community service via which SLU embodies its mission. The University strives to prepare students for lives of personal and professional integrity, and therefore regards all breaches of academic integrity as matters of serious concern. The governing University-level Academic Integrity Policy was adopted in Spring 2015, and can be accessed on the Provost's Office website. Additionally, each SLU College, School, and Center has adopted its own academic integrity policies, available on their respective websites. All SLU students are expected to know and abide by these policies, which detail definitions of violations, processes for reporting violations, sanctions, and appeals. Please direct questions about any facet of academic integrity to your faculty, the chair of the department of your academic program, or the Dean/Director of the College, School or Center in which your program is housed.
Title IX Statement
Saint Louis University and its faculty are committed to supporting our students and seeking an environment that is free of bias, discrimination, and harassment. If you have encountered any form of sexual misconduct (e.g. sexual assault, sexual harassment, stalking, domestic or dating violence), we encourage you to report this to the University. If you speak with a faculty member about an incident of misconduct, that faculty member must notify SLU’s Title IX coordinator and share the basic fact of your experience. The Title IX coordinator will then be available to assist you in understanding all of your options and in connecting you with all possible resources on and off campus. If you wish to speak with a confidential source, you may contact the counselors at the University Counseling Center at 314-977-TALK. To view SLU’s sexual misconduct policy and for resources, please visit this web address.
Student Success Center
In recognition that people learn in a variety of ways and that learning is influenced by multiple factors (e.g., prior experience, study skills, learning disability), resources to support student success are available on campus. The Student Success Center, a one-stop shop that assists students with academic and career-related services, is located in the Busch Student Center (Suite 331) and the School of Nursing (Suite 114). Students who think they might benefit from these resources can find out more about (1) course-level support (e.g., faculty member, departmental resources, etc.) by asking their course instructor, and (2) University-level support (e.g., tutoring services, university writing services, disability services, academic coaching, career services, and/or facets of curriculum planning) by visiting the Student Success Center or by going here.
Disability Services Academic Accommodations
Students with a documented disability who wish to request academic accommodations are encouraged to contact Disability Services to discuss accommodation requests and eligibility requirements. Please contact Disability Services, located within the Student Success Center, at <Disability_services@slu.edu> or 314-977-3484 to schedule an appointment. Confidentiality will be observed in all inquiries. Once approved, information about academic accommodations will be shared with course instructors via email from Disability Services and viewed within Banner via the instructor’s course roster.