The objective of this course is to teach the analytical mindset & programming skills relevant to data science. You will learn how to explore and analyze data using data science libraries and the Python programming language. More specifically, you will learn how to work with Jupyter notebooks, NumPy, Pandas, Seaborn, and Scikit-learn.
Students will learn skills that cover the various phases of exploratory data analysis:
Upon successful completion of this course, you will have acquired the fundamental skills for a data-centric career in Silicon Valley.
The goal is to spend as much classroom time as possible on establishing and honing your analytical mindset. Thus, as part of the class, you will complete the online Python tutorial on codecademy.com. This is an excellent, self-guided, and self-paced tutorial on the Python programming language. You are required to complete the tutorial by January 26, 2018. This will allow us to spend the time in class on covering the basics of data science.
The hands-on elements in this course use the Jupyter environment. Jupyter is a web application for literate programming. Literate programming integrates the explanation of the code with the actual code, which is particularly important in data science environments.
PLEASE NOTE: This course involves extensive programming and computer work, both in and out of class. You are required to bring a laptop to class each time and have Jupyter installed on your laptop. If you run into any issues, please let me or the TA know.
I am committed to your learning success. Please feel free to contact me with any questions regarding this course. If I am not able to help you myself, I will forward your request to someone who can.
Class meetings are Tuesdays and Thursdays, 7:35 PM to 8:50 PM in Lucas Hall 210.
Each class will focus on one specific conceptual issue and discuss solutions to this issue. During the class meeting, you will work on data science problems. At the end of each meeting, I may ask you to present and explain your results.
“What it boils down to is one per cent inspiration and ninety-nine per cent perspiration.” (Thomas Edison)
Your mastery of the learning objectives will be examined through the online Python tutorial, a midterm exam, weekly homework, and a team project. There will be no final exam.
The following table links the learning objectives of this class with the assignments and shows the maximum number of points that each assignment contributes to the final grade.
Learning Objective | Assignment | Max. Points |
---|---|---|
Understand the fundamentals of Python. | Online tutorial | 10 |
Prepare data sources for analysis. | Mid-term exam | 30 |
Develop solutions to data science problems. | Weekly homework | 30 |
Create a persuasive data analysis. | Team Project | 30 |
Total | | 100 |
The final grade distribution is as follows.
Points | Letter Grade |
---|---|
100-94 | A |
<94-90 | A- |
<90-87 | B+ |
<87-84 | B |
<84-80 | B- |
<80-77 | C+ |
<77-74 | C |
<74-70 | C- |
<70-0 | F |
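The point totals and the grade table above can be read as a simple lookup. The following is a minimal Python sketch, assuming that each lower bound is inclusive (94 points earn an A, 90 points an A-, and so on). The names `GRADE_BOUNDS` and `letter_grade` and the example scores are hypothetical and not part of the course material.

```python
# Assumption: lower bounds are inclusive, so 94 points earn an A, 90 an A-, etc.
GRADE_BOUNDS = [
    (94, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"),
]


def letter_grade(points):
    """Map a point total (0-100) to a letter grade from the table above."""
    if not 0 <= points <= 100:
        raise ValueError("points must be between 0 and 100")
    # Bounds are sorted from highest to lowest, so the first match wins.
    for lower_bound, letter in GRADE_BOUNDS:
        if points >= lower_bound:
            return letter
    return "F"


# Hypothetical scores for the four assignments (maximum 10, 30, 30, 30).
scores = {"tutorial": 10, "midterm": 24, "homework": 27, "project": 26}
total = sum(scores.values())
print(total, letter_grade(total))  # prints: 87 B+
```

How scores that fall exactly on a boundary are handled is ultimately the instructor's decision; the sketch only illustrates one plausible reading of the table.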
My grading criteria are as follows:
I reserve the right to change the grading to accommodate special circumstances and opportunities. Any changes, however, will be discussed and announced in class and on Camino.
You will submit a confirmation of completing the online tutorial no later than January 26, 2018 (11:59 PM). I will not accept late submissions without prior notice or without a doctor’s note. I am aware that sometimes life goes crazy, but please notify me in advance and we will work it out.
In the mid-term exam you will demonstrate your ability to prepare a dataset for analysis. Commonly, data scientists spend about 60-80 percent of their time on data preparation (cleaning, exploring, transforming, etc.).
The mid-term exam consists of an in-class component and a take-home component:
Both components will be administered through Camino. I will not accept late submissions for the take-home component without prior notice or without a doctor’s note. I am aware that sometimes life goes crazy, but please notify me in advance and we will work it out.
PLEASE NOTE: Both components of the mid-term exam are “open book”, which means that you may use the internet, class notes, books, etc. during the exam time.
PLEASE NOTE: The take-home component is individual work. You may discuss the take-home component but you must not share intermediate or final solutions with fellow students. Please review the academic integrity rules below.
I will evaluate the take-home submissions based on the following criteria:
Criteria | Metrics | Max. Points |
---|---|---|
Data description | Understandability (1), Completeness (1) | 2 |
Data preparation & use | Clarity (1), Explanations (1), Class Concepts (1) | 3 |
Finding | Persuasiveness (1), Evidence (2) | 3 |
Style | Professionalism (1), Originality (1) | 2 |
Total | | 10 |
Persuasiveness metrics assess the structure and the strength of your findings from a managerial perspective (Can you answer the “So what?” question? Do you provide actionable insights?). Evidence metrics assess the functionality, efficiency, and reproducibility of your findings from a data science perspective (Did you produce code that other data scientists can build upon?).
Eight (8) weekly homework assignments help you apply the class material to real-world data science problems. You will face increasingly difficult problems as we progress through the quarter. Homework assignments will be published each week on Tuesdays and are due on Fridays (11:59 PM) of the same week. There is no homework during the week of the mid-term exam and the final week. I will not accept late submissions without prior notice or without a doctor’s note. I am aware that sometimes life goes crazy, but please notify me in advance and we will work it out.
PLEASE NOTE: Homework is individual work. You may discuss homework assignments but you must not share intermediate or final solutions with fellow students. Please review the academic integrity rules below.
Homework | Due | Max. Points |
---|---|---|
1 | January 16, 2018 (7:35 PM) | 3 |
2 | January 23, 2018 (7:35 PM) | 3 |
3 | January 30, 2018 (7:35 PM) | 3 |
4 | February 6, 2018 (7:35 PM) | 3 |
5 | February 13, 2018 (7:35 PM) | 4 |
6 | February 27, 2018 (7:35 PM) | 4 |
7 | March 6, 2018 (7:35 PM) | 5 |
8 | March 13, 2018 (7:35 PM) | 5 |
Total | | 30 |
You have to submit one Jupyter notebook as a GitHub link (I will not accept any other form of submission). Make sure that the notebook renders correctly on GitHub.
I will evaluate homework submissions based on the following criteria:
Criteria | Question | Max. Points |
---|---|---|
Completion | All objectives completed? | 1 |
Explanation | Do you explain how you arrived at the solution? | 1-2 |
Efficiency | Can you show that your solution is efficient? | 0-1 |
Style | Professionalism (1) | 1 |
Total | | 3-5 |
The objective of this project is to demonstrate mastery of the class material. You will work in teams of two students on a dataset of your own choice. Your objective is to prepare the data and detect three interesting, non-trivial, and somewhat unexpected findings. It is your responsibility to explain and present why your findings are interesting, non-trivial, and somewhat unexpected.
You have to notify me of your choice of dataset by February 6, 2018 (11:59 PM).
You are highly encouraged to structure the project according to the following schedule:
Phase | Should be done by |
---|---|
Identify Dataset | Week 5 |
Data preparation | Week 6 |
Finding 1 | Week 7 |
Finding 2 | Week 8 |
Finding 3 | Week 9 |
Presentation | Week 10 |
Polishing | Finals Week |
The project is due on March 20, 2018 (11:59 PM). You have to submit one Jupyter notebook as a GitHub link (I will not accept any other form of submission). Make sure that the notebook renders correctly on GitHub. I will not accept late submissions without prior notice or without a doctor’s note. I am aware that sometimes life goes crazy, but please notify me in advance and we will work it out.
The Jupyter notebook should include the following components:
I will evaluate project submissions based on the following criteria:
Criteria | Metrics | Max. Points |
---|---|---|
Data description | Understandability (1), Completeness (1) | 2 |
Data preparation & use | Clarity (2), Explanations (2), Class Concepts (2), Efficiency (2) | 8 |
Finding 1 | Persuasiveness (3), Evidence (3) | 6 |
Finding 2 | Persuasiveness (3), Evidence (3) | 6 |
Finding 3 | Persuasiveness (3), Evidence (3) | 6 |
Style | Professionalism (1), Originality (1) | 2 |
Total | | 30 |
Persuasiveness metrics assess the structure and the strength of your findings from a managerial perspective (Can you answer the “So what?” question? Do you provide actionable insights?). Evidence metrics assess the functionality, efficiency, and reproducibility of your findings from a data science perspective (Did you produce code that other data scientists can build upon?).
I firmly believe that mastery of data science requires constant practice. You will ace this course if you:
Week | Class Meeting | Topic |
---|---|---|
1 | January 9 | Introduction |
1 | January 11 | Jupyter in a whirlwind |
2 | January 16 | Series |
2 | January 18 | Indexing & selection |
3 | January 23 | Dataframes |
3 | January 25 | Grouping & aggregation |
4 | January 30 | String operations |
4 | February 1 | Apply |
5 | February 6 | Review |
5 | February 8 | Matplotlib & seaborn (Midterm take home) |
6 | February 13 | Midterm (in class) |
6 | February 15 | Take-home review |
7 | February 20 | Data Science in a whirlwind |
7 | February 22 | Classification |
8 | February 27 | Clustering |
8 | March 1 | Regression |
9 | March 6 | Team project review |
9 | March 8 | Putting it all together |
10 | March 13 | Team presentations |
I reserve the right to change the schedule to accommodate special circumstances and opportunities. Any changes, however, will be discussed and announced in class and on Camino.
The Academic Integrity pledge is an expression of the University’s commitment to fostering an understanding of and commitment to a culture of integrity at Santa Clara University. The Academic Integrity pledge, which applies to all students, states:
“I am committed to being a person of integrity. I pledge, as a member of the Santa Clara University community, to abide by and uphold the standards of academic integrity contained in the Student Conduct Code.”
You are expected to uphold the principles of this pledge for all work in this class. For more information about Santa Clara University’s academic integrity pledge and resources about ensuring academic integrity in your work, see www.scu.edu/academic-integrity.
In particular, I expect that you give credit for any material (including but not limited to journal articles, web articles, blog posts, images, data sets, libraries, APIs, and any media) that you have used for completing any assignment in this class. Being able to give credit by referencing sources consistently and correctly is evidence of mastery of a topic. It shows that you are able to construct original arguments that are backed with verifiable evidence. Failing to give credit is a sign of inadequate learning progress. It shows that you have not understood the topic well enough to formulate your own arguments in relation to already existing ideas.
During your work in this class, you will use, modify, or extend digital content that you have found online. You will also use libraries, APIs, code snippets, and data sets that have been created by others. In every piece of work (presentations, assignments, etc.), you must acknowledge work, source code, data sets, and any other content that was not produced by you. Acknowledgements must be easily identifiable, inseparable from your content, and must not violate licenses.
Failure to provide appropriate acknowledgements will result in an F grade for that assignment. Repeated failure to provide appropriate acknowledgements will result in an F grade for the entire course.
During the first class, we will discuss this digital content policy. After this class, I will strictly enforce this policy. If you have doubts, contact me.
I will support you in your learning in this class and beyond to the best of my abilities. If I am not able to help you myself, I will identify someone who can. I will evaluate your contribution solely based on the standards set by this syllabus. Changes to the syllabus will be highlighted, discussed during class sessions, and published on Camino.
By enrolling in this class, you agree to the requirements stated in this syllabus. You will operate with integrity in your dealings with me and your fellow students. You will engage the learning materials with appropriate attention and dedication and maintain your engagement when challenged by difficult learning activities. You will contribute to the learning of others and you will perform to the standards set by this syllabus.
Mutual respect is the foundation of this course. No one will be criticized for being wrong. Appropriate conduct includes honesty, self-respect, respect for others, and compliance with university policies and standards. Computers in the classroom should be used only for completing course-related work and for taking notes; cell phones must be turned off or muted.
Please let me know via email during the first two weeks of the course if you have any conflicts between a course element (class meeting, assignment) and another vital commitment (another course, work, university-related extracurricular activities, religious commitments). At my discretion, I will provide you with alternative means to complete the course element.
I am aware that many of you have multiple commitments. You should attend at least 80 percent of all scheduled class meetings. If you miss more than 20 percent of scheduled classes, your final grade will be reduced by one letter grade.
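As a rough illustration of the attendance rule, the following Python sketch checks whether a given number of absences exceeds 20 percent of the scheduled meetings. It assumes 19 scheduled meetings, counted from the schedule table in this syllabus; the function name `exceeds_absence_limit` is hypothetical, and how partial absences are counted is not specified here.

```python
def exceeds_absence_limit(missed, scheduled):
    """Return True if more than 20 percent of scheduled meetings were missed."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return missed * 100 > 20 * scheduled


SCHEDULED_MEETINGS = 19  # assumption: counted from the schedule table

for missed in (3, 4):
    print(missed, exceeds_absence_limit(missed, SCHEDULED_MEETINGS))
# prints:
# 3 False
# 4 True
```

Under this assumption, three absences stay within the limit, while a fourth would trigger the grade reduction.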
If you have a disability for which accommodations may be required in this class, please contact Disabilities Resources (Benson Hall 216, 408-554-4109) as soon as possible to discuss your needs and register for accommodations with the University. If you have medical needs related to pregnancy, you may also be eligible for accommodations. If you have already arranged accommodations through Disabilities Resources, please discuss them with me during my office hours as soon as possible.
While I am happy to assist you, I am unable to provide accommodations until I have received verification from Disabilities Resources. If you are in doubt of whether you are eligible for accommodations, I encourage you to contact Disabilities Resources (Benson Hall 216, 408-554-4109). The Disabilities Resources office would be grateful for advance notice of at least two weeks.
In alignment with Title IX of the Education Amendments of 1972, and with the California Education Code, Section 66281.7, Santa Clara University provides reasonable accommodations to students who are pregnant, have recently experienced childbirth, and/or have medical needs related to childbirth. Pregnant and parenting students can often arrange accommodations by working directly with their instructors, supervisors, or departments. Alternatively, a pregnant or parenting student experiencing related medical conditions may request accommodations through Disabilities Resources (Benson Hall 216, 408-554-4109).
Santa Clara University upholds a zero-tolerance policy for discrimination, harassment and sexual misconduct. If you (or someone you know) have experienced discrimination or harassment, including sexual assault, domestic/dating violence, or stalking, I encourage you to tell someone promptly. For more information, please consult the University’s Gender-Based Discrimination and Sexual Misconduct Policy at http://bit.ly/2ce1hBb or contact the University’s EEO and Title IX Coordinator, Belinda Guthrie, at 408-554-3043, bguthrie@scu.edu. Reports may be submitted online through https://www.scu.edu/osl/report/ or anonymously through Ethicspoint https://www.scu.edu/hr/quick-links/ethicspoint/
This syllabus was inspired by Aleszu Bajak’s syllabus. The class was developed by Michele Samorani at Santa Clara University.