- Course: BIOL 7800, LSU
- Time/Location: T/Th, 9:00 - 10:20 AM | PFT 1246; W, 9:30 - 10:20 AM | PFT 1131
- Instructor: Brant Faircloth
- Need help?
- Office Hours T/Th 10:30 - 12:00 PM | 282 Life Sciences
The analysis of large data sets in biological research is becoming common, particularly as new sequencing technologies and data collection strategies exponentially increase the amount of data that can be collected by an individual researcher. Programmatic approaches are often needed to format and analyze these large data sets, yet few biologists receive training in applying programming languages to these tasks. Programming for Biologists
is meant to introduce graduate or advanced undergraduate students to the practice of computer programming as it is applied to biological problems using a common programming language (Python) and programmatic techniques and algorithms.
This course is going to challenge and frustrate you. A lot. I promise. You are learning a new language really quickly - that's a hard thing to do. Along with the hard parts of learning a new language, in this case, comes having to learn a number of new tools that you have not (likely) been exposed to. That's also really hard. You're also going to have to actually think on top of all that. But, if you think, and work, and work with your classmates to understand what's going on, you will end up learning much, much more in a shorter period of time than you expected.
I'm here to help you learn to program a computer. It's up to you to learn how to make that work for you. I view my role as providing guidance and direction and your role as using that guidance and direction to get where you want/need to go. If you decide that you like this sort of thing, you will be teaching yourself this way for the rest of your life. Better to learn how to do that now.
Along those lines, I am not going to answer questions about this or that program/technique/assignment via email. All class communication should happen on Slack, and almost all of that communication should happen in an open channel where your classmates can help you answer your question. You should each be able to create additional #channels, if needed. You should also take some time to learn about the features Slack offers, like code-formatting, etc.
Part of the learning process is figuring out how to search for and find the information you need to fix a problem that you are having. I am unlikely to respond to requests on Slack that can be answered using a simple Google search. I'm not doing this to be mean; I'm doing it because formulating a good search strategy to help you answer these types of problems is in your best interest.
A wise person once said that "99% of bioinformatics is learning how to google", and that idea is just as important when talking about computer programming. Learn how to answer a question for yourself, test out some new ideas if you're close but not quite there, and you'll be kicking-ass in no time.
THAT SAID, a wise person also said that using these types of information without attribution is plagiarism. So, DO NOT use these sources without attribution, and DO NOT use these sources as a crutch to help you succeed in this course. I will notice if all of your assignments are using code from elsewhere. You will also learn much less this way.
In a word: experimenting. The best way for you to learn what works is to try different things out. For example, if I tell you that you have a list containing [1,2,3,4]
and ask you how to drop the last number, you should look up several ways that you might go about doing this and try those in the REPL. There are lots of solutions, like:
# drop the last item from this list
l = [1,2,3,4]
# the smart
new_l = l[:3]
# or the redundant
new_l = [item for item in l[:3]]
# or the snarky
new_l = [1,2,3]
# or the "i read the documentation" and think this is smart
# (careful: remove() changes the list in place and returns None)
new_l = l.copy()
new_l.remove(4)
So, try them out and see what's what. You should be doing this for everything!!
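Here is a small sketch of the kind of experiment I mean, using a few more ways to drop the last item. The variable names are just ones I made up for illustration. The thing to watch for is which approaches build a new list and which ones change the list you already have:

```python
# Several ways to drop the last item from a list.
original = [1, 2, 3, 4]

# Slicing builds a new list and leaves the original untouched.
sliced = original[:-1]

# pop() changes the list in place and returns the removed item,
# so work on a copy if you still need the original.
popped_from = original.copy()
removed_item = popped_from.pop()

# del also changes the list in place; it does not return anything.
deleted_from = original.copy()
del deleted_from[-1]

print(sliced, popped_from, deleted_from)
print(removed_item)
print(original)
```

Paste each piece into the REPL one line at a time and look at every variable after each step, rather than just running the whole thing.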
Think Python: How to Think Like a Computer Scientist by Allen Downey
This is a freely-available textbook. We will follow parts of it for the class. It is also an invaluable reference text when you need to remind yourself of relatively simple Python details.
Python 3.6
This year, we're going to try something different from what we've done in previous years. To get around most of the problems with learning a language on different computing platforms (which can be a pain), you will be learning Python using repl.it, which is a way to practice your Python skills without worrying about the details of different operating systems.
I am not, yet, sure how much of this we will be doing... However, you may want to install a version of Python on your own machine. If you do, I suggest using the Conda Python Distribution. Specifically, the miniconda
distribution. If you are using Windows 10, you will probably also benefit greatly from installing the Linux Subsystem. On Windows and for bioinformatics/programming tasks, the Linux subsystem is the way to go.
- Here are installer packages for Miniconda
- And, here are some details regarding the Linux subsystem on Windows 10
There exists a weird schism in the world where a now (much) older version of a programming language (Python 2.7.x) is used by many developers versus the newer (and mostly improved) version of that same language (Python 3.6.x). The reasons for this are many and varied, but they largely dealt with the unavailability of many important packages in Python 3.6.x until "recently".
I would argue that the time is right for scientists to make the move to Python 3.6.x from Python 2.7.x. So, we're starting that movement.
- Because teaching a programming language where everyone's laptop runs a different OS is difficult.
- repl.it can install packages we are missing
- repl.it makes it easier to grade assignments that you complete
Fine with me - you'll still need to upload your code to repl.it. Speaking of, a good code editor is worth learning and something that we'll spend time on during one of the labs. If you are after something free, I suggest:
I am releasing the contents of this course (e.g. all my notes) under an open-source license (BSD).
In accordance with the LSU grading policy, grades will be assigned using an A-F scale and the +/- system. Grading is pretty simple:
Item | Points | # of assignments | Total Points | % of grade |
---|---|---|---|---|
Class assignments | 25 each | 22 | 550 | ~55% |
Class exams | 150 each | 3 | 450 | ~45% |
Total | | | 1000 | 100% |
Points | Letter Grade Assigned |
---|---|
970-1000 | A+ |
930-969 | A |
900-929 | A- |
870-899 | B+ |
830-869 | B |
800-829 | B- |
770-799 | C+ |
730-769 | C |
700-729 | C- |
670-699 | D+ |
630-669 | D |
600-629 | D- |
< 600 | F |
If you are in the field during the first portion of class, I will work with you. Otherwise, if you don't turn in the assignments on time, you will lose all of the points for that assignment. Class is technically optional. But, it will greatly benefit you to show up in class for the discussion and exercises that will give you a head-start on your assignments. It will also help you prepare for the exams, which will be paper-based and for which you will not use a computer.
The course will be a mix of lecture, in-class "active" learning, individual assignments, and exams. That keeps it fun for all of us. You will be expected to contribute to discussions in class. If you do not, I will ensure your grade reflects that lack of participation. Also, see Commitment to Community and Academic Integrity regarding my expectations with respect to being civil to your classmates and doing your own work.
Some portions of our class will be lecture-based. These lectures will, for the most part, derive from the Textbook chapter or the URL provided in the Schedule . I, of course, will elaborate on some items and focus less on others - as I feel they are appropriate. It would be wise for you to read the assigned reading prior to coming to class. You may want to read the same chapter, again, after lecture. Repetition is one key to learning a new language efficiently.
There is a laboratory section of this class that is meant to provide time for me to assist you with problems that you may be encountering as you learn to program or for us to review materials we've covered during the course. The laboratory section is mandatory, although it may not always use the fully allotted period of time. During lab, be prepared to review code from your previous assignments, ask implementation questions, and discuss problems you are having. The laboratory is not meant to directly help you with each part of your assignment - that's not the goal. The goal is to get you over minor obstacles that are keeping you from completing your assignments.
I added the laboratory section to this class on the advice of previous students who have enrolled.
We will have assignments associated with almost every class period, and assignments are to be submitted before the class period at which they are due. To receive credit for those assignments, you will need to turn them in on time. Late assignments will receive a score of zero.
Assignment and exam scores will post to moodle. The score that you receive on any given assignment will be based on a rubric that is associated with each assignment. Generally, this means that your function or program produces the expected output by following the expected progression of steps. For example, if I ask you to write a computer program to compute the value of the constant e
, but you simply output the value of math.e
without specifically computing e
, you will not receive credit for that portion of the assignment.
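To make that distinction concrete, here is a minimal sketch of what "specifically computing" a value could look like, assuming the series e = 1/0! + 1/1! + 1/2! + ... is an acceptable approach. The function name `approximate_e` and the default of 20 terms are my own choices for illustration and are not taken from any assignment rubric. Note that `math.e` appears only as a check on the result, not as the answer:

```python
import math


def approximate_e(n_terms=20):
    """Approximate e by summing 1/k! for k = 0 .. n_terms - 1."""
    total = 0.0
    term = 1.0  # this is 1/0!
    for k in range(n_terms):
        total += term
        # turn 1/k! into 1/(k+1)! for the next pass through the loop
        term /= (k + 1)
    return total


estimate = approximate_e()
print(estimate)
# compare against the library constant only to check the work
print(abs(estimate - math.e) < 1e-12)
```

Try changing `n_terms` in the REPL to see how quickly the estimate settles down.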
You will have three exams associated with this class. These will be in-class exams that focus on what you've learned during the previous few weeks. These will not be open-book. I have decided to hold exams for this course to ensure that everyone in the class is taking the time to study the material we cover. The types of questions on the exam will range from general ("Who was Ada Lovelace?") to specific ("What is the difference between an integer and a float? Why is a tuple better to hold data?"). These tests should be challenging.
Week | Date | Subject | Chapter | Assignment Due |
---|---|---|---|---|
1 | 21 Aug | Syllabus; Intro | | |
1 | 22 Aug | LAB (Intro to Repl and repl.it) | | |
1 | 23 Aug | Python Variables/Expressions | Chap 1 & 2 | |
2 | 28 Aug | Functions Part I & PEP8 | Chap 3 & PEP 8 | 1 |
2 | 29 Aug | LAB (Function practice) | | |
2 | 30 Aug | Conditionals and Recursion | Chap 5 | 2 |
3 | 4 Sep | Functions Part II | Chap 6 | 3 |
3 | 5 Sep | LAB | | |
3 | 6 Sep | Iteration | Chap 7 | 4 |
4 | 11 Sep | Strings & Lists | Chap 8 & 10 | 5 |
4 | 12 Sep | NO LAB | | |
4 | 13 Sep | NO CLASS | | |
5 | 18 Sep | Dictionaries & Tuples | Chap 11 & 12 | 6 |
5 | 19 Sep | LAB | | |
5 | 20 Sep | Files | Chap 14 | 7 |
6 | 25 Sep | Input/Output/Stdin/Stdout/Logging | | 8 |
6 | 26 Sep | LAB | | |
6 | 27 Sep | EXAM 1 (in class) | | 9 |
7 | 2 Oct | Classes & objects | Chap 15 & Chap 16 | |
7 | 3 Oct | LAB | | |
7 | 4 Oct | NO CLASS (FALL BREAK) | | |
8 | 9 Oct | Classes & methods | Chap 17 & Chap 18 | 10 |
8 | 10 Oct | LAB | | |
8 | 11 Oct | The Kitchen Sink | Chap 19 | 11 |
9 | 16 Oct | The Kitchen Sink (Part 2) | | 12 |
9 | 17 Oct | LAB | | |
9 | 18 Oct | The Kitchen Sink (Part 2) | | 13 |
10 | 23 Oct | TDD and Documentation | | 14 |
10 | 24 Oct | Python Modules and BioPython | BioPython Cookbook | |
10 | 25 Oct | EXAM 2 (in class) | | 15 |
11 | 30 Oct | NO CLASS | | |
11 | 31 Oct | NO CLASS | | |
11 | 1 Nov | NO CLASS | | |
12 | 6 Nov | BioPython + NCBI | BioPython Cookbook | |
12 | 7 Nov | numpy + pandas | numpy user guide & pandas user guide¹ | 16 |
12 | 8 Nov | numpy + pandas | numpy user guide & pandas user guide¹ | 17 |
13 | 13 Nov | subprocess | | 18 |
13 | 14 Nov | LAB | | |
13 | 15 Nov | subprocess | subprocess | 19 |
14 | 20 Nov | sqlite3 | sqlite3 | 20 |
14 | 21 Nov | NO CLASS (THANKSGIVING) | | |
14 | 22 Nov | NO CLASS (THANKSGIVING) | | |
15 | 27 Nov | sqlite3 | timeit & multiprocessing | 21 |
15 | 28 Nov | LAB Using the CLI | | |
15 | 29 Nov | speed, timing, and multiprocessing | | 22 |
16 | 6 Dec | EXAM 3 (12:30 - 2:30 in PFT 1246) | | |
¹ No, I do not expect you to read all 1800+ pages. Read Chapters 5, 6, 8, 9, 10. Experiment with the examples.
I take academic integrity seriously. You are expected to reference sources appropriately in your written work. You are absolutely expected to reference any third party computer code that you include in your assignments. You should also not copy the work of others. Simply copying someone's work and changing variable names is still plagiarizing their work.
In previous years, I caught several students in this course plagiarizing - ask around. All of them were found to have plagiarized, and all suffered several penalties, including a note on their transcript that they plagiarized. You need to be very, very careful not to inappropriately use the work of others.
I will always assume the work you submit is your own, so you are responsible for its content.
You may ask your classmates about general ideas related to the course, and you are free to demonstrate to one another how this or that idea works. HOWEVER, you are expected to complete your assignments on your own, without help from anyone else. If you use other sources, please cite them. If I determine that you are citing too many sources rather than doing your own work, your score for that assignment will indicate that you have not shown mastery of the material.
If I suspect that you have committed Academic Misconduct of any form (plagiarizing, cheating, etc.), I am required to report the incident to the Student Advocacy and Accountability office, and they will follow-up. Definitions of academic misconduct are provided here.
You should be familiar with the LSU Commitment to Community, which is outlined here. You should also be familiar with the LSU Code of Student Conduct, which is available here. You are expected to follow the Commitment to Community during your time in this class and when working on assignments outside of class. Students who do not respect the instructor(s) or other members of the class will be asked to leave the lecture immediately. This includes using the telephone, texting, or using the internet for non-class-related purposes during the lecture.