As we reach the end of the course, the obvious question is: what next? Ideally you have some real data of your own to practise on and develop your skills with, but working with large datasets or downloading public ones can be overwhelming. You may also simply want to develop your Python skills further, as we have only begun to see some of the power of Python.
So here are some tips:
- The projects that we worked on this week are just starting points, and there are many more ways to use those datasets and build bigger code. What other findings can you extract from them? Maybe go back to the exercises from day one/two/three and re-write them using functions, classes, and comprehensions.
- Converting the exercises we have done into command-line programs gives you a lot of Python practice: handling command-line arguments (argparse), reading files, and outputting data. Good practice!
- Find some published Python code, or code that a colleague has written, and try to understand how it works. Maybe add your own function to it. Alternatively, try to convert code from a different language (R/Perl) into Python!
- There are a million Python tutorials and guides out there, although few are related to biology. Now that you have a good start, you can probably go through some of these with ease. A free trial at Codecademy, Udemy, or Coursera will give you lots more tasks and challenges, and perhaps explain things in a slightly different way.
- As your code grows, one thing to be aware of is its efficiency. It is easy to end up with nested loops that cause CPU and RAM problems, or to miss cases where a dictionary or a library (e.g. NumPy) would be more efficient. You can use cProfile, line_profiler, and time.time() to identify where your code is struggling.
- The Python for Data Analysis book: https://wesmckinney.com/book
- Start with a small dataset and try to replicate some results from papers. You can download real published data and see the publications that go with them: ENA Browser (European Nucleotide Archive)
- There is a great walkthrough of a whole project using Python for drug discovery (1 hour 45 minutes!): Drug Discovery Using Machine Learning
- A great YouTube channel with lots of bioinformatics chat in different programming languages: OMGenomics
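
To illustrate the command-line tip above, here is a minimal sketch of a script that uses argparse to take a FASTA file and report the GC content of each sequence. The script name `gc_content.py`, the `--min-length` option, and the function names are made up for illustration and are not taken from the course exercises.

```python
import argparse
import sys


def read_fasta(path):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    sequences = {}
    current_id = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                current_id = line[1:].strip()
                sequences[current_id] = ""
            elif current_id is not None:
                sequences[current_id] += line.upper()
    return sequences


def gc_content(sequence):
    """Return the fraction of G and C bases in a sequence (0.0 if it is empty)."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Report GC content per sequence.")
    parser.add_argument("fasta", help="path to the input FASTA file")
    parser.add_argument("--min-length", type=int, default=0,
                        help="skip sequences shorter than this (default: 0)")

    if argv is None:
        argv = sys.argv[1:]
    if not argv:
        # Called with no arguments: show the help text instead of an error.
        parser.print_help()
        return

    args = parser.parse_args(argv)
    for seq_id, sequence in read_fasta(args.fasta).items():
        if len(sequence) >= args.min_length:
            print(f"{seq_id}\t{gc_content(sequence):.3f}")


if __name__ == "__main__":
    main()
```

Saved as `gc_content.py`, this could be run as `python gc_content.py sequences.fasta --min-length 100`, where `sequences.fasta` stands in for one of your own files. Keeping the file reading and the calculation in separate functions makes each part easy to test and reuse.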
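
To illustrate the efficiency tip above, here is a small sketch that compares two ways of finding IDs shared between two lists: scanning a list for every item (a hidden loop inside a loop) versus looking items up in a set. The gene IDs and function names are invented for illustration. It uses `time.perf_counter()` from the standard library for timing, which is intended for measuring short durations, and then `cProfile` to break the run time down by function.

```python
import cProfile
import io
import pstats
import time


def shared_ids_nested(first, second):
    """Find shared IDs by scanning the whole second list for every item (slow)."""
    return [item for item in first if item in second]


def shared_ids_set(first, second):
    """Find shared IDs using a set, where each lookup is fast on average."""
    lookup = set(second)
    return [item for item in first if item in lookup]


def time_call(func, *args):
    """Return (result, elapsed seconds) for a single call of func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Made-up gene IDs, purely for illustration.
first = [f"gene{i}" for i in range(0, 4000, 2)]
second = [f"gene{i}" for i in range(0, 4000, 3)]

slow_result, slow_time = time_call(shared_ids_nested, first, second)
fast_result, fast_time = time_call(shared_ids_set, first, second)
print(f"list scan: {slow_time:.4f} s, set lookup: {fast_time:.4f} s")

# cProfile records how much time is spent inside each function call.
profiler = cProfile.Profile()
profiler.enable()
shared_ids_nested(first, second)
shared_ids_set(first, second)
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Both functions return the same list, so the only difference is speed. The exact timings will vary between machines, but the gap should grow as the lists get longer.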