Scientific programming

These notes cover some basic issues to keep in mind when programming for math and science. For an introduction to general programming concepts, see the Python page and the resources listed there.

Data type considerations

Primitive data types in many programming languages include int, float, double, and string. For programming in math and science, which one is used can make a large difference. For example, if you are using Python 2 and you divide 1 by 2 like this:

>>> a = 1
>>> b = 2
>>> a/b
0

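you see that you get 0, because in Python 2 dividing two integers discards the fractional part (in Python 3, a/b here gives 0.5, and integer division is written a//b). Let's see what happens when you add a decimal point after the 1 or the 2: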

>>> a = 1
>>> b = 2.
>>> a/b
0.5
>>> a = 1.
>>> b = 2
>>> a/b
0.5

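This happens because the type of each value determines which kind of division is performed: two integers give integer division, while a float on either side gives floating-point division. Python is a dynamically typed language, so the type of a variable is determined at runtime from the value assigned to it (as opposed to being declared at compile time). This contrasts with languages like C, C++, and Java, which are statically typed. In C, you would distinguish the two scenarios above by declaring the types explicitly, like this: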


int a = 1;
int b = 2;
double c = 1;
double d = 2;


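By explicitly declaring the type in statically typed languages, it is always clear whether a division is an integer or a floating-point operation (dividing the two ints above still gives 0, while dividing the two doubles gives 0.5), so the truncation seen in the first Python example is less likely to come as a surprise. Nevertheless, because we are aware of this problem in Python and other dynamically typed programming languages, we can simply make sure that we are explicit about the types we use.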
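As a minimal sketch of what being explicit can look like, the snippet below assumes Python 3, where / is always true division and // is floor division. The variable names are illustrative only. In Python 2, the first line would behave like the floor-division line for two integers, while the explicit float() conversion behaves the same in both versions.

```python
# Explicit division in Python (assumes Python 3 semantics).
a = 1
b = 2

true_quotient = a / b             # true division: always a float in Python 3
floor_quotient = a // b           # floor division: an int when both operands are ints
explicit_quotient = float(a) / b  # explicit conversion: same result in Python 2 and 3

print(true_quotient, floor_quotient, explicit_quotient)

# The type follows the value, so it can be checked at runtime.
print(type(a).__name__, type(float(a)).__name__)
```

Choosing the operator (or converting the operand) on purpose means the result does not depend on whether a value happened to be written with a decimal point.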

Underflow and overflow considerations

A constant problem when doing numerical calculations is underflow and overflow.

These are fairly simple concepts that have to do with how small and how large a number the computer can store. It is easy to see why limits exist without thinking about the specific way that computers store numbers. For example, let's say that you have three digits in which you can store a number:

000

Each of these digits can take a value from 0 to 9. If the limit is three digits, what happens when we add, say,

999
+
1

If we have no additional space in which to put the carry, we have an overflow. The same thing can happen in the other direction with small numbers. Say we have one digit in front of the decimal point and three after it:

0.000

If we wanted to multiply, say,

0.001
*
0.1

The stored answer is going to be 0, or an underflow error, depending on how this is handled. The true answer isn’t 0; we know it is 0.0001. Unfortunately, our system has no place to store that fourth decimal digit. There are a number of tricks for handling this problem. We will talk about these solutions as they come up (e.g., when computing likelihoods), but the problem is something to keep in mind.
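The same effects can be seen with real floating-point numbers. The sketch below assumes standard double-precision floats in Python 3; the list of probabilities is made up for illustration. It also shows one common trick, summing logarithms instead of multiplying small numbers, which is offered here as an example rather than as the only approach.

```python
import math
import sys

# Largest finite float and smallest positive normalized float on this platform.
print(sys.float_info.max, sys.float_info.min)

# Overflow: the product is too large to represent, so it becomes infinity.
too_big = sys.float_info.max * 10
print(too_big)

# Underflow: multiplying many small probabilities collapses to exactly 0.0.
probabilities = [0.01] * 500  # hypothetical per-observation probabilities
product = 1.0
for p in probabilities:
    product *= p
print(product)

# One common workaround: add logarithms instead of multiplying the values.
log_likelihood = sum(math.log(p) for p in probabilities)
print(log_likelihood)
```

The product is lost to underflow, while the sum of logs stays a perfectly ordinary number that can still be compared across models or parameter values.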

Performance considerations

Typically, when people discuss programming languages and which is best for which task, they mention that most applications don’t require a lot of speed because computers are much faster now. For many scientific applications this isn’t the case. There are still many situations where we have to worry a lot about performance, both because we have large datasets and because we have hard problems. So why not just learn C or C++, or assembly for that matter? Even though speed still matters for us, many researchers will find that for the vast majority of applications Python is not only fast enough, but also much quicker to write. Once you have figured out exactly how to solve the problem, if speed turns out to be the bottleneck, you can move on to C or C++. Nevertheless, most programmers are going to have multiple languages in their toolbox.

Algorithm analysis

Throughout the exercises I will mention the performance expectations of particular algorithms. In some cases, I will demonstrate experimentally how the runtime increases with dataset size. When we encounter this for the first time, I will go over it in more detail. In short, though, algorithm analysis is the study of what resources an algorithm requires and how those requirements scale with the size of the input. Even though individual computer systems differ in their processors and memory, we can measure the rate at which resource use grows based on properties of the algorithm. As you might imagine, this can be worked out analytically. However, because this is an empirically based site, measurements will be based on empirical examples.
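As a rough sketch of what such an empirical measurement can look like, the snippet below times two made-up functions at increasing input sizes using time.perf_counter from the Python standard library. The function names and sizes are illustrative assumptions, and the actual timings will vary from machine to machine.

```python
import time

def linear_sum(values):
    """Touch each element once: work grows in proportion to len(values)."""
    total = 0
    for v in values:
        total += v
    return total

def count_equal_pairs(values):
    """Compare every pair once: work grows roughly with len(values) squared."""
    count = 0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                count += 1
    return count

def time_call(func, values):
    """Return the wall-clock seconds taken by a single call to func(values)."""
    start = time.perf_counter()
    func(values)
    return time.perf_counter() - start

sizes = [200, 400, 800]  # each size doubles the previous one
timings = []
for n in sizes:
    data = list(range(n))
    timings.append((n, time_call(linear_sum, data), time_call(count_equal_pairs, data)))

for n, t_linear, t_pairs in timings:
    print(f"n={n:5d}  linear={t_linear:.6f}s  pairwise={t_pairs:.6f}s")
```

When the input size doubles, the single-loop time should roughly double while the pairwise time should roughly quadruple, although single measurements at small sizes are noisy and are best repeated.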

Priorities

For scientific software, the primary concern is getting the correct answer; speed and performance come second. That goes not only for the language you choose to write your program in, but also for the style of your code. It is better for the code to make sense than for it to be fast. So you should first write the code so that it works and makes sense. Then you can determine where the code is slow and speed up just those sections. The optimized sections may end up harder to read, but this way you keep the hard-to-read code to a minimum.
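One way to find the slow sections is to use a profiler. The sketch below uses cProfile and pstats from the Python standard library; the two functions being profiled are hypothetical stand-ins for a real analysis.

```python
import cProfile
import io
import pstats

def squared_distances(points):
    """Sum of squared differences over all pairs (deliberately simple and slow)."""
    total = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += (points[i] - points[j]) ** 2
    return total

def analysis():
    """Hypothetical analysis: build some data, then do the expensive step."""
    points = list(range(200))
    return squared_distances(points)

# Record where time is spent while the analysis runs.
profiler = cProfile.Profile()
profiler.enable()
result = analysis()
profiler.disable()

# Write a report sorted by cumulative time, limited to the top five entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists the functions that account for most of the runtime, which are the only ones worth rewriting for speed; everything else can stay in its clearest form.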
