Python Intro for R users#

Introduction#

This is a Jupyter notebook rendered as a page of a Jupyter book. Please see this page for more information on the functionality available.

Python and R share a few features (and some common syntax), but they have separate histories and are set up to make different problems “easy” by default.

The R language was originally developed for statistical analysis, and its syntax makes these operations simple to write. Conversely, it has relatively complicated and obscure abilities for string processing. Python is a “general purpose” programming language, and the “out-of-the-box” interpreter is not as well suited to purely numerical or statistical operations, while handling strings, files and network protocols more gracefully. However, this deficiency can be largely fixed using a few popular and well-known extensions (“packages” in Python terminology), which add back functionality that looks (deliberately, since its developers had used R) a lot like standard R syntax.
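As a small illustration of these defaults, the sketch below uses only the built-in interpreter. The sample record and values are made up for the example. String handling needs no extra packages, whereas arithmetic on a plain list is not element-wise as it would be for an R vector.

```python
# String processing works out of the box.
record = "  Ada Lovelace;1815;London  "
fields = record.strip().split(";")   # trim whitespace, then split on ';'
name = fields[0].upper()

# A plain list is not a numeric vector: * repeats it rather than scaling it.
values = [1, 2, 3]
repeated = values * 2

# Element-wise arithmetic needs an explicit comprehension or loop
# (or an array package, which is what the extensions provide).
doubled = [2 * v for v in values]

print(fields)     # ['Ada Lovelace', '1815', 'London']
print(name)       # ADA LOVELACE
print(repeated)   # [1, 2, 3, 1, 2, 3]
print(doubled)    # [2, 4, 6]
```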

Some key differences#

Differences in usage pattern#

R is used almost exclusively by statisticians & data scientists to do statistics and data science, whereas Python is a general purpose programming language with a wide user base including a large number of scientists, engineers and mathematicians.

As such, Python is frequently used to create or use small programs (and a few large ones) whereas R is more often used in interactive or batch modes to process or visualize data.

Differences in syntax, standards and terminology#

Standard R practice tends to use the <- or -> token to assign values to variables. In Python, assignment statements always use the = token, with the variable name on the left and the value on the right.

Similarly, where R uses c to create a vector and $ to access named elements of a list, Python uses [ and ] brackets to create a list and . to access members, properties and attributes of objects. Where R users sometimes use . to split the words in names, Python users tend to use snake_case for functions and variables and CamelCase (capitalised words, as in MyClass) for classes.
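The conventions above can be sketched in a few lines. The names used here (sample_mean, heights, WeatherStation) are hypothetical and chosen only for illustration, with rough R equivalents given in the comments.

```python
# Assignment uses =, with the name on the left.
sample_mean = 4.2             # R: sample_mean <- 4.2

# Square brackets create a list.
heights = [1.5, 2.5, 3.5]     # R: heights <- c(1.5, 2.5, 3.5)


class WeatherStation:
    """Class names are capitalised; functions and variables use snake_case."""

    def __init__(self, station_name):
        self.station_name = station_name

    def describe(self):
        return "Station " + self.station_name


station = WeatherStation("Kew")
print(station.station_name)   # . accesses an attribute (R lists use $)
print(station.describe())     # . also reaches methods
```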

More generally, where R tends to support many different ways of saying the same thing in code, as used in various different packages, Python tends to strive for a single “best” or “Pythonic” way to write things.

Some similarities#

Much syntax#

The core mathematical operators are mostly the same in the two languages (with exceptions for exponentiation, ^ versus ** and the modulus operator %% versus %), as are the ways they are used.
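A short sketch of the two exceptions, with the R spelling in the comments. One point worth care when translating from R: ^ is still a valid Python operator, but it means bitwise exclusive-or, so it runs without an error and gives a different answer.

```python
power = 2 ** 3        # exponentiation; R: 2 ^ 3
remainder = 7 % 3     # modulus; R: 7 %% 3
mixed = 1 + 2 * 3     # precedence works as in R
ratio = 7 / 2         # ordinary division also matches R

xor = 2 ^ 3           # NOT a power in Python: bitwise exclusive-or

print(power)      # 8
print(remainder)  # 1
print(mixed)      # 7
print(ratio)      # 3.5
print(xor)        # 1
```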

Core algorithms#

Since algorithms are usually based in mathematics, and since both languages provide similar data structures, an algorithm written in one language can usually be translated into the other fairly directly.
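As one example of such a translation, here is a minimal sketch of Euclid's algorithm for the greatest common divisor. The algorithm is chosen only for illustration, and the comments give one possible R rendering of each step.

```python
def gcd(a, b):
    """Greatest common divisor by Euclid's algorithm."""
    while b != 0:             # R: while (b != 0) {
        a, b = b, a % b       # R: tmp <- b; b <- a %% b; a <- tmp
    return a                  # R: a


print(gcd(48, 18))  # 6
```

The loop, the comparison and the return value carry over unchanged; only the modulus token and the assignment syntax differ.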

Both languages are interpreted#

Both R and Python work in an interpreter, meaning that you can code live, see the results of changes instantly and update things as you go.

This is different from compiled languages such as C or C++ in which coding a program and running feel like (and are) separate stages of work.

Pandas Data Frames#

The Python pandas package collects utilities to process labelled two-dimensional relational data as might be stored in a spreadsheet or table. This provides much of the functionality of the R data.frame object, with a very similar syntax.
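A minimal sketch of a few common operations follows, assuming pandas is installed. The table contents are made up for illustration, and the R lines in the comments are rough equivalents rather than exact translations.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "species": ["cat", "dog", "cat", "dog"],
    "weight_kg": [4.0, 20.0, 5.0, 30.0],
})

print(df.head())                      # R: head(df)

overall_mean = df["weight_kg"].mean() # R: mean(df$weight_kg)
print(overall_mean)                   # 14.75

# Keep only the rows that satisfy a condition.
heavy = df[df["weight_kg"] > 10]      # R: df[df$weight_kg > 10, ]
print(heavy)

# Mean weight within each species.
means = df.groupby("species")["weight_kg"].mean()
print(means)                          # R: tapply(df$weight_kg, df$species, mean)
```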

For a deeper comparison see here.

Further Reading#

Pages from this book#