Unix

Getting started

We made a video to remind people about how to get comfortable with UNIX commands:

This is the easiest book for learning this stuff; it is short and gets right to the point:

https://go.oreilly.com/purdue-university/library/view/-/0596002610

You just log in and you can see it all; we suggest Chapters 1, 3, 4, 5, 7 (you can basically skip chapters 2 and 6 the first time through).

It is a very short read (maybe, say, 2 or 3 hours altogether?), just a thin book that gets right to the details.

Zoe Yang asked us about the difference in these 5 words: bash/Linux/terminal/shell/UNIX. Here you go:

UNIX (Unix) and Linux are operating systems, just like Mac OS X and Windows 10 are operating systems. There are many variants. Within Linux, the variants (called distributions) share the Linux kernel (the main piece of code that makes things work) and differ mainly in the software they bundle and in the default configurations, like the GUI (i.e., the way stuff looks when you log in and see your desktop and interact with the windows and folders and files). UNIX dates back to the 1970s and came from AT&T Bell Labs. People then went on to make lots and lots of variants and Unix-like systems, including Linux and its many flavors.

The terminal is an application that runs in UNIX or Linux. It is the thing that you open and type commands into, and where you see the output.

It is hard to tell the difference between the terminal and the shell. The shell is the way that you interact with UNIX/Linux directly (without pointing and clicking). You can tell the shell directly what you want to do with the files on the computer, for instance. You might think that the terminal and the shell are the same thing, but they are not quite. There are lots of different types of shells that can run in the terminal. To see which one you are using, you can type:

echo $SHELL

By default, it will say:

/bin/bash

There are other shells in your /bin directory. bash (Bourne Again SHell) is the default one. Many people consider this to be the "best" shell, or at least, the one that people know the most. Others are Bourne (sh), Korn (ksh), Z shell (zsh), C shell (csh), TENEX C shell (tcsh), and dozens more. Any of these shells would run in the terminal, just like bash does, and you might not even realize at the start which shell you are using, unless you type the command mentioned above:

echo $SHELL

They each have differences, but some of the differences are small. Again, bash is still the default on most Linux operating systems. A big recent change is that macOS Catalina started using zsh instead of bash as the default shell, but that is just because of a licensing issue, and Dr Ward thinks that Mac users who open the terminal and use the shell are very likely to switch from zsh back to bash. That's what Dr Ward did immediately when Apple made this change to zsh, i.e., he switched back to bash.

Wow, sorry for the long-winded answer.

Standard utilities

`man`

man stands for manual and is a command which presents all of the information you need in order to use a command. To use man, simply execute man <command>, where <command> is the command for which you want to read the manual.

You can scroll up by typing "k" or the up arrow. You can scroll down by typing "j" or the down arrow. To exit the man pages, type "q" (for quit).

How do I show the man pages for the `wc` utility?

Click here for solution

man wc

`cat`

cat stands for concatenate and print files. It is an extremely useful tool that prints the entire contents of a file by default. This is especially useful when we want to quickly check to see what is inside of a file. It can be used as a tool to output the contents of a file and immediately pipe the contents to another tool for some sort of analysis if the other tool doesn't natively support reading the contents from the file.

A similar, but alternative UNIX command that incrementally shows the contents of the file is called less. less starts at the top of the file and scrolls through the rest of the file as the user pages down.
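As a small sketch of the cat uses described above: printing a file, concatenating several files, and piping the contents to another tool. The scratch directory and the file names part1.txt and part2.txt are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$workdir/part1.txt"
printf 'gamma\n' > "$workdir/part2.txt"

# Print the entire contents of one file.
cat "$workdir/part1.txt"

# Concatenate two files, in the order given.
combined=$(cat "$workdir/part1.txt" "$workdir/part2.txt")
echo "$combined"

# Pipe the contents to another tool (here, wc -l counts the lines).
line_count=$(cat "$workdir/part1.txt" "$workdir/part2.txt" | wc -l)
echo "$line_count"

rm -r "$workdir"
```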

`head`

head is a simple utility that displays the first n lines of a file, or input.

How do I show the first 5 lines of a file called `input.txt`?

Click here for solution

head -n5 input.txt

Alternatively:

cat input.txt | head -n5

`tail`

tail is a utility similar to head that displays the last n lines of a file, or input.
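Both head and tail also work on piped input rather than a file. A minimal sketch, using ten made-up input lines:

```shell
# Ten made-up input lines, one number per line.
numbers=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10)

# head and tail read from standard input when no file is given.
first_three=$(echo "$numbers" | head -n 3)
last_three=$(echo "$numbers" | tail -n 3)

echo "$first_three"
echo "$last_three"
```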

How do I show the last 5 lines of a file called `input.txt`?

Click here for solution

tail -n5 input.txt

Alternatively:

cat input.txt | tail -n5

`ls`

ls is a utility that lists files and folders. By default, ls will list the files and folders in your current working directory. To list files in a certain directory, simply provide the directory to ls as the first argument.

How do I list the files in my `$HOME` directory?

Click here for solution

ls $HOME

# or

ls ~

How do I list the files in the directory `/home/$USER/projects`?

Click here for solution

ls /home/$USER/projects

How do I list all files and folders, including hidden files and folders in `/home/$USER/projects`?

Click here for solution

ls -a /home/$USER/projects

How do I list all files and folders in `/home/$USER/projects` in a list format, including information like permissions, filesize, etc?

Click here for solution

ls -l /home/$USER/projects

How do I list all files and folders, including hidden files and folders in `/home/$USER/projects` in a list format, including information like permissions, filesize, etc?

Click here for solution

ls -la /home/$USER/projects

# or

ls -al /home/$USER/projects

# or

ls -l -a /home/$USER/projects

`du`

du is a tool used to get file space usage.

Examples

How do I get the size of a file called `./metadata.csv` in bytes?

Click here for solution

du -b ./metadata.csv

How do I get the size of a file called `./metadata.csv` in kilobytes?

Click here for solution

du -k ./metadata.csv

## 1792 ./metadata.csv

Why is the result of `du -b ./metadata.csv` divided by 1024 not the result of `du -k ./metadata.csv`?

Click here for solution

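By default, du reports disk usage, which is not necessarily the actual size of the file. File systems typically divide a disk into blocks. When a program tells the file system it wants, say, 3 bytes of space, and the block size is 1024 bytes, the file system may allocate 1024 bytes of space to store the 3 bytes of data. The -b option reports the apparent size (the number of bytes of data in the file) rather than the disk usage, which is why the two results differ. To see the apparent size in both units, do this: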

du -b ./metadata.csv
du -k --apparent-size ./metadata.csv
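The difference is easiest to see with a tiny file. The following is a sketch that assumes GNU du (the -b option is a GNU extension); the file name tiny.txt is made up, and the disk usage figure depends on the file system, so no particular value is assumed for it.

```shell
workdir=$(mktemp -d)
printf 'abc' > "$workdir/tiny.txt"   # exactly 3 bytes of data

# Apparent size in bytes: the amount of data in the file.
apparent_bytes=$(du -b "$workdir/tiny.txt" | cut -f1)
echo "apparent size: $apparent_bytes bytes"

# Disk usage in 1024-byte units: what the file system allocated.
# This is typically at least one whole block, but it varies by file system.
disk_kb=$(du -k "$workdir/tiny.txt" | cut -f1)
echo "disk usage: $disk_kb KiB"

rm -r "$workdir"
```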

`cp`

cp is a utility used for copying files and folders from one location to another.

How do I copy `/home/$USER/some_file.txt` to `/home/$USER/projects/same_file.txt`?

Click here for solution

cp /home/$USER/some_file.txt /home/$USER/projects/same_file.txt

# If currently in /home/$USER
cd $HOME
cp some_file.txt projects/same_file.txt

# If currently in /home/$USER/projects
cd $HOME/projects
cp ../some_file.txt same_file.txt

`mv`

mv is very similar to cp, but rather than copy a file, mv moves the file. Moving a file removes it from its old location and places it in the new location.
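A minimal sketch of the difference between the two commands. The scratch directory and the file names are made up for illustration.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/projects"
echo "data" > "$workdir/some_file.txt"

# cp leaves the original in place.
cp "$workdir/some_file.txt" "$workdir/projects/copy.txt"
if [ -e "$workdir/some_file.txt" ]; then
    after_cp="original still exists"
fi

# mv does not: the file now exists only at the new location.
mv "$workdir/some_file.txt" "$workdir/projects/moved.txt"
if [ ! -e "$workdir/some_file.txt" ]; then
    after_mv="original is gone"
fi

echo "$after_cp"
echo "$after_mv"

rm -r "$workdir"
```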

How do I move `/home/$USER/some_file.txt` to `/home/$USER/projects/same_file.txt`?

Click here for solution

mv /home/$USER/some_file.txt /home/$USER/projects/same_file.txt

# If currently in /home/$USER
cd $HOME
mv some_file.txt projects/same_file.txt

# If currently in /home/$USER/projects
cd $HOME/projects
mv ../some_file.txt same_file.txt

`touch`

touch is a command used to update the access and modification times of a file to the current time. More commonly, it is used to create an empty file that you can add contents to later on. To use this command, type touch followed by the file name (with the intended file path added when necessary).
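Both uses can be sketched as follows. The file name notes.txt and the scratch directory are made up for illustration.

```shell
workdir=$(mktemp -d)

# Create an empty file.
touch "$workdir/notes.txt"
size=$(wc -c < "$workdir/notes.txt")
echo "$size"

# touch on an existing file leaves its contents alone; only the timestamps change.
echo "hello" > "$workdir/notes.txt"
touch "$workdir/notes.txt"
contents=$(cat "$workdir/notes.txt")
echo "$contents"

rm -r "$workdir"
```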

`mkdir`

mkdir is the command to create a directory. It is simple to use: just type mkdir followed by a path to the new directory.

Examples

How do I create a new directory called `my_directory` in the current directory?

Click here for solution

mkdir my_directory

How do I create a new directory called `my_directory` in the parent directory?

Click here for solution

mkdir ../my_directory

How do I create a set of two new nested directories in the current directory?

Click here for solution

# You can either make the directories one at a time like this:
mkdir first_dir
cd first_dir
mkdir second_dir

# Or, you can use the -p option:
mkdir -p first_dir/second_dir

`rm`

rm is the command to remove files or directories. You can find the available options by checking its manual page.

Examples

How do I remove a folder called `my_folder` and all of its contents recursively? Assume `my_folder` is in `/home/user/projects`.

Click here for solution

rm -r /home/user/projects/my_folder

How do I remove all files in a folder ending in `.txt`? Assume we are looking at files in `/home/user/projects`.

Click here for solution

rm /home/user/projects/*.txt

`rmdir`

rmdir is a tool to remove empty directories. Simply type rmdir followed by the path to the empty directory you'd like to remove. Note that this command only removes empty directories. For this reason, rm is better suited to remove directories with content.
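A short sketch of that behavior, using made-up directory names inside a scratch directory:

```shell
workdir=$(mktemp -d)
mkdir "$workdir/empty_dir" "$workdir/full_dir"
touch "$workdir/full_dir/file.txt"

# Succeeds: the directory is empty.
rmdir "$workdir/empty_dir"

# Fails with an error: the directory still contains a file.
if rmdir "$workdir/full_dir" 2>/dev/null; then
    rmdir_result="removed"
else
    rmdir_result="refused"
fi
echo "$rmdir_result"

# rm -r removes the directory together with its contents.
rm -r "$workdir/full_dir"

rm -r "$workdir"
```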

`pwd`

pwd stands for print working directory and it does just that -- it prints the current working directory to standard output.

`type`

type is a useful command to find the location of some command, or whether the command is an alias, function, or something else.
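As a sketch of the kinds of answers type can give (greet is a made-up function name, and the exact wording of the output varies between shells):

```shell
# A shell builtin: it is part of the shell, not a separate file.
builtin_info=$(type cd)
echo "$builtin_info"

# A function defined in the current shell.
greet() { echo "hello"; }
function_info=$(type greet)
echo "$function_info"

# An external program: type prints the path of the file that would run.
type ls
```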

Where is the file that is executed when I type `ls`?

Click here for solution

type ls

## ls is /bin/ls

`uniq`

uniq reads the lines of a specified input file, compares adjacent lines, and outputs each line with consecutive repeats collapsed into one. Repeated lines in the input will not be detected if they are not adjacent. This means you must sort the input before using uniq if you want to ensure there are no duplicates.
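A minimal sketch of why sorting matters, using three made-up input lines:

```shell
# Duplicates that are not adjacent survive uniq on its own.
unsorted=$(printf 'apple\nbanana\napple\n' | uniq)
echo "$unsorted"

# Sorting first puts identical lines next to each other.
deduped=$(printf 'apple\nbanana\napple\n' | sort | uniq)
echo "$deduped"

# uniq -c prefixes each line with the number of times it occurred.
printf 'apple\nbanana\napple\n' | sort | uniq -c
```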

`wc`

You can think of wc as standing for "word count". wc displays the number of lines, words, and bytes from the input file.

How do I count the number of lines of an input file called `input.txt`?

Click here for solution

wc -l input.txt

How do I count the number of characters of an input file called `input.txt`?

Click here for solution

wc -m input.txt

How do I count the number of words of an input file called `input.txt`?

Click here for solution

wc -w input.txt

`ssh`

`mosh`

`scp`

`cut`

cut is a tool to cut out parts of each line based on position, character, or delimiter, and write the result to stdout. It is particularly useful to get a certain column of data.

How do I get the first column of a csv file called `office.csv`?

Click here for solution

cut -d, -f1 office.csv

How do I get the first and third column of a csv file called `office.csv`?

Click here for solution

cut -d, -f1,3 office.csv

How do I get the first and third column of a file with columns separated by the "|" character?

Click here for solution

cut -d '|' -f1,3 office.csv

`sed`

`grep`

It is very simple to get started searching for patterns in files using grep.

How do I search for lines with the word "Exact" in the file located `/home/john/report.txt`?

Click here for solution

grep Exact /home/john/report.txt

# or

grep 'Exact' '/home/john/report.txt'

How do I search for lines with the word "Exact" or "exact" in the file located `/home/john/report.txt`?

Click here for solution

# The -i option means that the text we are searching for is 
# not case-sensitive. So the following lines will match
# lines that contain "Exact" or "exact" or "ExAcT".
grep -i Exact /home/john/report.txt

# or

grep -i 'Exact' '/home/john/report.txt'

How do I search for lines with a string containing multiple words, like "how do I"?

Click here for solution

# The -i option means that the text we are searching for is
# not case-sensitive.

# By adding quotes, we are able to search for the entire
# string "how do i". Without the quotes, grep would search for
# "how" and treat "do" and "i" as names of files to search.
grep -i 'how do i' /home/john/report.txt

How do I search for lines with the word "Exact" or "exact" in the files in the folder and all sub-folders located `/home/john/`?

Click here for solution

# The -R option means to search recursively in the folder
# /home/john. A recursive search means that it will search 
# all folders and sub-folders starting with /home/john.
grep -Ri Exact /home/john

How do I search for the lines that don't contain the words "Exact" or "exact" in the folder and all sub-folders located `/home/john/`?

Click here for solution

# The -v option means to search for an inverted match.
# In this case it means search for all lines of text
# where the word "exact" is not found.
grep -Rvi Exact /home/john

How do I search for lines where one or more of the words "first" or "second" appears in the current folder and all sub-folders?

Click here for solution

# The "|" character in grep is the logical OR operator.
# If we do not escape the "|" character with a preceding
# "\" grep searches for the literal string "first|second"
# instead of "first" OR "second".
grep -Ri 'first\|second' .

How do I search for lines that begin with the word "Exact" (case insensitive) in the folder and all sub-folders located in the current directory?

Click here for solution

The "^" is called an anchor and indicates the start of a line.

grep -Ri '^Exact' .

How do I search for lines that end with the word "Exact" (case insensitive) in the files in the current folder and all sub-folders?

Click here for solution

The "$" is called an anchor and indicates the end of a line.

grep -Ri 'Exact$' .

How do I search for lines that contain only the word "Exact" (case insensitive) in the files in the current folder and all sub-folders?

Click here for solution

grep -Ri '^Exact$' .

How do I search for strings or sub-strings where the first character could be anything, but the next two characters are "at"? For example: "cat", "bat", "hat", "rat", "pat", "mat", etc.

Click here for solution

The "." is a wildcard, meaning it matches any single character (including a space).

grep -Ri '.at' .

How do I search for zero or one of, zero or more of, one or more of, exactly n of a certain character using grep and regular expressions?

Click here for solution

"*" stands for 0+ of the previous character. "+" stands for 1+ of the previous character. "?" stands for 0 or 1 of the previous character. "{n}" stands for exactly n of the previous character. In GNU grep's default (basic) syntax, "+", "?", and "{n}" are treated as literal text unless escaped with "\", so the examples that use them pass the -E (extended regular expressions) option.

# Matches any lines with text like "cat", "bat", "hat", "rat", "pat", "mat", etc.
# Does NOT match "at", but does match " at". The "." indicates a single character.
grep -Ri '.at' .

# Matches any lines with text like "cat", "bat", "hat", "rat", "pat", "mat", etc.
# Matches "at" as well as " at". The "." followed by the "?" means 
# 0 or 1 of any character. The -E option is needed for "?" to have this meaning.
grep -ERi '.?at' .

# Matches any lines with any amount of text followed by "at".
grep -Ri '.*at' .

# Only matches lines that end in "at": "bat", "cat", "spat", "at". Does not match a line ending in "spatula".
grep -Ri '.*at$' .

# Matches lines that contain two consecutive "e"'s. The -E option is needed for "{2}".
grep -ERi '.*e{2}.*' .

# Matches any line. 0+ of the previous character, which in this case is the wildcard "."
# that represents any character. So 0+ of any character.
grep -Ri '.*' .
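One detail worth checking on your own system: in GNU grep's default (basic) syntax, "?", "+", and "{n}" are literal characters unless escaped, and the -E option switches to extended syntax where they act as quantifiers. A small sketch on four made-up lines:

```shell
# Four made-up lines to search.
sample=$(printf 'at\ncat\nfeed\nfed\n')

# Basic syntax (the default): "?" is a literal question mark, so nothing matches.
# grep exits with a non-zero status when there are no matches, hence "|| true".
basic=$(echo "$sample" | grep -c '.?at' || true)
echo "$basic"

# Extended syntax (-E): ".?" means 0 or 1 of any character, so "at" and "cat" match.
extended=$(echo "$sample" | grep -E -c '.?at')
echo "$extended"

# Exactly two consecutive "e"s, written in extended and in basic syntax.
double_e=$(echo "$sample" | grep -E 'e{2}')
double_e_basic=$(echo "$sample" | grep 'e\{2\}')
echo "$double_e"
echo "$double_e_basic"
```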

Resources

Regex Tester

https://regex101.com/ is an excellent tool that helps you quickly test and better understand writing regular expressions. It allows you to test four different "flavors" of regular expressions: PCRE (PHP), ECMAScript (JavaScript), Python, and Golang. regex101 also provides a library of useful, pre-made regular expressions.

Lookahead and Lookbehinds

This is an excellent resource to better understand positive and negative lookahead and lookbehind operations using grep.

ReExCheatsheet

An excellent quick reference for regular expressions. Examples using grep in R.

`ripgrep`

ripgrep is a "line-oriented search tool that recursively searches your current directory for a regex pattern." You can read about why you may want to use ripgrep here. ripgrep is frequently faster than grep. If you are working with code, it has sane defaults (for example, it respects .gitignore), and you can easily search for specific types of files.

How do I exclude a filetype when searching for `foo` in `my_directory`?

Click here for solution

# exclude javascript (.js) files
rg -Tjs foo my_directory

# exclude r (.r) files
rg -Tr foo my_directory

# exclude Python (.py) files
rg -Tpy foo my_directory

How do I search for a particular filetype when searching for `foo` in `my_directory`?

Click here for solution

# search javascript (.js) files
rg -tjs foo my_directory

# search r (.r) files
rg -tr foo my_directory

# search Python (.py) files
rg -tpy foo my_directory

How do I search for a specific word, where the word isn't part of another word?

Click here for solution

# this is roughly equivalent to putting \b before and after all search patterns in grep
rg -w foo my_directory

How do I replace every match `foo` in `my_directory` with the text given, `bar`, when printing results?

Click here for solution

rg foo my_directory -r bar

How do I trim whitespace from the beginning and ending of each printed line?

Click here for solution

rg foo my_directory --trim

How do I follow symbolic links when searching a directory, `my_directory`?

Click here for solution

rg -L foo my_directory

`find`

find is an aptly named tool that traverses directories and searches for files.

Examples

How do I find a file named `foo.txt` in the current working directory or subdirectories?

Click here for solution

find . -name foo.txt

How do I find a file named `foo.txt` or `Foo.txt` or `FoO.txt` (i.e. ignoring case) in the current working directory or subdirectories?

Click here for solution

find . -iname foo.txt

How do I find a directory named `foo` in the current working directory or subdirectories?

Click here for solution

find . -type d -name foo

How do I find all of the Python files in the current working directory or subdirectories?

Click here for solution

find . -name "*.py"

How do I find files over 1gb in size in the current working directory or subdirectories?

Click here for solution

find . -size +1G

How do I find files under 10mb in size in the current working directory or subdirectories?

Click here for solution

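# GNU find rounds sizes up to whole units before comparing, so this
# matches files of at most 9 MiB.
find . -size -10M

# To include every file smaller than 10 MiB, compare in bytes instead:
find . -size -10485760c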

`less`

less is a utility that opens a page of text from a file and allows the user to scroll forward or backward in the file using "j" and "k" keys or down and up arrows. less does not read the entire file into memory at once, and is therefore faster when loading large files.

How do I display the contents of a file, `foo.txt`?

Click here for solution

less foo.txt

How do I scroll up and down in `less`?

Click here for solution

To scroll down use "j" or the down arrow. To scroll up use "k" or the up arrow.

How do I exit `less`?

Click here for solution

Press the "q" key on your keyboard.

`sort`

sort is a utility that sorts lines of text.

Examples

How do I sort a csv, `flights_sample.csv` alphabetically by the 18th column?

Click here for solution

# -t, sets the field delimiter to a comma; -k18,18 sorts on the 18th field only (ascending by default)
sort -t, -k18,18 flights_sample.csv

## 1990,10,18,7,729,730,847,849,PS,1451,NA,78,79,NA,-2,-1,SAN,ABC,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
## 1991,10,19,1,749,730,922,849,PS,1451,NA,93,79,NA,33,19,SAN,ABC,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
## 1991,10,21,3,728,730,848,849,PS,1451,NA,80,79,NA,-1,-2,SAN,ABC,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
## 1991,10,22,4,728,730,852,849,PS,1451,NA,84,79,NA,3,-2,SAN,ABC,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
## 1991,10,23,5,731,730,902,849,PS,1451,NA,91,79,NA,13,1,SAN,ABC,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
## 1991,10,24,6,744,730,908,849,PS,1451,NA,84,79,NA,19,14,SAN,ABC,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
## Year,Month,DayofMonth,DayOfWeek,DepTime,CRSDepTime,ArrTime,CRSArrTime,UniqueCarrier,FlightNum,TailNum,ActualElapsedTime,CRSElapsedTime,AirTime,ArrDelay,DepDelay,Origin,Dest,Distance,TaxiIn,TaxiOut,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay
## 1987,10,14,3,741,730,912,849,PS,1451,NA,91,79,NA,23,11,SAN,SFO,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
## 1990,10,15,4,729,730,903,849,PS,1451,NA,94,79,NA,14,-1,SAN,SFO,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
## 1990,10,17,6,741,730,918,849,PS,1451,NA,97,79,NA,29,11,SAN,SFO,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA

How do I sort a csv, `flights_sample.csv` alphabetically by the 18th column, and then in descending order by the 4th column?

Click here for solution

sort -t, -k18,18 -k4,4r flights_sample.csv

## 1990,10,18,7,729,730,847,849,PS,1451,NA,78,79,NA,-2,-1,SAN,ABC,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
## 1991,10,24,6,744,730,908,849,PS,1451,NA,84,79,NA,19,14,SAN,ABC,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
## 1991,10,23,5,731,730,902,849,PS,1451,NA,91,79,NA,13,1,SAN,ABC,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
## 1991,10,22,4,728,730,852,849,PS,1451,NA,84,79,NA,3,-2,SAN,ABC,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
## 1991,10,21,3,728,730,848,849,PS,1451,NA,80,79,NA,-1,-2,SAN,ABC,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
## 1991,10,19,1,749,730,922,849,PS,1451,NA,93,79,NA,33,19,SAN,ABC,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
## Year,Month,DayofMonth,DayOfWeek,DepTime,CRSDepTime,ArrTime,CRSArrTime,UniqueCarrier,FlightNum,TailNum,ActualElapsedTime,CRSElapsedTime,AirTime,ArrDelay,DepDelay,Origin,Dest,Distance,TaxiIn,TaxiOut,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay
## 1990,10,17,6,741,730,918,849,PS,1451,NA,97,79,NA,29,11,SAN,SFO,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
## 1990,10,15,4,729,730,903,849,PS,1451,NA,94,79,NA,14,-1,SAN,SFO,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA
## 1987,10,14,3,741,730,912,849,PS,1451,NA,91,79,NA,23,11,SAN,SFO,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA

`git`

See here.

`awk`

awk is a powerful programming language that specializes in processing and manipulating text data.

In awk, a command looks something like this:

awk -F, 'BEGIN{ } { } END{ }'

The delimiter is specified with the -F option (in this case our delimiter is a comma). The BEGIN chunk is run only once at the start of execution. The middle chunk is run once per line of the file. The END chunk is run only once, at the end of execution.

The BEGIN and END portions are always optional.
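A minimal sketch of all three chunks working together, run on three made-up lines of name,score data:

```shell
# Made-up input: name,score
report=$(printf 'ann,3\nbob,5\ncat,4\n' | awk -F, '
    BEGIN { print "name score" }        # runs once, before any input is read
    { total += $2; print $1, $2 }       # runs once per input line
    END { print "total", total }        # runs once, after the last line
')
echo "$report"
```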

The variables: $1, $2, $3, etc., refer to the 1st, 2nd, and 3rd fields in a line of data. For example, the following would print the 4th field of every row in a csv file:

awk -F, '{print $4}'

$0 represents the entire row.

awk is very powerful. We can achieve the same effect as using cut:

head 5000_products.csv | cut -d, -f3

# or

head 5000_products.csv | awk -F, '{print $3}'

Built in variables

awk has some special built in variables that can be very useful. See here.
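Two of the most commonly used built in variables are NR (the number of the line being processed) and NF (the number of fields in that line). A small sketch on made-up data with a header row:

```shell
# Made-up data with a header row.
data=$(printf 'name,score\nann,3\nbob,5\n')

# Print the line number, the number of fields, and the line itself.
numbered=$(echo "$data" | awk -F, '{ print NR, NF, $0 }')
echo "$numbered"

# NR > 1 skips the header. In the END block, NR holds the total number
# of lines read, so NR - 1 is the number of data rows.
average=$(echo "$data" | awk -F, 'NR > 1 { sum += $2 } END { print sum / (NR - 1) }')
echo "$average"
```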

Examples

How do I print only rows where the `DAYOFWEEK` is `5`?

Click here for solution

head metadata.csv | awk -F, '{if ($3 == 5) {print $0}}'

## 01/01/2015,,5,0,0,1,2015,CHRISTMAS PEAK,0,5,nyd,1,,,,0,0,CHRISTMAS PEAK,73.02,59.81,66.41,,0,,0,,0,,0,,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,17:42,1,1,0,0,18,19,17,0,0,0,0,0,0,0,1,13,17,15,0,0,0,0,1,0,14,16,14,0,1,0,0,0,0,11,15,12,8:00,25:00,17,7:00,25:00,8:00,26:00,18,8:00,25:00,17,8:00,21:00,13,8:00,21:00,8:00,25:00,17,8:00,21:00,13,8:00,22:00,14,8:00,22:00,8:00,24:00,16,8:00,22:00,14,8:00,19:00,11,8:00,19:00,8:00,22:00,14,8:00,20:00,12,1,1,0,0,NONE,53.375714286,70.3,50.2,0.12,616246,367265,296273,236654,53904354,34718635,26907827,20971646,1600,1000,2,12:00,15:30,Disney Festival of Fantasy Parade,1,22:15,,Main Street Electrical Parade,1,21:00,,Wishes Nighttime Spectacular,1,21:00,,IllumiNations: Reflections of Earth,0,,,0,,,,3,18:30,20:00,Fantasmic!,1,0,,,,,0,,,
## 01/08/2015,,5,7,1,1,2015,CHRISTMAS,8,0,,0,,marwk,,0,1,CHRISTMAS,59.44,38.7,49.07,,0,,0,,0,,0,,88%,94%,99%,78%,97%,83%,69%,94%,100%,100%,100%,76%,100%,100%,93%,100%,100%,100%,100%,100%,100%,63%,93%,17:47,1,0,0,0,13,12,12,0,0,0,0,0,0,0,1,12,12,14,0,0,0,0,0,0,10,10,10,0,1,0,0,0,0,8,9,9,9:00,21:00,12,8:00,21:00,9:00,21:00,12,9:00,21:00,12,9:00,21:00,12,9:00,21:00,9:00,21:00,12,9:00,21:00,12,9:00,19:00,10,9:00,19:00,9:00,19:00,10,9:00,19:00,10,9:00,17:00,8,9:00,17:00,9:00,17:00,8,9:00,18:00,9,1,1,0,0,NONE,48.372142857,70.3,49.4,0.08,615046,367265,296273,236654,53894754,34718635,26907827,20971646,1600,1000,1,15:00,,Disney Festival of Fantasy Parade,2,19:00,21:00,Main Street Electrical Parade,1,20:00,,Wishes Nighttime Spectacular,1,21:00,,IllumiNations: Reflections of Earth,0,,,0,,,,1,19:00,,Fantasmic!,1,0,,,,,0,,,

How do I print the first, fourth, and fifth columns of rows where the `DAYOFWEEK` is `5`?

Click here for solution

head metadata.csv | awk -F, '{if ($3 == 5) {print $1, $4, $5}}'

## 01/01/2015 0 0
## 01/08/2015 7 1

How do I print only rows where `DAYOFWEEK` is `5` OR `YEAR` is `2015`?

Click here for solution

head metadata.csv | awk -F, '{if ($3 == 5 || $7 == 2015) {print $0}}'

## 01/01/2015,,5,0,0,1,2015,CHRISTMAS PEAK,0,5,nyd,1,,,,0,0,CHRISTMAS PEAK,73.02,59.81,66.41,,0,,0,,0,,0,,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,17:42,1,1,0,0,18,19,17,0,0,0,0,0,0,0,1,13,17,15,0,0,0,0,1,0,14,16,14,0,1,0,0,0,0,11,15,12,8:00,25:00,17,7:00,25:00,8:00,26:00,18,8:00,25:00,17,8:00,21:00,13,8:00,21:00,8:00,25:00,17,8:00,21:00,13,8:00,22:00,14,8:00,22:00,8:00,24:00,16,8:00,22:00,14,8:00,19:00,11,8:00,19:00,8:00,22:00,14,8:00,20:00,12,1,1,0,0,NONE,53.375714286,70.3,50.2,0.12,616246,367265,296273,236654,53904354,34718635,26907827,20971646,1600,1000,2,12:00,15:30,Disney Festival of Fantasy Parade,1,22:15,,Main Street Electrical Parade,1,21:00,,Wishes Nighttime Spectacular,1,21:00,,IllumiNations: Reflections of Earth,0,,,0,,,,3,18:30,20:00,Fantasmic!,1,0,,,,,0,,,
## 01/02/2015,,6,1,0,1,2015,CHRISTMAS,2,5,,0,,,,0,0,CHRISTMAS,78,60.72,69.36,,0,,0,,0,,0,,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,17:43,0,1,0,0,17,18,16,0,0,0,0,0,1,0,0,15,13,12,0,0,1,0,0,0,14,14,14,0,0,0,0,0,0,12,11,11,8:00,25:00,17,8:00,25:00,8:00,25:00,17,9:00,25:00,16,8:00,21:00,13,8:00,23:00,8:00,21:00,13,9:00,21:00,12,8:00,22:00,14,8:00,22:00,8:00,22:00,14,9:00,22:00,13,8:00,20:00,12,8:00,20:00,8:00,19:00,11,8:00,19:00,11,1,1,0,0,NONE,53.750714286,70.3,50,0.12,616246,367265,296273,236654,53904354,34718635,26907827,20971646,1600,1000,2,12:00,15:30,Disney Festival of Fantasy Parade,1,22:15,,Main Street Electrical Parade,1,21:00,,Wishes Nighttime Spectacular,1,21:00,,IllumiNations: Reflections of Earth,0,,,0,,,,3,18:30,20:00,Fantasmic!,1,0,,,,,0,,,
## 01/03/2015,,7,2,0,1,2015,CHRISTMAS,3,0,,0,,,,0,0,CHRISTMAS,83.12,67.31,75.22,,0,,0,,0,,0,,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,17:44,0,0,0,0,16,17,15,0,0,0,0,0,0,1,0,12,15,12,1,0,0,0,0,0,14,14,11,0,0,1,0,0,0,11,12,12,9:00,25:00,16,9:00,25:00,8:00,25:00,17,9:00,24:00,15,9:00,21:00,12,9:00,21:00,8:00,21:00,13,9:00,21:00,12,9:00,22:00,13,8:00,22:00,8:00,22:00,14,9:00,20:00,11,8:00,19:00,11,8:00,19:00,8:00,20:00,12,9:00,20:00,11,1,1,0,0,NONE,49.212857143,70.3,49.9,0.07,616246,367265,296273,236654,53904354,34718635,26907827,20971646,1600,1000,2,12:00,15:30,Disney Festival of Fantasy Parade,1,22:15,,Main Street Electrical Parade,1,21:00,,Wishes Nighttime Spectacular,1,21:00,,IllumiNations: Reflections of Earth,0,,,0,,,,2,18:30,20:00,Fantasmic!,1,0,,,,,0,,,
## 01/04/2015,,1,3,1,1,2015,CHRISTMAS,4,0,,0,,,,0,0,CHRISTMAS,83.93,67.97,75.95,,0,,0,,0,,0,,67%,74%,77%,74%,74%,70%,66%,94%,68%,57%,56%,70%,79%,43%,93%,100%,100%,100%,100%,100%,48%,63%,84%,17:44,0,0,0,0,15,16,14,0,0,0,0,0,0,0,0,12,12,12,0,1,0,0,0,1,11,14,13,1,0,0,0,0,0,12,11,8,9:00,24:00,15,9:00,24:00,9:00,25:00,16,9:00,23:00,14,9:00,21:00,12,9:00,21:00,9:00,21:00,12,9:00,21:00,12,9:00,20:00,11,9:00,20:00,9:00,22:00,13,9:00,20:00,11,9:00,20:00,11,8:00,20:00,8:00,19:00,11,9:00,17:00,8,1,1,0,0,NONE,48.270714286,70.3,49.8,0.12,616246,367265,296273,236654,53904354,34718635,26907827,20971646,1600,1000,1,15:00,,Disney Festival of Fantasy Parade,2,20:00,22:00,Main Street Electrical Parade,1,21:00,,Wishes Nighttime Spectacular,1,21:00,,IllumiNations: Reflections of Earth,0,,,0,,,,2,19:00,20:30,Fantasmic!,1,0,,,,,0,,,
## 01/05/2015,,2,4,1,1,2015,CHRISTMAS,5,0,,0,,,,0,0,CHRISTMAS,72.3,56.89,64.6,,0,,0,,0,,0,,67%,74%,77%,74%,74%,70%,66%,94%,68%,57%,56%,70%,79%,43%,93%,100%,100%,100%,100%,100%,48%,63%,84%,17:45,0,0,0,0,14,15,12,0,0,0,0,1,0,0,0,12,12,13,0,0,0,1,0,0,13,11,10,0,1,0,0,0,0,8,12,8,9:00,23:00,14,9:00,23:00,9:00,24:00,15,9:00,21:00,12,9:00,21:00,12,9:00,21:00,9:00,21:00,12,9:00,21:00,12,9:00,20:00,11,9:00,22:00,9:00,20:00,11,9:00,19:00,10,9:00,17:00,8,9:00,17:00,9:00,20:00,11,9:00,17:00,8,1,1,0,0,NONE,48.971538462,70.3,49.6,0.12,616246,367265,306272,236654,53904354,34718635,27897728,20971646,1600,1000,1,15:00,,Disney Festival of Fantasy Parade,2,20:00,22:00,Main Street Electrical Parade,1,21:00,,Wishes Nighttime Spectacular,1,21:00,,IllumiNations: Reflections of Earth,0,,,0,,,,2,19:00,20:30,Fantasmic!,1,0,,,,,0,,,
## 01/06/2015,,3,5,1,1,2015,CHRISTMAS,6,0,,0,,,,0,0,CHRISTMAS,77.67,54.88,66.28,,0,,0,,0,,0,,86%,92%,98%,77%,96%,82%,69%,94%,100%,98%,98%,76%,100%,96%,93%,100%,100%,83%,100%,100%,92%,63%,93%,17:46,0,0,0,0,12,14,12,0,0,1,0,0,0,0,0,13,12,12,0,0,0,0,1,0,10,13,10,0,0,1,0,0,0,8,8,9,9:00,21:00,12,9:00,21:00,9:00,23:00,14,9:00,21:00,12,9:00,21:00,12,8:00,21:00,9:00,21:00,12,9:00,21:00,12,9:00,19:00,10,9:00,19:00,9:00,20:00,11,9:00,19:00,10,9:00,17:00,8,9:00,17:00,9:00,17:00,8,9:00,17:00,8,1,1,0,0,NONE,50.093571429,70.2,49.5,0.12,615046,367265,296273,236654,53894754,34718635,26907827,20971646,1600,1000,1,15:00,,Disney Festival of Fantasy Parade,0,,,,1,20:00,,Wishes Nighttime Spectacular,1,21:00,,IllumiNations: Reflections of Earth,0,,,0,,,,1,19:00,,Fantasmic!,1,0,,,,,0,,,
## 01/07/2015,,4,6,1,1,2015,CHRISTMAS,7,0,,0,,marwk,,0,1,CHRISTMAS,67.24,48.56,57.9,,0,,0,,0,,0,,88%,94%,99%,78%,97%,83%,69%,94%,100%,100%,100%,76%,100%,100%,93%,100%,100%,100%,100%,100%,100%,63%,93%,17:47,0,0,1,0,12,12,13,0,0,0,1,0,0,0,0,12,13,12,0,0,0,0,0,0,10,10,10,1,0,0,0,0,0,9,8,8,9:00,21:00,12,9:00,21:00,9:00,21:00,12,9:00,21:00,12,9:00,21:00,12,9:00,21:00,9:00,21:00,12,9:00,21:00,12,9:00,19:00,10,9:00,19:00,9:00,19:00,10,9:00,19:00,10,9:00,17:00,8,8:00,17:00,9:00,17:00,8,9:00,17:00,8,1,1,0,0,NONE,47.188571429,70.3,49.5,0.12,615046,367265,296273,236654,53894754,34718635,26907827,20971646,1600,1000,1,15:00,,Disney Festival of Fantasy Parade,0,,,,1,20:00,,Wishes Nighttime Spectacular,1,21:00,,IllumiNations: Reflections of Earth,0,,,0,,,,1,19:00,,Fantasmic!,1,0,,,,,0,,,
## 01/08/2015,,5,7,1,1,2015,CHRISTMAS,8,0,,0,,marwk,,0,1,CHRISTMAS,59.44,38.7,49.07,,0,,0,,0,,0,,88%,94%,99%,78%,97%,83%,69%,94%,100%,100%,100%,76%,100%,100%,93%,100%,100%,100%,100%,100%,100%,63%,93%,17:47,1,0,0,0,13,12,12,0,0,0,0,0,0,0,1,12,12,14,0,0,0,0,0,0,10,10,10,0,1,0,0,0,0,8,9,9,9:00,21:00,12,8:00,21:00,9:00,21:00,12,9:00,21:00,12,9:00,21:00,12,9:00,21:00,9:00,21:00,12,9:00,21:00,12,9:00,19:00,10,9:00,19:00,9:00,19:00,10,9:00,19:00,10,9:00,17:00,8,9:00,17:00,9:00,17:00,8,9:00,18:00,9,1,1,0,0,NONE,48.372142857,70.3,49.4,0.08,615046,367265,296273,236654,53894754,34718635,26907827,20971646,1600,1000,1,15:00,,Disney Festival of Fantasy Parade,2,19:00,21:00,Main Street Electrical Parade,1,20:00,,Wishes Nighttime Spectacular,1,21:00,,IllumiNations: Reflections of Earth,0,,,0,,,,1,19:00,,Fantasmic!,1,0,,,,,0,,,
## 01/09/2015,,6,8,1,1,2015,CHRISTMAS,9,0,,0,,marwk,,0,1,CHRISTMAS,54.89,45.37,50.13,,0,,0,,0,,0,,88%,94%,99%,78%,97%,83%,69%,94%,100%,100%,100%,76%,100%,100%,93%,100%,100%,100%,100%,100%,100%,63%,93%,17:48,0,1,0,0,12,13,14,0,1,0,0,0,1,0,0,14,12,12,0,0,1,0,0,0,10,10,12,0,0,0,0,0,0,9,8,11,9:00,21:00,12,9:00,21:00,9:00,21:00,12,9:00,23:00,14,9:00,21:00,12,9:00,23:00,9:00,21:00,12,9:00,21:00,12,9:00,19:00,10,9:00,19:00,9:00,19:00,10,9:00,20:00,11,9:00,18:00,9,9:00,18:00,9:00,17:00,8,9:00,20:00,11,1,1,0,0,NONE,51.094285714,70.3,49.3,0.11,615046,367265,296273,236654,53894754,34718635,26907827,20971646,1600,1000,1,15:00,,Disney Festival of Fantasy Parade,1,19:00,,Main Street Electrical Parade,1,20:00,,Wishes Nighttime Spectacular,1,21:00,,IllumiNations: Reflections of Earth,0,,,0,,,,1,19:00,,Fantasmic!,1,0,,,,,0,,,

How do I print only rows where `DAYOFWEEK` is `5` AND `YEAR` is `2015`?

Click here for solution

head metadata.csv | awk -F, '{if ($3 == 5 && $7 == 2015) {print $0}}'

## 01/01/2015,,5,0,0,1,2015,CHRISTMAS PEAK,0,5,nyd,1,,,,0,0,CHRISTMAS PEAK,73.02,59.81,66.41,,0,,0,,0,,0,,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,0%,17:42,1,1,0,0,18,19,17,0,0,0,0,0,0,0,1,13,17,15,0,0,0,0,1,0,14,16,14,0,1,0,0,0,0,11,15,12,8:00,25:00,17,7:00,25:00,8:00,26:00,18,8:00,25:00,17,8:00,21:00,13,8:00,21:00,8:00,25:00,17,8:00,21:00,13,8:00,22:00,14,8:00,22:00,8:00,24:00,16,8:00,22:00,14,8:00,19:00,11,8:00,19:00,8:00,22:00,14,8:00,20:00,12,1,1,0,0,NONE,53.375714286,70.3,50.2,0.12,616246,367265,296273,236654,53904354,34718635,26907827,20971646,1600,1000,2,12:00,15:30,Disney Festival of Fantasy Parade,1,22:15,,Main Street Electrical Parade,1,21:00,,Wishes Nighttime Spectacular,1,21:00,,IllumiNations: Reflections of Earth,0,,,0,,,,3,18:30,20:00,Fantasmic!,1,0,,,,,0,,,
## 01/08/2015,,5,7,1,1,2015,CHRISTMAS,8,0,,0,,marwk,,0,1,CHRISTMAS,59.44,38.7,49.07,,0,,0,,0,,0,,88%,94%,99%,78%,97%,83%,69%,94%,100%,100%,100%,76%,100%,100%,93%,100%,100%,100%,100%,100%,100%,63%,93%,17:47,1,0,0,0,13,12,12,0,0,0,0,0,0,0,1,12,12,14,0,0,0,0,0,0,10,10,10,0,1,0,0,0,0,8,9,9,9:00,21:00,12,8:00,21:00,9:00,21:00,12,9:00,21:00,12,9:00,21:00,12,9:00,21:00,9:00,21:00,12,9:00,21:00,12,9:00,19:00,10,9:00,19:00,9:00,19:00,10,9:00,19:00,10,9:00,17:00,8,9:00,17:00,9:00,17:00,8,9:00,18:00,9,1,1,0,0,NONE,48.372142857,70.3,49.4,0.08,615046,367265,296273,236654,53894754,34718635,26907827,20971646,1600,1000,1,15:00,,Disney Festival of Fantasy Parade,2,19:00,21:00,Main Street Electrical Parade,1,20:00,,Wishes Nighttime Spectacular,1,21:00,,IllumiNations: Reflections of Earth,0,,,0,,,,1,19:00,,Fantasmic!,1,0,,,,,0,,,

How do I get the average of values in a column containing the max temperature, `WDWMAXTEMP`?

Click here for solution

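# Here NR is the number of records (lines) awk has read so far; in the END
# block it is the total line count, which includes the header line of the file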
head metadata.csv | awk -F, '{sum = sum + $19}END{print "Average max temp: " sum/NR}'

# Or alternatively we could track the number of rows as we go
head metadata.csv | awk -F, '{sum = sum + $19; count++}END{print "Average max temp: " sum/count}'

## Average max temp: 64.961
## Average max temp: 64.961
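Because `head metadata.csv` also passes the header line to awk, the header is counted in both `NR` and `count` above. The following is a minimal sketch of one way to skip it, using `NR > 1` as a condition. The two-column sample file and its values are made up for illustration and are not taken from metadata.csv.

```shell
# Hypothetical sample data: a header line plus three rows.
csv=$(mktemp)
printf 'DATE,WDWMAXTEMP\n01/01/2015,70\n01/02/2015,80\n01/03/2015,60\n' > "$csv"

# NR > 1 skips the header, so only the data rows are summed and counted.
avg=$(awk -F, 'NR > 1 {sum += $2; count++} END {if (count > 0) print sum / count}' "$csv")

echo "Average max temp: $avg"
```

With the real file, the column number would be 19 instead of 2.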

How do I get counts of each unique value in a column, `SEASON`?

Click here for solution

When executing the main (middle) block of code, awk creates an associative array called seasons, whose elements are indexed by the unique values in the 8th column, SEASON. For the SEASON value in each line, awk adds 1 to the corresponding element (this is what ++ does). Thus, we get the count for each unique value. Because the header line is processed as well, the output includes a SEASON 1 entry.
In the END block, we print the contents of seasons by looping over its elements. The season in for (season in seasons) refers to the index (name) of each element. To access the stored count, we use seasons[season].
This is just one example of arrays in awk. You can find more details here: https://www.gnu.org/software/gawk/manual/html_node/Arrays.html

cat metadata.csv | awk -F, '{seasons[$8]++}END{for (season in seasons) {print season, seasons[season]}}'

## SUMMER BREAK 236
## CHRISTMAS 245
## JERSEY WEEK 50
## SEPTEMBER LOW 140
## PRESIDENTS WEEK 55
## FALL 212
## HALLOWEEN 26
## MEMORIAL DAY 20
## CHRISTMAS PEAK 176
## SEASON 1
## COLUMBUS DAY 20
## SPRING 490
## THANKSGIVING 60
## EASTER 95
## MARTIN LUTHER KING JUNIOR DAY 45
## MARDI GRAS 15
## JULY 4TH 25
## WINTER 222
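The same counting idea can be tried on a tiny file. This is only a sketch: the sample rows are invented, the `SEASON` column is placed second rather than eighth, and `NR > 1` is added to skip the header. awk does not guarantee any order for `for (s in seasons)`, so the output is piped through `sort` to make it predictable.

```shell
# Hypothetical sample data with SEASON in column 2.
csv=$(mktemp)
printf 'DATE,SEASON\nd1,FALL\nd2,WINTER\nd3,FALL\nd4,SUMMER BREAK\n' > "$csv"

# Count each unique season, skipping the header line, then sort the result.
counts=$(awk -F, 'NR > 1 {seasons[$2]++} END {for (s in seasons) print s "," seasons[s]}' "$csv" | LC_ALL=C sort)

echo "$counts"
```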

How do I get counts of each unique value in a column, `SEASON`, but only print the values for `FALL`, `WINTER`, `SUMMER`, and `SPRING`?

Click here for solution

cat metadata.csv | awk -F, '{seasons[$8]++}END{for (season in seasons) {if (season == "FALL" || season == "SUMMER" || season == "WINTER" || season == "SPRING") print season, seasons[season]}}'

## FALL 212
## SPRING 490
## WINTER 222

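A more compact solution uses the ~ (regular expression match) operator. Note that it matches any value that contains one of the words, so SUMMER BREAK is now included: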

cat metadata.csv | awk -F, '{seasons[$8]++}END{for (season in seasons) {if (season ~ /WINTER|SPRING|SUMMER|FALL/) print season, seasons[season]}}'

## SUMMER BREAK 236
## FALL 212
## SPRING 490
## WINTER 222

If you want to exclude "SUMMER BREAK", use the $ regular expression anchor. SUMMER$ only matches strings that end in "SUMMER", so "SUMMER BREAK" is excluded because it ends in " BREAK", not "SUMMER":

cat metadata.csv | awk -F, '{seasons[$8]++}END{for (season in seasons) {if (season ~ /WINTER|SPRING|SUMMER$|FALL/) print season, seasons[season]}}'

## FALL 212
## SPRING 490
## WINTER 222
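In the pattern above, only the SUMMER alternative is anchored, and only at its end. If exact matches are wanted for all four words, one option is to group the alternatives and anchor both ends with ^ and $. The input values below (including "FALL BREAK") are hypothetical and only serve to show the difference.

```shell
# ^(...)$ requires the whole line to be exactly one of the four words.
matches=$(printf 'SUMMER\nSUMMER BREAK\nFALL\nFALL BREAK\nSPRING\n' \
  | awk '/^(WINTER|SPRING|SUMMER|FALL)$/')

echo "$matches"
```

Here "SUMMER BREAK" and "FALL BREAK" are both rejected, while the unanchored pattern would have kept them.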

~ & . & ..

~ represents the location stored in the environment variable $HOME. If you change $HOME, ~ also changes. As you navigate directories, you can jump back to the most recently visited directory by running cd ~-. For example, if you navigate to /home/$USER/projects/project1/output, then to /home/$USER, and you'd like to jump directly back to /home/$USER/projects/project1/output, simply run cd ~-. ~- is a reference to the location stored in $OLDPWD.

. represents the current working directory. For example, if you are in your home directory /home/$USER, . means "in this directory", and ./some_file.txt would represent a file named some_file.txt which is in your home directory /home/$USER.

.. represents the parent directory. For example, /home is the parent directory of /home/$USER. If you are currently in /home/$USER/projects and you want to access some file in the home directory, you could do ../some_file.txt. ../some_file.txt is called a relative path as it is relative to your current location. If we accessed ../some_file.txt from the home directory, this would be different than accessing ../some_file.txt from a different directory. /home/$USER/some_file.txt is an absolute or full path of a file some_file.txt.
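The following sketch walks through these ideas in a throwaway directory tree. The directory names are hypothetical. It uses `cd -`, which jumps to `$OLDPWD` in the same way as `cd ~-` but is available in more shells.

```shell
# Build a small temporary directory tree to experiment in.
base=$(mktemp -d)
mkdir -p "$base/projects/project1/output"

cd "$base/projects/project1/output"
first=$(pwd)
cd "$base"

# $OLDPWD now holds the directory we were in before the last cd.
echo "previous directory: $OLDPWD"

# Jump back to the previous directory (cd - prints it, so we hide that output).
cd - > /dev/null
back=$(pwd)
echo "now in: $back"

# Each .. climbs one level, so this relative path leads back to $base.
cd ../../..
top=$(pwd)
echo "after ../../..: $top"
```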

Examples

If I am in the directory `/home/kamstut/projects`, what is the relative path to `/home/mdw/`?

Click here for solution

../../mdw

If I am in the directory `/home/kamstut/projects/project1`, what is the absolute path to the file `../../scripts/runthis.sh`?

Click here for solution

/home/kamstut/scripts/runthis.sh

How can I navigate to my `$HOME` directory?

Click here for solution

cd
cd ~
cd $HOME
cd /home/$USER

Piping & Redirection

Redirection is the act of connecting standard input (stdin), standard output (stdout), or standard error (stderr) to somewhere other than the default, such as a file. stdin, stdout, and stderr are identified by the file descriptor numbers 0, 1, and 2, respectively.

Piping is a form of redirection, but rather than redirecting output to a file, we send the output of one command to the input of another command for more processing.

Redirection

Examples

For the following examples we use the example file redirection.txt, whose contents are:

cat redirection.txt

## This is a simple file with some text.
## It has a couple of lines of text.
## Here is some more.

How do I redirect text from a command like `ls` to a file like `redirection.txt`, completely overwriting any text already within `redirection.txt`?

Click here for solution

# Save the stdout from the ls command to redirection.txt
ls > redirection.txt

# The new contents of redirection.txt
head redirection.txt

## 01-scholar.Rmd
## 02-data-formats.Rmd
## 03-unix.Rmd
## 04-sql.Rmd
## 05-r.Rmd
## 06-python.Rmd
## 07-tools.Rmd
## 08-faqs.Rmd
## 09-projects.Rmd
## 10-fall-2020-projects.Rmd

How do I redirect text from a command like `ls` to a file like `redirection.txt`, without overwriting any text, but rather appending the text to the end of the file?

Click here for solution

# Append the stdout from the ls command to the end of redirection.txt
ls >> redirection.txt

head redirection.txt

## This is a simple file with some text.
## It has a couple of lines of text.
## Here is some more.
## 01-scholar.Rmd
## 02-data-formats.Rmd
## 03-unix.Rmd
## 04-sql.Rmd
## 05-r.Rmd
## 06-python.Rmd
## 07-tools.Rmd

How can I redirect text from a file to be used as stdin for another program or command?

Click here for solution

# Let's count the number of words in redirection.txt
wc -w < redirection.txt

## 20

How can I use multiple redirects in a single line?

Click here for solution

# Here we count the number of words in redirection.txt and then 
# save that value to value.txt.
wc -w < redirection.txt > value.txt

head value.txt

## 20
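The examples above only redirect stdin and stdout. As a sketch of how the file descriptor numbers are used, the following redirects stderr (2) separately from stdout (1), and then merges the two with `2>&1`. The file names are hypothetical, and the exact wording of the `ls` error message depends on your system.

```shell
workdir=$(mktemp -d)
touch "$workdir/exists.txt"

# stdout (1) goes to out.txt; stderr (2) goes to err.txt.
ls "$workdir/exists.txt" "$workdir/missing.txt" > "$workdir/out.txt" 2> "$workdir/err.txt"

# 2>&1 sends stderr to wherever stdout currently points, so both land in both.txt.
ls "$workdir/exists.txt" "$workdir/missing.txt" > "$workdir/both.txt" 2>&1

echo "--- out.txt ---";  cat "$workdir/out.txt"
echo "--- err.txt ---";  cat "$workdir/err.txt"
echo "--- both.txt ---"; cat "$workdir/both.txt"
```

Note that the order matters: `> both.txt 2>&1` first points stdout at the file and then points stderr at the same place.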

Piping

Piping is the act of taking the output of one or more commands and making the output the input of another command. This is accomplished using the "|" character.

Examples

For the following examples we use the example file piping.txt, whose contents are:

cat piping.txt

## apples, oranges, grapes
## pears, apples, peaches,
## celery, carrots, peanuts
## fruits, vegetables, ok

How can I use the output from a `grep` command as the input to another command?

Click here for solution

grep -i "p\{2\}" piping.txt | wc -w

## 6

How can I chain multiple commands together?

Click here for solution

# Get the third column of piping.txt,
# keep only the lines that end in "s", sort
# them in reverse order, and write the result
# to a file called food.txt (> overwrites the file).
cut -d, -f3 piping.txt | grep -i ".*s$" | sort -r > food.txt
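Here is a self-contained sketch of the same chain that you can run anywhere. It recreates piping.txt in a temporary directory, adds a `tr -d ' '` step (an addition, not part of the solution above) to strip the space left after each comma, and uses `>>` to show the difference between appending and overwriting.

```shell
workdir=$(mktemp -d)
cat > "$workdir/piping.txt" <<'EOF'
apples, oranges, grapes
pears, apples, peaches,
celery, carrots, peanuts
fruits, vegetables, ok
EOF

# Third column -> remove spaces -> keep words ending in "s" -> reverse sort.
result=$(cut -d, -f3 "$workdir/piping.txt" | tr -d ' ' | grep 's$' | sort -r)
echo "$result"

# >> appends, so writing the result twice leaves two copies in food.txt.
echo "$result" >> "$workdir/food.txt"
echo "$result" >> "$workdir/food.txt"
food_lines=$(wc -l < "$workdir/food.txt" | tr -d ' ')
echo "food.txt has $food_lines lines"
```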

Resources

Intro to I/O Redirection

A quick introduction to stdin, stdout, stderr, redirection, and piping.

Cron

Cron is a Unix application used to schedule commands or tasks to run at a specific time or at a specific time interval. For example, let's say you have a program called generate_report.py that reads some data from the system and generates a report to email to your superiors. Cron would be perfectly suited to do this at the end of each month, without you needing to do a single manual task. To do so, do the following:

Open the crontab. The crontab is the text document containing your cron jobs.

# -e stands for "edit"
crontab -e

This command will open a text editor for you to write your crontab. Then, on a single line, paste the following content:

0 0 1 * * /full/path/to/generate_report.py

Once you save the file, the cron job will take effect. This cron job would run at midnight, on the first day of every month. The rough format of a cron job is:

minute hour day (of month) month day (of week)

So, the first 0 represents minute 0. The second 0 represents hour 0. The 1 represents the first day of the month. The first * means every month. The second * means any day of the week.
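To make the field order concrete, this small sketch splits a cron entry into its parts with the shell's `read` builtin. It does not talk to cron at all; the variable names are made up for illustration.

```shell
entry='0 0 1 * * /full/path/to/generate_report.py'

# read assigns the first five whitespace-separated fields to the first five
# names and puts everything that is left into the last name, command.
read -r minute hour day_of_month month day_of_week command <<EOF
$entry
EOF

echo "minute:       $minute"
echo "hour:         $hour"
echo "day of month: $day_of_month"
echo "month:        $month"
echo "day of week:  $day_of_week"
echo "command:      $command"
```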

If you are uncomfortable using the text editor on Scholar (nano/vim/emacs), there is an alternative way to modify your crontab.

Create a text file in RStudio by clicking File > New File > Text File. On the first line, paste the following content:

0 0 1 * * /full/path/to/generate_report.py

Important note: You must include the newline following the line of text.

Save the file to your $HOME directory as my_cron.txt. Once you've saved your file, you should be able to see it in the bottom right hand corner of RStudio (you may need to click the refresh button to make it appear).

Once complete, open a terminal by clicking Code > Terminal > Open New Terminal at File Location. If this option isn't present, it is likely you already have a terminal tab open in RStudio. Navigate to the terminal. To update your crontab to the contents of your text file, my_cron.txt, type the following (into the terminal):

crontab $HOME/my_cron.txt

Important note: If you get an error that says "premature EOF", you forgot to add a newline (empty line) to the end of your my_cron.txt.

If the command runs without error, your crontab has been successfully installed! You can check by running the following command in the terminal:

crontab -l
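If you would rather create the file from the terminal, the following sketch writes the entry with `printf`, which makes the trailing newline explicit, and then checks that the file really ends with one. It writes to a temporary file rather than my_cron.txt, and the final `crontab` call is left as a comment so that running the sketch does not change your real crontab.

```shell
cron_file=$(mktemp)

# '%s\n' writes the entry followed by the required trailing newline.
printf '%s\n' '0 0 1 * * /full/path/to/generate_report.py' > "$cron_file"

# Command substitution strips trailing newlines, so if the last byte of a
# non-empty file is a newline, last_byte ends up empty.
last_byte=$(tail -c 1 "$cron_file")
if [ -z "$last_byte" ]; then
  status="ends with newline"
else
  status="missing trailing newline"
fi
echo "$status"

# To install it, you would then run: crontab "$cron_file"
```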

Examples

Write a cron job that runs `generate_report.py` every minute.

Click here for solution

* * * * * /full/path/to/generate_report.py

Write a cron job that runs `generate_report.py` every 5 minutes.

Click here for solution

*/5 * * * * /full/path/to/generate_report.py

Write a cron job that runs `generate_report.py` every 10 minutes.

Click here for solution

*/10 * * * * /full/path/to/generate_report.py

Write a cron job that runs `generate_report.py` on the 5th minute of every hour.

Click here for solution

5 * * * * /full/path/to/generate_report.py

Write a cron job that runs `generate_report.py` every hour.

Click here for solution

0 * * * * /full/path/to/generate_report.py

Write a cron job that runs `generate_report.py` every other hour.

Click here for solution

0 */2 * * * /full/path/to/generate_report.py

Write a cron job that runs `generate_report.py` every other minute of every other hour.

Click here for solution

*/2 */2 * * * /full/path/to/generate_report.py

Write a cron job that runs `generate_report.py` every day at 5 AM.

Click here for solution

0 5 * * * /full/path/to/generate_report.py

Write a cron job that runs `generate_report.py` every day at 2:22 PM.

Click here for solution

22 14 * * * /full/path/to/generate_report.py

How do I remove a cron job when I no longer want it to run?

Click here for solution

First, open the crontab:

crontab -e

Then, delete the line containing the cron job you no longer wish to run. Save the file. Upon saving the file, the cron job you deleted will no longer run.

Resources

Crontab Guru

An incredibly helpful tool for writing cron jobs.

Emacs

Nano

Vim

Writing scripts

bash stands for "Bourne Again Shell". There are many types of shells, including but not limited to: ksh, zsh, csh, tcsh, fish. When you open a terminal emulator, it will typically run a shell. You can write a bash script, zsh script, csh script, etc. Typically, when you have an interpreter, you can write scripts for it. For example, even though R and Python are not shells, we can write scripts for those languages. As bash is the default shell for many Linux operating systems today, we will keep referring to scripts as "bash scripts", but take note that in general the same applies to other shells too.

A bash script is more or less a series of bash commands used to perform a sequence of actions. It is similar to a .R script, but instead of R code, we have bash commands.

A bash script starts with the "shebang" or "bang" line or "hash-bang" -- #!/bin/bash. The shebang is used to indicate which interpreter to use to execute the script. For example, if you were using zsh instead, your shebang might read #!/bin/zsh.

Take the following bash script:

#!/bin/bash

echo "First argument: $1"
echo "Second argument: $2"

If you were to place that text inside of a file called my_script:

echo '#!/bin/bash

echo "First argument: $1"
echo "Second argument: $2"' > $HOME/my_script

And then run it:

cd $HOME
chmod +x ./my_script
./my_script okay cool

The second line of code is to set the permission so that your script is executable. You would get the following result:

First argument: okay
Second argument: cool

The operating system would use the interpreter located at /bin/bash to execute the script. This would produce the same results:

cd $HOME
/bin/bash my_script okay cool

But instead we only have to run:

cd $HOME
./my_script okay cool

Note that if you were to change the shebang to say #!/usr/bin/python and try running the following:

cd $HOME
./my_script okay cool

You would get an error that reads:

File "./my_script", line 3
  echo "First argument: $1"
                          ^
SyntaxError: invalid syntax

The reason is that the operating system is using the Python interpreter located at /usr/bin/python to run the bash code in our script, my_script. Since our code is not Python code, we get this error.

Arguments

A bash script can accept arguments. This is just like many programs we've used to date (grep, cut, awk, etc.). For example:

grep -i 'special'

Here, -i and 'special' are arguments to grep. -i is the first argument, and 'special' is the second. If you run the following script:

#!/bin/bash

echo "First argument: $1"
echo "Second argument: $2"

You can see that this is indeed the truth:

cd $HOME
./my_script -i 'special'

First argument: -i
Second argument: special

In a bash script, the first argument is denoted by $1, the second by $2, the third by $3, and so on. In fact, $0 denotes the command used to run the script:

#!/bin/bash

echo "Command: $0"
echo "First argument: $1"
echo "Second argument: $2"

cd $HOME
./my_script okay cool

Command: ./my_script
First argument: okay
Second argument: cool
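Two related special parameters, not covered above, are `$#` (the number of arguments) and `"$@"` (all of the arguments, one at a time). The sketch below writes a hypothetical script called show_args into a temporary directory, makes it executable, and runs it. Note that quoting an argument lets it contain spaces.

```shell
workdir=$(mktemp -d)

# The quoted 'EOF' keeps $# and $@ from being expanded while the file is written.
cat > "$workdir/show_args" <<'EOF'
#!/bin/bash
# $# is the number of arguments; "$@" expands to each argument in turn.
echo "Number of arguments: $#"
for arg in "$@"; do
  echo "Argument: $arg"
done
EOF

chmod +x "$workdir/show_args"
output=$("$workdir/show_args" okay "two words")
echo "$output"
```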

Examples

Write a script called `indyflights.sh` that takes a file from this directory as its input: `/class/datamine/data/flights/subset` and prints the number of flights that have `IND` as the origin or destination.

Click here for solution

#!/bin/bash

# $1 is the file name (for example 2008.csv); columns 17 and 18 hold the origin and destination
cat "/class/datamine/data/flights/subset/$1" | cut -d, -f17,18 | grep IND | wc -l

Modify your script from the previous problem to accept an argument containing an airport code (for example `IND`). Your script should determine how many flights have origin or destination `IND` (or your given airport code) altogether (across all years in all of the flights files).

Click here for solution

#!/bin/bash
# $1 is the airport code; add up the matching flights from every yearly file
sum=0
for i in {1987..2008}; do
  count=$(cat /class/datamine/data/flights/subset/$i.csv | cut -d, -f17,18 | grep "$1" | wc -l)
  sum=$((sum + count))
done

echo "$sum"

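Note: The following version, which uses a C-style for loop, works better if you need to use variable substitution in your range (from 1987 to 2008), because brace expansion such as {1987..2008} does not expand variables.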

#!/bin/bash
# $1 is the airport code; the loop bounds could be replaced by variables
sum=0
for ((i=1987; i<=2008; i++)); do
  count=$(cat /class/datamine/data/flights/subset/$i.csv | cut -d, -f17,18 | grep "$1" | wc -l)
  sum=$((sum + count))
done

echo "$sum"
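If you do not have access to the flights data, the loop can still be tried on made-up files. In this sketch everything is hypothetical: the script name count_airport.sh, the two tiny CSV files, and the simplified layout with the origin and destination in columns 1 and 2 (in the real files they are columns 17 and 18). The first and last year are passed in as arguments, which is the situation where the C-style loop is needed.

```shell
data=$(mktemp -d)
printf 'Origin,Dest\nIND,ORD\nORD,IND\nLAX,SFO\n' > "$data/2001.csv"
printf 'Origin,Dest\nIND,ATL\nJFK,BOS\n' > "$data/2002.csv"

cat > "$data/count_airport.sh" <<'EOF'
#!/bin/bash
# Usage: count_airport.sh DIR CODE FIRST_YEAR LAST_YEAR
dir=$1; code=$2; first=$3; last=$4
sum=0
for ((year = first; year <= last; year++)); do
  # grep -c prints the number of matching lines, like grep ... | wc -l
  count=$(cut -d, -f1,2 "$dir/$year.csv" | grep -c "$code")
  sum=$((sum + count))
done
echo "$sum"
EOF

total=$(bash "$data/count_airport.sh" "$data" IND 2001 2002)
echo "$total"
```

Three of the sample rows mention IND (two in 2001.csv and one in 2002.csv), so the script prints 3.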

