Chapter 11 UNIX Commands

Contributors: Stephanie Djajadi, Kunal Mishra, Anna Nguyen, Jade Benjamin-Chung, and Ben Arnold

We typically use Unix commands in Terminal (for Mac users) or Git Bash (for Windows users) to

  1. Run a series of scripts in parallel or in a specific order to reproduce our work
  2. To check on the progress of a batch of jobs
  3. To use git and push to github

11.1 Environment

On Mac OS, there is an application named Terminal that provides a bash shell interface to Unix. In Windows, one option is to install the git for Windows package: https://gitforwindows.org/.

The default coloring in a terminal window is pretty basic. If you want to make it more colorful in Mac OS, you can do that by saving a .bash_profile file in your home directory (note the “.” prefix on the file name). This is one example of how you can add color to your terminal by including custom coloring in your bash profile (copied from Ben’s profile):

# color terminal
export CLICOLOR=1
export LSCOLORS=GxFxCxDxBxegedabagaced
export PS1='\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ '

The encoding is extremely cryptic, but there are decodings online (e.g., link).

Another bash shell that provides a large array of colors for Mac OS is iTerm2 (https://iterm2.com/). There are over 200 color schemes to choose from: https://github.com/mbadolato/iTerm2-Color-Schemes.

11.2 Basics

On the computer, there is a desktop with two folders, folder1 and folder2, and a file called file1. Inside folder1, we have a file called file2. Mac users can run these commands on their terminal; we recommend that Windows users use Git Bash, not Windows PowerShell.

Here is our example desktop.
Here is our example desktop.

11.3 Syntax for both Mac/Windows

When typing in directories or file names, quotes are necessary if the name includes spaces.

Command Description
cd desktop/folder1 Change directory to folder1
pwd Print working directory
ls List files in the directory
cp "file2" "newfile2" Copy file (remember to include file extensions when typing in file names like .pdf or .R)
mv “newfile2” “file3” Rename newfile2 to file3
cd .. Go to parent of the working directory (in this case, desktop)
mv “file1” folder2 Move file1 to folder2
mkdir folder3 Make a new folder in folder2
rm <filename> Remove files
rm -rf folder3 Remove directories (-r will attempt to remove the directory recursively, -rf will force removal of the directory)
clear Clear terminal screen of all previous commands
Here is an example of what your terminal might look like after executing the commands in the order listed above.
Here is an example of what your terminal might look like after executing the commands in the order listed above.

11.4 Running Bash Scripts

Windows Mac / Linux Description
chmod +750 <filename.sh> chmod +x <filename.sh> Change access permissions for a file (only needs to be done once)
./<filename.sh> ./<filename.sh> Run file (./ to run any executable file)
bash bash_script_name.sh & bash bash_script_name.sh & Run shell script in the background

11.5 Running Rscripts in Windows

Note: This code seems to work only with Windows Command Prompt, not with Git Bash.

When R is installed, it comes with a utility called Rscript. This allows you to run R commands from the command line. If Rscript is in your PATH, then typing Rscript into the command line, and pressing enter, will not error. Otherwise, to use Rscript, you will either need to add it to your PATH (as an environment variable), or append the full directory of the location of Rscript on your machine. To find the full directory, search for where R is installed your computer. For instance, it may be something like below (this will vary depending on what version of R you have installed):

C:\Program Files\R\R-3.6.0\bin

For appending the PATH variable, please view this link. I strongly recommend completing this option.

If you add the PATH as an environment variable, then you can run this line of code to test: Rscript -e “cat(‘this is a test’)", where the -e flag refers to the expression that will be executed.

If you do not add the PATH as an environment variable, then you can run this line of code to replicate the results from above: “C:\Program Files\R\R-3.6.0\bin” -e “cat(‘this is a test’)”

To run an R script from the command line, we can say: Rscript -e “source(‘C:/path/to/script/some_code.R’)”

11.5.1 Common Mistakes

  • Remember to include all of the quotation marks around file paths that have a spaces.
  • If you attempt to run an R script but run into Error: '\U' used without hex digits in character string starting "'C:\U", try replacing all \ with \\ or /.

11.6 Checking tasks and killing jobs

Windows Mac / Linux Description
tasklist ps -v List all processes on the command line
top -o [cpu/rsize] List all running processes, sorted by CPU or memory usage
taskkill /F /PID pid_number kill <PID_number> Kill a process by its process ID
taskkill /IM "process name" /F Kill a process by its name
start /b program.exe Runs jobs in the background (exclude /b if you want the program to run in a new console)
nohup Prevents jobs from stopping
disown Keeps jobs running in the background even if you close R
taskkill /? Help, lists out other commands

To kill a task in Windows, you can also go to Task Manager > More details > Select your desired app > Click on End Task.

11.7 Running big jobs

For big data workflows, the concept of “backgrounding” a bash script allows you to start a “job” (i.e. run the script) and leave it overnight to run. At the top level, a bash script (0-run-project.sh) that simply calls the directory-level bash scripts (i.e. 0-prep-data.sh, 0-run-analysis.sh, 0-run-figures.sh, etc.) is a powerful tool to rerun every script in your project. See the included example bash scripts for more details.

  • Running Bash Scripts in Background: Running a long bash script is not trivial. Normally you would run a bash script by opening a terminal and typing something like ./run-project.sh. But what if you leave your computer, log out of your server, or close the terminal? Normally, the bash script will exit and fail to complete. To run it in background, type ./run-project.sh &; disown. You can see the job running (and CPU utilization) with the command top or ps -v and check your memory with free -h.

Alternatively, to keep code running in the background even when an SSH connection is broken, you can use tmux. In terminal or gitbash follow the steps below. This site has useful tips on using tmux.

# create a new tmux session called session_name
tmux new -ssession_name

# run your job of interest
R CMD BATCH myjob.R & 
  
# check that it is running
ps -v

# to exit the tmux session (Mac)
ctrl + b 
d

# to reopen the tmux session to kill the job or 
# start another job
tmux attach -tsession_name 
  • Deleting Previously Computed Results: One helpful lesson we’ve learned is that your bash scripts should remove previous results (computed and saved by scripts run at a previous time) so that you never mix results from one run with a previous run. This can happen when an R script errors out before saving its result, and can be difficult to catch because your previously saved result exists (leading you to believe everything ran correctly).

  • Ensuring Things Ran Correctly: You should check the .Rout files generated by the R scripts run by your bash scripts for errors once things are run.