Chapter 5 Coding Style

Contributors: Kunal Mishra, Jade Benjamin-Chung, and Ben Arnold

5.1 Line breaks

  • For ggplot calls and dplyr pipelines, do not crowd single lines. Here are some nontrivial examples of “beautiful” pipelines, where beauty is defined by coherence:

    # Example 1
    school_names <- list(
      OUSD_school_names = absentee_all %>%
        filter(dist.n == 1) %>%
        pull(school) %>%
        unique() %>%
        sort(),
    
      WCCSD_school_names <- absentee_all %>%
        filter(dist.n == 0) %>%
        pull(school) %>%
        unique() %>%
        sort()
    )
    # Example 2
    absentee_all <- fread(file = raw_data_path) %>%
      mutate(program = case_when(schoolyr %in% pre_program_schoolyrs ~ 0,
                                 schoolyr %in% program_schoolyrs ~ 1)) %>%
      mutate(period = case_when(schoolyr %in% pre_program_schoolyrs ~ 0,
                                schoolyr %in% LAIV_schoolyrs ~ 1,
                                schoolyr %in% IIV_schoolyrs ~ 2)) %>%
      filter(schoolyr != "2017-18")

    And of a complex ggplot call:

    # Example 3
    ggplot(data=data,
           mapping=aes_string(x="year", y="rd", group=group)) +
    
      geom_point(mapping=aes_string(col=group, shape=group),
                 position=position_dodge(width=0.2),
                 size=2.5) +
    
      geom_errorbar(mapping=aes_string(ymin="lb", ymax="ub", col=group),
                    position=position_dodge(width=0.2),
                    width=0.2) +
    
      geom_point(position=position_dodge(width=0.2),
                 size=2.5) +
    
      geom_errorbar(mapping=aes(ymin=lb, ymax=ub),
                    position=position_dodge(width=0.2),
                    width=0.1) +
    
      scale_y_continuous(limits=limits,
                         breaks=breaks,
                         labels=breaks) +
    
      scale_color_manual(std_legend_title,values=cols,labels=legend_label) +
      scale_shape_manual(std_legend_title,values=shapes, labels=legend_label) +
      geom_hline(yintercept=0, linetype="dashed") +
      xlab("Program year") +
      ylab(yaxis_lab) +
      theme_complete_bw() +
      theme(strip.text.x = element_text(size = 14),
            axis.text.x = element_text(size = 12)) +
      ggtitle(title)

Imagine (or perhaps mournfully recall) the mess that can occur when you don’t strictly style a complicated ggplot call. Trying to fix bugs and ensure your code is working can be a nightmare. Now imagine trying to do it with the same code 6 months after you’ve written it. Invest the time now and reap the rewards as the code practically explains itself, line by line.

5.2 Automated Tools for Style and Project Workflow

5.2.1 Styling

  1. Code Autoformatting - RStudio includes a fantastic built-in utility (keyboard shortcut: CMD-Shift-A) for autoformatting highlighted chunks of code to fit many of the best practices listed here. It generally makes code more readable and fixes a lot of the small things you may not feel like fixing yourself. Try it out as a “first pass” on some code of yours that doesn’t follow many of these best practices!

  2. Assignment Aligner - A cool R package allows you to very powerfully format large chunks of assignment code to be much cleaner and much more readable. Follow the linked instructions and create a keyboard shortcut of your choosing (recommendation: CMD-Shift-Z). Here is an example of how assignment aligning can dramatically improve code readability:

# Before
OUSD_not_found_aliases <- list(
  "Brookfield Village Elementary" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Brookfield"),
  "Carl Munck Elementary" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Munck"),
  "Community United Elementary School" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Community United"),
  "East Oakland PRIDE Elementary" = str_subset(string = OUSD_school_shapes$schnam, pattern = "East Oakland Pride"),
  "EnCompass Academy" = str_subset(string = OUSD_school_shapes$schnam, pattern = "EnCompass"),
  "Global Family School" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Global"),
  "International Community School" = str_subset(string = OUSD_school_shapes$schnam, pattern = "International Community"),
  "Madison Park Lower Campus" = "Madison Park Academy TK-5",
  "Manzanita Community School" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Manzanita Community"),
  "Martin Luther King Jr Elementary" = str_subset(string = OUSD_school_shapes$schnam, pattern = "King"),
  "PLACE @ Prescott" = "Preparatory Literary Academy of Cultural Excellence",
  "RISE Community School" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Rise Community")
)
# After
OUSD_not_found_aliases <- list(
  "Brookfield Village Elementary"      = str_subset(string = OUSD_school_shapes$schnam, pattern = "Brookfield"),
  "Carl Munck Elementary"              = str_subset(string = OUSD_school_shapes$schnam, pattern = "Munck"),
  "Community United Elementary School" = str_subset(string = OUSD_school_shapes$schnam, pattern = "Community United"),
  "East Oakland PRIDE Elementary"      = str_subset(string = OUSD_school_shapes$schnam, pattern = "East Oakland Pride"),
  "EnCompass Academy"                  = str_subset(string = OUSD_school_shapes$schnam, pattern = "EnCompass"),
  "Global Family School"               = str_subset(string = OUSD_school_shapes$schnam, pattern = "Global"),
  "International Community School"     = str_subset(string = OUSD_school_shapes$schnam, pattern = "International Community"),
  "Madison Park Lower Campus"          = "Madison Park Academy TK-5",
  "Manzanita Community School"         = str_subset(string = OUSD_school_shapes$schnam, pattern = "Manzanita Community"),
  "Martin Luther King Jr Elementary"   = str_subset(string = OUSD_school_shapes$schnam, pattern = "King"),
  "PLACE @ Prescott"                   = "Preparatory Literary Academy of Cultural Excellence",
  "RISE Community School"              = str_subset(string = OUSD_school_shapes$schnam, pattern = "Rise Community")
)
  1. StyleR - Another cool R package from the Tidyverse that can be powerful and used as a first pass on entire projects that need refactoring. The most useful function of the package is the style_dir function, which will style all files within a given directory. See the function’s documentation and the vignette linked above for more details.
    • Note: The default Tidyverse styler is subtly different from some of the things we’ve advocated for in this document. Most notably we differ with regards to the number of spaces before/after “tokens” (i.e. Assignment Aligner add spaces before = signs to align them properly). For this reason, we’d recommend the following: style_dir(path = ..., scope = "line_breaks", strict = FALSE). You can also customize StyleR even more if you’re really hardcore.
    • Note: As is mentioned in the package vignette linked above, StyleR modifies things in-place, meaning it overwrites your existing code and replaces it with the updated, properly styled code. This makes it a good fit on projects with version control, but if you don’t have backups or a good way to revert back to the intial code, I wouldn’t recommend going this route.