Creating Functions to Reduce Code Lines | Macarena Quiroga

Creating Functions to Reduce Code Lines

Discover how to use functions to simplify and optimize your code, saving time and reducing complexity.

Image made by me with the {aRtsy} package.

As I continue to improve my programming skills, I face new challenges. My approach to learning anything is always the same: focus on the big picture first, then refine the details.

One of those “details” that I never delved into was creating functions. I mean, I know it can be done; I did some exercises in tutorials, but they were always very basic functions, such as calculating an average or converting Celsius to Fahrenheit. I always told myself, “Let’s leave this topic for later,” and I think that later has arrived.

I realized this because every time I read or heard that writing our own functions saves us from repeating the same code over and over, I remembered this post where I repeated the same code 23 times, once for each chart (terrible idea, I know). So today is the day: join me in understanding how to create my own function.


The Logic of Functions

The first thing we need to understand is that a function is an operation executed on one or more elements, following a set of specifications that act as parameters. Both the elements and the specifications are passed to the function as arguments.
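
To make the vocabulary concrete, here is a minimal sketch built on the temperature example from those tutorials. The function name celsius_to_fahrenheit and its digits parameter are illustrative choices, not part of the drought project.

```r
# Parameters: `celsius` is the element to operate on; `digits` is a
# specification that controls rounding and has a default value.
celsius_to_fahrenheit <- function(celsius, digits = 1) {
  round(celsius * 9 / 5 + 32, digits)
}

# Arguments are the concrete values supplied when the function is called.
celsius_to_fahrenheit(25)                     # uses the default digits = 1
celsius_to_fahrenheit(c(0, 100), digits = 0)  # works on a whole vector
```

The body is written once, and every call only changes the arguments.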

In this case, the code that is repeated 23 times is as follows:

# 2000
a2000 <- county.map.drought %>% 
  filter(date == "2000-01-04" & !STATE %in% c("02", "15")) %>% 
  ggplot(aes(long, lat, group=group)) + 
  geom_polygon(aes(fill = value))+
  scale_fill_viridis_c(option = "plasma",
                       limits = c(0, 500))+
  labs(title = "2000",
       x = "", y = "")+
  theme_void()+
  theme(legend.position = "none",
        plot.title = element_text(size = 12, hjust = 0.5,
                                  family="Times"))

Essentially, it’s code that takes a dataframe that is always the same (county.map.drought), filters it by a specific date, and then plots it with a corresponding title. Therefore, the function only needs to take the date to be plotted as an argument; from that value, we can extract the first four characters and use the year as the title of the chart.

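# Requires dplyr, ggplot2 and stringr (all part of the tidyverse)
drought_graph <- function(target_date){
  # The first four characters of the date are the year, used as the title
  title <- str_sub(target_date, end = 4)
  county.map.drought %>%
    # The argument must not share its name with the `date` column:
    # inside filter(), `date == date` compares the column with itself
    filter(date == target_date & !STATE %in% c("02", "15")) %>%
    ggplot(aes(long, lat, group = group)) +
    geom_polygon(aes(fill = value)) +
    scale_fill_viridis_c(option = "plasma",
                         limits = c(0, 500)) +
    labs(title = title,
         x = "", y = "") +
    theme_void() +
    theme(legend.position = "none",
          plot.title = element_text(size = 12, hjust = 0.5))
}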
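
A caution that applies to any function wrapping dplyr verbs: inside filter(), a bare name is looked up among the data frame's columns before the function's arguments. If an argument is called date and the data frame also has a date column, date == date compares the column with itself and keeps every row. The toy sketch below (the toy data frame and both helper functions are hypothetical) shows the effect and one way around it, the .data and .env pronouns; giving the argument a different name from the column works as well.

```r
library(dplyr)

# Hypothetical toy data with a `date` column, standing in for the real table
toy <- data.frame(
  date  = c("2000-01-04", "2001-01-02", "2002-01-01"),
  value = c(10, 20, 30)
)

# The argument name collides with the column name, so both sides of `==`
# resolve to the column and nothing is filtered out
filter_masked <- function(date) {
  toy %>% filter(date == date)
}

# .data$ refers to the column, .env$ to the variable in the function
filter_explicit <- function(date) {
  toy %>% filter(.data$date == .env$date)
}

nrow(filter_masked("2001-01-02"))    # all rows survive
nrow(filter_explicit("2001-01-02"))  # only the matching row survives
```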

To further simplify, we can create a vector with specific dates, one for each year (in this case, the first measurement in January of each year), and then, using a for loop, iterate over those dates to create the corresponding chart for each year. We store them in a list and finally arrange those charts into a single one (this step was already present in the previous post, exactly the same way).

dates <- c("2000-01-04", "2001-01-02", "2002-01-01", "2003-01-07", "2004-01-06", "2005-01-04", "2006-01-03", "2007-01-02", "2008-01-01", "2009-01-06", "2010-01-05", "2011-01-04", "2012-01-03", "2013-01-01", "2014-01-07", "2015-01-06", "2016-01-05", "2017-01-03", "2018-01-02", "2019-01-01", "2020-01-07", "2021-01-05", "2022-01-04")

graphics <- list()

for (date in dates){
  graph <- drought_graph(date)
  graphics[[date]] <- ggplotGrob(graph)
}
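
The for loop is perfectly fine here. As an alternative sketch, the same "apply a function to every date and keep the results in a named list" pattern can be written with lapply(). To keep the example self-contained, describe_date is a hypothetical stand-in that returns a label instead of a plot.

```r
dates <- c("2000-01-04", "2001-01-02", "2002-01-01")

# Hypothetical stand-in for the plotting function: returns a label, not a plot
describe_date <- function(date) {
  paste("Map for", substr(date, 1, 4))
}

# lapply() calls the function once per date and returns a list;
# setNames() labels each element with the date that produced it
labels <- setNames(lapply(dates, describe_date), dates)

names(labels)
labels[["2001-01-02"]]
```

Which version to use is mostly a matter of taste; the loop makes each step explicit, while lapply() avoids creating the empty list beforehand.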

# grid.arrange() comes from the gridExtra package
grid.arrange(grobs = graphics[dates],
             top = grid::textGrob("Drought Severity and Coverage Index (DSCI) \nmeasured in the first week of January"),
             nrow = 5, ncol = 6)

There are many things that can be further improved in this code. For example, the function could take only the year and work out the matching date on its own. However, surprisingly, that brought me many problems: not only because the first measurement was not always taken on the same day in January (which could be resolved with row_number()), but also because the different groupings I tried in order to solve that affected the values required to create the spatial graph in ways that I don’t fully understand.
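
One possible way around the grouping problem, offered only as a sketch: find the first date of each year from the list of dates alone, without regrouping the map data. The all_dates vector below is toy data, and the approach assumes the dates are ISO "YYYY-MM-DD" strings, which sort correctly as text.

```r
# Toy stand-in for the unique measurement dates in the data
all_dates <- c("2000-01-04", "2000-01-11", "2001-01-09",
               "2001-01-02", "2002-01-01")

# Split the dates by year and keep the earliest one in each group;
# min() on ISO date strings returns the earliest date
years <- substr(all_dates, 1, 4)
first_dates <- vapply(split(all_dates, years), min, character(1))

# The result is a character vector named by year
first_dates[["2001"]]
```

With the real data, the input could be something like the unique values of the date column converted to character, and a year could then be translated into its date before filtering. I have not tested this against the full drought dataset.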

Despite that, I am quite satisfied with the result. One thing I didn’t consider when starting this project was that the objects, containing information from 22 years for all states of the US, were very, very heavy, and every change I had to make involved reloading several-megabyte files (not to mention the final object with the 23 maps). Next time, I’ll try to work with smaller tables.

As always, remember you can subscribe to my blog to stay updated, and if you have any questions, don’t hesitate to contact me. And if you like what I do, you can buy me a cafecito from Argentina or a kofi.

Macarena Quiroga
Linguist/PhD student

I research language acquisition. I’m looking to deepen my knowledge of statistics and data science with R/RStudio. If you like what I do, you can buy me a coffee from Argentina, or a kofi from other countries. Subscribe to my blog here.
