lubridate package

We have learned about different data type classes in previous
lessons. Some common data classes we have examined before include
character, factor, and numeric. But R also recognizes a data class
called “Date”. Having your date data in the “Date” class is very
useful, as you can then do things like calculate time between two
events, transform the dates into different formats, and plot temporal
data easily. In this lesson, we are going to introduce how base R deals
with dates (POSIXct or POSIXlt), but we are
going to spend the majority of our lesson on the package
lubridate. lubridate is a great package that
makes it much easier to work with dates and times in R.
Importantly, there are 3 basic time classes in R:
- Date (just dates, e.g., 2012-02-10)
- POSIXct (“ct” == calendar time, best class for dates with times)
- POSIXlt (“lt” == local time, enables easy extraction of specific components of a time; however, remember that POSIXlt objects are lists)

Unfortunately, converting dates & times in R into formats that are computer readable can be frustrating, mainly because there is very little consistency in how dates are written. In particular, if you are importing things from Excel, keep in mind dates can get especially weird [1], depending on the operating system you are working on, the format of your data, etc.

[1] For example, Excel stores dates as a number representing days since 1900-Jan-0, plus a fractional portion of a 24 hour day (serial-time), but in OSX (Mac), it is 1904-Jan-0.
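To make the difference between the two POSIX classes concrete, here is a small base R sketch. The object names (ct, lt) and the example timestamp are made up for illustration, and the time zone is fixed to UTC so the result does not depend on your computer's settings.

```r
# One timestamp stored in both POSIX classes (UTC chosen for reproducibility)
ct <- as.POSIXct("2012-02-10 14:30:00", tz = "UTC")
lt <- as.POSIXlt(ct)

# POSIXct is stored as a single number: seconds since 1970-01-01 00:00:00 UTC
typeof(ct)        # "double"

# POSIXlt is stored as a list of components that can be pulled out with $
typeof(lt)        # "list"
lt$hour           # 14
lt$mon + 1        # months are stored 0-11, so add 1 to get 2
lt$year + 1900    # years are stored as years since 1900, giving 2012
```

Because POSIXct is just one number per timestamp, it tends to be the more convenient choice for columns in a data frame, while POSIXlt is handy when you only need to pick a timestamp apart.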
The Date class in R can easily be converted or operated on numerically, depending on the interest. Let’s make a vector of date strings to use for our example:
sample_dates_1 <- c("2018-02-01", "2018-03-21", "2018-10-05", "2019-01-01", "2019-02-18")
#notice we have dates across two years here
What is the class that R classifies this data as?
R classifies our sample_dates_1 data as character data.
Let’s transform it into Dates. Notice that our sample_dates_1 is in a nice format: YYYY-MM-DD. This is the format that the function as.Date expects by default.
sample_dates_1 <- as.Date(sample_dates_1)
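Once the strings are Dates, the numeric operations mentioned above become available. The following is a minimal sketch using the same sample dates; the name span is introduced here only for illustration.

```r
sample_dates_1 <- as.Date(c("2018-02-01", "2018-03-21", "2018-10-05",
                            "2019-01-01", "2019-02-18"))

class(sample_dates_1)            # "Date"

# Days between the first and last date (subtraction gives a difftime object)
span <- sample_dates_1[5] - sample_dates_1[1]
as.numeric(span)                 # 382

# Gaps, in days, between consecutive dates
diff(sample_dates_1)

# Adding a number shifts a date by that many days
sample_dates_1[1] + 30           # "2018-03-03"
```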
What happens with different orders…say MM-DD-YYYY?
# Some sample dates:
sample_dates_2 <- c("02-01-2018", "03-21-2018", "10-05-2018", "01-01-2019", "02-18-2019")
sample_dates_3 <- as.Date(sample_dates_2) # well, that doesn't work: as.Date() assumes year-month-day order
This doesn’t work because the computer expects one format (year first) but is getting another (month first). Remember, write code you can read and your computer can understand. So we need to give some more information here so R will interpret our data correctly.
# Some sample dates:
sample_dates_2 <- c("02-01-2018", "03-21-2018", "10-05-2018", "01-01-2019", "02-18-2019")
sample_dates_3 <- as.Date(sample_dates_2, format = "%m-%d-%Y") # each date code is preceded by "%"
To see a list of the date-time format codes in R, you can use: ?strptime
The nice thing is that this method works well with pretty much any format; you just need to provide the associated codes and structure:
- as.Date("2016/01/01", format="%Y/%m/%d") = 2016-01-01
- as.Date("05A21A2011", format="%mA%dA%Y") = 2011-05-21
Format this date with the as.Date function: Jul 04, 2019
as.Date("Jul 04, 2019", format = "%b %d, %Y") # the format string mirrors the spaces and comma in the input
## [1] "2019-07-04"
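The same format codes also work in the other direction: once a value is a Date, format() turns it back into text in whatever layout you need. This is a small sketch reusing one of the example strings above; the name d is just a placeholder.

```r
# Parse a date written with an unusual separator
d <- as.Date("05A21A2011", format = "%mA%dA%Y")

# Write the same date out in other layouts (the results are character strings)
format(d, "%d/%m/%Y")   # "21/05/2011"
format(d, "%Y%m%d")     # "20110521"
format(d, "%j")         # day of the year: "141"
```

Only numeric codes are used here because codes such as %b (abbreviated month name) depend on the language settings of your computer.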
When working with times, the best class to use in base R is POSIXct.
tm1 <- as.POSIXct("2016-07-24 23:55:26")
tm1
## [1] "2016-07-24 23:55:26 PDT"
tm2 <- as.POSIXct("25072016 08:32:07", format = "%d%m%Y %H:%M:%S")
tm2
## [1] "2016-07-25 08:32:07 PDT"
#Notice how POSIXct automatically uses the timezone your computer is set to. What if we collected this data in a different timezone?
# specify the time zone:
tm3 <- as.POSIXct("2010-12-01 11:42:03", tz = "GMT")
tm3
## [1] "2010-12-01 11:42:03 GMT"
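A POSIXct value represents one instant, so it can be displayed in another time zone or subtracted from another instant. The sketch below reuses tm3 from above; tm4 is an extra timestamp made up for illustration.

```r
tm3 <- as.POSIXct("2010-12-01 11:42:03", tz = "GMT")

# Same instant, shown on the US west coast clock (PST in December)
format(tm3, tz = "America/Los_Angeles", usetz = TRUE)
# "2010-12-01 03:42:03 PST"

# Time elapsed between two instants, with explicit units
tm4 <- as.POSIXct("2010-12-02 13:42:03", tz = "GMT")
difftime(tm4, tm3, units = "hours")   # Time difference of 26 hours
```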
lubridate Package

The lubridate package will handle most of the date &
datetime issues you need to deal with. It is fast, much easier to work
with, and we recommend using it wherever possible. Do keep in mind
sometimes you need to fall back on the base R functions (e.g.,
as.Date()), which is why having a basic understanding of
these functions and their use is important.
To use lubridate we will first need to install and load the package.
#install.packages("lubridate")
library(lubridate)
lubridate has lots of handy functions for converting between date and time formats, and even timezones.
Let’s take a look at our sample_dates_1 data again.
sample_dates_1 <- c("2018-02-01", "2018-03-21", "2018-10-05", "2019-01-01", "2019-02-18")
Once again, R reads this in as character data.
lubridate uses functions that look like ymd or mdy to transform data into the class “Date”. Our sample_dates_1 data is formatted like Year, Month, Day, so we would use the lubridate function ymd (y = year, m = month, d = day).
sample_dates_lub <- ymd(sample_dates_1)
What about that messier sample_dates_2 data? Remember, base R wants dates to be in the format YYYY-MM-DD unless we spell out the format ourselves.
sample_dates_2 <- c("2-01-2018", "3-21-2018", "10-05-18", "01-01-2019", "02-18-2019")
# notice that some months lack a leading zero and one year has only two digits
sample_dates_lub2 <- mdy(sample_dates_2) #lubridate can handle it!
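lubridate is forgiving about formatting, but it cannot rescue a string that has no valid reading in the order you asked for. As a small sketch (the vector messy and its second entry are invented for illustration), such entries come back as NA with a warning, so it is worth checking for NAs after parsing:

```r
library(lubridate)

# The second string has no valid month-day-year reading (there is no month 13)
messy <- c("2-01-2018", "13-45-2018")
parsed <- mdy(messy)   # warns that one string failed to parse

is.na(parsed)          # FALSE  TRUE
sum(is.na(parsed))     # number of entries that need a closer look: 1
```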
All sorts of date formats can easily be transformed using lubridate:
- lubridate::ymd("2016/01/01") = 2016-01-01
- lubridate::ymd("2011-03-19") = 2011-03-19
- lubridate::mdy("Feb 19, 2011") = 2011-02-19
- lubridate::dmy("22051997") = 1997-05-22

lubridate for Time and Timezones

lubridate has very similar functions to handle data with Times and Timezones. To the ymd function, add _hms or _hm (h = hours, m = minutes, s = seconds) and a tz argument (if tz is omitted, lubridate assumes UTC). These functions return POSIXct objects.
- lubridate::ymd_hm("2016-01-01 12:00", tz="America/Los_Angeles") = 2016-01-01 12:00:00
- lubridate::ymd_hm("2016/04/05 14:47", tz="America/Los_Angeles") = 2016-04-05 14:47:00
- lubridate::ymd_hms("2016/04/05 4:47:21 PM", tz="America/Los_Angeles") = 2016-04-05 16:47:21

For lubridate to work, you need the column datatype to be character or factor. The readr package (from the tidyverse) is smart and will try to guess the column types for you. The problem is that it might convert your dates and times without the settings you want (in this case, the proper timezone). So here are a few ways to work around this.
library(lubridate)
library(dplyr)
library(readr)
# read in some data and skip header lines
nfy1 <- read_csv("data/2015_NFY_solinst.csv", skip = 12)
head(nfy1) #R tried to guess for you that the first column was a date and the second a time
## # A tibble: 6 × 5
## Date Time ms Level Temperature
## <date> <time> <dbl> <dbl> <dbl>
## 1 2015-05-22 14:00 0 -8.68 0
## 2 2015-05-22 14:15 0 -8.29 0
## 3 2015-05-22 14:30 0 -8.29 0
## 4 2015-05-22 14:45 0 -8.29 0
## 5 2015-05-22 15:00 0 -8.30 0
## 6 2015-05-22 15:15 0 -8.29 0
# import raw dataset & specify column types
nfy2 <- read_csv("data/2015_NFY_solinst.csv", col_types = "ccidd", skip=12)
glimpse(nfy1) # notice the data types readr guessed for the Date and Time cols
## Rows: 7,764
## Columns: 5
## $ Date <date> 2015-05-22, 2015-05-22, 2015-05-22, 2015-05-22, 2015-05-2…
## $ Time <time> 14:00:00, 14:15:00, 14:30:00, 14:45:00, 15:00:00, 15:15:0…
## $ ms <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ Level <dbl> -8.6834, -8.2928, -8.2914, -8.2901, -8.2955, -8.2935, -8.2…
## $ Temperature <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
glimpse(nfy2)
## Rows: 7,764
## Columns: 5
## $ Date <chr> "2015/05/22", "2015/05/22", "2015/05/22", "2015/05/22", "2…
## $ Time <chr> "14:00:00", "14:15:00", "14:30:00", "14:45:00", "15:00:00"…
## $ ms <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ Level <dbl> -8.6834, -8.2928, -8.2914, -8.2901, -8.2955, -8.2935, -8.2…
## $ Temperature <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
Next we want to create a single datetime column. How do we get our Date and Time columns into one column so we can format it as a datetime? The answer is the paste function.
- paste() allows pasting text or vectors (& columns) by a given separator that you specify with the sep = argument
- paste0() is the same thing, but defaults to using no separator (i.e. no space).

# make a datetime col:
nfy2$datetime <- paste(nfy2$Date, nfy2$Time, sep = " ")
glimpse(nfy2) # notice the datetime column is classified as character
## Rows: 7,764
## Columns: 6
## $ Date <chr> "2015/05/22", "2015/05/22", "2015/05/22", "2015/05/22", "2…
## $ Time <chr> "14:00:00", "14:15:00", "14:30:00", "14:45:00", "15:00:00"…
## $ ms <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ Level <dbl> -8.6834, -8.2928, -8.2914, -8.2901, -8.2955, -8.2935, -8.2…
## $ Temperature <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ datetime <chr> "2015/05/22 14:00:00", "2015/05/22 14:15:00", "2015/05/22 …
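To see the difference between paste() and paste0() in isolation, here is a tiny sketch on two short vectors. The names date_part and time_part are placeholders standing in for the Date and Time columns.

```r
date_part <- c("2015/05/22", "2015/05/22")
time_part <- c("14:00:00", "14:15:00")

paste(date_part, time_part)              # default separator is a single space
# "2015/05/22 14:00:00" "2015/05/22 14:15:00"

paste(date_part, time_part, sep = "T")   # custom separator
# "2015/05/22T14:00:00" "2015/05/22T14:15:00"

paste0(date_part, time_part)             # no separator at all
# "2015/05/2214:00:00" "2015/05/2214:15:00"
```

For datetimes, the space-separated version is the one that ymd_hms() and similar functions read most naturally.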
# convert Date Time to POSIXct in local timezone using lubridate
nfy2$datetime_test <- as_datetime(nfy2$datetime, tz = "America/Los_Angeles")
# OR convert using the ymd_functions
nfy2$datetime_test2 <- ymd_hms(nfy2$datetime, tz="America/Los_Angeles")
# OR wrap in as.character()
nfy1$datetime <- ymd_hms(as.character(paste0(nfy1$Date, " ", nfy1$Time)), tz="America/Los_Angeles")
tz(nfy1$datetime)
## [1] "America/Los_Angeles"
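If a time zone turns out to be wrong, or you need the same data on a different clock, lubridate offers with_tz() and force_tz(). The sketch below uses a single made-up timestamp (x) rather than the logger data, so it can be run on its own.

```r
library(lubridate)

x <- ymd_hms("2015-05-22 14:00:00", tz = "America/Los_Angeles")

# with_tz(): same instant in time, displayed on a different clock
with_tz(x, tzone = "UTC")    # "2015-05-22 21:00:00 UTC"

# force_tz(): same clock reading, relabeled as a different time zone
# (this changes which instant the value refers to)
force_tz(x, tzone = "UTC")   # "2015-05-22 14:00:00 UTC"
```

with_tz() is usually what you want for reporting, while force_tz() is for repairing data that was read in with the wrong time zone.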
Last, lubridate lets you extract components of date, time and datetime data types with intuitive functions.
# Functions called day(), month(), year(), hour(), minute(), second(), etc... will extract those elements of a datetime column.
months <- month(nfy2$datetime)
# Use the table function to get a quick summary of categorical variables
table(months)
## months
## 5 6 7 8
## 904 2880 2976 1004
# Add the label and abbr arguments to show month names instead of numbers
months <- month(nfy2$datetime, label = TRUE, abbr=TRUE)
table(months)
## months
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 0 0 0 0 904 2880 2976 1004 0 0 0 0
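The other extractor functions work the same way as month(). Here is a short sketch on two made-up timestamps (the vector x is hypothetical, not part of the logger data) showing what each one returns:

```r
library(lubridate)

x <- ymd_hms(c("2015-05-22 14:00:00", "2015-06-01 09:30:15"),
             tz = "America/Los_Angeles")

year(x)     # 2015 2015
month(x)    # 5 6
day(x)      # 22 1
hour(x)     # 14 9
minute(x)   # 0 30
second(x)   # 0 15
yday(x)     # day of the year: 142 152
```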
This lesson was contributed by Ryan Peek and Martha Zillig.