Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -15,3 +15,4 @@ compile_commands.json
/doc/
/Meta/
.claude/settings.local.json
.claude/plans/*
76 changes: 58 additions & 18 deletions R/read_fwf.R
@@ -1,29 +1,67 @@
#' Read a fixed width file into a tibble
#' Read a fixed-width file into a tibble
#'
#' A fixed width file can be a very compact representation of numeric data.
#' It's also very fast to parse, because every field is in the same place in
#' every line. Unfortunately, it's painful to parse because you need to
#' describe the length of every field. Readr aims to make it as easy as possible
#' by providing a number of different ways to describe the field structure.
#' - [fwf_empty()] - Guesses based on the positions of empty columns.
#' - [fwf_widths()] - Supply the widths of the columns.
#' - [fwf_positions()] - Supply paired vectors of start and end positions.
#' - [fwf_cols()] - Supply named arguments of paired start and end positions or column widths.
#' @description
#' Fixed-width files store tabular data with each field occupying a specific
#' range of character positions in every line. Once the fields are identified,
#' converting them to the appropriate R types works just like for delimited
#' files. The unique challenge with fixed-width files is describing where each
#' field begins and ends. \pkg{readr} tries to ease this pain by offering a
#' few different ways to specify the field structure:
#' - `fwf_empty()` - Guesses based on the positions of empty columns. This is
#' the default. (Note that `fwf_empty()` returns 0-based positions, for
#' internal use.)
#' - `fwf_widths()` - Supply the widths of the columns.
#' - `fwf_positions()` - Supply paired vectors of start and end positions. These
#' are interpreted as 1-based positions, so are off-by-one compared to the
#' output of `fwf_empty()`.
#' - `fwf_cols()` - Supply named arguments of paired start and end positions or
#' column widths.
#'
#' Note: `fwf_empty()` cannot work with a connection or with any of the input
#' types that involve a connection internally, which includes remote and
#' compressed files. The reason is that this would necessitate reading from the
#' connection twice. In these cases, you'll have to either provide the field
#' structure explicitly with another `fwf_*()` function or download (and
#' decompress, if relevant) the file first.
#'
#' @details
#' Here's an enhanced example using the contents of the file accessed via
#' `readr_example("fwf-sample.txt")`.
#'
#' ```
#'          1         2         3         4
#' 123456789012345678901234567890123456789012
#' [     name 20      ][state 10][  ssn 12  ]
#' John Smith          WA        418-Y11-4111
#' Mary Hartford       CA        319-Z19-4341
#' Evan Nolan          IL        219-532-c301
#' ```
#'
#' Here are some valid field specifications for the above (they aren't all
#' equivalent! but they are all valid):
#'
#' ```
#' fwf_widths(c(20, 10, 12), c("name", "state", "ssn"))
#' fwf_positions(c(1, 30), c(20, 42), c("name", "ssn"))
#' fwf_cols(state = c(21, 30), last = c(6, 20), first = c(1, 4), ssn = c(31, 42))
#' fwf_cols(name = c(1, 20), ssn = c(30, 42))
#' fwf_cols(name = 20, state = 10, ssn = 12)
#' ```
#'
#' @seealso [read_table()] to read fixed width files where each
#' column is separated by whitespace.
#'
#' @section Second edition changes:
#' Comments are no longer looked for anywhere in the file.
#' They are now only ignored at the start of a line.
#' Comments are now only ignored if they appear at the start of a line.
#' Comments elsewhere in a line are no longer treated specially.
#'
#' @inheritParams datasource
#' @inheritParams tokenizer_fwf
#' @inheritParams read_delim
#' @param col_positions Column positions, as created by [fwf_empty()],
#' [fwf_widths()] or [fwf_positions()]. To read in only selected fields,
#' use [fwf_positions()]. If the width of the last column is variable (a
#' ragged fwf file), supply the last end position as NA.
#' `fwf_widths()`, `fwf_positions()`, or `fwf_cols()`. To read in only
#' selected fields, use `fwf_positions()`. If the width of the last column
#' is variable (a ragged fwf file), supply the last end position as `NA`.
#' @export
#' @examples
#' fwf_sample <- readr_example("fwf-sample.txt")
@@ -181,8 +219,8 @@ fwf_empty <- function(

#' @rdname read_fwf
#' @export
#' @param widths Width of each field. Use NA as width of last field when
#' reading a ragged fwf file.
#' @param widths Width of each field. Use `NA` as the width of the last field
#' when reading a ragged fixed-width file.
#' @param col_names Either NULL, or a character vector of column names.
fwf_widths <- function(widths, col_names = NULL) {
if (edition_first()) {
@@ -195,7 +233,9 @@ fwf_widths <- function(widths, col_names = NULL) {
#' @rdname read_fwf
#' @export
#' @param start,end Starting and ending (inclusive) positions of each field.
#' Use NA as last end field when reading a ragged fwf file.
#' **Positions are 1-based**: the first character in a line is at position 1.
#' Use `NA` as the last value of `end` when reading a ragged fixed-width
#' file.
fwf_positions <- function(start, end = NULL, col_names = NULL) {
if (edition_first()) {
stopifnot(length(start) == length(end))
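The position conventions described in the roxygen text for `R/read_fwf.R` can be checked without readr. What follows is a minimal base-R sketch, not part of this PR. It assumes a sample line laid out as in the `@details` example (field widths 20, 10 and 12), converts those widths to 1-based, inclusive start and end positions, and extracts the fields with `substring()`, which follows the same 1-based, inclusive convention that the docs describe for `fwf_positions()`. All variable names are illustrative.

```r
# Build one sample line with fields padded to widths 20, 10 and 12
# (layout assumed from the example in the @details block).
line <- sprintf("%-20s%-10s%s", "John Smith", "WA", "418-Y11-4111")

# Field widths, as one might pass to fwf_widths().
widths <- c(name = 20, state = 10, ssn = 12)

# Convert widths to 1-based, inclusive start/end positions.
end <- cumsum(widths)
start <- end - widths + 1

# substring() is also 1-based and inclusive, so it stands in for the parser here.
fields <- trimws(substring(line, start, end))
names(fields) <- names(widths)

print(rbind(start, end))
print(fields)
```

Here `start` comes out as 1, 21, 31 and `end` as 20, 30, 42, which matches the `state = c(21, 30)` and `ssn = c(31, 42)` pairs used in the documented `fwf_cols()` example.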
2 changes: 1 addition & 1 deletion R/source.R
@@ -6,7 +6,7 @@
#' Files ending in `.gz`, `.bz2`, `.xz`, or `.zip` will
#' be automatically uncompressed. Files starting with `http://`,
#' `https://`, `ftp://`, or `ftps://` will be automatically
#' downloaded. Remote gz files can also be automatically downloaded and
#' downloaded. Remote `.gz` files can also be automatically downloaded and
#' decompressed.
#'
#' Literal data is most useful for examples and tests. To be recognised as
2 changes: 1 addition & 1 deletion man/count_fields.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/datasource.Rd

2 changes: 1 addition & 1 deletion man/read_delim.Rd

2 changes: 1 addition & 1 deletion man/read_delim_chunked.Rd

2 changes: 1 addition & 1 deletion man/read_file.Rd

75 changes: 56 additions & 19 deletions man/read_fwf.Rd

2 changes: 1 addition & 1 deletion man/read_lines.Rd

2 changes: 1 addition & 1 deletion man/read_lines_chunked.Rd

2 changes: 1 addition & 1 deletion man/read_log.Rd

2 changes: 1 addition & 1 deletion man/read_table.Rd

2 changes: 1 addition & 1 deletion man/spec_delim.Rd

2 changes: 1 addition & 1 deletion man/tokenize.Rd

7 changes: 3 additions & 4 deletions man/write_delim.Rd
