Piping in Linux is a powerful technique that allows you to connect the output of one command to the input of another. It's represented by the vertical bar (|) symbol.
Example:
`ls -l | grep "txt"`

In this example:
- `ls -l` lists the files in the current directory with detailed information.
- The pipe (`|`) symbol connects the output of `ls -l` to the input of `grep`.
- `grep "txt"` filters the output of `ls -l` to show only the lines that contain the string "txt". That includes files with the ".txt" extension, but also any other line where "txt" appears.
Note
In essence, piping allows you to chain commands together to create complex workflows and perform data processing efficiently.
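As a minimal sketch of a longer chain, the pipeline below counts the `.txt` files in a directory. The temporary directory and the file names are made up for illustration, and the pattern `\.txt$` is a stricter alternative to the plain "txt" search, not part of the original example.

```shell
# Work in a throwaway directory so the example does not depend on existing files.
workdir=$(mktemp -d)
touch "$workdir/notes.txt" "$workdir/todo.txt" "$workdir/image.png"

# ls writes one name per line when its output goes to a pipe, grep keeps the
# names that end in .txt, and wc -l counts the lines that remain.
txt_count=$(ls "$workdir" | grep '\.txt$' | wc -l)
echo "Number of .txt files: $txt_count"

rm -r "$workdir"
```

Each stage only sees the output of the stage before it, so stages can be added or removed without touching the rest of the chain.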
grep is a versatile command-line tool used to search for patterns within text files. It's a fundamental utility for anyone working with text data.
Here's a breakdown of some frequently used options:
- `-i`: Ignore case. `grep -i "hello" myfile.txt` searches for "hello", "Hello", "HELLO", and so on, regardless of case.
- `-v`: Invert match. `grep -v "error" logfile.txt` displays the lines that *don't* contain "error".
- `-c`: Count matches. `grep -c "pattern" file.txt` counts the number of lines containing "pattern".
- `-E`: Extended regular expressions. `grep -E "pattern1|pattern2" file.txt` searches for either "pattern1" or "pattern2". The lowercase `-e` option is different: it introduces a pattern and can be repeated, as in `grep -e "pattern1" -e "pattern2" file.txt`.

  [!NOTE] You can also use the `egrep` command, which is equivalent to `grep -E`.
- `-h`: Suppress file names. `grep -h "pattern" file1.txt file2.txt` doesn't prepend file names to matches.
- `-H`: Print file names with matches. `grep -H "pattern" file1.txt file2.txt` prints the file name before each match.
- `-A <num>`: Print lines after the match. `grep -A 2 "pattern" file.txt` prints the matching line plus two lines after it.
- `-B <num>`: Print lines before the match. `grep -B 2 "pattern" file.txt` prints the matching line plus two lines before it.
- `-C <num>`: Print lines before and after the match. `grep -C 2 "pattern" file.txt` prints the matching line plus two lines before and after it.
Note
These are just a few of the many options available for grep. Refer to the grep manual page (man grep) for a complete list and detailed explanations.
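The options can be combined. The following sketch runs several of them against a small sample log; the log contents and variable names are hypothetical and exist only to make the example self-contained.

```shell
# Create a small sample log (hypothetical contents) in a temporary file.
log=$(mktemp)
printf '%s\n' 'INFO start' 'ERROR disk full' 'info retry' 'WARN slow' 'ERROR timeout' > "$log"

# -i matches "info" regardless of case; -c prints the count instead of the lines.
info_count=$(grep -ic 'info' "$log")

# -v keeps only the lines that do not contain "ERROR".
non_errors=$(grep -v 'ERROR' "$log")

# -E enables extended regular expressions, so | means "or" without a backslash.
warn_or_error=$(grep -cE 'WARN|ERROR' "$log")

# -A 1 prints each matching line plus one line of trailing context.
after_warn=$(grep -A 1 'WARN' "$log")

echo "Lines mentioning info: $info_count"
echo "Lines mentioning WARN or ERROR: $warn_or_error"
echo "$non_errors"
echo "$after_warn"

rm "$log"
```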
1. egrep: Extended grep
Purpose: Uses extended regular expressions, which offer more powerful pattern-matching capabilities than basic regular expressions. It is equivalent to `grep -E`.
Example: `egrep -i "hello|world" file.txt` searches for "hello" or "world" (case-insensitive) using extended regular expressions.

2. fgrep: Fixed string grep
Purpose: Searches for fixed strings rather than regular expressions. This can be faster for simple pattern matching. It is equivalent to `grep -F`.
Example: `fgrep "hello" file.txt` searches for the exact string "hello".

3. pdfgrep: Grep for PDF files
Purpose: A specialized version of grep designed to search for text within PDF files.
Example: `pdfgrep "keyword" myfile.pdf` searches for the keyword "keyword" within the PDF file "myfile.pdf".

4. zgrep: Grep for compressed files
Purpose: Searches for patterns within compressed files (e.g., gzip, bzip2).
Example: `zgrep "pattern" compressed_file.gz` searches for "pattern" within the compressed file "compressed_file.gz".

Note
These are just a few of the available grep variants. There are other specialized versions for different file formats and use cases.
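The difference between regular-expression and fixed-string matching is easiest to see with a pattern that contains a metacharacter. This sketch uses `grep -E` and `grep -F`, the option forms of `egrep` and `fgrep`, on made-up sample data; `pdfgrep` and `zgrep` are left out because they may not be installed everywhere.

```shell
# Hypothetical sample data: two lines that look alike, two plain words.
sample=$(mktemp)
printf '%s\n' 'a.b' 'axb' 'hello' 'world' > "$sample"

# As a regular expression, "." matches any character, so a.b and axb both match.
regex_hits=$(grep -c 'a.b' "$sample")

# grep -F treats the pattern as a literal string, so only a.b matches.
fixed_hits=$(grep -cF 'a.b' "$sample")

# grep -E enables alternation with a plain |.
either_hits=$(grep -cE 'hello|world' "$sample")

echo "regex: $regex_hits, fixed: $fixed_hits, either: $either_hits"

rm "$sample"
```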
cut is a command-line utility used to extract specific columns or fields from text data. It's particularly useful for working with tabular data.
- `-d`: Specify the delimiter character (a tab by default). `cut -d ',' -f 2 file.csv` extracts the second field (column) from `file.csv`, assuming the fields are separated by commas.
- `-c`: Specify the character positions to extract. `cut -c 1-5 file.txt` extracts characters 1 to 5 from each line of `file.txt`.
- `-f`: Specify the fields to extract. `cut -d ',' -f 1,3 file.csv` extracts the first and third fields from `file.csv`.
- range: Both `-c` and `-f` accept ranges. `cut -c 1-5 file.txt` extracts characters 1 to 5, and `cut -d ',' -f 2-5 file.csv` extracts fields 2 to 5 from `file.csv`.

Examples:
- Extracting the second field from a CSV file: `cut -d ',' -f 2 file.csv`
- Extracting characters 1 to 5 from each line of a file: `cut -c 1-5 file.txt`
- Extracting fields 2 to 4 from a tab-delimited file: `cut -f 2-4 file.tsv`

Tip
The `-d` option is crucial whenever your data is not tab-delimited. Common delimiters include commas (,) and spaces. Tab is the default, so tab-delimited files need no `-d` at all; writing `-d '\t'` fails because `cut` accepts only a single character as the delimiter.
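A short sketch of these options on sample data follows. The CSV contents are invented for the example, and the tab-separated line is produced with `printf` rather than read from a file.

```shell
# Hypothetical CSV data in a temporary file.
csv=$(mktemp)
printf '%s\n' 'id,name,city' '1,Ada,London' '2,Linus,Helsinki' > "$csv"

# -d ',' sets the delimiter; -f 2 selects the second field.
names=$(cut -d ',' -f 2 "$csv")

# -f 1,3 selects the first and third fields, still joined by the delimiter.
id_city=$(cut -d ',' -f 1,3 "$csv")

# Tab is the default delimiter, so tab-separated input needs no -d option.
tab_fields=$(printf 'a\tb\tc\n' | cut -f 2-3)

# -c works on character positions rather than fields.
first_three=$(printf 'abcdef\n' | cut -c 1-3)

echo "$names"
echo "$id_city"
echo "$tab_fields"
echo "$first_three"

rm "$csv"
```

Note that `cut` does not understand CSV quoting, so a quoted field that itself contains a comma will be split in the middle.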
tr is a command-line utility used to translate characters from one set to another. It's often used for simple text manipulation tasks.
- `-d`: Delete characters. Example: `tr -d '[:punct:]' < file.txt` removes punctuation characters from the contents of `file.txt`. `tr` reads only from standard input, so the file is supplied with `<` rather than as an argument.
- Convert lowercase letters to uppercase: `tr 'a-z' 'A-Z' < file.txt`
- Remove all vowels from a file: `tr -d 'aeiouAEIOU' < file.txt`
- Replace spaces with underscores: `tr ' ' '_' < file.txt`
Note
tr can also squeeze runs of repeated characters into one with the `-s` option, and it translates characters based on a one-to-one mapping between the two sets.
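The sketch below shows translation, deletion, and squeezing on inline sample strings (the strings are arbitrary). Squeezing uses the `-s` option.

```shell
# tr reads standard input only, so data arrives through a pipe or a redirect.
upper=$(printf 'hello world\n' | tr 'a-z' 'A-Z')

# -d deletes every character that appears in the set.
no_vowels=$(printf 'hello world\n' | tr -d 'aeiouAEIOU')

# -s squeezes each run of a repeated character down to a single occurrence.
squeezed=$(printf 'too    many   spaces\n' | tr -s ' ')

# Character classes such as [:punct:] can be used as a set.
no_punct=$(printf 'Hi, there!\n' | tr -d '[:punct:]')

echo "$upper"
echo "$no_vowels"
echo "$squeezed"
echo "$no_punct"
```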
wc is a command-line utility used to count the number of lines, words, and characters in a file or standard input.
- `-l`: Count the number of lines. Example: `wc -l file.txt` counts the number of lines in `file.txt`.
- `-w`: Count the number of words. Example: `wc -w file.txt` counts the number of words in `file.txt`.
- `-c`: Count the number of bytes. Example: `wc -c file.txt` counts the bytes in `file.txt`, which matches the character count only for single-byte encodings such as ASCII. Use `-m` to count characters.

You can combine these options to get multiple counts:
Example: `wc -lwc file.txt` counts the number of lines, words, and bytes in `file.txt`.

Note
The definition of a "word" can vary depending on the locale settings and other factors. However, wc generally uses whitespace (spaces, tabs, newlines) to delimit words.
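A small sketch of the three counts, using a two-line sample file whose contents are made up for the example. Reading through `<` keeps the file name out of the output, which is convenient when the number is stored in a variable.

```shell
# Hypothetical sample: two lines, three words, ASCII only.
text=$(mktemp)
printf 'one two\nthree\n' > "$text"

# With input on standard input, wc prints only the number.
lines=$(wc -l < "$text")
words=$(wc -w < "$text")
bytes=$(wc -c < "$text")

echo "lines: $lines, words: $words, bytes: $bytes"

rm "$text"
```

Some implementations pad the number with leading spaces, so compare the results numerically (for example with `-eq`) rather than as strings.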
sort is a command-line utility used to sort lines of text based on various criteria.
- `-r`: Reverse the sort order. `sort -r file.txt` sorts the lines in descending order.
- `-n`: Sort numerically. `sort -n file.txt` sorts lines by the numeric value at the start of each line.
- `-h`: Sort human-readable numbers. `sort -h file.txt` sorts lines based on human-readable numbers (e.g., "100K", "2M").
- `-o`: Output to a file. `sort -o sorted_file.txt file.txt` sorts `file.txt` and saves the result to `sorted_file.txt`.
- `-u`: Unique. `sort -u file.txt` sorts the lines and removes duplicates.
Tip
The default sorting order is ascending, and the whole line is used as the sort key unless you select specific fields with the `-k` option.
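The sketch below contrasts text and numeric ordering and shows a field-based sort. The sample values are arbitrary. Two things here go beyond the options listed above and are assumptions of this example: `LC_ALL=C` is set so the ordering does not depend on the locale, and `-t` with `-k` is used to pick a field as the sort key.

```shell
# Default sort compares lines as text, so "10" and "100" come before "9".
text_order=$(printf '%s\n' 10 9 100 | LC_ALL=C sort)

# -n compares by numeric value instead.
numeric_order=$(printf '%s\n' 10 9 100 | LC_ALL=C sort -n)

# -r reverses the order and -u drops duplicate lines.
unique_desc=$(printf '%s\n' b a c a b | LC_ALL=C sort -ru)

# -t sets the field separator and -k selects the key: numeric sort on field 2.
by_second=$(printf '%s\n' 'x,3' 'y,1' 'z,2' | LC_ALL=C sort -t ',' -k 2,2n)

echo "$text_order"
echo "$numeric_order"
echo "$unique_desc"
echo "$by_second"
```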
comm is a command-line utility used to compare two files line by line and output the lines that are unique to each file or common to both.
`comm file1.txt file2.txt`

This command will output three columns:
- Lines unique to `file1.txt`
- Lines unique to `file2.txt`
- Lines common to both files

Options:
- `-1`: Suppress column 1 (lines unique to `file1.txt`).
- `-2`: Suppress column 2 (lines unique to `file2.txt`).
- `-3`: Suppress column 3 (lines common to both files).

`comm -12 file1.txt file2.txt`

This will output only the lines that are common to both `file1.txt` and `file2.txt`.
[!NOTE] Additional Notes:
- The files must be sorted for `comm` to produce accurate results.
- You can use `sort` to sort files before comparing them with `comm`.
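A sketch of the sort-then-compare workflow on two small unsorted lists follows. The file names and contents are hypothetical, and `LC_ALL=C` is an assumption added so that `sort` and `comm` agree on the ordering.

```shell
# Two unsorted sample lists in a throwaway directory.
dir=$(mktemp -d)
printf '%s\n' cherry apple banana > "$dir/a.txt"
printf '%s\n' banana date cherry > "$dir/b.txt"

# comm expects sorted input, so sort each file first.
LC_ALL=C sort "$dir/a.txt" > "$dir/a.sorted"
LC_ALL=C sort "$dir/b.txt" > "$dir/b.sorted"

# -12 hides columns 1 and 2, leaving the lines common to both files.
common=$(LC_ALL=C comm -12 "$dir/a.sorted" "$dir/b.sorted")

# -23 hides columns 2 and 3, leaving the lines found only in the first file.
only_a=$(LC_ALL=C comm -23 "$dir/a.sorted" "$dir/b.sorted")

# -13 hides columns 1 and 3, leaving the lines found only in the second file.
only_b=$(LC_ALL=C comm -13 "$dir/a.sorted" "$dir/b.sorted")

echo "common: $common"
echo "only in a: $only_a"
echo "only in b: $only_b"

rm -r "$dir"
```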
sed (Stream Editor) is a powerful command-line tool used to manipulate text streams. It can edit, filter, and transform text based on regular expressions.
The general syntax for sed is:
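`sed 's/pattern/replacement/' file.txt`

This command substitutes the first occurrence of pattern on each line with replacement. The result is written to standard output; `file.txt` itself is left unchanged unless you use `-i`.

Common options:
- `-e`: Execute multiple commands.
- `-n`: Suppress automatic printing of lines.
- `-i`: Edit files in-place.
- `-f`: Read commands from a file.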
- Substitution:
  - Replace the first occurrence of a pattern on each line: `sed 's/old_string/new_string/' file.txt`
  - Replace all occurrences: `sed 's/old_string/new_string/g' file.txt`
  - Use backreferences: `sed 's/\([a-z]\)\([A-Z]\)/\2\1/' file.txt` swaps a lowercase letter with the uppercase letter that follows it.
- Deletion:
  - Delete lines matching a pattern: `sed '/pattern/d' file.txt`
  - Delete empty lines: `sed '/^$/d' file.txt`
- Insertion:
  - Insert text before a pattern: `sed '/pattern/i\new text' file.txt`
  - Insert text after a pattern: `sed '/pattern/a\new text' file.txt`
  - The one-line form of `i\` and `a\` shown here is accepted by GNU sed; other implementations may require the text on a separate line.
- Changing:
  - Change the first character of each line to uppercase: `sed 's/^./\u&/' file.txt` (`\u` is a GNU sed extension)

Examples:
- Replace all occurrences of "old" with "new": `sed 's/old/new/g' file.txt`
- Delete lines containing "error": `sed '/error/d' file.txt`
- Insert a new line before lines starting with "#": `sed '/^#/i\new line' file.txt`
Note
sed is a powerful tool with many advanced features. Refer to the sed manual page for a complete list of options and examples.
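The following sketch exercises substitution, deletion, backreferences, `-n`, and `-e` on a small sample file. The file contents are hypothetical, and the commands are limited to forms that do not rely on GNU-only extensions.

```shell
# Hypothetical sample: a comment, a key=value pair, an empty line, repeated words.
conf=$(mktemp)
printf '%s\n' '# settings' 'color=red' '' 'old old' > "$conf"

# Without g, only the first match on each line is replaced.
# -n plus the p flag prints just the lines where a substitution happened.
first_only=$(sed -n 's/old/new/p' "$conf")

# With g, every match on the line is replaced.
all_matches=$(sed -n 's/old/new/gp' "$conf")

# Delete empty lines; everything else is printed unchanged.
no_blank=$(sed '/^$/d' "$conf")

# Backreferences: \1 and \2 refer to the two captured groups, swapped here.
swapped=$(sed -n 's/\(.*\)=\(.*\)/\2=\1/p' "$conf")

# -e chains several commands: drop comment lines, then drop empty lines.
cleaned=$(sed -e '/^#/d' -e '/^$/d' "$conf")

echo "$first_only"
echo "$all_matches"
echo "$no_blank"
echo "$swapped"
echo "$cleaned"

rm "$conf"
```

None of these commands modify the sample file, because `-i` is not used; each one writes its result to standard output.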