Revision 1.1 (Sat Oct 2 19:01:05 2004 UTC) by jakub
Archive-Name: editor-faq/sed
Posting-Frequency: irregular
Last-modified: 10 March 2003
Version: 015
URL: http://sed.sourceforge.net/sedfaq.html
Maintainer: Eric Pement (pemente@northpark.edu)

THE SED FAQ

Frequently Asked Questions about
sed, the stream editor

CONTENTS

1. GENERAL INFORMATION
1.1. Introduction - How this FAQ is organized
1.2. Latest version of the sed FAQ
1.3. FAQ revision information
1.4. How do I add a question/answer to the sed FAQ?
1.5. FAQ abbreviations
1.6. Credits and acknowledgements
1.7. Standard disclaimers

2. BASIC SED
2.1. What is sed?
2.2. What versions of sed are there, and where can I get them?

  2.2.1. Free versions

    2.2.1.1. Unix platforms
    2.2.1.2. OS/2
    2.2.1.3. Microsoft Windows (Win3x, Win9x, WinNT, Win2K)
    2.2.1.4. MS-DOS
    2.2.1.5. CP/M
    2.2.1.6. Macintosh v8 or v9

  2.2.2. Shareware and Commercial versions

    2.2.2.1. Unix platforms
    2.2.2.2. OS/2
    2.2.2.3. Windows 95/98, Windows NT, Windows 2000
    2.2.2.4. MS-DOS

2.3. Where can I learn to use sed?

  2.3.1. Books
  2.3.2. Mailing list
  2.3.3. Tutorials, electronic text
  2.3.4. General web and ftp sites

3. TECHNICAL
3.1. More detailed explanation of basic sed
  3.1.1. Regular expressions on the left side of "s///"
  3.1.2. Escape characters on the right side of "s///"
  3.1.3. Substitution switches
3.2. Common one-line sed scripts. How do I . . . ?

  - double/triple-space a file?
  - convert DOS/Unix newlines?
  - delete leading/trailing spaces?
  - do substitutions on all/certain lines?
  - delete consecutive blank lines?
  - delete blank lines at the top/end of the file?

3.3. Addressing and address ranges
3.4. Address ranges in GNU sed and HHsed
3.5. Debugging sed scripts
3.6. Notes about s2p, the sed-to-perl translator
3.7. GNU/POSIX extensions to regular expressions

4. EXAMPLES
ONE-CHARACTER QUESTIONS
4.1. How do I insert a newline into the RHS of a substitution?
4.2. How do I represent control-codes or non-printable characters?
4.3. How do I convert files with toggle characters, like +this+,
     to look like [i]this[/i]?

CHANGING STRINGS
4.10. How do I perform a case-insensitive search?
4.11. How do I match only the first occurrence of a pattern?
4.12. How do I parse a comma-delimited (CSV) data file?
4.13. How do I handle fixed-length, columnar data?
4.14. How do I commify a string of numbers?
4.15. How do I prevent regex expansion on substitutions?
4.16. How do I convert a string to all lowercase or capital letters?

CHANGING BLOCKS (consecutive lines)
4.20. How do I change only one section of a file?
4.21. How do I delete or change a block of text if the block contains
      a certain regular expression?
4.22. How do I locate a paragraph of text if the paragraph contains a
      certain regular expression?
4.23. How do I match a block of specific consecutive lines?
  4.23.1. Try to use a "/range/, /expression/"
  4.23.2. Try to use a "multi-line\nexpression"
  4.23.3. Try to use a block of "literal strings"
4.24. How do I address all the lines between RE1 and RE2, excluding the lines themselves?
4.25. How do I join two lines if line #1 ends in a [certain string]?
4.26. How do I join two lines if line #2 begins in a [certain string]?
4.27. How do I change all paragraphs to long lines?

SHELL AND ENVIRONMENT
4.30. How do I read environment variables with sed ...
  4.31.1. ... on Unix platforms?
  4.31.2. ... on MS-DOS or 4DOS platforms?
4.32. How do I export or pass variables back into the environment ...
  4.32.1. ... on Unix platforms?
  4.32.2. ... on MS-DOS or 4DOS platforms?
4.33. How do I handle shell quoting in sed?

FILES, DIRECTORIES, AND PATHS
4.40. How do I read (insert/add) a file at the top of a textfile?
4.41. How do I make substitutions in every file in a directory, or
      in a complete directory tree?
  4.41.1. ... ssed solution
  4.41.2. ... Unix solution
  4.41.3. ... DOS solution
4.42. How do I replace "/some/UNIX/path" in a substitution?
4.43. How do I replace "C:\SOME\DOS\PATH" in a substitution?
4.44. How do I emulate file-includes, using sed?

5. WHY ISN'T THIS WORKING?
5.1. Why don't my variables like $var get expanded in my sed script?
5.2. I'm using 'p' to print, but I have duplicate lines sometimes.
5.3. Why does my DOS version of sed process a file part-way through
     and then quit?
5.4. My RE isn't matching/deleting what I want it to. (Or, "Greedy vs.
     stingy pattern matching")
5.5. What is CSDPMI*B.ZIP and why do I need it?
5.6. Where are the man pages for GNU sed?
5.7. How do I tell what version of sed I am using?
5.8. Does sed issue an exit code?
5.9. The 'r' command isn't inserting the file into the text.
5.10. Why can't I match or delete a newline using the \n escape
      sequence? Why can't I match 2 or more lines using \n?
5.11. My script aborts with an error message, "event not found".

6. OTHER ISSUES
6.1. I have a problem that stumps me. Where can I get help?
6.2. How does sed compare with awk, perl, and other utilities?
6.3. When should I use sed?
6.4. When should I NOT use sed?
6.5. When should I ignore sed and use Awk or Perl instead?
6.6. Known limitations among sed versions
6.7. Known incompatibilities between sed versions

  6.7.1. Issuing commands from the command line
  6.7.2. Using comments (prefixed by the '#' sign)
  6.7.3. Special syntax in REs
  6.7.4. Word boundaries
  6.7.5. Commands which operate differently

7. KNOWN BUGS AMONG SED VERSIONS
7.1. ssed v3.59
7.2. GNU sed v4.0 - v4.0.5
7.3. GNU sed v3.02.80
7.4. GNU sed v3.02
7.5. GNU sed v2.05
7.6. GNU sed v1.18
7.7. GNU sed v1.03
7.8. sed v1.6 (Briscoe)
7.9. sed v1.5 (Helman)
7.10. sedmod v1.0 (Chen)
7.11. HP-UX sed
7.12. SunOS sed v4.1
7.13. SunOS sed v5.6
7.14. Ultrix sed v4.3
7.15. Digital Unix sed

171
172 ------------------------------
173
174 1. GENERAL INFORMATION
175
176 1.1. Introduction - How this FAQ is organized
177
178 This FAQ is organized to answer common (and some uncommon)
179 questions about sed, quickly. If you see a term or abbreviation in
180 the examples that seems unclear, see if the term is defined in
181 section 1.5. If not, send your comment to pemente[at]northpark.edu.
182
183 1.2. Latest version of the sed FAQ
184
185 The newest version of the sed FAQ is usually here:
186
187 http://sed.sourceforge.net/sedfaq.html (HTML version)
188 http://sed.sourceforge.net/sedfaq.txt (plain text)
189 http://www.student.northpark.edu/pemente/sed/sedfaq.html
190 http://www.student.northpark.edu/pemente/sed/sedfaq.txt
191 http://www.faqs.org/faqs/editor-faq/sed
192 ftp://rtfm.mit.edu/pub/faqs/editor-faq/sed
193
194 Another FAQ file on sed by a different author can be found here:
195
196 http://www.dreamwvr.com/sed-info/sed-faq.html
197
198 1.3. FAQ revision information
199
200 In the plaintext version, changes are shown by a vertical bar (|)
201 placed in column 78 of the affected lines. To remove the vertical
202 bars (use double quotes for MS-DOS):
203
     sed 's/ *|$//' sedfaq.txt > sedfaq2.txt
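
To see what this expression does, it can be run on a couple of
made-up lines. This is only a sketch, assuming a POSIX shell; the
sample text is hypothetical and not taken from the FAQ.

```shell
# Hypothetical sample: the first line carries a change bar, the second does not.
sample='This line was revised.            |
This line was not revised.'

# ' *|$' matches any run of spaces followed by a bar at the end of the line.
# In a basic regular expression the bar is an ordinary character.
stripped=$(printf '%s\n' "$sample" | sed 's/ *|$//')
printf '%s\n' "$stripped"
```

Note that the expression removes a bar wherever it ends a line, not
only when it sits in column 78.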
205
206 In the HTML version, vertical bars do not appear. New or altered
207 portions of the FAQ are indicated by printing in dark blue type.
208
209 In the text version, words needing emphasis may be surrounded by
210 the underscore '_' or the asterisk '*'. In the HTML version, these
211 are changed to italics and boldface, respectively.
212
213 1.4. How do I add a question/answer to the sed FAQ?
214
215 Word your question briefly and send it to pemente[at]northpark.edu,
216 indicating your proposed change. We'll post it on the sed-users
217 mailing list (see section 2.3.2) and discuss it there. If it's
218 good, your contribution will be added to the next edition.
219
220 1.5. FAQ abbreviations
221
222 files = one or more filenames, separated by whitespace
223 gsed = GNU sed
224 ssed = super-sed
225 RE = Regular Expressions supported by sed
226 LHS = the left-hand side ("find" part) of "s/find/repl/" command
227 RHS = the right-hand side ("replace" part) of "s/find/repl/" cmd
228 nn+ = version _nn_ or higher (e.g., "15+" = version 1.5 and above)
229
230 files: "files" stands for one or more filenames entered on the
231 command line. The names may include any wildcards your shell
232 understands (such as ``zork*'' or ``Aug[4-9].let''). Sed will
233 process each filename passed to it by the shell.
234
235 RE: For details on regular expressions, see section 3.1.1., below.
236
237 1.6. Credits and acknowledgements
238
239 Many of the ideas for this FAQ were taken from the Awk FAQ:
240 http://www.faqs.org/faqs/computer-lang/awk/faq/
241 ftp://rtfm.mit.edu/pub/usenet/comp.lang.awk/faq
242
243 and from the old Perl FAQ:
244 http://www.perl.com/doc/FAQs/FAQ/oldfaq-html/index.html
245
246 The following individuals have contributed significantly to this
247 document, and have provided input and wording suggestions for
248 questions, answers, and script examples. Credit goes to these
249 contributors (in alphabetical order by last name):
250
251 Al Aab, Yiorgos Adamopoulos, Paolo Bonzini, Walter Briscoe, Jim
252 Dennis, Carlos Duarte, Otavio Exel, Sven Guckes, Aurelio Jargas,
253 Mark Katz, Toby Kelsey, Eric Pement, Greg Pfeiffer, Ken Pizzini,
254 Niall Smart, Simon Taylor, Peter Tillier, Greg Ubben, Laurent
255 Vogel.
256
257 1.7. Standard disclaimers
258
While a serious attempt has been made to ensure the accuracy of the
information presented herein, the contributors and maintainers of
this document do not claim the absence of errors and make no
warranties on the information provided. If you notice any mistakes,
please let us know so we can fix them.
264
265 ------------------------------
266
267 2. BASIC SED
268
269 2.1. What is sed?
270
271 "sed" stands for Stream EDitor. Sed is a non-interactive editor,
272 written by the late Lee E. McMahon in 1973 or 1974. A brief history
273 of sed's origins may be found in an early history of the Unix
274 tools, at <http://www.columbia.edu/~rh120/ch106.x09>.
275
276 Instead of altering a file interactively by moving the cursor on
277 the screen (as with a word processor), the user sends a script of
278 editing instructions to sed, plus the name of the file to edit (or
279 the text to be edited may come as output from a pipe). In this
280 sense, sed works like a filter -- deleting, inserting and changing
281 characters, words, and lines of text. Its range of activity goes
282 from small, simple changes to very complex ones.
283
284 Sed reads its input from stdin (Unix shorthand for "standard
285 input," i.e., the console) or from files (or both), and sends the
286 results to stdout ("standard output," normally the console or
287 screen). Most people use sed first for its substitution features.
288 Sed is often used as a find-and-replace tool.
289
     sed 's/Glenn/Harold/g' oldfile >newfile
291
292 will replace every occurrence of "Glenn" with the word "Harold",
293 wherever it occurs in the file. The "find" portion is a regular
294 expression ("RE"), which can be a simple word or may contain
295 special characters to allow greater flexibility (for example, to
296 prevent "Glenn" from also matching "Glennon").
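
The point about "Glennon" can be made concrete. The following is a
minimal sketch, not necessarily how the FAQ authors would write it.
It uses only basic regular expressions and replaces "Glenn" when it
is followed by a non-letter or by the end of the line. The sample
sentence is invented for illustration.

```shell
# Invented sample text containing both "Glenn" and "Glennon".
text='Glenn met Glennon, then Glenn'

# Naive version: also rewrites the first five letters of "Glennon".
naive=$(printf '%s\n' "$text" | sed 's/Glenn/Harold/g')

# Stricter version: replace "Glenn" only at end of line, or when the
# next character is not a letter (that character is kept via \1).
strict=$(printf '%s\n' "$text" |
  sed -e 's/Glenn$/Harold/' -e 's/Glenn\([^A-Za-z]\)/Harold\1/g')

printf '%s\n%s\n' "$naive" "$strict"
```

This sketch does not check the character before "Glenn". Some sed
versions offer word-boundary syntax for this purpose, but it is not
the same everywhere (see section 6.7.4).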
297
My very first use of sed was to add 8 spaces to the left side of a
file, so when I printed it, the printing wouldn't begin at the
absolute left edge of a piece of paper.

     sed 's/^/        /' myfile >newfile   # my first sed script
     sed 's/^/        /' myfile | lp       # my next sed script
304
305 Then I learned that sed could display only one paragraph of a file,
306 beginning at the phrase "and where it came" and ending at the
307 phrase "for all people". My script looked like this:
308
     sed -n '/and where it came/,/for all people/p' myfile
310
Sed's normal behavior is to print (i.e., display or show on screen)
the entire file, including the parts that haven't been altered,
unless you use the -n switch. The "-n" stands for "no output". This
switch is almost always used in conjunction with a 'p' command
somewhere, which says to print only the sections of the file that
have been specified. The -n switch with the 'p' command allows for
parts of a file to be printed (i.e., sent to the console).
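
The effect of -n is easiest to see side by side. This is a small
sketch with three made-up input lines; the variable names are
arbitrary.

```shell
# Three made-up lines of input.
lines='alpha
beta
gamma'

# Without -n, sed prints every line once by default, and 'p' prints
# line 2 a second time.
without_n=$(printf '%s\n' "$lines" | sed '2p')

# With -n, the default printing is suppressed, so only line 2 appears.
with_n=$(printf '%s\n' "$lines" | sed -n '2p')

printf 'without -n:\n%s\n\nwith -n:\n%s\n' "$without_n" "$with_n"
```

Forgetting -n while using 'p' is the usual cause of unexpected
duplicate lines.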
318
319 Next, I found that sed could show me only (say) lines 12-18 of a
320 file and not show me the rest. This was very handy when I needed to
321 review only part of a long file and I didn't want to alter it.
322
     # the 'p' stands for print
     sed -n 12,18p myfile
325
326 Likewise, sed could show me everything else BUT those particular
327 lines, without physically changing the file on the disk:
328
     # the 'd' stands for delete
     sed 12,18d myfile
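
The two commands are complements of each other, which can be checked
on generated input. This is a sketch only; the 20 numbered lines
stand in for a real file, and the variable names are arbitrary.

```shell
# Generate 20 numbered lines as stand-in input (no file needed).
numbered=$(i=1; while [ "$i" -le 20 ]; do echo "line $i"; i=$((i + 1)); done)

shown=$(printf '%s\n' "$numbered" | sed -n '12,18p')   # lines 12-18 only
rest=$(printf '%s\n' "$numbered" | sed '12,18d')       # everything else

printf '%s\n' "$shown" | wc -l    # counts the 7 selected lines
printf '%s\n' "$rest" | wc -l     # counts the 13 remaining lines
```

In both cases the input itself is left untouched; sed only changes
what is written to standard output.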
331
332 Sed could also double-space my single-spaced file when it came time
333 to print it:
334
     sed G myfile >newfile
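
The 'G' command appends a newline and the contents of the hold space
to each line. Since the hold space starts out empty, the result is a
blank line after every input line. A short sketch with two made-up
lines:

```shell
# Two made-up lines of input; G adds an empty line after each one.
doubled=$(printf 'one\ntwo\n' | sed G)

# Note: $(...) trims trailing newlines, so the blank line that sed
# emits after "two" is not kept in the variable.
printf '%s\n' "$doubled"
```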
336
If you have many editing commands (for deleting, adding,
substituting, etc.) which might take up several lines, those
commands can be put into a separate file and all of the commands in
the file applied to the file being edited:

     # 'script.sed' is the file of commands
     # 'myfile' is the file being changed
     sed -f script.sed myfile
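
As an illustration, here is a sketch that builds a three-command
script file and applies it. The file contents and the throwaway
directory are made up for the example, and it assumes the common
"mktemp -d" utility is available.

```shell
# Work in a throwaway directory; all file names here are illustrative.
workdir=$(mktemp -d)

# 'script.sed' holds one editing command per line.
cat > "$workdir/script.sed" <<'EOF'
s/Glenn/Harold/g
/^$/d
2d
EOF

# 'myfile' is the file being edited (sed does not modify it on disk).
printf 'Glenn one\nGlenn two\n\nGlenn three\n' > "$workdir/myfile"

# Each input line is run through all three commands in order.
edited=$(sed -f "$workdir/script.sed" "$workdir/myfile")
printf '%s\n' "$edited"

rm -r "$workdir"
```

Here the substitution is applied to every line, the blank line is
deleted, and line 2 of the input is deleted.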
345
346 It is not our intention to convert this FAQ file into a full-blown
347 sed tutorial (for good tutorials, see section 2.3). Rather, we hope
348 this gives the complete novice a few ideas of how sed can be used.
349
350 2.2. What versions of sed are there, and where can I get them?
351
352 2.2.1. Free versions
353
354 Note: "Free" does not mean "public domain" nor does it necessarily
355 mean you will never be charged for it. All versions of sed in this
356 section except the CP/M versions are based on the GNU general
357 public license and are "free software" by that standard (for
358 details, see http://www.gnu.org/philosophy/free-sw.html). This
359 means you can get the source code and develop it further.
360
361 At the URLs listed in this category, sed binaries or source code
362 can be downloaded and used without fees or license payments.
363
364 2.2.1.1. Unix platforms
365
ssed v3.60
ssed is the version recommended by the FAQ maintainers, since it
shares the same codebase with GNU sed, has the most options, and is
free software (you can get the source). Though earlier versions of
ssed were distributed, sites for these are not being listed.
371
372 http://sed.sourceforge.net/grabbag/ssed
373 http://freshmeat.net/project/sed/
374
375 GNU sed v4.0.5
376 This is the latest official version of GNU sed. It offers in-place
377 text replacement as an option switch.
378
379 ftp://ftp.gnu.org/pub/gnu/sed/sed-4.0.5.tar.gz
380 http://freshmeat.net/project/sed
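
For readers unfamiliar with in-place replacement, here is a brief
sketch. It assumes GNU sed 4.x, where the switch is -i; other sed
versions may lack the switch or treat its argument differently. The
file name and contents are made up.

```shell
# Illustrative file name in a throwaway directory.
workdir=$(mktemp -d)
printf 'old text\n' > "$workdir/notes.txt"

# -i edits the file in place; the '.bak' suffix asks sed to keep a
# backup copy of the original as notes.txt.bak.
sed -i.bak 's/old/new/' "$workdir/notes.txt"

after=$(cat "$workdir/notes.txt")
backup=$(cat "$workdir/notes.txt.bak")
printf 'now: %s\nbackup: %s\n' "$after" "$backup"

rm -r "$workdir"
```

Without this switch, the usual approach is to redirect the output to
a new file, as in the examples in section 2.1.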
381
382 BSD multi-byte sed (Japanese)
383 Based on the latest version of GNU sed, which supports multi-byte
384 characters.
385
386 ftp://ftp1.freebsd.org/pub/FreeBSD/FreeBSD-stable/packages/Latest/ja-sed.tgz
387
388 GNU sed v3.02.80
389 An alpha test release which was the base for the development of
390 ssed and GNU sed v4.0.
391
392 ftp://alpha.gnu.org/pub/gnu/sed/sed-3.02.80.tar.gz
393
394 GNU sed v3.02a
395 Interim version with most features of GNU sed v3.02.80.
396
397 GNU sed v3.02
398 ftp://ftp.gnu.org/pub/gnu/sed/sed-3.02.tar.gz
399
400 Precompiled versions:
401
402 GNU sed v3.02-8
403 source code and binaries for Debian GNU/Linux
404
405 http://www.debian.org/Packages/stable/base/sed.html
406
407 For some time, the GNU project <http://www.gnu.org> used Eric S.
408 Raymond's version of sed (ESR sed v1.1), but eventually dropped it
409 because it had too many built-in limits. In 1991 Howard Helman
410 modified the GNU/ESR sed and produced a flexible version of sed
411 v1.5 available at several sites (Helman's version permitted things
412 like \<...\> to delimit word boundaries, \xHH to enter hex code and
413 \n to indicate newlines in the replace string). This version did
414 not catch on with the GNU project and their version of sed has
415 moved in a similar but different direction.
416
417 sed v1.3, by Eric Steven Raymond (released 4 June 1998)
418 http://catb.org/~esr/sed-1.3.tar.gz
419
420 Eric Raymond <esr@snark.thyrsus.com> wrote one of the earliest
421 versions of sed. On his website <http://www.catb.org/~esr/> which
422 also distributes many freeware utilities he has written or worked
423 on, he describes sed v1.1 this way:
424
425 "This is the fast, small sed originally distributed in the GNU
426 toolkit and still distributed with Minix. The GNU people ditched it
427 when they built their own sed around an enhanced regex package --
428 but it's still better for some uses (in particular, faster and less
429 memory-intensive)." (Version 1.3 fixes an unidentified bug and adds
430 the L command to hexdump the current pattern space.)
431
432 2.2.1.2. OS/2
433
434 GNU sed v3.02.80
435 http://www2s.biglobe.ne.jp/~vtgf3mpr/gnu/sed.htm
436
437 GNU sed v3.02
438 http://hobbes.nmsu.edu/pub/os2/util/file/sed-3_02-r2-bin.zip # binaries
439 http://hobbes.nmsu.edu/pub/os2/util/file/sed-3_02-r2.zip # source
440
441 2.2.1.3. Microsoft Windows (Win3x, Win9x, WinNT, Win2K)
442
443 GNU sed v4.0.5
444 32-bit binaries and docs. Precompiled versions not available (yet).
445
446 GNU sed v3.02.80
447 32-bit binaries and docs, using DJGPP compiler. For details on new
448 features, see Unix section, above.
449
450 http://www.student.northpark.edu/pemente/sed/sed3028a.zip # DOS binaries
451 ftp://alpha.gnu.org/pub/gnu/sed/sed-3.02.80.tar.gz # source
452 ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed3028b.zip # binaries
453 ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed3028d.zip # docs
454 ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed3028s.zip # source
455
456 GNU sed v2.05
457 32-bit binaries, no docs. Requires 80386 DX (SX will not run) and
458 must be run in a DOS window or in a full screen DOS session under
459 Microsoft Windows. Will not run in MS-DOS mode (outside Win/Win95).
460 We recommend using the latest version of GNU sed.
461 http://www.simtel.net/pub/win95/prog/gsed205b.zip
462 ftp://ftp.cdrom.com/pub/simtelnet/win95/prog/gsed205b.zip
463
464 GNU sed v1.03
465 modified by Frank Whaley.
466
467 This version was part of the "Virtually UN*X" toolset, hosted by
468 itribe.net; that website is now closed. Gsed v1.03 supported Win9x
469 long filenames, as well as hex, decimal, binary, and octal
470 character representations.
471
472 The Cygwin toolkit:
473 http://www.cygwin.com
474
Formerly known as "GNU-Win32 tools." According to their home page,
"The Cygwin tools are Win32 ports of the popular GNU development
tools for Windows NT, 95 and 98. They function through the use of
the Cygwin library which provides a UNIX-like API on top of the
Win32 API." The version of sed used is GNU sed v3.02.
480
481 Minimalist GNU for Windows (MinGW):
482 http://www.mingw.org
483 http://mingw.sourceforge.net
484
485 According to their home page, "MinGW ('Minimalist GNU for Windows')
486 refers to a set of runtime headers, used in building a compiler
487 system based on the GNU GCC and binutils projects. It compiles and
488 links code to be run on Win32 platforms ... MinGW uses Microsoft
489 runtime libraries, distributed with the Windows operating system."
490 The version of sed used is GNU sed v3.02.
491
492 sed v1.5 (a/k/a HHsed), by Howard Helman
493 Compiled with Mingw32 for 32-bit environments described above. This
494 version should support Win95 long filenames.
495 http://www.dbnet.ece.ntua.gr/~george/sed/OLD/sed15.exe
496 http://www.student.northpark.edu/pemente/sed/sed15exe.zip
497
498 2.2.1.4. MS-DOS
499
500 sed v1.6 (from HHsed), by Walter Briscoe
501
502 This is a forthcoming version, now in beta testing, but with many
503 new features. It corrects all the bugs in sed v1.5, and adds the
504 best features of sedmod v1.0 (below). It is available in 16-bit and
505 32-bit compiled versions for MS-DOS. Sorry, no URLs available yet.
506
507 sed v1.5 (a/k/a HHsed), by Howard Helman
508 uncompiled source code (Turbo C)
509 ftp://ftp.simtel.net/pub/simtelnet/msdos/txtutl/sed15.zip
510 ftp://ftp.cdrom.com/pub/simtelnet/msdos/txtutl/sed15.zip
511
512 DOS executable and documentation
513 ftp://ftp.simtel.net/pub/simtelnet/msdos/txtutl/sed15x.zip
514 ftp://ftp.cdrom.com/pub/simtelnet/msdos/txtutl/sed15x.zip
515
516 sedmod v1.0, by Hern Chen
517 http://www.ptug.org/sed/SEDMOD10.ZIP
518 http://www.student.northpark.edu/pemente/sed/sedmod10.zip
519 ftp://garbo.uwasa.fi/pc/unix/sedmod10.zip
520
521 GNU sed v3.02.80
522 See section 2.2.1.3 ("Microsoft Windows"), above.
523
524 GNU sed v2.05
525 Does not run under MS-DOS.
526
527 GNU sed v1.18
528 32-bit binaries and source, using DJGPP compiler. Requires 80386 SX
529 or better. Also requires 3 CWS*.EXE extenders on the path. See
530 section 5.5 ("What is CSDPMI*B.ZIP and why do I need it?"), below.
531 We recommend using a newer version of GNU sed.
532 http://www.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed118b.zip
533 ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2gnu/sed118b.zip
534 http://www.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed118s.zip
535 ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2gnu/sed118s.zip
536
537 GNU sed v1.06
538 16-bit binaries and source. Should run under any MS-DOS system.
539 http://www.simtel.net/pub/gnu/gnuish/sed106.zip
540 ftp://ftp.cdrom.com/pub/simtelnet/gnu/gnuish/sed106.zip
541
542 2.2.1.5. CP/M
543
544 ssed v2.2, by Chuck A. Forsberg
545
Written for CP/M, ssed (for "small/stupid stream editor") supports
only the a(ppend), c(hange), d(elete) and i(nsert) options, and
apparently doesn't support regular expressions. A -u switch will
"unsqueeze" compressed files and was used mainly in conjunction
with DIF.COM for source code maintenance. (file: ssed22.lbr)
551
552 change, by Michael M. Rubenstein
553
554 Rubenstein released a version of sed called CHANGE.COM (the
555 TTOOLS.LBR archive member CHANGE.CZM is a "crunched" file).
556 CHANGE.COM supports full RE's except grouping and backreferences,
557 and its only function is global substitution. (file: ttools.lbr)
558
559 2.2.1.6. Macintosh v8 or v9
560
Since sed is a command-line utility, it is not customary to think
of sed being used on a Mac. Nonetheless, the following instructions
from Aurelio Jargas describe the process for running sed on MacOS
version 8 or 9.
565
566 (1) Download and install the Apple DiskCopy application
567
568 ftp://ftp.apple.com/developer/Development_Kits
569
570 (2) Download and install Apple MPW
571
572 ftp://ftp.apple.com/developer/Tool_Chest/Core_Mac_OS_Tools/MPW_etc./
573
574 (3) Download and expand Matthias Neeracher's GNU sed for MPW. (They
575 seem to have misnumbered the sed filename.)
576
577 ftp://sunsite.cnlab-switch.ch/software/platform/macos/src/mpw_c/sed-2.03.sit.bin
578
(4) Enter the sed-3.02 directory and double-click the 'sed' file
580
581 (5) MPW Shell will open up. It will be a command window instead of
582 a command line, but sed should work as expected. For example:
583
     echo aa | sed 's/a/Z/g'<ENTER>
585
586 Note that ENTER is different from RETURN on an iMac. Apple *also*
587 has its own version of sed on MPW, called "StreamEdit", with a
588 syntax fairly similar to that of normal sed.
589
590 2.2.2. Shareware and Commercial versions
591
592 2.2.2.1. Unix platforms
593
594 [ Additional information needed. ]
595
596 2.2.2.2. OS/2
597
598 Hamilton Labs:
599 http://www.hamiltonlabs.com/cshell.htm
600
601 A sizable set of Unix/C shell utilities designed for OS/2. Price is
602 $350 in the US, $395 elsewhere, with FedEx shipping, unconditional
603 guarantee, unlimited support and free updates. A demo version of
604 the suite can be downloaded from this site, but a stand-alone copy
605 of sed is not available.
606
607 2.2.2.3. Windows 95/98, Windows NT, Windows 2000
608
609 Hamilton Labs:
610 http://www.hamiltonlabs.com/cshell.htm
611
612 A sizable set of Unix/C shell utilities designed for Win9x, WinNT,
613 and Win2K. Price is $350 in the US, $395 elsewhere, with FedEx
614 shipping, unconditional guarantee, unlimited support and free
615 updates. A demo version of the suite can be downloaded from this
616 site, but a stand-alone copy of sed is not available.
617
618 Interix:
619 http://www.interix.com
620
621 Interix (formerly known as OpenNT) is advertised as "a complete
622 UNIX system environment running natively on Microsoft Windows NT",
623 and is licensed and supported by Softway Systems. It offers over
624 200 Unix utilities, and supports Unix shells, sockets, networking,
625 and more. A single-user edition runs about $200. A free demo or
626 evaluation copy will run for 31 days and then quit; to continue
627 using it, you must purchase the commercial version.
628
629 MKS NuTCRACKER Professional
630 http://www.datafocus.com/products/nutc/
631
632 A different, yet related product line offered by MKS (Mortice Kern
633 Systems, below); the awkward spelling "NuTCRACKER" is intentional.
634 Various packages offer hundreds of Unix utilities for Win32
635 environments. Sed is not available as a separate product.
636
637 UnixDos:
638 http://www.unixdos.com
639
640 UnixDos is a suite of 82 Unix utilities ported over to the Windows
641 environments. There are 16-bit versions for Win3.x and 32-bit
642 versions for WinNT/Win95. It is distributed as uncrippled shareware
643 for the first 30 days. After the test period, the utilities will
644 not run and you must pay the registration fee of $50.
645
646 Their version of sed supports "\n" in the RHS of expressions, and
647 increases the length of input lines to 10,000 characters. By
648 special arrangement with the owners, persons who want a licensed
649 version of sed *only* (without the other utilities) may pay a
650 license fee of $10.
651
652 U/WIN:
653 http://www.research.att.com/sw/tools/uwin/
654
655 U/WIN is a suite of Unix utilities created for WinNT and Win95
656 systems. It is owned by AT&T, created by David Korn (author of the
657 Unix korn shell), and is freely distributed only to educational
658 institutions, AT&T employees, or certain researchers; all others
659 must pay a fee after a 90-day evaluation period expires. U/WIN
660 operates best with the NTFS (WinNT file system) but will run in
661 degraded mode with the FAT file system and in further degraded mode
662 under Win95. A minimal installation takes about 25 to 30 megs of
663 disk space. Sed is not available as a separate file for download,
664 but comes with the suite.
665
666 2.2.2.4. MS-DOS
667
668 Mix C/Utilities Toolchest
669 http://www.mixsoftware.com/product/utility.htm
670
671 According to their web page, "The C/Utilities Toolchest adds over
672 40 powerful UNIX utilities to your MS-DOS operating system. The
673 result is an environment very similar to UNIX operating systems,
674 yet 100% compatible with MS-DOS programs and commands." The
675 toolchest costs $19.95, with source code available for an
676 additional fee. Mix C's version of sed is not available separately.
677
678 MKS (Mortice Kern Systems) Toolkit
679 http://www.mks.com
680
681 Sed comes bundled with the MKS Toolkit, which is distributed only
682 as commercial software; it is not available separately.
683
684 Thompson Automation Software
685 http://www.tasoft.com
686
687 The Thompson Toolkit contains over 100 familiar Unix utilities,
688 including a version of the Unix Korn shell. It runs under MS-DOS,
689 OS/2, Win3.x, Win9x, and WinNT. Sed is one of the utilities, though
690 Thompson is better known for its version of awk for DOS, TAWK. The
691 toolkit runs about $150; sed is not available separately.
692
693 2.3. Where can I learn to use sed?
694
695 2.3.1. Books
696
697 _Sed & Awk, 2d edition_, by Dale Dougherty & Arnold Robbins
698 (Sebastopol, Calif: O'Reilly and Associates, 1997)
699 ISBN 1-56592-225-5
700 http://www.oreilly.com/catalog/sed2/noframes.html
701
702 About 40 percent of this book is devoted to sed, and maybe 50
703 percent is devoted to awk. The other 10 percent covers regexes and
704 concepts common to both tools. If you prefer hard copy, this is
705 definitely the best single place to learn to use sed, including its
706 advanced features.
707
The first edition is also very useful. Several typos crept into the
first printing of the first edition (though if you follow the
tutorials closely, you'll recognize them right away). A list of
errors from the first printing of _sed & awk_ is available at
<http://www.cs.colostate.edu/~dzubera/sedawk.txt>, and errors in
the 2nd are at <http://www.cs.colostate.edu/~dzubera/sedawk2.txt>,
though most of these were corrected in later printings. The second
edition tells how POSIX standards have affected these tools and
covers the popular GNU versions of sed and awk. Price is about (US)
$30.00.
718
719 -----
720
721 _Mastering Regular Expressions, 2d ed.,_ by Jeffrey E. F. Friedl
722 (Sebastopol, Calif: O'Reilly and Associates, 2002)
723 ISBN 0-596-00289-0
724 http://regex.info
725 http://www.oreilly.com/catalog/regex2/
726 http://public.yahoo.com/~jfriedl/regex/ (for the first edition)
727
Knowing how to use "regular expressions" is essential to effective
use of most Unix tools. This book focuses on how regular
expressions can be best implemented in utilities such as perl, vi,
emacs, and awk, but also touches on sed. Friedl's home page
(above) gives links to other sites which help students learn to
master regular expressions. His site also gives a Perl script for
determining a syntactically valid e-mail address, using regexes:
735
736 http://public.yahoo.com/~jfriedl/regex/code.html
737
738 -----
739
740 _Awk und Sed_, by Helmut Herold.
741 (Bonn: Addison-Wesley, 1994; 288 pages)
742 2nd edition to be released in March 2003
743 ISBN 3-8273-2094-1
744 http://www.addison-wesley.de/main/main.asp?page=home/bookdetails&ProductID=37214
745
746 2.3.2. Mailing list
747
748 If you are interested in learning more about sed (its syntax, using
749 regular expressions, etc.) you are welcome to subscribe to a
750 sed-oriented mailing list. In fact, there are two mailing lists
751 about sed: one in English named "sed-users", moderated by Sven
752 Guckes; and one in Portuguese named "sed-BR" (for sed-Brazil),
753 moderated by Aurelio Marinho Jargas. The average volume of mail for
754 "sed-users" is about 35 messages a week; the average volume of mail
755 for "sed-BR" is about 15 messages a week.
756
757 sed-BR mailing list: http://br.groups.yahoo.com/group/sed-br/
758 sed-users mailing list: http://groups.yahoo.com/group/sed-users/
759
760 To subscribe to sed-users, send a blank message to:
761
762 sed-users-subscribe@yahoogroups.com
763
764 To unsubscribe from sed-users, send a blank message to:
765
766 sed-users-unsubscribe@yahoogroups.com
767
768 2.3.3. Tutorials, electronic text
769
770 The original users manual for sed, by Lee E. McMahon, from the
771 7th edition UNIX Manual (1978), with the classic "Kubla Khan"
772 example and tutorial, in formatted text format:
773 http://sed.sourceforge.net/grabbag/tutorials/sed_mcmahon.txt
774
775 The source code to the preceding manual. Use "troff -ms sed" to
776 print this file properly:
777 http://plan9.bell-labs.com/7thEdMan/vol2/sed
778 http://cm.bell-labs.com/7thEdMan/vol2/sed
779
780 "Do It With Sed", by Carlos Duarte
781 http://www.dbnet.ece.ntua.gr/~george/sed/OLD/sedtut_1.html
782
783 "Sed: How to use sed, a special editor for modifying files
784 automatically", by Bruce Barnett and General Electric Company
785 http://www.grymoire.com/Unix/Sed.html
786
787 U-SEDIT2.ZIP, by Mike Arst (16 June 1990)
788 ftp://ftp.cs.umu.se/pub/pc/u-sedit2.zip
789 ftp://ftp.uni-stuttgart.de/pub/systems/msdos/util/unixlike/u-sedit2.zip
790 ftp://sunsite.icm.edu.pl/vol/wojsyl/garbo/pc/editor/u-sedit2.zip
791 ftp://ftp.sogang.ac.kr/pub/msdos/garbo_pc/editor/u-sedit2.zip
792
793 U-SEDIT3.ZIP, by Mike Arst (24 Jan. 1992)
794 http://www.student.northpark.edu/pemente/sed/u-sedit3.zip
795 CompuServe DTPFORUM, "PC DTP Utilities" library, file SEDDOC.ZIP
796
797 Another sed FAQ
798 http://www.dreamwvr.com/sed-info/sed-faq.html
799
800 sed-tutorial, by Felix von Leitner
801 http://www.math.fu-berlin.de/~leitner/sed/tutorial.html
802
803 "Manipulating text with sed," chapter 14 of the SCO OpenServer
804 "Operating System Users Guide"
805 http://ou800doc.caldera.com/SHL_automate/CTOC-Manipulating_text_with_sed.html
806
807 "Combining the Bourne-shell, sed and awk in the UNIX environment
808 for language analysis," by Lothar Schmitt and Kiel Christianson.
809 This basic tutorial on the Bourne shell, sed and awk downloads as a
810 71-page PostScript file (compressed to 290K with gzip). You may
811 need to navigate down from the root to get the file.
812 ftp://ftp.u-aizu.ac.jp/u-aizu/doc/Tech-Report/1997/97-2-007.tar.gz
813 available upon request from Lothar Schmitt <lothar@u-aizu.ac.jp>
814
815 2.3.4. General web and ftp sites
816
817 http://sed.sourceforge.net/grabbag # Collected scripts
818 http://main.rtfiber.com.tw/~changyj/sed/ # Yao-Jen Chang
819 http://www.math.fu-berlin.de/~guckes/sed/ # Sven Guckes
820 http://www.math.fu-berlin.de/~leitner/sed/ # Felix von Leitner
821 http://www.dbnet.ece.ntua.gr/~george/sed/ # Yiorgos Adamopoulos
822 http://www.student.northpark.edu/pemente/sed/ # Eric Pement
823
824 http://spacsun.rice.edu/FAQ/sed.html
825 ftp://algos.inesc.pt/pub/users/cdua/scripts.tar.gz (sed and shell scripts)
826
827 "Handy One-Liners For Sed", compiled by Eric Pement. A large list
828 of 1-line sed commands which can be executed from the command line.
829 http://sed.sourceforge.net/sed1line.txt
830 http://www.student.northpark.edu/pemente/sed/sed1line.txt
831
832 "Handy One-Liners For Sed", translated to Portuguese
833 http://wmaker.lrv.ufsc.br/sed_ptBR.html
834
835 The Single UNIX Specification, Version 3 (technical man page)
836 http://www.opengroup.org/onlinepubs/007904975/utilities/sed.html
837
838 Getting started with sed
839 http://www.cs.hmc.edu/tech_docs/qref/sed.html
840
841 masm to gas converter
842 http://www.delorie.com/djgpp/faq/converting/asm2s-sed.html
843
844 mail2html.zip
845 http://www.crispen.org/src/#mail2html
846
847 sample uses of sed in batch files and scripts (Benny Pederson)
848 http://users.cybercity.dk/~bse26236/batutil/help/SED.HTM
849
850 dc.sed - the most complex and impressive sed script ever written.
851 This sed script by Greg Ubben emulates the Unix dc (desk
852 calculator), including base conversion, exponentiation, square
853 roots, and much more.
854 http://sed.sourceforge.net/grabbag/scripts/dc_overview.htm
855
856 If you should find other tutorials or scripts that should be added
857 to this document, please forward the URLs to the FAQ maintainer.
858
859 ------------------------------
860
861 3. TECHNICAL
862
863 3.1. More detailed explanation of basic sed
864
865 Sed takes a script of editing commands and applies each command, in
866 order, to each line of input. After all the commands have been
867 applied to the first line of input, that line is output. A second
868 input line is taken for processing, and the cycle repeats. Sed
869 scripts can address a single line by line number or by matching a
870 /RE pattern/ on the line. An exclamation mark '!' after a regex
871 ('/RE/!') or line number will select all lines that do NOT match
872 that address. Sed can also address a range of lines in the same
873 manner, using a comma to separate the 2 addresses.
874
875 $d # delete the last line of the file
876 /[0-9]\{3\}/p # print lines with 3 consecutive digits
877 5!s/ham/cheese/ # except on line 5, replace 'ham' with 'cheese'
878 /awk/!s/aaa/bb/ # unless 'awk' is found, replace 'aaa' with 'bb'
879 17,/foo/d # delete all lines from line 17 up to 'foo'
880
881 Following an address or address range, sed accepts curly braces
882 '{...}' so several commands may be applied to that line or to the
883 lines matched by the address range. On the command line, semicolons
884 ';' separate each instruction and must precede the closing brace.
885
886 sed '/Owner:/{s/yours/mine/g;s/your/my/g;s/you/me/g;}' file
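
As a small sketch of the brace syntax above, the following applies the
three substitutions only to the line containing "Owner:". The sample
input and the variable name `out` are made up for illustration.

```shell
# Two sample lines; only the first contains the address /Owner:/.
out=$(printf '%s\n' 'Owner: you and yours' 'Guest: you and yours' |
  sed '/Owner:/{s/yours/mine/g;s/your/my/g;s/you/me/g;}')
printf '%s\n' "$out"
# Owner: me and mine
# Guest: you and yours
```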
887
888 Range addresses operate differently depending on which version of
889 sed is used (see section 3.4, below). For further information on
890 using sed, consult the references in section 2.3, above.
891
892 3.1.1. Regular expressions on the left side of "s///"
893
894 All versions of sed support Basic Regular Expressions (BREs). For
895 the syntax of BREs, enter "man ed" at a Unix shell prompt. A
896 technical description of BREs from IEEE POSIX 1003.1-2001 and the
897 Single UNIX Specification Version 3 is available online at:
898 http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html#tag_09_03
899
900 Sed normally supports BREs plus '\n' to match a newline in the
901 pattern space, plus '\xREx' as equivalent to '/RE/', where 'x' is any
902 character other than a newline or another backslash.
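
The two additions to BREs described above can be tried as follows.
This is only a sketch; the sample paths and variable names are
illustrative, and the FAQ notes elsewhere that some seds lack the
\xREx form.

```shell
# \%...% uses '%' as the regex delimiter, so slashes need no escaping.
paths=$(printf '%s\n' /usr/local/bin /opt/bin | sed -n '\%/usr/local%p')
printf '%s\n' "$paths"     # /usr/local/bin

# '\n' matches the newline that N places between two joined lines.
joined=$(printf '%s\n' one two | sed 'N;s/\n/+/')
printf '%s\n' "$joined"    # one+two
```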
903
904 Some versions of sed support supersets of BREs, or "extended
905 regular expressions", which offer additional metacharacters for
906 increased flexibility. For additional information on extended REs
907 in GNU sed, see sections 3.7 ("GNU/POSIX extensions to regular
908 expressions") and 6.7.3 ("Special syntax in REs"), below.
909
910 Though not required by BREs, some versions of sed support \t to
911 represent a TAB, \r for carriage return, \xHH for direct entry of
912 hex codes, and so forth. Other versions of sed do not.
913
914 ssed (super-sed) introduced many new features for LHS pattern
915 matching, too many to give here. The complete list is found in
916 section 6.7.3.H ("ssed"), below.
917
918 3.1.2. Escape characters on the right side of "s///"
919
920 The right-hand side (the replacement part) in "s/find/replace/" is
921 almost always a string literal, with no interpolation of these
922 metacharacters:
923
924 . ^ $ [ ] { } ( ) ? + * |
925
926 Three things *are* interpolated: ampersand (&), backreferences, and
927 options for special seds. An ampersand on the RHS is replaced by
928 the entire expression matched on the LHS. There is _never_ any
929 reason to use grouping like this:
930
931 s/\(some-complex-regex\)/one two \1 three/
932
933 since you can do this instead:
934
935 s/some-complex-regex/one two & three/
936
937 To enter a literal ampersand on the RHS, type '\&'.
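
A brief illustration of both uses of the ampersand, with made-up
sample text:

```shell
# '&' stands for whatever the LHS matched (here, the run of digits).
wrapped=$(echo 'error 42' | sed 's/[0-9][0-9]*/<&>/')
echo "$wrapped"    # error <42>

# '\&' inserts a literal ampersand instead.
literal=$(echo 'R and D' | sed 's/ and / \& /')
echo "$literal"    # R & D
```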
938
939 Grouping and backreferences: All versions of sed support grouping
940 and backreferences on the LHS and backreferences only on the RHS.
941 Grouping allows a series of characters to be collected in a set,
942 indicating the boundaries of the set with \( and \). Then the set
943 can be designated to be repeated a certain number of times
944
945 \(like this\)* or \(like this\)\{5,7\}.
946
947 Groups can also be nested "\(like \(this\) is here\)" and may
948 contain any valid RE. Backreferences repeat the contents of a
949 particular group, using a backslash and a digit (1-9) for each
950 corresponding group. In other words, "/\(pom\)\1/" is another way
951 of writing "/pompom/". If groups are nested, backreference numbers
952 are counted by matching \( in strict left to right order. Thus, in
953 /..\(the \(word\)\) \("foo"\)../ the group \("foo"\) is referenced by
954 \3. Backreferences can be used in the LHS, the RHS, and in normal
955 RE addressing (see section 3.3). Thus,
956
957 /\(.\)\1\(.\)\2\(.\)\3/; # matches "bookkeeper"
958 /^\(.\)\(.\)\(.\)\3\2\1$/; # finds 6-letter palindromes
959
960 Seds differ in how they treat invalid backreferences where no
961 corresponding group occurs. To insert a literal ampersand or
962 backslash into the RHS, prefix it with a backslash: \& or \\.
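
The following sketch shows a backreference on the RHS and one in an
address. The sample words and variable names are assumptions chosen
for the demonstration.

```shell
# RHS backreferences: swap the two comma-separated fields.
swapped=$(echo 'Smith,John' | sed 's/^\([^,]*\),\(.*\)$/\2 \1/')
echo "$swapped"    # John Smith

# Backreferences in an address: print only lines that contain three
# doubled characters in a row (oo, kk, ee).
doubled=$(printf '%s\n' bookkeeper banana |
  sed -n '/\(.\)\1\(.\)\2\(.\)\3/p')
echo "$doubled"    # bookkeeper
```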
963
964 ssed, sed16, and sedmod permit additional options on the RHS. They
965 all support changing part of the replacement string to upper case
966 (\u or \U), lower case (\l or \L), or to end case conversion (\E).
967 Both sed16 and sedmod support awk-style word references ($1, $2,
968 $3, ...) and $0 to insert the entire line before conversion.
969
970 echo ab ghi | sed16 "s/.*/$0 - \U$2/" # prints "ab ghi - GHI"
971
972 *Note:* This feature of sed16 and sedmod will break sed scripts which
973 put a dollar sign and digit into the RHS. Though this is an unlikely
974 combination, it's worth remembering if you use other people's scripts.
975
976 3.1.3. Substitution switches
977
978 Standard versions of sed support 4 main flags or switches which may
979 be added to the end of an "s///" command. They are:
980
981 N - Replace the Nth match of the pattern on the LHS, where
982 N is an integer between 1 and 512. If N is omitted,
983 the default is to replace the first match only.
984 g - Global replace of all matches to the pattern.
985 p - Print the results to stdout, even if the -n switch is used.
986 w file - Write the pattern space to 'file' if a replacement was
987 done. If the file already exists when the script is
988 executed, it is overwritten. During script execution,
989 w appends to the file for each match.
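
The four standard flags can be compared side by side. This is a
sketch only: the sample line, the variable names, and the temporary
file created with mktemp are all illustrative.

```shell
line='a-a-a-a'
first=$(echo "$line" | sed 's/a/X/')      # no flag: 1st match -> X-a-a-a
third=$(echo "$line" | sed 's/a/X/3')     # N flag:  3rd match -> a-a-X-a
all=$(echo "$line" | sed 's/a/X/g')       # g flag:  all       -> X-X-X-X

# p flag with -n: print only the lines where a replacement was made.
changed=$(printf '%s\n' foo bar | sed -n 's/foo/FOO/p')     # FOO

# w flag: write changed lines to a file (the name runs to end of line).
tmp=$(mktemp)
printf '%s\n' foo bar | sed -n "s/foo/FOO/w $tmp"
logged=$(cat "$tmp")                                        # FOO
rm -f "$tmp"

printf '%s\n' "$first" "$third" "$all" "$changed" "$logged"
```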
990
991 GNU sed 3.02 and ssed also offer the /I switch for doing a
992 case-insensitive match. For example,
993
994 echo ONE TWO | gsed "s/one/unos/I" # prints "unos TWO"
995
996 GNU sed 4.x and ssed add the /M switch, to simplify working with
997 multi-line patterns: when it is used, ^ and $ also match at the
998 beginning and end of each line within the pattern space. \` and \'
999 remain available to match the start and end of the pattern space.
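
A short sketch of the /I and /M switches. It assumes that the command
named "sed" is GNU sed 4.x (or ssed); other versions will reject these
flags. Variable names are illustrative.

```shell
# /I: case-insensitive match (GNU sed assumed).
ci=$(echo 'ONE TWO' | sed 's/one/unos/I')
echo "$ci"     # unos TWO

# N joins two input lines, so the pattern space holds "a", newline, "b".
# Without /M, ^b$ must span the whole pattern space and does not match.
plain=$(printf '%s\n' a b | sed 'N;s/^b$/B/')
# With /M, ^ and $ also match around the embedded newline.
multi=$(printf '%s\n' a b | sed 'N;s/^b$/B/M')
printf '%s\n' "$plain" "$multi"
```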
1000
1001 ssed supports two more switches, /S and /X, when its Perl mode is
1002 used. They are described in detail in section 6.7.3.H, below.
1003
1004 3.1.4. Command-line switches
1005
1006 All versions of sed support two switches, -e and -n. Though sed
1007 usually separates multiple commands with semicolons (e.g., "H;d;"),
1008 certain commands cannot accept a semicolon command separator in
1009 some versions. These include :labels, 't', and 'b'. Such a command
1010 must be the last one in its -e expression. For example:
1011
1012 # The 'ta' means jump to label :a if last s/// returns true
1013 sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D' file
1014
1015 The -n switch turns off sed's default behavior of printing every
1016 line. With -n, lines are printed only if explicitly told to. In
1017 addition, for certain versions of sed, if an external script begins
1018 with "#n" as its first two characters, the output is suppressed
1019 (exactly as if -n had been entered on the command line). A list of
1020 which versions appears in section 6.7.2., below.
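
The effect of -n is easiest to see by running the same script with and
without it. The sample input and variable names are illustrative.

```shell
# Without -n, every line is printed once by default and 'p' prints
# the matching line a second time.
dup=$(printf '%s\n' apple berry | sed '/berry/p')
# With -n, only the explicit 'p' produces output (similar to grep).
only=$(printf '%s\n' apple berry | sed -n '/berry/p')
printf '%s\n' "$dup" "--" "$only"
```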
1021
1022 GNU sed 4.x and ssed support additional switches. -l (lowercase L),
1023 followed by a number, sets the default line-wrap length used by the
1024 'l' and 'L' commands (these implementations also accept a numeric
1025 argument to the commands themselves, which sets the length for that
1026 occurrence only).
1027
1028 -i activates in-place editing (see section 4.41.1, below). -s
1029 treats each file as a separate stream: sed by default joins all the
1030 files, so $ represents the last line of the last file; 15 means the
1031 15th line in the joined stream; and /abc/,/def/ might match across
1032 files.
1033
1034 When -s is used, however, all addresses refer to single files. For
1035 example, $ represents the last line of each input file; 15 means
1036 the 15th line of each input file; and /abc/,/def/ will be "reset"
1037 (in other words, sed will not execute the commands and start
1038 looking for /abc/ again) if a file ends before /def/ has been
1039 matched. Note that -i automatically activates this interpretation
1040 of addresses.
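
The difference that -s makes to the "$" address can be sketched with
two small files. This assumes that "sed" is GNU sed 4.x or ssed; the
file names and the temporary directory are made up for the example.

```shell
dir=$(mktemp -d)
printf '%s\n' a1 a2 > "$dir/a.txt"
printf '%s\n' b1 b2 > "$dir/b.txt"

# Default: the files form one stream, so $ is the last line of b.txt.
joined=$(sed -n '$p' "$dir/a.txt" "$dir/b.txt")
# With -s: $ matches the last line of each file in turn.
separate=$(sed -s -n '$p' "$dir/a.txt" "$dir/b.txt")

rm -rf "$dir"
printf '%s\n' "$joined" "--" "$separate"
```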
1041
1042 3.2. Common one-line sed scripts
1043
1044 A separate document of over 70 handy "one-line" sed commands is
1045 available at
1046 http://sed.sourceforge.net/sed1line.txt
1047
1048 Here are several common sed commands for one-line use. MS-DOS users
1049 should replace single quotes ('...') with double quotes ("...") in
1050 these examples. A specific filename usually follows the script,
1051 though the input may also come via piping or redirection.
1052
1053 # Double space a file
1054 sed G file
1055
1056 # Triple space a file
1057 sed 'G;G' file
1058
1059 # Under UNIX: convert DOS newlines (CR/LF) to Unix format
1060 sed 's/.$//' file # assumes that all lines end with CR/LF
1061 sed 's/^M$//' file # in bash/tcsh, press Ctrl-V then Ctrl-M
1062
1063 # Under DOS: convert Unix newlines (LF) to DOS format
1064 sed 's/$//' file # method 1
1065 sed -n p file # method 2
1066
1067 # Delete leading whitespace (spaces/tabs) from front of each line
1068 # (this aligns all text flush left). '^t' represents a true tab
1069 # character. Under bash or tcsh, press Ctrl-V then Ctrl-I.
1070 sed 's/^[ ^t]*//' file
1071
1072 # Delete trailing whitespace (spaces/tabs) from end of each line
1073 sed 's/[ ^t]*$//' file # see note on '^t', above
1074
1075 # Delete BOTH leading and trailing whitespace from each line
1076 sed 's/^[ ^t]*//;s/[ ^t]*$//' file # see note on '^t', above
1077
1078 # Substitute "foo" with "bar" on each line
1079 sed 's/foo/bar/' file # replaces only 1st instance in a line
1080 sed 's/foo/bar/4' file # replaces only 4th instance in a line
1081 sed 's/foo/bar/g' file # replaces ALL instances within a line
1082
1083 # Substitute "foo" with "bar" ONLY for lines which contain "baz"
1084 sed '/baz/s/foo/bar/g' file
1085
1086 # Delete all CONSECUTIVE blank lines from file except the first.
1087 # This method also deletes all blank lines from top and end of file.
1088 # (emulates "cat -s")
1089 sed '/./,/^$/!d' file # this allows 0 blanks at top, 1 at EOF
1090 sed '/^$/N;/\n$/D' file # this allows 1 blank at top, 0 at EOF
1091
1092 # Delete all leading blank lines at top of file (only).
1093 sed '/./,$!d' file
1094
1095 # Delete all trailing blank lines at end of file (only).
1096 sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file
1097
1098 # If a line ends with a backslash, join the next line to it.
1099 sed -e :a -e '/\\$/N; s/\\\n//; ta' file
1100
1101 # If a line begins with an equal sign, append it to the previous
1102 # line (and replace the "=" with a single space).
1103 sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D' file
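
Two of the one-liners above, run against sample input. As an
alternative to typing Ctrl-V Ctrl-I, this sketch lets the shell's
printf supply the tab character; that substitution, the variable names,
and the sample text are assumptions, not part of the original list.

```shell
# Build a real tab character and splice it into the bracket expressions.
tab=$(printf '\t')
trimmed=$(printf ' \tpadded text\t \n' |
  sed "s/^[ $tab]*//;s/[ $tab]*\$//")
echo "[$trimmed]"    # [padded text]

# Join a line ending in a backslash with the line that follows it.
joined=$(printf '%s\n' 'first \' 'second' 'third' |
  sed -e :a -e '/\\$/N; s/\\\n//; ta')
printf '%s\n' "$joined"
# first second
# third
```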
1104
1105 3.3. Addressing and address ranges
1106
1107 Sed commands may have an optional "address" or "address range"
1108 prefix. If there is no address or address range given, then the
1109 command is applied to all the lines of the input file or text
1110 stream. Three commands cannot take an address prefix:
1111
1112 - labels, used to branch or jump within the script
1113 - the close brace, '}', which ends the '{' "command"
1114 - the '#' comment character, also technically a "command"
1115
1116 An address can be a line number (such as 1, 5, 37, etc.), a regular
1117 expression (written in the form /RE/ or \xREx where 'x' is any
1118 character other than '\' and RE is the regular expression), or the
1119 dollar sign ($), representing the last line of the file. An
1120 exclamation mark (!) after an address or address range will apply
1121 the command to every line EXCEPT the ones named by the address. A
1122 null regex ("//") will be replaced by the last regex which was
1123 used. Also, some seds do not support \xREx as regex delimiters.
1124
1125 5d # delete line 5 only
1126 5!d # delete every line except line 5
1127 /RE/s/LHS/RHS/g # substitute only if RE occurs on the line
1128 /^$/b label # if the line is blank, branch to ':label'
1129 /./!b label # ... another way to write the same command
1130 \%.%!b label # ... yet another way to write this command
1131 $!N # on all lines but the last, get the Next line
1132
1133 Note that an embedded newline can be represented in an address by
1134 the symbol \n, but this syntax is needed only if the script puts 2
1135 or more lines into the pattern space via the N, G, or other
1136 commands. The \n symbol does *not* match the newline at an
1137 end-of-line because when sed reads each line into the pattern space
1138 for processing, it strips off the trailing newline, processes the
1139 line, and adds a newline back when printing the line to standard
1140 output. To match the end-of-line, use the '$' metacharacter, as
1141 follows:
1142
1143 /tape$/ # matches the word 'tape' at the end of a line
1144 /tape$deck/ # matches the word 'tape$deck' with a literal '$'
1145 /tape\ndeck/ # matches 'tape' and 'deck' with a newline between
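
These three cases can be checked directly. The sample words and
variable names are illustrative.

```shell
# '$' anchors the match at the end of the line.
end=$(printf '%s\n' 'red tape' 'tape deck' | sed -n '/tape$/p')
# '\n' finds nothing when each line is read on its own, because the
# trailing newline is not part of the pattern space.
none=$(printf '%s\n' tape deck | sed -n '/tape\n/p')
# After N has pulled in the next line, '\n' matches the embedded newline.
both=$(printf '%s\n' tape deck | sed -n '$!N;/tape\ndeck/p')
printf '%s\n' "$end" "[$none]" "$both"
```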
1146
1147 The following sed commands usually accept *only* a single address.
1148 All other commands (except labels, '}', and '#') accept both single
1149 addresses and address ranges.
1150
1151 = print to stdout the line number of the current line
1152 a after printing the current line, append "text" to stdout
1153 i before printing the current line, insert "text" to stdout
1154 q quit after the current line is matched
1155 r file prints contents of "file" to stdout after line is matched
1156
1157 Note that we said "usually." If you need to apply the '=', 'a',
1158 'i', or 'r' commands to each and every line within an address
1159 range, this behavior can be coerced by the use of braces. Thus,
1160 "1,9=" is an invalid command, but "1,9{=;}" will print each line
1161 number followed by its line for the first 9 lines (and then print
1162 the rest of the file normally).
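
A scaled-down sketch of the brace technique, using a range of 2 lines
instead of 9. The sample input and variable name are illustrative.

```shell
# '=' prints the line number before each line in the range 1,2;
# the third line is printed without a number.
numbered=$(printf '%s\n' alpha beta gamma | sed '1,2{=;}')
printf '%s\n' "$numbered"
# 1
# alpha
# 2
# beta
# gamma
```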
1163
1164 Address ranges occur in the form
1165
1166 <address1>,<address2> or <address1>,<address2>!
1167
1168 where the address can be a line number or a standard /regex/.
1169 <address2> can also be a dollar sign, indicating the end of file.
1170 Under GNU sed 3.02+, ssed, and sed15+, <address2> may also be a
1171 notation of the form +num, indicating the next _num_ lines after
1172 <address1> is matched.
1173
1174 Address ranges are:
1175
1176 (1) Inclusive. The range "/From here/,/eternity/" matches all the
1177 lines containing "From here" up to and including the line
1178 containing "eternity". It will not stop on the line just prior to
1179 "eternity". (If you don't like this, see section 4.24.)
1180
1181 (2) Plenary. They always match full lines, not just parts of lines.
1182 In other words, a command to change or delete an address range will
1183 change or delete whole lines; it won't stop in the middle of a
1184 line.
1185
1186 (3) Multi-linear. Address ranges normally match 2 lines or more.
1187 The second address will never match the same line the first address
1188 did; therefore a valid address range always spans at least two
1189 lines, with these exceptions which match only one line:
1190
1191 - if the first address matches the last line of the file
1192 - if using the syntax "/RE/,3" and /RE/ occurs only once in the
1193 file at line 3 or below
1194 - if using HHsed v1.5. See section 3.4.
1195
1196 (4) Minimalist. In address ranges with /regex/ as <address2>, the
1197 range "/foo/,/bar/" will stop at the first "bar" it finds, provided
1198 that "bar" occurs on a line below "foo". If the word "bar" occurs
1199 on several lines below the word "foo", the range will match all the
1200 lines from the first "foo" up to the first "bar". It will not
1201 continue hopping ahead to find more "bar"s. In other words, unlike
1202 regular expressions, address ranges are not "greedy."
1203
1204 (5) Repeating. An address range will try to match more than one
1205 block of lines in a file. However, the blocks cannot nest. In
1206 addition, a second match will not "take" the last line of the
1207 previous block. For example, given the following text,
1208
1209 start
1210 stop start
1211 stop
1212
1213 the sed command '/start/,/stop/d' will only delete the first two
1214 lines. It will not delete all 3 lines.
1215
1216 (6) Relentless. If the address range finds a "start" match but
1217 doesn't find a "stop", it will match every line from "start" to the
1218 end of the file. Thus, beware of the following behaviors:
1219
1220 /RE1/,/RE2/ # If /RE2/ is not found, matches from /RE1/ to the
1221 # end-of-file.
1222
1223 20,/RE/ # If /RE/ is not found, matches from line 20 to the
1224 # end-of-file.
1225
1226 /RE/,30 # If /RE/ occurs any time after line 30, each
1227 # occurrence will be matched in sed15+, sedmod, and
1228 # GNU sed v3.02+. GNU sed v2.05 and 1.18 will match
1229 # from the 2nd occurrence of /RE/ to the end-of-file.
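
Several of the properties listed above can be observed with short
inputs. This is a sketch with made-up sample lines and variable names.

```shell
# (1) Inclusive and (4) minimalist: the range runs from "foo" through
# the FIRST "bar" after it; the later "bar" is not part of the range.
minimal=$(printf '%s\n' x foo a bar b bar | sed -n '/foo/,/bar/p')

# (5) Repeating: the three-line sample from the text. The "start" on
# line 2 cannot open a new block, so line 3 survives.
repeat=$(printf '%s\n' start 'stop start' stop | sed '/start/,/stop/d')

# (6) Relentless: no "stop" exists, so the range runs to end-of-file.
relentless=$(printf '%s\n' keep start a b | sed '/start/,/stop/d')

printf '%s\n' "$minimal" "--" "$repeat" "--" "$relentless"
```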
1230
1231 If these behaviors seem strange, remember that they occur because
1232 sed does not look "ahead" in the file. Doing so would stop sed from
1233 being a stream editor and have adverse effects on its efficiency.
1234 If these behaviors are undesirable, they can be circumvented or
1235 corrected by the use of nested testing within braces. The following
1236 scripts work under GNU sed 3.02:
1237
1238 # Execute your_commands on range "/RE1/,/RE2/", but if /RE2/ is
1239 # not found, do nothing.
1240 /RE1/{:a;N;/RE2/!ba;your_commands;}
1241
1242 # Execute your_commands on range "20,/RE/", but if /RE/ is not
1243 # found, do nothing.
1244 20{:a;N;/RE/!ba;your_commands;}
1245
1246 As a side note, once we've used N to "slurp" lines together to test
1247 for the ending expression, the pattern space will have gathered
1248 many lines (possibly thousands) together and concatenated them as a
1249 single expression, with the \n sequence marking line breaks. The
1250 REs *within* the pattern space may have to be modified (e.g., you
1251 must write '/\nStart/' instead of '/^Start/' and '/[^\n]*/' instead
1252 of '/.*/') and other standard sed commands will be unavailable or
1253 difficult to use.
1254
1255 # Execute your_commands on range "/RE/,30", but if /RE/ occurs
1256 # on line 31 or later, do not match it.
1257 1,30{/RE/,$ your_commands;}
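
Here is a sketch of two of these workarounds with concrete commands in
place of "your_commands". It assumes GNU sed, as the scripts above do;
BEGIN, END, the sample lines, and the variable names are illustrative,
and the "/RE/,30" case is shrunk to "/RE/,3".

```shell
# Gather BEGIN..END into one pattern space, then join it with spaces.
found=$(printf '%s\n' x BEGIN a END y |
  sed '/BEGIN/{:a;N;/END/!ba;s/\n/ /g;}')
# x
# BEGIN a END
# y

# If END never appears, N runs out of input. GNU sed then prints the
# gathered lines unchanged and exits, so the s command is never reached.
missing=$(printf '%s\n' x BEGIN a | sed '/BEGIN/{:a;N;/END/!ba;s/\n/ /g;}')

# Only an RE found within lines 1-3 starts the inner range; the RE on
# line 5 is ignored.
limited=$(printf '%s\n' a RE b c RE d | sed '1,3{/RE/,$ s/^/> /;}')

printf '%s\n' "$found" "--" "$missing" "--" "$limited"
```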
1258
1259 For related suggestions on using address ranges, see sections 4.2,
1260 4.15, and 4.19 of this FAQ. Also, note the following section.
1261
1262 3.4. Address ranges in GNU sed and HHsed
1263
1264 (1) GNU sed 3.02+, ssed, and sed15+ all support address ranges like:
1265
1266 /regex/,+5
1267
1268 which match /regex/ plus the next 5 lines (or EOF, whichever comes
1269 first).
1270
1271 (2) GNU sed v3.02.80 (and above) and ssed support address ranges of:
1272
1273 0,/regex/
1274
1275 as a special case to permit matching /regex/ if it occurs on the
1276 first line. This syntax permits a range expression that matches
1277 every line from the top of the file to the first instance of
1278 /regex/, even if /regex/ is on the first line.
1279
1280 (3) HHsed (sed15) has an exceptional way of implementing
1281
1282 /regex1/,/regex2/
1283
1284 If /regex1/ and /regex2/ both occur on the *same* line, HHsed will match
1285 that single line. In other words, an address range block can
1286 consist of just one line. HHsed will then look for the next
1287 occurrence of /regex1/ to begin the block again.
1288
1289 Every other version of sed (including sed16) requires 2 lines to
1290 match an address range, and thus /regex1/ and /regex2/ cannot
1291 successfully match just one line. See also the comments at
1292 section 7.9.4, below.
1293
1294 (4) BEGIN~STEP selection: ssed and GNU sed (v2.05 and above) offer
1295 a form of addressing called "BEGIN~STEP selection". This is *not* a
1296 range address, which selects an inclusive block of consecutive
1297 lines from /start/ to /finish/. But I think it seems to belong here.
1298
1299 Given an expression of the form "M~N", where M and N are integers,
1300 GNU sed and ssed will select every Nth line, beginning at line M.
1301 (With gsed v2.05, M had to be less than N, but this restriction is
1302 no longer necessary). Both M and N may equal 0 ("0~0" selects every
1303 line). These examples illustrate the syntax:
1304
1305 sed '1~3d' file # delete every 3rd line, starting with line 1
1306 # deletes lines 1, 4, 7, 10, 13, 16, ...
1307
1308 sed '0~3d' file # deletes lines 3, 6, 9, 12, 15, 18, ...
1309
1310 sed -n '2~5p' file # print every 5th line, starting with line 2
1311 # prints lines 2, 7, 12, 17, 22, 27, ...
1312
1313 (5) Finally, GNU sed v2.05 has a bug in range addressing (see
1314 section 7.5), which was fixed in the higher versions.
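
The GNU-specific address forms in items (1), (2), and (4) can be tried
as follows. The sketch assumes that "sed" is a recent GNU sed; the
sample data and variable names are illustrative.

```shell
nums=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10)

# (1) /regex/,+N : the matching line plus the next 2 lines.
plus=$(echo "$nums" | sed -n '/3/,+2p')          # 3 4 5

# (4) BEGIN~STEP : every 5th line, starting with line 2.
step=$(echo "$nums" | sed -n '2~5p')             # 2 7

# (2) 0,/regex/ lets the range end on line 1, so only the first "foo"
# is changed. With 1,/regex/ the end is first tested on line 2, so the
# range extends to the second "foo".
zero=$(printf '%s\n' foo bar foo | sed '0,/foo/s/foo/X/')   # X bar foo
one=$(printf '%s\n' foo bar foo | sed '1,/foo/s/foo/X/')    # X bar X

printf '%s\n' "$plus" "--" "$step" "--" "$zero" "--" "$one"
```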
1315
1316
1317 3.5. Debugging sed scripts
1318
1319 The following two debuggers should make it easier to understand how
1320 sed scripts operate. They can save hours of grief when trying to
1321 determine the problems with a sed script.
1322
1323 (1) sd (sed debugger), by Brian Hiles
1324
1325 This debugger runs under a Unix shell, is powerful, and is easy to
1326 use. sd has conditional breakpoints and spypoints of the pattern
1327 space and hold space, on any scope defined by regex match and/or
1328 script line number. It can be semi-automated, can save diagnostic
1329 reports, and shows potential problems with a sed script before it
1330 tries to execute it. The script is robust and requires the Unix
1331 shell utilities plus the Bourne shell or Korn shell to execute.
1332
1333 http://sed.sourceforge.net/grabbag/scripts/sd.ksh.txt (2003)
1334 http://sed.sourceforge.net/grabbag/scripts/sd.sh.txt (1998)
1335
1336 (2) sedsed, by Aurelio Jargas
1337
1338 This debugger requires Python to run, and it uses your own
1339 version of sed, whatever that may be. It displays the current input
1340 line, the pattern space, and the hold space, before and after each
1341 sed command is executed.
1342
1343 http://sedsed.sourceforge.net
1344
1345
1346 3.6. Notes about s2p, the sed-to-perl translator
1347
1348 s2p (sed to perl) is a Perl program to convert sed scripts into the
1349 Perl programming language; it is included with many versions of
1350 Perl. These problems have been found when using s2p:
1351
1352 (1) Doesn't recognize the semicolon properly after s/// commands.
1353
1354 s/foo/bar/g;
1355
1356 (2) Doesn't trim trailing whitespace after s/// commands. Even lone
1357 trailing spaces, without comments, produce an error.
1358
1359 (3) Doesn't handle multiple commands within braces. E.g.,
1360
1361 1,4{=;G;}
1362
1363 will produce perl code with missing braces, and miss the second "G"
1364 command as well. In fact, any commands after the first one are
1365 missed in the perl output script, and the output perl script will
1366 also contain mismatched braces.
1367
1368 3.7. GNU/POSIX extensions to regular expressions
1369
1370 GNU sed supports "character classes" in addition to regular
1371 character sets, such as [0-9A-F]. Like regular character sets,
1372 character classes represent any single character within a set.
1373
1374 "Character classes are a new feature introduced in the POSIX
1375 standard. A character class is a special notation for describing
1376 lists of characters that have a specific attribute, but where the
1377 actual characters themselves can vary from country to country
1378 and/or from character set to character set. For example, the notion
1379 of what is an alphabetic character differs in the USA and in
1380 France." [quoted from the docs for GNU awk v3.1.0.]
1381
1382 Though character classes don't generally conserve space on the
1383 line, they help make scripts portable for international use. The
1384 equivalent character sets _for U.S. users_ follow:
1385
1386 [[:alnum:]] - [A-Za-z0-9] Alphanumeric characters
1387 [[:alpha:]] - [A-Za-z] Alphabetic characters
1388 [[:blank:]] - [ \x09] Space or tab characters only
1389 [[:cntrl:]] - [\x00-\x1F\x7F] Control characters
1390 [[:digit:]] - [0-9] Numeric characters
1391 [[:graph:]] - [!-~] Printable and visible characters
1392 [[:lower:]] - [a-z] Lower-case alphabetic characters
1393 [[:print:]] - [ -~] Printable (non-Control) characters
1394 [[:punct:]] - [!-/:-@[-`{-~] Punctuation characters
1395 [[:space:]] - [ \t\n\r\v\f] All whitespace chars
1396 [[:upper:]] - [A-Z] Upper-case alphabetic characters
1397 [[:xdigit:]] - [0-9a-fA-F] Hexadecimal digit characters
1398
1399 Note that [[:graph:]] does not match the space " ", but [[:print:]]
1400 does. Some character classes may (or may not) match characters in
1401 the high ASCII range (ASCII 128-255 or 0x80-0xFF), depending on
1402 which C library was used to compile sed. For non-English languages,
1403 [[:alpha:]] and other classes may also match high ASCII characters.
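
A short sketch of character classes in use. LC_ALL=C is set here so
that the classes behave as in the U.S. table above; the sample text and
variable names are illustrative.

```shell
# Replace every digit with '#'.
masked=$(echo 'Room 101, Floor 3' | LC_ALL=C sed 's/[[:digit:]]/#/g')
echo "$masked"      # Room ###, Floor #

# Collapse each run of spaces and tabs into a single space.
squeezed=$(printf 'a \t b\n' | LC_ALL=C sed 's/[[:space:]][[:space:]]*/ /g')
echo "$squeezed"    # a b
```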
1404
1405 ------------------------------
1406
1407 4. EXAMPLES
1408
1409 ONE-CHARACTER QUESTIONS
1410
1411 4.1. How do I insert a newline into the RHS of a substitution?
1412
1413 Several versions of sed permit '\n' to be typed directly into the
1414 RHS, which is then converted to a newline on output: ssed,
1415 gsed302a+, gsed103 (with the -x switch), sed15+, sedmod, and
1416 UnixDOS sed. The _easiest_ solution is to use one of these
1417 versions.
1418
1419 For other versions of sed, try one of the following:
1420
1421 (a) If typing the sed script from a Bourne shell, use one backslash
1422 "\" if the script uses 'single quotes' or two backslashes "\\" if
1423 the script requires "double quotes". In the example below, note
1424 that the leading '>' on the 2nd line is generated by the shell to
1425 prompt the user for more input. The user types in slash,
1426 single-quote, and then ENTER to terminate the command:
1427
1428 [sh-prompt]$ echo twolines | sed 's/two/& new\
1429 >/'
1430 two new
1431 lines
1432 [sh-prompt]$
1433
1434 (b) Use a script file with one backslash '\' in the script,
1435 immediately followed by a newline. This will embed a newline into
1436 the "replace" portion. Example:
1437
1438 sed -f newline.sed files
1439
1440 # newline.sed
1441 s/twolines/two new\
1442 lines/g
1443
1444 Some versions of sed may not need the trailing backslash. If so,
1445 remove it.
1446
1447 (c) Insert an unused character and pipe the output through tr:
1448
1449 echo twolines | sed 's/two/& new=/' | tr "=" "\n" # produces
1450 two new
1451 lines
1452
1453 (d) Use the "G" command:
1454
1455 G appends a newline, plus the contents of the hold space to the end
1456 of the pattern space. If the hold space is empty, a newline is
1457 appended anyway. The newline is stored in the pattern space as "\n"
1458 where it can be addressed by grouping "\(...\)" and moved in the
1459 RHS. Thus, to change the "twolines" example used earlier, the
1460 following script will work:
1461
1462 sed '/twolines/{G;s/\(two\)\(lines\)\(\n\)/\1\3\2/;}'
1463
1464 (e) Inserting full lines, not breaking lines up:
1465
1466 If one is not *changing* lines but only inserting complete lines
1467 before or after a pattern, the procedure is much easier. Use the
1468 "i" (insert) or "a" (append) command, making the alterations by an
1469 external script. To insert "This line is new" BEFORE each line
1470 matching a regex:
1471
1472 /RE/i This line is new # HHsed, sedmod, gsed 3.02a
1473 /RE/{x;s/$/This line is new/;G;} # other seds
1474
1475 The two examples above are intended as "one-line" commands entered
1476 from the console. If using a sed script, "i\" immediately followed
1477 by a literal newline will work on all versions of sed. Furthermore,
1478 the command "s/$/This line is new/" will only work if the hold
1479 space is already empty (which it is by default).
1480
1481 To append "This line is new" AFTER each line matching a regex:
1482
1483 /RE/a This line is new # HHsed, sedmod, gsed 3.02a
1484 /RE/{G;s/$/This line is new/;} # other seds
1485
1486 To append 2 blank lines after each line matching a regex:
1487
1488 /RE/{G;G;} # assumes the hold space is empty
1489
1490 To replace each line matching a regex with 5 blank lines:
1491
1492 /RE/{s/.*//;G;G;G;G;} # assumes the hold space is empty
1493
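The "i\" and "a\" forms described under (e) can be tried from a
Bourne-type shell by letting the quoted script span two lines. The
following is only a sketch: the function names and the sample input
are made up for illustration, and the RE and text arguments are
assumed to contain no slashes or backslashes.

```shell
# insert_before RE TEXT: print TEXT before each input line matching RE.
# Uses "i\" followed by a literal newline, the form described above
# as working in script files on all versions of sed.
insert_before() {
    sed "/$1/i\\
$2"
}

# append_after RE TEXT: print TEXT after each input line matching RE.
append_after() {
    sed "/$1/a\\
$2"
}

printf 'alpha\nbeta\n' | insert_before beta 'This line is new'
printf 'alpha\nbeta\n' | append_after beta 'This line is new'
```

Inside the double quotes the shell turns "\\" into a single
backslash and keeps the newline, so sed receives "i\" followed by a
newline and the text.
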
(f) Use the "y///" command if possible:

On some Unix versions of sed (not GNU sed!), though the s///
command won't accept '\n' in the RHS, the y/// command does. If
your Unix sed supports it, a newline after "aaa" can be inserted
this way (which is not portable to GNU sed or other seds):

    s/aaa/&~/; y/~/\n/;    # assuming no other '~' is on the line!

4.2. How do I represent control-codes or nonprintable characters?

Several versions of sed support the notation \xHH, where "HH" are
two hex digits, 00-FF: ssed, GNU sed v3.02.80 and above, GNU sed
v1.03, sed16 and sed15 (HHsed). Try to use one of those versions.

Sed is not intended to process binary or object code, and files
which contain nulls (0x00) will usually generate errors in most
versions of sed. The latest versions of GNU sed and ssed are an
exception; they permit nulls in the input files and also in
regexes.

On Unix platforms, the 'echo' command may allow insertion of octal
or hex values, e.g., `echo "\0nnn"` or `echo -n "\0nnn"`. The echo
command may also support syntax like '\\b' or '\\t' for backspace
or tab characters. Check the man pages to see what syntax your
version of echo supports. Some versions support the following:

    # replace 0x1A (32 octal) with ASCII letters
    sed 's/'`echo "\032"`'/Ctrl-Z/g'

    # note the 3 backslashes in the command below
    sed "s/.`echo \\\b`//g"

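Where the behavior of echo is uncertain, the POSIX printf utility
is usually a more predictable way to produce the control character.
The sketch below is an assumption-laden illustration rather than
part of the FAQ's own examples; the variable and function names are
invented for it.

```shell
# Build the control characters once with printf. Octal escapes in the
# printf format string are specified by POSIX, unlike echo's escapes.
ctrl_z=$(printf '\032')   # 0x1A
tab=$(printf '\t')

# show_ctrl_z: replace each Ctrl-Z byte with the visible text "Ctrl-Z".
show_ctrl_z() {
    sed "s/$ctrl_z/Ctrl-Z/g"
}

# show_tabs: replace each tab with the visible text "<TAB>".
show_tabs() {
    sed "s/$tab/<TAB>/g"
}

printf 'end of file\032\n' | show_ctrl_z
printf 'a\tb\n' | show_tabs
```
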
4.3. How do I convert files with toggle characters, like +this+, to
     look like [i]this[/i]?

Input files, especially message-oriented text files, often contain
toggle characters for emphasis, like ~this~, *this*, or =this=. Sed
can make the same input pattern produce alternating output each
time it is encountered. Typical needs might be to generate HTML
codes or print codes for boldface, italic, or underscore. This
script accommodates multiple occurrences of the toggle pattern on
the same line, as well as cases where the pattern starts on one
line and finishes several lines later, even at the end of the file:

    # sed script to convert +this+ to [i]this[/i]
    :a
    /+/{ x;        # If "+" is found, switch hold and pattern space
      /^ON/{       # If "ON" is in the (former) hold space, then ..
        s///;      # .. delete it
        x;         # .. switch hold space and pattern space back
        s|+|[/i]|; # .. turn the next "+" into "[/i]"
        ba;        # .. jump back to label :a and start over
      }
    s/^/ON/;       # Else, "ON" was not in the hold space; create it
    x;             # Switch hold space and pattern space
    s|+|[i]|;      # Turn the first "+" into "[i]"
    ba;            # Branch to label :a to find another pattern
    }
    #---end of script---

This script uses the hold space to create a "flag" to indicate
whether the toggle is ON or not. We have added remarks to
illustrate the script logic, but in most versions of sed remarks
are not permitted after 'b'ranch commands or labels.

If you are sure that the +toggle+ characters never cross line
boundaries (i.e., never begin on one line and end on another), this
script can be reduced to one line:

    s|+\([^+][^+]*\)+|[i]\1[/i]|g

If your toggle pattern contains regex metacharacters (such as '*'
or perhaps '+' or '?'), remember to quote them with backslashes.

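Because remarks after labels and branch commands are not accepted
everywhere, it may help to see the toggle script with one command
per line and no trailing remarks. The following is a sketch of that
form, wrapped in a shell function whose name is invented here; the
sample input is hypothetical.

```shell
# toggle_italics: convert +this+ to [i]this[/i], even when the two
# "+" characters are on different lines. Same commands as the
# commented script, one per line.
toggle_italics() {
    sed ':a
/+/{
x
/^ON/{
s///
x
s|+|[/i]|
ba
}
s/^/ON/
x
s|+|[i]|
ba
}'
}

printf 'one +two\nthree+ four +five+\n' | toggle_italics
```

With this input, the "+" on the first line opens the markup and the
first "+" on the second line closes it, so the ON flag in the hold
space is carried from one line to the next.
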
CHANGING STRINGS

4.10. How do I perform a case-insensitive search?

Several versions of sed support case-insensitive matching: ssed and
GNU sed v3.02+ (with I flag after s/// or /regex/); sedmod with the
-i switch; and sed16 (which supports both types of switches).

With other versions of sed, case-insensitive searching is awkward,
so people may use awk or perl instead, since these programs have
options for case-insensitive searches. In gawk, use "BEGIN
{IGNORECASE=1}" and in perl, "/regex/i". For other seds, here are
three solutions:

Solution 1: convert everything to upper case and search normally

    # sed script, solution 1
    h;          # copy the original line to the hold space
                # convert the pattern space to solid caps
    y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
                # now we can search for the word "CARLOS"
    /CARLOS/ {
         # add or insert lines. Note: "s/.../.../" will not work
         # here because we are searching a modified pattern
         # space and are not printing the pattern space.
    }
    x;          # get back the original pattern space
                # the original pattern space will be printed
    #---end of sed script---

Solution 2: search for both cases

Often, proper names will occur in all lower-case ("unix"), with an
initial capital letter ("Unix"), or in solid caps ("UNIX"). There
may be no need to search for every possibility.

    /UNIX/b match
    /[Uu]nix/b match

Solution 3: search for all possible cases

    # If you must, search for any possible combination
    /[Cc][Aa][Rr][Ll][Oo][Ss]/ { ... }

Bear in mind that as the pattern length increases, this solution
can become an order of magnitude slower than Solution 1, at least
with some implementations of sed.

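As an illustration of Solution 1, the sketch below prints matching
lines in their original spelling, much like a case-insensitive
grep. The function name and the sample input are assumptions made
for this example; the technique is the h/y/x sequence shown above.

```shell
# grep_nocase: print, unchanged, every line that contains "carlos"
# in any mix of upper and lower case.
grep_nocase() {
    sed -n 'h
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
/CARLOS/{
x
p
}'
}

printf 'Hello Carlos\nnobody here\nCARLOS!\ncArLoS too\n' | grep_nocase
```

The "h" saves the original line, "y" upper-cases the working copy,
and "x" brings the original back just before "p" prints it.
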
4.11. How do I match only the first occurrence of a pattern?

(1) The general solution is to use GNU sed or ssed, with one of
these range expressions. The first script ("print only the first
match") works with any version of sed:

    sed -n '/RE/{p;q;}' file       # print only the first match
    sed '0,/RE/{//d;}' file        # delete only the first match
    sed '0,/RE/s//to_that/' file   # change only the first match

(2) If you cannot use GNU sed and if you *know* the pattern will
not occur on the first line, this will work:

    sed '1,/RE/{//d;}' file        # delete only the first match
    sed '1,/RE/s//to_that/' file   # change only the first match

(3) If you cannot use GNU sed and the pattern *might* occur on the
first line, use one of the following commands (credit for short GNU
script goes to Donald Bruce Stewart):

    # delete (one way)
    sed '/RE/{x;/Y/!{s/^/Y/;h;d;};x;}' file
    # delete (another way)
    sed -e '/RE/{$d;N;s/^.*\n//;:a' -e '$!N;$!ba' -e '}' file
    # same script, GNU sed
    sed '/RE/{$d;N;s/^.*\n//;:a;$!N;$!ba;}' file
    # change
    sed -e '/RE/{s//to_that/;:a' -e '$!N;$!ba' -e '}' file

In the second and third scripts the matching line is removed with
"N" and "s///" rather than with "d", because "d" starts the next
cycle at once and would skip the loop that reads in the rest of the
file.

Still another solution, using a flag in the hold space. This is
portable to all seds and works if the pattern is on the first line:

    # sed script to change "foo" to "bar" only on the first occurrence
    1{x;s/^/first/;x;}
    /foo/{x;/first/{s///;x;s/foo/bar/;x;};x;}
    #---end of script---

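The "change" one-liner from (3) can be checked on a few lines of
sample input. This is a sketch with an invented function name and
hypothetical data, using "foo" and "bar" in place of RE and
to_that.

```shell
# change_first: replace "foo" with "bar" on the first matching line
# only, then read the rest of the input into the pattern space so
# that no later line is tested again.
change_first() {
    sed -e '/foo/{s//bar/;:a' -e '$!N;$!ba' -e '}'
}

printf 'foo 1\nx\nfoo 2\n' | change_first
printf 'x\nfoo 1\nfoo 2\n' | change_first
```

Note that this approach holds everything after the first match in
memory at once, which may matter for very large files.
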
4.12. How do I parse a comma-delimited (CSV) data file?

Comma-delimited data files can come in several forms, requiring
increasing levels of complexity in parsing and handling. They are
often referred to as CSV files (for "comma separated values") and
occasionally as SDF files (for "standard data format"). Note that
some vendors use "SDF" to refer to variable-length records with
comma-separated fields which are "double-quoted" if they contain
character values, while other vendors use "SDF" to designate
fixed-length records with fixed-length, nonquoted fields! (For help
with fixed-length fields, see question 4.13.)

The term "CSV" became a de-facto standard when Microsoft Excel used
it as an optional output file format.

Here are 4 different forms you may encounter in comma-delimited data:

(a) No quotes, no internal commas

    1001,John Smith,PO Box 123,Chicago,IL,60699
    1002,Mary Jones,320 Main,Denver,CO,84100

(b) Like (a), with quotes around each field

    "1003","John Smith","PO Box 123","Chicago","IL","60699"
    "1004","Mary Jones","320 Main","Denver","CO","84100"

(c) Like (b), with embedded commas

    "1005","Tom Hall, Jr.","61 Ash Ct.","Niles","OH","44446"
    "1006","Bob Davis","429 Pine, Apt. 5","Boston","MA","02128"

(d) Like (c), with embedded commas and quotes

    "1007","Sue "Red" Smith","19 Main","Troy","MI","48055"
    "1008","Joe "Hey, guy!" Hall","POB 44","Reno","NV","89504"

In each example above, we have 6 fields and 5 commas which function
as field separators. Case (c) is a very typical form of these data
files, with double quotes used to enclose each field and to protect
internal commas (such as "Tom Hall, Jr.") from interpretation as
field separators. However, many times the data may include both
embedded quotation marks as well as embedded commas, as seen by
case (d), above.

Case (d) is the closest to Microsoft CSV format. *However*, the
Microsoft CSV format allows embedded newlines within a
double-quoted field. If embedded newlines within fields are a
possibility for your data, you should consider using something
other than sed to work with the data file.

Before handling a comma-delimited data file, make sure that you
fully understand its format and check the integrity of the data.
Does each line contain the same number of fields? Should certain
fields be composed only of numbers or of two-letter state
abbreviations in all caps? Sed (or awk or perl) should be used to
validate the integrity of the data file before you attempt to alter
it or extract particular fields from the file.

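One possible integrity check for case (a) is to list every line
that does not have exactly six fields. The sketch below is only an
illustration: the function name is invented, the second and third
sample records are hypothetical, and the test is valid only when
fields contain no quotes or commas.

```shell
# bad_field_count: print every line that does NOT consist of exactly
# six comma-separated fields (that is, exactly five commas).
bad_field_count() {
    sed -n '/^[^,]*\(,[^,]*\)\{5\}$/!p'
}

printf '%s\n' \
    '1001,John Smith,PO Box 123,Chicago,IL,60699' \
    '1002,Mary Jones,320 Main,Denver,CO' \
    '1003,Ann Lee,7 Elm,Austin,TX,73301,extra' |
    bad_field_count
```

No output means that every record has the expected number of
fields; here the second and third records are reported.
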
After ensuring that each line has a valid number of fields, use sed
to locate and modify individual fields, using the \(...\) grouping
command where needed.

In case (a):

    sed 's/^[^,]*,[^,]*,[^,]*,[^,]*,/.../'
              ^     ^     ^
              |     |     |_ 3rd field
              |     |_______ 2nd field
              |_____________ 1st field

    # Unix script to delete the second field for case (a)
    sed 's/^\([^,]*\),[^,]*,/\1,,/' file

    # Unix script to change field 1 to 9999 for case (a)
    sed 's/^[^,]*,/9999,/' file

In cases (b) and (c):

    sed 's/^"[^"]*","[^"]*","[^"]*","[^"]*",/.../'
             1st--   2nd--   3rd--   4th--

    # Unix script to delete the second field for case (c)
    sed 's/^\("[^"]*"\),"[^"]*",/\1,"",/' file

    # Unix script to change field 1 to 9999 for case (c)
    sed 's/^"[^"]*",/"9999",/' file

In case (d):

One way to parse such files is to replace the 3-character field
separator "," with an unused character like the tab or vertical
bar. (Technically, the field separator is only the comma while the
fields are surrounded by "double quotes", but the net _effect_ is
that fields are separated by quote-comma-quote, with quote
characters added to the beginning and end of each record.) Search
your datafile _first_ to make sure that your character appears
nowhere in it!

    sed -n '/|/p' file        # search for any instance of '|'
    # if it's not found, we can use the '|' to separate fields

Then replace the 3-character field separator and parse as before:

    # sed script to delete the second field for case (d)
    s/","/|/g;                  # global change of "," to bar
    s/^\([^|]*\)|[^|]*|/\1||/;  # delete 2nd field
    s/|/","/g;                  # global change of bar back to ","
    #---end of script---

    # sed script to change field 1 to 9999 for case (d)
    # Remember to accommodate leading and trailing quote marks
    s/","/|/g;
    s/^[^|]*|/"9999|/;
    s/|/","/g;
    #---end of script---

Note that this technique works only if _each_ and _every_ field is
surrounded with double quotes, including empty fields.

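The separator-swapping technique for case (d) can be tried on one
of the sample records. The sketch below wraps the check for '|' and
the "change field 1 to 9999" script in two shell functions; the
function names are invented for this illustration.

```shell
# has_bar: print any line that already contains "|". No output means
# the bar is safe to use as a temporary field separator.
has_bar() {
    sed -n '/|/p'
}

# set_first_field: change field 1 to 9999 in a fully quoted record.
set_first_field() {
    sed 's/","/|/g; s/^[^|]*|/"9999|/; s/|/","/g'
}

record='"1008","Joe "Hey, guy!" Hall","POB 44","Reno","NV","89504"'
printf '%s\n' "$record" | has_bar
printf '%s\n' "$record" | set_first_field
```

The embedded quotes and the comma inside the second field are left
alone, because only the 3-character sequence quote-comma-quote is
treated as a separator.
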
The following solution is for more complex examples of (d), such
as: not all fields contain "double-quote" marks, or the presence of
embedded "double-quote" marks within fields, or extraneous
whitespace around field delimiters. (Thanks to Greg Ubben for this
script!)

    # sed script to convert case (d) to bar-delimited records
    s/^ *\(.*[^ ]\) *$/|\1|/;
    s/" *, */"|/g;
    : loop
    s/| *\([^",|][^,|]*\) *, */|\1|/g;
    s/| *, */||/g;
    t loop
    s/ *|/|/g;
    s/| */|/g;
    s/^|\(.*\)|$/\1/;
    #---end of script---

For example, it turns this (which is badly-formed but legal):

    first,"",unquoted ,""this" is, quoted " ,, sub "quote" inside, f", lone " empty:

into this:

    first|""|unquoted|""this" is, quoted "||sub "quote" inside|f"|lone " empty:

Note that the script preserves the "double-quote" marks, but
changes only the commas where they are used as field separators. I
have used the vertical bar "|" because it's easier to read, but you
may change this to another field separator if you wish.

If your CSV datafile is more complex, it would probably not be
worth the effort to write it in sed. For such a case, you should
use Perl with a dedicated CSV module (there are at least two recent
CSV parsers available from CPAN).

4.13. How do I handle fixed-length, columnar data?

Sed handles fixed-length fields via \(grouping\) and backreferences
(\1, \2, \3 ...). If we have 3 fields of 10, 25, and 9 characters
per field, our sed script might look like so:

    s/^\(.\{10\}\)\(.\{25\}\)\(.\{9\}\)/\3\2\1/;  # Change the fields
       ^^^^^^^^^^^~~~~~~~~~~~==========           # from 1,2,3 to 3,2,1
       field #1   field #2   field #3

This is a bit hard to read. By using GNU sed or ssed with the -r
switch active, it can look like this:

    s/^(.{10})(.{25})(.{9})/\3\2\1/;          # Using the -r switch

To delete a field in sed, use grouping and omit the backreference
from the field to be deleted. If the data is long or difficult to
work with, use ssed with the -R switch and the /x flag after an s///
command, to insert comments and remarks about the fields.

For records with many fields, use GNU awk with the FIELDWIDTHS
variable set in the top of the script. For example:

    awk 'BEGIN{FIELDWIDTHS = "10 25 9"}; {print $3 $2 $1}' file

This is much easier to read than a similar sed script, especially
if there are more than 5 or 6 fields to manipulate.

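To see the grouping technique at a glance, here is a small sketch
with a hypothetical record layout of 3, 5, and 2 characters (much
shorter than the 10/25/9 layout above, so the result is easy to
check by eye). The function names are invented for this example.

```shell
# reverse_fields: reorder the three fields from 1,2,3 to 3,2,1.
reverse_fields() {
    sed 's/^\(.\{3\}\)\(.\{5\}\)\(.\{2\}\)/\3\2\1/'
}

# drop_second: delete field 2 by matching it without grouping and
# leaving it out of the replacement.
drop_second() {
    sed 's/^\(.\{3\}\).\{5\}\(.\{2\}\)/\1\2/'
}

printf 'AAAbbbbbCC\n' | reverse_fields   # prints CCbbbbbAAA
printf 'AAAbbbbbCC\n' | drop_second      # prints AAACC
```
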
4.14. How do I commify a string of numbers?

Use the simplest script necessary to accomplish your task. As
variations of the line increase, the sed script must become more
complex to handle additional conditions. Whole numbers are
simplest, followed by decimal formats, followed by embedded words.

Case 1: simple strings of whole numbers separated by spaces or
commas, with an optional negative sign. To convert this:

    4381, -1222333, and 70000: - 44555666 1234567890 words
    56890 -234567, and 89222 -999777 345888777666 chars

to this:

    4,381, -1,222,333, and 70,000: - 44,555,666 1,234,567,890 words
    56,890 -234,567, and 89,222 -999,777 345,888,777,666 chars

use one of these one-liners:

    sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'                       # GNU sed
    sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'   # other seds

Case 2: strings of numbers which may have an embedded decimal
point, separated by spaces or commas, with an optional negative
sign. To change this:

    4381, -6555.1212 and 70000, 7.18281828 44906982.071902
    56890 -2345.7778 and 8.0000: -49000000 -1234567.89012

to this:

    4,381, -6,555.1212 and 70,000, 7.18281828 44,906,982.071902
    56,890 -2,345.7778 and 8.0000: -49,000,000 -1,234,567.89012

use the following command for GNU sed:

    sed ':a;s/\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\)/\1\2,\3/g;ta'

and for other versions of sed:

    sed -f case2.sed files

    # case2.sed
    s/^/ /;                 # add space to start of line
    :a
    s/\( [-0-9]\{1,\}\)\([0-9]\{3\}\)/\1,\2/g
    ta
    s/ //;                  # remove space from start of line
    #---end of script---

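The Case 1 one-liner for "other seds" works by looping: each pass
inserts one comma before the last run of three digits that still
has a digit in front of it, and "ta" repeats until nothing changes.
A sketch for trying it out follows; the function name and the
sample input are made up for this illustration.

```shell
# commify: add thousands separators to whole numbers, using the
# Case 1 one-liner that avoids the GNU \B and \> extensions.
commify() {
    sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
}

printf '1234567 and 89\n' | commify      # prints 1,234,567 and 89
```

On this input the first pass yields "1234,567 and 89", the second
pass yields "1,234,567 and 89", and the third pass finds nothing
left to change.
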
4.15. How do I prevent regex expansion on substitutions?

Sometimes you want to *match* regular expression metacharacters as
literals (e.g., you want to match "[0-9]" or "\n"), to be replaced
with something else. The ordinary way to prevent expanding
metacharacters is to prefix them with a backslash. Thus, if "\n"
matches a newline, "\\n" will match the two-character string of
'backslash' followed by 'n'.

But doing this repeatedly can become tedious if there are many
regexes. The following script will replace alternating strings of
literals, where no character is interpreted as a regex
metacharacter:

    # filename: sub_quote.sed
    #   author: Paolo Bonzini
    # sed script to add backslash to find/replace metacharacters
    N;                  # add even numbered line to pattern space
    s,[]/\\$*[],\\&,g;  # quote all of [, ], /, \, $, or *
    s,^,s/,;            # prepend "s/" to front of pattern space
    s,$,/,;             # append "/" to end of pattern space
    s,\n,/,;            # change "\n" to "/", making s/from/to/
    #---end of script---

Here's a sample of how sub_quote.sed might be used. This example
converts typical sed regexes to perl-style regexes. The input file
consists of 10 lines:

    [0-9]
    \d
    [^0-9]
    \D
    \+
    +
    \?
    ?
    \|
    |

Run the command "sed -f sub_quote.sed input", to transform the
input file (above) to 5 lines of output:

    s/\[0-9\]/\\d/
    s/\[^0-9\]/\\D/
    s/\\+/+/
    s/\\?/?/
    s/\\|/|/

The above file is itself a sed script, which can then be used to
modify other files.

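The two-stage use of sub_quote.sed (generate a script, then apply
it) can be sketched in a few lines of shell. The function name, the
variable name, and the one-pair sample input below are assumptions
made for this illustration; the sed commands are those of
sub_quote.sed without the remarks.

```shell
# make_literal_subst: read pairs of lines (text to find, then its
# replacement) and print one s/find/replace/ command per pair, with
# the metacharacters quoted.
make_literal_subst() {
    sed -e 'N' \
        -e 's,[]/\\$*[],\\&,g' \
        -e 's,^,s/,' \
        -e 's,$,/,' \
        -e 's,\n,/,'
}

# Stage 1: build the script from one find/replace pair.
generated=$(printf '%s\n' 'a*b' 'x/y' | make_literal_subst)
printf '%s\n' "$generated"

# Stage 2: apply the generated script to some text.
printf '%s\n' '1 a*b 2' | sed "$generated"
```

The "*" in the search text and the "/" in the replacement are both
backslash-quoted in the generated command, so they are taken
literally in the second stage.
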
4.16. How do I convert a string to all lowercase or capital letters?

The easiest method is to use a new version of GNU sed, ssed, sedmod
or sed16 and employ the \U, \L, or other switches on the right side
of an s/// command. For example, to convert any word which begins
with "reg" or "exp" into solid capital letters:

    sed -r "s/\<(reg|exp)[a-z]+/\U&/g"              # gsed4.+ or ssed
    sed "s/\<reg[a-z]+/\U&/g; s/\<exp[a-z]+/\U&/g"  # sed16 and sedmod

As you can see, sedmod and sed16 do not support alternation (|),
but they do support case conversion. If none of these versions of
sed are available to you, some sample scripts for this task are
available from the Seder's Grab Bag:

    http://sed.sourceforge.net/grabbag/scripts

Note that some case conversion scripts are listed under "Filename
manipulation" and others are under "Text formatting."

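When a whole line (rather than selected words) is to be converted,
the y/// command used in Solution 1 of question 4.10 is available
in every sed. The sketch below is an illustration with invented
function names; it handles only the 26 unaccented ASCII letters.

```shell
# to_upper / to_lower: convert every ASCII letter on each line.
to_upper() {
    sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/'
}

to_lower() {
    sed 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
}

printf 'Regex and Expr\n' | to_upper     # prints REGEX AND EXPR
printf 'Regex and Expr\n' | to_lower     # prints regex and expr
```
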
CHANGING BLOCKS (consecutive lines)

4.20. How do I change only one section of a file?

You can match a range of lines by line number, by regexes (say, all
lines between the words "from" and "until"), or by a combination of
the two. For multiple substitutions on the same range, put the
command(s) between braces {...}. For example:

    # replace only between lines 1 and 20
    1,20 s/Johnson/White/g

    # replace everywhere EXCEPT between lines 1 and 20
    1,20 !s/Johnson/White/g

    # replace only between words "from" and "until". Note the
    # use of \<....\> as word boundary markers in GNU sed.
    /from/,/until/ { s/\<red\>/magenta/g; s/\<blue\>/cyan/g; }

    # replace only from the words "ENDNOTES:" to the end of file
    /ENDNOTES:/,$ { s/Schaff/Herzog/g; s/Kraft/Ebbing/g; }

For technical details on using address ranges, see section 3.3
("Addressing and Address ranges").

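The effect of a range and of its negation is easy to check on a
few lines of input. The sketch below uses a hypothetical four-line
sample and the range 2,3 instead of 1,20; the function names are
invented for this illustration.

```shell
# sample: four hypothetical input lines.
sample() {
    printf '%s\n' 'Johnson 1' 'Johnson 2' 'Johnson 3' 'Johnson 4'
}

# inside_range: substitute only on lines 2 through 3.
inside_range() {
    sed '2,3s/Johnson/White/'
}

# outside_range: substitute everywhere EXCEPT lines 2 through 3.
outside_range() {
    sed '2,3!s/Johnson/White/'
}

sample | inside_range
sample | outside_range
```

The first command changes only the two middle lines; the second
changes only the first and last lines.
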
4.21. How do I delete or change a block of text if the block contains
      a certain regular expression?

The following deletes the block between 'start' and 'end'
inclusively, if and only if the block contains the string
'regex'. Written by Russell Davies, with additional comments:

    # sed script to delete a block if /regex/ matches inside it
    :t
    /start/,/end/ {    # For each line between these block markers..
       /end/!{         #   If we are not at the /end/ marker
          $!{          #     nor the last line of the file,
             N;        #     add the Next line to the pattern space
             bt
          }            #   and branch (loop back) to the :t label.
       }               # This line matches the /end/ marker.
       /regex/d;       # If /regex/ matches, delete the block.
    }                  # Otherwise, the block will be printed.
    #---end of script---

Note: When the script above reaches /regex/, the entire multi-line
block is in the pattern space. To replace items inside the block,
use "s///". To change the entire block, use the 'c' (change)
command:

    /regex/c\
    1: This will replace the entire block\
    2: with these two lines of text.

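The block-deleting script can be exercised on a short input that
holds two blocks, only one of which contains 'regex'. This is a
sketch: the function name and the sample lines are hypothetical,
and the script is the one above with the remarks removed. It has
been reasoned through for GNU sed; other seds may treat a range
address that is re-tested within one cycle differently.

```shell
# delete_block_if_regex: delete each start..end block that contains
# "regex"; print everything else unchanged.
delete_block_if_regex() {
    sed ':t
/start/,/end/{
/end/!{
$!{
N
bt
}
}
/regex/d
}'
}

printf '%s\n' keep1 start aaa regex end start bbb end keep2 |
    delete_block_if_regex
```

With this input the first block (start, aaa, regex, end) is
removed, while the second block and the lines outside any block
are printed.
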
4.22. How do I locate a paragraph of text if the paragraph contains a
      certain regular expression?

Assume that paragraphs are separated by blank lines. For regexes
that are single terms, use one of the following scripts:

    sed -e '/./{H;$!d;}' -e 'x;/regex/!d'      # most seds
    sed '/./{H;$!d;};x;/regex/!d'              # GNU sed

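The single-term script gathers each paragraph in the hold space
and tests it as a whole when a blank line (or the last line) is
reached. A sketch for trying it out follows; the function name and
the two sample paragraphs are invented for this illustration.

```shell
# para_grep: print only the paragraphs that contain "regex".
# Paragraphs are runs of non-blank lines separated by blank lines.
para_grep() {
    sed -e '/./{H;$!d;}' -e 'x;/regex/!d'
}

printf '%s\n' 'p1 a' 'p1 regex' '' 'p2 a' 'p2 b' | para_grep
```

Because "H" puts a newline in front of every line it appends, each
paragraph that is printed is preceded by one empty line.
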
To print paragraphs only if they contain 3 specific regular
expressions (RE1, RE2, and RE3), in any order in the paragraph:

