How to Read a File and Sort It by Length in Java and Output to Another File

Standard UNIX utility

sort

Original author(s) Ken Thompson (AT&T Bell Laboratories)
Developer(s) Various open-source and commercial developers
Initial release November 3, 1971; 50 years ago (1971-11-03)
Operating system Multics, Unix, Unix-like, V, Plan 9, Inferno, MSX-DOS, IBM i
Platform Cross-platform
Type Command
License coreutils: GPLv3+

In computing, sort is a standard command line program of Unix and Unix-like operating systems that prints the lines of its input, or the concatenation of all files listed in its argument list, in sorted order. Sorting is done based on one or more sort keys extracted from each line of input. By default, the entire input line is taken as the sort key. Blank space is the default field separator. The command supports a number of command-line options that can vary by implementation. For instance, the "-r" flag reverses the sort order.

History

A sort command that invokes a general sort facility was first implemented within Multics.[1] Later, it appeared in Version 1 Unix. This version was originally written by Ken Thompson at AT&T Bell Laboratories. By Version 4 Thompson had modified it to use pipes, but sort retained an option to name the output file because it was used to sort a file in place. In Version 5, Thompson invented "-" to represent standard input.[2]

The version of sort bundled in GNU coreutils was written by Mike Haertel and Paul Eggert.[3] This implementation employs the merge sort algorithm.

Similar commands are available on many other operating systems; for example, a sort command is part of ASCII's MSX-DOS2 Tools for MSX-DOS version 2.[4]

The sort command has also been ported to the IBM i operating system.[5]

Syntax

sort [OPTION]... [FILE]...        

With no FILE, or when FILE is -, the command reads from standard input.
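
As a minimal sketch of the rule above, the following sorts piped input on its own and then together with a file by naming `-` as an operand. The file name `fruit.txt` and the sample lines are made up for illustration, and `LC_ALL=C` is set only to make the ordering predictable across systems.

```shell
# Hypothetical sample data in a temporary directory.
tmpdir=$(mktemp -d)
printf 'pear\napple\n' > "$tmpdir/fruit.txt"

# No FILE operand: sort reads standard input.
from_stdin=$(printf 'cherry\nbanana\n' | LC_ALL=C sort)

# "-" stands for standard input, so the piped lines and the file
# are sorted together as one input.
combined=$(printf 'cherry\nbanana\n' | LC_ALL=C sort - "$tmpdir/fruit.txt")

printf '%s\n' "$combined"
rm -r "$tmpdir"
```
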

Parameters

Name | Description | Unix | Plan 9 | Inferno | FreeBSD | Linux | MSX-DOS | IBM i
-b, --ignore-leading-blanks | Ignores leading blanks. | Yes | Yes | No | Yes | Yes | No | Yes
-c | Check that input file is sorted. | No | Yes | No | Yes | Yes | No | Yes
-C | Like -c, but does not report the first bad line. | No | No | No | Yes | Yes | No | No
-d, --dictionary-order | Considers only blanks and alphanumeric characters. | Yes | Yes | No | Yes | Yes | No | Yes
-f, --ignore-case | Fold lower case to upper case characters. | Yes | Yes | No | Yes | Yes | No | Yes
-g, --general-numeric-sort, --sort=general-numeric | Compares according to general numerical value. | Yes | Yes | No | Yes | Yes | No | No
-h, --human-numeric-sort, --sort=human-numeric | Compare human readable numbers (e.g., 2K 1G). | Yes | No | No | Yes | Yes | No | No
-i, --ignore-nonprinting | Considers only printable characters. | Yes | Yes | No | Yes | Yes | No | Yes
-k, --key=POS1[,POS2] | Start a key at POS1 (origin 1), end it at POS2 (default end of line). | No | No | No | Yes | Yes | No | No
-m | Merge only; input files are assumed to be presorted. | No | Yes | No | Yes | Yes | No | Yes
-M, --month-sort, --sort=month | Compares (unknown) < 'JAN' < ... < 'DEC'. | Yes | Yes | No | Yes | Yes | No | No
-n, --numeric-sort, --sort=numeric | Compares according to string numerical value. | Yes | Yes | Yes | Yes | Yes | No | Yes
-o OUTPUT | Uses OUTPUT file instead of standard output. | No | Yes | No | Yes | Yes | No | Yes
-r, --reverse | Reverses the result of comparisons. | Yes | Yes | Yes | Yes | Yes | No | Yes
-R, --random-sort, --sort=random | Shuffles, but groups identical keys. See also: shuf | Yes | No | No | Yes | Yes | No | No
-s | Stabilizes sort by disabling last-resort comparison. | No | No | No | Yes | Yes | No | No
-S size, --buffer-size=size | Use size for the maximum size of the memory buffer. | No | No | No | Yes | No | No | No
-tx | 'Tab character' separating fields is x. | No | Yes | No | No | Yes | No | Yes
-t char, --field-separator=char | Uses char instead of non-blank to blank transition. | No | No | No | Yes | Yes | No | No
-T dir, --temporary-directory=dir | Uses dir for temporaries. | No | Yes | No | Yes | Yes | No | No
-u, --unique | Unique processing to suppress all but one in each set of lines having equal keys. | No | Yes | No | Yes | Yes | No | Yes
-V, --version-sort | Natural sort of (version) numbers within text. | No | No | No | Yes | Yes | No | No
-w | Like -i, but ignore only tabs and spaces. | No | Yes | No | No | No | No | No
-z, --zero-terminated | End lines with 0 byte, not newline. | No | No | No | Yes | Yes | No | No
--help | Display help and exit. | No | No | No | Yes | Yes | No | No
--version | Output version information and exit. | No | No | No | Yes | Yes | No | No
/R | Reverses the result of comparisons. | No | No | No | No | No | Yes | No
/S | Specify the number of digits to determine how many digits of each line should be judged. | No | No | No | No | No | Yes | No
/A | Sort by ASCII code. | No | No | No | No | No | Yes | No
/H | Include hidden files when using wild cards. | No | No | No | No | No | Yes | No
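
A few of the options listed above can be sketched together. The file name `names.txt` and its contents are hypothetical. The options used (`-c`, `-u`, `-o`) are the ones the table marks as available on both FreeBSD and Linux; behaviour on other systems may differ.

```shell
# Hypothetical sample data in a temporary directory.
tmpdir=$(mktemp -d)
printf 'carol\nalice\nbob\nalice\n' > "$tmpdir/names.txt"

# -c only checks the order; it exits non-zero here because the
# file is not yet sorted (the diagnostic on stderr is discarded).
if LC_ALL=C sort -c "$tmpdir/names.txt" 2>/dev/null; then
    before=sorted
else
    before=unsorted
fi

# -u drops duplicate lines; -o writes the result back over the input file.
LC_ALL=C sort -u -o "$tmpdir/names.txt" "$tmpdir/names.txt"

# The same check now succeeds.
if LC_ALL=C sort -c "$tmpdir/names.txt"; then
    after=sorted
else
    after=unsorted
fi

result=$(cat "$tmpdir/names.txt")
printf '%s\n' "$result"
rm -r "$tmpdir"
```

Naming the input file as the `-o` target is the in-place use mentioned in the History section; redirecting with `>` to the same file would instead truncate it before sort reads it.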

Examples

Sort a file in alphabetical order

$ cat phonebook
Smith, Brett     555-4321
Doe, John        555-1234
Doe, Jane        555-3214
Avery, Cory      555-4132
Fogarty, Suzie   555-2314

$ sort phonebook
Avery, Cory      555-4132
Doe, Jane        555-3214
Doe, John        555-1234
Fogarty, Suzie   555-2314
Smith, Brett     555-4321

Sort by number

The -n option makes the program sort according to numerical value. The du command produces output that starts with a number, the file size, so its output can be piped to sort to produce a list of files sorted by (ascending) file size:

$ du /bin/* | sort -n
4       /bin/domainname
24      /bin/ls
102     /bin/sh
304     /bin/csh

The find command with the -ls option prints file sizes in the 7th field, so a list of the LaTeX files sorted by file size is produced by:

$ find . -name "*.tex" -ls | sort -k 7n

Columns or fields

Use the -k option to sort on a certain column. For example, use "-k 2" to sort on the second column. In old versions of sort, the +1 option made the program sort on the second column of data (+2 for the third, etc.). This usage is deprecated.

$ cat zipcode
Adam  12345
Bob   34567
Joe   56789
Sam   45678
Wendy 23456

$ sort -k 2n zipcode
Adam  12345
Wendy 23456
Bob   34567
Sam   45678
Joe   56789

Sort on multiple fields

The -k m,n option lets you sort on a key that is potentially composed of multiple fields (start at column m, end at column n):

$ cat quota
fred 2000
bob 1000
an 1000
chad 1000
don 1500
eric 500

$ sort -k2,2n -k1,1 quota
eric 500
an 1000
bob 1000
chad 1000
don 1500
fred 2000

Here the first sort is done using column 2. -k2,2n specifies sorting on the key starting and ending with column 2, and sorting numerically. If -k2 is used instead, the sort key would begin at column 2 and extend to the end of the line, spanning all the fields in between. -k1,1 dictates breaking ties using the value in column 1, sorting alphabetically by default. Note that an, bob, and chad have the same quota and are sorted alphabetically in the final output.
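
The difference between -k2,2 and -k2 only shows when something follows the second column. The sketch below uses a hypothetical three-column variant of the quota file (name, quota, group) and compares the key as text rather than numerically, so that the trailing field takes part in the comparison. `LC_ALL=C` is an assumption made to keep the ordering predictable.

```shell
tmpdir=$(mktemp -d)
# Hypothetical three-column file: name, quota, group.
printf 'bob 1000 staff\nan 1000 users\n' > "$tmpdir/quota3"

# Key is column 2 only: the quotas tie, so -k1,1 orders the lines by name.
bounded=$(LC_ALL=C sort -k2,2 -k1,1 "$tmpdir/quota3")

# Key runs from column 2 to the end of the line, so "staff" vs "users"
# decides the order and the name tie-breaker is never consulted.
unbounded=$(LC_ALL=C sort -k2 -k1,1 "$tmpdir/quota3")

printf 'bounded:\n%s\nunbounded:\n%s\n' "$bounded" "$unbounded"
rm -r "$tmpdir"
```
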

Sorting a pipe delimited file

$ sort -k2,2 -k1,1 -t'|' zipcode
Adam|12345
Wendy|23456
Bob|34567
Sam|45678
Joe|56789

Sorting a tab delimited file

Sorting a file with tab separated values requires a tab character to be specified as the column delimiter. This illustration uses the shell's dollar-quote notation[6][7] to specify the tab as a C escape sequence.

$ sort -k2,2 -t $'\t' phonebook
Doe, John	555-1234
Fogarty, Suzie	555-2314
Doe, Jane	555-3214
Avery, Cory	555-4132
Smith, Brett	555-4321

Sort in reverse

The -r option just reverses the order of the sort:

$ sort -rk 2n zipcode
Joe   56789
Sam   45678
Bob   34567
Wendy 23456
Adam  12345

Sort in random

The GNU implementation has a -R --random-sort option based on hashing; this is not a full random shuffle because it will sort identical lines together. A true random sort is provided by the Unix utility shuf.
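
The grouping behaviour can be checked with a small sketch, assuming a sort that supports -R (GNU sort here; the option is not available everywhere). The input lines are made-up sample data. The order of the two groups varies from run to run, but identical lines stay adjacent.

```shell
# Assumes a sort with -R, such as GNU sort.
shuffled=$(printf 'a\nb\na\nb\na\n' | sort -R)

# uniq collapses runs of adjacent identical lines. Because -R keeps
# identical lines together, exactly two lines remain.
groups=$(printf '%s\n' "$shuffled" | uniq | wc -l | tr -d ' ')

printf '%s\n' "$shuffled"
printf 'distinct runs: %s\n' "$groups"
```
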

Sort by version

The GNU implementation has a -V --version-sort option which is a natural sort of (version) numbers within text. Two text strings that are to be compared are split into blocks of letters and blocks of digits. Blocks of letters are compared alpha-numerically, and blocks of digits are compared numerically (i.e., skipping leading zeros, more digits means larger, otherwise the leftmost digits that differ determine the result). Blocks are compared left-to-right and the first non-equal block in that loop decides which text is larger. This happens to work for IP addresses, Debian package version strings and similar tasks where numbers of variable length are embedded in strings.
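
A short sketch of the contrast with a plain sort, assuming GNU sort for -V. The package names are made-up sample data and `LC_ALL=C` is set only to make the plain ordering predictable.

```shell
# Hypothetical version strings, one per line.
versions=$(printf 'pkg-1.10\npkg-1.9\npkg-1.2\n')

# Plain sort compares character by character, so "1.10" lands before "1.2".
plain=$(printf '%s\n' "$versions" | LC_ALL=C sort)

# -V compares the digit blocks numerically, so 2 < 9 < 10.
natural=$(printf '%s\n' "$versions" | LC_ALL=C sort -V)

printf 'plain:\n%s\nnatural:\n%s\n' "$plain" "$natural"
```
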

See also

  • Collation
  • List of Unix commands
  • uniq
  • shuf

References

  1. ^ "Multics Commands". www.multicians.org.
  2. ^ McIlroy, M. D. (1987). A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 (PDF) (Technical report). CSTR. Bell Labs. 139.
  3. ^ "sort(1): sort lines of text files - Linux man page". linux.die.net.
  4. ^ "MSX-DOS2 Tools User's Manual - MSX-DOS2 TOOLS ユーザーズマニュアル". April 1, 1993 – via Internet Archive.
  5. ^ IBM. "IBM System i Version 7.2 Programming Qshell" (PDF). Retrieved 2020-09-05.
  6. ^ "The GNU Bash Reference Manual, for Bash, Version 4.2: Section 3.1.2.4 ANSI-C Quoting". Free Software Foundation, Inc. 28 December 2010. Retrieved 1 February 2013. Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard.
  7. ^ Fowler, Glenn S.; Korn, David G.; Vo, Kiem-Phong. "KornShell FAQ". Archived from the original on 2013-05-27. Retrieved 3 March 2015. The $'...' string literal syntax was added to ksh93 to solve the problem of entering special characters in scripts. It uses ANSI-C rules to interpret the string between the '...'.

Further reading

  • Shotts (Jr), William E. (2012). The Linux Command Line: A Complete Introduction. No Starch Press. ISBN 978-1593273897.
  • McElhearn, Kirk (2006). The Mac OS X Command Line: Unix Under the Hood. John Wiley & Sons. ISBN 978-0470113851.

External links

  • Original Sort manpage The original BSD Unix program's manpage
  • sort(1) – Linux User Manual – User Commands
  • sort(1) – Plan 9 Programmer's Manual, Volume 1
  • sort(1) – Inferno General commands Manual
  • Further details about sort at Softpanorama

Source: https://en.wikipedia.org/wiki/Sort_%28Unix%29