Saturday, October 13, 2012

[BASH] Count frequency of each letter in an English document

Use only standard Linux command-line tools (tr, sort and uniq from coreutils, plus grep).

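First, convert all uppercase letters to lowercase with tr. Then use grep -o to print each letter on its own line, sort the letters so identical ones are adjacent, count each group with uniq -c, and finally sort again numerically in descending order so the most frequent letter comes first.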

hvn@lappy:~/Downloads$ tr '[:upper:]' '[:lower:]' < prideandprejudice.txt | grep -o '[a-z]' | sort | uniq -c | sort -nr
  71194 e
  48159 t
  42695 a
  41379 o
  38944 i
  38721 n
  34582 h
  33870 s
  33468 r
  22843 d
  22071 l
  15510 u
  15124 m
  14060 c
  13033 y
  12573 w
  12381 f
  10444 g
   9363 b
   8683 p
   5840 v
   3342 k
    970 j
    938 z
    867 x
    638 q
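Strictly speaking, grep is not part of coreutils. Below is a minimal sketch of a variant that stays within coreutils: instead of grep -o, it deletes every non-letter with tr -cd and then splits the remaining stream into one character per line with fold -w1. The function name letter_freq is just a hypothetical name chosen for this example, and the LC_ALL=C setting is an assumption so that the a-z range means exactly the 26 ASCII letters. For plain ASCII text it should produce the same kind of count table as the command above.

```shell
#!/usr/bin/env bash
# letter_freq (hypothetical name): count how often each letter a-z occurs
# in text read from stdin, using only coreutils (tr, fold, sort, uniq).
letter_freq() {
  # LC_ALL=C is an assumption: it makes a-z mean exactly the 26 ASCII letters.
  LC_ALL=C tr '[:upper:]' '[:lower:]' |  # fold uppercase into lowercase
    LC_ALL=C tr -cd 'a-z' |              # delete every byte that is not a lowercase letter
    fold -w1 |                           # one letter per line
    sort |                               # put identical letters next to each other
    uniq -c |                            # collapse each group into "count letter"
    sort -nr                             # most frequent letter first
}

# Usage: letter_freq < prideandprejudice.txt
```

Using fold -w1 instead of grep -o keeps the whole pipeline inside coreutils; the trade-off is an extra tr pass to strip punctuation, spaces and newlines first.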
