
Complete word count analysis of Security Now, episode 1 through 370.

Published: 09-09-2012 | Author: Remy van Elst


❗ This post is over ten years old. It may no longer be up to date. Opinions may have changed.

Table of Contents

  • Steve only
  • Leo Only

  • Security Now is a podcast by Leo Laporte and Steve Gibson released on the Twit.tv network.

    Steve pays to get the podcast transcribed, and the transcripts are available over at grc.com.

    I decided to run my analyzer over the complete archive of podcast transcripts, from episode 001 through 371.

    Get the files:

    for i in {001..371}; do curl http://www.grc.com/sn/sn-${i}.txt >> sn.txt; echo $i; done
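
    The loop above appends whatever the server returns, including any error page. A minimal sketch of a more defensive download, assuming curl's `-f` flag is acceptable here; the function name `fetch_all` and its arguments are made up for illustration:

```shell
# fetch_all BASE FIRST LAST OUT (hypothetical helper):
# append BASE/sn-NNN.txt for every episode number FIRST..LAST to the file OUT.
fetch_all() {
  base=$1; first=$2; last=$3; out=$4
  i=$first
  while [ "$i" -le "$last" ]; do
    n=$(printf '%03d' "$i")   # zero-pad to three digits: 1 -> 001
    # -f: fail on HTTP errors instead of writing the error page to OUT
    # -sS: no progress meter, but still print real errors
    if ! curl -fsS "$base/sn-$n.txt" >> "$out"; then
      echo "episode $n: download failed, skipped" >&2
    fi
    i=$((i + 1))
  done
}

# Usage with the same range as above (not executed here):
# fetch_all http://www.grc.com/sn 1 371 sn.txt
```

    Skipped episodes are reported on stderr, so it is easy to see whether the archive is complete before counting anything.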
    

    Clean the files up:

    cat sn.txt | LC_CTYPE=C tr -cd '[:alnum:] [:space:]' > csn.txt
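
    This filter deletes every character that is not a letter, digit or whitespace. That includes colons, hyphens and apostrophes, so "it's" ends up as "its" and the speaker label "STEVE:" becomes the plain word "STEVE". A small illustration on a made-up line (not taken from the transcripts):

```shell
# Hypothetical sample line, only here to show what the filter removes.
sample="STEVE: It's a well-known fact, isn't it?"

# Same filter as above: -c takes the complement of the set, -d deletes it.
cleaned=$(printf '%s\n' "$sample" | LC_CTYPE=C tr -cd '[:alnum:] [:space:]')

printf '%s\n' "$cleaned"
# prints: STEVE Its a wellknown fact isnt it
```

    Keep this in mind when reading the counts: a high count for "its" is mostly "it's" with the apostrophe stripped.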
    

    Analyze the text file:

    cat csn.txt | LC_CTYPE=C tr '[:space:]' '\n' | grep -v "^\s*$" | sort | uniq -c | sort -bnr > count-combined.txt
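
    The count is case-sensitive, which is why the result lists both "and" and "And". A sketch of a case-insensitive variant, assuming folding everything to lowercase is what you want; the sample text is made up and stands in for csn.txt:

```shell
# Hypothetical sample input; the real input would be csn.txt.
sample='And the cat and the dog And the bird'

counts=$(printf '%s\n' "$sample" \
  | LC_CTYPE=C tr '[:upper:]' '[:lower:]' \
  | LC_CTYPE=C tr '[:space:]' '\n' \
  | grep -v '^[[:space:]]*$' \
  | sort | uniq -c | sort -bnr)

# One line per distinct word, highest count first.
printf '%s\n' "$counts"
```

    Here "And" and "and" are merged into a single entry with a count of 3. The order of words with equal counts depends on the sort implementation and locale.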
    

    Result:

    ed count-combined.txt 
    461930
    1,20np
    1       65548 the
    2       49919 to
    3       42284 that
    4       40759 STEVE
    5       40065 I
    6       39496 a
    7       35321 of
    8       31706 and
    9       30845 it
    10      29930 is
    11      24634 you
    12      22213 And
    13      20365 in
    14      16467 this
    15      14406 was
    16      13811 So
    17      13761 its
    18      13711 for
    19      12847 have
    20      11599 on
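
    In the listing above, the first number is the byte count ed prints when it opens the file, and `1,20np` prints lines 1 to 20 with line numbers. If ed is not at hand, roughly the same view can be had with wc, head and nl. A sketch on a made-up three-line file standing in for count-combined.txt:

```shell
# Hypothetical stand-in for count-combined.txt, so the sketch runs on its own.
f=$(mktemp)
printf '%7d %s\n' 3 the 2 to 1 that > "$f"

# ed prints this byte count when it opens the file.
bytes=$(wc -c < "$f")

# Like "1,2n" in ed: the first two lines, each prefixed with its line number.
top=$(head -n 2 "$f" | nl -ba)

printf '%s\n' "$bytes"
printf '%s\n' "$top"
rm -f "$f"
```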
    

    Full result

    Steve only

    cat sn.txt | grep "STEVE:" > stonly.txt     
    
    cat stonly.txt | LC_CTYPE=C tr -cd '[:alnum:] [:space:]' > stonlyclean.txt
    
    cat stonlyclean.txt | LC_CTYPE=C tr '[:space:]' '\n' | grep -v "^\s*$" | sort | uniq -c | sort -bnr > sto.txt
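
    Because the grep keeps the speaker label on every line, the label itself is counted as a word. A sketch that drops the label before counting, assuming the label always sits at the start of the line (the sample transcript is made up):

```shell
# Hypothetical three-line transcript, only for illustration.
sample='STEVE: Yes, and that fixed it.
LEO: Wow.
STEVE: So that is the story.'

# Keep only lines that start with the label, then strip the label itself.
steve_words=$(printf '%s\n' "$sample" \
  | grep '^STEVE:' \
  | sed 's/^STEVE:[[:space:]]*//')

printf '%s\n' "$steve_words"
```

    The output can be fed into the same cleanup and counting steps. Replacing STEVE with LEO gives the equivalent for the other speaker.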
    

    Result

    ed sto.txt 
    461930
    1,20np
    1       65548 the
    2       49919 to
    3       42284 that
    4       40759 STEVE
    5       40065 I
    6       39496 a
    7       35321 of
    8       31706 and
    9       30845 it
    10      29930 is
    11      24634 you
    12      22213 And
    13      20365 in
    14      16467 this
    15      14406 was
    16      13811 So
    17      13761 its
    18      13711 for
    19      12847 have
    20      11599 on 
    

    Steve only

    Leo Only

    cat sn.txt | grep "LEO:" > leoonly.txt     
    
    cat leoonly.txt | LC_CTYPE=C tr -cd '[:alnum:] [:space:]' > leoonlyclean.txt
    
    cat leoonlyclean.txt | LC_CTYPE=C tr '[:space:]' '\n' | grep -v "^\s*$" | sort | uniq -c | sort -bnr > leoc.txt
    

    Result

    ed leoc.txt 
    367236
    1,20np
    1       40349 LEO
    2       30161 the
    3       25301 to
    4       24623 I
    5       23060 a
    6       19027 you
    7       17115 it
    8       16441 that
    9       15115 of
    10      13676 and
    11      12256 is
    12      9785 in
    13      8689 And
    14      8282 this
    15      7633 have
    16      7552 on
    17      7094 for
    18      6492 its
    19      6032 do
    20      5922 know    
    

    Leo only

    Tags: analyze , articles , bash , leo-laporte , podcast , security-now , steve-gibson