[PS1] Powershell で CSV ファイルを読み込む

2014年12月17日

 PowerShell Advent Calendar 2014 17日目の記事です。

 CSV ファイルを参照するとき、その PC に Excel が入っていればよいのですが、そうでない場合メモ帳で・・・、となります。しかし素の csv テキストを目で追うというのは辛いです(汗

 Windows 7 以降だと Powershell が標準でインストールされているので、それを使えば、Excel 程ではないですがある程度目に優しい形に整形することが可能です。

 たとえば、以下のような sample.csv を操作する場合を考えます。

都道府県コード,*,*,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47
都道府県,,,北海道,青森県,岩手県,宮城県,秋田県,山形県,福島県,茨城県,栃木県,群馬県,埼玉県,千葉県,東京都,神奈川県,新潟県,富山県,石川県,福井県,山梨県,長野県,岐阜県,静岡県,愛知県,三重県,滋賀県,京都府,大阪府,兵庫県,奈良県,和歌山県,鳥取県,島根県,岡山県,広島県,山口県,徳島県,香川県,愛媛県,高知県,福岡県,佐賀県,長崎県,熊本県,大分県,宮崎県,鹿児島県,沖縄県
県庁所在地,,,札幌市,青森市,盛岡市,仙台市,秋田市,山形市,福島市,水戸市,宇都宮市,前橋市,さいたま市,千葉市,新宿区,横浜市,新潟市,富山市,金沢市,福井市,甲府市,長野市,岐阜市,静岡市,名古屋市,津市,大津市,京都市,大阪市,神戸市,奈良市,和歌山市,鳥取市,松江市,岡山市,広島市,山口市,徳島市,高松市,松山市,高知市,福岡市,佐賀市,長崎市,熊本市,大分市,宮崎市,鹿児島市,那覇市
面積,km2,,8万3456.20,9607.04,1万5278.77,7285.73,1万1612.22,9323.44,1万3782.75,6095.69,6408.28,6363.16,3797.25,5156.58,2187.42,2415.84,1万2583.46,4247.4,4185.47,4189.27,4465.37,1万3562.23,1万0621.17,7780.09,5164.06,5776.87,4017.36,4613,1896.83,8395.47,3691.09,4726.12,3507.26,6707.57,7113,8478.52,6112.22,4145.69,1876.47,5677.38,7105.01,4976.17,2439.58,4095.22,7405.21,6339.33,7734.78,9187.8,2275.28
可住地面積,km2,,21899.07,3203.45,3709.81,3129.79,3154.52,2850.03,4218.02,3975.87,2946.03,2294.56,2565.6,3487.76,1395.86,1459.51,4481.28,1850.29,1382.73,1066.03,950.33,3333.82,2144.72,2730.98,2951.63,2021.76,1289.04,1155.06,1313.87,2755.57,850.53,1097.31,912.01,1255.99,2210.47,2254.82,1749.12,1021.5,991.25,1669.78,1168.16,2737.19,1339.8,1617.12,2745.58,1769.05,1835,3242.51,1159.05
可住地比率,%,,26.2,33.3,24.3,43,27.2,30.6,30.6,65.2,46,36.1,67.6,67.6,63.8,60.4,35.6,43.6,33,25.4,21.3,24.5,20.2,35.1,57.3,35,32.1,25,69.4,32.8,23,23.2,26,18.7,31.1,26.6,28.6,24.6,52.9,29.4,16.4,55.1,54.9,39.5,37.1,27.9,23.7,35.3,51
人口,国勢調査,2010,"5,506,419","1,373,339","1,330,147","2,348,165","1,085,997","1,168,924","2,029,064","2,969,770","2,007,683","2,008,068","7,194,556","6,216,289","13,159,388","9,048,331","2,374,450","1,093,247","1,169,788","806,314","863,075","2,152,449","2,080,773","3,765,007","7,410,719","1,854,724","1,410,777","2,636,092","8,865,245","5,588,133","1,400,728","1,002,198","588,667","717,397","1,945,276","2,860,750","1,451,338","785,491","995,842","1,431,493","764,456","5,071,968","849,788","1,426,779","1,817,426","1,196,529","1,135,233","1,706,242","1,392,818"
人口,国勢調査,2005,"5,627,737","1,436,657","1,385,041","2,360,218","1,145,501","1,216,181","2,091,319","2,975,167","2,016,631","2,024,135","7,054,243","6,056,462","12,576,601","8,791,597","2,431,459","1,111,729","1,174,026","821,592","884,515","2,196,114","2,107,226","3,792,377","7,254,704","1,866,963","1,380,361","2,647,660","8,817,166","5,590,601","1,421,310","1,035,969","607,012","742,223","1,957,264","2,876,642","1,492,606","809,950","1,012,400","1,467,815","796,292","5,049,908","866,369","1,478,632","1,842,233","1,209,571","1,153,042","1,753,179","1,361,594"
人口,国勢調査,1995,"5,692,321","1,481,663","1,419,505","2,328,739","1,213,667","1,256,958","2,133,592","2,955,530","1,984,390","2,003,540","6,759,311","5,797,782","11,773,605","8,245,900","2,488,364","1,123,125","1,180,068","826,996","881,996","2,193,984","2,100,315","3,737,689","6,868,336","1,841,358","1,287,005","2,629,592","8,797,268","5,401,877","1,430,862","1,080,435","614,929","771,441","1,950,750","2,881,748","1,555,543","832,427","1,027,006","1,506,700","816,704","4,933,393","884,316","1,544,934","1,859,793","1,231,306","1,175,819","1,794,224","1,273,440"
人口,国勢調査,1970,"5,184,287","1,427,520","1,371,383","1,819,223","1,241,376","1,225,618","1,946,077","2,143,551","1,580,021","1,658,909","3,866,472","3,366,624","11,408,071","5,472,247","2,360,982","1,029,695","1,002,420","744,230","762,029","1,956,917","1,758,954","3,089,895","5,386,163","1,543,083","889,768","2,250,087","7,620,480","4,667,928","930,160","1,042,736","568,777","773,575","1,707,026","2,436,135","1,511,448","791,111","907,897","1,418,124","786,882","4,027,416","838,468","1,570,245","1,700,229","1,155,566","1,051,105","1,729,150","945,111"
人口,国勢調査,1945,"3,518,389","1,083,250","1,227,789","1,462,254","1,211,871","1,326,350","1,957,356","1,944,344","1,546,355","1,546,081","2,047,261","1,966,862","3,488,284","1,865,667","2,389,653","953,834","887,510","724,856","839,057","2,121,050","1,518,649","2,220,358","2,857,851","1,394,286","860,911","1,603,796","2,800,958","2,821,892","779,685","936,006","563,220","860,275","1,564,626","1,885,471","1,356,491","835,763","863,700","1,361,484","775,578","2,746,855","830,431","1,318,589","1,556,490","1,124,513","913,687","1,538,466",-
人口,国勢調査,1920,"2,359,183","756,454","845,540","961,768","898,537","968,925","1,362,750","1,350,400","1,046,479","1,052,610","1,319,533","1,336,155","3,699,428","1,323,390","1,776,474","724,276","747,360","599,155","583,453","1,562,722","1,070,407","1,550,387","2,089,762","1,069,270","651,050","1,287,147","2,587,847","2,301,799","564,607","750,411","454,675","714,712","1,217,698","1,541,905","1,041,013","670,212","677,852","1,046,720","670,895","2,188,249","673,895","1,136,182","1,233,233","860,282","651,097","1,415,582","571,572"
  1. 前準備 (読み込み)
    1. ファイルの読み込む

       まず gc (Get-Content) で読み込みます。
       エンコード指定が必要であれば -Encoding を使用して指定します

      $txt = gc .sample.csv
      
    2. 読み込んだファイルをcsvファイルとして評価する

       読み込んだファイルを csv ファイルとして評価します。
       -Header 指定なしで読み込むと一行目を列名として評価してくれます。しかし一行目に重複する項目がありエラーになる場合や、ヘッダーなしCSVファイルの場合などは -header を指定します。
       上記 sample.csv の場合、一行目の 2 列目と 3 列目が同じ値があるために、列名の評価に失敗します。したがって、-header に @(1..50) を与えています。# まともに列名を付けても良いですが、面倒くさいので(汗

      $csv = $txt | ConvertFrom-Csv -Header @(1..50)
      
  2. 参照
    1. 全体を見る

       評価した結果が $csv に入っているので、それを ogv (Out-GridView) で参照します。

      $csv | ogv
      
    2. 特定の行を選択する

       たとえば、3行目だけ見たい場合。(配列は 0 オリジン)

      $csv[2] | ogv
      
    3. 特定の行(複数行)を選択する

       たとえば、4行目~6行目を見たい場合。

      $csv[3..5] | ogv
      
    4. 連続でない複数行を選択する

       たとえば、2行目と7行目から最後のあたりまで見たい場合は以下。99 は「最終行よりも大きい」程度の意味です。

      $csv[@(1) + @(6..99)] | ogv
      
    5. 16列と26列と30列を射影する

       ogv (Out-GridView) はなぜか 30列までしか表示してくれません。(誰か 30 列以上表示させる方法教えて!)
       なので、見たい行のみを指定します。
       指定の際、select 1,2,… ではなく select "1","2",… のように項目名をダブルクォーテーションで囲んでおく所がポイントです。

      $csv | select "1","2","16","26","30" | ogv
      
    6. 列の内容によって選択する

       たとえば、3列目が 1989 以上を選択したい場合。

      $csv[@(1)+@(6..99)] | ? {$_."3" -ge 1989} | ogv
      
    7. 複数の選択結果を合成する (Union)

       異なる選択結果を選択結果を足し合わせることができます。

      $a = @($csv[1])
      $b = $csv[6..99] | ? {$_."3" -ge 1989}
      $a + $b | ogv
      
    8. 新しい列を作る

       新しい計算列を作る場合、Name に列名、Expression に列を生成するコード(無名関数)を設定します。

      $csv | ? {$_."1" -eq "人口"} |
        select "3",
               @{Name="東京"; Expression={"{0:0}万人" -f ($_."16" / 10000) }} |
        ogv
      





カテゴリー: OS, PowerShell, Windows

Follow comments via the RSS Feed | Leave a comment | Trackback URL

コメントを投稿する

日本語が含まれない投稿は無視されますのでご注意ください。(スパム対策)


«   »
 
Powered by Wordpress and MySQL. Theme by Shlomi Noach, openark.org