collections.Counter (getting counts, least common, to DataFrame) | Learning how to use machine learning models and AI by building your own

What is collections.Counter?

collections.Counter is a class in the collections module of Python's standard library. It counts how many times each element occurs in an iterable such as a list or a string.

Getting values with collections.Counter

First, let's use collections.Counter to look up counts.

from collections import Counter
lst = ['apple', 'banana', 'orange', 'apple', 'orange', 'orange', 'banana']
cnt = Counter(lst)
print(cnt['apple']) # 2
print(cnt['banana']) # 2
print(cnt['orange']) # 3

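The code above uses collections.Counter to count how often each element appears in the list lst, then retrieves the count for each element. Indexing the Counter object with an element, as in cnt['apple'], returns the number of times that element occurs.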
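One detail worth knowing when looking up counts: unlike a plain dict, a Counter returns 0 for an element it has never seen instead of raising KeyError. The following is a minimal sketch using the same list as above; the key 'grape' is only an illustrative value that does not appear in the list.

```python
from collections import Counter

lst = ['apple', 'banana', 'orange', 'apple', 'orange', 'orange', 'banana']
cnt = Counter(lst)

# An element that never appeared returns 0 rather than raising KeyError
# ('grape' is a hypothetical key chosen for illustration)
missing = cnt['grape']
print(missing)  # 0

# Looking up a missing element does not add it to the Counter
print('grape' in cnt)  # False

# Summing the counts gives the total number of counted elements
total = sum(cnt.values())
print(total)  # 7
```

This makes it safe to query a Counter for arbitrary elements without checking membership first.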

Getting the least common elements with collections.Counter

collections.Counter provides the most_common() method, which returns the elements in descending order of count.

from collections import Counter
lst = ['apple', 'banana', 'orange', 'apple', 'orange', 'orange', 'banana']
cnt = Counter(lst)
print(cnt.most_common()) # [('orange', 3), ('apple', 2), ('banana', 2)]

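The code above counts how often each element appears in the list lst with collections.Counter and uses most_common() to get the elements from most to least frequent. Passing an integer argument to most_common() limits the result to that many top elements.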
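As a small sketch of the argument form, the snippet below asks for only the top one and top two elements of the same list. Elements with equal counts come back in the order they were first encountered, which is why 'apple' precedes 'banana' here.

```python
from collections import Counter

lst = ['apple', 'banana', 'orange', 'apple', 'orange', 'orange', 'banana']
cnt = Counter(lst)

# Only the single most frequent element
top1 = cnt.most_common(1)
print(top1)  # [('orange', 3)]

# The two most frequent elements; ties keep first-encountered order
top2 = cnt.most_common(2)
print(top2)  # [('orange', 3), ('apple', 2)]
```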

Conversely, Counter has no least_common() method. To get the least common elements, reverse the result of most_common().

from collections import Counter
lst = ['apple', 'banana', 'orange', 'apple', 'orange', 'orange', 'banana']
cnt = Counter(lst)
print(list(reversed(cnt.most_common()))) # [('banana', 2), ('apple', 2), ('orange', 3)]
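If only the n least common elements are needed, a reversed slice of most_common() is one option. The sketch below assumes n = 2 (an arbitrary value chosen for illustration) and also shows a sorted() variant, which keeps tied elements in first-encountered order rather than reversing them.

```python
from collections import Counter

lst = ['apple', 'banana', 'orange', 'apple', 'orange', 'orange', 'banana']
cnt = Counter(lst)

n = 2  # hypothetical number of least common elements to fetch

# Walk the most_common() list backwards and stop after n items
least_n = cnt.most_common()[:-n-1:-1]
print(least_n)  # [('banana', 2), ('apple', 2)]

# Alternative: sort ascending by count; sorted() is stable, so ties
# keep the order in which the elements were first encountered
least_n_sorted = sorted(cnt.items(), key=lambda kv: kv[1])[:n]
print(least_n_sorted)  # [('apple', 2), ('banana', 2)]
```

Which variant fits better depends on how ties should be ordered; both return the same set of elements here.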

Converting collections.Counter values to a DataFrame

Converting the counts from collections.Counter into a pandas DataFrame makes them easier to work with.

from collections import Counter
import pandas as pd
lst = ['apple', 'banana', 'orange', 'apple', 'orange', 'orange', 'banana']
cnt = Counter(lst)
df = pd.DataFrame.from_dict(cnt, orient='index').reset_index()
df = df.rename(columns={'index':'fruit', 0:'count'})
print(df)

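The code above counts how often each element appears in the list lst with collections.Counter and converts the result into a pandas DataFrame. Because a Counter is a subclass of dict, from_dict() can build a DataFrame from it directly; with orient='index', the keys become the row index and the counts go into a single column labelled 0. reset_index() then moves the index into an ordinary column named 'index'. Finally, rename() changes the column names to 'fruit' and 'count'.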
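To make each step of the conversion visible, the sketch below splits the chain into intermediate variables (step1 and step2 are names introduced here for illustration). It also shows one possible alternative, building the DataFrame from most_common(), which names the columns up front and yields rows already ordered by count. It assumes pandas is installed.

```python
from collections import Counter
import pandas as pd

lst = ['apple', 'banana', 'orange', 'apple', 'orange', 'orange', 'banana']
cnt = Counter(lst)

# Step 1: keys become the row index, counts go into a column labelled 0
step1 = pd.DataFrame.from_dict(cnt, orient='index')
print(list(step1.columns))  # [0]

# Step 2: the index is moved into an ordinary column named 'index'
step2 = step1.reset_index()
print(list(step2.columns))  # ['index', 0]

# Step 3: give both columns descriptive names
df = step2.rename(columns={'index': 'fruit', 0: 'count'})
print(list(df.columns))  # ['fruit', 'count']

# Alternative sketch: most_common() returns (element, count) pairs,
# so the column names can be supplied directly
df_alt = pd.DataFrame(cnt.most_common(), columns=['fruit', 'count'])
print(df_alt)
```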

Benefits of converting to a DataFrame

Once the counts are in a DataFrame, you can handle the data more flexibly, for example by applying the many methods that pandas provides.

from collections import Counter
import pandas as pd
lst = ['apple', 'banana', 'orange', 'apple', 'orange', 'orange', 'banana']
cnt = Counter(lst)
df = pd.DataFrame.from_dict(cnt, orient='index').reset_index()
df = df.rename(columns={'index':'fruit', 0:'count'})
print(df.sort_values(by='count', ascending=False))

The code above sorts the rows of the DataFrame in descending order of the count column. sort_values() can sort a DataFrame by any column.

Examples of using collections.Counter in Python

collections.Counter is useful in many situations. Typical examples include:

Counting word frequencies in text data
Counting IP address frequencies in log data
Counting product frequencies in transaction data
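The first two use cases above can be sketched in a few lines. The text and the log lines below are made-up sample data, and the log format (IP address as the first whitespace-separated field) is an assumption for illustration only.

```python
from collections import Counter
import re

# Hypothetical sample text
text = "the quick brown fox jumps over the lazy dog and the fox runs"

# Lowercase the text and extract words before counting
words = re.findall(r"[a-z']+", text.lower())
word_counts = Counter(words)
top_words = word_counts.most_common(2)
print(top_words)  # [('the', 3), ('fox', 2)]

# Hypothetical log lines; the IP address is assumed to be the first field
log_lines = [
    "192.0.2.1 GET /index.html",
    "192.0.2.2 GET /about.html",
    "192.0.2.1 GET /contact.html",
]
ip_counts = Counter(line.split()[0] for line in log_lines)
print(ip_counts['192.0.2.1'])  # 2
```

Counter accepts any iterable, so a generator expression works as well as a list and avoids building an intermediate list for large logs.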

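Summary

collections.Counter is a class in the collections module of Python's standard library that counts how many times each element occurs in an iterable such as a list or a string. Its most_common() method returns the elements in descending order of count. Converting the result to a DataFrame lets you handle the data flexibly with the many methods pandas provides.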