# First post - language detection

This is my first blog post. In this post I would like to talk about language detection.

# Problem Statement

Language detection is a very common problem, and one of the first you run into in data mining. Suppose we have a number of posts saved in a .csv file, and we need to detect the language of each one.

Let's solve this problem using python3 and the langdetect library. This solution is very simple, but at the same time good enough for most cases.

First of all, we need to import the libraries in Python:

import pandas as pd
from langdetect import detect

As the second step, we need to load the data from the .csv file:

df = pd.read_csv("input_file.csv")

Suppose the data has a schema where one of the fields is 'PostContent', and this is the field whose language we need to detect. We can check which columns are available with this line of code:

print(df.columns)
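
Besides printing the columns, it can be useful to fail early if the expected field is missing and to count how many rows have no text at all. Here is a minimal sketch of such a check. The in-memory CSV and the 'PostId' column are made up for illustration and stand in for the real input file.

```python
import io

import pandas as pd

# Hypothetical stand-in for input_file.csv; the second row has no content.
csv_text = io.StringIO(
    "PostId,PostContent\n"
    "1,This is an English post about data mining.\n"
    "2,\n"
    "3,Dies ist ein deutscher Beitrag.\n"
)
df = pd.read_csv(csv_text)

column = "PostContent"
if column not in df.columns:
    raise KeyError(f"expected column {column!r}, found {list(df.columns)}")

# Empty fields are read as NaN by default, so isna() counts them.
missing = int(df[column].isna().sum())
print(f"{len(df)} rows, {missing} without any text in {column}")
```

Knowing the number of empty rows up front makes it easier to judge later how many detected languages are real and how many come from the fallback.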

Now we need to define a safe function with fallback logic for the case where the language cannot be detected. Let's use English, with the code 'en', as the fallback.

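def safe_detect(s):
    # Missing values would otherwise be detected as the text 'nan'.
    if pd.isna(s):
        return 'en'
    try:
        return detect(str(s))
    except Exception:  # langdetect raises on text with no usable features
        return 'en'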

Now, we are ready to process the whole column in one line.

df['PostLang'] = df['PostContent'].apply(safe_detect)

So, we now have the detected language in a new column 'PostLang'. Let's save the result to a file.

df.to_csv('output_file.csv', index=False)  # index=False keeps the row index out of the file
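
One detail about saving: by default `to_csv` also writes the DataFrame index as an unnamed first column, which comes back as an extra column when the file is loaded again. The following sketch shows the difference that `index=False` makes. It writes to in-memory buffers instead of real files, and the two rows, including their language codes, are typed in by hand for illustration.

```python
import io

import pandas as pd

# Hypothetical result table; the language codes here are hand-written.
df = pd.DataFrame({
    "PostContent": ["Hello world", "Bonjour le monde"],
    "PostLang": ["en", "fr"],
})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)

# With index=False only the real columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)

cols_with_index = list(pd.read_csv(io.StringIO(with_index.getvalue())).columns)
cols_without_index = list(pd.read_csv(io.StringIO(without_index.getvalue())).columns)

print(cols_with_index)     # ['Unnamed: 0', 'PostContent', 'PostLang']
print(cols_without_index)  # ['PostContent', 'PostLang']
```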

In this simple example we saw how to use the 'langdetect' library to detect the language of your data. In the next posts we will look at how to train and use more advanced models for language detection.