HDInsight Hadoop – Word Count(2)


Posted by 정홍주, 2015-08-14 08:00


In the previous post we ran Word Count from the Hadoop command window. That approach has one drawback: you first have to connect to the Hadoop cluster. With Microsoft Azure PowerShell you can run MapReduce jobs such as Word Count remotely instead. Azure PowerShell can be installed from the download link at http://azure.microsoft.com/ko-kr/.
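Before running the scripts in this post, it can help to confirm that the installed module is visible to PowerShell. The following is a minimal sketch, not part of the original post: Test-ModuleInstalled is a hypothetical helper name, and the module name Azure is an assumption based on how the Azure PowerShell releases of that period registered themselves.

```powershell
# Hypothetical helper: returns $true when a module with the given name
# can be found on this machine, $false otherwise.
function Test-ModuleInstalled {
    param([Parameter(Mandatory = $true)][string]$Name)
    # Get-Module -ListAvailable returns nothing when no module matches.
    [bool](Get-Module -ListAvailable -Name $Name)
}

# Assumption: the installed Azure PowerShell module is named "Azure".
Test-ModuleInstalled -Name 'Azure'
```

If this prints False, install Azure PowerShell (or open a new PowerShell session after installing) before continuing.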

The steps below are run in the PowerShell ISE.

• Word Count

# Declare variables
$subscriptionName = "Azure subscription name"
$clusterName = "HDInsight cluster name"

# Log on to Azure and select the subscription
Import-AzurePublishSettingsFile "G:\HDInsight\Azure.publishsettings"
Select-AzureSubscription $subscriptionName

# Define the MapReduce job (run as a single line)
$wordCountJobDefinition = New-AzureHDInsightMapReduceJobDefinition -JarFile "wasb:///example/jars/hadoop-mapreduce-examples.jar" -ClassName "wordcount" -Arguments "wasb:///example/data/gutenberg/davinci.txt", "wasb:///example/data/WordCountOutput"

# Submit the job and wait for it to finish (run as a single line)
$wordCountJob = Start-AzureHDInsightJob -Cluster $clusterName -JobDefinition $wordCountJobDefinition | Wait-AzureHDInsightJob -WaitTimeoutInSeconds 3600

# Check the job result (standard error output)
Get-AzureHDInsightJobOutput -Cluster $clusterName -JobId $wordCountJob.JobId -StandardError

This runs Word Count from PowerShell, processing the davinci.txt file with hadoop-mapreduce-examples.jar. The job output shows the results of the Map and Reduce tasks.
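To make concrete what the wordcount class computes, here is a small local sketch in PowerShell. It is not the code inside hadoop-mapreduce-examples.jar; Get-WordCount is a hypothetical function that only approximates the example's behavior: split each line on whitespace (the "map" step), sum the occurrences of each case-sensitive token (the "reduce" step), and emit tab-separated word and count lines sorted by key.

```powershell
# Hypothetical local approximation of the Hadoop wordcount example.
function Get-WordCount {
    param([string[]]$Lines)

    # Case-sensitive (ordinal) dictionary: "The" and "the" are different keys.
    $counts = New-Object 'System.Collections.Generic.Dictionary[string,int]' ([System.StringComparer]::Ordinal)

    foreach ($line in $Lines) {
        # "Map": split the line into whitespace-delimited tokens.
        foreach ($token in ($line -split '\s+')) {
            if ($token.Length -eq 0) { continue }
            # "Reduce": accumulate the count for each token.
            if ($counts.ContainsKey($token)) {
                $counts[$token] = $counts[$token] + 1
            } else {
                $counts[$token] = 1
            }
        }
    }

    # Emit "word<TAB>count" lines in ordinal key order.
    $keys = [string[]]@($counts.Keys)
    [Array]::Sort($keys, [System.StringComparer]::Ordinal)
    foreach ($key in $keys) {
        "{0}`t{1}" -f $key, $counts[$key]
    }
}

# Illustrative input, not taken from davinci.txt.
Get-WordCount -Lines @('the cat The', 'the  dog')
```

Because tokens are split on whitespace only, punctuation stays attached to words, so "there" and "there," would be counted as separate keys in this sketch.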

You can confirm that the output file was actually created by browsing the storage account's container in the management portal.
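The wasb:/// paths passed to the job correspond to blob names in the storage container: a path with an empty authority refers to the cluster's default container, while the full form is wasb://container@account/path. The sketch below illustrates that mapping; ConvertFrom-WasbPath is a hypothetical helper written for this post, not an Azure cmdlet, and the container and account names in the second call are made-up placeholders.

```powershell
# Hypothetical helper: split a wasb:// URI into container, account, and blob name.
function ConvertFrom-WasbPath {
    param([Parameter(Mandatory = $true)][string]$Path)

    $pattern = '^wasbs?://(?:(?<container>[^@/]+)@(?<account>[^/]+))?/(?<blob>.*)$'
    if ($Path -notmatch $pattern) {
        throw "Not a wasb path: $Path"
    }

    # Container and Account are $null when the URI uses the default container.
    [pscustomobject]@{
        Container = $Matches['container']
        Account   = $Matches['account']
        Blob      = $Matches['blob']
    }
}

# Default-container form, as used in the job definition in this post.
ConvertFrom-WasbPath -Path 'wasb:///example/data/WordCountOutput'

# Fully qualified form with placeholder container and account names.
ConvertFrom-WasbPath -Path 'wasb://mycontainer@myaccount.blob.core.windows.net/example/jars/hadoop-mapreduce-examples.jar'
```

The Blob value is the name to look for in the container, with the reducer output stored beneath it as part-r-00000.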

• Searching the output for a string

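You can download the output file and search it for a specific string by piping cat into findstr. The script below downloads the output file from the storage account and displays the matching lines on screen.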

# Declare variables
$subscriptionName = "Azure subscription name"
$clusterName = "HDInsight cluster name"
$storageAccountName = "Azure storage account name"
$containerName = "Blob storage container name"

# Get the storage account key and build a storage context
$storageAccountKey = Get-AzureStorageKey $storageAccountName | %{ $_.Primary }
$storageContext = New-AzureStorageContext -StorageAccountName $storageAccountName -StorageAccountKey $storageAccountKey

# Download the job output (run as a single line)
Get-AzureStorageBlobContent -Container $containerName -Blob example/data/WordCountOutput/part-r-00000 -Context $storageContext -Force

# Show the lines of the downloaded file that contain "there"
cat ./example/data/WordCountOutput/part-r-00000 | findstr "there"

Running the PowerShell script above prints the lines of the output file that contain "there".
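Note that findstr "there" is a substring match, so it also returns keys such as "therefore". As a variation, the sketch below parses the tab-separated word and count lines and supports an exact, case-sensitive lookup. Find-WordCount is a hypothetical helper, and the sample lines are illustrative values rather than real counts from davinci.txt.

```powershell
# Hypothetical helper: look up words in "word<TAB>count" lines.
function Find-WordCount {
    param(
        [string[]]$Lines,
        [Parameter(Mandatory = $true)][string]$Word,
        [switch]$Exact
    )

    foreach ($line in $Lines) {
        # Split into the key and the count at the first tab.
        $key, $count = $line -split "`t", 2
        # -Exact: case-sensitive equality; otherwise case-sensitive substring match.
        $hit = if ($Exact) { $key -ceq $Word } else { $key.Contains($Word) }
        if ($hit) {
            [pscustomobject]@{ Word = $key; Count = [int]$count }
        }
    }
}

# Illustrative sample lines (not real output from the job).
$sample = @("There`t1", "other`t3", "there`t5", "therefore`t2")
Find-WordCount -Lines $sample -Word 'there'          # substring match
Find-WordCount -Lines $sample -Word 'there' -Exact   # exact match only
```

Against the downloaded file, the -Lines argument could be supplied with Get-Content ./example/data/WordCountOutput/part-r-00000.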

For the Map and Reduce functions in hadoop-mapreduce-examples.jar, see the link below.

https://azure.microsoft.com/ko-kr/documentation/articles/hdinsight-develop-deploy-java-mapreduce/

Building a jar file for every job is limiting, so tools such as Hive and Pig are provided to make jobs easier to set up. The next post looks at MapReduce jobs through Hive.
