What is Apache Pig?
Apache Pig is an abstraction over MapReduce. It is a tool/platform used to analyze large data sets, representing them as data flows. Pig is generally used with Hadoop; we can perform all the data manipulation operations in Hadoop using Apache Pig.
To write data analysis programs, Pig provides a high-level language known as Pig Latin. This language provides various operators using which programmers can develop their own functions for reading, writing, and processing data.
To analyze data using Apache Pig, programmers need to write scripts using Pig Latin language.
All scripts are internally converted to Map and Reduce tasks. Apache Pig has a component known as Pig
Engine that accepts the Pig Latin scripts as input and converts those scripts into MapReduce jobs.
Why Do We Need Apache Pig?
Programmers who are not well versed in Java often struggle to work with Hadoop, especially while performing MapReduce tasks. Apache Pig is a boon for all such programmers.
• Using Pig Latin, programmers can perform MapReduce tasks easily without having to type complex code in Java.
• Apache Pig uses a multi-query approach, thereby reducing the length of code. For example, an operation that would require 200 lines of code (LoC) in Java can be done with as few as 10 LoC in Apache Pig; ultimately, Apache Pig reduces development time by almost 16 times (see the sketch after this list).
• Pig Latin is a SQL-like language, and it is easy to learn Apache Pig when you are familiar with SQL.
• Apache Pig provides many built-in operators to support data operations like joins, filters, ordering, etc. In addition, it also provides nested data types like tuples, bags, and maps that are missing from MapReduce.
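To make the LoC claim concrete, here is a minimal word-count-style sketch in Pig Latin (the input file name is hypothetical; TOKENIZE, FLATTEN, and COUNT are Pig built-ins). An equivalent Java MapReduce program would need a mapper class, a reducer class, and a driver.
lines = LOAD 'input.txt' AS (line:chararray);
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts = FOREACH grouped GENERATE group, COUNT(words);
STORE counts INTO 'wordcount_output';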
Features of Pig
Apache Pig comes with the following features:
• Rich set of operators: It provides many operators to perform operations like join, sort, filter, etc.
• Ease of programming: Pig Latin is similar to SQL, and it is easy to write a Pig script if you are good at SQL.
• Optimization opportunities: The tasks in Apache Pig optimize their execution automatically, so programmers need to focus only on the semantics of the language.
• Extensibility: Using the existing operators, users can develop their own functions to read, process, and write data.
• UDFs: Pig provides the facility to create User Defined Functions in other programming languages such as Java, and to invoke or embed them in Pig scripts (see the sketch after this list).
• Handles all kinds of data: Apache Pig analyzes all kinds of data, both structured and unstructured. It stores the results in HDFS.
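As a sketch of the UDF facility: assuming you have compiled a Java UDF into a jar (the class myudfs.UPPER and the jar name here are hypothetical), you register the jar and then call the function like any built-in.
REGISTER myudfs.jar;
upper_names = FOREACH student GENERATE myudfs.UPPER(firstname);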
Apache Pig Vs MapReduce
Listed below are the major differences
between Apache Pig and MapReduce.
Apache Pig | MapReduce
Apache Pig is a data flow language. | MapReduce is a data processing paradigm.
It is a high-level language. | MapReduce is low level and rigid.
Performing a Join operation in Apache Pig is pretty simple. | It is quite difficult in MapReduce to perform a Join operation between datasets.
Any novice programmer with a basic knowledge of SQL can work conveniently with Apache Pig. | Exposure to Java is a must to work with MapReduce.
Apache Pig uses a multi-query approach, thereby reducing the length of the code to a great extent. | MapReduce requires almost 20 times more lines of code to perform the same task.
There is no need for compilation; on execution, every Apache Pig operator is converted internally into a MapReduce job. | MapReduce jobs have a long compilation process.
Apache Pig Vs SQL
Listed below are the major differences
between Apache Pig and SQL.
Pig | SQL
Pig Latin is a procedural language. | SQL is a declarative language.
In Apache Pig, schema is optional; we can store data without designing a schema (values are referenced positionally as $0, $1, etc.). | Schema is mandatory in SQL.
The data model in Apache Pig is nested relational. | The data model used in SQL is flat relational.
Apache Pig provides limited opportunity for query optimization. | There is more opportunity for query optimization in SQL.
In addition to the above differences, Apache Pig Latin:
• Allows splits in the pipeline (see the sketch after this list).
• Allows developers to store data anywhere in the pipeline.
• Declares execution plans.
• Provides operators to perform ETL (Extract, Transform, and Load) functions.
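A minimal sketch of splits and mid-pipeline stores, assuming the emp relation (with sex and esal fields) used in the data-model examples later in this document:
SPLIT emp INTO male IF sex == 'm', female IF sex == 'f';
STORE male INTO 'male_emp';                 -- store an intermediate result
high_paid = FILTER male BY esal > 50000;    -- and keep processing in the same script
STORE high_paid INTO 'high_paid_male_emp';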
Apache Pig Vs Hive
Both Apache Pig and Hive are used to
create MapReduce jobs. And in some cases, Hive operates on HDFS in a similar
way Apache Pig does. In the following table, we have listed a few significant
points that set Apache Pig apart from Hive.
Apache Pig | Hive
Apache Pig uses a language called Pig Latin. It was originally created at Yahoo. | Hive uses a language called HiveQL. It was originally created at Facebook.
Pig Latin is a data flow language. | HiveQL is a query processing language.
Pig Latin is a procedural language and it fits in the pipeline paradigm. | HiveQL is a declarative language.
Apache Pig can handle structured, unstructured, and semi-structured data. | Hive is mostly for structured data.
Applications of Apache Pig
Apache Pig is generally used by data scientists for performing tasks involving ad-hoc processing and quick prototyping. Apache Pig is used:
• To process huge data sources such as web logs.
• To perform data processing for search platforms.
• To process time-sensitive data loads.
Apache Pig Architecture
The language used to analyze data in Hadoop using Pig is known as Pig Latin. It is a high level data
processing language which provides a rich set of data types and operators to
perform various operations on the data.
To perform a particular task, programmers need to write a Pig script using the Pig Latin language and execute it using any of the execution mechanisms (Grunt shell, UDFs, Embedded). After execution, these scripts go through a series of transformations applied by the Pig framework to produce the desired output.
Internally, Apache Pig converts these scripts into a series of
MapReduce jobs, and thus, it makes the programmer’s job easy. The architecture
of Apache Pig is shown below.
Apache Pig – Components
As shown in the figure, there are various components in the Apache Pig
framework. Let us take a look at the major components.
Parser
Initially the Pig Scripts are handled by the Parser. It checks the
syntax of the script, does type checking, and other miscellaneous checks. The
output of the parser will be a DAG (directed acyclic graph), which represents
the Pig Latin statements and logical operators.
In the DAG, the logical operators of the script are represented as the
nodes and the data flows are represented as edges.
Optimizer
The logical plan (DAG) is passed to the logical optimizer, which carries out logical optimizations such as projection pushdown.
Compiler
The compiler compiles the optimized logical plan into a series of
MapReduce jobs.
Execution engine
Finally, the MapReduce jobs are submitted to Hadoop in a sorted order, where they are executed to produce the desired results.
Pig Latin – Data Model
The data model of Pig Latin is fully nested and
it allows complex non-atomic datatypes such as map and tuple. Given
below is the diagrammatical representation of Pig Latin’s data model.
Atom
Any single value in Pig Latin, irrespective of its data type, is known as an Atom. It is stored as a string and can be used as a string or a number. int, long, float, double, chararray, and bytearray are the atomic values of Pig. A piece of data or a simple atomic value is known as a field.
Ex: '001' or 'rajiv' or 'Hyderabad'
Tuple
A record that is formed by an ordered set of fields is known as a tuple; the fields can be of any type. A tuple is similar to a row in a table of an RDBMS.
Ex: (001, rajiv, hyd)
Bag
A bag is an unordered set of tuples. In other words, a collection of
tuples (non-unique) is known as a bag. Each tuple can have any number of fields
(flexible schema). A bag is represented by ‘{}’. It is similar to a table in
RDBMS, but unlike a table in RDBMS, it is not necessary that every tuple
contain the same number of fields or that the fields in the same position
(column) have the same type.
Ex: cat
emp
ravi,m,10000
rani,f,40000
ram,m,50000
vani,f,60000
mani,m,90000
Bags are of two types:
i) Outer bag
ii) Inner bag
Outer bag: The collection of all tuples of a dataset is called an outer bag. An outer bag is referenced by a "relation name", simply called the "alias of the relation".
Relation
A relation is a
bag of tuples. The relations in Pig
Latin are unordered (there is no guarantee that tuples are processed in any
particular order).
emp ---> relation
___________________
(ravi,m,10000)
(rani,f,40000)
(ram,m,50000)
(vani,f,60000)
(mani,m,90000)
____________________
Inner bag: A bag placed as a field is called an inner bag.
grp = group emp by sex;
grp
___________________________
group:chararray, emp:bag
________________________________
(f,{(rani,f,40000),(vani,f,60000)})
(m,{(ravi,m,10000),(ram,m,50000),(mani,m,90000)})
{(rani,f,40000),(vani,f,60000)} ---> inner bag
When you group data, you get inner bags.
Pig has two start-up modes:
1. Local mode: pig -x local
2. MapReduce (HDFS) mode: pig -x mapreduce
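In either mode, Pig Latin statements can also be collected into a script file and run non-interactively; a sketch assuming a script named sample_script.pig:
$ pig -x local sample_script.pig
$ pig -x mapreduce sample_script.pig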
Pig Latin – Data Model
As discussed above, the data model of Pig is fully nested. A relation is the outermost structure of the Pig Latin data model, and it is a bag where:
• A bag is a collection of tuples.
• A tuple is an ordered set of fields.
• A field is a piece of data.
Pig Latin – Statements
While processing data using Pig Latin, statements are the basic constructs.
• These statements work with relations. They include expressions and schemas.
• Every statement ends with a semicolon (;).
• We perform various operations using the operators provided by Pig Latin, through statements.
• Except for LOAD and STORE, while performing all other operations, Pig Latin statements take a relation as input and produce another relation as output.
As soon as you enter a LOAD statement in the Grunt shell, its semantic checking will be carried out. To see the contents of the relation, you need to use the DUMP operator. Only after performing the dump operation will the MapReduce job for loading the data into the file system be carried out.
Example
Given below is a Pig Latin statement, which loads data to Apache Pig.
student_data = LOAD 'student_data.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
Pig Latin – Data types
Data Type | Description and Example
int | Represents a signed 32-bit integer. Example: 8
long | Represents a signed 64-bit integer. Example: 5L
float | Represents a signed 32-bit floating point. Example: 5.5F
double | Represents a 64-bit floating point. Example: 10.5
chararray | Represents a character array (string) in Unicode UTF-8 format. Example: 'tutorials point'
bytearray | Represents a byte array (blob).
boolean | Represents a Boolean value. Example: true/false
datetime | Represents a date-time. Example: 1970-01-01T00:00:00.000+00:00
biginteger | Represents a Java BigInteger. Example: 60708090709
bigdecimal | Represents a Java BigDecimal. Example: 185.98376256272893883
Complex Types
tuple | An ordered set of fields. Example: (raja, 30)
bag | A collection of tuples. Example: {(raju,30),(Mohhammad,45)}
map | A set of key-value pairs. Example: ['name'#'Raju', 'age'#30]
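A short sketch of the complex types in use. The file and field names are hypothetical; declaring a field as map[] lets PigStorage parse map literals such as ['age'#30], and the # operator extracts a value by key.
grunt> users = LOAD 'users.txt' USING PigStorage(',') as (id:int, name:chararray, props:map[]);
grunt> ages = FOREACH users GENERATE id, props#'age';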
Pig Latin – Arithmetic Operators
The following table describes the arithmetic operators of Pig Latin. Suppose a = 10 and b = 20.
Operator | Description | Example
+ | Addition: adds values on either side of the operator. | a + b will give 30
- | Subtraction: subtracts the right-hand operand from the left-hand operand. | a - b will give -10
* | Multiplication: multiplies values on either side of the operator. | a * b will give 200
/ | Division: divides the left-hand operand by the right-hand operand. | b / a will give 2
% | Modulus: divides the left-hand operand by the right-hand operand and returns the remainder. | b % a will give 0
? : | Bincond: evaluates a Boolean expression; it has three operands, as in variable x = (expression) ? value1 if true : value2 if false. | b = (a == 1) ? 20 : 30; if a == 1, the value of b is 20; if a != 1, the value of b is 30.
CASE WHEN THEN ELSE END | Case: the CASE operator is equivalent to the nested bincond operator. | CASE f2 % 2 WHEN 0 THEN 'even' WHEN 1 THEN 'odd' END
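Arithmetic operators are most often used inside FOREACH … GENERATE. A minimal sketch, assuming the emp relation (with ename and esal fields) loaded as in the LOAD example later in this document; note that a bincond expression must be parenthesized:
grunt> salaries = FOREACH emp GENERATE ename, esal * 12 AS annual_sal, (esal >= 50000 ? 'high' : 'low') AS band;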
Pig Latin – Comparison Operators
Suppose a = 10 and b = 20.
Operator | Description | Example
== | Equal: checks if the values of two operands are equal; if yes, the condition becomes true. | (a == b) is not true
!= | Not equal: checks if the values of two operands are not equal; if they are not equal, the condition becomes true. | (a != b) is true
> | Greater than: checks if the value of the left operand is greater than the value of the right operand; if yes, the condition becomes true. | (a > b) is not true
< | Less than: checks if the value of the left operand is less than the value of the right operand; if yes, the condition becomes true. | (a < b) is true
>= | Greater than or equal to: checks if the value of the left operand is greater than or equal to the value of the right operand; if yes, the condition becomes true. | (a >= b) is not true
<= | Less than or equal to: checks if the value of the left operand is less than or equal to the value of the right operand; if yes, the condition becomes true. | (a <= b) is true
matches | Pattern matching: checks whether the string on the left-hand side matches the regular expression on the right-hand side. | f1 matches '.*tutorial.*'
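Comparison operators typically appear in FILTER conditions. A minimal sketch, assuming the student_details relation (with age and city fields) used in the examples later in this document:
grunt> adults = FILTER student_details BY age >= 22;
grunt> chennai = FILTER student_details BY city matches 'Chen.*';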
Pig Latin – Relational Operations
The following table describes the relational operators of Pig Latin.
Operator | Description
Loading and Storing
LOAD | To load the data from the file system (local/HDFS) into a relation.
STORE | To save a relation to the file system (local/HDFS).
Filtering
FILTER | To remove unwanted rows from a relation.
DISTINCT | To remove duplicate rows from a relation.
FOREACH … GENERATE | To generate data transformations based on columns of data.
STREAM | To transform a relation using an external program.
Grouping and Joining
JOIN | To join two or more relations.
COGROUP | To group the data in two or more relations.
GROUP | To group the data in a single relation.
CROSS | To create the cross product of two or more relations.
Sorting
ORDER | To arrange a relation in a sorted order based on one or more fields (ascending or descending).
LIMIT | To get a limited number of tuples from a relation.
Combining and Splitting
UNION | To combine two or more relations into a single relation.
SPLIT | To split a single relation into two or more relations.
Diagnostic Operators
DUMP | To print the contents of a relation on the console.
DESCRIBE | To describe the schema of a relation.
EXPLAIN | To view the logical, physical, or MapReduce execution plans used to compute a relation.
ILLUSTRATE | To view the step-by-step execution of a series of statements.
The Load Operator
You can load data into Apache Pig from the file system (HDFS/local) using the LOAD operator of Pig Latin.
Syntax
The load statement consists of two parts divided by the "=" operator. On the left-hand side, we mention the name of the relation where we want to store the data; on the right-hand side, we define how the data is stored. Given below is the syntax of the LOAD operator.
Relation_name = LOAD 'Input file path' USING function as schema;
Schema: (column1 : data type, column2 : data type, column3 : data type);
1. PigStorage() — TextInputFormat (text files)
2. BinStorage() — SequenceInputFormat (binary files)
The default storage function is PigStorage(), and the default delimiter is '\t'.
Ex: cat student_data.txt
001,Rajiv,Reddy,9848022337,Hyderabad
002,siddarth,Battacharya,9848022338,Kolkata
003,Rajesh,Khanna,9848022339,Delhi
004,Preethi,Agarwal,9848022330,Pune
005,Trupthi,Mohanthy,9848022336,Bhuwaneshwar
006,Archana,Mishra,9848022335,Chennai
grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
grunt> emp = load 'emp' using PigStorage(',') as (ecode:int, ename:chararray, esal:int, sex:chararray, dno:int);
Store Operator
This chapter explains how to store data in Apache Pig using the STORE operator.
Syntax
STORE Relation_name INTO 'required_directory_path' [USING function];
Ex: cat student_data.txt
001,Rajiv,Reddy,9848022337,Hyderabad
002,siddarth,Battacharya,9848022338,Kolkata
003,Rajesh,Khanna,9848022339,Delhi
004,Preethi,Agarwal,9848022330,Pune
005,Trupthi,Mohanthy,9848022336,Bhuwaneshwar
006,Archana,Mishra,9848022335,Chennai
grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
Let us store the relation in the HDFS directory "pig_Output" as shown below.
grunt> STORE student INTO 'pig_Output/' USING PigStorage(',');
Output
After executing the store
statement, you will get the following output. A directory is created with the
specified name and the data will be stored in it.
$ hdfs dfs -ls 'pig_Output/'
Found 2 items
-rw-r--r--   1 Hadoop supergroup      0 2015-10-05 13:03 pig_Output/_SUCCESS
-rw-r--r--   1 Hadoop supergroup    224 2015-10-05 13:03 pig_Output/part-m-00000
You can observe that two files were created after executing the store statement.
Using the cat command, list the contents of the file named part-m-00000 as shown below.
$ hdfs dfs -cat 'pig_Output/part-m-00000'
1,Rajiv,Reddy,9848022337,Hyderabad
2,siddarth,Battacharya,9848022338,Kolkata
3,Rajesh,Khanna,9848022339,Delhi
4,Preethi,Agarwal,9848022330,Pune
5,Trupthi,Mohanthy,9848022336,Bhuwaneshwar
6,Archana,Mishra,9848022335,Chennai
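Data written by STORE with PigStorage can be loaded back the same way. A minimal sketch, re-declaring the same schema against the directory created above:
grunt> stored_student = LOAD 'pig_Output' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);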
Diagnostic Operators
• Dump operator
• Describe operator
• Explain operator
• Illustrate operator
Dump Operator
The Dump operator is used to run Pig Latin statements and display the results on the screen. It is generally used for debugging purposes.
Syntax
grunt> Dump Relation_Name;
Example:
We have a file student_data.txt in HDFS with the following content.
1,Rajiv,Reddy,9848022337,Hyderabad
2,siddarth,Battacharya,9848022338,Kolkata
3,Rajesh,Khanna,9848022339,Delhi
4,Preethi,Agarwal,9848022330,Pune
5,Trupthi,Mohanthy,9848022336,Bhuwaneshwar
6,Archana,Mishra,9848022335,Chennai
grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
Output
Once you execute the above Pig Latin statement, it will start a MapReduce job to read the data from HDFS and display the following output on the terminal.
grunt> Dump student;
1,Rajiv,Reddy,9848022337,Hyderabad
2,siddarth,Battacharya,9848022338,Kolkata
3,Rajesh,Khanna,9848022339,Delhi
4,Preethi,Agarwal,9848022330,Pune
5,Trupthi,Mohanthy,9848022336,Bhuwaneshwar
6,Archana,Mishra,9848022335,Chennai
Describe Operator
The describe operator is
used to view the schema of a relation.
Syntax:
grunt> describe Relation_Name;
grunt> describe student;
Output
Once you execute the above Pig
Latin statement, it will produce the following output.
student: {id: int, firstname: chararray, lastname: chararray, phone: chararray, city: chararray}
Explain Operator
The explain operator is
used to display the logical, physical, and MapReduce execution plans of a
relation.
Syntax
Given below is the syntax of the explain
operator.
grunt> explain Relation_name;
Example
Assume we have a file student_data.txt in HDFS, loaded into the relation student as shown above.
grunt> explain student;
Illustrate Operator
The illustrate operator gives you the step-by-step execution of a sequence of statements.
Syntax:
grunt> illustrate Relation_name;
Example
Assume we have a file student_data.txt in HDFS.
grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
grunt> illustrate student;
Output
On executing the above statement, you will get the following output.
INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: student[1,10] C: R:
-------------------------------------------------------------------------------
| student | id:int | firstname:chararray | lastname:chararray | phone:chararray | city:chararray |
-------------------------------------------------------------------------------
|         | 002    | siddarth            | Battacharya        | 9848022338      | Kolkata        |
-------------------------------------------------------------------------------
Group Operator
The GROUP operator is used to group the data in a single relation. It collects the tuples having the same key.
Syntax
Given below is the syntax of the group
operator.
Group_data = GROUP Relation_name BY column_name;
Example
Assume that we have a file named student_details.txt in the HDFS directory /pig_data/ as shown below.
student_details.txt
1,Rajiv,Reddy,21,9848022337,Hyderabad
2,siddarth,Battacharya,22,9848022338,Kolkata
3,Rajesh,Khanna,22,9848022339,Delhi
4,Preethi,Agarwal,21,9848022330,Pune
5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
6,Archana,Mishra,23,9848022335,Chennai
7,Komal,Nayak,24,9848022334,trivendram
8,Bharathi,Nambiayar,24,9848022333,Chennai
And we have loaded this file into Apache Pig with the relation name student_details as shown below.
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
grunt> group_data = GROUP student_details by age;
grunt> Dump group_data;
Output
Then you will get output displaying the contents of the relation named group_data as shown below. Here you can observe that the resulting schema has two columns:
• One is age, by which we have grouped the relation.
• The other is a bag, which contains the group of tuples (student records) with the respective age.
(21,{(4,Preethi,Agarwal,21,9848022330,Pune),(1,Rajiv,Reddy,21,9848022337,Hyderabad)})
(22,{(3,Rajesh,Khanna,22,9848022339,Delhi),(2,siddarth,Battacharya,22,9848022338,Kolkata)})
(23,{(6,Archana,Mishra,23,9848022335,Chennai),(5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar)})
(24,{(8,Bharathi,Nambiayar,24,9848022333,Chennai),(7,Komal,Nayak,24,9848022334,trivendram)})
You can see the schema of the relation after grouping the data using the describe command as shown below.
grunt> describe group_data;
group_data: {group: int, student_details: {(id: int, firstname: chararray, lastname: chararray, age: int, phone: chararray, city: chararray)}}
In the same way, you can get a sample illustration of the schema using the illustrate command as shown below.
grunt> illustrate group_data;
It will produce the following output:
-------------------------------------------------------------------------------
| group_data | group:int | student_details:bag{:tuple(id:int,firstname:chararray,lastname:chararray,age:int,phone:chararray,city:chararray)} |
-------------------------------------------------------------------------------
|            | 21 | {(4,Preethi,Agarwal,21,9848022330,Pune),(1,Rajiv,Reddy,21,9848022337,Hyderabad)} |
|            | 22 | {(2,siddarth,Battacharya,22,9848022338,Kolkata),(3,Rajesh,Khanna,22,9848022339,Delhi)} |
-------------------------------------------------------------------------------
Grouping by Multiple Columns
Let us group the relation by age and city as shown below.
grunt> group_multiple = GROUP student_details by (age, city);
You can verify the content of the relation group_multiple using the Dump operator as shown below.
grunt> Dump group_multiple;
((21,Pune),{(4,Preethi,Agarwal,21,9848022330,Pune)})
((21,Hyderabad),{(1,Rajiv,Reddy,21,9848022337,Hyderabad)})
((22,Delhi),{(3,Rajesh,Khanna,22,9848022339,Delhi)})
((22,Kolkata),{(2,siddarth,Battacharya,22,9848022338,Kolkata)})
((23,Chennai),{(6,Archana,Mishra,23,9848022335,Chennai)})
((23,Bhuwaneshwar),{(5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar)})
((24,Chennai),{(8,Bharathi,Nambiayar,24,9848022333,Chennai)})
((24,trivendram),{(7,Komal,Nayak,24,9848022334,trivendram)})
Group All
You can group a relation by all the columns as shown below.
grunt> group_all = GROUP student_details All;
Now, verify the content of the relation group_all as shown below.
grunt> Dump group_all;
(all,{(8,Bharathi,Nambiayar,24,9848022333,Chennai),(7,Komal,Nayak,24,9848022334,trivendram),(6,Archana,Mishra,23,9848022335,Chennai),(5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar),(4,Preethi,Agarwal,21,9848022330,Pune),(3,Rajesh,Khanna,22,9848022339,Delhi),(2,siddarth,Battacharya,22,9848022338,Kolkata),(1,Rajiv,Reddy,21,9848022337,Hyderabad)})
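Grouping is usually followed by an aggregation over each inner bag. A minimal sketch using the group_data relation created above (COUNT is a Pig built-in):
grunt> age_counts = FOREACH group_data GENERATE group AS age, COUNT(student_details) AS num_students;
grunt> Dump age_counts;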
Cogroup Operator
The COGROUP operator is used to group two or more relations.
Assume that we have two files namely student_details.txt and employee_details.txt in the HDFS directory /pig_data/ as shown below.
student_details.txt
001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai
employee_details.txt
001,Robin,22,newyork
002,BOB,23,Kolkata
003,Maya,23,Tokyo
004,Sara,25,London
005,David,23,Bhuwaneshwar
006,Maggy,22,Chennai
And we have loaded these files into Pig with the relation names student_details and employee_details respectively, as shown below.
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
employee_details = LOAD 'hdfs://localhost:9000/pig_data/employee_details.txt' USING PigStorage(',') as (id:int, name:chararray, age:int, city:chararray);
Now, let us group the records/tuples of the relations student_details and employee_details with the key age, as shown below.
grunt> cogroup_data = COGROUP student_details by age, employee_details by age;
Output:
Dump cogroup_data;
(21,{(4,Preethi,Agarwal,21,9848022330,Pune),(1,Rajiv,Reddy,21,9848022337,Hyderabad)},{})
(22,{(3,Rajesh,Khanna,22,9848022339,Delhi),(2,siddarth,Battacharya,22,9848022338,Kolkata)},{(6,Maggy,22,Chennai),(1,Robin,22,newyork)})
(23,{(6,Archana,Mishra,23,9848022335,Chennai),(5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar)},{(5,David,23,Bhuwaneshwar),(3,Maya,23,Tokyo),(2,BOB,23,Kolkata)})
(24,{(8,Bharathi,Nambiayar,24,9848022333,Chennai),(7,Komal,Nayak,24,9848022334,trivendram)},{})
(25,{},{(4,Sara,25,London)})
The COGROUP operator groups the tuples from each relation according to age, where each group depicts a particular age value.
For example, if we consider the 1st tuple of the result, it is grouped by age 21. And it contains two bags:
• the first bag holds all the tuples from the first relation (student_details in this case) having age 21, and
• the second bag contains all the tuples from the second relation (employee_details in this case) having age 21.
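Like GROUP output, a cogrouped relation can be aggregated. A minimal sketch counting students and employees per age in the cogroup_data relation above (COUNT of an empty bag returns 0):
grunt> cg_counts = FOREACH cogroup_data GENERATE group AS age, COUNT(student_details) AS students, COUNT(employee_details) AS employees;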
Join Operator
The JOIN operator is used to combine records from two or more relations. While performing a join operation, we declare one (or a group of) field(s) from each relation as keys. When these keys match, the two particular tuples are matched, else the records are dropped. Joins can be of the following types:
• Inner join
• Outer join: left join, right join, and full join
customers.txt
1,Ramesh,32,Ahmedabad,2000.00
2,Khilan,25,Delhi,1500.00
3,kaushik,23,Kota,2000.00
4,Chaitali,25,Mumbai,6500.00
5,Hardik,27,Bhopal,8500.00
6,Komal,22,MP,4500.00
7,Muffy,24,Indore,10000.00
orders.txt
102,2009-10-08 00:00:00,3,3000
100,2009-10-08 00:00:00,3,1500
101,2009-11-20 00:00:00,2,1560
103,2008-05-20 00:00:00,4,2060
Load these two files into Pig with the relation names customers and orders.
Inner Join
An inner join returns rows when there is
a match in both tables.
Syntax
Here is the syntax of performing inner
join operation using the JOIN
operator.
Relation3_name = JOIN Relation1_name BY key, Relation2_name BY key;
Example
Let us perform an inner join operation on the two relations customers and orders as shown below.
grunt> customer_orders = JOIN customers BY id, orders BY customer_id;
Output:
Verify the relation customer_orders using the DUMP operator as shown below.
Dump customer_orders;
You will get the following output, displaying the contents of the relation named customer_orders.
(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)
Outer Join
An outer join operation is carried out in three ways:
• Left outer join
• Right outer join
• Full outer join
Left Outer Join
The left
outer Join operation returns all rows from the left
table, even if there are no matches in the right relation.
Syntax
Given below is the syntax of performing left outer join operation using the JOIN operator.
Relation3_name = JOIN Relation1_name BY id LEFT OUTER, Relation2_name BY customer_id;
Example
Let us perform left outer join
operation on the two relations customers
and orders as shown below.
grunt> outer_left = JOIN customers BY id LEFT OUTER, orders BY customer_id;
Output
Verify the relation outer_left
using the DUMP operator as shown
below.
Dump outer_left;
It will produce the following output, displaying the contents of the
relation outer_left.
(1,Ramesh,32,Ahmedabad,2000,,,,)
(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)
(5,Hardik,27,Bhopal,8500,,,,)
(6,Komal,22,MP,4500,,,,)
(7,Muffy,24,Indore,10000,,,,)
Right Outer Join
The right outer join
operation returns all rows from the right table, even if there are no matches
in the left table.
Syntax
Given below is the syntax of performing right outer join operation using the JOIN operator.
Relation3_name = JOIN Relation1_name BY id RIGHT OUTER, Relation2_name BY customer_id;
Example
Let us perform a right outer join operation on the two relations customers and orders as shown below.
grunt> outer_right = JOIN customers BY id RIGHT, orders BY customer_id;
Verify the relation outer_right using the DUMP operator as shown below.
grunt> Dump outer_right;
Output
It will produce the following output, displaying the contents of the
relation outer_right.
(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)
Full Outer Join
The full outer join operation returns rows when there is a match in either of the relations.
Syntax
Given below is the syntax of performing full outer join using the JOIN operator.
Relation3_name = JOIN Relation1_name BY id FULL OUTER, Relation2_name BY customer_id;
Example
Let us perform a full outer join operation on the two relations customers and orders as shown below.
grunt> outer_full = JOIN customers BY id FULL OUTER, orders BY customer_id;
Output
Verify the relation outer_full
using the DUMP operator as shown
below.
grunt> Dump outer_full;
It will produce the following output, displaying the contents of the
relation outer_full.
(1,Ramesh,32,Ahmedabad,2000,,,,)
(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)
(5,Hardik,27,Bhopal,8500,,,,)
(6,Komal,22,MP,4500,,,,)
(7,Muffy,24,Indore,10000,,,,)
Cross Operator
The cross operator computes
the cross-product of two or more relations. This chapter explains with example
how to use the cross operator in Pig Latin.
Syntax
Given below is the syntax of the Cross operator.
Relation3_name = CROSS Relation1_name, Relation2_name;
Example
Assume that we have two files namely customers.txt and orders.txt in the /pig_data/ directory of HDFS as shown below.
customers.txt
1,Ramesh,32,Ahmedabad,2000.00
2,Khilan,25,Delhi,1500.00
3,kaushik,23,Kota,2000.00
4,Chaitali,25,Mumbai,6500.00
5,Hardik,27,Bhopal,8500.00
6,Komal,22,MP,4500.00
7,Muffy,24,Indore,10000.00
orders.txt
102,2009-10-08 00:00:00,3,3000
100,2009-10-08 00:00:00,3,1500
101,2009-11-20 00:00:00,2,1560
103,2008-05-20 00:00:00,4,2060
And we have loaded these two files into Pig with the relation names customers and orders as shown below.
customers = LOAD 'pig_data/customers.txt' USING PigStorage(',') as (id:int, name:chararray, age:int, address:chararray, salary:int);
orders = LOAD 'pig_data/orders.txt' USING PigStorage(',') as (oid:int, date:chararray, customer_id:int, amount:int);
Let us now get the cross-product of these two relations using the CROSS operator, as shown below.
cross_data = CROSS customers, orders;
Output
It will produce the following output, displaying the contents of the
relation cross_data.
(7,Muffy,24,Indore,10000,103,2008-05-20 00:00:00,4,2060)
(7,Muffy,24,Indore,10000,101,2009-11-20 00:00:00,2,1560)
(7,Muffy,24,Indore,10000,100,2009-10-08 00:00:00,3,1500)
(7,Muffy,24,Indore,10000,102,2009-10-08 00:00:00,3,3000)
(6,Komal,22,MP,4500,103,2008-05-20 00:00:00,4,2060)
(6,Komal,22,MP,4500,101,2009-11-20 00:00:00,2,1560)
(6,Komal,22,MP,4500,100,2009-10-08 00:00:00,3,1500)
(6,Komal,22,MP,4500,102,2009-10-08 00:00:00,3,3000)
(5,Hardik,27,Bhopal,8500,103,2008-05-20 00:00:00,4,2060)
(5,Hardik,27,Bhopal,8500,101,2009-11-20 00:00:00,2,1560)
(5,Hardik,27,Bhopal,8500,100,2009-10-08 00:00:00,3,1500)
(5,Hardik,27,Bhopal,8500,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)
(4,Chaitali,25,Mumbai,6500,101,2009-11-20 00:00:00,2,1560)
(4,Chaitali,25,Mumbai,6500,100,2009-10-08 00:00:00,3,1500)
(4,Chaitali,25,Mumbai,6500,102,2009-10-08 00:00:00,3,3000)
(3,kaushik,23,Kota,2000,103,2008-05-20 00:00:00,4,2060)
(3,kaushik,23,Kota,2000,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(2,Khilan,25,Delhi,1500,103,2008-05-20 00:00:00,4,2060)
(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(2,Khilan,25,Delhi,1500,100,2009-10-08 00:00:00,3,1500)
(2,Khilan,25,Delhi,1500,102,2009-10-08 00:00:00,3,3000)
(1,Ramesh,32,Ahmedabad,2000,103,2008-05-20 00:00:00,4,2060)
(1,Ramesh,32,Ahmedabad,2000,101,2009-11-20 00:00:00,2,1560)
(1,Ramesh,32,Ahmedabad,2000,100,2009-10-08 00:00:00,3,1500)
(1,Ramesh,32,Ahmedabad,2000,102,2009-10-08 00:00:00,3,3000)
Union Operator
The UNION operator of Pig Latin is used to merge the content of two relations. To perform a UNION operation on two relations, their columns and domains must be identical.
Syntax
Given below is the syntax of the UNION operator.
grunt> Relation_name3 = UNION Relation_name1, Relation_name2;
Example
Assume that we have two files namely student_data1.txt and student_data2.txt
in the /pig_data/ directory of
HDFS as shown below.
student_data1.txt
001,Rajiv,Reddy,9848022337,Hyderabad
002,siddarth,Battacharya,9848022338,Kolkata
003,Rajesh,Khanna,9848022339,Delhi
004,Preethi,Agarwal,9848022330,Pune
005,Trupthi,Mohanthy,9848022336,Bhuwaneshwar
006,Archana,Mishra,9848022335,Chennai
student_data2.txt
7,Komal,Nayak,9848022334,trivendram
8,Bharathi,Nambiayar,9848022333,Chennai
And we have loaded these two files into Pig with the relation names student1 and student2 as shown below.
student1 = LOAD 'hdfs://localhost:9000/pig_data/student_data1.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
student2 = LOAD 'hdfs://localhost:9000/pig_data/student_data2.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
Let us now merge the contents of these two relations using the UNION
operator as shown below.
student = UNION student1, student2;
Output
Verify the relation student
using the DUMP operator as shown
below.
Dump student;
It will display the following output, displaying the contents of the
relation student.
(1,Rajiv,Reddy,9848022337,Hyderabad)
(2,siddarth,Battacharya,9848022338,Kolkata)
(3,Rajesh,Khanna,9848022339,Delhi)
(4,Preethi,Agarwal,9848022330,Pune)
(5,Trupthi,Mohanthy,9848022336,Bhuwaneshwar)
(6,Archana,Mishra,9848022335,Chennai)
(7,Komal,Nayak,9848022334,trivendram)
(8,Bharathi,Nambiayar,9848022333,Chennai)
Split Operator
The Split operator is used to split a relation into two or more
relations.
Syntax
Given below is the syntax of the SPLIT
operator.
grunt> SPLIT Relation1_name INTO Relation2_name IF (condition1), Relation3_name IF (condition2);
Example
Assume that we have a file named student_details.txt
in the HDFS directory /pig_data/ as
shown below.
student_details.txt
001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai
And we have loaded this file into Pig with the relation name student_details as shown below.
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
Let us now split the relation into two: one listing the students of age less than 23, and the other listing the students having age between 22 and 25.
SPLIT student_details into student_details1 if age<23, student_details2 if (22<age and age<25);
Output
Verify the relations student_details1
and student_details2 using the DUMP operator as shown below.
Dump student_details1;
Dump student_details2;
It will produce the following output, displaying the contents of the
relations student_details1 and student_details2 respectively.
Dump student_details1;
(1,Rajiv,Reddy,21,9848022337,Hyderabad)
(2,siddarth,Battacharya,22,9848022338,Kolkata)
(3,Rajesh,Khanna,22,9848022339,Delhi)
(4,Preethi,Agarwal,21,9848022330,Pune)
Dump student_details2;
(5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar)
(6,Archana,Mishra,23,9848022335,Chennai)
(7,Komal,Nayak,24,9848022334,trivendram)
(8,Bharathi,Nambiayar,24,9848022333,Chennai)
Filter Operator
The filter operator is used
to select the required tuples from a relation based on a condition.
Syntax
Given below is the syntax of the FILTER
operator.
grunt> Relation2_name = FILTER Relation1_name BY (condition);
Example
Assume that we have a file named student_details.txt in the HDFS directory /pig_data/ as shown below.
student_details.txt
001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai
And we have loaded this file into Pig with the relation name student_details as shown below.
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
Let us now use the FILTER operator to get the details of the students who belong to the city Chennai.
filter_data = FILTER student_details BY city == 'Chennai';
Output
Verify the relation filter_data
using the DUMP operator as shown
below.
Dump filter_data;
It will produce the following output, displaying the contents of the relation filter_data.
(6,Archana,Mishra,23,9848022335,Chennai)
(8,Bharathi,Nambiayar,24,9848022333,Chennai)
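Filter conditions can be combined with AND, OR, and NOT. A minimal sketch on the same relation, selecting students from Chennai who are older than 23:
grunt> filter_data2 = FILTER student_details BY (city == 'Chennai') AND (age > 23);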
Distinct operator
The Distinct operator is used to remove redundant (duplicate) tuples
from a relation.
Syntax
Given below is the syntax of the DISTINCT
operator.
grunt> Relation_name2 = DISTINCT Relation_name1;
Example
Assume that we have a file named student_details.txt in the HDFS directory /pig_data/ as shown below.
student_details.txt
001,Rajiv,Reddy,9848022337,Hyderabad
002,siddarth,Battacharya,9848022338,Kolkata
002,siddarth,Battacharya,9848022338,Kolkata
003,Rajesh,Khanna,9848022339,Delhi
003,Rajesh,Khanna,9848022339,Delhi
004,Preethi,Agarwal,9848022330,Pune
005,Trupthi,Mohanthy,9848022336,Bhuwaneshwar
006,Archana,Mishra,9848022335,Chennai
006,Archana,Mishra,9848022335,Chennai
And we have loaded this file into Pig with the relation name student_details as shown below.
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
Let us now remove the redundant (duplicate) tuples from the relation named student_details using the DISTINCT operator, and store the result as another relation named distinct_data as shown below.
distinct_data = DISTINCT student_details;
Output
Verify the relation distinct_data using the DUMP operator as shown below.
Dump distinct_data;
It will produce the following output, displaying the contents of the relation distinct_data.
(1,Rajiv,Reddy,9848022337,Hyderabad)
(2,siddarth,Battacharya,9848022338,Kolkata)
(3,Rajesh,Khanna,9848022339,Delhi)
(4,Preethi,Agarwal,9848022330,Pune)
(5,Trupthi,Mohanthy,9848022336,Bhuwaneshwar)
(6,Archana,Mishra,9848022335,Chennai)
Foreach operator
The FOREACH operator is
used to generate specified data transformations based on the column data.
Syntax
Given below is the syntax of foreach
operator.
grunt> Relation_name2 = FOREACH Relation_name1 GENERATE (required data);
Example
Assume that we have a file named student_details.txt in the HDFS directory /pig_data/ as shown below.
student_details.txt
001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai
And we have loaded this file into Pig with the relation name student_details as shown below.
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
Let us now get the id, age, and city values of each student from the relation student_details and store them in another relation named foreach_data using the FOREACH operator as shown below.
foreach_data = FOREACH student_details GENERATE id,age,city;
Output
Verify the relation foreach_data using the DUMP operator as shown below.
Dump foreach_data;
(1,21,Hyderabad)
(2,22,Kolkata)
(3,22,Delhi)
(4,21,Pune)
(5,23,Bhuwaneshwar)
(6,23,Chennai)
(7,24,trivendram)
(8,24,Chennai)
Order By Operator
The ORDER BY operator is used to display the contents of a relation in
a sorted order based on one or more fields.
Syntax
Given below is the syntax of the ORDER
BY operator.
grunt> Relation_name2 = ORDER Relation_name1 BY field_name (ASC|DESC);
Example
Assume that we have a file named student_details.txt in the HDFS directory /pig_data/ as shown below.
student_details.txt
001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai
And we have loaded this file into Pig with the relation name student_details as shown below.
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
Let us now sort the relation in descending order based on the age of the students and store the result in another relation named order_by_data using the ORDER BY operator as shown below.
order_by_data = ORDER student_details BY age DESC;
Output
Verify the relation order_by_data
using the DUMP operator as shown
below.
Dump order_by_data;
It will produce the following output, displaying the contents of the relation order_by_data.
(8,Bharathi,Nambiayar,24,9848022333,Chennai)
(7,Komal,Nayak,24,9848022334,trivendram)
(6,Archana,Mishra,23,9848022335,Chennai)
(5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar)
(3,Rajesh,Khanna,22,9848022339,Delhi)
(2,siddarth,Battacharya,22,9848022338,Kolkata)
(4,Preethi,Agarwal,21,9848022330,Pune)
(1,Rajiv,Reddy,21,9848022337,Hyderabad)
Limit Operator
The LIMIT operator is used to get a limited number of tuples from a relation.
Syntax
Given below is the syntax of the LIMIT
operator.
grunt> Result = LIMIT Relation_name required_number_of_tuples;
Example
Assume that we have a file named student_details.txt in the HDFS directory /pig_data/ as shown below.
student_details.txt
001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai
And we have loaded this file into Pig with the relation name student_details as shown below.
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
Now, let's get the first four tuples of the relation student_details and store them in another relation named limit_data using the LIMIT operator as shown below.
limit_data = LIMIT student_details 4;
Output
Verify the relation limit_data
using the DUMP operator as shown
below.
Dump limit_data;
It will produce the following output, displaying the contents of the relation limit_data.
(1,Rajiv,Reddy,21,9848022337,Hyderabad)
(2,siddarth,Battacharya,22,9848022338,Kolkata)
(3,Rajesh,Khanna,22,9848022339,Delhi)
(4,Preethi,Agarwal,21,9848022330,Pune)
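On its own, LIMIT makes no guarantee about which tuples are returned; combining it with ORDER BY gives a deterministic top-N query. A minimal sketch, reusing the order_by_data relation from the previous section to get the three oldest students:
grunt> top3_oldest = LIMIT order_by_data 3;
grunt> Dump top3_oldest;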