
PIG Interview Questions and Answers

[vc_row][vc_column][vc_column_text]This is the first part of our interview series, in which we share Hadoop interview questions and answers.

Pig is the first topic in this series, and in this post we take you through questions commonly asked about Pig in Hadoop interviews.


This list of Pig interview questions and answers is based on input from candidates who have faced Big Data interviews.[/vc_column_text][/vc_column][/vc_row][vc_row][vc_column][vc_message]Note: You can help us by sharing, in the comments, questions you think may be asked or that you have faced. You can also suggest answers, and we will include them in this list.[/vc_message][/vc_column][/vc_row][vc_row][vc_column][vc_column_text]We have kept the answers short and precise so that the key point of each one comes across clearly. So, let’s get started.[/vc_column_text][/vc_column][/vc_row][vc_row][vc_column][vc_custom_heading text=”PIG Interview Questions and Answers” font_container=”tag:h2|text_align:left|color:%230d8491″ google_fonts=”font_family:Cabin%3Aregular%2Citalic%2C500%2C500italic%2C600%2C600italic%2C700%2C700italic|font_style:400%20regular%3A400%3Anormal”][/vc_column][/vc_row][vc_row][vc_column][vc_separator style=”double” border_width=”2″][/vc_column][/vc_row][vc_row][vc_column][vc_column_text]Here is a list of questions asked about Pig in Hadoop interviews. We will keep updating this post, so if you are preparing for an interview, check these Pig questions and answers regularly. You can also subscribe to get notified.[/vc_column_text][/vc_column][/vc_row][vc_row][vc_column][vc_cta h2=”1. 
What is PIG?” h2_font_container=”color:%23dd3333″ h2_google_fonts=”font_family:Arvo%3Aregular%2Citalic%2C700%2C700italic|font_style:400%20regular%3A400%3Anormal” txt_align=”justify” add_icon=”left” i_type=”typicons” i_icon_typicons=”typcn typcn-tick” i_color=”juicy_pink” css_animation=”top-to-bottom” use_custom_fonts_h2=”true” use_custom_fonts_h4=”true” i_on_border=”true”]Apache Pig is a top-level project of the Apache Software Foundation and part of the Hadoop ecosystem. It provides an engine for executing data flows in parallel on Hadoop.

Pig's language, Pig Latin, is used to express these data flows, and Pig runs them on top of MapReduce. Pig was initially developed at Yahoo and was later donated to the Apache Software Foundation.

Because Pig Latin is procedural and can be extended with user-defined functions (UDFs), Pig is well suited to ETL (Extract, Transform, Load) pipelines. Pig is also used for ad-hoc data analysis.
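As a minimal sketch of the ETL use case, the script below extracts comma-separated sales records, transforms them (drops invalid rows, normalizes a field, aggregates per region), and loads the result back to storage. The file paths and field names (region, product, amount) are hypothetical.

```pig
-- Extract: load comma-separated records with an explicit schema (hypothetical path and fields).
raw = LOAD 'input/sales.csv' USING PigStorage(',')
      AS (region:chararray, product:chararray, amount:double);

-- Transform: drop rows without a positive amount and normalize the region name.
valid = FILTER raw BY amount IS NOT NULL AND amount > 0.0;
clean = FOREACH valid GENERATE UPPER(TRIM(region)) AS region, product, amount;

-- Aggregate the cleaned rows per region.
by_region = GROUP clean BY region;
totals    = FOREACH by_region GENERATE group AS region, SUM(clean.amount) AS total;

-- Load: write the result out as comma-separated text (hypothetical output path).
STORE totals INTO 'output/sales_by_region' USING PigStorage(',');
```

Each statement names a new relation, so the extract, transform, and load stages stay readable as separate steps.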

Pig's design follows the Pig philosophy: pigs eat anything (Pig can process data with or without metadata, whether relational, nested, or unstructured), pigs live anywhere (Pig is not tied to one parallel execution framework), pigs are domestic animals (Pig is easily controlled and modified by its users), and pigs fly (Pig processes data quickly).[/vc_cta][/vc_column][/vc_row][vc_row][vc_column][vc_cta h2=”2. When we write a = load …, what is ‘a’ called?” h2_font_container=”color:%23dd3333″ h2_google_fonts=”font_family:Arvo%3Aregular%2Citalic%2C700%2C700italic|font_style:400%20regular%3A400%3Anormal” txt_align=”justify” add_icon=”left” i_type=”typicons” i_icon_typicons=”typcn typcn-tick” i_color=”juicy_pink” use_custom_fonts_h2=”true”]Here ‘a’ is called a relation, and the name ‘a’ is its alias. A relation is a bag of tuples (an outer bag), and later statements in the script refer to it by its alias.[/vc_cta][/vc_column][/vc_row][vc_row][vc_column][vc_cta h2=”3. What is BloomMapFile?” h2_font_container=”color:%23dd3333″ h2_google_fonts=”font_family:Arvo%3Aregular%2Citalic%2C700%2C700italic|font_style:400%20regular%3A400%3Anormal” add_icon=”left” i_type=”typicons” i_icon_typicons=”typcn typcn-tick” i_color=”juicy_pink” use_custom_fonts_h2=”true”]BloomMapFile is a Hadoop class that extends the MapFile class. It is used in the HBase table format to provide a quick membership test for keys using dynamic Bloom filters.[/vc_cta][/vc_column][/vc_row][vc_row][vc_column][vc_cta h2=”4. What are the complex data types in PIG?” h2_font_container=”color:%23dd3333″ h2_google_fonts=”font_family:Arvo%3Aregular%2Citalic%2C700%2C700italic|font_style:400%20regular%3A400%3Anormal” txt_align=”justify” add_icon=”left” i_type=”typicons” i_icon_typicons=”typcn typcn-tick” i_color=”juicy_pink” use_custom_fonts_h2=”true” i_on_border=”true”]Map, Tuple, and Bag are the three complex data types in Pig:

Map: A set of key/value pairs, written as [key#value]. Keys must be of type chararray and must be unique; values can be of any Pig data type.

Tuple: An ordered set of fields, written in parentheses, for example (19,2). A tuple can have multiple fields, and the fields can be of different data types, including maps.

Bag: A collection of tuples, written in curly braces, for example {(19,2),(18,1)}. The tuples in a bag can themselves hold any data, including other tuples and maps.
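All three complex types can also be declared in a LOAD schema. A minimal sketch, assuming a hypothetical tab-separated file (the default PigStorage delimiter) whose three columns hold a tuple, a bag, and a map in the text notation above; the file path and field names are illustrative only.

```pig
-- Hypothetical input: three tab-separated columns per line, e.g.
--   (Noida,201301)<TAB>{(Sec 15),(Sec 18)}<TAB>[area#Sec 15,PIN#201301]
locations = LOAD 'input/locations.txt'
            AS (t:tuple(city:chararray, pin:chararray),
                b:bag{row:tuple(area:chararray)},
                m:map[]);

-- Print the nested schema of the relation.
DESCRIBE locations;

-- Tuple fields are projected with '.', bags can be aggregated, map values are looked up with '#'.
picked = FOREACH locations GENERATE t.city AS city, COUNT(b) AS n_areas, m#'area' AS area;
DUMP picked;
```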

For example, consider this value: {('Noida','201301'), (['area'#'Sec 15','PIN'#201301])}

Here {} marks a bag, () marks a tuple, and [] marks a map. So the value above is a bag containing two tuples: the first holds two chararray fields, and the second holds a single map.[/vc_cta][/vc_column][/vc_row][vc_row][vc_column][vc_cta h2=”5. What are the differences between PIG and MapReduce?” h2_font_container=”color:%23dd3333″ h2_google_fonts=”font_family:Arvo%3Aregular%2Citalic%2C700%2C700italic|font_style:400%20regular%3A400%3Anormal” txt_align=”justify” add_icon=”left” i_type=”typicons” i_icon_typicons=”typcn typcn-tick” i_color=”juicy_pink” use_custom_fonts_h2=”true”]Below are some of the basic differences between Pig and MapReduce:

Pig | MapReduce
Apache Pig is a data flow language. | MapReduce is a data processing paradigm.
Pig Latin is a high-level language. | MapReduce is low-level and rigid.
Performing a join in Pig is simple. | Performing a join between datasets in MapReduce is quite difficult.
A novice programmer with basic SQL knowledge can work comfortably with Pig. | Working with MapReduce usually requires knowledge of Java.
Pig uses a multi-query approach, which greatly reduces the length of the code. | MapReduce is often cited as needing about 20 times more lines of code for the same task.
No separate compilation step is needed; on execution, Pig compiles the script into one or more MapReduce jobs. | MapReduce programs must be written, compiled, and packaged before they can run.
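To make the join row concrete, here is a minimal sketch with two hypothetical comma-separated files; the paths and field names are assumptions. The equivalent hand-written MapReduce job would need custom mapper, reducer, and driver code.

```pig
-- Hypothetical inputs:
--   students.txt : id,name
--   marks.txt    : id,subject,score
students = LOAD 'input/students.txt' USING PigStorage(',') AS (id:int, name:chararray);
marks    = LOAD 'input/marks.txt' USING PigStorage(',') AS (id:int, subject:chararray, score:int);

-- An inner equi-join on id is a single statement.
joined = JOIN students BY id, marks BY id;

-- After a join, fields are disambiguated with the relation::field prefix.
report = FOREACH joined GENERATE students::name AS name, marks::subject AS subject, marks::score AS score;
DUMP report;
```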
[/vc_cta][/vc_column][/vc_row][vc_row][vc_column][vc_cta h2=”6. What are the differences between PIG and SQL?” h2_font_container=”color:%23dd3333″ h2_google_fonts=”font_family:Arvo%3Aregular%2Citalic%2C700%2C700italic|font_style:400%20regular%3A400%3Anormal” txt_align=”justify” add_icon=”left” i_type=”typicons” i_icon_typicons=”typcn typcn-tick” i_color=”juicy_pink” use_custom_fonts_h2=”true”]Here are some of the major differences between Pig and SQL:

Pig | SQL
Pig Latin is a procedural language. | SQL is a declarative language.
In Pig, the schema is optional: data can be loaded without a schema, and fields are then referenced by position ($0, $1, and so on). | A schema is mandatory in SQL.
The data model in Pig is nested relational. | The data model in SQL is flat relational.
Pig provides limited opportunity for query optimization. | SQL offers more opportunity for query optimization.
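The optional schema can be seen by loading the same file twice, once without a schema and once with one. A minimal sketch; the relative path is an assumption, and the field names follow the student example used in question 14.

```pig
-- Without a schema: fields are untyped (bytearray) and referenced by position.
raw   = LOAD 'input/student_data.txt' USING PigStorage(',');
names = FOREACH raw GENERATE $1, $4;   -- second and fifth fields

-- With a schema: the same fields can be referenced by name and have declared types.
typed  = LOAD 'input/student_data.txt' USING PigStorage(',')
         AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
names2 = FOREACH typed GENERATE firstname, city;
```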
[/vc_cta][/vc_column][/vc_row][vc_row][vc_column][vc_cta h2=”7. What are the differences between PIG and HIVE?” h2_font_container=”color:%23dd3333″ h2_google_fonts=”font_family:Arvo%3Aregular%2Citalic%2C700%2C700italic|font_style:400%20regular%3A400%3Anormal” add_icon=”left” i_type=”typicons” i_icon_typicons=”typcn typcn-tick” i_color=”juicy_pink” use_custom_fonts_h2=”true”]This is a very common interview question. Here are some point-by-point differences between Pig and Hive:

Pig | Hive
Pig uses a language called Pig Latin and was originally created at Yahoo. | Hive uses a language called HiveQL and was originally created at Facebook.
Pig Latin is a data flow language. | HiveQL is a query processing language.
Pig Latin is a procedural language and fits the pipeline paradigm. | HiveQL is a declarative language.
Pig can handle structured, semi-structured, and unstructured data. | Hive is mostly used for structured data.
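The pipeline paradigm is easiest to see in a word count, where each statement builds a new named relation from the previous one. A minimal sketch, assuming a hypothetical plain-text input file; it also shows FLATTEN turning the bag returned by TOKENIZE into one row per word.

```pig
-- Hypothetical input: one line of text per record.
lines  = LOAD 'input/book.txt' AS (line:chararray);

-- TOKENIZE returns a bag of words; FLATTEN emits one tuple per word.
words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- Group identical words and count each group.
groups = GROUP words BY word;
counts = FOREACH groups GENERATE group AS word, COUNT(words) AS n;

-- Most frequent words first.
sorted = ORDER counts BY n DESC;
STORE sorted INTO 'output/word_counts';
```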
[/vc_cta][/vc_column][/vc_row][vc_row][vc_column][vc_cta h2=”8. How does PIG work?” h2_font_container=”color:%23dd3333″ h2_google_fonts=”font_family:Arvo%3Aregular%2Citalic%2C700%2C700italic|font_style:400%20regular%3A400%3Anormal” txt_align=”justify” add_icon=”left” i_type=”typicons” i_icon_typicons=”typcn typcn-tick” i_color=”juicy_pink” use_custom_fonts_h2=”true”]When you run a Pig script in MapReduce mode, Pig parses it into a logical plan, optimizes that plan, translates it into a physical plan, and compiles the result into one or more MapReduce jobs. These jobs run on the Hadoop cluster, reading their input from and writing their output to HDFS.[/vc_cta][/vc_column][/vc_row][vc_row][vc_column][vc_cta h2=”9. What are the different EVAL functions available in PIG?” h2_font_container=”color:%23dd3333″ h2_google_fonts=”font_family:Arvo%3Aregular%2Citalic%2C700%2C700italic|font_style:400%20regular%3A400%3Anormal” txt_align=”justify” add_icon=”left” i_type=”typicons” i_icon_typicons=”typcn typcn-tick” i_color=”juicy_pink” use_custom_fonts_h2=”true”]Below are some of the eval functions available in Pig:

AVG
CONCAT
MAX
MIN
SUM
SIZE
COUNT
COUNT_STAR
DIFF
TOKENIZE
IsEmpty
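Most of these functions operate on bags, so they are typically applied after a GROUP. A minimal sketch, assuming a hypothetical comma-separated file of name, subject, and marks:

```pig
-- Hypothetical input: name,subject,marks
scores  = LOAD 'input/scores.txt' USING PigStorage(',')
          AS (name:chararray, subject:chararray, marks:int);
by_name = GROUP scores BY name;

stats = FOREACH by_name GENERATE
            group              AS name,
            COUNT(scores)      AS n_subjects,  -- skips tuples whose first field is null
            COUNT_STAR(scores) AS n_rows,      -- counts every tuple, nulls included
            SUM(scores.marks)  AS total,
            AVG(scores.marks)  AS average,
            MIN(scores.marks)  AS lowest,
            MAX(scores.marks)  AS highest;
DUMP stats;
```

The COUNT versus COUNT_STAR pair is a common follow-up question: only COUNT_STAR includes tuples with null values.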
[/vc_cta][/vc_column][/vc_row][vc_row][vc_column][vc_cta h2=”10. What are the different string functions available in PIG?” h2_font_container=”color:%23dd3333″ h2_google_fonts=”font_family:Arvo%3Aregular%2Citalic%2C700%2C700italic|font_style:400%20regular%3A400%3Anormal” add_icon=”left” i_type=”typicons” i_icon_typicons=”typcn typcn-tick” i_color=”juicy_pink” use_custom_fonts_h2=”true”]Below are some of the important string functions available in Pig:

UPPER
LOWER
TRIM
SUBSTRING
INDEXOF
STRSPLIT
LAST_INDEX_OF
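A minimal sketch applying several of these functions to the student schema from question 14; the relative file path is an assumption.

```pig
student = LOAD 'input/student_data.txt' USING PigStorage(',')
          AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

formatted = FOREACH student GENERATE
    UPPER(firstname)           AS first_upper,   -- 'siddarth' becomes 'SIDDARTH'
    LOWER(city)                AS city_lower,
    TRIM(lastname)             AS last_trimmed,  -- removes leading and trailing whitespace
    SUBSTRING(phone, 0, 3)     AS phone_prefix,  -- stop index is exclusive: characters 0 to 2
    INDEXOF(firstname, 'a', 0) AS first_a;       -- -1 if the character is not found
DUMP formatted;
```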
[/vc_cta][/vc_column][/vc_row][vc_row][vc_column][vc_cta h2=”11. What is the use of foreach operation in Pig scripts?” h2_font_container=”color:%23dd3333″ h2_google_fonts=”font_family:Arvo%3Aregular%2Citalic%2C700%2C700italic|font_style:400%20regular%3A400%3Anormal” txt_align=”justify” add_icon=”left” i_type=”typicons” i_icon_typicons=”typcn typcn-tick-outline” i_color=”juicy_pink” use_custom_fonts_h2=”true”]FOREACH ... GENERATE applies a transformation to each tuple in a relation and generates a new relation from the results.

For example, the following statements load three integer fields and add 5 to the first field of every tuple (field names are case-sensitive, so f1 must match the schema exactly):

A = LOAD 'data' AS (f1:int, f2:int, f3:int);

B = FOREACH A GENERATE f1 + 5;[/vc_cta][/vc_column][/vc_row][vc_row][vc_column][vc_cta h2=”12. What is FLATTEN and what does it do in PIG?” h2_font_container=”color:%23dd3333″ h2_google_fonts=”font_family:Arvo%3Aregular%2Citalic%2C700%2C700italic|font_style:400%20regular%3A400%3Anormal” txt_align=”justify” add_icon=”left” i_type=”typicons” i_icon_typicons=”typcn typcn-tick” i_color=”juicy_pink” use_custom_fonts_h2=”true”]When data is nested inside bags or tuples and we want to remove a level of nesting, FLATTEN is used. Syntactically it looks like a UDF, but it is an operator that changes the structure of its input in a way a UDF cannot: flattening a tuple replaces it with its fields, and flattening a bag produces one output tuple for each tuple in the bag.[/vc_cta][/vc_column][/vc_row][vc_row][vc_column][vc_cta h2=”13. What are the different modes in which PIG can run? Explain them.” h2_font_container=”color:%23dd3333″ h2_google_fonts=”font_family:Arvo%3Aregular%2Citalic%2C700%2C700italic|font_style:400%20regular%3A400%3Anormal” txt_align=”justify” add_icon=”left” i_type=”typicons” i_icon_typicons=”typcn typcn-tick” i_color=”juicy_pink” use_custom_fonts_h2=”true”]Pig can run in two main modes:

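Local Mode: Pig runs in a single JVM against the local file system (started with pig -x local). It does not need a Hadoop cluster or HDFS.
MapReduce Mode: Pig runs on a Hadoop cluster (the default, or started with pig -x mapreduce). Hadoop must be running, and input and output data are read from and written to HDFS.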
[/vc_cta][/vc_column][/vc_row][vc_row][vc_column][vc_cta h2=”14. What are the debugging tools used for Apache Pig scripts?” h2_font_container=”color:%23dd3333″ h2_google_fonts=”font_family:Arvo%3Aregular%2Citalic%2C700%2C700italic|font_style:400%20regular%3A400%3Anormal” add_icon=”left” i_type=”typicons” i_icon_typicons=”typcn typcn-tick” i_color=”juicy_pink” use_custom_fonts_h2=”true” i_on_border=”true”]There are three main operators for debugging a Pig script:

DESCRIBE: Displays the schema of a relation.
For example, first load the student data:

grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt' USING PigStorage(',') AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

Now run describe on student:

grunt> describe student;

You will get the following output:

student: {id: int,firstname: chararray,lastname: chararray,phone: chararray,city: chararray}

EXPLAIN: Displays the logical, physical, and MapReduce execution plans for a relation.
ILLUSTRATE: Shows the step-by-step execution of a sequence of statements on a small sample of the data.
For example, running illustrate on student produces output like the following:

grunt> illustrate student;

INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: student[1,10] C: R:
--------------------------------------------------------------------------------------------------
| student | id:int | firstname:chararray | lastname:chararray | phone:chararray | city:chararray |
--------------------------------------------------------------------------------------------------
|         | 002    | siddarth            | Battacharya        | 9848022338      | Kolkata        |
--------------------------------------------------------------------------------------------------
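EXPLAIN is used the same way. A minimal sketch that loads the student file from a hypothetical local path (the example above reads it from HDFS) and adds an illustrative filter so the plan contains more than a single LOAD:

```pig
student = LOAD 'pig_data/student_data.txt' USING PigStorage(',')
          AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- Illustrative filter so that the plans show a FILTER step after the LOAD.
kolkata = FILTER student BY city == 'Kolkata';

-- Print the logical, physical, and MapReduce plans for kolkata.
EXPLAIN kolkata;
```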

[/vc_cta][/vc_column][/vc_row][vc_row][vc_column][vc_cta h2=”15. What are the different execution modes available in PIG?” h2_font_container=”color:%23dd3333″ h2_google_fonts=”font_family:Arvo%3Aregular%2Citalic%2C700%2C700italic|font_style:400%20regular%3A400%3Anormal” txt_align=”justify” add_icon=”left” i_type=”typicons” i_icon_typicons=”typcn typcn-tick” i_color=”juicy_pink” use_custom_fonts_h2=”true”]Below are the three execution modes available in Pig:

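Interactive Mode (also known as Grunt mode)
Note: Pig's interactive shell is called the Grunt shell. Users type Pig Latin statements at the grunt> prompt, and the shell can also run file system commands against HDFS.

Batch Mode: Pig Latin statements are saved in a script file (conventionally with a .pig extension) and run with the pig command.
Embedded Mode: Pig Latin statements are run from a program written in a host language, for example from Java through the PigServer class.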
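For batch mode, the statements simply live in a script file. A minimal sketch of such a script; the file name student_report.pig and the input and output paths are hypothetical, and the comments show how the run mode is chosen with the -x flag.

```pig
-- student_report.pig (hypothetical file name)
-- Run locally:        pig -x local student_report.pig
-- Run on a cluster:   pig -x mapreduce student_report.pig
student  = LOAD 'input/student_data.txt' USING PigStorage(',')
           AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- Count students per city.
by_city  = GROUP student BY city;
city_cnt = FOREACH by_city GENERATE group AS city, COUNT(student) AS n;

STORE city_cnt INTO 'output/students_per_city' USING PigStorage(',');
```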
[/vc_cta][/vc_column][/vc_row]
