Example HiveRC File to Configure Hive Preferences

Anyone who regularly works with query languages invariably develops personal preferences. For T-SQL, it may be a preference for which metadata is included with the results of a query, how quoted identifiers are handled, or what the default behavior should be for transactions. These types of settings can typically be configured at the session level, and Hive is no exception. In fact, Hive provides users with an impressive number of configurable session properties. Honestly, you’ll probably never need to change the majority of these settings, and if/when you do, it’ll most likely apply to a specific Hive script (e.g., to improve performance). However, there are a handful of Hive settings that you may wish to always enable if they’re not already defaulted server-wide, such as displaying column headers. One option is to set these manually at the start of each session using the SET command. But this can quickly get tedious if you have more than one or two settings to change. A better option in those scenarios, and the topic of this blog post, is to use a HiveRC file to configure your personal preferences for Hive’s default behavior.
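For example, to turn on column headers for just the current session, you can run the relevant SET command at the Hive prompt (hive.cli.print.header is the property that controls this):

```sql
-- show column headers in query results for the current session only
SET hive.cli.print.header=true;
```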

For those of you not familiar with the concept, Linux commonly uses RC files — which I believe stands for “runtime configuration,” but don’t quote me on that 🙂 — for defining preferences, and various applications support these, typically in the format of .<app>rc. These will usually live in a user’s home directory, and some examples include .bashrc, .pythonrc, and .hiverc.

Now that we have a little context, let’s walk through how to create your personal .hiverc file. Note that all of these steps take place on the same server you use for connecting to Hive.
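The first step is simply opening a new .hiverc file in your home directory with your editor of choice; I'm using Vim here:

```bash
# create (or open) the HiveRC file in your home directory on the Hive client box
vim ~/.hiverc
```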

 

Now, from inside Vim, do the following:
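The exact contents are a matter of personal preference, but as a minimal sketch: press i to enter insert mode, add a few settings like the ones below, then press Esc and type :wq to save and quit.

```sql
-- example .hiverc contents: display column headers and the current database
set hive.cli.print.header=true;
set hive.cli.print.current.db=true;
```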

You should be back at your bash prompt. Now run these commands to verify everything is working as expected.
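For example, something like the following confirms the file was saved and that a new Hive session picks it up:

```bash
# confirm the file contents, then start a new Hive CLI session
cat ~/.hiverc
hive
# once inside the CLI, the prompt should now include the current database,
# e.g. "hive (default)>", and query results should include column headers
```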

That’s all there is to it! Not too hard, huh? But to make things even easier, I’ve posted an example of my personal HiveRC file on my Hadoopsie GitHub repo.


That’s all for now, you awesome nerds. 🙂


My Mac Setup for Hadoop Development

I’ve recently decided to switch to a Mac. Having been such a proponent of all-things-Microsoft in the past, and having invested so much time in my dev skills on a PC, this was a pretty huge move for me. In fact, it took me a very long time to make the decision. But the more time I spent trying to figure out how to do Hadoop dev better and faster, the clearer it became that switching to a Mac would help. After only a few weeks, I’ve already found that many of the things that were very painful on a PC are exceedingly easy on a Mac, such as installing Hadoop locally.

Now, this post isn’t meant to convince you to switch to a Mac. A person’s OS preference is very personal, and as such, the discussion can get almost as heated as religion or politics. 🙂 However, for those who are already considering switching to a Mac from a PC, I thought it’d be helpful to outline some of the applications I’ve installed that have improved my Hadoop dev experience.

Applications

| App | What is it? | Why do I use it? | Where to get it? |
| --- | --- | --- | --- |
| Homebrew | Homebrew installs the stuff you need that Apple didn’t. | Makes local app installs super easy. I used this to install Maven, MySQL, Python, Hadoop, Pig, Spark, & much more (see the example below). | http://brew.sh/ |
| iTerm2 | iTerm2 is a replacement for Terminal and the successor to iTerm. | For connecting to Hadoop via SSH. It provides some nice features, such as tabs and status colors, which make it easier to keep track of numerous simultaneous activities in Hadoop. | https://www.iterm2.com/ |
| IntelliJ IDEA | The Community Edition is an excellent free IDE. | For development of Pig, Hive, Spark, & Python scripts. | https://www.jetbrains.com/idea/ |
| 0xDBE (EAP) | New intelligent IDE for DBAs and SQL developers. | For SQL Server & MySQL development. (And yes, I *do* miss SSMS, but I don’t want to have to run a VM to use it.) | https://www.jetbrains.com/dbe/ |
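To give a sense of the Homebrew workflow, installing the tools mentioned above is a one-liner (the formula names here reflect what I used and may have changed since):

```bash
# install the Hadoop dev tooling mentioned above via Homebrew
brew install maven mysql python hadoop pig apache-spark
```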

My config

IntelliJ Plugins

  • Apache Pig Language Plugin
  • Python Community Edition
  • etc

iTerm2

Bash Profile

I also added the following code to my local & Hadoop .bashrc profiles. This changes the title of a Bash window. This isn’t specific to iTerm2, and I could have done this on my PC if I had known about it at the time. So if you are using either Terminal or a PC SSH client (e.g., PuTTY or PowerShell), you may still be able to take advantage of this if your client displays window titles.
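A minimal version of that code might look like this (the function name is just a placeholder; the escape sequence is the standard xterm window-title sequence, which iTerm2 honors):

```bash
# set-title: change the terminal window/tab title to the supplied string
# (works in most xterm-compatible terminals, including iTerm2)
function set-title() {
  echo -ne "\033]0;${1}\007"
}
```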

 

 

 

This is an example of how you would call the code at the start of any new Bash session:
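```bash
# e.g. in ~/.bashrc on the Hadoop box (the title text is whatever you like)
set-title "prod-hadoop"
```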


My Dev Process
I have 3 monitors set up, which are typically configured as:

Monitor 1

  • email
  • calendar
  • web browser (Slack, JIRA, etc.)

Monitor 2

  • IntelliJ

Monitor 3

  • iTerm2, with tabs already open for
    • Pig
    • Hive
    • Hadoop Bash (HDFS, log files, etc.)
    • misc (Python, Spark, etc.)
    • local Bash (GitHub commits, etc.)

In general, I write the code in IntelliJ and copy/paste it into iTerm2. This provides nice syntax highlighting and makes it easy to check my code into GitHub when I’m done. Once I’m past the initial dev phase, I SCP the actual scripts over to the prod Hadoop shell box for scheduling. Overall, I’ve found that this approach makes iterative dev much faster.
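For reference, that last hop is just an scp; the hostname and paths below are placeholders:

```bash
# copy a finished script to the prod Hadoop shell box for scheduling
scp my_job.hql me@prod-hadoop-box:/home/me/scripts/
```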

Those are pretty much the highlights, though I’ll continue to add to this as I stumble across tweaks, hacks, and apps that make my life easier.

For those of you just starting out on a Mac, I hope this post helps you get up and running with Hadoop dev. For those who have already made the switch (or who have always used a Mac): did I miss something? Is there a killer app that you love for Hadoop dev? If so, please let me know! 🙂
