Strategic Insights and Clickworthy Content Development

Tag: Software development

Will Containers Replace VMs?

While an across-the-board migration from virtual machines to containers isn’t likely, there are issues developers and operations personnel should consider to ensure the best solution for the enterprise.

Chances are, your company uses virtual machines on premises and in the cloud. Meanwhile, the developers are likely considering containers, if they haven’t adopted them already, to simplify development and deployment, improve application scalability and increase application delivery speed.

Architecturally speaking, VMs and containers have enough architectural similarities that some question the long-term survival of VMs, especially since VMs are already a couple of decades old. There are also serverless cloud options available now that dynamically manage the allocation of machine resources so humans don’t have to do it.

“If you embrace [a serverless] service, then you truly don’t have to worry about the virtual machines or the IaaS instances under the hood. You just turn your containers over to the service and they take care of all the provisioning underneath to run those containers,” said Tony Iams, research director at Gartner. “We’re far away from being able to do that on-premises, but if you’re doing this in the cloud, you no longer worry about [virtual machine] instances in the cloud.”

Most enterprises have been using VMs for quite some time and that will continue to be true for the foreseeable future.

“Most of the container infrastructure deployments are going to be on virtual machines,” said Iams. “If you look at where container runtimes are deployed as well as container orchestration systems such as Kubernetes, more often than not, it’s on virtual infrastructure and there’s a very important reason for that. In most cases, especially in today’s enterprise environments, the basic provisioning processes for infrastructure are going to be based on virtual machines, so even if you wanted to switch to provisioning on bare metal, it’s quite possible that you wouldn’t have any processes in place to do that.”

As organizations graduate to more sophisticated container deployments using orchestration platforms like Kubernetes, they can face significant provisioning challenges.

“Kubernetes has to be upgraded pretty frequently given the rapid pace of development, so you need to have solid provisioning processes in place. Most likely that’s going to be based on virtual machines,” said Iams. “Our guidance is not to make any changes there, to continue using the tools and processes you have in place, which is likely based on virtual machines, and focus on achieving the benefits of Kubernetes that are achieved higher up in the stack.”

Mitch Pirtle, principal at Space Monkey Labs and the creator of Joomla, an open source content management system, agrees that VMs will continue to provide value, although what’s considered a VM will likely change over time.

“The main benefit of the VM is that you can deploy that entire stack fully configured and ready-to-roll. My biggest issue with VMs is that if you want to deploy a stack that has unique scaling needs, you’re going to be in a world of hurt,” said Pirtle. “Containers, on the other hand, have a huge upside, especially in the enterprise space. You can scale different parts of your platform as needed, independent of each other.”

Container interest is fueled by developers

Developers are under constant pressure to deliver higher quality software at lower costs in ever-faster time frames. Containers enable applications to run in different environments, and even across environments, without a lot of extra work. That way, developers can write an application once and make minor modifications to it rather than writing the same application time and again. In addition, containers help facilitate DevOps efforts.

“One of the most important benefits containers provide is that once you have a containerized application, it runs in exactly the same environment at every stage of the lifecycle, from initial development through testing and deployment, so you get mobility of a workload at every stage of its lifecycle,” said Iams. “In the past, you would develop an application and turn it over to production. Any environment they would be running it in would run into problems, so they’d kick it back to developers and you’d have to try to recreate the environment that it was running in. A lot of those issues go away once you containerize a workload.”

Mike Nims, Director Advisory, Workday at KPMG said containers have greatly simplified his job because they abstract so much complexity.

“When I was a DBA, everybody knew I was on the Homer Simpson server, my instance lived on Homer Simpson, and my instance was Bart. In a container situation, I don’t even know where I sit,” said Nims. “Taking that technical view away is extremely beneficial [because] it also allows the developer to focus more on the code, integration or UI they’re working on as opposed to the hardware or the server.”

Containers aren’t a panacea

The use case tends to define whether VMs need to be used in conjunction with containers, or not. Businesses operating in highly regulated environments and others which place a high premium on security want multitenancy capabilities so virtualized workloads can run in isolation.

“With containers, you don’t quite get the same level of isolation,” said Iams. “However, you do get some isolation that may be sufficient for internal use cases.”

One concern is whether workloads running across containers are operating in the same trust domain. If not, extra steps must be taken to ensure isolation.

“That’s what’s underlying some of these new [sandboxed container runtime] initiatives you hear about, such as gVisor. [D]evelopers are trying to come up with new virtualization mechanisms that give you the same kind of isolation you would have between virtual machines and the efficiency and low resource consumption of containers,” said Iams. “That’s early-stage, for the time being. If you want to have that kind of isolation, you need virtual machines.”

However, containers are helping to eliminate application-level friction for end users and IT.

“My developers and I don’t ever think about this. A cloud container is really extending that user experience into an area of IT that for the longest time has been controlled by technical gearheads,” said KPMG’s Nims. “My wife, who doesn’t work in IT, could go to Amazon Cloud and set up a MySQL environment to do database work if she wanted to. That’s a container.”

Meanwhile, he thinks enterprise architects and DBAs should consider how cloud containers can be used to manage applications at scale.

“As a DBA, one [my] biggest pain points was the ancillary applications I was responsible for managing that weren’t necessarily tied to my database. When I did it, it was five to one or 10 to one databases per DBA. Then, each of these customers would have another 20 to 30 applications. I couldn’t manage that so we would hire SAs,” said Nims. “If I have a MySQL environment in a cloud container, I can patch it across all of my instances. Back in the day, I would have to individually patch each instance. That’s huge in terms of scalability and management because now you can have a control room managing hundreds of thousands of applications, potentially, with a couple of guys.”

Developers and operations have to work together

Operations teams have spent the last two decades configuring and provisioning VMs. Now, they need to get familiar with containers and container orchestration platforms, if they’re not already, given the popularity of containers among developers.

“It will take some time before both sides of this initiative come to terms and arrive at consistent operational processes,” said Iams. “If you’re doing this on-premises, it doesn’t get set up in a vacuum. Operations still has to configure storage and networking, and there may be some authentication and security-type mechanisms that have to be configured to make sure that this great new containerized infrastructure works well. Ops can’t do that in isolation.”

Of course, getting container infrastructure to work well isn’t an event, it’s a process, especially given the furious pace of container-related innovation.

“Getting the pieces to work well together is a challenge,” said Iams. “There are upgrades that are going to happen at multiple levels of the stack that are going to require processes to be in place that reflect the priorities of both groups.”

Bottom line

VMs and containers will continue to coexist for some time, mainly because businesses require the benefits of both. Even if containers could replace VMs for every conceivable use case, a mainstream shift wouldn’t happen overnight because today’s businesses are heavily dependent on and extremely familiar with VMs.

…But the bugs remain

As seen in SD Times.

bugSoftware teams are under pressure to deliver higher-quality software faster, but as high-profile failures and lackluster app ratings indicate, it’s easier said than done. With the tremendous growth of agile development, finding bugs earlier in the development cycle has become an imperative, but not all organizations are succeeding equally well.

“Developers realize they need better tools to investigate problems, but we need to make sure we’re not creating problems in the first place,” said Gil Zilberfeld, product manager of unit testing solution provider Typemock.

Software teams are using all kinds of tools, including bug and defect trackers, SCM tools, testing suites, and ALM suites, and yet software quality has not improved generally, according to William Nichols, a senior member of the technical staff at the Software Engineering Institute.

“The data don’t suggest that the software being produced is any better than it was a decade or 20 years ago, whether you measure it by lines of code or function points and defects,” he said. “We’re seeing one to seven defects per 1,000 lines of code. We’re making the same mistakes, and the same mistakes cause the same problems.”

One problem is focusing too much on the speed of software delivery rather than software quality. Nichols said this is a symptom of unrealistic management expectations. Tieren Zhou, founder and CEO of testing and ALM solution provider TechExcel, considered it a matter of attention: what’s sexy versus what matters.

“Bug fixing is less interesting than building features,” said Zhou. “In the interest of acquiring new customers, you may be losing old customers who are not happy with your products.”

While software failures are often blamed on coding and inadequate testing, there are many other reasons why software quality isn’t what it should be as evidenced by defects that are being injected at various points within the software life cycle.

Bug and defect tracking go agile

Bug and defect tracking is becoming less of a siloed practice as organizations embrace agile practices. Because agile teams are cross-functional and collaborative, tools are evolving to better align with their needs.

“We’re moving from isolation to transparency,” said Paula Rome, senior project manager at Seapine, a provider of testing and ALM solutions. “It makes no sense to have critical decision-making information trapped in systems.”

Since software teams no longer have weeks to dedicate to QA, developers and testers are working closer together than ever. While pair programming and test-driven development practices can help improve the quality of code, not every team is taking advantage of those and other means that can help find and fix defects earlier in the life cycle.

“There’s a need to find problems earlier, more often and faster, but what you’re seeing are .01 releases that fix a patch, or software teams using their customer base as a bug-tracking and bug-finding system,” said Archie Roboostoff, experience director for the Borland portfolio at Micro Focus, a software-quality tool provider.

Atlassian, maker of the JIRA issue and bug tracker, is helping teams get more insight into bugs with its latest 6.2 release. Instead of viewing bugs in “open” and “closed” terms, users can now see how many commits have been made, whether the peer reviews were success or not, and whether the code has been checked into production or not.

“The process of fixing a bug is a multi-stage process,” said Dan Chuparkoff, head of JIRA product marketing at Atlassian. “Developers check things out of the master branch, write some code, submit their code for peer reviews, peers comment on their code, the developers make some adjustments, check it into the master branch, and roll it up into production. Those steps are completely invisible in most bug systems so the stakeholders have trouble seeing whether something’s close to being finished or not.”

uTest (soon to be known as Applause) offers “in the wild” testing, which is a crowdsourced approach to quality assurance that enables organizations to find issues in production before their customers do.

Software teams are using the service to supplement their lab tests, although some, especially those doing three builds a week, are running lab and in-the-wild tests in parallel.

“In an agile world and in a continuous world, you want to make sure things are thoroughly tested and want to accelerate your sprints,” said Matt Johnston, chief strategy officer of uTest. “We help them catch things that were missed in lab testing, and we’re helping them find things they can’t reproduce.”

To keep pace with faster software release velocities, Hamid Shojaee, CEO of Scrum software provider Axosoft, is focusing on usability so individuals and teams can do a better job of resolving defects in less time.

“The custom pieces of information associated with each tracked bug are different for every team,” he said. “Creating custom fields has been a time-consuming and difficult thing to do when you’re customizing a bug-tracking tool. We have an intuitive user interface, so what would have taken you 20 to 30 minutes takes seconds.”

AccuRev is also enabling teams to spend less time using tools and more time problem-solving.

“Defect tracking can be cumbersome,” said Joy Darby, a director of engineering at AccuRev. “By the time software gets to QA, they have to ask questions or reference e-mails or look at a white board. With a central repository, you have instant access to all the artifacts, all the tests that were done, the build results, and any sort of complex code analysis you may have done.”

While more tools are evolving to support continuous integration and deployment, organizational cultures are not moving as quickly.

“While we’re all off iterating, the business is off waterfalling,” said Jeff Dalton, a Standard CMMI Appraisal Method for Process Improvement lead appraiser and CMMI instructor. “Software teams are accelerating their delivery cycles while the rest of the business still views software in terms of phases, releases, large planning efforts, large requirements, and 12-month delivery cycles.”

The disconnect between agile and traditional ways of working can work against software quality when funding is not tied to the outcome of sprints, for example.

Adopting a life cycle view of quality

As software teams become more agile, discrete workflows become collaborative ones that require a life-cycle view of software assets and interoperable tools. Even with the greater level of visibility life-cycle approaches provide, the root cause of problems nevertheless may be overlooked in the interest of finding and fixing specific bugs.

“We’ve inflated processes and tools in order to support something that could have been figured out earlier in the process if we had defined a better spec,” said Typemock’s Zilberfeld. “Instead, we spend five hours talking about something the customer doesn’t care about.”

Most ALM tools are open enough to support other tools whose capabilities equal or surpass what is in the ALM suite. Conversely, narrower tool providers are looking at bugs and defects in a broader sense because customers want to use the tools in a broader context than they have in the past.

“Software is no longer an isolated venture. It really affects all parts of the business,” said Atlassian’s Chuparkoff. “Modern issue trackers have REST APIs that allow you to easily connect your issue tracker to the entire product life cycle. We wanted to make sure JIRA can integrate with your proprietary tool and other tools via REST APIs or plug-ins from our marketplace. We realize people aren’t going to use JIRA in a silo by itself.”

Octo Consulting Group, which provides technology and management consulting services to federal agencies, is one of many organizations that are using JIRA in tandem with ALM solutions.

“Bug and defect tracking is part of ALM,” said Octo Consulting Group CTO Ashok Nare. “While we use JIRA, and there are a lot of good ALM products like Rally, VersionOne and CollabNet…the tools are really there to facilitate a process.”

Despite the broader life-cycle views, software quality efforts often focus on development and testing even though many defects are caused by ill-defined requirements or user stories.

“Philosophically, we didn’t use to think about bugs and defects in terms of requirements problems or customer problems or management problems, so we focused on code,” said CMMI’s Dalton. “But what we found was the code did what it was supposed to do, but didn’t do what the customer wanted it to do. It’s important to understand where the defect is injected into the process, because if we know that, we can change the process to fix it.”

Dalton prefers the process model approach, which includes prototypes, mockups and wireframes as part of requirements and the design process because they solve problems caused in the early stages when they’re the least costly to fix.

“Every time there’s an assumption or something fuzzy in the requirements it leads to defects,” said Adam Sandman, director at Inflectra (a maker of test-management software). “If you can’t define it, you can’t build it well.”

Inflectra, TechExcel, Seapine and the other ALM solution providers tie requirements, development, testing and other life-cycle stages together so that, among other things, defects can be identified, fixed and prevented from coming back in future iterations or releases.

“We’re connecting the dots, making it possible to have transparency between the silos so you get the data you need when you need it,” said Seapine’s Rome.

In addition to providing solutions, TechExcel is trying to help software teams deliver better products by promoting the concept of “QA floaters” who, as part of an agile team, help developers define test cases and run test cases in parallel with developers.

“When developers and QA floaters are both testing, you have a built-in process that helps you find and fix bugs earlier so the developer can satisfy a requirement or story,” said TechExcel’s Zhou. “When you tie in total traceability, you tie requirements, development and testing together in a way that improves productivity and software quality.”

Who owns software quality?
Software quality has become everyone’s job, but not everyone sees it that way, which is one reason why defects continue to fall through organizational cracks.

“When you separate the accountability and resources, that’s where disaster always starts,” said Andreas Kuehlmann, SVP of research and development at testing solution provider Coverity. “A lot of teams have gotten to the point where the developers are doing a little bit of testing, but the rest is tossed over the fence to QA who can’t even start the executable.”

Coverity offers three products that move testing into development: Quality Advisor, which uses deep semantic and static analysis to identify bugs in code when the code is compiling; Security Advisor, which uses the same technology to find security vulnerabilities; and Test Advisor which identifies the most risky code.

“Moving testing into development requires a lot to be done from a workflow perspective,” said Kuehlmann. “You have to have tests running 24×7, you have to have the tools and infrastructure in place, and you have to change developers’ mindsets. That’s really hard. The role of QA is evolving into more like a sign-off check.”

The dynamics between coders and testers is changing, but not in a uniform way. A minority of organizations are collapsing coding and testing into a single function, although the majority is leveraging the skill sets of both developers and QA with the goal of optimizing delivery speed and quality.

“Developers are really good at solving problems, and test engineers are good at finding vulnerabilities,” said Atlassian’s Chuparkoff. “If a developer can run an automated test after he finishes his code, he can fix the bug immediately while he’s in the thinking mode of fixing it. It’s a lot more efficient than fixing it four days later after someone gave you the issue.”

Annotated screen shots help speed up issue resolution, which is why Axosoft, Atlassian and Seapine have added the capability to their tools.

“You have to make sure people are taking the time to put the proper reproduction steps in to make sure those bugs are fixed,” said Axosoft’s Shojaee.

Not everyone on the team may be responsible for fixing defects, but many have the potential to inject them. Because software is increasingly the face of businesses, organizations are starting to realize that software quality isn’t a technical problem; it’s a business problem. For example, uTest’s Johnston recently met with the CIO of a major media company who considers software quality the CEO’s responsibility since a major portion of the company’s revenue is driven by digital experiences.

“If that sentiment can win the day, a lot more companies will be successful in the app economy,” said Johnston.

The complexity paradox
On one hand, the software landscape is becoming more complex, and at the same time, tools and approaches to software development are becoming more abstract, all of which can make finding and fixing defects more difficult.

“It’s not about Windows and Linux anymore,” said Inflectra’s Sandman. “Now you have all these mobile devices and frameworks, and you’re seeing constant updates to browsers. If you’re building systems with frameworks and jQuery plug-ins and something goes wrong, do you fix it, ask the vendor to fix it, or ask the open-source community to fix it? Inevitably the bugs may not be in your application but in the infrastructure you’re relying on.”

Micro Focus’ Roboostoff agreed. “If users see quality as something that works, and your product doesn’t work, then it’s hugely defective,” he said.

“When I had my Web server, application server and database server sitting in my office, I could rest assured a problem was somewhere in the closet and I’d find it eventually. Now, I might have some REST services sitting in Amazon, some service-based message in Azure, six CDNs around the world, and A/B testing for optimizing linking going on, and then on Monday morning half of my customers say something is slow.”

Because there is so much complexity and because the landscape is changing so fast at so many levels, edge-case testing is becoming more important.

“When you consider there are about 160,000 combinations of devices, browsers and platforms you have to test for, most customers aren’t coming close to where they should be,” said Roboostoff. “Since it isn’t practical, you pick the biggest screen and the smallest screen, the newest devices and the oldest devices to lower that risk profile.”

The fragmentation that is continuing to occur at so many levels can cause errors that are difficult to identify and rectify.

“One brand may have four to 10 different codebases, four to 10 product road maps, varying skill sets to accomplish all that, and a multitude of platforms and devices they are building software for that they have to test against,” said Johnston. “Meanwhile, users expect things to operate like a light switch.”

The U.S. government established a standardized approach to security assessment called the Federal Risk and Authorization Management Program (FedRAMP), which is apparently benefitting some software developers and consultants who need to be responsible for their software quality but are not in control of the cloud infrastructure. Octo Consulting Group’s Nare said that FedRAMP’s certification simplifies the testing he would otherwise have to do.

“As the level of abstraction goes up, if you’re only testing the top layer, you have to assume that the lower layers underneath like the infrastructure in the cloud and the PaaS are fundamentally sound so that everything is working the way it’s supposed to,” he said. “When we do security testing today and we test our applications, we don’t certify the whole stack anymore because the cloud service providers have already been certified. Otherwise you might have to write tests at the infrastructure of PaaS level.”

Meanwhile, most organizations are trying to wrap their arms around the breadth of testing and defect resolution practices necessary to deliver Web and mobile applications that provide the scalability, performance, and security customers expect.

“If you’re going to build better quality software faster, you need to make sure that the build actually works,” said Andreas Grabner, technology strategist at Compuware (an IT services company). “The software I write is more complex because it is interacting with things I can’t control.”

And that’s just the current state of Web and mobile development. With the Internet of Things looming, some tool providers expect that mainstream developers will have to write applications for devices other than smartphones, and as a result, system complexity and the related bug- and defect-tracking challenges will increase.

“If you think about the Web and the fragmentation of mobile devices, the complexity has increased by an order of magnitude,” said Johnston. “If you think about wearables or automobiles or smart appliances or smartwatches, it’s going to get exponentially worse.”

There’s no excuse for bad quality

There are many reasons why software quality falls short of user expectations, but the problem is that users don’t want to hear it. Even though every user complaint won’t make it to the top of a backlog, what customers consider “bugs” and “defects” have a nasty habit of making headlines, resulting in seething customer reviews and negatively impacted revenue.

“It’s unacceptable to tell users that you can’t reproduce a bug. These days they have all the cards,” said Johnston. “We live in a world where app quality—functional quality, usability quality, performance quality and security quality—are differentiators, and yet quality is still thought of as a cost center.”

Bug and defect tracking is all about problem-solving, but unfortunately some of the lingering problems aren’t being addressed despite impressive tool advancements because organizations change slower than technology does.

“I can have a product that’s completely bug free and has a great user experience, but if you get no value out of it, the quality is bad,” said Micro Focus’ Roboostoff. “People need to understand quality. It’s not about function; it’s about the customer perception of your product, your brand, and your company.”


Hadoop is Now a General-purpose Platform

As seen in SD Times

HadoopApache Hadoop adoption is accelerating among enterprises and advanced computing environments as the project, related projects, and ecosystem continue to expand. While there were valid reasons to avoid the 1.x versions, skeptics are reconsidering since Hadoop 2 (particularly the latest 2.2.0 version) provides a viable choice for a wider range of users and uses.

“The Hadoop 1.x generation was not easy to deploy or easy to manage,” said Juergen Urbanski, former chief technologist of T-Systems, the IT consulting division of Deutsche Telecom. “The many moving parts that make up a Hadoop cluster were difficult for users to configure. Fortunately, Hadoop 2 fills in many of the gaps. Manageability is a key expectation, particularly for the more critical business use cases.”

Hadoop 2.2.0 adds the YARN resource-management framework to the core set of Hadoop modules, which include the Hadoop Common set of utilities, the Hadoop Distributed File System (HDFS), and Hadoop MapReduce for parallel processing. Other improvements include enhancements to HDFS, binary compatibility for Map/Reduce applications built on Hadoop 1.x, and support for running Hadoop on Windows.
Meanwhile, Hadoop-related projects and commercial products are proliferating along with the ecosystem. Collectively, the new Hadoop capabilities provide a more palatable and workable solution, not only for enterprise developers, business analysts and IT, but also a larger community of data scientists.

“There are many technologies that are helping Hadoop realize its potential as being a more general-purpose platform for computing,” said Doug Cutting, co-creator of Hadoop. “We started out as a batch processing system. People used it to do computations on large data sets that they couldn’t do before, and they could do it affordably. Now there’s an ever-increasing amount of data processing that organizations can do using this one platform.”

YARN expands the possibilities
The limitations of Map/Reduce were the genesis of Apache Hadoop NextGen MapReduce (a.k.a. YARN), according to Arun Murthy, release manager for Hadoop 2.

“It was apparent as early as 2008 that Map/Reduce was going to become a limiting factor because it’s just one algorithm,” he said. “If you’re trying to do things like machine learning and modeling, Map/Reduce is not the right algorithm to do it.”

Rather than replacing Map/Reduce altogether, it was supplemented with YARN to provide things like resource management and fault tolerance as base primitives in the platform, while allowing end users to do different things as they process and track the data in different ways.

“The architecture had to be more general-purpose than Map/Reduce,” said Murthy. “We kept the good parts of Map/Reduce, such as scale and simple APIs, but we had to allow other things to coexist on the same platform.”

The original Hadoop MapReduce was based on the Google Map/Reduce paper, while Hadoop HDFS was based on the Google File System paper. HDFS provides a mechanism to store huge amounts of heterogeneous data cheaply; Map/Reduce enables highly efficient parallel processing.

“Map/Reduce is a mature concept that comes from LISP and functional programming,” said Murthy. “Google scaled Map/Reduce out in a massive way while keeping a real simple interface for the end user so the end user does not have to deal with the nitty-gritty details of scheduling, resource management, fault tolerance, network partitions, and other crazy stuff. It allowed the end user to just deal with the business logic.”

Because YARN is an open framework, users are free to use algorithms other than Map/Reduce. In addition, applications can run on and integrate with it.

“The scientific and security computing communities depend on Open MPI technologies, which weren’t even an option in Hadoop 1,” said Edmon Begoli, CTO of analytics consulting firm PYA Analytics. “The architecture of Hadoop 2 and YARN allows you to plug in your own resource manager and your own parallel processing algorithms. People in the high-performance computing community have been talking about YARN enthusiastically for years.”

HDFS: Aspirin for other headaches
Some CIOs have been reluctant to bring Hadoop into the enterprise because there have been too many barriers to entry, although Hadoop 2 improvements are turning the tide.

“I think two of the deal breakers were NameNode federation and the Quorum Journal Manager, which is basically a failover for the HDFS NameNode,” said Jonathan Ellis, project chair for Apache Cassandra. “Historically, if your NameNode went down, you were basically screwed because you’d lose some amount of data.”

Hadoop 2 introduces the Quorum Journal Manager, where changes to the NameNodes are recorded to replicated machines to avoid data loss, he said. NameNode federation allows a pool of NameNodes to share responsibility for an HDFS cluster.

“NameNode federation is a bit of a hack because each NameNode still only knows about the file set it owns, so at the client level you have to somehow teach the client to look for some files on one NameNode and other files on another NameNode,” said Ellis.

HDFS is nevertheless an economically feasible way to store terabytes or even petabytes of data. Facebook has a single cluster that stores more than 100PB on Hadoop, according to Murthy.

“It’s amazing how much data you can store on Hadoop,” he said. “But you have to interact with the data, interrogate it, and come up with insights. That’s where YARN comes in. Now you have a general-purpose data operating system, and on top of it you can run applications like Apache Storm.”

John Haddad, senior director of product marketing at Informatica, said the Hadoop 2 improvements allow his organization to run more types of applications and workloads.

“Various teams can run a variety of different applications on the cluster concurrently,” he said. “Hadoop 1 lacked some of the security, high availability and flexibility necessary to have different applications, different types of workloads, and more than one organization or team submitting jobs to the cluster.”

Gearing up for prime time
The number and types of Hadoop open-source projects and commercial offerings are expanding rapidly. Hadoop-related projects include HBase, a highly scalable distributed database; the Hive data warehouse infrastructure; the Pig language and framework for parallel computing; and Ambari, which provisions, manages and monitors Apache Hadoop clusters.

“It seems like we’ve got 20 or 30 new projects every week,” said Cutting. “We have all these separate, independent projects that work together, so they’re interdependent but under separate control so the ecosystem can evolve.”

Meanwhile, solution providers are building products for or integrating their products with Hadoop. Collectively, Hadoop improvements, open-source projects and compatible commercial products are allowing organizations to tailor it to their needs, rather than having to shoehorn what they are doing into a limited set of capabilities. And the results are impressive.

For example, Oak Ridge National Laboratory used Hadoop to help the Center for Medicare and Medicaid Services identify tens of millions of dollars in overpayments and fraudulent transactions in just three weeks.

“Using only two or three engineers, we were able to approach and understand the data from different angles using Hive on Hadoop because it allowed us to write SQL-like queries and use a machine-learning library or run straight Map/Reduce queries,” said PYA Analytics’ Begoli. “In the traditional warehousing world, the same project would have taken months unless you had a very expensive data warehouse platform and very expensive technology consulting resources to help you.”

The groundswell of innovation is enabling Hadoop to move beyond its batch-processing roots to include real-time and near-real-time analytics.


The groundswell of innovation is enabling Hadoop to move beyond its batch-processing roots to include real-time and near-real-time analytics.

Skeptics are doing a double take
Hadoop 2 is converting more skeptics than Hadoop 1 because it’s more mature, it’s easier (but not necessarily easy) to implement, it has more options, and its community is robust.

“You can bring Hadoop into your organization and not worry about vendor lock-in or what happens if the provider disappears,” said Murthy. “We have contributions from about 2,000 people at this point.”

There are also significant competitive pressures at work. Organizations that have adopted Hadoop are improving the effectiveness of things like fraud detection, portfolio management, ad targeting, search, and customer behavior by combining structured and unstructured data from internal and external sources that commonly include social networks, mobile devices and sensors.

“We’re seeing organizations start off with basic things like data warehouse optimization, and then move on to other cool and interesting things that can drive more revenue from the company,” said Informatica’s Haddad.

For example, Yahoo has been deploying YARN in production for a year, and the throughput of the YARN clusters has more than doubled. According to Murthy, Yahoo’s 35,000-node cluster now processes 130 to 150 jobs per day versus 50 to 60 before YARN.

“When you’ve got 2x over 35,000 to 40,000 nodes, that’s phenomenal,” he said. “It’s a pretty compelling story to tell a CIO that if you just upgrade your software from Hadoop 1 to Hadoop 2, you’ll see 2x throughput improvements in your jobs.”

Of course, Hadoop 2.2.0 isn’t perfect. Nothing is. And some question what Hadoop will become as it continues to evolve.

Hadoop co-creator Cutting said the beauty of Hadoop as an open-source project is that new things can replace old things naturally. That prospect somewhat concerns PYA Analytics’ Begoli, however.

“I’m concerned about the explosion of frameworks because it happened with Java and it’s happening with JavaScript,” he said. “When everyone is contributing something, it can be too much or the original vision can be diluted. On the other hand, a lot of brilliant teams are contributing to Hadoop. There are management tools, SQL tools, third-party tools and a lot of other things that are being incubated to deliver advanced capabilities.”

While Hadoop’s full impact has yet to be realized, Hadoop 2 is a major step forward.

Well-known Hadoop implementations

Amazon Web Services: Amazon Elastic MapReduce uses Hadoop in order to provide a quick, easy and cost-effective way to distribute and process large amounts of data across a resizable cluster of Amazon EC2 instances. It can be used to analyze click-stream data, process vast amounts of genomic data and other large scientific data sets, and process logs generated by Web and mobile applications.


Tech Buying: 6 Reasons Why IT Still Matters

ErrorOriginally published in InformationWeek, and available as a slideshow here.

Making major tech purchases, especially big data analytics and business intelligence tools, without consulting IT may cause major problems. Here’s why.

Although shadow IT is not new, the percentage of business tech purchases made outside IT is significant and growing. When Bain & Company conducted in-depth interviews with 67 marketing, customer service, and supply chain professionals in February 2014, it found that nearly one-third of technology purchasing power had moved to executives outside of IT. Similarly, member-based advisory firm CEB has estimated that non-IT departments control 30% of enterprise IT spend. By 2020, Gartner estimates, 90% of tech spending will occur outside IT.

There are many justifications for leaving IT in the dark about departmental tech purchases. For one thing, departmental technology budgets seem to point to departmental decision making. Meanwhile, cloud-based solutions, including analytics services, have become more popular with business users because they are easy to set up. In addition, their relatively low subscription rates or pay-per-use models may be more attractive from a budgetary standpoint than their traditional on-premises counterparts, which require significant upfront investments and IT consideration. Since the cost and onboarding barriers to cloud service adoption are generally lower than for on-premises products, IT’s involvement may seem to be unnecessary.

Besides, IT is busy. Enterprise environments are increasingly complex, and IT budgets are not growing proportionally, so the IT department is resource-constrained. Rather than waiting for IT — or complicating decision-making by getting others involved — non-IT tech buyers anxious to deploy a solution may be tempted to act first and answer questions later.

However, making tech purchase without IT’s involvement may result in unforeseen problems. On the following pages, we reveal six risks associated with making business tech purchases without involving IT.

1. Tech Purchases Affect Everybody
Tech purchases made without IT’s involvement may affect IT and the IT ecosystem in ways that someone outside IT couldn’t anticipate. You might be introducing technical risk factors or tapping IT resources IT will have to troubleshoot after the fact. To minimize the potential of unforeseen risks, IT can perform an in-depth assessment of your department’s requirements, the technology options, their trade-offs, and the potential ripple effect that your tech purchase might have across the organization. This kind of risk/benefit analysis is important. Even if it seems like a barrier for your department to get what it wants, it’s better for the entire organization in the long run.
Also, you may need help connecting to data sources, integrating data sources, and ensuring the quality of data, all of which require specific expertise. IT can help you understand the scope of an implementation in greater detail than you might readily see.

2. Sensitive Information May Be Compromised
Information security policies need to be defined, monitored, and enforced. While it’s common for businesses to have security policies in place, education about those policies, and the enforcement of those policies, sometimes fall short. Without appropriate precautions, security leaks can happen innocently, or you could be opening the door to intentional bad actors.
Cloud-based services can expose organizations to risks that users haven’t considered, especially when the service’s terms of use are not understood. Asurvey of 4,140 business and IT managers, conducted in July 2012 by The Ponemon Institute and sponsored by Thales e-Security, revealed that 63% of respondents did not know what cloud providers are doing to protect their sensitive or confidential data.

3. Faulty Data = Erroneous Conclusions
There is no shortage of data to analyze. However, inadequate data quality and access to only a subset of information can negatively impact the accuracy of analytics and, ultimately, decision making.
In an interview with InformationWeek, Jim Sterne, founder of the eMetrics Summit and the Digital Analytics Association, warned that the relative reliability of sources needs to be considered since CRM system data, onsite user behavior data, and social media sentiment analysis data are not equally trustworthy.
“If I’m looking at a dashboard as a senior executive and I know where the data came from and how it was cleansed and blended, I’m looking at the numbers as if they have equal weight,” he said. “It’s like opening up a spice cabinet and assuming each spice is as spicy as any other. I will make bad decisions because I don’t know how the information was derived.”

4. Not Getting What You Bought
Similar products often sound alike, but their actual capabilities can vary greatly. IT can help identify important differences.
While it may be tempting to purchase a product based on its exhaustive feature set or its latest enhancements, feature-based buying often proves to be a mistake because it omits or minimizes strategic thinking. To reduce the risk of buyer’s remorse, consulting with IT can help you assess your current and future requirements and help you choose a solution that aligns with your needs.

5. Scope Creep
Business users typically want immediate benefits from big data, analytics packages, and BI systems. But, if the project has a lot of technological complexity — and particularly if it involves tech dependencies that are outside the control of your department — it’s often best to implement in phases. Approaching large initiatives as one big project may prove to be more complicated, time-consuming, and costly than anticipated.
IT can help you break a large, difficult-to-manage project into several smaller projects, each of which has its own timeline and goals. That way, you can set realistic end-user and C-suite expectations and effectively control risks. Phasing large projects can also provide you with the flexibility you need to adjust your implementation as business requires.

6. Missing Out On Prior Experience
IT professionals and outsourced IT resources often have prior experience with BI and analytics implementations that are specific or relevant to your department. Some of them have implemented solutions in other companies, departments, or industries and have gained valuable insight from those experiences. When armed with such knowledge, they can help you understand potential opportunities, challenges, and pitfalls you may not have considered which can affect planning, implementation, and the choice of solutions.