Strategic Insights and Clickworthy Content Development

Author: misslisa (Page 9 of 10)

I'm a writer, editor, analyst, and writing coach.

What You Should Know About Machine-Aided Analytics

More organizations are supplementing their analytics capabilities with intelligent systems that are easier to use than ever. While the results may look impressive, the devil is in the details.

There’s a key difference between traditional analytics systems and many of the newer ones. If you understand that difference, you’ll be a step ahead of your peers.

Old: Input → Output

Traditional analytics systems tend to be rules-based, which means they have “if/then” scenarios built into them: if a user clicks the red button, one result occurs; if she clicks the blue button, another result occurs. The key thing to know is that, assuming the programming is done right, a given input produces a predictable output. That’s fine, but it doesn’t work so well with the complex Big Data we have today, which is why machine learning is gaining momentum.
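To make the contrast concrete, here is a minimal sketch of a rules-based step in Python. The button colors and outcomes are invented for illustration; the point is that every input maps to an output someone wrote down in advance.

```python
# Minimal sketch of a rules-based ("if/then") analytics step.
# The button colors and outcomes are hypothetical.
def handle_click(button_color: str) -> str:
    if button_color == "red":
        return "show_sales_report"      # a predictable, pre-programmed result
    elif button_color == "blue":
        return "show_inventory_report"  # a different, equally fixed result
    else:
        return "show_help"              # anything unanticipated falls through

print(handle_click("red"))   # -> show_sales_report
print(handle_click("blue"))  # -> show_inventory_report
```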

Modern systems use machine learning to provide more intelligent solutions. The solutions are more “intelligent” because the machine learns from what humans feed it, and, depending on the algorithms used, it may also be capable of learning on its own. Human training and self-learning allow such systems to “see” things in the data that weren’t apparent before, such as patterns and relationships. The other major value, of course, is the ability to comb through massive amounts of structured and unstructured data faster than a human could, understand that data, make predictions from it, and perhaps make recommendations. The latter characteristics — prediction and prescription — are the ones most obvious to analytics users.
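By contrast, a learned model infers its own mapping from examples. The sketch below uses scikit-learn and a tiny made-up dataset purely to illustrate the idea; real systems train on far larger and messier data.

```python
# Minimal sketch of a learned model, assuming scikit-learn is installed.
# The tiny dataset is invented purely for illustration.
from sklearn.tree import DecisionTreeClassifier

# Each row: [pages_viewed, minutes_on_site]; label: 1 = purchased, 0 = did not
X = [[1, 2], [2, 1], [8, 15], [12, 30], [3, 4], [10, 22]]
y = [0, 0, 1, 1, 0, 1]

model = DecisionTreeClassifier().fit(X, y)   # the "training" step

# The rule behind this prediction was learned from the data, not written by hand.
print(model.predict([[9, 18]]))  # likely [1], i.e., predicted to purchase
```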

What’s not well understood is what can potentially go wrong. An analytics system designed for general purpose use is likely not what someone on Wall Street would use. That person would want a solution that’s tailored to the needs of the financial services industry. Making the wrong movie prediction is one thing; making the wrong trade is another.

As users, it’s easy to assume that the analytics we get or come up with are accurate, but so much can affect accuracy — data quality, algorithms, models, interpretation. And, as I mentioned in my last post, bias can affect all of those things and more.

Why you should care

There is a shortage of really good data science and analytics talent. One answer to the problem is to build solutions that abstract the complexity of all the nasty stuff — data collection, data preparation, choice of algorithms and models, etc. — so business users don’t have to worry about it. On one hand, the abstraction is good because it enables solutions that are easy to use and don’t require much, if any, training.

But what if the underlying math or assumptions aren’t exactly right? How would you know, and what effect might that have? Understanding how and why those systems work the way they do requires someone who knows all the hairy technical stuff, much like a car mechanic. And, as with a car, you shouldn’t pop the hood and start tinkering unless you know what you’re doing.

Some solutions don’t have a pop-the-hood option. They’re black boxes, which means no one can see what’s going on inside. The opaqueness doesn’t make business users nervous, but it’s troublesome to experts who didn’t build the system in the first place.

Bottom line: you’re probably going to get spurious results once in a while, and when you do, ask why. If the answer isn’t obvious to you, ask for help.

…But the bugs remain

As seen in SD Times.

Software teams are under pressure to deliver higher-quality software faster, but as high-profile failures and lackluster app ratings indicate, it’s easier said than done. With the tremendous growth of agile development, finding bugs earlier in the development cycle has become an imperative, but not all organizations are succeeding equally well.

“Developers realize they need better tools to investigate problems, but we need to make sure we’re not creating problems in the first place,” said Gil Zilberfeld, product manager of unit testing solution provider Typemock.

Software teams are using all kinds of tools, including bug and defect trackers, SCM tools, testing suites, and ALM suites, and yet software quality has not improved generally, according to William Nichols, a senior member of the technical staff at the Software Engineering Institute.

“The data don’t suggest that the software being produced is any better than it was a decade or 20 years ago, whether you measure it by lines of code or function points and defects,” he said. “We’re seeing one to seven defects per 1,000 lines of code. We’re making the same mistakes, and the same mistakes cause the same problems.”

One problem is focusing too much on the speed of software delivery rather than software quality. Nichols said this is a symptom of unrealistic management expectations. Tieren Zhou, founder and CEO of testing and ALM solution provider TechExcel, considered it a matter of attention: what’s sexy versus what matters.

“Bug fixing is less interesting than building features,” said Zhou. “In the interest of acquiring new customers, you may be losing old customers who are not happy with your products.”

While software failures are often blamed on coding and inadequate testing, there are many other reasons why software quality isn’t what it should be, as evidenced by defects being injected at various points in the software life cycle.

Bug and defect tracking go agile

Bug and defect tracking is becoming less of a siloed practice as organizations embrace agile practices. Because agile teams are cross-functional and collaborative, tools are evolving to better align with their needs.

“We’re moving from isolation to transparency,” said Paula Rome, senior project manager at Seapine, a provider of testing and ALM solutions. “It makes no sense to have critical decision-making information trapped in systems.”

Since software teams no longer have weeks to dedicate to QA, developers and testers are working closer together than ever. While pair programming and test-driven development practices can help improve the quality of code, not every team is taking advantage of those and other means that can help find and fix defects earlier in the life cycle.
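Test-driven development, for example, can be sketched in a few lines: write a failing test that pins down the expected behavior, then write just enough code to make it pass. The example below is hypothetical and follows pytest conventions; it is a sketch of the practice, not anyone's production code.

```python
# A minimal, hypothetical TDD cycle collapsed into one file for brevity.
# In practice the tests are written first and fail until the function exists.

def apply_discount(price: float, percent: float) -> float:
    """The simplest implementation that satisfies the tests below."""
    return max(price * (1 - percent / 100.0), 0.0)

def test_ten_percent_discount():
    assert apply_discount(price=100.0, percent=10) == 90.0

def test_discount_never_goes_negative():
    # A defect caught early: discounts over 100% must not produce negative prices.
    assert apply_discount(price=5.0, percent=200) == 0.0

# Run with: pytest <this_file>.py  (pytest discovers functions named test_*)
```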

“There’s a need to find problems earlier, more often and faster, but what you’re seeing are .01 releases that fix a patch, or software teams using their customer base as a bug-tracking and bug-finding system,” said Archie Roboostoff, experience director for the Borland portfolio at Micro Focus, a software-quality tool provider.

Atlassian, maker of the JIRA issue and bug tracker, is helping teams get more insight into bugs with its latest 6.2 release. Instead of viewing bugs in “open” and “closed” terms, users can now see how many commits have been made, whether the peer reviews were successful, and whether the code has been checked into production.

“The process of fixing a bug is a multi-stage process,” said Dan Chuparkoff, head of JIRA product marketing at Atlassian. “Developers check things out of the master branch, write some code, submit their code for peer reviews, peers comment on their code, the developers make some adjustments, check it into the master branch, and roll it up into production. Those steps are completely invisible in most bug systems so the stakeholders have trouble seeing whether something’s close to being finished or not.”

uTest (soon to be known as Applause) offers “in the wild” testing, which is a crowdsourced approach to quality assurance that enables organizations to find issues in production before their customers do.

Software teams are using the service to supplement their lab tests, although some, especially those doing three builds a week, are running lab and in-the-wild tests in parallel.

“In an agile world and in a continuous world, you want to make sure things are thoroughly tested and want to accelerate your sprints,” said Matt Johnston, chief strategy officer of uTest. “We help them catch things that were missed in lab testing, and we’re helping them find things they can’t reproduce.”

To keep pace with faster software release velocities, Hamid Shojaee, CEO of Scrum software provider Axosoft, is focusing on usability so individuals and teams can do a better job of resolving defects in less time.

“The custom pieces of information associated with each tracked bug are different for every team,” he said. “Creating custom fields has been a time-consuming and difficult thing to do when you’re customizing a bug-tracking tool. We have an intuitive user interface, so what would have taken you 20 to 30 minutes takes seconds.”

AccuRev is also enabling teams to spend less time using tools and more time problem-solving.

“Defect tracking can be cumbersome,” said Joy Darby, a director of engineering at AccuRev. “By the time software gets to QA, they have to ask questions or reference e-mails or look at a white board. With a central repository, you have instant access to all the artifacts, all the tests that were done, the build results, and any sort of complex code analysis you may have done.”

While more tools are evolving to support continuous integration and deployment, organizational cultures are not moving as quickly.

“While we’re all off iterating, the business is off waterfalling,” said Jeff Dalton, a Standard CMMI Appraisal Method for Process Improvement lead appraiser and CMMI instructor. “Software teams are accelerating their delivery cycles while the rest of the business still views software in terms of phases, releases, large planning efforts, large requirements, and 12-month delivery cycles.”

The disconnect between agile and traditional ways of working can work against software quality when funding is not tied to the outcome of sprints, for example.

Adopting a life cycle view of quality

As software teams become more agile, discrete workflows become collaborative ones that require a life-cycle view of software assets and interoperable tools. Even with the greater level of visibility life-cycle approaches provide, the root cause of problems nevertheless may be overlooked in the interest of finding and fixing specific bugs.

“We’ve inflated processes and tools in order to support something that could have been figured out earlier in the process if we had defined a better spec,” said Typemock’s Zilberfeld. “Instead, we spend five hours talking about something the customer doesn’t care about.”

Most ALM tools are open enough to support other tools whose capabilities equal or surpass what is in the ALM suite. Conversely, narrower tool providers are looking at bugs and defects in a broader sense because customers want to use the tools in a broader context than they have in the past.

“Software is no longer an isolated venture. It really affects all parts of the business,” said Atlassian’s Chuparkoff. “Modern issue trackers have REST APIs that allow you to easily connect your issue tracker to the entire product life cycle. We wanted to make sure JIRA can integrate with your proprietary tool and other tools via REST APIs or plug-ins from our marketplace. We realize people aren’t going to use JIRA in a silo by itself.”
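As a rough illustration of that integration pattern, the sketch below pulls open bugs from a JIRA server through its REST search endpoint. The host, credentials, project, and JQL filter are placeholders; treat it as the shape of the approach rather than a drop-in script.

```python
# Sketch: pulling open bugs from a JIRA instance over its REST API.
# Host, credentials, and the JQL query are hypothetical placeholders.
import requests

JIRA_BASE = "https://jira.example.com"          # your JIRA server
SEARCH_URL = f"{JIRA_BASE}/rest/api/2/search"   # JIRA's issue-search endpoint

params = {
    "jql": "project = DEMO AND issuetype = Bug AND status != Closed",
    "fields": "summary,status",
    "maxResults": 50,
}

resp = requests.get(SEARCH_URL, params=params, auth=("user", "api-token"))
resp.raise_for_status()

for issue in resp.json().get("issues", []):
    fields = issue["fields"]
    print(issue["key"], fields["status"]["name"], "-", fields["summary"])
```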

Octo Consulting Group, which provides technology and management consulting services to federal agencies, is one of many organizations that are using JIRA in tandem with ALM solutions.

“Bug and defect tracking is part of ALM,” said Octo Consulting Group CTO Ashok Nare. “While we use JIRA, and there are a lot of good ALM products like Rally, VersionOne and CollabNet…the tools are really there to facilitate a process.”

Despite the broader life-cycle views, software quality efforts often focus on development and testing even though many defects are caused by ill-defined requirements or user stories.

“Philosophically, we didn’t use to think about bugs and defects in terms of requirements problems or customer problems or management problems, so we focused on code,” said CMMI’s Dalton. “But what we found was the code did what it was supposed to do, but didn’t do what the customer wanted it to do. It’s important to understand where the defect is injected into the process, because if we know that, we can change the process to fix it.”

Dalton prefers the process model approach, which includes prototypes, mockups and wireframes as part of the requirements and design process, because they surface problems in the early stages, when they’re the least costly to fix.

“Every time there’s an assumption or something fuzzy in the requirements it leads to defects,” said Adam Sandman, director at Inflectra (a maker of test-management software). “If you can’t define it, you can’t build it well.”

Inflectra, TechExcel, Seapine and the other ALM solution providers tie requirements, development, testing and other life-cycle stages together so that, among other things, defects can be identified, fixed and prevented from coming back in future iterations or releases.

“We’re connecting the dots, making it possible to have transparency between the silos so you get the data you need when you need it,” said Seapine’s Rome.

In addition to providing solutions, TechExcel is trying to help software teams deliver better products by promoting the concept of “QA floaters” who, as part of an agile team, help developers define test cases and run test cases in parallel with developers.

“When developers and QA floaters are both testing, you have a built-in process that helps you find and fix bugs earlier so the developer can satisfy a requirement or story,” said TechExcel’s Zhou. “When you tie in total traceability, you tie requirements, development and testing together in a way that improves productivity and software quality.”

Who owns software quality?
Software quality has become everyone’s job, but not everyone sees it that way, which is one reason why defects continue to fall through organizational cracks.

“When you separate the accountability and resources, that’s where disaster always starts,” said Andreas Kuehlmann, SVP of research and development at testing solution provider Coverity. “A lot of teams have gotten to the point where the developers are doing a little bit of testing, but the rest is tossed over the fence to QA who can’t even start the executable.”

Coverity offers three products that move testing into development: Quality Advisor, which uses deep semantic and static analysis to identify bugs in code as it compiles; Security Advisor, which uses the same technology to find security vulnerabilities; and Test Advisor, which identifies the riskiest code.

“Moving testing into development requires a lot to be done from a workflow perspective,” said Kuehlmann. “You have to have tests running 24×7, you have to have the tools and infrastructure in place, and you have to change developers’ mindsets. That’s really hard. The role of QA is evolving into more like a sign-off check.”

The dynamics between coders and testers are changing, but not in a uniform way. A minority of organizations are collapsing coding and testing into a single function, while the majority are leveraging the skill sets of both developers and QA with the goal of optimizing delivery speed and quality.

“Developers are really good at solving problems, and test engineers are good at finding vulnerabilities,” said Atlassian’s Chuparkoff. “If a developer can run an automated test after he finishes his code, he can fix the bug immediately while he’s in the thinking mode of fixing it. It’s a lot more efficient than fixing it four days later after someone gave you the issue.”

Annotated screen shots help speed up issue resolution, which is why Axosoft, Atlassian and Seapine have added the capability to their tools.

“You have to make sure people are taking the time to put the proper reproduction steps in to make sure those bugs are fixed,” said Axosoft’s Shojaee.

Not everyone on the team may be responsible for fixing defects, but many have the potential to inject them. Because software is increasingly the face of businesses, organizations are starting to realize that software quality isn’t a technical problem; it’s a business problem. For example, uTest’s Johnston recently met with the CIO of a major media company who considers software quality the CEO’s responsibility since a major portion of the company’s revenue is driven by digital experiences.

“If that sentiment can win the day, a lot more companies will be successful in the app economy,” said Johnston.

The complexity paradox
On one hand, the software landscape is becoming more complex; on the other, tools and approaches to software development are becoming more abstract. Both trends can make finding and fixing defects more difficult.

“It’s not about Windows and Linux anymore,” said Inflectra’s Sandman. “Now you have all these mobile devices and frameworks, and you’re seeing constant updates to browsers. If you’re building systems with frameworks and jQuery plug-ins and something goes wrong, do you fix it, ask the vendor to fix it, or ask the open-source community to fix it? Inevitably the bugs may not be in your application but in the infrastructure you’re relying on.”

Micro Focus’ Roboostoff agreed. “If users see quality as something that works, and your product doesn’t work, then it’s hugely defective,” he said.

“When I had my Web server, application server and database server sitting in my office, I could rest assured a problem was somewhere in the closet and I’d find it eventually. Now, I might have some REST services sitting in Amazon, some service-based message in Azure, six CDNs around the world, and A/B testing for optimizing linking going on, and then on Monday morning half of my customers say something is slow.”

Because there is so much complexity and because the landscape is changing so fast at so many levels, edge-case testing is becoming more important.

“When you consider there are about 160,000 combinations of devices, browsers and platforms you have to test for, most customers aren’t coming close to where they should be,” said Roboostoff. “Since it isn’t practical, you pick the biggest screen and the smallest screen, the newest devices and the oldest devices to lower that risk profile.”
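Roboostoff’s biggest-and-smallest heuristic is essentially boundary sampling of the device matrix. Here is a toy sketch of how a team might pick that reduced set programmatically; the device list is invented, and a real matrix would come from analytics or market data.

```python
# Toy sketch: shrink an unmanageable device matrix to its boundary cases.
# The device list is invented; real data would come from analytics or market share.
devices = [
    {"name": "Phone A",   "screen_in": 4.0,  "os_release_year": 2011},
    {"name": "Phone B",   "screen_in": 5.5,  "os_release_year": 2013},
    {"name": "Tablet C",  "screen_in": 10.1, "os_release_year": 2012},
    {"name": "Phablet D", "screen_in": 6.3,  "os_release_year": 2014},
]

test_set = {
    "smallest_screen": min(devices, key=lambda d: d["screen_in"]),
    "largest_screen":  max(devices, key=lambda d: d["screen_in"]),
    "oldest_os":       min(devices, key=lambda d: d["os_release_year"]),
    "newest_os":       max(devices, key=lambda d: d["os_release_year"]),
}

for reason, device in test_set.items():
    print(f"{reason}: {device['name']}")
```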

The fragmentation that is continuing to occur at so many levels can cause errors that are difficult to identify and rectify.

“One brand may have four to 10 different codebases, four to 10 product road maps, varying skill sets to accomplish all that, and a multitude of platforms and devices they are building software for that they have to test against,” said Johnston. “Meanwhile, users expect things to operate like a light switch.”

The U.S. government established a standardized approach to security assessment called the Federal Risk and Authorization Management Program (FedRAMP), which is apparently benefitting some software developers and consultants who need to be responsible for their software quality but are not in control of the cloud infrastructure. Octo Consulting Group’s Nare said that FedRAMP’s certification simplifies the testing he would otherwise have to do.

“As the level of abstraction goes up, if you’re only testing the top layer, you have to assume that the lower layers underneath like the infrastructure in the cloud and the PaaS are fundamentally sound so that everything is working the way it’s supposed to,” he said. “When we do security testing today and we test our applications, we don’t certify the whole stack anymore because the cloud service providers have already been certified. Otherwise you might have to write tests at the infrastructure or PaaS level.”

Meanwhile, most organizations are trying to wrap their arms around the breadth of testing and defect resolution practices necessary to deliver Web and mobile applications that provide the scalability, performance, and security customers expect.

“If you’re going to build better quality software faster, you need to make sure that the build actually works,” said Andreas Grabner, technology strategist at Compuware (an IT services company). “The software I write is more complex because it is interacting with things I can’t control.”

And that’s just the current state of Web and mobile development. With the Internet of Things looming, some tool providers expect that mainstream developers will have to write applications for devices other than smartphones, and as a result, system complexity and the related bug- and defect-tracking challenges will increase.

“If you think about the Web and the fragmentation of mobile devices, the complexity has increased by an order of magnitude,” said Johnston. “If you think about wearables or automobiles or smart appliances or smartwatches, it’s going to get exponentially worse.”

There’s no excuse for bad quality

There are many reasons why software quality falls short of user expectations, but the problem is that users don’t want to hear them. Even though not every user complaint will make it to the top of a backlog, what customers consider “bugs” and “defects” have a nasty habit of making headlines, resulting in scathing customer reviews and lost revenue.

“It’s unacceptable to tell users that you can’t reproduce a bug. These days they have all the cards,” said Johnston. “We live in a world where app quality—functional quality, usability quality, performance quality and security quality—are differentiators, and yet quality is still thought of as a cost center.”

Bug and defect tracking is all about problem-solving, but despite impressive tool advancements, some lingering problems aren’t being addressed because organizations change more slowly than technology does.

“I can have a product that’s completely bug free and has a great user experience, but if you get no value out of it, the quality is bad,” said Micro Focus’ Roboostoff. “People need to understand quality. It’s not about function; it’s about the customer perception of your product, your brand, and your company.”

 

Big Data: The Interdisciplinary Vortex

As seen in  InformationWeek.

Getting the most from data requires information sharing across departmental boundaries. Even though information silos remain common, CIOs and business leaders in many organizations are cooperating to enable cross-functional data sharing to improve business process efficiencies, lower costs, reduce risks, and identify new opportunities.

Interdepartmental data sharing can take a company only so far, however, as evidenced by the number of companies using (or planning to use) external data. To get to the next level, some organizations are embracing interdisciplinary approaches to big data.

Why Interdisciplinary Problem-Solving May Be Overlooked

Breaking down departmental barriers isn’t easy. There are the technical challenges of accessing, cleansing, blending, and securing data, as well as very real cultural habits that are difficult to change.

Today’s businesses are placing greater emphasis on data scientists, business analysts, and data-savvy staff members. Some of them also employ or retain mathematicians and statisticians, although they may not have considered tapping other forms of expertise that could help enable different and perhaps more accurate forms of data analysis and new innovations.

“Thinking of big data as one new research area is a misunderstanding of the entire impact that big data will have,” said Dr. Wolfgang Kliemann, associate VP for research at Iowa State University. “You can’t help but be interdisciplinary because big data is affecting all kinds of things including agriculture, engineering, and business.”

Although interdisciplinary collaboration is mature in many scientific and academic circles, applying non-traditional talent to big data analysis is a stretch for most businesses.

But there are exceptions. For example, Ranker, a platform for lists and crowdsourced rankings, employs a chief data scientist who is also a moral psychologist.

“I think psychology is particularly useful because the interesting data today is generated by people’s opinions and behaviors,” said Ravi Iyer, chief data scientist at Ranker. “When you’re trying to look at the error that’s associated with any method of data collection, it usually has something to do with a cognitive bias.”

Ranker has been working with a UC Irvine professor in the cognitive sciences department who studies the wisdom of crowds.

“We measure things in different ways and understand the psychological biases each method of data collection creates. Diversity of opinion is the secret to both our algorithms and the philosophy behind the algorithms,” said Iyer. “Most of the problems you’re trying to solve involve people. You can’t just think of it as data; you have to understand the problem area you’re trying to solve.”

Why Interdisciplinary Problem-Solving Will Become More Common

Despite the availability of new research methods, online communities, and social media streams, products still fail and big-name companies continue to make high-profile mistakes. They have more data available than ever before, but there may be a problem with the data, the analysis, or both. Alternatively, the outcome may fall short of what is possible.

“A large retail chain is interested in figuring out how to optimize supply management, so they collect the data from sales, run it through a big program, and say, ‘this is what we need.’ This approach leads to improvements for many companies,” said Kliemann. “The question is, if you use this specific program and approach, what is your risk of not having the things you need at a given moment? The way we do business analytics these days, that question cannot be answered.”

One mistake is failing to understand the error structure of the data. With such information, it’s possible to identify missing pieces of data, what the possible courses of action are, and the risk associated with a particular strategy.

“You need new ideas under research, ideas of data models, [to] understand data errors and how they propagate through models,” said Kliemann. “If you don’t understand the error structure of your data, you make predictions that are totally worthless.”
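One simple way to see how data errors propagate through a model is a Monte Carlo sketch: perturb the inputs by their assumed error and watch how much the output spreads. The supply “model” and the 5% error assumption below are invented for illustration.

```python
# Sketch: propagating an assumed measurement error through a simple model.
# The model and the 5% error assumption are invented for illustration.
import random

def forecast_units(weekly_sales: float) -> float:
    # Hypothetical supply rule: order 4 weeks of stock plus a 10% safety margin.
    return weekly_sales * 4 * 1.10

observed_weekly_sales = 1000.0   # what the data says
assumed_error = 0.05             # suppose the sales figures carry ~5% noise

samples = []
for _ in range(10_000):
    noisy_input = random.gauss(observed_weekly_sales,
                               observed_weekly_sales * assumed_error)
    samples.append(forecast_units(noisy_input))

samples.sort()
low, high = samples[250], samples[9750]   # middle 95% of simulated outcomes
print(f"Point forecast: {forecast_units(observed_weekly_sales):.0f} units")
print(f"95% of simulated forecasts fall between {low:.0f} and {high:.0f}")
```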

Already, organizations are adapting their approaches to accommodate the growing volume, velocity, and variety of data. In the energy sector, cheap sensors, cheap data storage, and fast networks are enabling new data models that would have been impossible just a few years ago.

“Now we can ask ourselves questions such as if we have variability in wind, solar, and other alternative energies, how does it affect the stability of a power system? [We can also ask] how we can best continue building alternative energies that make the system better instead of jeopardizing it,” said Kliemann.

Many universities are developing interdisciplinary programs focused on big data to spur innovation and educate students entering the workforce about how big data can affect their chosen field. As the students enter the workforce, they will influence the direction and culture of the companies for which they work. Meanwhile, progressive companies are teaming up with universities with the goal of applying interdisciplinary approaches to real-world big data challenges.

In addition, the National Science Foundation (NSF) is trying to accelerate innovation through Big Data Regional Innovation Hubs. The initiative encourages federal agencies, private industry, academia, state and local governments, nonprofits, and foundations to develop and participate in big data research and innovation projects across the country. Iowa State University is one of about a dozen universities in the Midwestern region working on a proposal.

In short, interdisciplinary big data problem-solving will likely become more common in industry as organizations struggle to understand the expanding universe of data. Although interdisciplinary problem-solving is alive and well in academia and in many scientific research circles, most businesses are still trying to master interdepartmental collaboration when it comes to big data.

Six Characteristics of Data-Driven Rock Stars

As seen in InformationWeek

Data is being used in and across more functional aspects of today’s organizations. Wringing the most business value out of the data requires a mix of roles that may include data scientists, business analysts, data analysts, IT, and line-of-business titles. As a result, more resumes and job descriptions include data-related skills.

A recent survey by technology career site Dice revealed that nine of the top 10 highest-paying IT jobs require big data skills. On the Dice site, searches and job postings that include big data skills have increased 39% year-over-year, according to Dice president Shravan Goli. Some of the top-compensated skills include big data, data scientist, data architect, Hadoop, HBase, MapReduce, and Pig — and pay for those skills ranges from more than $116,000 to more than $127,000, according to data Dice provided to InformationWeek.

However, the gratuitous use of such terms can cloud the main issue, which is whether the candidate and the company can turn that data into specific, favorable outcomes — whether that’s increasing the ROI of a pay-per-click advertising campaign or building a more accurate recommendation engine.

If data skills are becoming necessary for more roles in an organization, it follows that not all data-driven rock stars are data scientists. Although data scientists are considered the black belts, it is possible for other roles to distinguish themselves based on their superior understanding and application of data. Regardless of a person’s title or position in an organization, there are some traits common to data-driven rock stars that have more to do with attitudes and behaviors than technologies, tools, and methods. Click through for six of them.  [Note to readers:  This appeared as a slideshow.]

They Understand Data

Of course data-driven rock stars are expected to have a keener understanding of data than their peers, but what exactly does that mean? Whether a data scientist or a business professional, the person should know where the data came from, the quality of it, the reliability of it, and what methods can be used to analyze it, appropriate to the person’s role in the company.

How they use numbers is also telling. Rather than presenting a single number to “prove” that a certain course of action is the right one, a data-driven rock star is more likely to compare the risks and benefits of alternative courses of action so business leaders can make more accurate decisions.

“‘Forty-two’ is not a good answer,” said Wolfgang Kliemann, associate VP for research at Iowa State University. “‘Forty-two, under the following conditions and with a probability of 1.2% chance that something else may happen,’ is a better answer.”
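That habit of reporting an estimate together with its uncertainty can be sketched in a few lines. The numbers below are invented; the point is the shape of the answer, not the values.

```python
# Sketch: report an estimate with an interval instead of a bare number.
# The sample data are invented; a real analysis would use the actual dataset.
import random

conversion_lift = [0.8, 1.4, 0.3, 2.1, 1.0, 1.7, 0.5, 1.2, 0.9, 1.6]  # percent

def bootstrap_mean_interval(data, reps=5000, alpha=0.05):
    # Resample the data with replacement and collect the resulting means.
    means = sorted(
        sum(random.choices(data, k=len(data))) / len(data) for _ in range(reps)
    )
    lo = means[int(reps * alpha / 2)]
    hi = means[int(reps * (1 - alpha / 2))]
    return sum(data) / len(data), lo, hi

point, lo, hi = bootstrap_mean_interval(conversion_lift)
print(f"Estimated lift: {point:.2f}% (95% interval: {lo:.2f}% to {hi:.2f}%)")
```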

They’re Curious

Data-driven rock stars are genuinely curious about what data indicates and does not indicate. Their curiosity inspires them to explore data, whether toggling between data visualizations, drilling down into data, correlating different pieces of data, or experimenting with an alternative algorithm. The curiosity may be inspired by data itself, a particular problem, or problem-solving methods that have been used in a similar or different context.

Data scientists are expected to be curious because their job involves scientific exploration. Highly competitive organizations hire them to help uncover opportunities, risks, behaviors, and other things that were previously unknown. Meanwhile, some of those companies are encouraging “out of the box” thinking from business leaders and employees to fuel innovation, which increasingly includes experimenting with data. Some businesses even offer incentives for data-related innovation.

They Actively Collaborate with Others

The data value chain has a lot of pieces. No one person understands everything there is to know about data structure, data management, analytical methods, statistical analysis, business considerations, and other factors such as privacy and security. Although data-driven rock stars tend to know more about such issues than their peers, they don’t operate in isolation because others possess knowledge they need. For example, data scientists need to be able to talk to business leaders and business leaders have to know something about data. Similarly, a data architect or data analyst may not have the ability to manipulate, explore, understand, and dig through large data sets, but a data scientist could dig through and discover patterns and then bring in statistical and programming knowledge to create forward-looking products and services, according to Dice president Shravan Goli.

They Try to Avoid Confirmation Bias

Data can be used to prove anything, especially a person’s opinion. Data-driven rock stars are aware of confirmation bias, so they are more likely to try to avoid it. While the term itself may not be familiar, they know it is not a best practice to disregard or omit evidence simply because it differs from their opinions.

“People like to think that the perspective they bring is the only perspective or the best perspective. I’m probably not immune to that myself,” said Ravi Iyer, chief data scientist at Ranker, a platform for lists and crowdsourced rankings. “They have their algorithms and don’t appreciate experiments or the difference between exploratory and confirmatory research. I don’t think they respect the traditional scientific method as such.”

The Data Science Association’s Data Science Code of Professional Conduct has a rule dedicated specifically to evidence, data quality, and evidence quality. Several of its subsections are relevant to confirmation bias. Among the prohibited behaviors are failing to “disclose any and all data science results or engage in cherry-picking” and failing to “disclose failed experiments or disconfirming evidence known to the data scientist to be directly adverse to the position of the client.”

They Update Their Skill Sets

Technology, tools, techniques, and available data are always evolving. The data-driven rock star is motivated to continually expand his or her knowledge base through learning, which may involve attending executive education programs, training programs, online courses, boot camps, or meetups, depending on the person’s role in the company.

“I encourage companies to think about growing their workforce because there aren’t enough people graduating with data science degrees,” said Dice president Shravan Goli. “You have to create a pathway for people who are smart, data-driven, and have the ability to analyze patterns so they have to add a couple more skills.”

Job descriptions and resumes increasingly include more narrowly defined skills because it is critical to understand which specific types of big data and analytical skills a candidate possesses. A data-driven rock star understands the technologies, tools, and methods of her craft as well as when and how to apply them.

They’re Concerned About Business Impact

With so much data available and so many ways of analyzing it, it’s easy to get caught up in the technical issues or the tasks at hand while losing sight of the goal: using data in a way that positively impacts the business. A data-driven rock star understands that.

Making a business impact requires three things, according to IDC adjunct research adviser Fred McGee: having a critical mass of data available in a timely manner, using analytics to glean insights, and applying those insights in a manner that advances business objectives.

A data-driven rock star understands the general business objectives as well as the specific objective to which analytical insights are being applied. Nevertheless, some companies are still falling short of their goals. Three-quarters of data analytics leaders from major companies recently told McKinsey & Company that, despite using advanced analytics, their companies had improved revenue and costs by less than 1%.

Pitch Closes That May Not Help You

As a journalist, I’m pitched constantly.  I’d say that 20 percent of the pitches I get are good and perhaps 5 percent are excellent.  How would I know?  Lots of journalism experience and lots of PR experience.

Interestingly, whether a pitch gets a response or not can boil down to a few words.

“If you’re interested in X, let me know.”  With that close, you’ve given me permission not to respond, so I don’t.  If the close had been different, I probably would have said, “You should try pitching X instead.”  Likely, this person will follow up and ask whether I got their pitch.  Yup.

The same close is often posed as a question:  “Are you interested?”  This is an easy one to answer most of the time because the answers are binary (yes or no).  These are so easy to say “no” to without explanation.

I guess my issue with all of this is that the PR person doesn’t understand why they’re getting no response or curt responses, neither of which feels good.  I understand.  I spent a lot of years as a PR pro and PR exec, and I know how frustrating pitching can be.  OTOH, when journalists are sorting through a pile of pitches, we can and will choose the path of least resistance whenever possible.

What Your PR Client Should NOT Do


Runaway clients can hurt coverage

Media interviews are an interesting thing to “manage.” There are clients who just want you to set up interviews, clients who value your involvement and guidance, and clients who are like helium balloons that just lost their strings.

Every now and then, even the best clients can get a little out of control, because they’re so passionate.  Passion is fine, but when it gets to the point of bulldozing an interview, it’s time for media training.

Why Bulldozing is a Bad Thing

I’m one of those journalists who prepares for interviews.  I develop a set of questions for each story because somebody is going to ask me for them, and I need to define the scope of the interviews.  Sometimes I have to improvise while I’m interviewing, which is fine, but when the whole interview goes off-script, it can cause problems for everyone involved.

Sometimes I can’t get a word in, let alone a question, if it’s a telephone interview (which is very rare these days).  If it’s an email response, I’ll read through it, but…

Why I Have a “Script”

I develop a list of questions for every set of interviews I do.  I’m happy to send them in advance when requested, but I tend not to send them as a matter of course.  Occasionally, whether or not the interviewee has the questions in advance, that person will say, “I know you want to cover this, but…[I’ve decided the angle of the story should be something else]” or “I’ve looked at your questions and [I’m going to ignore them].”  Then they wonder why they’re not included in the story, or why the other guy was quoted multiple times.

There are several answers to these types of queries:

  • The content was irrelevant
  • The content was difficult or impossible to use given its lack of structure
  • The content doesn’t dovetail well with other conversations
  • It’s just too much work to use

An important thing to know is: I write on assignment.  That means an editor says, “write this,” or I pitch an idea, and that’s what I’m expected to deliver.

The Good News

The good news is that most interviewees have figured out that the best way to conduct interviews is to answer the questions asked, directly.  It’s fine to give examples, cite use cases, or use analogies as supplementary material as long as the content is relevant to the angle of the story.  If they respond to questions in a relevant manner, their chances of being included in a story, or of getting more coverage than they otherwise would, improve significantly.

I do my best to include everyone I interview, but it’s not always possible.  I am happy to explain the situation to the PR rep, if asked.  After all, I spent many years sitting on that side of the desk.

Thankfully for all of us, the bulldozers are few and far between.  If your client is one of them, you’re wise to explain why bulldozing isn’t wise.  It will help you better manage client expectations down the line.

How Corporate Culture Impedes Data Innovation

As seen in InformationWeek


Corporate culture moves slower than tech

Competing in today’s data-intensive business environment requires unprecedented organizational agility and the ability to drive value from data. Although businesses have allocated significant resources to collecting and storing data, their abilities to analyze it, act upon it, and use it to unlock new opportunities are often stifled by cultural impediments.

While the need to update technology may be obvious, it may be less obvious that corporate cultures must also adapt to changing times. The necessary adjustments to business values, business practices, and leadership strategies can be uncomfortable and difficult to manage, especially when they conflict with the way the company operated in the past.

If your organization isn’t realizing the kind of value from its big data and analytics investments that it should be, the problem may have little to do with technology. Even with the most effective technologies in place, it’s possible to limit the value they provide by clinging to old habits.

Here are five ways that cultural issues can negatively affect data innovation:

1. The Vision And Culture Are At Odds

Data-driven aspirations and “business as usual” may well be at odds. What served a company well up to a certain point may not serve the company well going forward.

“You need to serve the customer as quickly as possible, and that may conflict with the way you measured labor efficiencies or productivity in the past,” explained Ken Gilbert, director of business analytics at the University of Tennessee Office of Research and Economic Development, in an interview with InformationWeek.


Companies able to realize the most benefit from their data are aligning their visions, corporate mindsets, performance measurement, and incentives to effect widespread cultural change. They are also more transparent than similar organizations, meaning that a wide range of personnel has visibility into the same data, and data is commonly shared among departments, or even across the entire enterprise.

“Transparency doesn’t come naturally,” Gilbert said. “Companies don’t tend to share information as much as they should.”

Encouraging exploration is also key. Companies that give data access to more executives, managers, and employees than they did in the past have to also remove limits that may be driven by old habits. For example, some businesses discourage employees from exploring the data and sharing their original observations.

2. Managers Need Analytics Training

Companies that are training their employees in ways to use analytical tools may not be reaching managers and executives who choose not to participate because they are busy or consider themselves exempt. In the most highly competitive companies, executives, managers, and employees are expected to be — or become — data savvy.

Getting the most from BI and big data analytics means understanding what the technology can do, and how it can be used to best achieve the desired business outcomes. There are many executive programs that teach business leaders how to compete with business analytics and big data, including the Harvard Business School Executive Education program.

3. Expectations Are Inconsistent

This problem is not always obvious. While it’s clear the value of BI and big data analytics is compromised when the systems are underutilized, less obvious are inconsistent expectations about how people within the organization should use data.

“Some businesses say they’re data-driven, but they’re not actually acting on that. People respond to what they see rather than what they hear,” said Gilbert. “The big picture should be made clear to everybody — including how you intend to grow the business and how analytics fits into the overall strategy.”

4. Fiefdoms Restrict Data Sharing

BI and analytics have moved out from the C-suite, marketing, and manufacturing to encompass more departments, but not all organizations are taking advantage of the intelligence that can be derived from cross-functional data sharing. An Economist Intelligence Unit survey of 530 executives around the world revealed that information-sharing issues represented the biggest obstacle to becoming a data-driven organization.

“Some organizations supply data on a need-to-know basis. There’s a belief that somebody in another area doesn’t need to know how my area is performing when they really do,” Gilbert said. “If you want to use data as the engine of business growth, you have to integrate data from internal and external sources across lines, across corporate boundaries.”

5. Little-Picture Implementations

Data is commonly used to improve the efficiency or control the costs of a particular business function. However, individual departmental goals may not align with the strategic goal of the organization, which is typically to increase revenue, Gilbert said.

“If the company can understand what the customer values, and build operational systems to better deliver, that is the company that’s going to win. If the company is being managed in pieces, you may save a dime in one department that costs the company a dollar in revenue.”

Why Great Pitches Don’t Make the Cut

Some very talented PR pros and I are absolutely anguished that we can’t work together on my latest story.  Do I care?  Yes.  Is there anything I can do about it?  No and yes.

I can get well over 100 responses to a HARO query I post, and I often get more than 50.  The statistics shouldn’t discourage you; they’re meant to give you some insight into why great pitches sometimes don’t make the cut.

Often, it’s a matter of timing.  My queries are designed to start at 5:35 a.m. ET and end at 7:00 p.m. ET.  Because I’m on PT, I start receiving pitches while I’m dreaming and throughout my work day.

HARO sends PR pitches in batches, which a lot of PR people don’t know.  That means I’ll get no emails for an hour or two, or maybe more, and then BAM!  I’ll get lots of them all of a sudden.

The batch process is frustrating for both of us.  I sometimes wonder whether I’m going to get enough of the right sources for a story, and usually I end up with way too many.  Almost without fail, some of the best pitches come later in the day.  The pitch itself may not be top-notch, but the client is.  Alternatively, the pitch is excellent: it hits all the points, includes interesting information, and all that.  Ultimately, we’re both anguished because I have too many sources.  This current story has 16, which is waaay too many.  It takes a lot of creativity to fit that many people into a story, and a lot of fact-checking.  At some point, I just have to say “no” when I really want to say “yes.”

The good news is, I can say “yes” to something.  That’s explaining the situation to the person and leaving the door open for future pitches.  People seem to appreciate that, which makes it worth the effort.

So, to recap:  Your pitch will have a better chance of succeeding if it’s relevant and timely.

Believe me, I understand why pitches are late:  You’ve been busy with other things, you needed to talk to your client, whatever.  I get it and I don’t fault you for it.

If you have any complaints about HARO, I’d love to hear them.  I’m not out to bash HARO.  I just want to understand what’s driving you nuts on the other side of the system.

Keep up the good work.

Hadoop is Now a General-purpose Platform

As seen in SD Times

Apache Hadoop adoption is accelerating among enterprises and advanced computing environments as the project, related projects, and ecosystem continue to expand. While there were valid reasons to avoid the 1.x versions, skeptics are reconsidering since Hadoop 2 (particularly the latest 2.2.0 version) provides a viable choice for a wider range of users and uses.

“The Hadoop 1.x generation was not easy to deploy or easy to manage,” said Juergen Urbanski, former chief technologist of T-Systems, the IT consulting division of Deutsche Telekom. “The many moving parts that make up a Hadoop cluster were difficult for users to configure. Fortunately, Hadoop 2 fills in many of the gaps. Manageability is a key expectation, particularly for the more critical business use cases.”

Hadoop 2.2.0 adds the YARN resource-management framework to the core set of Hadoop modules, which include the Hadoop Common set of utilities, the Hadoop Distributed File System (HDFS), and Hadoop MapReduce for parallel processing. Other improvements include enhancements to HDFS, binary compatibility for Map/Reduce applications built on Hadoop 1.x, and support for running Hadoop on Windows.

Meanwhile, Hadoop-related projects and commercial products are proliferating along with the ecosystem. Collectively, the new Hadoop capabilities provide a more palatable and workable solution, not only for enterprise developers, business analysts, and IT, but also for a larger community of data scientists.

“There are many technologies that are helping Hadoop realize its potential as being a more general-purpose platform for computing,” said Doug Cutting, co-creator of Hadoop. “We started out as a batch processing system. People used it to do computations on large data sets that they couldn’t do before, and they could do it affordably. Now there’s an ever-increasing amount of data processing that organizations can do using this one platform.”

YARN expands the possibilities
The limitations of Map/Reduce were the genesis of Apache Hadoop NextGen MapReduce (a.k.a. YARN), according to Arun Murthy, release manager for Hadoop 2.

“It was apparent as early as 2008 that Map/Reduce was going to become a limiting factor because it’s just one algorithm,” he said. “If you’re trying to do things like machine learning and modeling, Map/Reduce is not the right algorithm to do it.”

Rather than replacing Map/Reduce altogether, it was supplemented with YARN to provide things like resource management and fault tolerance as base primitives in the platform, while allowing end users to do different things as they process and track the data in different ways.

“The architecture had to be more general-purpose than Map/Reduce,” said Murthy. “We kept the good parts of Map/Reduce, such as scale and simple APIs, but we had to allow other things to coexist on the same platform.”

The original Hadoop MapReduce was based on the Google Map/Reduce paper, while Hadoop HDFS was based on the Google File System paper. HDFS provides a mechanism to store huge amounts of heterogeneous data cheaply; Map/Reduce enables highly efficient parallel processing.

“Map/Reduce is a mature concept that comes from LISP and functional programming,” said Murthy. “Google scaled Map/Reduce out in a massive way while keeping a real simple interface for the end user so the end user does not have to deal with the nitty-gritty details of scheduling, resource management, fault tolerance, network partitions, and other crazy stuff. It allowed the end user to just deal with the business logic.”
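The simple interface Murthy describes boils down to two functions the user supplies: a map that emits key/value pairs and a reduce that combines the values for each key. Here is a toy, single-machine simulation of the idea; a real Hadoop job runs the same two steps across a cluster, with the framework handling the sort/shuffle, scheduling, and fault tolerance.

```python
# Conceptual word count in the Map/Reduce style, simulated in plain Python.
from itertools import groupby
from operator import itemgetter

documents = ["big data is big", "hadoop processes big data"]  # toy input

# Map: emit (key, value) pairs -- here, (word, 1) for every word seen.
pairs = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle/sort: group intermediate pairs by key (Hadoop does this for you).
pairs.sort(key=itemgetter(0))

# Reduce: combine all values for a key -- here, summing the counts.
counts = {word: sum(v for _, v in group)
          for word, group in groupby(pairs, key=itemgetter(0))}

print(counts)  # {'big': 3, 'data': 2, 'hadoop': 1, 'is': 1, 'processes': 1}
```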

Because YARN is an open framework, users are free to use algorithms other than Map/Reduce. In addition, applications can run on and integrate with it.

“The scientific and security computing communities depend on Open MPI technologies, which weren’t even an option in Hadoop 1,” said Edmon Begoli, CTO of analytics consulting firm PYA Analytics. “The architecture of Hadoop 2 and YARN allows you to plug in your own resource manager and your own parallel processing algorithms. People in the high-performance computing community have been talking about YARN enthusiastically for years.”

HDFS: Aspirin for other headaches
Some CIOs have been reluctant to bring Hadoop into the enterprise because there have been too many barriers to entry, although Hadoop 2 improvements are turning the tide.

“I think two of the deal breakers were NameNode federation and the Quorum Journal Manager, which is basically a failover for the HDFS NameNode,” said Jonathan Ellis, project chair for Apache Cassandra. “Historically, if your NameNode went down, you were basically screwed because you’d lose some amount of data.”

Hadoop 2 introduces the Quorum Journal Manager, where changes to the NameNodes are recorded to replicated machines to avoid data loss, he said. NameNode federation allows a pool of NameNodes to share responsibility for an HDFS cluster.

“NameNode federation is a bit of a hack because each NameNode still only knows about the file set it owns, so at the client level you have to somehow teach the client to look for some files on one NameNode and other files on another NameNode,” said Ellis.

HDFS is nevertheless an economically feasible way to store terabytes or even petabytes of data. Facebook has a single cluster that stores more than 100PB on Hadoop, according to Murthy.

“It’s amazing how much data you can store on Hadoop,” he said. “But you have to interact with the data, interrogate it, and come up with insights. That’s where YARN comes in. Now you have a general-purpose data operating system, and on top of it you can run applications like Apache Storm.”

John Haddad, senior director of product marketing at Informatica, said the Hadoop 2 improvements allow his organization to run more types of applications and workloads.

“Various teams can run a variety of different applications on the cluster concurrently,” he said. “Hadoop 1 lacked some of the security, high availability and flexibility necessary to have different applications, different types of workloads, and more than one organization or team submitting jobs to the cluster.”

Gearing up for prime time
The number and types of Hadoop open-source projects and commercial offerings are expanding rapidly. Hadoop-related projects include HBase, a highly scalable distributed database; the Hive data warehouse infrastructure; the Pig language and framework for parallel computing; and Ambari, which provisions, manages and monitors Apache Hadoop clusters.

“It seems like we’ve got 20 or 30 new projects every week,” said Cutting. “We have all these separate, independent projects that work together, so they’re interdependent but under separate control so the ecosystem can evolve.”

Meanwhile, solution providers are building products for or integrating their products with Hadoop. Collectively, Hadoop improvements, open-source projects and compatible commercial products are allowing organizations to tailor it to their needs, rather than having to shoehorn what they are doing into a limited set of capabilities. And the results are impressive.

For example, Oak Ridge National Laboratory used Hadoop to help the Centers for Medicare and Medicaid Services identify tens of millions of dollars in overpayments and fraudulent transactions in just three weeks.

“Using only two or three engineers, we were able to approach and understand the data from different angles using Hive on Hadoop because it allowed us to write SQL-like queries and use a machine-learning library or run straight Map/Reduce queries,” said PYA Analytics’ Begoli. “In the traditional warehousing world, the same project would have taken months unless you had a very expensive data warehouse platform and very expensive technology consulting resources to help you.”
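As a rough sketch of what that looks like in practice, the example below issues a HiveQL aggregation from Python. It assumes the PyHive client library and a reachable HiveServer2 instance; the host, table, and columns are hypothetical.

```python
# Sketch: running a SQL-like Hive query from Python.
# Assumes PyHive is installed and HiveServer2 is reachable;
# the host, table, and columns are hypothetical.
from pyhive import hive

conn = hive.connect(host="hive.example.com", port=10000, username="analyst")
cursor = conn.cursor()

# A familiar-looking aggregation that Hive turns into distributed jobs.
cursor.execute("""
    SELECT provider_id, SUM(paid_amount) AS total_paid
    FROM claims
    GROUP BY provider_id
    ORDER BY total_paid DESC
    LIMIT 10
""")

for provider_id, total_paid in cursor.fetchall():
    print(provider_id, total_paid)
```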

The groundswell of innovation is enabling Hadoop to move beyond its batch-processing roots to include real-time and near-real-time analytics.

 


Skeptics are doing a double take
Hadoop 2 is converting more skeptics than Hadoop 1 because it’s more mature, it’s easier (but not necessarily easy) to implement, it has more options, and its community is robust.

“You can bring Hadoop into your organization and not worry about vendor lock-in or what happens if the provider disappears,” said Murthy. “We have contributions from about 2,000 people at this point.”

There are also significant competitive pressures at work. Organizations that have adopted Hadoop are improving the effectiveness of things like fraud detection, portfolio management, ad targeting, search, and customer behavior by combining structured and unstructured data from internal and external sources that commonly include social networks, mobile devices and sensors.

“We’re seeing organizations start off with basic things like data warehouse optimization, and then move on to other cool and interesting things that can drive more revenue from the company,” said Informatica’s Haddad.

For example, Yahoo has been deploying YARN in production for a year, and the throughput of the YARN clusters has more than doubled. According to Murthy, Yahoo’s 35,000-node cluster now processes 130 to 150 jobs per day versus 50 to 60 before YARN.

“When you’ve got 2x over 35,000 to 40,000 nodes, that’s phenomenal,” he said. “It’s a pretty compelling story to tell a CIO that if you just upgrade your software from Hadoop 1 to Hadoop 2, you’ll see 2x throughput improvements in your jobs.”

Of course, Hadoop 2.2.0 isn’t perfect. Nothing is. And some question what Hadoop will become as it continues to evolve.

Hadoop co-creator Cutting said the beauty of Hadoop as an open-source project is that new things can replace old things naturally. That prospect somewhat concerns PYA Analytics’ Begoli, however.

“I’m concerned about the explosion of frameworks because it happened with Java and it’s happening with JavaScript,” he said. “When everyone is contributing something, it can be too much or the original vision can be diluted. On the other hand, a lot of brilliant teams are contributing to Hadoop. There are management tools, SQL tools, third-party tools and a lot of other things that are being incubated to deliver advanced capabilities.”

While Hadoop’s full impact has yet to be realized, Hadoop 2 is a major step forward.

Well-known Hadoop implementations

Amazon Web Services: Amazon Elastic MapReduce uses Hadoop to provide a quick, easy and cost-effective way to distribute and process large amounts of data across a resizable cluster of Amazon EC2 instances. It can be used to analyze click-stream data, process vast amounts of genomic data and other large scientific data sets, and process logs generated by Web and mobile applications.
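As a rough illustration of how such a cluster is launched programmatically, the sketch below uses the AWS SDK for Java (v1) to start a small Elastic MapReduce cluster running Hadoop and Hive. The cluster name, release label, instance types, and IAM role names are placeholders, and the default EMR roles are assumed to already exist in the account.

```java
// Illustrative only: launch a small EMR cluster with Hadoop and Hive installed.
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.Application;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;

public class LaunchEmrCluster {
    public static void main(String[] args) {
        AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.defaultClient();

        RunJobFlowRequest request = new RunJobFlowRequest()
                .withName("log-analysis")                    // hypothetical cluster name
                .withReleaseLabel("emr-5.36.0")              // placeholder EMR release
                .withApplications(new Application().withName("Hadoop"),
                                  new Application().withName("Hive"))
                .withServiceRole("EMR_DefaultRole")          // assumes default roles exist
                .withJobFlowRole("EMR_EC2_DefaultRole")
                .withInstances(new JobFlowInstancesConfig()
                        .withInstanceCount(3)
                        .withMasterInstanceType("m5.xlarge")
                        .withSlaveInstanceType("m5.xlarge")
                        .withKeepJobFlowAliveWhenNoSteps(true));

        RunJobFlowResult result = emr.runJobFlow(request);
        System.out.println("Started cluster: " + result.getJobFlowId());
    }
}
```

Because the cluster is resizable and billed by the hour, this pattern is commonly used to spin up capacity for a log-processing or genomics job and tear it down afterward.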


Six Ways to Master the Data-Driven Enterprise

As seen in InformationWeek.

Big data is changing the way companies and industries operate. Although virtually all businesses acknowledge the trend, not all of them are equally prepared to meet the challenge. The companies in the best position to compete have transformed themselves into “data-driven” organizations.

Data-driven organizations routinely use data to inform strategy and decision-making. Other businesses share the same goal, but many are still struggling to build the necessary technological capabilities, are hampered by cultures that interfere with their ability to use data, or both.

Becoming a data-driven organization isn’t easy. While all organizations have a glut of data, their abilities to collect, cleanse, integrate, manage, access, secure, govern, and analyze it vary significantly from company to company. Each of those capabilities helps ensure that data can be used with greater confidence, yet it’s difficult for a business to realize the value of its data if its corporate culture lags behind its technological capabilities.

Data-driven organizations have extended the use of data across everyday business functions, from the C-suite to the front lines. Rather than hoping that executives, managers, and employees will use business intelligence (BI) and other analytical tools, companies that are serious about the use of data are training employees, making the systems easier to use, making it mandatory to use the systems, and monitoring the use of the systems. Because their ability to compete effectively depends on their ability to leverage data, such data-driven organizations make a point of aligning their values, goals, and strategies with their ability to execute.

Below, we reveal the six traits common to data-driven organizations that make them stand out from their competitors.

Forward Thinkers

Data-driven enterprises consider where they are, where they want to go, and how they want to get there. To ensure progress, they establish KPIs to monitor the success of business operations, departments, projects, employees, and initiatives. Quite often, these organizations have also established one or more cross-functional committees of decision-makers who collectively ensure that business goals, company practices, and technology implementations are in sync.

“The companies that have integrated data into their business strategies see it as a means of growing their businesses. They use it to differentiate themselves by providing customers with better service, quicker turnaround, and other things that the competition can’t meet,” said Ken Gilbert, director of business analytics at the University of Tennessee’s Office of Research and Economic Development, in an interview with InformationWeek. “They’re focused on the long-term and big-picture objectives, rather than tactical objectives.”

Uncovering Opportunities

Enterprises have been embracing BI and big data analytics with the goal of making better decisions faster. While that goal remains important to data-driven enterprises, they also are trying to uncover risks and opportunities that may not have been discoverable previously, either because they didn’t know what questions to ask or because previously used technology lacked the capability.

According to Gartner research VP Frank Buytendijk, fewer than half of big data projects focus on direct decision-making. Other objectives include marketing and sales growth, operational and financial performance improvement, risk and compliance management, new product and service innovation, and direct or indirect data monetization.

Hypothesis Trumps Assumption

People have been querying databases for decades to get answers to known questions. The shortcoming of that approach is the assumption that the question being asked is the best question to ask.

Data-driven businesses aim to continuously improve the quality of the questions they ask. Some of them also try to discover, through machine learning or other means, what questions they should be asking that they have not yet asked.

The desire to explore data is also reflected in the high demand for interactive self-service capabilities that enable users to adjust their thinking and their approaches in an iterative fashion.

Pervasive Analytics

Data analytics long ago transformed the way marketing departments operate, and more departments than ever are now using BI and other forms of analytics to improve business process efficiencies, reduce costs, improve operational performance, and increase customer satisfaction. A person’s role in the company influences how the data is used.

Big data and analytics are now on the agendas of boards of directors, which means that executives not only have to accept and support the use of the technologies, they also have to use them — meaning they have to lead by example. Aberdeen’s 2014 Business Analytics survey indicated that data-driven organizations are 63% more likely than the average organization to have “strong” or “highly pervasive” adoption of advanced analytical capabilities among corporate management.

Failure Is Acceptable

Some companies encourage employees to experiment because they want to fuel innovation. With experimentation comes some level of failure, which progressive companies are willing to accept within a given range.

Encouraging exploration and accepting the risk of failure that accompanies it can be difficult cultural adjustments, since failure is generally considered the opposite of success. Many organizations have made significant investments in big data, analytics, and BI solutions. Yet, some hesitate to encourage data experimentation among those who are not data scientists or business analysts. This is often because, historically, the company’s culture has encouraged conformity rather than original thinking. Such a mindset not only discourages innovation, it fails to acknowledge that the failure to take risks may be more dangerous than risking failure.

Data Scientists And Machine Learning

Data-driven companies often hire data scientists and use machine learning so they can continuously improve their ability to compete. Microsoft, IBM, Accenture, Google, and Amazon ranked first through fifth, respectively, in a recent list of 7,500 companies hiring data scientists. Google, Netflix, Amazon, Pandora, and PayPal are a few examples of companies using machine learning with the goal of developing deeper, longer-lasting, and more profitable relationships with their customers than previously possible.
