Bringing Data Replication to Hadoop

One of the more challenging things about Hadoop from a solution provider perspective is figuring out how to make money on an open source platform that goes beyond merely selling the cluster that the big data framework is running on.

Michael Vizard

September 9, 2015


With that goal in mind, WANdisco has begun offering customers free access to its replication software for Hadoop environments for up to 14 days.

James Campigli, co-founder and chief product officer for WANdisco, said deploying Hadoop is one thing, but trying to manage multiple instances of Hadoop in production environments is quite another. As such, it’s only a matter of time before organizations need to start finding efficient ways to move data between one instance of Hadoop and another, he said.

Much of that demand for data replication between Hadoop environments will be driven by different use cases for Hadoop. In some cases, Hadoop is being adopted as a central data lake from which all applications eventually will drink. In those instances, Hadoop is essentially providing applications with access to a universal file system. However, there also are many scenarios where Hadoop is being used as the foundation for modern data warehouse applications that run directly on top of it. As more of those applications get distributed inside and out of the cloud, Campigli said there will be a greater need to replicate data between instances of Hadoop using WANdisco Fusion Enterprise Edition for Hadoop.

Designed to make it possible to replicate data between Hadoop clusters over a wide area network (WAN), WANdisco Fusion Enterprise Edition ensures that all Hadoop servers and clusters are fully readable and writable, stay in sync, and recover automatically from one another after planned or unplanned downtime. There are no read-only backup servers or clusters sitting idle until the primary active cluster goes offline.
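That active-active model can be sketched with a toy example (purely illustrative, not WANdisco Fusion's actual protocol): every cluster accepts writes, forwards them to its online peers, and a cluster returning from downtime replays the writes it missed from a surviving peer's log.

```python
# Toy model of active-active replication between two clusters.
# Illustrative only; the Cluster class and its methods are hypothetical.

class Cluster:
    def __init__(self, name):
        self.name = name
        self.data = {}     # replicated key/value state
        self.online = True
        self.peers = []    # other clusters in the replication group
        self.log = []      # ordered write log, used for catch-up

    def write(self, key, value):
        """Accept a local write and propagate it to every online peer."""
        self._apply(key, value)
        for peer in self.peers:
            if peer.online:
                peer._apply(key, value)

    def _apply(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def recover(self):
        """Rejoin the group and replay the writes missed while offline."""
        self.online = True
        source = next(p for p in self.peers if p.online)
        for key, value in source.log[len(self.log):]:
            self._apply(key, value)


# Two fully writable clusters, kept in sync in both directions.
east, west = Cluster("east"), Cluster("west")
east.peers, west.peers = [west], [east]

east.write("a", 1)    # write on east, replicated to west
west.write("b", 2)    # write on west, replicated to east

west.online = False   # unplanned downtime on west
east.write("c", 3)    # east keeps accepting writes
west.recover()        # west rejoins and catches up automatically
```

The point of the sketch is the contrast with active-passive backup: both clusters take writes the whole time, and recovery is a log replay rather than a failover to a read-only standby.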

Since launching WANdisco Fusion Enterprise Edition for Hadoop, WANdisco has been laying the groundwork for channel partners looking to build a big data practice around WANdisco software that is sold on a subscription basis, Campigli said.

The biggest issue potential WANdisco partners might face, however, is that most IT organizations today assume that replicating anything involving big data simply isn't feasible—a perception that WANdisco is actively trying to counter by making the software available for free for two weeks. Whether that's enough time to make a believer out of an IT organization remains to be seen. But from the perspective of at least getting the customer to listen to what might be possible across distributed Hadoop environments, it's certainly not a bad place to start.


About the Author

Michael Vizard

Michael Vizard is a seasoned IT journalist, with nearly 30 years of experience writing and editing about enterprise IT issues. He is a contributor to publications including Programmableweb, IT Business Edge, CIOinsight and UBM Tech. He formerly was editorial director for Ziff-Davis Enterprise, where he launched the company’s custom content division, and has also served as editor in chief for CRN and InfoWorld. He also has held editorial positions at PC Week, Computerworld and Digital Review.
