Utilizing big data has many benefits, including reducing costs, improving productivity, and informing key decisions. Studies show that Fortune 1000 companies can gain more than $65 million in additional net income by increasing their data accessibility by just 10%.

As the world becomes increasingly digitised, large companies accumulate ever-larger data pools and face growing difficulty in using big data effectively. The challenge is to extract the key information hidden in the data through accurate analysis – something like looking for a needle in a haystack. Intimidated? Well, you don’t have to be. We will be guiding you through how we used the Hadoop ecosystem to address a specific big data problem.

The problem: Our customers who use portfolio builders create their own financial portfolios using stock data. This stock data is updated daily via an external API. At the beginning of the project, there was no problem as our data size was relatively manageable. However, once we added mutual funds and ETFs, the data volume grew substantially. As a result, performance in the PostgreSQL database noticeably degraded. Thus, we decided to try big data tools to remedy this problem.

For us, using big data as a solution was broken down into three parts. First of all, we chose the Hadoop Distributed File System (HDFS) as our data storage. Secondly, we used Sqoop to transfer the data from PostgreSQL to HDFS. Once the data was ready, we experimented with queries using Hive and HBase.

First step: Solving the Storage Problem

We needed a storage infrastructure designed specifically to store, manage and retrieve massive amounts of data or big data. These big data storage infrastructures enable the storing and sorting of data so that it is easily accessed, used, and processed by applications and services.

HDFS: In Hadoop applications, HDFS is the main data storage system: a distributed file system that provides high-throughput access to application data. It is part of the big data environment and offers a way to handle vast quantities of structured and unstructured data. To handle the load, HDFS distributes massive data sets across clusters of low-cost commodity machines, allowing processing frameworks to run close to the data. One thing to bear in mind is that HDFS is not suitable for real-time processing. If you have such a need, the final topic of this article on the HBase database will be of help to you.

We have two tips for using the HDFS system. First of all, spend time understanding the system and become familiar with the data. Following this, it is essential to understand what your company needs and expects from the operation. Once these two check boxes have been ticked, the only thing left is to prepare the necessary environments and move the data to HDFS. Companies usually undergo this shift when they are running batch processing. 
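Once those boxes are ticked, moving data into HDFS comes down to a few `hdfs dfs` commands. The sketch below assumes a configured single-node Hadoop installation; the directory and file names are illustrative, not the ones from our project:

```shell
# Create a target directory in HDFS (paths and file names are illustrative)
hdfs dfs -mkdir -p /data/stocks

# Copy a local file into HDFS
hdfs dfs -put /tmp/stock_prices.csv /data/stocks/

# Confirm the file landed in HDFS
hdfs dfs -ls /data/stocks
```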

The screen below illustrates a single-node cluster configuration for the NameNode and DataNode storage directories. YARN is a major component of Hadoop and allows the data stored in HDFS to be processed by various applications. As all processes should be tested to make sure they work, we ran the YARN and HDFS systems separately on the platform. Below is an illustration of the process.
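On a typical single-node installation, HDFS and YARN can be brought up separately roughly as follows. This is a sketch only: it assumes `HADOOP_HOME` is set and that the start-up scripts live in the usual location, which varies slightly by Hadoop version:

```shell
# One-time setup: format the NameNode metadata directory
hdfs namenode -format

# Start the HDFS daemons (NameNode, DataNode, SecondaryNameNode)
$HADOOP_HOME/sbin/start-dfs.sh

# Start the YARN daemons (ResourceManager, NodeManager) separately
$HADOOP_HOME/sbin/start-yarn.sh

# List the running Java daemons to confirm everything is up
jps
```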

Next step: Data Ingestion into New Environment

The next step is to transfer the data to the Hadoop data lake. These transfers can be made in real-time or in batches. 

Sqoop: When you are ready to conduct data analysis, Sqoop helps you transfer the data to the Hadoop environment. Sqoop is an open-source tool that allows you to ingest data from many different databases into HDFS. It can also export data from HDFS back into an external database such as Oracle or MSSQL.

Many companies use a Relational Database Management System (RDBMS) for daily transactions such as customer movements. This is a sample Sqoop script that we used to transfer over 75 million records from PostgreSQL to HDFS. The script can be tailored to your company’s needs and reused for different analyses by transferring newly incoming stock data from any RDBMS database to the Hadoop environment.

You can use the code block below to transfer your local system data to the Hadoop environment.
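A Sqoop import of this kind generally takes the shape sketched below. The connection string, credentials, table name, and target directory are placeholders, not the values from our project, so substitute your own before running:

```shell
# Import a PostgreSQL table into HDFS in parallel.
# --split-by picks the column used to divide the work across mappers;
# --password-file keeps credentials out of the shell history.
sqoop import \
  --connect jdbc:postgresql://localhost:5432/stocks_db \
  --username stocks_user \
  --password-file /user/hadoop/.pg_password \
  --table stock_prices \
  --target-dir /data/stocks/stock_prices \
  --split-by id \
  --num-mappers 4 \
  --fields-terminated-by ','
```

With four mappers, Sqoop writes four part-files under the target directory; tune `--num-mappers` to the parallelism your source database can tolerate.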

Final Step: Performance Comparison 

Our existing structure is built on the PostgreSQL database, and we detail here our experiences during some trials. While we did not utilise complex queries, there were still delays ranging from 2-3 seconds up to 7-10 seconds.

Hive: Hive provides easy, familiar batch processing for Apache Hadoop and lets you apply existing Structured Query Language (SQL) skills to batch queries on data stored in Hadoop. Queries are written in HiveQL, a SQL-like language, and executed via MapReduce or Apache Spark. This makes it easy for more users to process and analyze vast quantities of data, making Hive most useful for data preparation, ETL, and data mining.

Hive enables companies that keep their data files in HDFS to query them with SQL. We can leverage Hive to work with Hadoop data lakes and connect them to BI tools (such as Oracle BI or Tableau) for visibility.

Here are the steps you need to take to use Hive after uploading the files to HDFS. First of all, you need to create a table. Following this, you point the table at the file location on HDFS. The images below illustrate these two steps.
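In HiveQL, both steps can be done in a single statement by creating an external table whose `LOCATION` is the HDFS directory holding the files. The database, column, and path names below are illustrative:

```shell
# Create a Hive table over files already sitting in HDFS.
# EXTERNAL means Hive reads the files in place without taking ownership,
# so dropping the table later leaves the underlying data intact.
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS stock_prices (
  id BIGINT,
  ticker STRING,
  trade_date STRING,
  close_price DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/stocks/stock_prices';
"
```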

After the table has been connected, we can easily filter and pre-process our file on HDFS by accessing it via Hive.
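Assuming a Hive table has been defined over the HDFS files (here a hypothetical `stock_prices` table), filtering and pre-processing is then ordinary SQL:

```shell
# Filter the HDFS-backed data with plain SQL via Hive;
# behind the scenes this compiles to a MapReduce (or Spark) job
hive -e "
SELECT ticker, trade_date, close_price
FROM stock_prices
WHERE close_price > 100
ORDER BY trade_date DESC
LIMIT 10;
"
```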

HBase: Apache HBase is a non-relational, column-oriented database management system that runs on top of HDFS and supports jobs via MapReduce. Being column-oriented means that data for each column family is stored contiguously on disk. An HBase column represents an attribute of an object; if the table stores diagnostic logs from servers in your environment, each row might be a log record, and a typical column might be the timestamp of when the log record was written, or the name of the server from which the record originated. HBase also supports access from high-level languages for data processing. HBase is suitable for your current process if you don’t need a relational database and require fast, random access to data.


As mentioned before, HBase does not ingest raw files directly. Hence, we need to read the files stored on HDFS and load them into HBase. You can refer to the sample code block we have used below to initiate the transfer. Don’t forget to create a table in HBase before doing so.
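One common way to do this is HBase's bundled ImportTsv MapReduce job. The table name, column family, and column mapping below are illustrative placeholders; the first field of each line becomes the row key:

```shell
# Create the HBase table first, with a single column family 'cf'
hbase shell <<'EOF'
create 'stock_prices', 'cf'
EOF

# Bulk-load the comma-separated files from HDFS into the table;
# -Dimporttsv.columns maps each field to a row key or column
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  '-Dimporttsv.separator=,' \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:ticker,cf:trade_date,cf:close_price \
  stock_prices /data/stocks/stock_prices
```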

In HBase, there are no data types; data is stored as byte arrays in the HBase table cells. Values written to the same cell are distinguished by their timestamps, which means every cell in an HBase table can contain multiple versions of the data. In the picture below, you can see how HBase has stored our data: each value is addressed by a row key, a column, and a timestamp, and the rows are sorted by row key.
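This versioning behaviour is easy to see from the HBase shell. The table, row key, and column names below are illustrative:

```shell
hbase shell <<'EOF'
# Keep up to 3 timestamped versions of every cell in family 'cf'
create 'prices', {NAME => 'cf', VERSIONS => 3}

# Two writes to the same cell become two versions, distinguished by timestamp
put 'prices', 'AAPL', 'cf:close', '148.2'
put 'prices', 'AAPL', 'cf:close', '151.7'

# Fetch both stored versions, newest first
get 'prices', 'AAPL', {COLUMN => 'cf:close', VERSIONS => 3}
EOF
```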

Results: When we analyzed the historical data, Hive gave us faster performance. However, when users wanted to see the stock data they were filtering instantly, PostgreSQL was faster. Hive loses a lot of time preparing to run MapReduce jobs, so we used it only for historical batch analysis. Thus, it is not suitable for Online Transaction Processing (OLTP).

When we tested HBase performance against PostgreSQL, we saw some improvement, but it fell short of our expectations. When processing a small amount of data, only a single node is utilized while all other nodes sit idle; petabytes of data must be stored in this distributed environment to use HBase effectively. Since we do not have such a large data pool and prefer a standard SQL structure, we chose not to proceed with HBase.


In this walkthrough, we have illustrated how Bambu utilized big data tools to solve a problem we were facing. We hope that this demystifies your impression of big data tools and has given you insight into effectively deploying them. 

We have also shown that there is more than one data processing tool in the Hadoop environment. To determine which tool to use, you need to first look at your data and focus on your problem. When the appropriate big data tool is chosen, data processing is made much more accessible. 

Even though many industries have embraced digital platforms, we see that the wealth management industry is still hesitant to undergo a digital transformation. Forbes has reported on a study that found that only 16% of US and Canadian banks employ fully digital verification tools for their customers to open an account online securely. When considering how the wealth management landscape is changing, this hesitancy in adopting digital platforms is highly concerning. In recent years, more regulations have been imposed on wealth managers, giving them less freedom and time to advise clients. Furthermore, clients are becoming increasingly used to digital interaction and expect wealth managers to provide such platforms and opportunities. These conditions result in a growing dissatisfaction amongst clients, which necessitates change. Why is digital transformation in the wealth management industry occurring at such a slow pace? Let’s look into the challenges that cause firms to hesitate.


A significant reason for this hesitation comes from the daunting task of cultural transformation. Yann Charraie, Managing Director of One Wealth Place, shares on episode 28 of our podcast how company size can be a considerable factor influencing the adoption of digital technology. Yann believes that digital transformation cannot take place without a cultural shift within the company. For large and established financial institutions, their successful legacy and entrenched methods make them resistant to change. Furthermore, due to the sheer number of subsidiaries large institutions have, it can be challenging to implement a cultural shift across the entire company. This cultural transformation, and getting employees on the same page, is thus a daunting task for larger companies, slowing down the pace of digital adoption. Cornerstone has released a report supporting this, highlighting that bank executives do not have a homogenous understanding of digital transformation. Therefore, there are often misconceptions regarding how far along their institutions are in implementing digital solutions. These different levels of understanding result in friction within the company and further slow down the adoption of digital platforms.


Beyond the challenges of cultural transformation, the mindset of financial institutions has slowed down the adoption of digital platforms. Debbie Watkins, CEO and co-founder of Lucy, shares on episode 19 of WealthTech Unwrapped how large financial institutions are reluctant to understand their customers’ challenges. This causes them to be stuck in their ways, relying on archaic practices even though their customers seek alternative ways to manage their wealth.


Within this terrain of friction and hesitation, financial institutions can alleviate much of this by partnering up with Fintech firms. While digital transformation is intimidating and challenging, Fintech firms can assist with onboarding the digital platforms, freeing wealth managers up to help their clients. However, some misconceptions about Fintech firms within the industry are dissuading this mutually beneficial partnership. 


Misconception 1: Fintechs only work with loans and transactions 

A common misconception about Fintechs is that they only work within the narrow fields of lending and payment. This is far from the truth as Fintechs work in other areas such as investment planning and insurance. Furthermore, Fintechs offer an array of different products and services which can help value-add the operations of financial institutions. 


To provide an example, Bambu has partnered with Vestwell, and by leveraging our wealth management APIs, Vestwell can offer personalised investment strategies. This helps their clients better prepare for retirement based on actionable retirement goals that they can work towards. Read more on this partnership here.


Misconception 2: Fintechs only influence large markets 

Fintechs have been accused of only targeting large markets such as the US, Europe, or China. While the Fintech scene in these regions is booming, this does not mean that Fintech has no influence elsewhere. On the contrary, Fintech is everywhere and has embedded itself in every aspect of our lives.


On an episode of our WealthTech Unwrapped podcast, Oscar Decotelli, CEO of DXA Invest, shares how he is trying to change the negative perception of South America by enabling his customers to invest in South American companies. Through DXA’s digital platform, everyone can participate and invest in these companies regardless of their level of wealth. This is but one example of the influence that Fintechs can have in markets all over the world.


Digital Transformation Made Simple

In addressing these misconceptions, we hope to have shed some light on Fintech as an industry and put some concerns to rest. Collaborating with Fintechs can alleviate many pain points that wealth management firms have when implementing digital technology. When you partner with Bambu, you can leave the tech to us. With numerous projects completed and many satisfied clients, we’ve shown that digital transformation doesn’t have to be that complicated. Contact us at sales@bambu.co to find out how we can help you embark on your digital transformation journey. 

Robo-advisors are very much the new kids on the block in the realm of wealth management. CNBC reports how analysts have predicted Robo-advisory to grow into a $1.2 trillion industry by 2024. With many eyes turned towards Robo-advisory, concerns have been raised about the dwindling role of human advisors. Historically, offering financial advice has been left to human advisors. However, discussions around the rise of Robo-advisors and how they might one day replace human advice have resulted in the bifurcation of these two modes of advisory. We believe that this is a false binary: rather than replacing humans, technology is here to enhance them. Hybrid models are the most common model deployed to optimise the quality of advisory services. These models utilise a combination of human and digital capabilities, highlighting how both modes of advisory can be used to support one another. According to research done by Accenture, there is also user demand, as clients prefer using hybrid models to manage their finances. Let us dive into why this is the case and how exactly hybrid models operate.


Hybrid Models – Why you should use them

Today’s hybrid models are characterised by a digital platform used by the client, alongside a human advisor who provides the necessary support and information. Under this model, clients will reach out to their financial advisors for support when facing any difficult financial decisions. Tobias Henry writes in “The WealthTech Book” about how in its most basic form, the hybrid model combines the prime components of human-based advice with digital advice. This harmony offers a flexible and tailored wealth management solution to clients of all demographics. The hybrid model is also highly beneficial for financial advisors as the digital component of this approach increases the advisor’s scalability. With digital technology taking care of the laborious and time-consuming backend work, the financial advisor is now able to attract and serve more clients while maintaining high-quality service.

In an episode of WealthTech Unwrapped, Sam Beeby shares how hybrid models are imperative in this digital age. Sam notes that standalone Robo-advisors are not yet able to offer holistic lifetime advice. As a result, a large proportion of Robo-advisory users are those confident enough to manage their own finances. April Rudin, Founder and President of the Rudin Group, supports this when she shares how ultra-high net worth boomers have the highest rate of adoption of digital technology. This is because they are mobile, global, and have sophisticated portfolios. Indeed, not every investor is like this aforementioned demographic, confident enough to manage their own finances. Thus, organisations should be focusing on providing this hybrid model, making financial planning more accessible to the masses.

Sam also adds that a hybrid model is best equipped to build a user’s trust in technology. Since we have yet to arrive at a stage where everyone is comfortable with fully trusting Robo-advisory, the presence of the human component is critical. Chuin Ting, CEO of MoneyOwl, pushes this point further by sharing the ethics of technology on our podcast. Ultimately, technology is designed by us and is influenced by human biases, good or bad. As a result, technology itself has specific trust attributes that need to be navigated by both managers and clients. To foster a trusting relationship with technology, the human element in a hybrid model is crucial.


Partnerships – Moving Forward

Ultimately, hybrid models bring together the strengths of digital and human advisory while compensating for each other's weaknesses. Rather than viewing the two forms of advisory in silos, the benefits illustrated here highlight how they should be used in tandem.

Are you looking to create your own hybrid Robo-advisor platform? With years of experience under our belt, we at Bambu are well equipped to service all of your Robo-advisory needs. Contact us at sales@bambu.co to learn more about how we can help seamlessly integrate Robo-advisory solutions and present your clientele with a fluid hybrid experience.