Understand the Difference Between a Data Engineer, a Data Analyst and a Data Scientist in order to Choose the Right Course to Study in Malaysia
For more information contact 01111408838
Speaking at last week’s Women of Silicon Roundabout conference in London, Dr. Rebecca Pope, the head of data science and engineering at KPMG, said you don’t need to be an excellent statistician or a high-class mathematician to work in big data. Nor do you need a lot of prior programming knowledge.
However, you do need an interest in statistics, you do need to be willing to learn how to code, and you do need to know how to do some high-level mathematical operations.
Choosing the Right Course to Become a Data Engineer, a Data Analyst or a Data Scientist in Malaysia
Pope herself didn’t study pure statistics (she’s a neuroscientist). Nor did she study programming. Instead, she learned how to program after graduating, and she attended “endless hackathons.”
“I started learning R. But my advice would be that if you are launching a career in data science you should specialize in Python – make Python the first language you learn,” said Pope.
Data scientists are not just statisticians, said Pope. “A statistician is interested in building a model that builds a relationship between a variable and an outcome.” A data scientist wants to do something more: predict. Data scientists train models on data so that models can predict the future as accurately as possible.
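To make the "train, then predict" idea concrete, here is a minimal sketch in Python, the language Pope recommends learning first. It is not taken from her talk: the data points, the function names and the choice of a simple least-squares line are all assumptions made for illustration. The model is fitted on one set of observations and then judged on how well it predicts observations it has never seen.

```python
# Illustrative sketch only: the numbers below are invented, and a
# least-squares line stands in for whatever model a real project would use.

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


def mean_absolute_error(model, xs, ys):
    """Average size of the prediction error on a set of observations."""
    return sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)


# Training data: what the model learns from (hypothetical values).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

# Held-out data: used only to check how well the model predicts.
test_x = [5.0, 6.0]
test_y = [11.0, 13.0]

model = fit_line(train_x, train_y)
error = mean_absolute_error(model, test_x, test_y)
print("slope and intercept:", model)
print("mean absolute error on unseen data:", error)
```

The point of the split is that the model's quality is measured on the held-out data, not on the data it was fitted to.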
Big data jobs come in stages. First, a business use has to be established and the raw data has to be made fit for purpose (so-called ‘data wrangling’). Then the algorithms that analyze the data are written and tested on the data available, and, if they’re machine learning algorithms, they learn from the data in order to predict the future. Finally, visualisations and APIs have to be created so that the business can engage with the resulting product.
Different sorts of data professionals are engaged at different stages. Or, you can be a generalist data scientist operating across the spectrum.
Data Engineer Vs Data Analyst Vs Data Scientist
- Data Scientist
- Data scientists are big data wranglers. They take an enormous mass of messy data points (unstructured and structured) and use their formidable skills in math, statistics and programming to clean, massage and organize them. Then they apply all their analytic powers — industry knowledge, contextual understanding, skepticism of existing assumptions — to uncover hidden solutions to business challenges.
- Data Analyst
- Data analysts collect, process and perform statistical analyses of data. Their skills may not be as advanced as those of data scientists (e.g. they may not be able to create new algorithms), but their goals are the same: to discover how data can be used to answer questions and solve problems.
- Data Engineer
- Data engineers build massive reservoirs for big data. They develop, construct, test and maintain architectures such as databases and large-scale data processing systems. Once continuous pipelines are installed to — and from — these huge “pools” of filtered information, data scientists can pull relevant data sets for their analyses.
What does a data engineer do?
Pope put together the following chart showing the skills data engineers need and the tasks they perform. In short, the job involves a lot of software engineering and data preparation.
The data engineer’s job is “the representation and movement of data so that it is consumable and usable,” said Pope. If you’re a data engineer you need to take the raw data, clean it, move it into a database, tag it, and generally make sure it’s ready for the next stage of the process…
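The clean, move and tag steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the raw records, the `purchases` table, the `size_tag` column and the threshold of 100 are all invented, and an in-memory SQLite database stands in for whatever database a real pipeline would load into.

```python
import sqlite3

# Hypothetical raw input: messy strings, some with missing values.
RAW_ROWS = [
    {"customer": " Alice ", "amount": "120.50"},
    {"customer": "bob", "amount": ""},       # missing amount
    {"customer": "Carol", "amount": "80"},
    {"customer": "", "amount": "15.00"},     # missing customer name
]


def clean(rows):
    """Trim whitespace, normalise names, and drop incomplete records."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip().title()
        amount_text = row["amount"].strip()
        if not name or not amount_text:
            continue  # skip records that cannot be used downstream
        cleaned.append((name, float(amount_text)))
    return cleaned


def tag(amount):
    """Attach a simple label so later stages can filter easily."""
    return "large" if amount >= 100 else "small"


def load(rows):
    """Move the cleaned, tagged rows into a database table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (customer TEXT, amount REAL, size_tag TEXT)"
    )
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?, ?)",
        [(name, amount, tag(amount)) for name, amount in rows],
    )
    conn.commit()
    return conn


conn = load(clean(RAW_ROWS))
stored = conn.execute(
    "SELECT customer, amount, size_tag FROM purchases ORDER BY customer"
).fetchall()
print(stored)
```

Once the rows are in the table, an analyst or data scientist can query them without having to deal with the original mess.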
Pope said the programming languages and platforms you’ll need for data engineering jobs are:
- Apache Spark
- Kubernetes
- NiFi
What does a data analyst do?
After the data engineer comes the data analyst. The chart below shows where data analysts operate. Their work centres on interfacing with the business to find out what’s required of the data and developing visualisations that allow the business to easily interpret what the data says.
The data analyst’s job is “about interpreting current information to make it useful for the business,” said Pope. There’s not much machine learning modelling or machine learning deployment in the role of the data analyst.
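As a rough illustration of what "interpreting current information" can look like, the Python sketch below summarises existing records rather than predicting anything. The sales figures, region names and the `summarise_by_region` helper are hypothetical and not from Pope's talk.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (region, order amount).
SALES = [
    ("North", 120.0),
    ("South", 80.0),
    ("North", 100.0),
    ("South", 60.0),
    ("East", 50.0),
]


def summarise_by_region(records):
    """Group orders by region and report count, total and average."""
    grouped = defaultdict(list)
    for region, amount in records:
        grouped[region].append(amount)
    return {
        region: {
            "orders": len(amounts),
            "total": sum(amounts),
            "average": mean(amounts),
        }
        for region, amounts in sorted(grouped.items())
    }


summary = summarise_by_region(SALES)
for region, stats in summary.items():
    print(
        f"{region:<6} orders={stats['orders']} "
        f"total={stats['total']:.2f} average={stats['average']:.2f}"
    )
```

A summary like this describes what has already happened; it involves no model training, which is the distinction Pope draws between the analyst and the data scientist.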
If you want to be a data analyst, Pope said it will help if you understand how to use RapidMiner predictive analytics software and PostgreSQL, an open-source relational database.
What does a data scientist do?
Lastly, there’s the ‘pure data scientist’. This is what most people imagine they’ll be doing if they’re working with data. Data scientists interface heavily with the business and work with data engineers. They train machine learning programs on specially prepared data in order to provide easy-to-use visualisations that suit the business’s needs.
The role of the data scientist is to create models that can extrapolate from the data and make suggestions that are relevant to the business, said Pope.
Data scientists need to understand statistics, but Pope said most machine learning algorithms are based on multivariable calculus and linear and non-linear algebra. “This is the level of mathematics you need to know,” she added.
You’ll also need good data visualization and people skills so that you can present your model and its findings to the business – and encourage them to use it.
Getting a job in big data
Pope is hiring at KPMG, and she isn’t just looking for PhDs and highly accomplished Master’s students. Being a good data scientist is about being the “Swiss army knife” who can operate across the spectrum of data engineer, data analyst and data scientist, she said.
When Pope recruits at KPMG she says she’s “blind” to the degrees candidates have done: what matters most is how well they perform on the technical challenge set by the firm. “I am far more interested in what technology you can build and what you can drive for our client base [than qualifications],” said Pope.
To this end, she suggested that rather than studying for an expensive Master’s or higher qualification, you pursue internships and work experience and compete on platforms like Kaggle.
“It’s not about being a deep technical expert in Scala or Python. It’s about working out what you need in order to answer the questions being posed by the business,” Pope concluded.