

Over the course of my career as a data analyst and data scientist, I have picked up several skills across the data science process.

Languages: R and Python 

I have a good grasp of writing code in Python, mostly for data science and AI-related projects. I am also proficient in using R via RStudio for data cleaning, data analysis, data visualization and machine learning.

Data extraction

Data extraction involves pulling data from various sources for further analysis. Using various Python and R libraries, I can read data from file formats such as JSON, XML and CSV. I can also read data from both SQL and MongoDB databases, and extract data from the web via scraping and APIs.
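As a minimal sketch of reading two of these formats in Python, using only the standard library and hypothetical inline samples standing in for files on disk:

```python
import csv
import io
import json

# Made-up samples; in practice these would be read from files or an API
csv_text = "name,score\nAda,91\nGrace,88\n"
json_text = '{"name": "Ada", "score": 91}'

# CSV: csv.DictReader yields one dict per row, keyed by the header
rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON: json.loads parses a string into native Python objects
record = json.loads(json_text)

print(rows[0]["name"])   # Ada
print(record["score"])   # 91
```

For real files, `io.StringIO` would be replaced by `open(path)`; libraries such as Requests and Beautiful Soup cover the API and scraping cases.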

Data analysis

Data analysis is the application of statistical techniques to understand patterns, test hypotheses and make data-driven decisions. Depending on the type of data and the objective, I use various descriptive and inferential statistics to investigate the data, drawing on an array of libraries in both Python and R.
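As an illustration, a few descriptive statistics can be computed with Python's standard library alone (the scores below are made up for the example):

```python
import statistics

# Hypothetical exam scores
scores = [72, 85, 90, 68, 95, 88, 79]

mean = statistics.mean(scores)      # central tendency
median = statistics.median(scores)  # robust to outliers
spread = statistics.stdev(scores)   # sample standard deviation

print(round(mean, 2), median)  # 82.43 85
```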

Machine Learning

Machine learning enables systems to learn from data. In addition to understanding data, I am also able to build machine learning models, primarily using scikit-learn in Python.
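A minimal scikit-learn sketch, training a classifier on the bundled Iris dataset; the model choice and split below are illustrative, not a fixed recipe:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in dataset and hold out a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit a simple baseline model and evaluate it on unseen data
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy: {accuracy:.2f}")
```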

Databases: SQL, MongoDB

I am well versed in working with both relational (SQL) databases and MongoDB.
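For a self-contained illustration of the SQL side, the sketch below uses Python's built-in sqlite3 module with an in-memory database standing in for a real server; the table and figures are made up:

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real SQL server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 120.0), ("West", 80.0), ("East", 50.0)],
)

# Aggregate query: total sales per region
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('East', 170.0), ('West', 80.0)]
conn.close()
```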


My educational background has enabled me to handle key statistical topics such as descriptive and inferential statistics, probability theory, statistical significance and hypothesis testing. 
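As an example of hypothesis testing, a two-sample t-test with SciPy on made-up measurements for two groups:

```python
from scipy import stats

# Hypothetical load times (seconds) for two page variants
group_a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
group_b = [12.8, 13.1, 12.9, 13.3, 12.7, 13.0]

# Two-sample t-test: H0 says the two group means are equal
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Reject H0 at the 5% significance level when p < 0.05
print(p_value < 0.05)
```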

Data cleaning and wrangling

Data cleaning involves correcting or removing inaccurate data, while data wrangling is transforming data from one format into another. This step is the most time-consuming in the data cycle, and it is crucial for gathering accurate insights. I mostly use the Pandas and NumPy libraries in Python to clean and wrangle data; in R, I work with the tidyverse packages.
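A minimal Pandas sketch of typical cleaning steps on a small, made-up dataset containing missing values, a duplicate row and inconsistent text case:

```python
import numpy as np
import pandas as pd

# Hypothetical messy dataset
df = pd.DataFrame({
    "city": ["Nairobi", "nairobi", "Mombasa", "Mombasa", None],
    "temp_c": [23.0, np.nan, 28.5, 28.5, 26.0],
})

df["city"] = df["city"].str.title()                       # normalize text case
df["temp_c"] = df["temp_c"].fillna(df["temp_c"].mean())   # impute missing values
df = (
    df.dropna(subset=["city"])   # drop rows with no city
      .drop_duplicates()         # remove the repeated Mombasa row
      .reset_index(drop=True)
)
print(df)
```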

Data visualization 

Data visualization provides a clear graphic representation of data, which makes it easy to identify patterns. In Python, I work with Matplotlib, Seaborn and Plotly to plot various graphs. I am also familiar with Dash, a Python framework for building data-driven web applications. In R, I work with ggplot2 to create beautiful visualizations.
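A minimal Matplotlib sketch with hypothetical monthly figures; the Agg backend lets it run headless and render straight to a file:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display window needed
import matplotlib.pyplot as plt

# Made-up monthly revenue figures
months = ["Jan", "Feb", "Mar", "Apr"]
revenue = [10, 14, 9, 17]

fig, ax = plt.subplots()
ax.bar(months, revenue)
ax.set_title("Monthly revenue")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue")
fig.savefig("revenue.png")  # write the chart to an image file
```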

Cloud computing

Given the large volumes of data involved, it has become necessary to use cloud computing to store and process it. I am familiar with both AWS and Azure technologies.

Version Control Systems - Git

Version control ensures that I maintain a detailed history of the projects I work on. I use Git for this.
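A minimal Git workflow sketch; the repository name and identity below are placeholders:

```shell
# Create a repository, record one commit, and view the history
git init demo-project
cd demo-project
echo "# Demo" > README.md
git add README.md
git -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -m "Initial commit"
git log --oneline   # shows the commit just recorded
```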

Quotes I love

Alvin Toffler

The illiterate of the 21st century will not be those who cannot read and write, but those who cannot learn, unlearn and relearn.