About Me

I am a recent graduate of the University of California, Santa Barbara with a Bachelor of Science in Statistics & Data Science. I am currently seeking an entry-level Data Scientist/Analyst position in which I can apply my academic knowledge to real-world projects. I am an eager learner and would love to pick up new skills for working with data.

  • Programming
    Advanced: RStudio, Python (pandas, NumPy)
    Intermediate: SAS, SQL (MySQL), LaTeX, Excel, HTML, CSS, JavaScript, Git
  • Statistics and Probability
    Hypothesis Testing, Bayesian/Frequentist Data Analysis, Time Series Analysis, Data Visualization, Mathematics, Nonparametric Statistics, and Statistical Analysis
  • Machine Learning
    Supervised Learning: Linear/Logistic/Lasso/Ridge Regression, k-NN, Decision Trees, QDA, LDA, Random Forest, Support Vector Machine, Neural Network
    Unsupervised Learning: PCA, Hierarchical Clustering
  • University of California, Santa Barbara: Technical Support
    September 2021 - March 2022
    Kept orders and inventory organized in a database covering more than 4,000 professors and TAs
    Ran day-to-day front desk operations, providing media and classroom equipment to UCSB faculty and organizations, and gave technical support for computers and other equipment in 100+ classrooms
    Set up and broke down equipment for client events and supervised functionality of equipment during events
  • University of California, Santa Barbara
    September 2018 - June 2022
    Bachelor of Science in Statistics and Data Science, Cumulative GPA: 3.33
    Applicable Classes: Machine Learning, Statistical Data Science, Probability and Statistics (A, B, and C), Regression Analysis, SAS Base Programming, C++, Python, Bayesian Data Analysis, Time Series Analysis, Nonparametric Statistics, Design of Experiments
  • IBM Data Science Certificate from Coursera
    Currently Pursuing
    July - Present

My Work

Machine Learning Project on Education and Poverty

- Led a team to clean and analyze two data sets in R Markdown: Census (3142 rows with 31 features) and Education (3143 rows with 42 features)
- Performed binary classification of the Poverty feature using multiple R libraries and compared the results across several machine learning models. Concluded that employment was the most important factor in determining the state of poverty within a given area, followed by skin color
- Conducted PCA and hierarchical clustering with complete linkage for dimensionality reduction of the datasets and created dendrograms for visualization

Time Series Forecasting of Beer Production in Australia

- Performed transformations and decomposition of a dataset containing 154 observations to achieve stationarity and invertibility of the time series data
- Identified candidate SARIMA models and ran diagnostics on them using plots (QQ-plots, ACF, PACF) and tests (Shapiro-Wilk, Box-Pierce, Box-Ljung, McLeod-Li, and checking for roots). The forecast showed beer production in Australia increasing each year, suggesting that it will continue to play a large role in Australia’s economy

Contact Me

mtp.tan.pham@gmail.com

714-925-9026
