Heng Yi Sheng pursued his BSc in Computer Science at Asia Pacific University, Malaysia, followed by an MSc in Data Science at the University of Bristol, UK. His academic journey has been complemented by contributions to the field, including two research papers published at the Future Technologies Conference (FTC) 2022 and the International Conference on Pattern Recognition Applications and Methods (ICPRAM) 2024. Notably, his dedication and innovative approach were recognized with the AWS Special Award at the RHB datathon - Get Your Hack On: Data Edition 2022. He is proficient in an array of data science tools, including Python, R, SQL, and PowerBI, and possesses a diverse skill set refined through rigorous academic training and practical real-world applications.
Heng Yi Sheng (Essien)
+6012-8283390
yishenghengapu@gmail.com
MSc Data Science • Sept 2022 - Sept 2023
BSc Computer Science (Data Analytics) • Feb 2019 - May 2022
Digital Graduate Analyst • October 2023 - Present
Business Intelligence Intern • March 2021 - July 2021 (internship)
This section highlights my expertise in domain knowledge and programming skills, which include:
The emergence of fan fiction websites, where fans write their own stories about a topic or genre, has resulted in serious content rating issues. The websites are accessible to general audiences but often include explicit content. Authors can rate their own fan fiction stories, but this is not required and many stories are unrated. This motivates automatically predicting the content rating using recent natural language processing techniques. The length of the fan fiction text, ambiguity in rating schemes, self-annotated (weak) labels, and style of writing all make automatic content rating prediction very difficult. In this paper, we propose several embedding techniques and classification models to address these problems. Based on a dataset from a popular fan fiction website, we show that binary classification is better than multiclass classification and can achieve nearly 70% accuracy using a transformer-based model. When computation is considered, we show that a traditional word embedding technique and Logistic Regression produce the best results, with 66% accuracy and 0.1 seconds of computation (approximately 15,000 times faster than DistilBERT). We further show that many of the labels are incorrect and require subsequent preprocessing techniques to correct them. We also propose an Active Learning approach; although its results are not conclusive, they suggest directions for further work.
Natural Language Processing, Text Classification, Fan Fiction, Content Rating Classification, Explainable Artificial Intelligence (XAI), Active Learning
The emergence of machine learning and artificial intelligence has created new opportunities for data-intensive science within the financial industry. The implementation of machine learning algorithms still faces doubt and distrust, mainly in the credit risk domain, due to the lack of transparency in decision making. This paper presents a comprehensive review of research dedicated to the application of machine learning in credit risk modelling and how Explainable Artificial Intelligence (XAI) could increase the robustness of a predictive model. In addition, some fully developed credit risk software available in the market is also reviewed. It is evident that adopting complex machine learning models produces high performance but limited interpretability. Thus, the review also studies some XAI techniques that help to overcome this problem by breaking out of the ‘black-box’ nature of these models. XAI models mitigate bias and establish trust and compliance with regulators to ensure fairness in loan lending in the financial industry.
Credit Risk, Explainable Artificial Intelligence (XAI), LIME, Machine Learning, SHAP