Twitter Stream Real-time Data Pipelines
- Implemented a real-time Spark processor that tracks popular Twitter hashtags.
- Designed and implemented a positive/negative word monitor with Kafka and Spark (60 Tweets per second).
- Optimized the processing with Flink, which was more efficient and better suited to the task.
- Visualized the results for 1% of all public Tweets with Ajax and JavaScript charts.
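
A minimal sketch of the two counting steps above (hashtag popularity and positive/negative word monitoring), written in plain Python rather than Spark, Kafka, or Flink. The function names and the tiny word lists are hypothetical and only illustrate the idea; a real pipeline would apply the same logic per streaming window.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")

# Hypothetical word lists, for illustration only.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def top_hashtags(tweets, n=3):
    """Return the n most frequent hashtags (case-insensitive) in a batch of tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts.most_common(n)


def sentiment_word_counts(tweets):
    """Count occurrences of positive and negative words in a batch of tweets."""
    totals = {"positive": 0, "negative": 0}
    for text in tweets:
        for word in re.findall(r"\w+", text.lower()):
            if word in POSITIVE:
                totals["positive"] += 1
            elif word in NEGATIVE:
                totals["negative"] += 1
    return totals
```

Calling `top_hashtags(batch, 10)` on each incoming batch would give the kind of ranking that a chart front end could poll.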
Configurable Web Server on Google Cloud
- Mar 2020 - June 2020
- Built a configurable, scalable web server with real-time logging, using object-oriented C++ and shell scripts.
- Developed classes for server configuration parsing, HTTP request parsing, and multiple types of request handling with the Boost library.
- Wrote unit and integration tests, reaching more than 80% coverage.
- Deployed on Google Cloud for public access, with robust request echoing, file serving, and status checking.
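
A minimal sketch of HTTP request parsing and an echo handler, written in Python rather than the project's C++ and Boost. The names `parse_request` and `echo_response` are hypothetical and not taken from the project; error handling for malformed requests is omitted.

```python
def parse_request(raw: str) -> dict:
    """Split a raw HTTP/1.1 request into request line, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "path": path, "version": version,
            "headers": headers, "body": body}


def echo_response(raw: str) -> str:
    """Build a 200 response whose body is the request exactly as received."""
    length = len(raw.encode("utf-8"))
    return ("HTTP/1.1 200 OK\r\n"
            "Content-Type: text/plain\r\n"
            f"Content-Length: {length}\r\n"
            "\r\n" + raw)
```

Keeping parsing and handling in separate functions mirrors the class split described above: one component understands the wire format, and each handler only decides what to send back.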
Android Chrome RRC Request Latency Measurement
- June 2018 - Aug 2018
- Calculated the latency and frequency of RRC connection setup during Google Chrome users’ daily web browsing on Android phones, using JSON-formatted information extracted from low-level network communication packets.
- Analyzed connection patterns together with downloaded bytes for different types of browsing in Excel and R.
- Decreased the latency of loading and reloading some web pages by 0.2 s by setting up the RRC connection.
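
A minimal sketch of the latency calculation, assuming each JSON record carries an event type and a millisecond timestamp. The field names (`type`, `ts_ms`) and event labels (`rrc_setup_request`, `rrc_setup_complete`) are hypothetical; the actual log format is not described here.

```python
import json


def rrc_setup_latencies(events):
    """Pair each setup request with the next setup-complete event
    and return the elapsed time of every completed setup, in ms."""
    latencies = []
    pending = None  # timestamp of the most recent unanswered request
    for ev in sorted(events, key=lambda e: e["ts_ms"]):
        if ev["type"] == "rrc_setup_request":
            pending = ev["ts_ms"]
        elif ev["type"] == "rrc_setup_complete" and pending is not None:
            latencies.append(ev["ts_ms"] - pending)
            pending = None
    return latencies


def load_events(json_text):
    """Parse a JSON array of event records."""
    return json.loads(json_text)
```

The number of returned latencies gives the setup frequency for the trace, and their average gives the typical setup latency.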
Political Sentiments Analysis on Reddit Text
- Apr 2018 - June 2018
- Aggregated people’s attitudes towards the two parties and Donald Trump by applying NLP to Reddit posts and comments.
- Trained a Logistic Regression model in Python on tokenized and lemmatized sentences from Reddit text to label sentiment towards the two parties and Donald Trump as positive or negative.
- Combined queries to a MySQL database and visualized the fluctuation of political sentiment across states in time-series graphs with R.
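
A minimal sketch of the classification step: a bag-of-words logistic regression trained with batch gradient descent in pure Python. This is an illustration under stated assumptions, not the project's model: it tokenizes and lowercases but skips lemmatization, and the function names and hyperparameters are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (no lemmatization in this sketch)."""
    return re.findall(r"[a-z']+", text.lower())


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(samples, epochs=200, lr=0.5):
    """Fit per-word weights and a bias on (text, label) pairs; label is 1 or 0."""
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        grad_w, grad_b = defaultdict(float), 0.0
        for text, label in samples:
            counts = Counter(tokenize(text))
            z = bias + sum(weights[t] * c for t, c in counts.items())
            err = label - sigmoid(z)
            for t, c in counts.items():
                grad_w[t] += err * c
            grad_b += err
        for t, g in grad_w.items():
            weights[t] += lr * g / len(samples)
        bias += lr * grad_b / len(samples)
    return weights, bias


def predict(model, text):
    """Return 1 (positive) or 0 (negative) for a piece of text."""
    weights, bias = model
    z = bias + sum(weights.get(t, 0.0) for t in tokenize(text))
    return 1 if sigmoid(z) >= 0.5 else 0
```

Per-sentence predictions like these could then be stored and aggregated by state and date for plotting.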