We are looking for a data engineer to execute large-scale data ingestion projects and urgent one-off ingestion projects, and to develop tooling that makes data ingestion more scalable, more accurate, and more automated. The Academic Analytics (AcA) data warehouse holds data on hundreds of entities, including both unstructured text information and structured data fields/triples. This role will be responsible for growing the number of entities and triples, and for identifying opportunities to restructure existing entities so they are more efficient or more accurately reflect the domain. Successful candidates will be able to demonstrate a track record of thoughtfulness and curiosity in data ingestion, data generation, and problem solving.
You will need to:
- Have strong experience with data-oriented products
- Identify needs for, and proactively create, new data ingestion and processing tooling that eliminates manual, inefficient, or repetitive work and addresses data quality issues
- Have strong experience with ETL tools
- Have prior experience ingesting large-scale structured data
- Execute one-off imports of data
- Demonstrate common sense in applying business logic to ontological/schema decisions
- Be able to connect to public databases to ingest data
- Have deep experience building web scrapers
- Make thoughtful judgments about data quality when cleaning data sources for import
- Use third-party APIs and web scraping tools to source data at scale
- Use Python, Jupyter notebooks, and Pandas to inspect and analyze data sources
Qualifications:
- Bachelor’s degree from an accredited college or university; a degree in Computer Science or Engineering preferred
- 5+ years of professional experience
- Strong experience with Microsoft SQL Server
- Excellent written and verbal communication skills
- Ability to work on multiple projects simultaneously
- Experience with UX methodology
- Exposure to agile methodologies and best practices
- Exposure to cloud and DevOps environments
- Experience with disambiguation techniques
- Experience with C#
- Experience collaborating directly with clients
- Experience with Beautiful Soup and Selenium
- Willingness to perform modest manual data editing when appropriate