The senior data engineer will build, operate, and scale data solutions, commercial data platforms and IT tools. This is a hands-on IT role that blends software development, data engineering and bioinformatics. They will work closely with scientific and IT experts, business analytics teams, and decision makers to enable data access and integrated data reuse, and to vastly improve time-to-solution for data and analytics initiatives.
This role requires both creative and collaborative work with IT experts, scientists, and line-function analytical bioinformaticians. The scope of work includes evangelizing effective data engineering practices and promoting a better understanding of data and analytics. The senior data engineer will support fast-paced ad-hoc data analysis, individual projects and longer-term enterprise-wide solutions.
1. Understand data and analytic requirements in order to choose the right IT tools, ways of working and solutions for the job (e.g. speed versus reuse; ad-hoc versus production)
2. Perform hands-on development of IT frameworks, architectures, ETL integration schemes, databases, production pipelines, visualizations and applications for large-scale data processing
- Cover the full path from data source ingestion to end-user visualization
- Assemble large, complex data sets that meet functional and non-functional business requirements
- Write IT requirements, design and test documents to support technical implementation
- Take on the heavy lifting associated with engineering data for analytics, including setup and operation
- Move IT solutions effectively into production, and manage / optimize these solutions for end users
- Automate manual data preparation and integration processes, optimize data delivery, and re-design infrastructure for greater scalability in order to improve productivity
- Streamline and prepare data for analysis through understanding of data flow and integration
- Create and drive standards for data capture, storage, and transformation
3. Apply data governance and data security requirements to solutions
- Participate in ensuring compliance and governance during data use
- Ensure, through data governance and compliance initiatives, that data users and consumers use the data provisioned to them responsibly
- Work with data governance teams (and information stewards within these teams) and participate in vetting and promoting content created in the business and by data scientists to the curated data catalog for governed reuse
4. Become a data and analytics “evangelist,” “data guru” and “fixer”. Promote the available data and analytics capabilities and expertise, and educate users on leveraging these capabilities to achieve their business goals.
QUALIFICATIONS & EXPERIENCE
- A Bachelor's or Master's degree in computer science, statistics, applied mathematics, computational biology, data management, data science, information systems, bioinformatics or a related field
- Combination of IT software engineering, package application programming, data engineering, data integration, and data visualization skills with data science or big data experience
- Demonstrated hands-on work experience in developing IT solutions in big data, small data, and/or complex data
- Familiarity with popular commercial and open-source pipeline, data visualization and analysis tools
- Solid understanding of bioinformatics-computational experimental lifecycle and model design
- Good understanding of in-process manufacture, research and clinical trial data
- An understanding of tools for the analysis of high dimensional data
- Ability to easily partner with business users and speak the language of data with the business
- High-energy, confident and results-oriented, yet easygoing, personality
- Experience with Next Generation Sequencing (NGS) / RNA sequencing data analysis and other bioinformatics tools
- Deep understanding of computational methods, scripting and programming languages, and relevant concepts in cancer biology, immunology and/or genetics
- Prior experience implementing centralized integrated data and analytics tools and solutions
- Experience in the biotechnology field
SKILLS & COMPETENCIES
- Strong software engineering experience using computational programming languages (e.g. R, Python, Java, C++), pipeline lifecycle tools, and popular database programming languages (e.g. complex SQL, PL/SQL) for relational databases, operational data stores, ETL and data lakes
- Strong ability to design, build and manage “production-ready” data pipelines for data structures encompassing data transformation, data integration, data models, schemas and metadata
- Experience with integration of data from multiple data sources to support downstream scientific analysis
- Strong experience in working with large, heterogeneous datasets in building and optimizing data pipelines, pipeline architectures and integrated datasets using traditional data integration technologies (e.g. ETL/ELT, data replication/CDC, message-oriented data movement, API design and access) and upcoming data ingestion and integration technologies (e.g. stream data integration, CEP and data virtualization)
- Basic experience working with popular data discovery, analytics and BI software tools such as Tableau, Qlik and Power BI for semantic-layer-based data discovery and end-user visualization
- Experience in working with data science teams in refining and optimizing data science and machine learning models and algorithms
- Demonstrated success in working with datasets to extract business value, using popular data preparation tools to reduce or even automate parts of tedious data preparation tasks
- Basic experience in working with data governance/data quality and data security teams in moving data pipelines into production with appropriate data quality, governance and security standards and certification.
- Demonstrated ability to work across multiple deployment environments including cloud (e.g. AWS), on-premises and hybrid, across multiple operating systems, and with containerization technologies such as Docker
- Adept in agile methodologies and capable of applying DevOps and increasingly DataOps principles to data pipelines to improve the communication, integration, reuse and automation of data flows between data managers and consumers across an organization.
- Prior experience as a bioinformatician, biotech software programmer or data architect is a plus
- Highly collaborative and supportive of the business and its ideals and strategies
- Practical, principle-based approach to decision making, recommendations and problem solving
- An understanding of the principles of oncology / immuno-oncology
- Prior experience in the complex biotechnology and/or pharmaceutical industry