Infrastructure Monitoring Engineer

Company Name: MIT Lincoln Laboratory

Location: Lexington, MA, US - 02420

Job Duration: 2021-07-21 to 2021-08-20

Overview

Our Enterprise Technology and Tools (ETT) Team provides resources to assist with compliance execution, reporting, and vulnerability-remediation activities. These services are offered across the Laboratory through the ISD Embedded System Administrator (ESA) Group. In addition, the ETT Team is responsible for the overall operation of monitoring tools. Such tools include SolarWinds, the Laboratory’s IT service management (ITSM) tool (Footprints), and the Laboratory’s central visibility tool—BigFix.

Job Description

The IT IC III will administer the Lab’s IT infrastructure monitoring solution. They take first-hand responsibility for administering and maintaining the alerting system that allows ISD and division IT professionals to respond to early warning signs of service and mission-critical failures. They also provide support for root cause analysis and infrastructure/service optimization.

Their responsibilities include monitoring, detecting, and alerting on the efficiency and integrity of the MIT Lincoln Lab IT infrastructure; ensuring the quality and accessibility of monitoring data; and the successful operation of the SolarWinds Orion Platform. This position serves as an SME on engineered service reliability monitoring and the development of infrastructure monitoring solutions for MIT Lincoln Lab.

The position requires excellent listening and feedback skills, along with a willingness to focus on enterprise solutions. The successful candidate must collaborate closely with other teams in IT and Laboratory research areas to provide high-quality services and value, and to help build an actionable, data-driven culture.

Primary Duties

Administer & optimize Lab IT infrastructure monitoring platform.

  • Perform infrastructure and application maintenance to ensure that systems comply and are consistent with Laboratory policy.
  • Guide direction by monitoring industry trends and recommending technologies to pursue.
  • Develop & maintain structured enterprise monitoring framework.
  • Develop framework for repeatable and consistent monitoring of hardware and services.
  • Monitor enterprise hardware to ensure the proper operations of servers, routers, switches, firewalls, and data center technologies.
  • Develop API-driven monitoring solutions for automation and efficiency.
  • Develop alerting based on framework requirements and feedback.
  • Analyze infrastructure monitoring results.
  • Assist in root cause investigations to prevent incidents from recurring.
  • Evaluate the signal-to-noise ratio of alerts and continually improve it using subscriber feedback.
  • Provide data and analytics on metrics and KPIs for event monitoring management.
  • Develop standardized documentation for services and create reports on key metrics.
  • Maintain data quality and accuracy as a data source for integrated reporting.

Provide 1st level support to the IT Asset Configuration Management platform.

  • Maintain general awareness of IT Asset Configuration Management platforms.
  • Understand the requirements needed to maintain operational service.
  • Perform basic upgrades and configurations as directed by vendor support or an MITLL SME.
  • Provide general support to the ISD and System Admin community.
  • Respond to level 1 requests and resolve issues and problems related to IT Asset & Configuration Management using the ticketing system.
  • Assist in creating knowledge base and documentation based on repeat support requests.

 

Provide 1st level support to the data & analytics infrastructure & platform.

  • Maintain general awareness of platform.
  • Understand the requirements needed to maintain operational service.
  • Perform basic upgrades and configurations as directed by vendor support or an MITLL SME.
  • Address data and analytics questions based on subject area expertise and documented workflows. Escalate issues to higher support levels as appropriate, and follow up on the solution provided.

Contribute to development of integration & optimization solutions.

  • Maintain general awareness of custom solutions and integrations.
  • Understand the requirements needed to maintain operational service.
  • Troubleshoot broken workflows and update and use code repositories.
  • Develop or co-develop programmatic data integration solutions.
  • Maintain awareness of the Lab’s existing solutions in order to make use of existing enterprise tools.

 

This position is under the general supervision of the ETT Central Visibility Team lead.

This position does not have direct financial responsibility. However, technical expertise may be required to assist with product selection and annual product support renewals.

This position will maintain frequent contact with internal department and/or Laboratory user community as well as external vendors to maintain communications related to problem resolution, systems upgrades, services and product research. 

Knowledge and Skills

Required Minimum

  • Bachelor’s degree in computer science or 5+ years relevant technical experience.
  • SolarWinds Certification.
  • Expert technical depth with infrastructure monitoring best practices.
  • Direct, hands-on expert experience in maintaining SolarWinds infrastructure.
  • Demonstrate in-depth enterprise-level networking knowledge and apply knowledge to complex challenges.
  • Exceptional working knowledge of, and experience administering, server operating systems.
  • Proficiency in at least one programming or scripting language, with the ability to automate processes through APIs.
  • Cloud service monitoring experience or certification.
  • Advanced knowledge of routers, switches, firewalls, and remote technologies.
  • Enterprise-networking knowledge in TCP/IP, Routing, SMTP/SNMP, LAN/WAN.
  • Ability to work independently toward delivery of goals as well as collaborate in team efforts.
  • Excellent customer service skills, including presentation, verbal and written communication skills.

Preferred

  • First-hand experience in government cloud monitoring.
  • ITIL foundations experience and/or certification.
  • Hands-on experience in object-oriented programming & data structures (NoSQL / SQL).
  • Strong organizational skills and the ability to execute multiple IT projects with minimal supervision.
  • Demonstrate the ability to learn new technologies and disciplines quickly.
  • Network/telecom certifications.
  • Experience with mentoring or teaching infrastructure monitoring to peers.
  • Experience with patch & configuration management tools.
  • Knowledge of Google SRE.

Experience: 

9+ years of experience in the information technology field.

7+ years of experience in IT infrastructure monitoring.

5+ years of experience operating the SolarWinds Orion suite.

3+ years of experience in implementing IT projects.

2+ years of experience in synthetic transaction monitoring.

2+ years of experience in service/site reliability monitoring.

2+ years of experience in programmatic process automation or data modification via APIs.

 

Other Information

Ability to obtain and maintain a government security clearance.

Occasional off-hour/on-call support is necessary. Some schedule flexibility is required, as some work (planned or unplanned) must be done outside of major production hours during pre-scheduled maintenance windows.

Additional Information

This position requires an individual with excellent communication (both oral and written) and organizational skills.  The individual must be able to work in a fast-paced environment, at times with minimal supervision, and execute operations, project and administrative tasks with a high degree of quality, while following existing processes and establishing new operational procedures and best practices where necessary.  Additionally, the position requires the ability to work with members of other teams and staff to accomplish department and organizational goals.

For benefits information, see http://hrweb.mit.edu/benefits

Selected candidate will be subject to a pre-employment background investigation and must be able to obtain and maintain a Secret level DoD security clearance.

 

MIT Lincoln Laboratory is an Equal Employment Opportunity (EEO) employer. All qualified applicants will receive consideration for employment and will not be discriminated against on the basis of race, color, religion, sex, sexual orientation, gender identity, national origin, age, veteran status, disability status, or genetic information. U.S. citizenship is required.

Requisition ID: 34050